Combining fast text embeddings with neural networks for short text classification
| dc.contributor.author | Jayakody, J.R.K.C. | |
| dc.contributor.author | Vidanagama, V.G.T.N. | |
| dc.date.accessioned | 2025-11-18T03:04:20Z | |
| dc.date.available | 2025-11-18T03:04:20Z | |
| dc.date.issued | 2023-11-03 | |
| dc.description.abstract | Using an embedding representation is a critical step in improving the classification accuracy of text datasets. Although Bag of Words (BOW) models were used in past research, embedding techniques such as word2vec, GloVe, and FastText represent the features of text documents in a distributed manner, thereby improving the accuracy of such models. Recent research has combined embedding techniques with enhanced neural network models to improve the classification accuracy of text documents. FastText as an unsupervised embedding model, and CNN, LSTM, and RNN as neural models, have been used extensively in this work. However, no comprehensive analysis of FastText combined with neural models for text document classification has been undertaken thus far. As a result, existing studies are hard to compare, and it is unclear which neural model performs best when combined with FastText. It is therefore necessary to investigate how different neural networks perform when the features are represented with the FastText embedding model. A well-known movie review dataset was used for the experiments. CNN, LSTM, RNN, and NN models, together with variations of these networks, were evaluated. A stratified hold-out split was used, with 70% of the data for training and 30% for testing, and the 70% training portion was further divided into 80% training and 20% validation data (i.e., 56%, 14%, and 30% of the full dataset for training, validation, and testing, respectively). We compare classification accuracy across this range of neural network models. With FastText embeddings, the RNN model outperforms the other individual neural network models with 86% accuracy, while the combined CNN-LSTM model outperforms all other models with 88% accuracy. These results can serve as a baseline for future research. | |
| dc.identifier.citation | Proceedings of the Postgraduate Institute of Science Research Congress (RESCON) 2023, University of Peradeniya, P41 | |
| dc.identifier.isbn | 978-955-8787-09-0 | |
| dc.identifier.uri | https://ir.lib.pdn.ac.lk/handle/20.500.14444/6742 | |
| dc.language.iso | en_US | |
| dc.publisher | Postgraduate Institute of Science (PGIS), University of Peradeniya, Sri Lanka | |
| dc.subject | Classification | |
| dc.subject | CNN | |
| dc.subject | FastText | |
| dc.subject | LSTM | |
| dc.subject | RNN | |
| dc.title | Combining fast text embeddings with neural networks for short text classification | |
| dc.title.alternative | ICT, mathematics, and statistics | |
| dc.type | Article | |
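The abstract's two-stage split can be sketched in plain Python. First comes a stratified 70/30 hold-out, then an 80/20 division of the training portion into training and validation data. This is a minimal illustration, not the authors' code. The function names `stratified_split` and `three_way_split`, the random seeds, and the rounding of per-class counts are all assumptions. On a hypothetical balanced set of 1,000 labelled reviews, the sketch yields 560 training, 140 validation, and 300 test examples, i.e. 56%, 14%, and 30% of the data.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac, seed=0):
    """Split indices 0..len(labels)-1 into (keep, held_out) so that each
    class contributes round(test_frac * class_size) items to held_out."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    keep, held_out = [], []
    for y in sorted(by_class):  # fixed class order for reproducibility
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_out = round(len(idx) * test_frac)
        held_out.extend(idx[:n_out])
        keep.extend(idx[n_out:])
    return sorted(keep), sorted(held_out)

def three_way_split(labels, seed=0):
    """70/30 stratified train/test split, then an 80/20 train/validation
    split of the 70% portion (56% / 14% / 30% of the data overall)."""
    trainval, test = stratified_split(labels, test_frac=0.30, seed=seed)
    sub_labels = [labels[i] for i in trainval]
    tr, va = stratified_split(sub_labels, test_frac=0.20, seed=seed + 1)
    train = [trainval[i] for i in tr]  # map local positions back to dataset indices
    val = [trainval[i] for i in va]
    return train, val, test

# Example: a hypothetical balanced set of 1,000 labelled reviews.
labels = [0] * 500 + [1] * 500
train, val, test = three_way_split(labels)
print(len(train), len(val), len(test))  # 560 140 300
```

Stratifying at both stages keeps the positive/negative ratio the same in all three subsets. This matters when accuracy is the only reported metric.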
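FastText differs from word2vec and GloVe in that each word is represented through its character n-grams. This lets it build vectors for words that were never seen in training. The sketch below illustrates that idea following the fastText paper's description. Each word is wrapped in `<` and `>` boundary markers and split into n-grams of length 3 to 6, the defaults for fastText's unsupervised training. The word vector is the average of the n-gram vectors, plus a whole-word vector for in-vocabulary words.

This is a toy under loud assumptions:

- `toy_vector` is a hypothetical stand-in that produces deterministic pseudo-random vectors instead of trained ones.
- The real library hashes n-grams into a fixed number of buckets rather than keeping one row per n-gram.

```python
import random

DIM = 8  # toy size; real fastText models typically use 100 or 300 dimensions

def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' with lengths minn..maxn."""
    token = "<" + word + ">"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]

def toy_vector(key, dim=DIM):
    """Hypothetical stand-in for a trained embedding row: a deterministic
    pseudo-random vector seeded by the string key."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def word_vector(word, vocab, minn=3, maxn=6):
    """Average the n-gram vectors, plus a whole-word vector if the word is
    in the vocabulary; unseen words still get a vector from n-grams."""
    # Prefixes keep n-gram keys and whole-word keys from colliding.
    parts = [toy_vector("ngram:" + g) for g in char_ngrams(word, minn, maxn)]
    if word in vocab:
        parts.append(toy_vector("word:" + word))
    if not parts:  # e.g. an empty string has no n-grams of length >= 3
        return [0.0] * DIM
    return [sum(v[k] for v in parts) / len(parts) for k in range(DIM)]

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
oov = word_vector("wherever", vocab={"where"})  # not in vocab, still embedded
```

Because "where" and "wherever" share n-grams such as `<wh` and `whe`, their vectors share components. This is the property that helps with the misspellings and rare words common in short review texts.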
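After a review has been turned into a sequence of word vectors, a recurrent model reads it one vector at a time. The following sketch shows this as the forward pass of a basic (Elman) RNN with a logistic output unit for binary sentiment. It is a generic textbook cell, not the architecture evaluated in the paper, whose layer sizes and training setup are not given in this record. The weights in the example are hypothetical and untrained, so the printed probability is meaningless except as a demonstration of the data flow.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_classify(seq, Wx, Wh, b, w_out, b_out):
    """Run an Elman RNN over seq (a list of word vectors) and return
    P(positive) from the final hidden state.

    h_t = tanh(Wx @ x_t + Wh @ h_{t-1} + b), with h_0 = 0
    p   = sigmoid(w_out . h_T + b_out)
    """
    h = [0.0] * len(b)
    for x in seq:
        pre = [u + r + c for u, r, c in zip(matvec(Wx, x), matvec(Wh, h), b)]
        h = [math.tanh(z) for z in pre]
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 2-D word vectors and a 3-unit hidden layer.
Wx = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
Wh = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
b = [0.0, 0.0, 0.0]
w_out, b_out = [1.0, -1.0, 0.5], 0.0
review = [[0.2, 0.7], [0.9, -0.1], [0.4, 0.3]]
print(rnn_classify(review, Wx, Wh, b, w_out, b_out))
```

The hidden state `h` carries information from earlier words forward, which is what distinguishes an RNN from a model that averages word vectors. LSTM cells replace the single `tanh` update with gated updates. Hybrid models such as CNN-LSTM apply convolutions over the vector sequence before the recurrent layer.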