Classification of Sinhala news using machine learning approaches

dc.contributor.author: Nawarathna, S.P.D.H.
dc.contributor.author: Mahesan, S.
dc.date.accessioned: 2025-11-26T09:42:27Z
dc.date.available: 2025-11-26T09:42:27Z
dc.date.issued: 2023-11-03
dc.description.abstract: With the advent of internet technology, the popularity of Sinhala text-based news portals has grown significantly. To help users efficiently locate news articles relevant to their interests, this study introduces a systematic approach for classifying Sinhala news headlines using machine learning. The study curates a dataset of 25,400 news articles from Sinhala news websites, labelled for training and evaluation. It explores several text embedding techniques, including term frequency-inverse document frequency (TF-IDF), Word2Vec, and FastText, together with classification algorithms such as support vector machines, Naive Bayes, multi-class logistic regression, and long short-term memory (LSTM) networks. The experiments show that the most effective combination for classifying Sinhala news headlines is FastText with LSTM, which achieves an accuracy of 93.8% on news headlines alone and 95.8% on a mixed dataset containing both news headlines and news content. Furthermore, the LSTM classifier can capture long-term dependencies within the text, a crucial factor in the precise classification of Sinhala news headlines. Overall, the LSTM + FastText combination yields the highest accuracy among the combinations evaluated for classifying Sinhala news.
dc.identifier.citation: Proceedings of the Postgraduate Institute of Science Research Congress (RESCON) -2023, University of Peradeniya, P 45
dc.identifier.isbn: 978-955-8787-09-0
dc.identifier.uri: https://ir.lib.pdn.ac.lk/handle/20.500.14444/7016
dc.language.iso: en_US
dc.publisher: Postgraduate Institute of Science (PGIS), University of Peradeniya, Sri Lanka
dc.subject: FastText
dc.subject: Logistic regression
dc.subject: Text embedding
dc.title: Classification of Sinhala news using machine learning approaches
dc.title.alternative: ICT, mathematics, and statistics
dc.type: Article
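
Illustrative note: the abstract names several embedding and classifier pairings without showing how one fits together. The sketch below shows one of the simpler pairings it mentions, TF-IDF features with a multinomial Naive Bayes classifier, using only the Python standard library. It is not the authors' implementation and not the best-performing FastText + LSTM model. The function names, the smoothed IDF formula, the whitespace tokenizer, and the English toy headlines are all assumptions made for illustration; none of them come from the study or its dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Sinhala text would
    # likely need Unicode normalization and language-specific handling.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    # Term frequency times IDF; tokens unseen during fitting are dropped.
    counts = Counter(tok for tok in doc if tok in idf)
    return {tok: c * idf[tok] for tok, c in counts.items()}


def train_nb(vectors, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes over TF-IDF weights with additive smoothing.
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for tok, w in vec.items():
            weight[label][tok] += w
    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(weight[label].values()) + alpha * len(vocab)
        log_like = {
            tok: math.log((weight[label][tok] + alpha) / denom) for tok in vocab
        }
        model[label] = (math.log(n_docs / len(labels)), log_like)
    return model


def predict(model, vec):
    # Pick the class with the highest log prior plus weighted log likelihood.
    def score(label):
        prior, log_like = model[label]
        return prior + sum(w * log_like[tok] for tok, w in vec.items())

    return max(model, key=score)


# Hypothetical toy headlines (English placeholders, not from the study's data).
TRAIN = [
    ("cricket team wins final match", "sports"),
    ("football league match tonight", "sports"),
    ("parliament passes new budget", "politics"),
    ("minister announces election date", "politics"),
]

train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
model = train_nb([tfidf(d, idf) for d in train_docs], train_labels, set(idf))


def classify(headline):
    return predict(model, tfidf(tokenize(headline), idf))


if __name__ == "__main__":
    print(classify("cricket match postponed"))
    print(classify("new election budget"))
```

The same structure (fit features on training text, train a classifier, apply both to a new headline) carries over to the other pairings the abstract lists; only the feature and classifier steps would change.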

Files

Original bundle

Name: Nawarathna.pdf
Size: 93.51 KB
Format: Adobe Portable Document Format
