Clustering English news articles based on relevant domains: comparative study using three clustering algorithms

dc.contributor.author: Disayiram, N.
dc.contributor.author: Rupasingha, R.A.H.M.
dc.date.accessioned: 2026-03-12T05:37:24Z
dc.date.available: 2026-03-12T05:37:24Z
dc.date.issued: 2022-10-28
dc.description.abstract: News tells us what is happening around us, and many people now read it on news websites. News falls into many categories, and the preferred category differs from reader to reader, so every category matters. A large volume of news is published on news websites every day. News sites typically categorise their articles, but no single site covers every category; most prioritise some categories, and the rest receive less coverage. Readers and other content seekers therefore have difficulty finding the section relevant to them. Clustering English news articles by category offers a way to overcome these problems. This study aims to cluster news articles by domain using machine-learning algorithms. We consider five domains: politics, sports, health, technology, and business. The data, collected online, were converted into vectors using term frequency-inverse document frequency (TF-IDF) vectorization. Three clustering algorithms, Expectation Maximization, Simple K-means, and agglomerative Hierarchical Clustering, were then applied separately to the news body and the news headline. Accuracy was calculated using the classes-to-clusters evaluation mode in the WEKA tool. The results show that the Expectation Maximization algorithm achieved the highest accuracy, 87.9%, compared with 83.8% for the Simple K-means algorithm. The Hierarchical Clustering method achieved the lowest accuracy. A comparison of the two inputs shows that clustering on the news body performed better than clustering on the news headline.
dc.identifier.citation: Proceedings of the Postgraduate Institute of Science Research Congress (RESCON) 2022, University of Peradeniya, p. 102
dc.identifier.isbn: 978-955-8787-09-0
dc.identifier.uri: https://ir.lib.pdn.ac.lk/handle/20.500.14444/7632
dc.language.iso: en_US
dc.publisher: Postgraduate Institute of Science (PGIS), University of Peradeniya, Sri Lanka
dc.subject: Clustering
dc.subject: Domain
dc.subject: Machine learning
dc.subject: News article
dc.title: Clustering English news articles based on relevant domains: comparative study using three clustering algorithms
dc.title.alternative: ICT, Mathematics and Statistics
dc.type: Article
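
The abstract names two steps that are easy to illustrate: TF-IDF vectorization and classes-to-clusters evaluation. The following is a minimal Python sketch of both, not the authors' WEKA pipeline. The function names `tfidf_vectors` and `classes_to_clusters_accuracy` are hypothetical, the TF-IDF weighting (relative term frequency times log(N / df)) is one common variant and may differ from the one used in the study, and the evaluation is assumed to pick the one-to-one mapping of clusters to classes that maximises agreement. The brute-force search over mappings is only practical for a handful of clusters, such as the five domains considered here.

```python
import math
from collections import Counter
from itertools import permutations


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def classes_to_clusters_accuracy(true_labels, cluster_ids):
    """Best accuracy over one-to-one mappings of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Contingency counts: how many items of each class fall in each cluster.
    counts = Counter(zip(cluster_ids, true_labels))
    # Pad with None so surplus clusters can be left without a class.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(padded, len(clusters)):
        correct = sum(counts[(c, label)] for c, label in zip(clusters, assignment))
        best = max(best, correct)
    return best / len(true_labels)
```

For example, with the illustrative labels `["sports", "sports", "politics", "politics", "health"]` and cluster assignments `[1, 1, 0, 0, 0]`, the best mapping sends cluster 1 to sports and cluster 0 to politics, so four of the five articles agree with their class and the function returns 0.8.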

Files

Original bundle

Name: Disayiram, N..pdf
Size: 148.2 KB
Format: Adobe Portable Document Format
