Clustering English news articles based on relevant domains: comparative study using three clustering algorithms

Disayiram, N.; Rupasingha, R.A.H.M.

dc.contributor.author	Disayiram, N.
dc.contributor.author	Rupasingha, R.A.H.M.
dc.date.accessioned	2026-03-12T05:37:24Z
dc.date.available	2026-03-12T05:37:24Z
dc.date.issued	2022-10-28
dc.description.abstract	News reports what is happening around us, and many people now read it on news websites. News falls into many categories, and readers differ in the categories they prefer, so every category matters to someone. A large volume of news is published online every day. News sites typically categorize their articles, but no single site covers every category: most prioritise some categories and give others less coverage. This makes it difficult for readers and content seekers to find the relevant section on a news site. Clustering English news articles by their relevant category offers a way to overcome these problems. This study aims to cluster news articles by relevant domain using machine-learning algorithms. We consider five domains: politics, sports, health, technology, and business. The data, collected online, were converted into vector format using term frequency-inverse document frequency (TF-IDF) vectorization. Three clustering algorithms, Expectation Maximization, Simple K-means, and agglomerative Hierarchical Clustering, were then applied separately to the news body and the news headline. Accuracy was calculated using the classes-to-clusters evaluation mode in the WEKA tool. The Expectation Maximization algorithm achieved the highest accuracy, 87.9%, compared with 83.8% for the Simple K-means algorithm, and the Hierarchical Clustering method achieved the lowest accuracy. Comparing headlines with article bodies shows that clustering on the body of the news performed better than clustering on the headline.
dc.identifier.citation	Proceedings of the Postgraduate Institute of Science Research Congress (RESCON) 2022, University of Peradeniya, p. 102
dc.identifier.isbn	978-955-8787-09-0
dc.identifier.uri	https://ir.lib.pdn.ac.lk/handle/20.500.14444/7632
dc.language.iso	en_US
dc.publisher	Postgraduate Institute of Science (PGIS), University of Peradeniya, Sri Lanka
dc.subject	Clustering
dc.subject	Domain
dc.subject	Machine learning
dc.subject	News article
dc.title	Clustering English news articles based on relevant domains: comparative study using three clustering algorithms
dc.title.alternative	ICT, Mathematics and Statistics
dc.type	Article
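
The pipeline described in the abstract (TF-IDF vectorization, clustering, then scoring clusters against known domains) can be illustrated with a minimal sketch. This is not the authors' implementation: the study used WEKA, whereas the code below is plain Python with no external libraries. The four-document corpus and its labels are hypothetical, only k-means is shown (not Expectation Maximization or hierarchical clustering), the initialisation is a deterministic farthest-point choice made here for reproducibility, and the majority-vote scoring is a simplified stand-in for WEKA's classes-to-clusters evaluation, whose exact class-to-cluster mapping may differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (raw term count * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def kmeans(vectors, k, n_iter=10):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    assignment = []
    for _ in range(n_iter):
        # Assign every document to its nearest centroid.
        assignment = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                terms = set().union(*members)
                centroids[j] = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                                for t in terms}
    return assignment

def classes_to_clusters_accuracy(labels, assignment):
    """Map each cluster to its majority class; return the share of matching documents."""
    correct = 0
    for cluster in set(assignment):
        member_labels = [l for l, a in zip(labels, assignment) if a == cluster]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)

# Hypothetical toy corpus; the study's real data covered five domains.
docs = [
    "election vote parliament minister",
    "parliament minister vote policy",
    "match goal team player",
    "team player goal league",
]
labels = ["politics", "politics", "sports", "sports"]

assignment = kmeans(tfidf_vectors(docs), k=2)
accuracy = classes_to_clusters_accuracy(labels, assignment)
print(assignment, accuracy)
```

The class labels are used only for scoring, never for clustering, which mirrors the idea of evaluating an unsupervised result against known domains. Running the same functions once on headlines and once on article bodies would give the kind of comparison the abstract reports.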

Files

Original bundle

Name: Disayiram, N..pdf
Size: 148.2 KB
Format: Adobe Portable Document Format

Collections

RESCON 2022