Comparative analysis of dimension reduction techniques for k-means clustering in high-dimensional data

Date
2015
Authors
Meerigama, K. M. W. G. T.
Publisher
University of Peradeniya
Abstract
Computation on high-dimensional data is cumbersome because it demands substantial computational power. High-dimensional data is also hard to visualize. Converting such data to a lower-dimensional representation greatly reduces the computational burden and makes the data easier to visualize. However, the effectiveness of the reduced data, measured here by the sum of squared errors (SSE) of the k-means clustering algorithm, still needs to be explored. This study focuses on the use of dimensionality reduction as a preprocessing step to improve the accuracy and efficiency of clustering high-dimensional data. Three reduction methods are considered for k-means clustering: principal component analysis (PCA), independent component analysis (ICA), and a combination of the two in which PCA is applied first and ICA second. When clustering text data with k-means, the PCA and ICA methods yield lower error rates than clustering in the original dimensions.
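The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes scikit-learn, uses synthetic blob data in place of the text data, and the helper names `kmeans_sse` and `compare_reductions`, the number of clusters, and the number of components are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, FastICA


def kmeans_sse(X, k, seed=0):
    """Fit k-means and return its SSE (exposed by scikit-learn as inertia_)."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return float(model.inertia_)


def compare_reductions(X, k, n_components, seed=0):
    """Cluster the original data and three reduced versions; return SSE per variant."""
    # PCA: project onto the leading principal components.
    X_pca = PCA(n_components=n_components, random_state=seed).fit_transform(X)

    # ICA applied directly to the original features (FastICA is one ICA
    # algorithm; the abstract does not say which one was used).
    X_ica = FastICA(
        n_components=n_components, random_state=seed, max_iter=1000
    ).fit_transform(X)

    # Combined method: PCA first, then ICA on the PCA scores.
    X_pca_ica = FastICA(
        n_components=n_components, random_state=seed, max_iter=1000
    ).fit_transform(X_pca)

    variants = {
        "original": X,
        "pca": X_pca,
        "ica": X_ica,
        "pca_then_ica": X_pca_ica,
    }
    return {name: kmeans_sse(data, k, seed) for name, data in variants.items()}


if __name__ == "__main__":
    # Synthetic stand-in for high-dimensional data (assumption for the demo).
    X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)
    for name, sse in compare_reductions(X, k=3, n_components=5).items():
        print(f"{name}: SSE = {sse:.3f}")
```

Note that SSE depends on the scale and dimensionality of the space in which it is computed, and ICA typically rescales its output components, so raw SSE values from different representations are not directly comparable without some form of normalization.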
Keywords
High-dimensional data, K-means clustering