WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 9, 2012
MLK-Means - A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering
Authors: ,
Abstract: Document clustering is useful in many information retrieval tasks, such as document browsing, organization, and viewing of retrieval results, and it is currently the subject of significant research worldwide. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification. This work presents a new hybrid algorithm, called MLK-Means, for clustering document data in TMG format, in which the usual Euclidean distance metric of the k-means process is replaced by a machine learning technique. The MLK-Means algorithm has been implemented, and its results are compared with those of a probabilistic model, namely von Mises-Fisher model-based clustering (vMF-based k-means), and of standard k-means on L2-normalized data. The improvements achieved by the proposed algorithm are significant, and its results are comparable with those of the other methods.
Keywords: Document Clustering, Model Based Clustering, Term Document Matrix, Text to Matrix Generator (TMG), k-means, Machine Learning, Bernoulli, Multinomial and von Mises-Fisher Clustering
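
One of the baselines named in the abstract, standard k-means on L2-normalized data, can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not reproduce MLK-Means, whose machine learning component is not specified here. The function names (`l2_normalize`, `kmeans_l2`), the farthest-first initialization, and the toy document-by-term matrix are all assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(X):
    """Scale each row (one document vector) to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms

def kmeans_l2(X, k, n_iter=50):
    """Euclidean k-means on the L2-normalized rows of X (illustrative sketch)."""
    Xn = l2_normalize(np.asarray(X, dtype=float))

    # Assumed initialization: farthest-first traversal starting from row 0.
    centers = [Xn[0]]
    for _ in range(1, k):
        d = np.min([((Xn - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Xn[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Squared Euclidean distance from every document to every center.
        dist = ((Xn[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)

        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = centers.copy()
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)

        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy matrix: rows are documents, columns are term counts.
docs = np.array([[3, 0, 1, 0],
                 [2, 0, 2, 0],
                 [0, 4, 0, 1],
                 [0, 3, 0, 2]])
labels, centers = kmeans_l2(docs, k=2)
print(labels)
```

For unit-length vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so this baseline effectively groups documents by the direction of their term vectors rather than by document length.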