Performance Comparison of Clustering Algorithms on Scientific Publications
The rapid growth in the number of scientific papers available in digital form has made document management more complex. Effective and efficient methods for sorting and organizing these documents are therefore crucial. Clustering, a data mining technique widely applied in various fields, can be used to address the issue. This paper compares the performance of partitioning-based clustering algorithms, namely random clustering, k-means, x-means, and k-medoids, in the unsupervised classification of scientific publications based on topic similarity. RapidMiner is used to preprocess and analyze the data. The purity value and processing time of each algorithm are then investigated. The results show that k-means achieves the best purity value, although its run time is not the fastest. Random clustering, by contrast, offers the fastest processing time at the cost of the lowest purity value. None of the observed algorithms delivers both the best purity and the best processing time. This may be due to the number of interacting factors that affect clustering results, including the type of data, the selected algorithm, the distance measure, and the preprocessing methods.
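The abstract does not spell out how purity is computed. A minimal sketch, assuming the standard definition (each cluster is credited with its most frequent true topic, and the credited documents are divided by the total number of documents), might look like the following. The function name `purity` and the toy topic labels are hypothetical and are not taken from the paper, which used RapidMiner rather than Python.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Return the fraction of documents that match the majority label of their cluster.

    Assumes the standard purity definition; the paper's exact formula is not
    given in the abstract.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("purity is undefined for an empty collection")

    # Group the true labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        members[cluster_id].append(label)

    # For each cluster, count the documents that carry its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Hypothetical toy data: six documents, three topics, three clusters.
topics = ["nlp", "nlp", "nlp", "vision", "vision", "networks"]
clusters = [0, 0, 1, 1, 1, 2]
print(f"purity = {purity(clusters, topics):.3f}")
```

Purity ranges from just above 0 to 1, and it rises trivially as the number of clusters grows, so it is only comparable across algorithms when they produce the same number of clusters. Processing time could be measured alongside it by wrapping each clustering run with `time.perf_counter()`.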
Document Type: Research Article
Affiliations: Department of Electrical Engineering, Universitas Indonesia, Depok 16424, Indonesia
Publication date: April 1, 2017
Journal: Advanced Science Letters is an international peer-reviewed journal with wide-ranging coverage. It consolidates research activities in all areas of (1) Physical Sciences, (2) Biological Sciences, (3) Mathematical Sciences, (4) Engineering, (5) Computer and Information Sciences, and (6) Geosciences, and publishes original short communications, full research papers, and timely brief (mini) reviews, with authors' photos and biographies, covering basic and applied research and current developments in the educational aspects of these scientific areas.