Abstract
K-means-type clustering algorithms for mixed data, which consist of numeric and categorical attributes, suffer from the cluster center initialization problem: the final clustering results depend on the initial cluster centers. Random cluster center initialization is a popular initialization technique. However, clustering results are not consistent across different cluster center initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition for a cluster center and a distance measure. These cluster centers and the distance measure are used with the cost function of the K-Harmonic means clustering algorithm in the proposed algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to the cluster center initialization problem. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
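For readers unfamiliar with the cost function mentioned above, the following is a minimal sketch of the standard K-Harmonic means objective for purely numeric data: for each point, the harmonic mean of its distances (raised to a power p) to all cluster centers, summed over points. The abstract does not give the paper's mixed-data distance measure or cluster center definition, so this sketch assumes plain Euclidean distance. The function name `khm_cost` and the parameters `p` and `eps` are illustrative choices, not taken from the paper.

```python
import math


def khm_cost(points, centers, p=2.0, eps=1e-12):
    """Standard K-Harmonic means cost for numeric data (illustrative sketch).

    cost = sum over points x of  K / sum over centers c of 1 / d(x, c)**p

    Assumption: d is the Euclidean distance. The paper's mixed-data
    distance measure is not specified in the abstract and is not used here.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            # Clamp the distance so a point sitting on a center
            # does not cause a division by zero.
            d = max(math.dist(x, c), eps)
            inv_sum += d ** (-p)
        total += k / inv_sum
    return total


if __name__ == "__main__":
    data = [(0.0, 0.0), (2.0, 0.0)]
    # Both centers at the midpoint: every distance is 1.
    print(khm_cost(data, [(1.0, 0.0), (1.0, 0.0)]))
    # One center on each point: the cost drops to nearly zero.
    print(khm_cost(data, [(0.0, 0.0), (2.0, 0.0)]))
```

Because every center contributes to each point's term, the cost changes smoothly as centers move. This is the property usually credited with making K-Harmonic means less sensitive to initialization than K-means, which only considers the nearest center.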
Original language | English |
---|---|
Pages (from-to) | 39-49 |
Number of pages | 11 |
Journal | Applied Soft Computing Journal |
Volume | 48 |
DOIs | |
Publication status | Published - Nov 1 2016 |
Keywords
- Categorical attributes
- Clustering
- K-Harmonic means clustering
- Mixed data
- Numeric attributes
ASJC Scopus subject areas
- Software