K-Harmonic means type clustering algorithm for mixed datasets

Amir Ahmad, Sarosh Hashmi

Research output: Contribution to journal, Article, peer-review

45 Citations (Scopus)

Abstract

K-means type clustering algorithms for mixed data that consist of numeric and categorical attributes suffer from the cluster center initialization problem: the final clustering results depend on the initial cluster centers. Random cluster center initialization is a popular initialization technique; however, clustering results are not consistent across different cluster center initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for purely numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition of a cluster center and a distance measure, and the proposed algorithm uses these with the cost function of the K-Harmonic means clustering algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. The results suggest that the proposed clustering algorithm is quite insensitive to the cluster center initialization problem. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
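The abstract does not spell out the proposed cluster center or distance measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It applies the standard K-Harmonic means (KHM) update rules for numeric data, whose performance function is KHM = sum_i k / sum_j d(x_i, c_j)^(-p), together with a hypothetical mixed dissimilarity: squared Euclidean distance on the numeric attributes plus gamma times an expected-mismatch term against a categorical center stored as a per-attribute value-frequency distribution. The names `mixed_distance` and `khm_mixed`, the parameters `p`, `gamma`, `eps`, and `init`, and the frequency-based categorical center are all illustrative assumptions.

```python
import random
from collections import defaultdict


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Hypothetical mixed dissimilarity (an assumption, not the paper's measure).

    Numeric part: squared Euclidean distance to the numeric center.
    Categorical part: expected mismatch 1 - P(value) against the center's
    per-attribute value distribution, scaled by gamma.
    The square root of the sum is used as the distance inside KHM.
    """
    d_num = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    d_cat = sum(1.0 - dist.get(v, 0.0) for v, dist in zip(x_cat, c_cat))
    return (d_num + gamma * d_cat) ** 0.5


def khm_mixed(num, cat, k, p=3.5, gamma=1.0, iters=50, init=None, seed=0, eps=1e-8):
    """Standard KHM update rules run on mixed records (illustrative sketch)."""
    n = len(num)
    if init is None:
        # Random initialization: k distinct records become the starting centers.
        init = random.Random(seed).sample(range(n), k)
    c_num = [list(num[i]) for i in init]
    # Assumed categorical center: one {value: weight} dict per attribute.
    c_cat = [[{v: 1.0} for v in cat[i]] for i in init]

    def distances():
        # eps keeps d ** -p finite when a record sits exactly on a center.
        return [[max(mixed_distance(num[i], cat[i], c_num[j], c_cat[j], gamma), eps)
                 for j in range(k)] for i in range(n)]

    for _ in range(iters):
        D = distances()
        sum_num = [[0.0] * len(num[0]) for _ in range(k)]
        sum_cat = [[defaultdict(float) for _ in cat[0]] for _ in range(k)]
        totals = [0.0] * k
        for i in range(n):
            s_p = sum(d ** -p for d in D[i])
            for j in range(k):
                # Membership m(c_j|x_i) times weight w(x_i), which simplifies
                # to d_ij^(-p-2) / (sum_l d_il^(-p))^2.
                q = D[i][j] ** (-p - 2) / s_p ** 2
                totals[j] += q
                for a, v in enumerate(num[i]):
                    sum_num[j][a] += q * v
                for a, v in enumerate(cat[i]):
                    sum_cat[j][a][v] += q
        for j in range(k):
            if totals[j] > 0:  # keep the old center if it received no weight
                c_num[j] = [s / totals[j] for s in sum_num[j]]
                c_cat[j] = [{v: f / totals[j] for v, f in attr.items()}
                            for attr in sum_cat[j]]

    D = distances()
    objective = sum(k / sum(d ** -p for d in row) for row in D)  # KHM performance function
    labels = [min(range(k), key=lambda j: row[j]) for row in D]  # hard assignment
    return c_num, c_cat, labels, objective
```

Passing `init` fixes the starting centers so a run is reproducible; leaving it as `None` draws `k` distinct records at random, which is the kind of random initialization the abstract discusses. The value `p = 3.5` is only an illustrative default; KHM is usually run with `p` greater than 2.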

Original language: English
Pages (from-to): 39-49
Number of pages: 11
Journal: Applied Soft Computing Journal
Volume: 48
Publication status: Published - Nov 1 2016

Keywords

  • Categorical attributes
  • Clustering
  • K-Harmonic means clustering
  • Mixed data
  • Numeric attributes

ASJC Scopus subject areas

  • Software

