A k-mean clustering algorithm for mixed numeric and categorical data

Amir Ahmad, Lipika Dey

Research output: Contribution to journal (Article, peer-reviewed)

552 Citations (Scopus)

Abstract

The use of traditional k-mean-type algorithms is limited to numeric data. This paper presents a clustering algorithm, based on the k-mean paradigm, that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of each attribute to the clustering process. We present a modified description of the cluster center that overcomes the numeric-data-only limitation of the k-mean algorithm and provides a better characterization of clusters. The performance of the algorithm has been studied on real-world data sets, and comparisons with other clustering algorithms illustrate the effectiveness of this approach.
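To make the idea of a co-occurrence-based distance concrete, here is a minimal sketch in Python. It rests on assumptions that are not taken from this page: every attribute is categorical (or already discretized), the distance between two values `x` and `y` of attribute `i` is the total variation distance between the conditional distributions P(A_j | x) and P(A_j | y), averaged over all other attributes `j`, and a categorical "cluster center" is represented as a frequency table of values. The function names `conditional_dist`, `cooccurrence_distance` and `distance_to_center` are hypothetical. This illustrates the general technique and is not necessarily the paper's exact formulation.

```python
from collections import Counter


def conditional_dist(rows, i, x, j):
    """Estimate P(A_j = v | A_i = x) from the rows whose attribute i equals x."""
    matches = [row[j] for row in rows if row[i] == x]
    counts = Counter(matches)
    return {v: c / len(matches) for v, c in counts.items()}


def cooccurrence_distance(rows, i, x, y):
    """Hypothetical co-occurrence distance between values x and y of attribute i.

    For each other attribute j, sum_v max(P(v|x), P(v|y)) - 1 is the total
    variation distance between the two conditional distributions:
    0 when x and y co-occur with A_j identically, 1 when the patterns are disjoint.
    The result is the average over all other attributes (assumed categorical).
    """
    if x == y:
        return 0.0
    others = [j for j in range(len(rows[0])) if j != i]
    total = 0.0
    for j in others:
        px = conditional_dist(rows, i, x, j)
        py = conditional_dist(rows, i, y, j)
        support = set(px) | set(py)
        total += sum(max(px.get(v, 0.0), py.get(v, 0.0)) for v in support) - 1.0
    return total / len(others)


def distance_to_center(rows, i, x, center_counts):
    """Distance from value x to an assumed categorical 'center' for attribute i,
    stored as a frequency table of that attribute's values inside the cluster:
    the frequency-weighted average of pairwise co-occurrence distances."""
    n = sum(center_counts.values())
    return sum(c / n * cooccurrence_distance(rows, i, x, v)
               for v, c in center_counts.items())


# Toy data set (hypothetical): (colour, size, weight)
rows = [
    ("red", "small", "light"),
    ("red", "small", "light"),
    ("blue", "large", "heavy"),
    ("blue", "large", "heavy"),
    ("green", "small", "light"),
]

print(cooccurrence_distance(rows, 0, "red", "green"))  # 0.0: same co-occurrence pattern
print(cooccurrence_distance(rows, 0, "red", "blue"))   # 1.0: disjoint co-occurrence pattern
print(distance_to_center(rows, 0, "red", Counter(red=1, blue=1)))  # 0.5
```

The design choice worth noting is that "red" and "green" never appear together in a row, yet their distance is 0 because they co-occur with the same values of the other attributes. A plain mismatch (0/1) measure would treat them as maximally different.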

Original language: English
Pages (from-to): 503-527
Number of pages: 25
Journal: Data and Knowledge Engineering
Volume: 63
Issue number: 2
Publication status: Published, Nov 2007
Externally published: Yes

Keywords

  • Clustering
  • Co-occurrences
  • Cost function
  • Distance measure
  • Significance of attributes
  • k-Mean clustering

ASJC Scopus subject areas

  • Information Systems and Management
