A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Amir Ahmad, Lipika Dey

Research output: Contribution to journal › Article › peer-review

47 Citations (Scopus)

Abstract

Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets. The method computes each attribute's contribution to different clusters, and we propose a new cost function for a k-means type algorithm. One advantage of the algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of attribute contributions to different clusters. It is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results.
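
The abstract does not give the cost function or the weight computation, so the following is only a minimal sketch of the general idea of a k-means type pass over mixed data with per-cluster attribute weights. It is not the authors' algorithm. The function names, the squared-difference and 0/1-mismatch distances, the mean/mode center update, and the fixed weights passed in by the caller are all assumptions made for illustration.

```python
from collections import Counter


def mixed_distance(point, center, weights, numeric_idx):
    """Weighted distance between one point and one cluster center.

    Assumption: squared difference for numeric attributes and 0/1
    mismatch for categorical ones; the paper's measure may differ.
    """
    total = 0.0
    for j, (x, c) in enumerate(zip(point, center)):
        if j in numeric_idx:
            d = (x - c) ** 2
        else:
            d = 0.0 if x == c else 1.0
        total += weights[j] * d
    return total


def assign(points, centers, weights, numeric_idx):
    """Assign each point to its cheapest center.

    weights[k][j] is the (hypothetical) weight of attribute j in
    cluster k. One pass touches every point once, so the cost grows
    linearly with the number of points.
    """
    labels = []
    for p in points:
        costs = [
            mixed_distance(p, centers[k], weights[k], numeric_idx)
            for k in range(len(centers))
        ]
        labels.append(costs.index(min(costs)))
    return labels


def update_centers(points, labels, centers, numeric_idx):
    """Recompute centers: mean for numeric, mode for categorical."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # Keep the previous center if a cluster becomes empty.
            new_centers.append(old)
            continue
        center = []
        for j in range(len(old)):
            column = [m[j] for m in members]
            if j in numeric_idx:
                center.append(sum(column) / len(column))
            else:
                center.append(Counter(column).most_common(1)[0][0])
        new_centers.append(tuple(center))
    return new_centers


# Toy data: attribute 0 is numeric, attribute 1 is categorical.
points = [(1.0, "red"), (1.2, "red"), (8.0, "blue"), (8.3, "blue")]
centers = [(1.0, "red"), (8.0, "blue")]
weights = [[0.5, 0.5], [0.5, 0.5]]  # equal weights, chosen arbitrarily
numeric_idx = {0}

labels = assign(points, centers, weights, numeric_idx)
centers = update_centers(points, labels, centers, numeric_idx)
print(labels)  # [0, 0, 1, 1]
```

In a subspace setting the weights would differ from cluster to cluster, so that each cluster is driven mainly by the attributes that contribute most to it; here they are held fixed only to keep the sketch short.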

Original language: English
Pages (from-to): 1062-1069
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 32
Issue number: 7
Publication status: Published - May 1 2011
Externally published: Yes

Keywords

  • Categorical data
  • Clustering
  • Mixed data
  • Subspace clustering

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
