A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set

Amir Ahmad, Lipika Dey

Research output: Contribution to journal › Article › peer-review

103 Citations (Scopus)

Abstract

Computing the similarity between categorical data objects in unsupervised learning is an important data mining problem. We propose a method to compute the distance between two values of the same attribute for unsupervised learning. The approach is based on the observation that the similarity of two attribute values depends on their relationship with the other attributes. The computational cost of the method is linear in the number of data objects in the data set. To evaluate the proposed distance measure, we use it with the K-mode clustering algorithm to cluster various categorical data sets. We observe a significant improvement in clustering accuracy compared with the results obtained using the traditional K-mode clustering algorithm.
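
The abstract does not state the distance formula, so the following is only an illustrative sketch of a co-occurrence-based distance in the spirit described above, not the authors' published definition. It assumes that the distance between two values x and y of one attribute is the average, over every other attribute, of how differently the values of that other attribute are distributed among the objects having x and the objects having y (computed here as sum of max(P(u|x), P(u|y)) minus 1, which equals the total variation distance). The function name `value_distance`, its parameters, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def value_distance(rows, attr, x, y):
    """Sketch of a co-occurrence-based distance between values x and y of
    attribute index `attr`. ASSUMPTION: the exact formula is not given in the
    abstract; this averages, over the other attributes, the quantity
    sum_u max(P(u | x), P(u | y)) - 1."""
    others = [j for j in range(len(rows[0])) if j != attr]
    if not others:
        return 0.0

    # Single pass over the data: count how often x and y co-occur with each
    # value of every other attribute (cost linear in the number of objects).
    counts = {x: defaultdict(Counter), y: defaultdict(Counter)}
    totals = Counter()
    for row in rows:
        v = row[attr]
        if v in counts:
            totals[v] += 1
            for j in others:
                counts[v][j][row[j]] += 1

    if totals[x] == 0 or totals[y] == 0:
        raise ValueError("both values must occur in the data set")

    total = 0.0
    for j in others:
        cx, cy = counts[x][j], counts[y][j]
        seen = set(cx) | set(cy)
        # Conditional probabilities of each co-occurring value u given x and y.
        total += sum(max(cx[u] / totals[x], cy[u] / totals[y]) for u in seen) - 1.0
    return total / len(others)


# Hypothetical toy data set: (colour, shape, size)
rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
    ("green", "round", "small"),
    ("green", "square", "large"),
]
print(value_distance(rows, 0, "red", "blue"))   # 0.5
print(value_distance(rows, 0, "red", "green"))  # 0.25
```

In this toy data, "red" and "blue" co-occur with entirely different shapes but identical size distributions, so they end up farther apart than "red" and "green", which share the shape "round" in part. A plain matching dissimilarity would treat both pairs as equally different.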

Original language: English
Pages (from-to): 110-118
Number of pages: 9
Journal: Pattern Recognition Letters
Volume: 28
Issue number: 1
Publication status: Published - Jan 1 2007
Externally published: Yes

Keywords

  • Categorical data
  • Co-occurrences
  • Similarity
  • Unsupervised learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
