Abstract
The traditional k-mean type algorithm is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and a distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
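
To make the idea of a co-occurrence-based distance concrete, the sketch below compares two categorical values by how differently they co-occur with the values of the other attributes. It is only an illustration under stated assumptions, not the measure defined in the paper: the function names, the toy `rows` data, and the choice of total variation distance averaged over the other attributes are all assumptions made here, and the attribute-significance weighting mentioned in the abstract is not modelled.

```python
from collections import Counter


def conditional_distribution(rows, attr, value, other):
    """Estimate P(other = v | attr = value) by counting matching rows."""
    counts = Counter(r[other] for r in rows if r[attr] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def cooccurrence_distance(rows, attr, x, y):
    """Distance between values x and y of a categorical attribute.

    Assumption (not taken from the paper): for every other attribute we
    take the total variation distance between the co-occurrence
    distributions of x and y, then average over those attributes.
    """
    others = [a for a in rows[0] if a != attr]
    if not others:
        raise ValueError("need at least one other attribute to compare against")
    per_attribute = []
    for other in others:
        px = conditional_distribution(rows, attr, x, other)
        py = conditional_distribution(rows, attr, y, other)
        support = set(px) | set(py)
        tv = 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in support)
        per_attribute.append(tv)
    return sum(per_attribute) / len(per_attribute)


# Hypothetical toy data: "red" and "pink" co-occur with the same shapes
# and sizes, while "blue" co-occurs with different ones.
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "pink", "shape": "round", "size": "small"},
    {"color": "pink", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
]

print(cooccurrence_distance(rows, "color", "red", "pink"))  # 0.0
print(cooccurrence_distance(rows, "color", "red", "blue"))  # 0.75
```

In this toy data, "red" and "pink" end up at distance 0 because they behave identically with respect to the other attributes, which is the intuition behind deriving distances between categorical values from the data rather than treating every mismatch as equally distant.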
| Original language | English |
|---|---|
| Pages (from-to) | 503-527 |
| Number of pages | 25 |
| Journal | Data and Knowledge Engineering |
| Volume | 63 |
| Issue number | 2 |
| Publication status | Published - Nov 2007 |
| Externally published | Yes |
Keywords
- Clustering
- Co-occurrences
- Cost function
- Distance measure
- Significance of attributes
- k-Mean clustering
ASJC Scopus subject areas
- Information Systems and Management