TY - JOUR
T1 - initKmix-A novel initial partition generation algorithm for clustering mixed data using k-means-based clustering
AU - Ahmad, Amir
AU - Khan, Shehroz S.
N1 - Publisher Copyright: © 2020 Elsevier Ltd
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Mixed datasets consist of both numeric and categorical attributes. Various k-means-based clustering algorithms have been developed for these datasets. Generally, these algorithms use a random partition as a starting point, which tends to produce different clustering results in different runs. In this paper, we propose initKmix, a novel algorithm for finding an initial partition for k-means-based clustering algorithms for mixed datasets. In the initKmix algorithm, a k-means-based clustering algorithm is run many times, and in each run, one of the attributes is used to create the initial clusters for that run. The clustering results of the various runs are combined to produce the initial partition. This initial partition is then used as a seed for a k-means-based clustering algorithm to cluster mixed data. Experiments with various categorical and mixed datasets showed that initKmix produced accurate and consistent results, and outperformed the random initial partition method and other state-of-the-art initialization methods. Experiments also showed that k-means-based clustering for mixed datasets with initKmix performed similarly to or better than many state-of-the-art clustering algorithms for categorical and mixed datasets.
KW - Clustering
KW - Initialization
KW - Mixed data
KW - Random
KW - k-means
UR - http://www.scopus.com/inward/record.url?scp=85094894518&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094894518&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2020.114149
DO - 10.1016/j.eswa.2020.114149
M3 - Article
AN - SCOPUS:85094894518
SN - 0957-4174
VL - 167
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 114149
ER -