The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data.
ASJC Scopus subject areas
- Environmental Science(all)
- Agricultural and Biological Sciences(all)