Monotonicity of the χ²-statistic and Feature Selection

Firuz Kamalov, Ho Hon Leung, Sherif Moussa

Research output: Contribution to journal › Article › peer-review

Abstract

Feature selection is an important preprocessing step in the analysis of large-scale data. In this paper, we prove a monotonicity property of the χ²-statistic and use it to construct a more robust feature selection method. In particular, we show that χ²_{Y,X1} ≤ χ²_{Y,(X1,X2)}: the χ²-statistic between the label Y and a feature X1 never exceeds the χ²-statistic between Y and the joint feature (X1, X2). This result indicates that a new feature should be added to an existing feature set only if it increases the χ²-statistic beyond a certain threshold. Our stepwise feature selection algorithm significantly reduces the number of features considered at each stage, making it more efficient than other similar methods. In addition, the selection process has a natural stopping point, eliminating the need for user input. Numerical experiments confirm that the proposed algorithm can significantly reduce the number of features required for classification and improve classifier accuracy.
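The monotonicity property in the abstract can be checked empirically: the Pearson χ²-statistic computed between Y and a single feature X1 never exceeds the statistic computed between Y and the compound feature (X1, X2). The sketch below is illustrative only (the toy data and the `chi2_stat` helper are not from the paper); it builds contingency counts directly and verifies the inequality on one small dataset.

```python
from collections import Counter


def chi2_stat(ys, xs):
    """Pearson chi-squared statistic between a label sequence ys and a
    (possibly compound) feature sequence xs, computed from raw counts."""
    n = len(ys)
    joint = Counter(zip(ys, xs))   # observed cell counts O_{y,x}
    py = Counter(ys)               # marginal counts of the label
    px = Counter(xs)               # marginal counts of the feature
    stat = 0.0
    for y in py:
        for x in px:
            e = py[y] * px[x] / n  # expected count under independence
            o = joint.get((y, x), 0)
            stat += (o - e) ** 2 / e
    return stat


# Toy data (hypothetical): a binary label and two binary features.
ys = [0, 0, 0, 1, 1, 1, 0, 1]
x1 = [0, 0, 1, 1, 1, 0, 0, 1]
x2 = [0, 1, 0, 1, 0, 1, 1, 1]

# Treat the pair (x1, x2) as a single compound feature.
joint_feature = list(zip(x1, x2))

# Monotonicity: chi2(Y; X1) <= chi2(Y; (X1, X2)).
assert chi2_stat(ys, x1) <= chi2_stat(ys, joint_feature) + 1e-12
```

In a stepwise selection loop of the kind the abstract describes, one would accept a candidate feature X2 only if `chi2_stat(ys, joint_feature)` exceeds `chi2_stat(ys, x1)` by more than a chosen threshold; since the statistic can only grow, a step with negligible gain provides the natural stopping point.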

Original language: English
Pages (from-to): 1223-1241
Number of pages: 19
Journal: Annals of Data Science
Volume: 9
Issue number: 6
Publication status: Published - Dec 2022

Keywords

  • Big data
  • Feature selection
  • Machine learning
  • χ²-statistic

ASJC Scopus subject areas

  • Business, Management and Accounting (miscellaneous)
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Artificial Intelligence

