Abstract
Gene expression datasets often contain many features (genes) that do not contribute to classifying the sampled tissues. This is particularly challenging in high-dimensional datasets, where the number of genes far exceeds the number of samples, so it is important to identify the most influential features. This study presents a new feature selection approach for gene expression data called the weighted Fisher score (WFISH). WFISH uses gene expression differences between classes to assign weights to features, prioritizing informative genes and reducing the impact of less useful ones. By incorporating these weights into the traditional Fisher score, WFISH aims to select the most informative and biologically significant genes in high-dimensional classification problems. In experiments with random forest (RF) and k-nearest neighbors (kNN) classifiers on five benchmark datasets, WFISH consistently achieved lower classification errors than the existing feature selection techniques it was compared against.
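The idea of weighting the Fisher score by between-class expression differences can be illustrated with a minimal sketch. The abstract does not state the actual weighting formula, so the weight used here (the absolute difference in class means, scaled to a maximum of 1) is an assumption made purely for illustration and should not be read as the WFISH definition. The function names `fisher_score`, `weighted_fisher_score`, and `top_k_genes` are likewise hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Classical Fisher score per feature.

    X: array of shape (n_samples, n_genes); y: class labels of length n_samples.
    Returns between-class scatter divided by within-class scatter, per gene.
    """
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant guards against division by zero for constant genes.
    return between / (within + 1e-12)


def weighted_fisher_score(X, y):
    """Fisher score multiplied by an ASSUMED expression-difference weight.

    Assumption (not from the paper): the weight of a gene is the absolute
    difference of its two class means, normalised so the largest weight is 1.
    """
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles binary classification only")
    mean_a = X[y == classes[0]].mean(axis=0)
    mean_b = X[y == classes[1]].mean(axis=0)
    diff = np.abs(mean_a - mean_b)
    weights = diff / (diff.max() + 1e-12)
    return weights * fisher_score(X, y)


def top_k_genes(X, y, k):
    """Indices of the k genes with the highest weighted score, best first."""
    scores = weighted_fisher_score(X, y)
    return np.argsort(scores)[::-1][:k]
```

In a workflow like the one the abstract describes, the indices returned by `top_k_genes` would be used to subset the expression matrix before training a classifier such as RF or kNN.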
| Original language | English |
|---|---|
| Article number | 113329 |
| Journal | Applied Soft Computing |
| Volume | 180 |
| Publication status | Published - Aug 2025 |
Keywords
- Binary classification
- Differentially expressed genes
- Feature selection
- Gene expression data
- High-dimensionality
ASJC Scopus subject areas
- Software
Fingerprint
Dive into the research topics of 'Optimized feature selection in high-dimensional gene expression data using weighted differential gene expression analysis'. Together they form a unique fingerprint.