Optimized feature selection in high-dimensional gene expression data using weighted differential gene expression analysis

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Gene expression datasets often contain many features (genes) that do not contribute to classifying sampled tissues. This is particularly challenging in high-dimensional datasets, where the number of genes is much higher than the number of samples. Therefore, it is important to identify the most influential features. This study presents a new approach called weighted Fisher score (WFISH) for selecting features in gene expression data. WFISH uses gene expression differences between classes to assign weights to features, prioritizing informative ones and reducing the impact of less useful ones. By incorporating these weights into the traditional Fisher score, WFISH aims to select the most informative and biologically significant genes in high-dimensional classification problems. Our experiments show that WFISH outperforms other feature selection techniques in accurately classifying gene expression data. When using random forest (RF) and k-nearest neighbors (kNN) classifiers on five benchmark datasets, WFISH consistently achieved lower classification errors than the existing techniques.
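
The abstract does not state the WFISH formula, so the following is only an illustrative sketch of the general idea: compute the traditional Fisher score per gene, then scale it by a weight derived from the between-class expression difference. The weight used here (absolute difference of class means, normalized to [0, 1]) and the function names `fisher_score`, `weighted_fisher_score`, and `select_top_genes` are assumptions made for illustration, not the authors' definitions.

```python
import numpy as np


def fisher_score(X, y):
    """Traditional Fisher score per feature (column) of X.

    Ratio of between-class scatter to within-class scatter,
    computed independently for each gene.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant avoids division by zero for constant genes.
    return between / (within + 1e-12)


def weighted_fisher_score(X, y):
    """Fisher score scaled by a per-gene weight (binary classes only).

    ASSUMPTION: the weight is the absolute difference of the two class
    means, normalized to [0, 1]. The paper's actual weighting scheme is
    not given in the abstract and may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch assumes exactly two classes")
    mean_a = X[y == labels[0]].mean(axis=0)
    mean_b = X[y == labels[1]].mean(axis=0)
    diff = np.abs(mean_a - mean_b)
    weights = diff / (diff.max() + 1e-12)
    return weights * fisher_score(X, y)


def select_top_genes(X, y, k):
    """Return the column indices of the k highest-scoring genes."""
    scores = weighted_fisher_score(X, y)
    return np.argsort(scores)[::-1][:k]
```

In a workflow like the one the abstract describes, the selected column indices would be used to subset the expression matrix before training a classifier such as random forest or kNN. Selection should be done on training data only so that the test error is not biased.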

Original language: English
Article number: 113329
Journal: Applied Soft Computing
Volume: 180
Publication status: Published - Aug 2025

Keywords

  • Binary classification
  • Differentially expressed genes
  • Feature selection
  • Gene expression data
  • High-dimensionality

ASJC Scopus subject areas

  • Software
