This study proposes a supervised feature selection technique for classification in high-dimensional binary class problems by making the conventional Fisher Score robust. The proposed method uses the median as a robust measure of location and the Rousseeuw and Croux statistic (Qn) as a robust measure of dispersion. First, a minimal subset of genes is identified by a greedy search approach. This subset is then combined with the top-ranked genes obtained from the proposed Robust Fisher Score (RFish). Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) is applied to remove redundancy among the selected genes. The proposed method is validated on five publicly available datasets. Its prediction performance is compared with that of six well-known feature selection methods using Random Forest (RF), Support Vector Machine (SVM) and k-Nearest Neighbour (k-NN) classifiers. Box plots and bar plots of the results for the proposed method and all other methods considered are also given. The results show that the proposed method (RFish) outperforms all the other methods in the majority of cases. A detailed simulation study further assesses the proposed method.
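The abstract does not state the exact RFish formula. The sketch below therefore rests on one assumption: it keeps the standard Fisher score form, Σ_k n_k (μ_k − μ)² / Σ_k n_k σ_k², and substitutes class and overall medians for the means and Qn for the standard deviations. The function names `qn`, `robust_fisher_score` and `rank_features` are hypothetical. The Qn version is the naive one, with consistency constant ≈ 2.2219 and no small-sample correction. The greedy search and LASSO steps are not shown.

```python
import math
from itertools import combinations
from statistics import median

# Consistency factor making Qn estimate the SD under normality (no small-sample correction).
QN_CONSTANT = 2.2219


def qn(values):
    """Naive Rousseeuw-Croux Qn: scaled k-th smallest pairwise absolute difference."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # order statistic to take (1-based)
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return QN_CONSTANT * diffs[k - 1]


def robust_fisher_score(values, labels):
    """Hypothetical robust Fisher score: medians replace means, Qn replaces the SD."""
    overall = median(values)
    numerator = 0.0
    denominator = 0.0
    for cls in set(labels):
        group = [v for v, c in zip(values, labels) if c == cls]
        numerator += len(group) * (median(group) - overall) ** 2
        denominator += len(group) * qn(group) ** 2
    if denominator == 0:
        # Zero within-class spread: perfectly separating if the medians differ.
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(X, y, top=None):
    """Return column indices of X (rows = samples) by decreasing robust score."""
    n_features = len(X[0])
    scores = [robust_fisher_score([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order if top is None else order[:top]


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [
    [1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
    [5.0, 4.1], [5.3, 3.2], [4.8, 5.1],
]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # [0, 1]
```

Because the median and Qn both have a 50% breakdown point, a single outlying expression value in one class shifts this score far less than it would shift the mean/variance-based Fisher score.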
- Feature selection
- Fisher Score
- High dimensional gene expression datasets
- Rousseeuw and Croux statistic
ASJC Scopus subject areas
- Computer Science(all)
- Materials Science(all)