Double weighted k nearest neighbours for binary classification of high dimensional genomic data

Amjad Ali, Zardad Khan, Hailiang Du, Saeed Aldahmani

Research output: Contribution to journal › Article › peer-review

Abstract

High dimensional gene expression datasets consist of a large number of genes, many of which do not play a significant role in classifying tissue samples. The high dimensional nature of this type of data, characterized by a number of gene features that substantially exceeds the sample size, makes it challenging for existing methods to work efficiently in terms of prediction accuracy and execution time. To address this issue, a new classification procedure called double weighted k nearest neighbours is proposed. The procedure is specifically designed for gene expression data and incorporates feature weights derived from genes’ ability to express differentially between classes. Feature weights are derived in a manner that automatically increases the impact of informative features while decreasing it for features that are less informative or non-informative. To achieve this goal, the estimated weighted distances from the observations in the k nearest neighbourhood to the test point are used in an exponential function. The outputs of the function are summed separately for the two classes, and the test point is assigned the class label with the largest sum. By utilizing the proposed weighting method based on the differential capability of genes, the method aims to achieve robust and efficient classification by allowing only the most informative features/genes to contribute to the classification task. Experimental evaluations, in comparison with several methods, i.e., standard k nearest neighbours, the weighted k nearest neighbours classifier, random k nearest neighbour, extended neighbourhood rule ensemble (ExNRule), k conditional nearest neighbour, ensemble and support vector machines (SVM), demonstrate the effectiveness of the proposed method in accurately classifying gene expression datasets. Overall, the proposed method presents a promising approach for gene expression data analysis through its two-fold weighted distance calculation strategy, evaluated using classification accuracy, Cohen’s kappa, sensitivity and score as performance metrics.
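
The two weighting steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the feature weight (absolute difference of class means divided by the pooled standard deviation, normalised to sum to 1), the weighted Euclidean distance, the kernel exp(-d), and the function names feature_weights, weighted_distance and predict are all hypothetical choices, not the authors' implementation.

```python
import math


def feature_weights(X, y):
    """Assumed weighting: |mean difference| / pooled SD, normalised to sum to 1.

    Features that separate the two classes well receive larger weights.
    """
    raw = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        m0 = sum(col0) / len(col0)
        m1 = sum(col1) / len(col1)
        ss = sum((v - m0) ** 2 for v in col0) + sum((v - m1) ** 2 for v in col1)
        sd = math.sqrt(ss / max(len(col0) + len(col1) - 2, 1))
        raw.append(abs(m0 - m1) / (sd + 1e-12))  # small constant avoids division by zero
    total = sum(raw) or 1.0
    return [r / total for r in raw]


def weighted_distance(a, b, w):
    """Euclidean distance with each squared difference scaled by its feature weight."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))


def predict(X, y, x_new, k=3):
    """Sum exp(-distance) over the k nearest neighbours, separately per class."""
    w = feature_weights(X, y)
    neighbours = sorted(
        (weighted_distance(row, x_new, w), label) for row, label in zip(X, y)
    )[:k]
    scores = {0: 0.0, 1: 0.0}
    for d, label in neighbours:
        scores[label] += math.exp(-d)  # closer neighbours contribute more
    return max(scores, key=scores.get)  # class with the largest sum


# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 is noise.
X_train = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.2], [2.9, 5.1]]
y_train = [0, 0, 0, 1, 1, 1]
weights = feature_weights(X_train, y_train)
label_a = predict(X_train, y_train, [1.1, 4.8], k=3)
label_b = predict(X_train, y_train, [3.0, 3.0], k=3)
```

In this toy setting the informative feature receives nearly all of the weight, so the noisy feature has little influence on which neighbours are selected or on how much each neighbour contributes to the class sums.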

Original language: English
Article number: 12681
Journal: Scientific Reports
Volume: 15
Issue number: 1
Publication status: Published - Dec 2025

ASJC Scopus subject areas

  • General
