Outlier Detection in High Dimensional Data

Firuz Kamalov, Ho Hon Leung

Research output: Contribution to journal > Article > peer-review

29 Citations (Scopus)

Abstract

High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues that stem from a large number of features. In particular, outlier detection algorithms perform poorly on small datasets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis (PCA) and kernel density estimation (KDE). The proposed method addresses the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by F1-score. Our method also produces better-than-average execution times compared with the benchmark methods.
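The general idea in the abstract (project with PCA, then score points by estimated density) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. The function names, the number of components, the fixed bandwidth, the leave-one-out Gaussian kernel, and the use of negative log-density as the anomaly score are all assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kde_log_density(Z, bandwidth):
    """Leave-one-out Gaussian KDE log-density at each row of Z."""
    n, d = Z.shape
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq_dists / bandwidth**2
    # Exclude each point from its own density estimate.
    np.fill_diagonal(log_kernel, -np.inf)
    # Log-sum-exp keeps the computation stable for very distant points.
    m = log_kernel.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1))
    log_norm = np.log(n - 1) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return log_sum - log_norm


def anomaly_scores(X, n_components=2, bandwidth=1.0):
    """Higher score = lower estimated density in the projected space."""
    Z = pca_project(X, n_components)
    return -kde_log_density(Z, bandwidth)


# Hypothetical data: 60 inliers with 200 features, plus two planted outliers
# (rows 60 and 61) shifted in opposite directions on every feature.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(60, 200))
outliers = rng.normal(size=(2, 200)) + np.array([[6.0], [-6.0]])
X = np.vstack([inliers, outliers])

scores = anomaly_scores(X)
top_two = np.argsort(scores)[-2:]  # indices of the two highest scores
print(sorted(top_two.tolist()))
```

In this toy setting the planted rows sit far from the inliers along the first principal component, so they receive the largest scores. In practice the bandwidth and the number of components would need to be chosen from the data, and a threshold on the scores would be required to label points as outliers.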

Original language: English
Article number: 2040013
Journal: Journal of Information and Knowledge Management
Volume: 19
Issue number: 1
Publication status: Published - Mar 1 2020

Keywords

  • KDE
  • Outlier detection
  • PCA
  • high dimensional data

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Library and Information Sciences
