Detection of highly correlated live data streams

Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis, Mohamed Sharaf, Alexandros Labrinidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

More and more organizations (commercial, health, government, and security) base their decisions on real-time analysis of large, fast-arriving volumes of streaming data. For such analysis to yield actionable information in real time and at the right time, the most recent data must be processed within a specified delay target. Effective solutions for analyzing such data streams rely on two techniques: (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputation, and (2) intelligent scheduling of computational steps and operations. In this paper, we propose a solution that combines both techniques to find highly correlated data streams in real time, using the Pearson Correlation Coefficient as the correlation metric for two windows of data streams. Specifically, we propose to partition a set of data streams into micro-batches that capture the delay target, use sliding windows within a range as the subsequences of values exhibiting a certain level of correlation, utilize the idea of sufficient statistics to incrementally compute the Pearson Correlation Coefficient of pairs of sliding windows, and adopt deadline-aware priority scheduling to detect the highly correlated pairs of data streams. Our experimental results show that our scheme, and in particular our Price-DCS with warm start scheduling algorithm, outperforms existing ones and enables a high degree of interactivity in correlating micro-batches of live data streams.
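The incremental computation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `SlidingPearson`, its methods, and the fixed-size window are assumptions made for illustration only. The idea shown is that five running sums (Σx, Σy, Σx², Σy², Σxy) are sufficient statistics for the Pearson Correlation Coefficient, so sliding the window by one point costs a constant number of additions and subtractions instead of a full pass over the window.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Pearson correlation over the last `window` aligned pairs of two streams.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must hold at least two points")
        self.window = window
        self.pairs = deque()  # pairs currently inside the window
        # Sufficient statistics: running sums over the current window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add one aligned pair and evict the oldest once the window is full."""
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return r for the current window, or None if it is undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx ** 2
        var_y = n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return None  # a constant window has no defined correlation
        return cov / sqrt(var_x * var_y)
```

A caller would feed aligned values from two streams with `push(x, y)` and read `correlation()` after each step. On long-running streams the repeated add-and-subtract updates can accumulate floating-point error, so a practical version would likely recompute the sums from the stored window at intervals.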

Original language: English
Title of host publication: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics, BIRTE 2017
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450354257
Publication status: Published - Aug 28 2017
Externally published: Yes
Event: 11th International Workshop on Real-Time Business Intelligence and Analytics, BIRTE 2017 - Munich, Germany
Duration: Aug 28 2017 → …

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F130527

Conference

Conference: 11th International Workshop on Real-Time Business Intelligence and Analytics, BIRTE 2017
Country/Territory: Germany
City: Munich
Period: 8/28/17 → …

Keywords

  • Correlation
  • Data exploration
  • Data streams
  • Search
  • Subsequence

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
