TY - GEN
T1 - DCS
T2 - 9th International Workshop on Business Intelligence for the Real-Time Enterprise, BIRTE 2015, 10th International Workshop on Enabling Real-Time Business Intelligence, BIRTE 2016 and 11th International Workshop on Real-Time Business Intelligence and Analytics, BIRTE 2017 held in conjunction with the International Conference on Very Large Data Bases, VLDB 2017
AU - Alseghayer, Rakan
AU - Petrov, Daniel
AU - Chrysanthis, Panos K.
AU - Sharaf, Mohamed
AU - Labrinidis, Alexandros
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable results. To this end, we present an effective solution for detecting the correlation of such data streams within a micro-batch of a fixed time interval. Our solution, coined DCS, for Detection of Correlated Data Streams, combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary re-computations, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration policy that tunes the utility function. Specifically, we propose nine policies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real-world dataset shows that some policies are better suited to identifying large numbers of correlated pairs of live data streams already known from previous micro-batches, while others are better suited to identifying previously unseen pairs of live data streams across consecutive micro-batches.
UR - http://www.scopus.com/inward/record.url?scp=85075668881&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075668881&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-24124-7_12
DO - 10.1007/978-3-030-24124-7_12
M3 - Conference contribution
AN - SCOPUS:85075668881
SN - 9783030241230
T3 - Lecture Notes in Business Information Processing
SP - 191
EP - 210
BT - Real-Time Business Intelligence and Analytics - International Workshops, BIRTE 2015, BIRTE 2016, BIRTE 2017, Revised Selected Papers
A2 - Castellanos, Malu
A2 - Chrysanthis, Panos K.
A2 - Pelechrinis, Konstantinos
PB - Springer
Y2 - 28 August 2017 through 1 September 2017
ER -