TY - GEN
T1 - Big Data Pre-processing
T2 - 4th IEEE International Congress on Big Data, BigData Congress 2015
AU - Taleb, Ikbal
AU - Dssouli, Rachida
AU - Serhani, Mohamed Adel
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/8/17
Y1 - 2015/8/17
N2 - With the abundance of raw data generated from various sources, Big Data has become a preeminent approach for acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Quality of Big Data (QBD) has therefore become an important factor in ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD in the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. The model also tracks and records, in a data provenance repository, the effect of every data transformation applied during the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD early in the Big Data processing lifecycle, since doing so significantly reduces costs and supports accurate data analysis.
AB - With the abundance of raw data generated from various sources, Big Data has become a preeminent approach for acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Quality of Big Data (QBD) has therefore become an important factor in ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD in the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. The model also tracks and records, in a data provenance repository, the effect of every data transformation applied during the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD early in the Big Data processing lifecycle, since doing so significantly reduces costs and supports accurate data analysis.
KW - Big Data
KW - Data Quality
KW - pre-processing
UR - http://www.scopus.com/inward/record.url?scp=84953448994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84953448994&partnerID=8YFLogxK
U2 - 10.1109/BigDataCongress.2015.35
DO - 10.1109/BigDataCongress.2015.35
M3 - Conference contribution
AN - SCOPUS:84953448994
T3 - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
SP - 191
EP - 198
BT - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
A2 - Khan, Latifur
A2 - Carminati, Barbara
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 June 2015 through 2 July 2015
ER -