TY - GEN
T1 - Big Data Pre-Processing
T2 - 6th IEEE International Congress on Big Data, BigData Congress 2017
AU - Taleb, Ikbal
AU - Serhani, Mohamed Adel
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/7
Y1 - 2017/9/7
N2 - In the Big Data era, data is at the core of any governmental, institutional, or private organization. Efforts are geared towards extracting highly valuable insights, which is not possible if the data is of poor quality. Therefore, data quality (DQ) is considered a key element of the Big Data processing phase. In this stage, low-quality data is prevented from entering the Big Data value chain. This paper addresses the discovery of data quality rules (DQR) after the evaluation of quality and prior to Big Data pre-processing. We propose a DQR discovery model to enhance and accurately target the pre-processing activities based on quality requirements. We define a set of pre-processing activities associated with data quality dimensions (DQDs) to automate the DQR generation process. Rule optimization is applied to validated rules to avoid multi-pass pre-processing activities and to eliminate duplicate rules. The experiments conducted showed increased quality scores after applying the discovered and optimized DQRs to the data.
AB - In the Big Data era, data is at the core of any governmental, institutional, or private organization. Efforts are geared towards extracting highly valuable insights, which is not possible if the data is of poor quality. Therefore, data quality (DQ) is considered a key element of the Big Data processing phase. In this stage, low-quality data is prevented from entering the Big Data value chain. This paper addresses the discovery of data quality rules (DQR) after the evaluation of quality and prior to Big Data pre-processing. We propose a DQR discovery model to enhance and accurately target the pre-processing activities based on quality requirements. We define a set of pre-processing activities associated with data quality dimensions (DQDs) to automate the DQR generation process. Rule optimization is applied to validated rules to avoid multi-pass pre-processing activities and to eliminate duplicate rules. The experiments conducted showed increased quality scores after applying the discovered and optimized DQRs to the data.
KW - Big Data
KW - Big Data Pre-Processing
KW - Data Quality Evaluation
KW - Data Quality Rules Discovery
UR - http://www.scopus.com/inward/record.url?scp=85032359531&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032359531&partnerID=8YFLogxK
U2 - 10.1109/BigDataCongress.2017.73
DO - 10.1109/BigDataCongress.2017.73
M3 - Conference contribution
AN - SCOPUS:85032359531
T3 - Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017
SP - 498
EP - 501
BT - Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017
A2 - Karypis, George
A2 - Zhang, Jia
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 June 2017 through 30 June 2017
ER -