TY - GEN
T1 - An hybrid approach to quality evaluation across big data value chain
AU - Serhani, Mohamed Adel
AU - El Kassabi, Hadeel T.
AU - Taleb, Ikbal
AU - Nujum, Alramzana
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/10/5
Y1 - 2016/10/5
N2 - While the potential benefits of Big Data adoption are significant, and some initial successes have already been realized, many research and technical challenges must still be addressed to fully realize this potential. Big Data processing, storage, and analytics are the major challenges most easily recognized. However, there are additional challenges related, for instance, to Big Data collection, integration, and quality enforcement. This paper proposes a hybrid approach to Big Data quality evaluation across the Big Data value chain. It consists of first assessing the quality of the Big Data itself, which involves processes such as cleansing, filtering, and approximation, and then assessing the quality of the processes that handle this Big Data, for example processing and analytics. We conduct a set of experiments on a large dataset to evaluate the Quality of Data before and after its pre-processing, as well as the Quality of the pre-processing and processing. Quality metrics were measured to assess three Big Data quality dimensions: accuracy, completeness, and consistency. The results showed that combining data-driven and process-driven quality evaluation leads to improved quality enforcement across the Big Data value chain. As a result, we recorded high prediction accuracy and low processing time when evaluating six well-known classification algorithms as part of the processing and analytics phase of the Big Data value chain.
AB - While the potential benefits of Big Data adoption are significant, and some initial successes have already been realized, many research and technical challenges must still be addressed to fully realize this potential. Big Data processing, storage, and analytics are the major challenges most easily recognized. However, there are additional challenges related, for instance, to Big Data collection, integration, and quality enforcement. This paper proposes a hybrid approach to Big Data quality evaluation across the Big Data value chain. It consists of first assessing the quality of the Big Data itself, which involves processes such as cleansing, filtering, and approximation, and then assessing the quality of the processes that handle this Big Data, for example processing and analytics. We conduct a set of experiments on a large dataset to evaluate the Quality of Data before and after its pre-processing, as well as the Quality of the pre-processing and processing. Quality metrics were measured to assess three Big Data quality dimensions: accuracy, completeness, and consistency. The results showed that combining data-driven and process-driven quality evaluation leads to improved quality enforcement across the Big Data value chain. As a result, we recorded high prediction accuracy and low processing time when evaluating six well-known classification algorithms as part of the processing and analytics phase of the Big Data value chain.
KW - Big Data
KW - Hybrid quality assessment
KW - Metadata
KW - Quality Metadata
KW - Quality assessment
KW - Quality metrics
KW - Quality of process
UR - http://www.scopus.com/inward/record.url?scp=84994613539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994613539&partnerID=8YFLogxK
U2 - 10.1109/BigDataCongress.2016.65
DO - 10.1109/BigDataCongress.2016.65
M3 - Conference contribution
AN - SCOPUS:84994613539
T3 - Proceedings - 2016 IEEE International Congress on Big Data, BigData Congress 2016
SP - 418
EP - 425
BT - Proceedings - 2016 IEEE International Congress on Big Data, BigData Congress 2016
A2 - Pu, Calton
A2 - Fox, Geoffrey
A2 - Damiani, Ernesto
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Congress on Big Data, BigData Congress 2016
Y2 - 27 June 2016 through 2 July 2016
ER -