TY - GEN
T1 - Big Data Quality
T2 - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
AU - Taleb, Ikbal
AU - Kassabi, Hadeel T.El
AU - Serhani, Mohamed Adel
AU - Dssouli, Rachida
AU - Bouhaddioui, Chafik
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/1/12
Y1 - 2017/1/12
N2 - Data is the most valuable asset companies are proud of. When its quality degrades, the consequences are unpredictable, can lead to complete wrong insights. In Big Data context, evaluating the data quality is challenging, must be done prior to any Big data analytics by providing some data quality confidence. Given the huge data size, its fast generation, it requires mechanisms, strategies to evaluate, assess data quality in a fast, efficient way. However, checking the Quality of Big Data is a very costly process if it is applied on the entire data. In this paper, we propose an efficient data quality evaluation scheme by applying sampling strategies on Big data sets. The Sampling will reduce the data size to a representative population samples for fast quality evaluation. The evaluation targeted some data quality dimensions like completeness, consistency. The experimentations have been conducted on Sleep disorder's data set by applying Big data bootstrap sampling techniques. The results showed that the mean quality score of samples is representative for the original data, illustrate the importance of sampling to reduce computing costs when Big data quality evaluation is concerned. We applied the Quality results generated as quality proposals on the original data to increase its quality.
AB - Data is the most valuable asset companies are proud of. When its quality degrades, the consequences are unpredictable, can lead to complete wrong insights. In Big Data context, evaluating the data quality is challenging, must be done prior to any Big data analytics by providing some data quality confidence. Given the huge data size, its fast generation, it requires mechanisms, strategies to evaluate, assess data quality in a fast, efficient way. However, checking the Quality of Big Data is a very costly process if it is applied on the entire data. In this paper, we propose an efficient data quality evaluation scheme by applying sampling strategies on Big data sets. The Sampling will reduce the data size to a representative population samples for fast quality evaluation. The evaluation targeted some data quality dimensions like completeness, consistency. The experimentations have been conducted on Sleep disorder's data set by applying Big data bootstrap sampling techniques. The results showed that the mean quality score of samples is representative for the original data, illustrate the importance of sampling to reduce computing costs when Big data quality evaluation is concerned. We applied the Quality results generated as quality proposals on the original data to increase its quality.
KW - Big Data
KW - Big data sampling
KW - Data quality dimensions
KW - Data quality evaluation
UR - http://www.scopus.com/inward/record.url?scp=85013149567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013149567&partnerID=8YFLogxK
U2 - 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0122
DO - 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0122
M3 - Conference contribution
AN - SCOPUS:85013149567
T3 - Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
SP - 759
EP - 765
BT - Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
A2 - El Baz, Didier
A2 - Bourgeois, Julien
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 July 2016 through 21 July 2016
ER -