Big Data Quality: A Quality Dimensions Evaluation

Research output: Chapter in Book/Report/Conference proceedingConference contribution

51 Citations (Scopus)

Abstract

Data is the most valuable asset companies are proud of. When its quality degrades, the consequences are unpredictable, can lead to complete wrong insights. In Big Data context, evaluating the data quality is challenging, must be done prior to any Big data analytics by providing some data quality confidence. Given the huge data size, its fast generation, it requires mechanisms, strategies to evaluate, assess data quality in a fast, efficient way. However, checking the Quality of Big Data is a very costly process if it is applied on the entire data. In this paper, we propose an efficient data quality evaluation scheme by applying sampling strategies on Big data sets. The Sampling will reduce the data size to a representative population samples for fast quality evaluation. The evaluation targeted some data quality dimensions like completeness, consistency. The experimentations have been conducted on Sleep disorder's data set by applying Big data bootstrap sampling techniques. The results showed that the mean quality score of samples is representative for the original data, illustrate the importance of sampling to reduce computing costs when Big data quality evaluation is concerned. We applied the Quality results generated as quality proposals on the original data to increase its quality.

Original languageEnglish
Title of host publicationProceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
EditorsDidier El Baz, Julien Bourgeois
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages759-765
Number of pages7
ISBN (Electronic)9781509027705
DOIs
Publication statusPublished - Jan 12 2017
Event13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016 - Toulouse, France
Duration: Jul 18 2016Jul 21 2016

Publication series

NameProceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016

Other

Other13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016
Country/TerritoryFrance
CityToulouse
Period7/18/167/21/16

Keywords

  • Big Data
  • Big data sampling
  • Data quality dimensions
  • Data quality evaluation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'Big Data Quality: A Quality Dimensions Evaluation'. Together they form a unique fingerprint.

Cite this