Big Data Pre-Processing: Closing the Data Quality Enforcement Loop

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

27 Citations (Scopus)

Abstract

In the Big Data era, data is at the core of every governmental, institutional, and private organization. Efforts are geared towards extracting highly valuable insights, which is not possible if the data is of poor quality. Data quality (DQ) is therefore considered a key element of the Big Data processing phase, where low-quality data is prevented from entering the Big Data value chain. This paper addresses data quality rules (DQR) discovery, which takes place after quality evaluation and prior to Big Data pre-processing. We propose a DQR discovery model to enhance and accurately target the pre-processing activities based on quality requirements. We define a set of pre-processing activities associated with data quality dimensions (DQDs) to automate the DQR generation process. Rule optimization is applied to validated rules to avoid multi-pass pre-processing and to eliminate duplicate rules. The experiments conducted showed increased quality scores after applying the discovered and optimized DQRs to the data.
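
The abstract does not give the model's details, so the following is only an illustrative sketch of the general idea: derive a rule wherever a quality score falls below its requirement, then remove duplicates and group activities per attribute so each attribute is handled in one pass. The dimension-to-activity mapping, the function names, the thresholds, and the sample scores are all hypothetical and are not taken from the paper.

```python
from collections import OrderedDict

# Hypothetical mapping from a data quality dimension to a pre-processing activity.
ACTIVITY_FOR_DIMENSION = {
    "completeness": "impute_missing",
    "consistency": "normalize_format",
    "accuracy": "correct_out_of_range",
}


def discover_rules(scores, required):
    """Emit one (attribute, dimension, activity) rule per score below its requirement."""
    rules = []
    for (attribute, dimension), score in scores.items():
        if score < required.get(dimension, 0.0):
            rules.append((attribute, dimension, ACTIVITY_FOR_DIMENSION[dimension]))
    return rules


def optimize_rules(rules):
    """Drop duplicate rules and group activities by attribute for a single pass."""
    plan = OrderedDict()
    for attribute, _dimension, activity in rules:
        activities = plan.setdefault(attribute, [])
        if activity not in activities:
            activities.append(activity)
    return plan


# Made-up quality scores per (attribute, dimension) and made-up requirements.
scores = {
    ("age", "completeness"): 0.70,
    ("age", "accuracy"): 0.95,
    ("email", "consistency"): 0.60,
    ("email", "completeness"): 0.99,
}
required = {"completeness": 0.90, "consistency": 0.90, "accuracy": 0.90}

rules = discover_rules(scores, required)
rules.append(rules[0])  # simulate a duplicate rule entering the rule set
plan = optimize_rules(rules)
```

Grouping by attribute is one simple way to avoid revisiting the same column several times; the paper's actual optimization may differ.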

Original language: English
Title of host publication: Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017
Editors: George Karypis, Jia Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 498-501
Number of pages: 4
ISBN (Electronic): 9781538619964
Publication status: Published - Sept 7 2017
Event: 6th IEEE International Congress on Big Data, BigData Congress 2017 - Honolulu, United States
Duration: Jun 25 2017 to Jun 30 2017

Publication series

Name: Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017

Other

Other: 6th IEEE International Congress on Big Data, BigData Congress 2017
Country/Territory: United States
City: Honolulu
Period: 6/25/17 to 6/30/17

Keywords

  • Big Data
  • Big Data Pre-Processing
  • Data Quality Evaluation
  • Data Quality Rules Discovery

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
