Big Data Pre-processing: A Quality Framework

Ikbal Taleb, Rachida Dssouli, Mohamed Adel Serhani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

100 Citations (Scopus)

Abstract

With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained at all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes to support data quality profile selection and adaptation. In addition, it tracks and registers in a data provenance repository the effect of every data transformation that occurs in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The obtained results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly reduces costs and supports accurate data analysis.
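As a rough illustration of the idea in the abstract (pre-processing steps whose effect on data quality is registered in a provenance repository), the following minimal Python sketch may help. It is not the authors' implementation: the `ProvenanceLog` class, the `completeness` metric, the two example steps, and the sample records are all hypothetical names and data chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def completeness(records: List[Record]) -> float:
    """Fraction of non-missing values over all fields (1.0 for empty input)."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for v in r.values() if v is not None)
    return present / total


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory stand-in for a data provenance repository."""
    entries: List[dict] = field(default_factory=list)

    def register(self, step: str, before: float, after: float,
                 rows_in: int, rows_out: int) -> None:
        self.entries.append({
            "step": step,
            "completeness_before": before,
            "completeness_after": after,
            "rows_in": rows_in,
            "rows_out": rows_out,
        })


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Cleansing (assumed rule): discard records with any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records: List[Record]) -> List[Record]:
    """Normalization: rescale each field to [0, 1]; constant fields map to 0.0."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for key in {k for r in records for k in r}:
        values = [r[key] for r in records if r.get(key) is not None]
        if values:
            bounds[key] = (min(values), max(values))
    normalized = []
    for r in records:
        new: Record = {}
        for key, value in r.items():
            if value is None:
                new[key] = None  # missing values are left untouched
            else:
                lo, hi = bounds[key]
                new[key] = 0.0 if hi == lo else (value - lo) / (hi - lo)
        normalized.append(new)
    return normalized


def run_pipeline(records: List[Record], steps: List[Tuple[str, Step]],
                 log: ProvenanceLog) -> List[Record]:
    """Apply each step in order and register its effect on quality in the log."""
    for name, step in steps:
        before, rows_in = completeness(records), len(records)
        records = step(records)
        log.register(name, before, completeness(records), rows_in, len(records))
    return records


# Made-up sample records; not taken from the EEG dataset used in the paper.
raw = [
    {"ch1": 2.0, "ch2": 10.0},
    {"ch1": None, "ch2": 30.0},
    {"ch1": 4.0, "ch2": 20.0},
]
log = ProvenanceLog()
clean = run_pipeline(
    raw,
    [("cleansing", drop_incomplete), ("normalization", min_max_normalize)],
    log,
)
```

After the run, `log.entries` holds one record per transformation, so the quality impact of each step can be inspected separately. A real provenance repository would presumably persist richer metadata than this sketch does.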

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors: Latifur Khan, Carminati Barbara
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 191-198
Number of pages: 8
ISBN (Electronic): 9781467372787
Publication status: Published - Aug 17 2015
Event: 4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States
Duration: Jun 27 2015 to Jul 2 2015

Publication series

Name: Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Other

Other: 4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory: United States
City: New York City
Period: 6/27/15 to 7/2/15

Keywords

  • Big Data
  • Data Quality
  • pre-processing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
