TY - GEN
T1 - Big Data Quality Assessment Model for Unstructured Data
AU - Taleb, Ikbal
AU - Serhani, Mohamed Adel
AU - Dssouli, Rachida
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/1/8
Y1 - 2019/1/8
AB - Big Data has gained enormous momentum over the past few years because of the tremendous volume of data generated and processed across diverse application domains. It is currently estimated that 80% of all generated data is unstructured. Evaluating the quality of Big Data has been identified as essential to guarantee data quality dimensions such as completeness and accuracy. Current initiatives for unstructured data quality evaluation are still under investigation. In this paper, we propose a quality evaluation model to handle the quality of Unstructured Big Data (UBD). The model first captures and discovers key properties of unstructured Big Data and its characteristics, and provides comprehensive mechanisms to sample and profile the UBD dataset and to extract features and characteristics from heterogeneous data types in different formats. A Data Quality repository manages relationships between data quality dimensions, quality metrics, feature extraction methods, mining methodologies, data types, and data domains. An analysis of the samples provides a data profile of UBD. This profile is extended to a quality profile that contains the quality mapping with selected features for quality assessment. We developed a UBD quality assessment model that handles all the processes from the UBD profiling exploration to the quality report. The model provides an initial blueprint for quality estimation of unstructured Big Data. It also states a set of quality characteristics and indicators that can be used to outline an initial data quality schema of UBD.
KW - Big Data
KW - Data Quality
KW - Quality of Unstructured Big Data
KW - Unstructured Data
UR - http://www.scopus.com/inward/record.url?scp=85062423308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062423308&partnerID=8YFLogxK
U2 - 10.1109/INNOVATIONS.2018.8605945
DO - 10.1109/INNOVATIONS.2018.8605945
M3 - Conference contribution
AN - SCOPUS:85062423308
T3 - Proceedings of the 2018 13th International Conference on Innovations in Information Technology, IIT 2018
SP - 69
EP - 74
BT - Proceedings of the 2018 13th International Conference on Innovations in Information Technology, IIT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Conference on Innovations in Information Technology, IIT 2018
Y2 - 18 November 2018 through 19 November 2018
ER -