Abstract
The issue of data quality is increasingly important as individuals and corporations alike rely on multiple, often external, sources of data to make decisions. Traditional query systems do not factor data quality considerations into their responses. Further, studies of the diverse interpretations of data quality indicate that fitness for use is a fundamental criterion in its evaluation. In this paper we address the issue of data quality aware query systems by developing a query answering framework that considers user data quality preferences over a collaborative information systems architecture. Our work is motivated by an extensive study of the data quality literature, which revealed a lack of holistic solutions that encompass both the business and technological aspects of data quality management. Accordingly, the developed framework for data quality aware query systems takes an end-to-end view of the problem. We focus on three major aspects of quality aware query systems: measuring data quality, modeling users' data quality preferences, and answering queries in consideration of the defined preferences and measures. We address these issues by introducing data quality profiling, data quality aware SQL, and data quality aware query answering methods. The contributions of this paper have been evaluated on real and simulated data, and the individual components have been assembled into a running prototype.
Original language | English |
---|---|
Pages (from-to) | 24-44 |
Number of pages | 21 |
Journal | Information Systems |
Volume | 46 |
Publication status | Published - Nov 2014 |
Externally published | Yes |
Keywords
- Data profiling
- Data quality
- Query systems
- User preference modeling
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture