Progressive diversification for column-based data exploration platforms

Hina A. Khan, Mohamed A. Sharaf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)


In data exploration platforms, diversification has become an essential method for extracting representative data that provide users with a concise and meaningful view of their query results. However, the benefits of diversification come at the expense of an additional cost for post-processing query results. For large, high-dimensional result sets, this cost escalates further because of the massive number of distance computations required to evaluate the similarity between results. To address this challenge, in this paper we propose the Progressive Data Diversification (pDiverse) scheme. The main idea underlying pDiverse is to utilize partial distance computation to reduce the amount of processed data. Our extensive experimental results on both synthetic and real data sets show that our proposed scheme outperforms existing diversification methods in terms of both I/O and CPU costs.
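The abstract does not spell out pDiverse's algorithm, so the following is only a minimal sketch of the general idea of partial distance computation over column-stored data, under stated assumptions. It uses greedy MaxMin diversification (a standard diversification heuristic, not necessarily the objective pDiverse optimizes) and accumulates each squared Euclidean distance one column at a time, abandoning it as soon as it can no longer change the result. The function name `greedy_maxmin_partial`, the `prune` flag, the row-0 seed and the "distance terms" cost counter are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Tuple


def greedy_maxmin_partial(
    columns: List[List[float]], k: int, prune: bool = True
) -> Tuple[List[int], int]:
    """Greedy MaxMin diversification over column-stored data (illustrative sketch).

    columns[d][i] is the value of dimension d for row i, mimicking a
    column-oriented layout (at least one column is assumed). Returns the
    indices of the selected rows and the number of per-column distance
    terms evaluated, a rough proxy for the work done.
    """
    n = len(columns[0])
    if k <= 0 or n == 0:
        return [], 0
    chosen = [0]  # hypothetical seed: start from row 0
    is_chosen = [False] * n
    is_chosen[0] = True
    # min_dist[i]: squared distance from row i to its nearest chosen row so far.
    min_dist = [math.inf] * n
    terms = 0

    while len(chosen) < min(k, n):
        newest = chosen[-1]
        for i in range(n):
            if is_chosen[i]:
                continue
            partial = 0.0
            for col in columns:  # scan one column at a time
                diff = col[i] - col[newest]
                partial += diff * diff
                terms += 1
                # A squared distance never decreases as columns are added, so once
                # it reaches min_dist[i] the newest row cannot be i's nearest one.
                if prune and partial >= min_dist[i]:
                    break
            # If the scan was abandoned, partial >= min_dist[i] and this is a no-op.
            min_dist[i] = min(min_dist[i], partial)
        # MaxMin step: pick the unchosen row farthest from the chosen set.
        best = max(
            (i for i in range(n) if not is_chosen[i]), key=lambda i: min_dist[i]
        )
        chosen.append(best)
        is_chosen[best] = True
    return chosen, terms


if __name__ == "__main__":
    random.seed(7)
    # Synthetic data: 32 columns x 500 rows of uniform values.
    data = [[random.random() for _ in range(500)] for _ in range(32)]
    picked, cost_pruned = greedy_maxmin_partial(data, 10, prune=True)
    same, cost_full = greedy_maxmin_partial(data, 10, prune=False)
    print(picked == same, cost_pruned, cost_full)
```

Because pruning only skips work that provably cannot change `min_dist`, both runs select the same rows while the pruned run evaluates fewer distance terms. In a column store, abandoning a scan early would also mean reading fewer column values, which is the flavour of I/O and CPU saving the abstract reports; the actual pDiverse scheme may organize this differently.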

Original language: English
Title of host publication: 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
Publisher: IEEE Computer Society
Number of pages: 12
ISBN (Electronic): 9781479979639
Publication status: Published - May 26 2015
Externally published: Yes
Event: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015 - Seoul, Korea, Republic of
Duration: Apr 13 2015 to Apr 17 2015

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627


Conference: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Country/Territory: Korea, Republic of

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems


