TY - GEN
T1 - Progressive diversification for column-based data exploration platforms
AU - Khan, Hina A.
AU - Sharaf, Mohamed A.
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/5/26
Y1 - 2015/5/26
AB - In data exploration platforms, diversification has become an essential method for extracting representative data, which provides users with a concise and meaningful view of their query results. However, the benefits of diversification come at the expense of additional post-processing of those results. For large, high-dimensional result sets, the cost of diversification escalates further because of the massive number of distance computations required to evaluate the similarity between results. To address this challenge, we propose the Progressive Data Diversification (pDiverse) scheme. The main idea underlying pDiverse is to use partial distance computation to reduce the amount of processed data. Our extensive experimental results on both synthetic and real data sets show that the proposed scheme outperforms existing diversification methods in terms of both I/O and CPU costs.
UR - http://www.scopus.com/inward/record.url?scp=84940841921&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940841921&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2015.7113295
DO - 10.1109/ICDE.2015.7113295
M3 - Conference contribution
AN - SCOPUS:84940841921
T3 - Proceedings - International Conference on Data Engineering
SP - 327
EP - 338
BT - 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
PB - IEEE Computer Society
T2 - 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Y2 - 13 April 2015 through 17 April 2015
ER -