TY - GEN
T1 - REQUEST
T2 - 4th IEEE International Conference on Big Data, Big Data 2016
AU - Ge, Xiaoyu
AU - Xue, Yanbing
AU - Luo, Zhipeng
AU - Sharaf, Mohamed A.
AU - Chrysanthis, Panos K.
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016
Y1 - 2016
AB - Exploration over large datasets is a key first step in data analysis, as users may be unfamiliar with the underlying database schema and unable to construct precise queries that represent their interests. Such a data exploration task usually involves executing numerous ad hoc queries, which requires considerable time and human effort. In this paper, we present REQUEST, a novel framework designed to minimize human effort and enable both effective and efficient data exploration. REQUEST supports the query-from-examples style of data exploration by integrating two key components: 1) Data Reduction, and 2) Query Selection. As instances of the REQUEST framework, we propose several highly scalable schemes, which employ active learning techniques and provide different levels of efficiency and effectiveness as guided by the user's preferences. Our results, on real-world datasets from the Sloan Digital Sky Survey, show that our schemes on average require 1-2 orders of magnitude fewer feedback questions than the random baseline, and 3-16× fewer questions than the state of the art, while maintaining interactive response time. Moreover, our schemes are able to construct, with high accuracy, queries that are often undetectable by current techniques.
UR - http://www.scopus.com/inward/record.url?scp=85015249674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015249674&partnerID=8YFLogxK
U2 - 10.1109/BigData.2016.7840657
DO - 10.1109/BigData.2016.7840657
M3 - Conference contribution
AN - SCOPUS:85015249674
T3 - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
SP - 646
EP - 655
BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
A2 - Ak, Ronay
A2 - Karypis, George
A2 - Xia, Yinglong
A2 - Hu, Xiaohua Tony
A2 - Yu, Philip S.
A2 - Joshi, James
A2 - Ungar, Lyle
A2 - Liu, Ling
A2 - Sato, Aki-Hiro
A2 - Suzumura, Toyotaro
A2 - Rachuri, Sudarsan
A2 - Govindaraju, Rama
A2 - Xu, Weijia
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 December 2016 through 8 December 2016
ER -