TY - GEN
T1 - QoS-aware data replications and placements for query evaluation of big data analytics
AU - Xia, Qiufen
AU - Liang, Weifa
AU - Xu, Zichuan
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/28
Y1 - 2017/7/28
AB - Enterprise users at different geographic locations generate large volumes of data and store them at geo-distributed datacenters. These users may also issue ad hoc big data analytics queries on the stored data to identify valuable information that helps them make strategic decisions. However, querying such large volumes of data is usually time-consuming and costly. Sometimes, users are interested only in timely approximate query results rather than exact ones. In such cases, applications must trade off timeliness against accuracy: they either tolerate the latency of delivering more accurate results or accept the accuracy error of results computed from samples of the data rather than from the entire data set. In this paper, we study QoS-aware data replications and placements for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across geo-distributed datacenters. We focus on placing samples of the source data with certain error bounds at strategic datacenters to meet users' stringent query response time requirements. We propose an efficient algorithm for evaluating a set of big data analytic queries that aims to minimize the evaluation cost of the queries while meeting their response time requirements. We demonstrate the effectiveness of the proposed algorithm through experimental simulations. Experimental results show that the proposed algorithm is promising.
UR - http://www.scopus.com/inward/record.url?scp=85028305537&partnerID=8YFLogxK
U2 - 10.1109/ICC.2017.7997238
DO - 10.1109/ICC.2017.7997238
M3 - Conference contribution
T3 - IEEE International Conference on Communications
BT - 2017 IEEE International Conference on Communications, ICC 2017
A2 - Debbah, Merouane
A2 - Gesbert, David
A2 - Mellouk, Abdelhamid
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Communications, ICC 2017
Y2 - 21 May 2017 through 25 May 2017
ER -