Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics

Qiufen Xia, Zichuan Xu*, Weifa Liang, Shui Yu, Song Guo, Albert Y. Zomaya

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    21 Citations (Scopus)

    Abstract

    Enterprise users at different geographic locations generate large volumes of data that are stored at geographically distributed datacenters. These users may also perform big data analytics on the stored data to identify valuable information for making strategic decisions. However, it is well known that performing big data analytics on data in geographically distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, the query result may become useless if answering a query takes too long. Instead, users may sometimes be interested only in timely approximate query results rather than exact ones. In such approximate query evaluation, applications must either sacrifice timeliness to obtain more accurate evaluation results, or tolerate evaluation results with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent timelines. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across different geo-distributed datacenters. We focus on the problems of placing samples of the source data at strategic datacenters to meet the stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries, with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, using real datasets. Experimental results show that the proposed algorithms are promising.

    Original language: English
    Article number: 8732398
    Pages (from-to): 2677-2691
    Number of pages: 15
    Journal: IEEE Transactions on Parallel and Distributed Systems
    Volume: 30
    Issue number: 12
    Publication status: Published - 1 Dec 2019