Data Locality-Aware Big Data Query Evaluation in Distributed Clouds

Qiufen Xia, Weifa Liang*, Zichuan Xu

*Corresponding author for this work

    Research output: Contribution to journal, Article, peer-review

    9 Citations (Scopus)

    Abstract

    With more and more businesses and organizations outsourcing their IT services to distributed clouds for cost savings, historical and operational data generated by the services have been growing exponentially. The generated data, referred to as big data and stored at datacenters in different geographic locations, have become an invaluable asset to these businesses and organizations, which can analyze the data to identify business advantages and make strategic decisions. Big data analytics has thus emerged as a main research topic in cloud computing. Efficiently evaluating a big data analytic query in a distributed cloud consisting of multiple datacenters at different geographic locations interconnected by the Internet poses great challenges: (i) the source data of the query are typically located at different datacenters; and (ii) the resource demands of the query may exceed the supplies of any single datacenter at that moment. In this paper, we formulate an online query evaluation problem for big data analytic queries in distributed clouds, with the objective of maximizing the query acceptance ratio while minimizing the accumulative query evaluation cost. To this end, we first propose a novel metric to model the usage of different resources in the distributed cloud, incorporating the capacities and workloads of different datacenters and links as well as the resource demands of different queries. We then devise efficient online algorithms for query evaluation under both unsplittable and splittable source data assumptions. We finally conduct extensive simulation experiments to evaluate the performance of the proposed algorithms. Experimental results demonstrate that the proposed algorithms are promising and outperform other heuristics at 95% confidence intervals.

    Original language: English
    Pages (from-to): 791-809
    Number of pages: 19
    Journal: Computer Journal
    Volume: 60
    Issue number: 6
    Publication status: Published - 1 Jun 2017
