Abstract
As more businesses and organizations outsource their IT services to distributed clouds to save costs, the historical and operational data generated by these services have been growing exponentially. These data, referred to as big data and stored at datacenters in different geographic locations, have become an invaluable asset to these businesses and organizations, which can analyze them to identify business advantages and make strategic decisions. Big data analytics has thus emerged as a main research topic in cloud computing. Efficiently evaluating a big data analytic query in a distributed cloud, which consists of multiple datacenters at different geographic locations interconnected by the Internet, poses great challenges: (i) the source data of the query are typically located at different datacenters; and (ii) the resource demands of the query may exceed the supply of any single datacenter at that moment. In this paper, we formulate an online query evaluation problem for big data analytic queries in distributed clouds, with the objective of maximizing the query acceptance ratio while minimizing the accumulative query evaluation cost. To this end, we first propose a novel metric to model the usage of different resources in the distributed cloud, incorporating the capacities and workloads of different datacenters and links as well as the resource demands of different queries. We then devise efficient online algorithms for query evaluation under both unsplittable and splittable source data assumptions. Finally, we evaluate the performance of the proposed algorithms through extensive simulation experiments. Experimental results demonstrate that the proposed algorithms are promising and outperform other heuristics at 95% confidence intervals.
Original language | English |
---|---|
Pages (from-to) | 791-809 |
Number of pages | 19 |
Journal | Computer Journal |
Volume | 60 |
Issue number | 6 |
Publication status | Published - 1 Jun 2017 |