TY - JOUR

T1 - Query-by-Sketch

T2 - 2021 International Conference on Management of Data, SIGMOD 2021

AU - Wang, Ye

AU - Wang, Qing

AU - Koehler, Henning

AU - Lin, Yu

N1 - Publisher Copyright:
© 2021 ACM.

PY - 2021

Y1 - 2021

N2 - Computing shortest paths is a fundamental operation in processing graph data. In many real-world applications, discovering shortest paths between two vertices empowers us to make full use of the underlying structure to understand how vertices are related in a graph, e.g. the strength of social ties between individuals in a social network. In this paper, we study the shortest-path-graph problem that aims to efficiently compute a shortest path graph containing exactly all shortest paths between any arbitrary pair of vertices on complex networks. Our goal is to design an exact solution that can scale to graphs with millions or billions of vertices and edges. To achieve high scalability, we propose a novel method, Query-by-Sketch (QbS), which efficiently leverages offline labelling (i.e., precomputed labels) to guide online searching through a fast sketching process that summarizes the important structural aspects of shortest paths in answering shortest-path-graph queries. We theoretically prove the correctness of this method and analyze its computational complexity. To empirically verify the efficiency of QbS, we conduct experiments on 12 real-world datasets, among which the largest dataset has 1.7 billion vertices and 7.8 billion edges. The experimental results show that QbS can answer shortest-path-graph queries in microseconds for million-scale graphs and less than half a second for billion-scale graphs.

AB - Computing shortest paths is a fundamental operation in processing graph data. In many real-world applications, discovering shortest paths between two vertices empowers us to make full use of the underlying structure to understand how vertices are related in a graph, e.g. the strength of social ties between individuals in a social network. In this paper, we study the shortest-path-graph problem that aims to efficiently compute a shortest path graph containing exactly all shortest paths between any arbitrary pair of vertices on complex networks. Our goal is to design an exact solution that can scale to graphs with millions or billions of vertices and edges. To achieve high scalability, we propose a novel method, Query-by-Sketch (QbS), which efficiently leverages offline labelling (i.e., precomputed labels) to guide online searching through a fast sketching process that summarizes the important structural aspects of shortest paths in answering shortest-path-graph queries. We theoretically prove the correctness of this method and analyze its computational complexity. To empirically verify the efficiency of QbS, we conduct experiments on 12 real-world datasets, among which the largest dataset has 1.7 billion vertices and 7.8 billion edges. The experimental results show that QbS can answer shortest-path-graph queries in microseconds for million-scale graphs and less than half a second for billion-scale graphs.

KW - 2-hop cover

KW - algorithms

KW - breadth-first search

KW - distance labelling

KW - graph sketch

KW - graphs

KW - pruned landmark labelling

KW - shortest paths

UR - http://www.scopus.com/inward/record.url?scp=85108960732&partnerID=8YFLogxK

U2 - 10.1145/3448016.3452826

DO - 10.1145/3448016.3452826

M3 - Conference article

AN - SCOPUS:85108960732

SN - 0730-8078

SP - 1946

EP - 1958

JO - Proceedings of the ACM SIGMOD International Conference on Management of Data

JF - Proceedings of the ACM SIGMOD International Conference on Management of Data

Y2 - 20 June 2021 through 25 June 2021

ER -