TY - GEN
T1 - Semantic-aware blocking for entity resolution
AU - Wang, Qing
AU - Cui, Mingyuan
AU - Liang, Huizhi
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/6/22
Y1 - 2016/6/22
AB - In this work we propose a semantic-aware blocking framework for entity resolution (ER). The proposed framework is built using locality-sensitive hashing (LSH) techniques to efficiently unify both textual and semantic features into an ER blocking process. To understand how similarity metrics may affect the effectiveness of ER blocking, we study the robustness of similarity metrics and their properties in terms of LSH families. We further discuss how the semantic similarity of records can be captured, measured, and integrated with LSH techniques over multiple similarity spaces. We have evaluated our proposed framework over two real-world data sets and compared it with state-of-the-art blocking techniques. The experimental study shows that using a combination of semantic features and textual features can considerably improve the quality of blocking. Due to the probabilistic nature of LSH, this semantic-aware blocking framework also enables us to build fast and reliable blocking for performing entity resolution tasks in a large-scale data environment.
UR - http://www.scopus.com/inward/record.url?scp=84980372020&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2016.7498378
DO - 10.1109/ICDE.2016.7498378
M3 - Conference contribution
AN - SCOPUS:84980372020
T3 - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
SP - 1468
EP - 1469
BT - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Conference on Data Engineering, ICDE 2016
Y2 - 16 May 2016 through 20 May 2016
ER -