TY - GEN
T1 - Repairing of record linkage
T2 - 22nd International Conference on Extending Database Technology, EDBT 2019
AU - Bui-Nguyen, Quyen
AU - Wang, Qing
AU - Shao, Jingyu
AU - Vatsalan, Dinusha
N1 - Publisher Copyright: © 2019 Copyright held by the owner/author(s).
PY - 2019
Y1 - 2019
N2 - Linking records from different data sources, referred to as record linkage, is a longstanding but not yet satisfactorily resolved question in many fields of science. For practitioners, it is difficult to ensure linkage quality at the time linkage techniques are applied in real-world applications. Instead, linkage errors are often detected later, mostly by users of the applications. This not only requires us to repair errors, but also gives us opportunities to observe the linkage quality and uncover why such errors occur. Viewing record linkage as a complex and evolving process, we study how to gain insights from linkage errors in order to achieve high-quality linkage. We propose a generic repairing framework that allows us to start with imperfect linkage models and dynamically repair linkage models and errors to improve linkage quality. We evaluated our repairing framework on three real-world datasets, and the experimental results show that the proposed tree-structured classifier, SVM-tree, outperforms the baseline methods.
AB - Linking records from different data sources, referred to as record linkage, is a longstanding but not yet satisfactorily resolved question in many fields of science. For practitioners, it is difficult to ensure linkage quality at the time linkage techniques are applied in real-world applications. Instead, linkage errors are often detected later, mostly by users of the applications. This not only requires us to repair errors, but also gives us opportunities to observe the linkage quality and uncover why such errors occur. Viewing record linkage as a complex and evolving process, we study how to gain insights from linkage errors in order to achieve high-quality linkage. We propose a generic repairing framework that allows us to start with imperfect linkage models and dynamically repair linkage models and errors to improve linkage quality. We evaluated our repairing framework on three real-world datasets, and the experimental results show that the proposed tree-structured classifier, SVM-tree, outperforms the baseline methods.
UR - http://www.scopus.com/inward/record.url?scp=85064940073&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2019.75
DO - 10.5441/002/edbt.2019.75
M3 - Conference contribution
T3 - Advances in Database Technology - EDBT
SP - 638
EP - 641
BT - Advances in Database Technology - EDBT 2019
A2 - Herschel, Melanie
A2 - Binnig, Carsten
A2 - Kaoudi, Zoi
A2 - Galhardas, Helena
A2 - Fundulaki, Irini
A2 - Reinwald, Berthold
PB - OpenProceedings.org
Y2 - 26 March 2019 through 29 March 2019
ER -