A clustering-based framework for incrementally repairing entity resolution

Qing Wang*, Jingyi Gao, Peter Christen

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    3 Citations (Scopus)

    Abstract

    Although entity resolution (ER) is known to be an important problem with widespread applications in many areas, including e-commerce, healthcare, social science, and crime and fraud detection, one aspect that has largely been neglected is monitoring the quality of entity resolution and repairing erroneous matching decisions over time. In this paper we develop an efficient method for incrementally repairing ER, i.e., fixing detected erroneous matches and non-matches. Our method is based on an efficient clustering algorithm that eliminates inconsistencies among matching decisions, and an efficient provenance indexing data structure that allows us to trace the evidence of clustering to support ER repair. We have evaluated our method over real-world databases, and our experimental results show that the quality of entity resolution can be significantly improved through repairing over time.
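
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or data structure: it assumes clustering is plain connected components over pairwise match decisions, and that the "provenance index" is simply the set of match pairs supporting each cluster. The function names `cluster` and `repair_false_match` are hypothetical.

```python
from collections import defaultdict


def cluster(records, matches):
    """Group records into entities as connected components of the match graph.

    Returns a dict mapping each cluster (frozenset of records) to its
    provenance: the set of match pairs that support the cluster.
    Assumes every match pair links two distinct records.
    """
    parent = {r: r for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    members = defaultdict(set)
    for r in records:
        members[find(r)].add(r)

    # Assumed provenance index: cluster root -> supporting match pairs.
    provenance = defaultdict(set)
    for a, b in matches:
        provenance[find(a)].add(frozenset((a, b)))

    return {frozenset(m): provenance[root] for root, m in members.items()}


def repair_false_match(entities, bad_pair):
    """Drop one erroneous match and re-cluster only the affected cluster."""
    bad = frozenset(bad_pair)
    repaired = {}
    for recs, evidence in entities.items():
        if bad in evidence:
            remaining = [tuple(e) for e in evidence if e != bad]
            repaired.update(cluster(recs, remaining))
        else:
            repaired[recs] = evidence
    return repaired


entities = cluster("abcd", [("a", "b"), ("b", "c")])
repaired = repair_false_match(entities, ("b", "c"))
```

In this sketch the repair is incremental only in the sense that clusters without the erroneous match are left untouched; the paper's own method may differ substantially.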

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 20th Pacific-Asia Conference, PAKDD 2016, Proceedings
    Editors: James Bailey, Latifur Khan, Takashi Washio, Gillian Dobbie, Joshua Zhexue Huang, Ruili Wang
    Publisher: Springer Verlag
    Pages: 283-295
    Number of pages: 13
    ISBN (Print): 9783319317496
    Publication status: Published - 2016
    Event: 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016 - Auckland, New Zealand
    Duration: 19 Apr 2016 to 22 Apr 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9652 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016
    Country/Territory: New Zealand
    City: Auckland
    Period: 19/04/16 to 22/04/16