Provenance-aware entity resolution: Leveraging provenance to improve quality

Qing Wang*, Klaus Dieter Schewe, Woods Wang

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    5 Citations (Scopus)

    Abstract

    Entity resolution (ER), the process of identifying records that refer to the same real-world entity, arises in many application areas. Nevertheless, resolving entities is hardly ever completely accurate. In this paper, we investigate a provenance-aware framework for ER. We first propose an indexing structure that can be efficiently built for provenance storage in support of an ER process. Then a generic repairing strategy, called coordinate-split-merge (CSM), is developed to control the interaction between repairs driven by must-link and cannot-link constraints. Our experimental results show that the proposed indexing structure captures the provenance of ER efficiently in both time and space, and scales linearly with the number of matches. Our repairing algorithms can significantly reduce the human effort needed to leverage the provenance of ER for identifying erroneous matches.

    Original language: English
    Title of host publication: Database Systems for Advanced Applications - 20th International Conference, DASFAA 2015, Hanoi, Vietnam, April 20-23, 2015, Proceedings, Part I
    Editors: Cyrus Shahabi, Muhammad Aamir Cheema, Matthias Renz, Xiaofang Zhou
    Publisher: Springer Verlag
    Pages: 474-490
    Number of pages: 17
    ISBN (Print): 9783319181196
    Publication status: Published - 2015
    Event: 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015 - Hanoi, Viet Nam
    Duration: 20 Apr 2015 to 23 Apr 2015

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9049
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015
    Country/Territory: Viet Nam
    City: Hanoi
    Period: 20/04/15 to 23/04/15
