Unsupervised Measuring of Entity Resolution Consistency

Jeffrey Fisher, Qing Wang

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    3 Citations (Scopus)

    Abstract

    Entity resolution (ER) is a common data-cleaning and data-integration task that aims to determine which records in one or more data sets refer to the same real-world entities. In most cases no training data exist, and the ER process involves considerable trial and error, with an often time-consuming manual evaluation required to determine whether the obtained results are good enough. We propose a method that uses transitive closure within triples of records to provide an early, unsupervised indication of inconsistency in an ER result. We test our approach on three real-world data sets with different similarity calculations and blocking approaches, and show that it can detect problems with ER results early on without a manual evaluation.
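    The abstract does not give the exact consistency measure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the ER result is a list of matched record-ID pairs and that transitivity is violated whenever two pairs in a triple match but the third does not (an "open" triple). The function name `triangle_inconsistency` and the ratio it returns (open triples over all triples with at least two matched pairs) are illustrative choices, not the authors' definition.

```python
from itertools import combinations


def triangle_inconsistency(matches):
    """Hypothetical transitivity-violation ratio for an ER result.

    matches: iterable of (a, b) record-ID pairs classified as matches.
    A triple {x, y, z} is 'open' when exactly two of its three pairs are
    matches (x~y and y~z, but not x~z) and 'closed' when all three are.
    Returns open / (open + closed); 0.0 when no such triple exists.
    """
    # Undirected adjacency sets of matched records (self-pairs ignored).
    neighbours = {}
    for a, b in matches:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    open_count = 0
    closed_count = 0
    # Any triple with at least two matched pairs has a 'centre' record
    # that is matched to both of the other records.
    for centre, nbrs in neighbours.items():
        for u, v in combinations(nbrs, 2):
            if v in neighbours[u]:
                closed_count += 1  # a closed triangle is counted once per vertex
            else:
                open_count += 1  # an open triple has exactly one centre
    closed_triangles = closed_count // 3
    total = open_count + closed_triangles
    return open_count / total if total else 0.0
```

    For example, `triangle_inconsistency([(1, 2), (2, 3), (1, 3), (3, 4)])` returns 2/3: records 1, 2 and 3 form one closed triangle, while {1, 3, 4} and {2, 3, 4} are open triples centred on record 3. Under this assumed measure, a fully transitive result scores 0.0, and a rising score would hint that pairwise match decisions are chaining together records that were never compared as matches.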

    Original language: English
    Title of host publication: Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
    Editors: Xindong Wu, Alexander Tuzhilin, Hui Xiong, Jennifer G. Dy, Charu Aggarwal, Zhi-Hua Zhou, Peng Cui
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 218-221
    Number of pages: 4
    ISBN (Electronic): 9781467384926
    Publication status: Published - 29 Jan 2016
    Event: 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015 - Atlantic City, United States
    Duration: 14 Nov 2015 to 17 Nov 2015

    Publication series

    Name: Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015

    Conference

    Conference: 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
    Country/Territory: United States
    City: Atlantic City
    Period: 14/11/15 to 17/11/15
