Informativeness-Based Active Learning for Entity Resolution

Victor Christen*, Peter Christen, Erhard Rahm

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

    7 Citations (Scopus)

    Abstract

    Entity Resolution is a crucial task to integrate data from different sources to identify records that represent the same entity. Entity resolution commonly employs supervised learning techniques based on training data of matching and non-matching pairs of records and their attribute similarities as represented by similarity vectors. To reduce the amount of manual labelling to generate suitable training data, we propose a novel active learning approach that does not require any prior knowledge about true matches and that is independent of the learning method used. Our approach successively identifies new training examples based on an informativeness measure for similarity vectors by considering their relationship to already classified vectors and the uncertainty in the similarity vector space covered by the current training set. Experiments on several data sets show that even for a small labelling effort our approach achieves comparable results to fully supervised approaches and it can outperform previous active learning approaches for entity resolution.
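
The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the `informativeness` score below is a hypothetical stand-in that combines the distance to the nearest already labelled similarity vector with the label disagreement among the nearest labelled vectors. The seeding strategy, the parameter `k`, and the names `select_training_set` and `oracle` are likewise assumptions made for illustration. The oracle stands for the human labeller, so no true matches are known before the first query.

```python
import math
import random


def informativeness(vec, labelled, k=3):
    """Hypothetical score, NOT the measure from the paper.

    High when `vec` lies far from every labelled vector (uncovered region)
    or when its nearest labelled neighbours disagree (uncertain region).
    """
    nearest = sorted(labelled, key=lambda item: math.dist(vec, item[0]))[:k]
    # Coverage term: distance to the closest labelled similarity vector.
    coverage = math.dist(vec, nearest[0][0])
    # Uncertainty term: 0 if all neighbours share a label, 1 for an even split.
    match_share = sum(label for _, label in nearest) / len(nearest)
    uncertainty = 1.0 - abs(2.0 * match_share - 1.0)
    return coverage * (1.0 + uncertainty)


def select_training_set(vectors, oracle, budget, k=3):
    """Query `oracle` for up to `budget` labels (assumes at least two vectors)."""
    # Assumed seeding: label the vectors with the lowest and highest
    # total similarity, so no prior knowledge of matches is needed.
    order = sorted(range(len(vectors)), key=lambda i: sum(vectors[i]))
    seeds = [order[0], order[-1]]
    labelled = [(vectors[i], oracle(i)) for i in seeds]
    unlabelled = [i for i in range(len(vectors)) if i not in seeds]
    while len(labelled) < budget and unlabelled:
        # Pick the unlabelled vector with the highest score and query it.
        best = max(unlabelled, key=lambda i: informativeness(vectors[i], labelled, k))
        labelled.append((vectors[best], oracle(best)))
        unlabelled.remove(best)
    return labelled


# Toy pool: each vector holds two attribute similarities for one record pair.
random.seed(0)
pool = [(random.random(), random.random()) for _ in range(50)]
truth = [1 if sum(v) > 1.0 else 0 for v in pool]  # hidden from the selector
queries = []


def oracle(i):
    """Simulated manual labelling: 1 for a match, 0 for a non-match."""
    queries.append(i)
    return truth[i]


training_set = select_training_set(pool, oracle, budget=10)
```

The returned `training_set` is a list of `(similarity_vector, label)` pairs that could be passed to any classifier, which mirrors the abstract's point that the selection step is independent of the learning method.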

    Original language: English
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
    Editors: Peggy Cellier, Kurt Driessens
    Publisher: Springer
    Pages: 125-141
    Number of pages: 17
    ISBN (Print): 9783030438869
    Publication status: Published - 2020
    Event: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 - Wurzburg, Germany
    Duration: 16 Sept 2019 to 20 Sept 2019

    Publication series

    Name: Communications in Computer and Information Science
    Volume: 1168 CCIS
    ISSN (Print): 1865-0929
    ISSN (Electronic): 1865-0937

    Conference

    Conference: 19th Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
    Country/Territory: Germany
    City: Wurzburg
    Period: 16/09/19 to 20/09/19
