Active blocking scheme learning for entity resolution

Jingyu Shao, Qing Wang*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    3 Citations (Scopus)

    Abstract

    Blocking is an important part of entity resolution. It aims to improve time efficiency by grouping potentially matched records into the same block. In the past, both supervised and unsupervised approaches have been proposed. Nonetheless, existing approaches have limitations: either a large number of labels is required, or blocking quality is hard to guarantee. To address these issues, we propose a blocking scheme learning approach based on active learning techniques. With a limited label budget, our approach can learn a blocking scheme that generates high-quality blocks. Two strategies, called active sampling and active branching, are proposed to select samples and generate blocking schemes efficiently. We experimentally verify that our approach outperforms several baseline approaches on four real-world datasets.
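To make the idea of blocking concrete, the following is a minimal sketch of grouping records by a blocking key so that only records within the same block are compared. It is not the active learning approach described in the abstract; the toy records, the field names, and the helper functions `blocking_key`, `build_blocks`, and `candidate_pairs` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; the fields are illustrative, not from the paper.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Canberra"},
    {"id": 2, "name": "John Smith", "city": "Canberra"},
    {"id": 3, "name": "Jane Doe", "city": "Melbourne"},
    {"id": 4, "name": "J. Doe", "city": "Melbourne"},
    {"id": 5, "name": "Alan Turing", "city": "Sydney"},
]


def blocking_key(record):
    """Hypothetical blocking key: the lower-cased city."""
    return record["city"].lower()


def build_blocks(records, key_fn):
    """Group record ids by their blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_fn(record)].append(record["id"])
    return dict(blocks)


def candidate_pairs(blocks):
    """Return every pair of record ids that share a block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


blocks = build_blocks(records, blocking_key)
pairs = candidate_pairs(blocks)

# Number of comparisons without blocking: n * (n - 1) / 2.
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

On this toy data, blocking by city leaves 2 candidate pairs to compare instead of all 10. A poorly chosen key would either miss true matches or leave too many pairs, which is the quality trade-off that a learned blocking scheme is meant to manage.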

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Proceedings
    Editors: Bao Ho, Dinh Phung, Geoffrey I. Webb, Vincent S. Tseng, Mohadeseh Ganji, Lida Rashidi
    Publisher: Springer Verlag
    Pages: 350-362
    Number of pages: 13
    ISBN (Print): 9783319930367
    Publication status: Published - 2018
    Event: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018 - Melbourne, Australia
    Duration: 3 Jun 2018 to 6 Jun 2018

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10938 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018
    Country/Territory: Australia
    City: Melbourne
    Period: 3/06/18 to 6/06/18
