Skyblocking for entity resolution

Jingyu Shao*, Qing Wang, Yu Lin

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    8 Citations (Scopus)

    Abstract

    In this paper, we introduce a novel framework for entity resolution blocking, called skyblocking, which aims to learn scheme skylines. In this framework, each blocking scheme is mapped to a point in a multi-dimensional scheme space in which each blocking measure represents one dimension. A scheme skyline contains the blocking schemes that are not dominated by any other blocking scheme in the scheme space. Efficiently learning scheme skylines poses two challenges: the class imbalance problem and the search space problem. We tackle these challenges by developing an active sampling strategy and a scheme extension strategy. Based on these two strategies, we develop three algorithms for efficiently learning scheme skylines under a given number of blocking measures and within a label budget. Experiments over five real-world datasets verify that our algorithms outperform the baseline approaches in label efficiency, blocking quality, and learning efficiency.
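The notion of a scheme skyline can be illustrated with a brute-force dominance check. The sketch below is not the paper's learning algorithm (which learns skylines under a label budget via active sampling and scheme extension); it only shows which points form a skyline once each scheme's measure values are known. It assumes that every blocking measure is to be maximized and uses two example measures, pair completeness and pair quality. The scheme names and values are hypothetical and are not results from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each blocking scheme maps to a tuple of measure
# values, e.g. (pair_completeness, pair_quality). Higher is assumed better in
# every dimension.
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in every dimension and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def scheme_skyline(schemes: Dict[str, Point]) -> List[str]:
    """Naive O(n^2) skyline: keep the schemes not dominated by any other scheme."""
    return [
        name
        for name, p in schemes.items()
        if not any(dominates(q, p) for other, q in schemes.items() if other != name)
    ]


if __name__ == "__main__":
    # Illustrative values only.
    schemes = {
        "s1": (0.95, 0.20),
        "s2": (0.80, 0.60),
        "s3": (0.70, 0.55),  # dominated by s2 in both dimensions
        "s4": (0.50, 0.90),
    }
    print(scheme_skyline(schemes))  # ['s1', 's2', 's4']
```

Each skyline scheme represents a different trade-off between the measures: no other scheme improves on one measure without losing on another.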

    Original language: English
    Pages (from-to): 30-43
    Number of pages: 14
    Journal: Information Systems
    Volume: 85
    Publication status: Published - Nov 2019
