Scalable block scheduling for efficient multi-database record linkage

Thilina Ranbaduge, Dinusha Vatsalan, Peter Christen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    7 Citations (Scopus)

    Abstract

    Record linkage (RL) is a data integration task that aims to identify records from different databases that refer to the same entity. When records from more than two databases are to be linked, RL is significantly challenged by the intrinsic exponential growth in the number of potential record comparisons to be conducted. We propose a scalable metablocking protocol for Multi-Database RL (MDRL) that significantly reduces the complexity of the matching (comparison and classification) phase. Our approach uses a graph structure to schedule the comparison of pairs of blocks, with the aim of minimizing the number of repeated and superfluous comparisons between records. We provide an analysis of our approach and conduct an empirical study on large real-world databases.
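
The idea of scheduling block pairs through a graph can be illustrated with a small sketch. This is not the authors' protocol, which the abstract does not spell out. It is a minimal illustration that assumes each block is identified by a hypothetical `(database_id, blocking_key)` pair, that two blocks are connected when they come from different databases and share a blocking key, and that a record pair already compared through one edge is skipped on later edges. The function names and the sample data are invented for this example.

```python
from itertools import combinations


def schedule_block_pairs(blocks):
    """Build the edges of a simple block graph.

    `blocks` maps (database_id, blocking_key) -> list of record ids.
    An edge links two blocks from different databases that share a key.
    """
    by_key = {}
    for db, key in blocks:
        by_key.setdefault(key, []).append((db, key))

    edges = []
    for key in sorted(by_key):
        for a, b in combinations(sorted(by_key[key]), 2):
            if a[0] != b[0]:  # never pair blocks from the same database
                edges.append((a, b))
    return edges


def candidate_pairs(blocks, edges):
    """Yield each record pair once, even if several edges imply it."""
    seen = set()
    for a, b in edges:
        for r in blocks[a]:
            for s in blocks[b]:
                pair = (r, s) if r <= s else (s, r)
                if pair not in seen:  # skip repeated comparisons
                    seen.add(pair)
                    yield pair


# Hypothetical toy data: three databases, two blocking keys.
blocks = {
    ("A", "smith"): ["a1", "a2"],
    ("B", "smith"): ["b1"],
    ("C", "smith"): ["c1"],
    ("A", "smi"): ["a1"],
    ("B", "smi"): ["b1"],
}
edges = schedule_block_pairs(blocks)
pairs = list(candidate_pairs(blocks, edges))
print(len(edges), "block pairs scheduled,", len(pairs), "record comparisons")
```

In this toy data the pair `a1`/`b1` is implied by two different block pairs but is compared only once, which is the kind of repeated comparison a block schedule aims to avoid.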

    Original language: English
    Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
    Editors: Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, Xindong Wu
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1161-1166
    Number of pages: 6
    ISBN (Electronic): 9781509054725
    Publication status: Published - 2 Jul 2016
    Event: 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
    Duration: 12 Dec 2016 to 15 Dec 2016

    Publication series

    Name: Proceedings - IEEE International Conference on Data Mining, ICDM
    Volume: 0
    ISSN (Print): 1550-4786

    Conference

    Conference: 16th IEEE International Conference on Data Mining, ICDM 2016
    Country/Territory: Spain
    City: Barcelona, Catalonia
    Period: 12/12/16 to 15/12/16
