Multiple instance learning for group record linkage

Zhichun Fu*, Jun Zhou, Peter Christen, Mac Boot

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    12 Citations (Scopus)

    Abstract

    Record linkage is the process of identifying records that refer to the same entities from different data sources. While most research efforts are concerned with linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine whether two groups of records in two databases refer to the same entity. One application where group record linkage is of high importance is the linking of census data that contain household information across time. In this paper we propose a novel method for group record linkage based on multiple instance learning. Our method treats group links as bags and individual record links as instances. We extend multiple instance learning from bag to instance classification to reconstruct bags from candidate instances. The classified bag and instance samples lead to a significant reduction in multiple group links, thereby improving the overall quality of linked data. We evaluate our method with both synthetic data and real historical census data.

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings
    Pages: 171-182
    Number of pages: 12
    Edition: PART 1
    Publication status: Published - 2012
    Event: 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012 - Kuala Lumpur, Malaysia
    Duration: 29 May 2012 – 1 Jun 2012

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Number: PART 1
    Volume: 7301 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012
    Country/Territory: Malaysia
    City: Kuala Lumpur
    Period: 29/05/12 – 1/06/12
