Improving temporal record linkage using regression classification

Yichen Hu*, Qing Wang, Dinusha Vatsalan, Peter Christen

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    12 Citations (Scopus)

    Abstract

    Temporal record linkage is the process of identifying groups of records that are collected over a period of time, such as in census or voter registration databases, where records in the same group represent the same real-world entity. Such databases often contain temporal information, such as the time when a record was created or when it was modified. Unlike traditional record linkage, which considers differences between records from the same entity as errors or variations, temporal record linkage aims to capture records from entities where the attribute values are known to change over time. In this paper we propose a novel approach that extends an existing temporal approach, called the decay model, to categorically calculate probabilities of change for each attribute. Our novel method uses a regression-based machine learning model to predict decays for sets of attributes. Each such set of attributes has a principal attribute and support attributes, where values of the support attributes can affect the decay of the principal attribute. Our experimental results on a real US voter database show that our proposed approach results in better linkage quality compared to the decay model approach.

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 21st Pacific-Asia Conference, PAKDD 2017, Proceedings
    Editors: Kyuseok Shim, Jae-Gil Lee, Longbing Cao, Xuemin Lin, Jinho Kim, Yang-Sae Moon
    Publisher: Springer Verlag
    Pages: 561-573
    Number of pages: 13
    ISBN (Print): 9783319574530
    Publication status: Published - 2017
    Event: 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2017 - Jeju, Korea, Republic of
    Duration: 23 May 2017 to 26 May 2017

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10234 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2017
    Country/Territory: Korea, Republic of
    City: Jeju
    Period: 23/05/17 to 26/05/17
