Blind data linkage using n-gram similarity comparisons

Tim Churches, Peter Christen*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review


    Abstract

    Integrating or linking data from different sources is an increasingly important task in the preprocessing stage of many data mining projects. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. If no common unique entity identifiers (keys) are available in all data sources, the linkage needs to be performed using the available identifying attributes, such as names and addresses. Data confidentiality often limits or even prohibits successful data linkage, as either no consent can be obtained (for example in biomedical studies) or the data holders are not willing to release their data for linkage by other parties. We present methods for confidential data linkage based on hash encoding, public key encryption and n-gram similarity comparison techniques, and show how blind data linkage can be performed.
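    The abstract combines hash encoding with n-gram similarity comparison. The sketch below illustrates only that general idea under stated assumptions: each data holder applies a keyed hash (HMAC-SHA256 from the Python standard library) to every distinct character bigram of a value, and a third party computes the Dice coefficient on the resulting hash sets without seeing the plaintext. The shared key, the function names and the example names are hypothetical; this simplified scheme is not the protocol from the paper, and the public key encryption step is not shown.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Return the character bigrams of a lower-cased, space-free string (repeats kept)."""
    s = value.lower().replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); returns 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def hashed_bigrams(value: str, secret: bytes) -> set[str]:
    """Keyed-hash each distinct bigram so the comparing party never sees plaintext."""
    return {
        hmac.new(secret, g.encode("utf-8"), hashlib.sha256).hexdigest()
        for g in set(bigrams(value))
    }


# Hypothetical secret known only to the data holders, not to the linkage party.
SHARED_KEY = b"hypothetical-shared-secret"

# Each data holder encodes its own value locally ...
alice = hashed_bigrams("Peter", SHARED_KEY)
bob = hashed_bigrams("Pete", SHARED_KEY)

# ... and the linkage party compares only the two sets of hash values.
print(round(dice(alice, bob), 3))
```

    Because HMAC is deterministic for a fixed key, identical bigrams yield identical hash values, so the score on the hashed sets equals the plaintext bigram Dice score ("Peter" vs "Pete" share 3 of 4 and 3 bigrams, giving 6/7, printed as 0.857). Hashing single bigrams in this way still leaks bigram frequencies to the comparing party, which is one reason more elaborate blind-linkage protocols are needed.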

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 8th Pacific-Asia Conference, PAKDD 2004, Proceedings
    Editors: Honghua Dai, Ramakrishnan Srikant, Chengqi Zhang
    Publisher: Springer Verlag
    Pages: 121-126
    Number of pages: 6
    ISBN (Print): 354022064X, 9783540220640
    Publication status: Published - 2004
    Event: 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004 - Sydney, Australia
    Duration: 26 May 2004 to 28 May 2004

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3056
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004
    Country/Territory: Australia
    City: Sydney
    Period: 26/05/04 to 28/05/04
