An Efficient Two-Party Protocol for Approximate Matching in Private Record Linkage

Dinusha Vatsalan*, Peter Christen, Vassilios S. Verykios

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    19 Citations (Scopus)

    Abstract

    Linking multiple databases with the aim of identifying records that refer to the same entity is increasingly common in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used to identify matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL).

    While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is applicable to real-world conditions needs to address the scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, which allows each owner to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records show that the protocol scales linearly to large databases and achieves high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, it can be useful for many real-world applications requiring PPRL.
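
The two ingredients named in the abstract, shared reference values and binned similarities, can be illustrated with a small sketch. It is not the authors' protocol. The abstract does not specify the similarity function, the bin count, the thresholds, or the comparison rule, so the bigram Dice coefficient, `num_bins`, `min_sim`, `max_bin_gap` and all function names below are assumptions chosen for illustration. The sketch also omits the secure exchange step and works on plain values.

```python
def bigrams(s):
    """Set of character bigrams of a lower-cased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient on bigram sets (an assumed similarity function)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def binned_signature(value, references, num_bins=5, min_sim=0.3):
    """Map reference index -> similarity bin for sufficiently similar references.

    Each database owner can compute this locally, because the reference
    values are available to both parties.
    """
    signature = {}
    for idx, ref in enumerate(references):
        sim = dice(value, ref)
        if sim >= min_sim:
            # Clamp so that a similarity of exactly 1.0 lands in the top bin.
            signature[idx] = min(int(sim * num_bins), num_bins - 1)
    return signature


def candidate_match(sig_a, sig_b, max_bin_gap=1):
    """Hypothetical rule: two values are candidates if they share a
    reference value and their bins for it differ by at most max_bin_gap."""
    shared = sig_a.keys() & sig_b.keys()
    return any(abs(sig_a[i] - sig_b[i]) <= max_bin_gap for i in shared)


# Reference values known to both owners (made up for this example).
REFERENCES = ["christian", "miller", "taylor"]

alice = binned_signature("christen", REFERENCES)  # held by owner A
bob = binned_signature("christan", REFERENCES)    # held by owner B
carol = binned_signature("tailor", REFERENCES)    # held by owner B

print(candidate_match(alice, bob))    # True
print(candidate_match(alice, carol))  # False
```

Only the reference index and the bin number would be compared in this sketch, never the attribute value itself. Coarser bins reveal less about the exact similarity but make the comparison less precise, which is the kind of trade-off between security and linkage quality the abstract refers to.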

    Original language: English
    Title of host publication: Volume 121 - Ninth Australasian Data Mining Conference
    Editors: Peter Vamplew, Andrew Stranieri, KL Ong, Peter Christen and Paul J. Ken
    Place of Publication: Sydney, Australia
    Publisher: Australian Computer Society Inc.
    Pages: 125-136
    Number of pages: 12
    Edition: Peer Reviewed
    ISBN (Print): 9781921770029
    Publication status: Published - 2011
    Event: 9th Australasian Data Mining Conference, AusDM 2011 - Ballarat, VIC, Australia
    Duration: 1 Dec 2011 to 2 Dec 2011

    Publication series

    Name: Conferences in Research and Practice in Information Technology Series
    Volume: 121
    ISSN (Print): 1445-1336

    Conference

    Conference: 9th Australasian Data Mining Conference, AusDM 2011
    Country/Territory: Australia
    City: Ballarat, VIC
    Period: 1/12/11 to 2/12/11
