Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage

Thilina Ranbaduge*, Peter Christen, Rainer Schnell

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    11 Citations (Scopus)

    Abstract

    In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of records that correspond to the same real-world entities. Due to privacy and confidentiality concerns, however, the owners of sensitive databases are often not allowed or willing to exchange or share their data with other organisations to allow such similarity calculations. In this paper we propose a novel privacy-preserving encoding technique that can be used to securely calculate similarities between sensitive values held in different databases. Our technique uses two-step hashing to encode values into an integer set representation that provides strong privacy guarantees and allows accurate similarity calculations. We provide a theoretical analysis of the accuracy and privacy of our encoding technique, and conduct an empirical study on large real databases containing several million records. Our results show that our technique provides high security against privacy attacks and achieves better similarity accuracy compared to two state-of-the-art encoding techniques.
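
As an illustration only, the following Python sketch shows one way a two-step hash encoding into an integer set could look, together with a set-based similarity. The abstract does not give the construction, so every detail here is an assumption rather than the authors' published algorithm: the use of character bigrams, `k` salted SHA-256 hash functions, the bit-vector length, the hashing of column bit patterns in the second step, the Jaccard similarity, and all function and parameter names.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def _salted_hash(data, salt, modulus):
    """Map a string to an integer in [0, modulus) using salted SHA-256."""
    digest = hashlib.sha256((salt + data).encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


def two_step_encode(value, k=3, length=64, secret="shared-secret", q=2):
    """Hypothetical two-step encoding of a string into a set of integers.

    Step 1: hash each q-gram with k salted hash functions into k bit vectors.
    Step 2: hash each non-empty column (position plus bit pattern across the
            k vectors) into an integer; the set of these integers is the code.
    """
    rows = [[0] * length for _ in range(k)]
    for gram in qgrams(value, q):
        for j in range(k):
            rows[j][_salted_hash(gram, f"{secret}|{j}|", length)] = 1

    encoded = set()
    for pos in range(length):
        column = "".join(str(rows[j][pos]) for j in range(k))
        if "1" in column:  # skip columns where no q-gram was hashed
            encoded.add(_salted_hash(f"{pos}:{column}", f"{secret}|step2|", 2**32))
    return encoded


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Both database owners must use the same secret and parameters.
    enc_a = two_step_encode("peter")
    enc_b = two_step_encode("pete")
    print(jaccard(enc_a, enc_b))
```

In this sketch only the integer sets would be exchanged, and the similarity is computed on them rather than on the original values. Whether such a construction offers the privacy and accuracy properties claimed in the abstract depends on details given in the paper itself.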

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, PAKDD 2020, Proceedings
    Editors: Hady W. Lauw, Ee-Peng Lim, Raymond Chi-Wing Wong, Alexandros Ntoulas, See-Kiong Ng, Sinno Jialin Pan
    Publisher: Springer
    Pages: 139-151
    Number of pages: 13
    ISBN (Print): 9783030474355
    Publication status: Published - 2020
    Event: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020 - Singapore, Singapore
    Duration: 11 May 2020 to 14 May 2020

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 12085 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020
    Country/Territory: Singapore
    City: Singapore
    Period: 11/05/20 to 14/05/20
