Tree based scalable indexing for multi-party privacy preserving record linkage

Thilina Ranbaduge, Peter Christen, Dinusha Vatsalan

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    8 Citations (Scopus)

    Abstract

    Recently, the linking of multiple databases to identify common sets of records has gained increasing recognition in application areas such as banking, health, and insurance. Often the databases to be linked contain sensitive information, and the owners of the databases do not want to share any details with any other party due to privacy concerns. The linkage of records in different databases without revealing their actual values is an emerging research discipline known as privacy-preserving record linkage. Comparing records across multiple databases requires significant time and computational resources to produce the resulting matching sets of records. At the same time, preserving the privacy of the data is becoming more problematic as database sizes increase. We propose a novel indexing (blocking) approach for privacy-preserving record linkage between multiple (more than two) parties. Our approach uses Bloom filters to encode attribute values into bit vectors. The Bloom filters are used to construct a single-bit tree, where the encoded records are arranged into different blocks. The approach requires the parties to participate only in a secure summation protocol to find the best bits to construct the trees in a balanced manner. Leaf nodes of the trees contain the blocks of encoded records. These blocks can finally be compared using private comparison and classification techniques to determine the similar record sets in different databases. Experiments conducted with datasets of up to one million records show that our protocol is scalable with both the size of the datasets and the number of parties, while providing better blocking quality and privacy than a phonetic-based indexing approach.
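
    The blocking idea summarised in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bigram tokenisation, the double-hashing scheme, the filter length, the masked ring-style summation, and the rule of picking the bit whose 1-count is closest to half of all records are assumptions made here for illustration, and every function name is hypothetical. All parties are simulated in a single process.

```python
import hashlib
import random


def qgrams(value, q=2):
    """Split a string into overlapping q-grams (bigrams by default)."""
    value = value.lower()
    return [value[i:i + q] for i in range(len(value) - q + 1)]


def bloom_encode(value, num_bits=64, num_hashes=4):
    """Encode a string as a list of 0/1 bits (assumed double-hashing scheme)."""
    bits = [0] * num_bits
    for gram in qgrams(value):
        h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha256(gram.encode()).hexdigest(), 16)
        for i in range(num_hashes):
            bits[(h1 + i * h2) % num_bits] = 1
    return bits


def local_one_counts(filters, num_bits):
    """Per-position count of 1-bits in one party's Bloom filters."""
    return [sum(f[pos] for f in filters) for pos in range(num_bits)]


def secure_sum(party_vectors, modulus=2**32, rng=None):
    """Masked ring summation, simulated locally.

    The first party adds a random mask to its vector, each following party
    adds its own vector to the running total, and the first party removes
    the mask at the end, so only the overall sums are revealed.
    """
    rng = rng or random.Random()
    mask = [rng.randrange(modulus) for _ in party_vectors[0]]
    running = [(m + v) % modulus for m, v in zip(mask, party_vectors[0])]
    for vector in party_vectors[1:]:
        running = [(r + v) % modulus for r, v in zip(running, vector)]
    return [(r - m) % modulus for r, m in zip(running, mask)]


def best_split_bit(total_ones, total_records):
    """Pick the bit whose global 1-count is closest to half of all records."""
    half = total_records / 2
    return min(range(len(total_ones)), key=lambda pos: abs(total_ones[pos] - half))


def split_block(filters, pos):
    """Split one block into two child blocks on the chosen bit position."""
    zeros = [f for f in filters if f[pos] == 0]
    ones = [f for f in filters if f[pos] == 1]
    return zeros, ones


if __name__ == "__main__":
    # Three hypothetical parties, each holding a few surname values.
    parties = [["smith", "smyth", "jones"], ["smith", "johnson"], ["jones", "miller"]]
    encoded = [[bloom_encode(name) for name in names] for names in parties]
    counts = [local_one_counts(filters, 64) for filters in encoded]
    totals = secure_sum(counts)
    pos = best_split_bit(totals, sum(len(names) for names in parties))
    for filters in encoded:
        zeros, ones = split_block(filters, pos)
        print(len(zeros), len(ones))
```

    In this sketch every party splits its own records on the same agreed bit, so records that share that bit value land in the same block at every party. Applying the split recursively to each child block would grow the tree until the blocks reach a chosen size.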

    Original language: English
    Title of host publication: Data Mining and Analytics 2014 - Proceedings of the 12th Australasian Data Mining Conference, AusDM 2014
    Editors: Yanchang Zhao, Lin Liu, Kok-Leong Ong, Xue Li
    Publisher: Australian Computer Society
    Pages: 31-42
    Number of pages: 12
    ISBN (Electronic): 9781921770173
    Publication status: Published - 2014

    Publication series

    Name: Conferences in Research and Practice in Information Technology Series
    Volume: 158
    ISSN (Print): 1445-1336
