Scalable Privacy-Preserving Linking of Multiple Databases Using Counting Bloom Filters

Dinusha Vatsalan, Peter Christen, Erhard Rahm

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

    20 Citations (Scopus)

    Abstract

    The integration, mining, and analysis of person-specific data can provide enormous opportunities for organizations, governments, and researchers to leverage today's massive data collections. However, the use of personal or otherwise sensitive data also raises concerns about the privacy, confidentiality, and potential discrimination of people. Privacy-preserving record linkage (PPRL) is a growing research area that aims at integrating sensitive information from multiple disparate databases held by different organizations while preserving the privacy of the individuals in these databases by not revealing their identities and thereby preventing re-identification and discrimination. PPRL approaches are increasingly required in many real-world application areas ranging from healthcare to national security. Previous approaches to PPRL have mostly focused on linking only two databases. Scaling PPRL to several databases is an open challenge since privacy threats as well as the computation and communication costs increase significantly with the number of databases involved. We thus propose a new encoding method of sensitive data based on Counting Bloom Filters (CBF) to improve privacy for multi-party PPRL (MP-PPRL). We investigate optimizations to reduce computation and communication costs for CBF-based MP-PPRL. Our empirical evaluation with real datasets demonstrates the viability of our approach in terms of scalability, linkage quality, and privacy.
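    The abstract names Counting Bloom Filters (CBF) as the encoding for multi-party PPRL without showing how a CBF supports comparison. Below is a minimal sketch of the general idea. It assumes each party encodes a string as a Bloom filter of character bigrams, and that the filters are summed position-wise into one CBF. A multi-party Dice similarity can then be read off the CBF alone, because positions equal to the number of parties p are the bits common to all filters, and the CBF total equals the sum of all filters' 1-bits. The filter length, the number of hash functions, and the SHA-256/SHA-1 double hashing are illustrative assumptions. The plain summation here also stands in for whatever secure summation protocol a real deployment would use, so no privacy guarantee is implied. This is not presented as the authors' exact method.

```python
import hashlib

# Illustrative (assumed) parameters, not values taken from the paper.
BF_LEN = 1000   # number of bit positions per Bloom filter
NUM_HASH = 20   # number of hash functions per q-gram


def qgrams(value: str, q: int = 2) -> set[str]:
    """Split a normalised string into its set of overlapping character q-grams."""
    s = value.lower().strip()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def bloom_filter(value: str, length: int = BF_LEN, k: int = NUM_HASH) -> list[int]:
    """Encode a string as a Bloom filter, using double hashing (assumed scheme)."""
    bits = [0] * length
    for gram in qgrams(value):
        h1 = int(hashlib.sha256(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        for i in range(k):
            bits[(h1 + i * h2) % length] = 1
    return bits


def counting_bloom_filter(filters: list[list[int]]) -> list[int]:
    """Sum the parties' Bloom filters position-wise (in the clear, for illustration)."""
    return [sum(column) for column in zip(*filters)]


def multi_party_dice(cbf: list[int], num_parties: int) -> float:
    """Multi-party Dice similarity computed from the CBF alone:
    p * (positions set in all p filters) / (total 1-bits across all filters)."""
    total = sum(cbf)
    if total == 0:
        return 0.0
    common = sum(1 for count in cbf if count == num_parties)
    return num_parties * common / total


if __name__ == "__main__":
    names = ["peter christen", "peter christen", "peter christensen"]
    cbf = counting_bloom_filter([bloom_filter(n) for n in names])
    print(multi_party_dice(cbf, len(names)))
```

    With identical values held by all parties the similarity is exactly 1.0. Any value that differs adds bits that are not shared by every filter, which lowers the score below 1.0.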

    Original language: English
    Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
    Editors: Carlotta Domeniconi, Francesco Gullo, Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, Xindong Wu
    Publisher: IEEE Computer Society
    Pages: 882-889
    Number of pages: 8
    ISBN (Electronic): 9781509054725
    Publication status: Published - 2 Jul 2016
    Event: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 - Barcelona, Spain
    Duration: 12 Dec 2016 to 15 Dec 2016

    Publication series

    Name: IEEE International Conference on Data Mining Workshops, ICDMW
    Volume: 0
    ISSN (Print): 2375-9232
    ISSN (Electronic): 2375-9259

    Conference

    Conference: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
    Country/Territory: Spain
    City: Barcelona
    Period: 12/12/16 to 15/12/16

