Efficient two-party private blocking based on sorted nearest neighborhood clustering

Dinusha Vatsalan, Peter Christen, Vassilios S. Verykios

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    21 Citations (Scopus)

    Abstract

    Integrating data from diverse sources with the aim of identifying similar records that refer to the same real-world entities, without compromising the privacy of these entities, is an emerging research problem in various domains. This problem is known as privacy-preserving record linkage (PPRL). Scalability is a main challenge for PPRL because of the growing size of databases in real-world applications. Private blocking techniques have been used in PPRL to address this challenge by reducing the number of record pair comparisons that need to be conducted. Many of these private blocking techniques require a trusted third party to perform the blocking. One main threat with such three-party solutions is collusion between parties to identify the private data of another party. We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering. Privacy is addressed by a combination of two privacy techniques: k-anonymous clustering and public reference values. Experiments conducted on two real-world databases validate that our approach is scalable to large databases and effective in generating candidate record pairs that correspond to true matches, while preserving k-anonymous privacy characteristics. Our approach also performs equally well or better than three other state-of-the-art private blocking techniques in terms of scalability, blocking quality, and privacy. It can achieve private blocking up to two orders of magnitude faster than other state-of-the-art private blocking approaches.
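    The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of the general idea behind blocking over shared public reference values: both parties sort the same public reference values, cut them into consecutive clusters of at least k values, and place each of their own records into the cluster whose sorted range contains the record's blocking key. Only records in the same cluster are compared. The function names, the parameter k (here a minimum number of reference values per cluster, not records), and the sample surnames are hypothetical. The sketch omits the privacy mechanisms of the actual protocol, such as the k-anonymity guarantee on the records in each block, so it is not the authors' method.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import product

# Hypothetical simplification of reference-value blocking; NOT the paper's protocol.


def build_clusters(reference_values, k):
    """Sort the public reference values and cut them into consecutive clusters
    of k values; a short tail is merged into the previous cluster so that
    every cluster holds at least k values (when at least k values exist).
    Each cluster is represented by its smallest reference value."""
    ref = sorted(reference_values)
    starts = list(range(0, len(ref), k))
    if len(starts) > 1 and len(ref) - starts[-1] < k:
        starts.pop()  # merge the short tail into the previous cluster
    return [ref[s] for s in starts]


def assign_blocks(records, boundaries):
    """Map each (record_id, key) pair to the index of the cluster whose sorted
    range contains the key; keys below the first boundary go to cluster 0.
    In this sketch only cluster indices would be compared across parties."""
    blocks = defaultdict(list)
    for rec_id, key in records:
        idx = max(bisect_right(boundaries, key) - 1, 0)
        blocks[idx].append(rec_id)
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Generate candidate pairs only from clusters both parties populated."""
    pairs = []
    for idx in sorted(blocks_a.keys() & blocks_b.keys()):
        pairs.extend(product(blocks_a[idx], blocks_b[idx]))
    return pairs


if __name__ == "__main__":
    reference = ["adams", "baker", "clark", "davis", "evans", "fox", "green", "hill"]
    boundaries = build_clusters(reference, k=3)  # ["adams", "davis"]
    alice = [("a1", "baker"), ("a2", "fisher"), ("a3", "carter")]
    bob = [("b1", "barker"), ("b2", "foster"), ("b3", "hall")]
    pairs = candidate_pairs(assign_blocks(alice, boundaries), assign_blocks(bob, boundaries))
    # 4 candidate pairs instead of all 3 x 3 = 9 comparisons
    print(pairs)
```

    The point of the example is the reduction in comparisons: without blocking every record of one party is compared with every record of the other, whereas here only records that fall into the same cluster of sorted reference values become candidate pairs.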

    Original language: English
    Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
    Pages: 1949-1958
    Number of pages: 10
    Publication status: Published - 2013
    Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
    Duration: 27 Oct 2013 to 1 Nov 2013

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
    Country/Territory: United States
    City: San Francisco, CA
    Period: 27/10/13 to 1/11/13
