Privacy-preserving record linkage for big data: Current approaches and research challenges

Dinusha Vatsalan, Ziad Sehili, Peter Christen, Erhard Rahm*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review

    90 Citations (Scopus)

    Abstract

    The growth of Big Data, especially personal data dispersed in multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede sharing or exchanging data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these entities. PPRL is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. PPRL for Big Data poses several challenges, with the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high quality results of the linkage in the presence of variety and veracity of Big Data, and (3) preserving privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.

    Original language: English
    Title of host publication: Handbook of Big Data Technologies
    Publisher: Springer International Publishing Switzerland
    Pages: 851-895
    Number of pages: 45
    ISBN (Electronic): 9783319493404
    ISBN (Print): 9783319493398
    Publication status: Published - 25 Feb 2017
