Privacy aspects in big data integration: Challenges and opportunities

Peter Christen*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    3 Citations (Scopus)

    Abstract

    Big Data projects often require data from several sources to be integrated before they can be used for analysis. Once data have been integrated, they allow more detailed analysis that would otherwise not be possible. Accordingly, recent years have seen an increasing interest in techniques that facilitate the integration of data from diverse sources [2, 3]. Whenever data about individuals, or otherwise sensitive data, are to be integrated across organizations, privacy and confidentiality have to be considered. Domains where privacy preservation during data integration is of importance include business collaborations, health research, national censuses, the social sciences, crime and fraud detection, and homeland security [1]. Increasingly, applications in these domains require data from diverse sources (both internal and external to an organization) to be integrated. Consequently, in the past decade, various techniques have been developed that aim to facilitate data integration without revealing any private or confidential information about the databases and records that are integrated [4]. These techniques either provably prevent leakage of any private information, or they provide some empirical numerical measure of the risk of disclosure of private information. In the first part of this presentation we provide a background on data integration, and illustrate the importance of preserving privacy during data integration with several application scenarios. We then give an overview of the main concepts and techniques that have been developed to facilitate data integration in such a way that no private or confidential information is revealed. We focus on privacy-preserving record linkage (PPRL), where so far most research has been conducted [4]. We describe the basic protocols used in PPRL, and several key technologies employed in these protocols.
Finally, we discuss the challenges privacy poses to data integration in the era of Big Data, and we discuss directions and opportunities in this research area.

    Original language: English
    Title of host publication: PSBD 2014 - Proceedings of the 1st International Workshop on Privacy and Security of Big Data, co-located with CIKM 2014
    Publisher: Association for Computing Machinery (ACM)
    Pages: 3-10
    Number of pages: 8
    ISBN (Electronic): 9781450315838
    DOIs
    Publication status: Published - 7 Nov 2014
    Event: 1st International Workshop on Privacy and Security of Big Data, PSBD 2014 - Shanghai, China
    Duration: 7 Nov 2014 → …

    Publication series

    Name: PSBD 2014 - Proceedings of the 1st International Workshop on Privacy and Security of Big Data, co-located with CIKM 2014

    Conference

    Conference: 1st International Workshop on Privacy and Security of Big Data, PSBD 2014
    Country/Territory: China
    City: Shanghai
    Period: 7/11/14 → …
