TY - GEN
T1 - Privacy aspects in big data integration
T2 - 1st International Workshop on Privacy and Security of Big Data, PSBD 2014
AU - Christen, Peter
N1 - Publisher Copyright: Copyright © 2014 ACM.
PY - 2014/11/7
Y1 - 2014/11/7
N2 - Big Data projects often require data from several sources to be integrated before they can be used for analysis. Once data have been integrated, they allow more detailed analysis that would otherwise not be possible. Accordingly, recent years have seen an increasing interest in techniques that facilitate the integration of data from diverse sources [2, 3]. Whenever data about individuals, or otherwise sensitive data, are to be integrated across organizations, privacy and confidentiality have to be considered. Domains where privacy preservation during data integration is of importance include business collaborations, health research, national censuses, the social sciences, crime and fraud detection, and homeland security [1]. Increasingly, applications in these domains require data from diverse sources (both internal and external to an organization) to be integrated. Consequently, in the past decade, various techniques have been developed that aim to facilitate data integration without revealing any private or confidential information about the databases and records that are integrated [4]. These techniques either provably prevent leakage of any private information, or they provide some empirical numerical measure of the risk of disclosure of private information. In the first part of this presentation we provide a background on data integration, and illustrate the importance of preserving privacy during data integration with several application scenarios. We then give an overview of the main concepts and techniques that have been developed to facilitate data integration in such a way that no private or confidential information is revealed. We focus on privacy-preserving record linkage (PPRL), where so far most research has been conducted [4]. We describe the basic protocols used in PPRL, and several key technologies employed in these protocols.
Finally, we discuss the challenges privacy poses to data integration in the era of Big Data, and we discuss directions and opportunities in this research area.
AB - Big Data projects often require data from several sources to be integrated before they can be used for analysis. Once data have been integrated, they allow more detailed analysis that would otherwise not be possible. Accordingly, recent years have seen an increasing interest in techniques that facilitate the integration of data from diverse sources [2, 3]. Whenever data about individuals, or otherwise sensitive data, are to be integrated across organizations, privacy and confidentiality have to be considered. Domains where privacy preservation during data integration is of importance include business collaborations, health research, national censuses, the social sciences, crime and fraud detection, and homeland security [1]. Increasingly, applications in these domains require data from diverse sources (both internal and external to an organization) to be integrated. Consequently, in the past decade, various techniques have been developed that aim to facilitate data integration without revealing any private or confidential information about the databases and records that are integrated [4]. These techniques either provably prevent leakage of any private information, or they provide some empirical numerical measure of the risk of disclosure of private information. In the first part of this presentation we provide a background on data integration, and illustrate the importance of preserving privacy during data integration with several application scenarios. We then give an overview of the main concepts and techniques that have been developed to facilitate data integration in such a way that no private or confidential information is revealed. We focus on privacy-preserving record linkage (PPRL), where so far most research has been conducted [4]. We describe the basic protocols used in PPRL, and several key technologies employed in these protocols.
Finally, we discuss the challenges privacy poses to data integration in the era of Big Data, and we discuss directions and opportunities in this research area.
KW - Data matching
KW - Multi-party
KW - Privacy techniques
KW - Privacy-preserving record linkage
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=84978699235&partnerID=8YFLogxK
U2 - 10.1145/2663715.2669615
DO - 10.1145/2663715.2669615
M3 - Conference contribution
T3 - PSBD 2014 - Proceedings of the 1st International Workshop on Privacy and Security of Big Data, co-located with CIKM 2014
SP - 3
EP - 10
BT - PSBD 2014 - Proceedings of the 1st International Workshop on Privacy and Security of Big Data, co-located with CIKM 2014
PB - Association for Computing Machinery (ACM)
Y2 - 7 November 2014
ER -