TY - CHAP
T1 - Privacy-preserving record linkage for big data
T2 - Current approaches and research challenges
AU - Vatsalan, Dinusha
AU - Sehili, Ziad
AU - Christen, Peter
AU - Rahm, Erhard
N1 - Publisher Copyright:
© Springer International Publishing AG 2017. All rights reserved.
PY - 2017/2/25
Y1 - 2017/2/25
N2 - The growth of Big Data, especially personal data dispersed in multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede sharing or exchanging data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these entities. PPRL is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. PPRL for Big Data poses several challenges, with the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high-quality linkage results in the presence of the variety and veracity of Big Data, and (3) preserving the privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.
KW - Big data
KW - Privacy
KW - Record linkage
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=85019870526&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-49340-4_25
DO - 10.1007/978-3-319-49340-4_25
M3 - Chapter
SN - 9783319493398
SP - 851
EP - 895
BT - Handbook of Big Data Technologies
PB - Springer International Publishing Switzerland
ER -