TY - GEN
T1 - Scalable Privacy-Preserving Linking of Multiple Databases Using Counting Bloom Filters
AU - Vatsalan, Dinusha
AU - Christen, Peter
AU - Rahm, Erhard
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/2
Y1 - 2016/7/2
N2 - The integration, mining, and analysis of person-specific data can provide enormous opportunities for organizations, governments, and researchers to leverage today's massive data collections. However, the use of personal or otherwise sensitive data also raises concerns about the privacy, confidentiality, and potential discrimination of people. Privacy-preserving record linkage (PPRL) is a growing research area that aims at integrating sensitive information from multiple disparate databases held by different organizations while preserving the privacy of the individuals in these databases by not revealing their identities and thereby preventing re-identification and discrimination. PPRL approaches are increasingly required in many real-world application areas ranging from healthcare to national security. Previous approaches to PPRL have mostly focused on linking only two databases. Scaling PPRL to several databases is an open challenge since privacy threats as well as the computation and communication costs increase significantly with the number of databases involved. We thus propose a new encoding method of sensitive data based on Counting Bloom Filters (CBF) to improve privacy for multi-party PPRL (MP-PPRL). We investigate optimizations to reduce computation and communication costs for CBF-based MP-PPRL. Our empirical evaluation with real datasets demonstrates the viability of our approach in terms of scalability, linkage quality, and privacy.
KW - Approximate matching
KW - Counting Bloom filters
KW - Multiple databases
KW - Privacy
KW - Record linkage
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=85015252976&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2016.0130
DO - 10.1109/ICDMW.2016.0130
M3 - Conference contribution
T3 - IEEE International Conference on Data Mining Workshops, ICDMW
SP - 882
EP - 889
BT - Proceedings - 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
A2 - Domeniconi, Carlotta
A2 - Gullo, Francesco
A2 - Bonchi, Francesco
A2 - Domingo-Ferrer, Josep
A2 - Baeza-Yates, Ricardo
A2 - Zhou, Zhi-Hua
A2 - Wu, Xindong
PB - IEEE Computer Society
T2 - 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
Y2 - 12 December 2016 through 15 December 2016
ER -