TY - GEN
T1 - Accurate synthetic generation of realistic personal information
AU - Christen, Peter
AU - Pudjijono, Agus
PY - 2009
Y1 - 2009
AB - A large portion of data collected by many organisations today is about people, and often contains personal identifying information, such as names and addresses. Privacy and confidentiality are of great concern when such data is being shared between organisations or made publicly available. Research in (privacy-preserving) data mining and data linkage is suffering from a lack of publicly available real-world data sets that contain personal information, and therefore experimental evaluations can be difficult to conduct. In order to overcome this problem, we have developed a data generator that allows flexible creation of synthetic data containing personal information with realistic characteristics, such as frequency distributions, attribute dependencies, and error probabilities. Our generator significantly improves on earlier approaches, and allows the generation of data for individuals, families and households.
KW - Artificial data
KW - Data linkage
KW - Data matching
KW - Data mining pre-processing
KW - Privacy
UR - http://www.scopus.com/inward/record.url?scp=67650700151&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01307-2_47
DO - 10.1007/978-3-642-01307-2_47
M3 - Conference contribution
SN - 3642013066
SN - 9783642013065
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 507
EP - 514
BT - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
T2 - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Y2 - 27 April 2009 through 30 April 2009
ER -