TY - GEN
T1 - GeCo - An online personal data Generator and Corruptor
AU - Tran, Khoi Nguyen
AU - Vatsalan, Dinusha
AU - Christen, Peter
PY - 2013
Y1 - 2013
N2 - We demonstrate GeCo, an online personal data Generator and Corruptor that facilitates the creation of realistic personal data ranging from names, addresses, and dates, to social security and credit card numbers, as well as numerical values such as salary or blood pressure. Using an intuitive Web interface, a user can create records containing such data according to their needs, and apply various corruption functions to generate duplicates of these records. Synthetic personal data are increasingly required in areas such as record de-duplication, fraud detection, cloud computing, and health informatics, where data quality issues can significantly affect the outcomes of data integration, processing, and mining projects. Privacy concerns, however, often make it difficult for researchers to obtain real data that contain personal details. Compared to other data generators that have to be downloaded, installed and customized, GeCo allows the creation of personal data with much less effort. In this demonstration we show (1) how different types of attributes, and dependencies between them, can be specified; (2) how the generated data can be modified using various types of corruption functions; and (3) how a user can contribute to GeCo by providing attribute generation functions and look-up files. We believe GeCo will be a valuable tool for researchers that require realistic personal data to evaluate their algorithms with regard to efficiency and effectiveness. Copyright is held by the author/owner(s).
AB - We demonstrate GeCo, an online personal data Generator and Corruptor that facilitates the creation of realistic personal data ranging from names, addresses, and dates, to social security and credit card numbers, as well as numerical values such as salary or blood pressure. Using an intuitive Web interface, a user can create records containing such data according to their needs, and apply various corruption functions to generate duplicates of these records. Synthetic personal data are increasingly required in areas such as record de-duplication, fraud detection, cloud computing, and health informatics, where data quality issues can significantly affect the outcomes of data integration, processing, and mining projects. Privacy concerns, however, often make it difficult for researchers to obtain real data that contain personal details. Compared to other data generators that have to be downloaded, installed and customized, GeCo allows the creation of personal data with much less effort. In this demonstration we show (1) how different types of attributes, and dependencies between them, can be specified; (2) how the generated data can be modified using various types of corruption functions; and (3) how a user can contribute to GeCo by providing attribute generation functions and look-up files. We believe GeCo will be a valuable tool for researchers that require realistic personal data to evaluate their algorithms with regard to efficiency and effectiveness. Copyright is held by the author/owner(s).
KW - Data generation
KW - Duplicates
KW - Online demo
KW - Synthetic data
UR - http://www.scopus.com/inward/record.url?scp=84889610164&partnerID=8YFLogxK
U2 - 10.1145/2505515.2508207
DO - 10.1145/2505515.2508207
M3 - Conference contribution
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2473
EP - 2476
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -