GeCo - An online personal data Generator and Corruptor

Khoi Nguyen Tran, Dinusha Vatsalan, Peter Christen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    42 Citations (Scopus)

    Abstract

    We demonstrate GeCo, an online personal data Generator and Corruptor that facilitates the creation of realistic personal data ranging from names, addresses, and dates, to social security and credit card numbers, as well as numerical values such as salary or blood pressure. Using an intuitive Web interface, a user can create records containing such data according to their needs, and apply various corruption functions to generate duplicates of these records. Synthetic personal data are increasingly required in areas such as record de-duplication, fraud detection, cloud computing, and health informatics, where data quality issues can significantly affect the outcomes of data integration, processing, and mining projects. Privacy concerns, however, often make it difficult for researchers to obtain real data that contain personal details. Compared to other data generators that have to be downloaded, installed, and customized, GeCo allows the creation of personal data with much less effort. In this demonstration, we show (1) how different types of attributes, and dependencies between them, can be specified; (2) how the generated data can be modified using various types of corruption functions; and (3) how a user can contribute to GeCo by providing attribute generation functions and look-up files. We believe GeCo will be a valuable tool for researchers who require realistic personal data to evaluate their algorithms with regard to efficiency and effectiveness. Copyright is held by the author/owner(s).

    Original language: English
    Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
    Pages: 2473-2476
    Number of pages: 4
    Publication status: Published - 2013
    Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
    Duration: 27 Oct 2013 to 1 Nov 2013

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
    Country/Territory: United States
    City: San Francisco, CA
    Period: 27/10/13 to 1/11/13
