Accurate synthetic generation of realistic personal information

Peter Christen*, Agus Pudjijono

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    57 Citations (Scopus)

    Abstract

    A large portion of the data collected by many organisations today is about people, and often contains personal identifying information, such as names and addresses. Privacy and confidentiality are of great concern when such data is shared between organisations or made publicly available. Research in (privacy-preserving) data mining and data linkage suffers from a lack of publicly available real-world data sets that contain personal information, so experimental evaluations can be difficult to conduct. To overcome this problem, we have developed a data generator that allows flexible creation of synthetic data containing personal information with realistic characteristics, such as frequency distributions, attribute dependencies, and error probabilities. Our generator improves significantly on earlier approaches, and allows the generation of data for individuals, families and households.
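
    The three characteristics named in the abstract (frequency distributions, attribute dependencies, and error probabilities) can be illustrated with a minimal Python sketch. This is not the authors' generator: the lookup tables, their counts, the function names, and the single-character-deletion error model are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical frequency tables (illustrative counts, not taken from the paper).
SURNAME_FREQ = {"smith": 50, "jones": 30, "nguyen": 20}
# Attribute dependency: the given-name distribution depends on gender.
GIVEN_NAME_BY_GENDER = {
    "f": {"emma": 60, "olivia": 40},
    "m": {"jack": 55, "liam": 45},
}


def weighted_choice(freq, rng):
    """Draw one key with probability proportional to its count."""
    values = list(freq)
    return rng.choices(values, weights=[freq[v] for v in values], k=1)[0]


def corrupt(value, error_prob, rng):
    """With probability error_prob, delete one random character (assumed typo model)."""
    if len(value) > 1 and rng.random() < error_prob:
        pos = rng.randrange(len(value))
        return value[:pos] + value[pos + 1:]
    return value


def generate_record(rng, error_prob=0.0):
    """Generate one synthetic person record."""
    gender = rng.choice(["f", "m"])
    given = weighted_choice(GIVEN_NAME_BY_GENDER[gender], rng)
    surname = weighted_choice(SURNAME_FREQ, rng)
    return {
        "gender": gender,
        "given_name": corrupt(given, error_prob, rng),
        "surname": corrupt(surname, error_prob, rng),
    }


def generate_records(n, error_prob=0.1, seed=42):
    """Generate n records reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [generate_record(rng, error_prob) for _ in range(n)]
```

    Calling `generate_records(1000, error_prob=0.05)` would yield records whose clean values follow the assumed frequency tables, with roughly 5% of name fields carrying a one-character deletion. A fixed seed keeps the output reproducible, which matters when synthetic data is used to compare linkage methods.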

    Original language: English
    Title of host publication: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
    Pages: 507-514
    Number of pages: 8
    Publication status: Published - 2009
    Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
    Duration: 27 Apr 2009 to 30 Apr 2009

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 5476 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
    Country/Territory: Thailand
    City: Bangkok
    Period: 27/04/09 to 30/04/09
