Towards automated record linkage

Karl Goiser*, Peter Christen

*Corresponding author for this work

    Research output: Contribution to journal › Conference article › peer-review

    23 Citations (Scopus)

    Abstract

    The field of Record Linkage is concerned with identifying records from one or more datasets which refer to the same underlying entities. Where entity-unique identifiers are not available and the data contain errors, the process is non-trivial. Many techniques developed in this field require human intervention to set parameters, manually classify possibly matched records, or provide examples of matched and non-matched records. Whilst these techniques are of great use and provide high-quality results, the requirement for human input is costly and, if the parameters or examples are not produced or maintained properly, linkage quality will be compromised. The contributions of this paper are a critical discussion of the record linkage process, an argument for a more restrictive use of blocking in research, and an evaluation and modification of the farthest-first clustering technique so that it produces results close to those of a supervised technique.
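    The abstract does not spell out the algorithm, so the following is only a rough, hedged illustration of unsupervised classification of record pairs: plain farthest-first traversal with k = 2 applied to per-pair comparison vectors. It is not the authors' modified variant. The field names, the example records, the use of `difflib.SequenceMatcher` as the string similarity, taking the first vector as the first centre, and the rule that the cluster whose centre has the higher total similarity holds the matches are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher
from math import dist

# Hypothetical fields and record pairs, for illustration only.
FIELDS = ["name", "city"]
PAIRS = [
    ({"name": "john smith", "city": "sydney"}, {"name": "jon smith", "city": "sydney"}),
    ({"name": "mary jones", "city": "canberra"}, {"name": "mary jones", "city": "canberra"}),
    ({"name": "john smith", "city": "sydney"}, {"name": "peter brown", "city": "perth"}),
    ({"name": "mary jones", "city": "canberra"}, {"name": "alan wu", "city": "darwin"}),
]


def compare(rec_a, rec_b, fields):
    """Comparison vector: one string-similarity score in [0, 1] per field
    (difflib ratio is an assumed stand-in for a record-linkage comparator)."""
    return tuple(SequenceMatcher(None, rec_a[f], rec_b[f]).ratio() for f in fields)


def farthest_first(points, k):
    """Farthest-first traversal: greedily choose k centres, each the point
    farthest from all centres chosen so far, then assign every point to its
    nearest centre. The first centre is points[0] (an assumed, deterministic choice)."""
    centres = [points[0]]
    # Distance from each point to its nearest centre chosen so far.
    nearest = [dist(p, centres[0]) for p in points]
    while len(centres) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centres.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
    return centres, labels


def classify_pairs(pairs, fields):
    """Split comparison vectors into two clusters and treat the cluster whose
    centre has the higher total similarity as the 'match' cluster (assumption)."""
    vectors = [compare(a, b, fields) for a, b in pairs]
    centres, labels = farthest_first(vectors, k=2)
    match_cluster = max(range(2), key=lambda c: sum(centres[c]))
    return [label == match_cluster for label in labels]


if __name__ == "__main__":
    print(classify_pairs(PAIRS, FIELDS))  # [True, True, False, False]
```

    No labelled examples or thresholds are needed here; the price is that the result depends on the choice of first centre and on how well matches and non-matches separate in comparison space, which is the kind of weakness a modified farthest-first approach would need to address.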

    Original language: English
    Pages (from-to): 23-31
    Number of pages: 9
    Journal: Conferences in Research and Practice in Information Technology Series
    Volume: 61
    Publication status: Published - 2006
    Event: 5th Australasian Data Mining Conference, AusDM 2006 - Sydney, NSW, Australia
    Duration: 29 Nov 2006 - 30 Nov 2006
