A comparison of personal name matching: Techniques and practical issues

Peter Christen*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    202 Citations (Scopus)

    Abstract

    Finding and matching personal names is at the core of an increasing number of applications, from text and Web mining and search engines to information extraction, deduplication and data linkage systems. Variations and errors in names make exact string matching problematic, so approximate matching techniques have to be applied. Compared to general text, however, personal names have different characteristics that need to be considered. In this paper we discuss the characteristics of personal names and present potential sources of variations and errors. We then give an overview of a comprehensive set of commonly used, as well as some recently developed, name matching techniques. Experimental comparisons using four large name data sets indicate that there is no clear best matching technique.
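
    To illustrate why exact string matching falls short for names, the following minimal Python sketch scores two names with a normalised Levenshtein edit distance. Edit distance is used here only as a familiar example of approximate matching; the abstract does not say which techniques the paper evaluates, and the function names `edit_distance` and `name_similarity` are hypothetical, chosen for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # Keep the shorter string in b so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after simple normalisation.
    Assumption for this sketch: lowercasing and trimming whitespace is
    the only normalisation applied."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # An exact comparison rejects this pair; the approximate score does not.
    print("Christen" == "Kristen")
    print(name_similarity("Christen", "Kristen"))
```

    A threshold on `name_similarity` would then decide whether two names count as a match. Where to set that threshold depends on the data, which fits the abstract's finding that no single technique is clearly best.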

    Original language: English
    Title of host publication: Proceedings - ICDM Workshops 2006 - 6th IEEE International Conference on Data Mining - Workshops
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 290-294
    Number of pages: 5
    ISBN (Print): 0769527027, 9780769527024
    Publication status: Published - 2006

    Publication series

    Name: Proceedings - IEEE International Conference on Data Mining, ICDM
    ISSN (Print): 1550-4786
