TY - GEN
T1 - A comparison of personal name matching
T2 - Techniques and practical issues
AU - Christen, Peter
PY - 2006
Y1 - 2006
AB - Finding and matching personal names is at the core of an increasing number of applications: from text and Web mining, search engines, to information extraction, deduplication and data linkage systems. Variations and errors in names make exact string matching problematic, and approximate matching techniques have to be applied. When compared to general text, however, personal names have different characteristics that need to be considered. In this paper we discuss the characteristics of personal names and present potential sources of variations and errors. We then overview a comprehensive number of commonly used, as well as some recently developed name matching techniques. Experimental comparisons using four large name data sets indicate that there is no clear best matching technique.
UR - http://www.scopus.com/inward/record.url?scp=78449293191&partnerID=8YFLogxK
U2 - 10.1109/icdmw.2006.2
DO - 10.1109/icdmw.2006.2
M3 - Conference contribution
SN - 0769527027
SN - 9780769527024
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 290
EP - 294
BT - Proceedings - ICDM Workshops 2006 - 6th IEEE International Conference on Data Mining - Workshops
PB - Institute of Electrical and Electronics Engineers Inc.
ER -