Automated probabilistic address standardisation and verification

Peter Christen*, Daniel Belacic

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    23 Citations (Scopus)

    Abstract

    Addresses are a key part of many records containing information about people and organisations, and it is therefore important that accurate address information is available before such data is mined or stored in data warehouses. Unfortunately, addresses are often captured in non-standard and free-text formats, usually with some degree of spelling and typographical errors. Additionally, addresses change over time, for example when people move, when streets are renamed, or when new suburbs are built. Cleaning and standardising addresses, as well as verifying if they really exist, are therefore important steps in data mining pre-processing. In this paper we present an automated probabilistic approach based on a hidden Markov model (HMM), which uses national address guidelines and a comprehensive national address database to clean, standardise and verify raw input addresses. Initial experiments show that our system can correctly standardise even complex and unusual addresses.
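As a rough illustration of how a hidden Markov model can assign tokens of a raw address to standardised fields, the sketch below runs Viterbi decoding over a toy model. Everything in it is an assumption made for illustration: the state names, the token classes (`NUM`, `NUM4`, `TYPE`, `WORD`), the `STREET_TYPES` lookup and all probabilities are hypothetical and are not taken from the paper, which derives its model from national address guidelines and a national address database.

```python
import math

# Hypothetical output fields (HMM states); not the paper's actual state set.
STATES = ["number", "street_name", "street_type", "locality", "postcode"]

# Illustrative probabilities only; a real system would estimate these from data.
START = {"number": 0.8, "street_name": 0.2}
TRANS = {
    "number": {"street_name": 0.9, "number": 0.1},
    "street_name": {"street_name": 0.3, "street_type": 0.7},
    "street_type": {"locality": 1.0},
    "locality": {"locality": 0.3, "postcode": 0.7},
    "postcode": {},
}
EMIT = {
    "number": {"NUM": 0.9, "WORD": 0.1},
    "street_name": {"WORD": 0.9, "TYPE": 0.1},
    "street_type": {"TYPE": 0.95, "WORD": 0.05},
    "locality": {"WORD": 0.9, "TYPE": 0.1},
    "postcode": {"NUM4": 0.95, "NUM": 0.05},
}
STREET_TYPES = {"st", "street", "rd", "road", "ave", "avenue"}


def observe(token):
    """Map a raw token to a coarse observation class (hypothetical tagging)."""
    t = token.lower()
    if t.isdigit():
        return "NUM4" if len(t) == 4 else "NUM"
    if t in STREET_TYPES:
        return "TYPE"
    return "WORD"


def logp(table, key):
    """Log-probability lookup; missing or zero entries become -infinity."""
    p = table.get(key, 0.0)
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable (token, field) sequence under the toy model."""
    if not tokens:
        return []
    obs = [observe(t) for t in tokens]
    # best[s] holds (log-probability, path) of the best path ending in state s.
    best = {s: (logp(START, s) + logp(EMIT[s], obs[0]), [s]) for s in STATES}
    for o in obs[1:]:
        new = {}
        for s in STATES:
            score, path = max(
                ((best[p][0] + logp(TRANS[p], s), best[p][1]) for p in STATES),
                key=lambda x: x[0],
            )
            new[s] = (score + logp(EMIT[s], o), path + [s])
        best = new
    _, path = max(best.values(), key=lambda x: x[0])
    return list(zip(tokens, path))


if __name__ == "__main__":
    for token, field in viterbi("42 Miller St Sydney 2000".split()):
        print(f"{token:8s} -> {field}")
```

Working in log space avoids numerical underflow on longer addresses, and treating missing table entries as zero probability keeps the toy tables short. Verification against an address database would be a separate step applied to the segmented fields and is not shown here.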

    Original language: English
    Title of host publication: AusDM 2005 Proc. - 4th Australasian Data Mining Conf. - Collocated with the 18th Australian Joint Conf. on Artificial Intelligence, AI 2005 and the 2nd Australian Conf. on Artificial Life, ACAL 2005
    Pages: 53-67
    Number of pages: 15
    Publication status: Published - 2005
    Event: 4th Australasian Data Mining Conference, AusDM 2005 - Collocated with the 18th Australian Joint Conference on Artificial Intelligence, AI 2005 and the 2nd Australian Conference on Artificial Life, ACAL 2005 - Sydney, NSW, Australia
    Duration: 5 Dec 2005 to 6 Dec 2005

    Publication series

    Name: AusDM 2005 Proc. - 4th Australasian Data Mining Conf. - Collocated with the 18th Australian Joint Conf. on Artificial Intelligence, AI 2005 and the 2nd Australian Conf. on Artificial Life, ACAL 2005

    Conference

    Conference: 4th Australasian Data Mining Conference, AusDM 2005 - Collocated with the 18th Australian Joint Conference on Artificial Intelligence, AI 2005 and the 2nd Australian Conference on Artificial Life, ACAL 2005
    Country/Territory: Australia
    City: Sydney, NSW
    Period: 5/12/05 to 6/12/05
