Outlier detection based accurate geocoding of historical addresses

Nishadi Kirielle*, Peter Christen, Thilina Ranbaduge

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    4 Citations (Scopus)

    Abstract

    Research in the social sciences is increasingly based on large and complex databases, such as historical birth, marriage, death, and census records. Such databases can be analyzed individually to investigate, for example, changes in education, health, and emigration over time. Many of these historical databases contain addresses, and assigning geographical locations (latitude and longitude) to them, a process known as geocoding, provides the foundation for a wide range of studies based on spatial data analysis. Furthermore, geocoded records can be used to enhance record linkage processes, which allow family trees to be constructed for whole populations. However, a challenging aspect of geocoding historical addresses is that they might have changed over time and are therefore only partially available, or not available at all, in modern geocoding systems. In this paper, we present a novel method to geocode historical addresses, in which we use an online geocoding service to initially retrieve geocodes for historical addresses. For addresses where multiple geocodes are returned, we employ outlier detection to improve the accuracy of the locations assigned to them, while for addresses where no geocode was found, for example due to spelling variations, we employ approximate string matching to identify the most likely correct spelling along with its corresponding geocode. Experiments on two real historical data sets, one from Scotland and the other from Finland, show that, compared to an online geocoding service, our method can reduce the number of addresses with multiple geocodes by over 80% and increase the number of addresses that receive a single geocode instead of none by up to 31%.
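
The abstract does not specify which outlier detection technique is applied when the geocoding service returns several geocodes for one address. As a hedged illustration only, the Python sketch below swaps in one simple option: Tukey's IQR fence applied to great-circle distances. It assumes a hypothetical set of reference geocodes for the same region (for example, addresses in the same parish that already received a single geocode). A candidate geocode is treated as an outlier if its distance from the median reference location exceeds Q3 + k × IQR of the reference points' own distances to that location. The names `haversine_km`, `filter_candidates`, and the parameter `k` are illustrative assumptions, not taken from the paper, and the coordinates in the usage example are made up.

```python
import math
import statistics
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def filter_candidates(candidates: List[LatLon],
                      reference: List[LatLon],
                      k: float = 1.5) -> List[LatLon]:
    """Drop candidate geocodes that are outliers relative to the reference geocodes.

    Hypothetical rule (not necessarily the paper's): a candidate is an outlier if
    its distance to the median reference location exceeds Q3 + k * IQR of the
    reference points' distances to that same location.
    """
    if len(reference) < 2:
        raise ValueError("need at least two reference geocodes")
    # Component-wise median is robust to a few badly placed reference points.
    centre = (statistics.median(p[0] for p in reference),
              statistics.median(p[1] for p in reference))
    ref_dists = [haversine_km(p, centre) for p in reference]
    q1, _, q3 = statistics.quantiles(ref_dists, n=4)  # Python 3.8+
    fence = q3 + k * (q3 - q1)
    return [c for c in candidates if haversine_km(c, centre) <= fence]


if __name__ == "__main__":
    # Made-up reference geocodes clustered in one area.
    reference = [(55.95, -3.19), (55.96, -3.20), (55.94, -3.18),
                 (55.95, -3.21), (55.97, -3.19)]
    # Two candidates returned for one ambiguous address: one nearby, one far away.
    candidates = [(55.955, -3.195), (51.50, -0.12)]
    print(filter_candidates(candidates, reference))  # keeps only the nearby candidate
```

Under this sketch, an address could be assigned a geocode when exactly one candidate survives the filter; if several remain, it would stay ambiguous. A distance-based fence is used here because it needs no training data, but any other outlier detector could be substituted.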
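
The abstract does not name the string comparison function used for approximate string matching of addresses that received no geocode. The Python sketch below is a hedged illustration that assumes normalised Levenshtein similarity (a swapped-in choice) and a hypothetical `gazetteer` dictionary mapping spellings that the geocoding service does resolve to their geocodes. An ungeocoded address is mapped to its most similar gazetteer entry only if the similarity reaches an assumed threshold of 0.8. The functions `levenshtein`, `similarity`, and `best_match`, the threshold value, and the approximate coordinates in the example are all illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity normalised to [0, 1], case-insensitive."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(address: str,
               gazetteer: Dict[str, LatLon],
               threshold: float = 0.8) -> Optional[Tuple[str, LatLon]]:
    """Return the most similar known spelling and its geocode, or None if too dissimilar."""
    scored = [(similarity(address, name), name) for name in gazetteer]
    if not scored:
        return None
    score, name = max(scored)
    if score < threshold:
        return None
    return name, gazetteer[name]


if __name__ == "__main__":
    # Hypothetical gazetteer with approximate, illustrative coordinates.
    gazetteer = {"Kirkcaldy": (56.11, -3.16),
                 "Kirkintilloch": (55.94, -4.15),
                 "Dunfermline": (56.07, -3.46)}
    print(best_match("Kirkaldy", gazetteer))  # misspelling resolved to Kirkcaldy
    print(best_match("Glasgow", gazetteer))   # no sufficiently similar entry: None
```

Requiring a minimum similarity trades recall for precision: a lower threshold resolves more spelling variations but risks assigning an address to the wrong place.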

    Original language: English
    Title of host publication: Data Mining - 17th Australasian Conference, AusDM 2019, Proceedings
    Editors: Thuc D. Le, Lin Liu, Kok-Leong Ong, Yanchang Zhao, Warren H. Jin, Sebastien Wong, Graham Williams
    Publisher: Springer
    Pages: 41-53
    Number of pages: 13
    ISBN (Print): 9789811516986
    Publication status: Published - 2019
    Event: 17th Australasian Conference on Data Mining, AusDM 2019 - Adelaide, Australia
    Duration: 2 Dec 2019 to 5 Dec 2019

    Publication series

    Name: Communications in Computer and Information Science
    Volume: 1127 CCIS
    ISSN (Print): 1865-0929
    ISSN (Electronic): 1865-0937

    Conference

    Conference: 17th Australasian Conference on Data Mining, AusDM 2019
    Country/Territory: Australia
    City: Adelaide
    Period: 2/12/19 to 5/12/19
