Maximum likelihood estimation for contingency tables and logistic regression with incorrectly linked data

James O. Chipperfield, Glenys R. Bishop, Paul Campbell

    Research output: Contribution to journal, Article, peer-review

    23 Citations (Scopus)

    Abstract

    Data linkage is the process of bringing together records from two or more files that are believed to belong to the same unit (e.g., a person or a business). It is a very common way to enhance a dataset along dimensions such as time, breadth, or depth of detail. Data linkage is often not error-free and can link a pair of records that do not belong to the same unit. Record linkage applications have proliferated, yet there has been little work on assuring the quality of analyses that use such linked files. Naively treating a linked file as if it were linked without error will, in general, lead to biased estimates. This paper develops maximum likelihood estimators for contingency tables and logistic regression with incorrectly linked records. The estimation technique is simple and is implemented using the well-known EM algorithm. A well-known method of linking records in the present context is probabilistic data linkage. The paper demonstrates the effectiveness of the proposed estimators in an empirical study that uses probabilistic data linkage.
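    The abstract does not state the linkage error model, so what follows is only a minimal sketch under stated assumptions, not the authors' estimator. It assumes (i) the probability λ that a link is correct is known, for example from a clerical review, and (ii) an incorrect link carries the Y value of a randomly chosen other unit, so that the linked value Y* satisfies P(Y* = y | X = x) = λ P(Y = y | X = x) + (1 - λ) P(Y = y). EM then treats the unknown correct/incorrect status of each link as missing data. The E-step computes the posterior probability that links in each cell are correct. The M-step re-estimates P(Y | X) from the expected number of correct links. The function name `em_linked_table` and its arguments are hypothetical.

```python
def em_linked_table(counts, lam, max_iter=1000, tol=1e-12):
    """Sketch: EM estimate of P(Y=y | X=x) from a linked two-way table.

    counts[x][y]: number of linked records with X=x (file A) whose linked
    record has Y*=y (file B). Every row is assumed to have a positive total.
    lam: assumed, known probability that a link is correct.
    Assumed error model: a wrong link carries the Y value of a randomly
    chosen other unit, so its Y* follows the marginal distribution of Y.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)
    # Mismatch distribution q[y]: under the assumed model the column margin
    # of the linked table estimates the marginal distribution of Y.
    q = [sum(counts[x][y] for x in range(n_rows)) / total for y in range(n_cols)]
    # Start from the naive estimate: row proportions of the linked table.
    theta = [[c / sum(row) for c in row] for row in counts]
    for _ in range(max_iter):
        new_theta = []
        for x in range(n_rows):
            # E-step: expected number of correct links in each cell (x, y).
            correct = []
            for y in range(n_cols):
                num = lam * theta[x][y]
                den = num + (1 - lam) * q[y]
                correct.append(counts[x][y] * num / den if den > 0 else 0.0)
            # M-step: row proportions of the expected correct links.
            row_total = sum(correct)
            new_theta.append([c / row_total for c in correct])
        change = max(
            abs(a - b)
            for row_new, row_old in zip(new_theta, theta)
            for a, b in zip(row_new, row_old)
        )
        theta = new_theta
        if change < tol:
            break
    return theta


# Hypothetical linked table: rows are X, columns are the linked Y*.
counts = [[830, 170], [270, 730]]
theta = em_linked_table(counts, lam=0.8)
print([[round(v, 3) for v in row] for row in theta])  # [[0.9, 0.1], [0.2, 0.8]]
```

In this made-up table the naive row proportions have diagonal entries 0.83 and 0.73, pulled toward the column margin (0.55, 0.45) by the wrong links; under the assumed model the EM estimate removes that attenuation. For logistic regression, the same E-step would apply with P(Y = y | X = x) taken from the current logistic fit, and the M-step would become a logistic regression weighted by the posterior probabilities of a correct link; this too is an assumption-based sketch rather than the paper's procedure.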

    Original language: English
    Pages (from-to): 13-24
    Number of pages: 12
    Journal: Survey Methodology
    Volume: 37
    Issue number: 1
    Publication status: Published, 29 Jun 2011
