Abstract
Linking historical census data across time is challenging because of poor data quality, limited information about individuals, and changes to households over time. Although most census data linking methods link records that correspond to individual household members, recent advances show that linking households as a whole provides more accurate results and fewer multiple household links. In this paper, we introduce a graph-based method to link households, which takes the structural relationships between household members into consideration. Based on individual record linking results, our method builds a graph for each household, so that matches are determined by both attribute-level and record-relationship similarity. Our experimental results on both synthetic and real historical census data validate the effectiveness of this method. The proposed method achieves an F-measure of 0.937 on data extracted from real UK census datasets, outperforming all the alternative methods it was compared against.
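The abstract does not specify how the household graphs or the two similarity components are computed, so the following is only a minimal sketch of the general idea under assumed choices, not the authors' implementation. It treats each household member as a vertex, uses the age difference between two members as the edge attribute, and combines a vertex (attribute-level) score with an edge (record-relationship) score. The record fields `name` and `age`, the name-similarity threshold, the age tolerance, and the weight `alpha` are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Attribute-level similarity: difflib string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(h1, h2, years_apart, threshold=0.8, age_tolerance=2):
    """Greedy one-to-one record links between two households.

    Assumed rule: names must be similar and the age must have advanced
    by roughly the number of years between the two censuses.
    Returns (index_in_h1, index_in_h2, similarity) tuples.
    """
    candidates = []
    for i, r1 in enumerate(h1):
        for j, r2 in enumerate(h2):
            sim = name_similarity(r1["name"], r2["name"])
            age_gap = abs((r2["age"] - r1["age"]) - years_apart)
            if sim >= threshold and age_gap <= age_tolerance:
                candidates.append((sim, i, j))
    candidates.sort(reverse=True)  # best-scoring pairs first
    used_1, used_2, links = set(), set(), []
    for sim, i, j in candidates:
        if i not in used_1 and j not in used_2:
            links.append((i, j, sim))
            used_1.add(i)
            used_2.add(j)
    return links


def edge_similarity(h1, h2, link_a, link_b):
    """Record-relationship similarity: compare the age difference between
    two linked members in one household with that in the other."""
    d1 = h1[link_a[0]]["age"] - h1[link_b[0]]["age"]
    d2 = h2[link_a[1]]["age"] - h2[link_b[1]]["age"]
    return 1.0 / (1.0 + abs(d1 - d2))


def household_similarity(h1, h2, years_apart=10, alpha=0.5):
    """Weighted mix of vertex and edge scores, normalised by the larger
    household so that missing members lower the score."""
    links = link_records(h1, h2, years_apart)
    if not links:
        return 0.0
    size = max(len(h1), len(h2))
    vertex_score = sum(sim for _, _, sim in links) / size
    max_edges = size * (size - 1) // 2
    if max_edges == 0:  # single-person households have no edges
        return vertex_score
    edge_score = sum(
        edge_similarity(h1, h2, a, b) for a, b in combinations(links, 2)
    ) / max_edges
    return alpha * vertex_score + (1 - alpha) * edge_score
```

In this sketch, a candidate household in the later census would be scored against each household in the earlier census, and the highest-scoring pair above some cut-off kept as the link. Normalising by the larger household is one possible design choice for penalising partial matches; other normalisations would behave differently when households split or merge.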
Original language | English |
---|---|
Pages (from-to) | 485-496 |
Number of pages | 12 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 8443 LNAI |
Issue number | PART 1 |
Publication status | Published - 2014 |
Event | 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014 - Tainan, Taiwan. Duration: 13 May 2014 → 16 May 2014 |