TY - GEN
T1 - A supervised learning and group linking method for historical census household linkage
AU - Fu, Zhichun
AU - Christen, Peter
AU - Boot, Mac
PY - 2010
Y1 - 2010
N2 - Historical census data provide a snapshot of the era when our ancestors lived. Such data contain valuable information that allows the reconstruction of households and the tracking of family changes across time, allows the analysis of family diseases, and facilitates a variety of social science research. One particular topic of interest in historical census data analysis is the linking of households across time. This enables tracking of the majority of members in a household over a certain period of time, which facilitates the extraction of information that is hidden in the data, such as fertility, occupations, changes in family structures, immigration and movements, and so on. Such information normally cannot be easily acquired by only linking records that correspond to individuals. In this paper, we propose a novel method to link households in historical census data. Our method first computes the attribute-wise similarity of individual record pairs. A support vector machine classifier is then trained on limited data and used to classify these individual record pairs into matches and nonmatches. In a second step, a group linking approach is employed to link households based on the matched individual record pairs. Experimental results on real census data from the United Kingdom from 1851 to 1901 show that the proposed method can greatly reduce the number of multiple household matches compared with a traditional linkage of individual record pairs only.
KW - Classification
KW - Group linking
KW - Historical census data
KW - Household linkage
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=84870553639&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781921770029
T3 - Conferences in Research and Practice in Information Technology Series
SP - 153
EP - 162
BT - AusDM'11 - Conferences in Research and Practice in Information Technology
T2 - 9th Australasian Data Mining Conference, AusDM 2011
Y2 - 1 December 2011 through 2 December 2011
ER -