TY - GEN
T1 - A bag reconstruction method for multiple instance classification and group record linkage
AU - Fu, Zhichun
AU - Zhou, Jun
AU - Peng, Furong
AU - Christen, Peter
PY - 2012
Y1 - 2012
AB - Record linking is the task of detecting records in several databases that refer to the same entity. This task aims at exploring the relationship between entities, which normally lack common identifiers in heterogeneous datasets. When entities contain multiple relational records, linking them across datasets can be more accurate by treating the records as groups, which leads to group linking methods. Even so, individual record links may still be needed for the final group linking step. This problem can be solved by multiple instance learning, in which group links are modelled as bags, and record links are considered as instances. In this paper, we propose a novel method for instance classification and group record linkage via bag reconstruction from instances. The bag reconstruction is based on modelling the distribution of negative instances in the training bags via kernel density estimation. We evaluate this approach on both synthetic and real-world data. Our results show that the proposed method can outperform several baseline methods.
KW - Bag reconstruction
KW - Group linkage
KW - Historical census data
KW - Instance classification
KW - Multiple instance learning
KW - Record linkage
UR - http://www.scopus.com/inward/record.url?scp=84872716283&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35527-1_21
DO - 10.1007/978-3-642-35527-1_21
M3 - Conference contribution
SN - 9783642355264
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 247
EP - 259
BT - Advanced Data Mining and Applications - 8th International Conference, ADMA 2012, Proceedings
T2 - 8th International Conference on Advanced Data Mining and Applications, ADMA 2012
Y2 - 15 December 2012 through 18 December 2012
ER -