TY - GEN
T1 - Adaptive temporal entity resolution on dynamic databases
AU - Christen, Peter
AU - Gayler, Ross W.
PY - 2013
Y1 - 2013
N2 - Entity resolution is the process of matching records that refer to the same entities from one or several databases in situations where the records to be matched do not include unique entity identifiers. Matching therefore has to rely upon partially identifying information, such as names and addresses. Traditionally, entity resolution has been applied in batch mode and on static databases. Increasingly, however, organisations face a stream of query records that need to be matched to a database of known entities. As these query records are matched, they are inserted into the database as either representing a new entity, or as the latest embodiment of an existing entity. We investigate how temporal and dynamic aspects, such as time differences between query and database records and changes in database content, affect matching quality. We propose an approach that adaptively adjusts similarities between records depending upon the values of the records' attributes and the time differences between records. We evaluate our approach on synthetic data and a large real US voter database, with results showing that our approach can outperform static matching approaches. Keywords: Data matching, record linkage, dynamic data, real-time matching.
AB - Entity resolution is the process of matching records that refer to the same entities from one or several databases in situations where the records to be matched do not include unique entity identifiers. Matching therefore has to rely upon partially identifying information, such as names and addresses. Traditionally, entity resolution has been applied in batch mode and on static databases. Increasingly, however, organisations face a stream of query records that need to be matched to a database of known entities. As these query records are matched, they are inserted into the database as either representing a new entity, or as the latest embodiment of an existing entity. We investigate how temporal and dynamic aspects, such as time differences between query and database records and changes in database content, affect matching quality. We propose an approach that adaptively adjusts similarities between records depending upon the values of the records' attributes and the time differences between records. We evaluate our approach on synthetic data and a large real US voter database, with results showing that our approach can outperform static matching approaches. Keywords: Data matching, record linkage, dynamic data, real-time matching.
UR - http://www.scopus.com/inward/record.url?scp=84893631965&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37456-2_47
DO - 10.1007/978-3-642-37456-2_47
M3 - Conference contribution
SN - 9783642374555
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 558
EP - 569
BT - Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
T2 - 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Y2 - 14 April 2013 through 17 April 2013
ER -