Similarity-aware indexing for real-time entity resolution

Peter Christen*, Ross Gayler, David Hawking

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

    41 Citations (Scopus)

    Abstract

    Entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. Traditionally, entity resolution has been applied in batch mode and on static databases. However, many organisations increasingly face the challenge of matching a stream of query records in real time against large databases of entities, such that the best-matching records are retrieved for each query. Example applications include online law enforcement and national security databases, public health surveillance and emergency response systems, financial verification systems, online retail stores, eGovernment services, and digital libraries. This paper presents a novel inverted-index-based approach to real-time entity resolution. At build time, similarities between attribute values are computed and stored, so that records can be matched quickly at query time. The approach differs from other approaches to approximate query matching in that it allows any similarity comparison function and any 'blocking' (encoding) function, both possibly domain specific, to be incorporated. Experimental results on a real-world database indicate that the total size of all data structures of this index grows sub-linearly with the size of the database, and that query records can be matched in sub-second time, more than two orders of magnitude faster than with a traditional entity resolution index approach. The interested reader is referred to the longer version of this paper [5].
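    The core idea in the abstract (precompute similarities between attribute values at build time, so that query-time matching becomes mostly lookups, with the comparison and blocking functions left pluggable) can be illustrated with a small Python sketch. It is not the authors' implementation. The three structures used here (a record index, a block index and a similarity index), the bigram Jaccard comparison, the two-character prefix encoding and the summed per-attribute scoring are all assumptions chosen for illustration, and every name in the code is hypothetical. The actual data structures are described in the longer version of the paper [5].

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def bigrams(s: str) -> Set[str]:
    """Character bigrams of a lower-cased string (strings shorter than 2 map to themselves)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard_bigram_sim(a: str, b: str) -> float:
    """Hypothetical comparison function: Jaccard similarity of bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def prefix_encode(s: str) -> str:
    """Hypothetical blocking (encoding) function: first two lower-cased characters."""
    return s.lower()[:2]


class SimilarityAwareIndex:
    """Per-attribute inverted index that stores value similarities at build time."""

    def __init__(self, sim: Callable[[str, str], float], encode: Callable[[str], str]):
        self.sim, self.encode = sim, encode
        self.record_index: Dict[str, Set[int]] = defaultdict(set)  # value -> record ids
        self.block_index: Dict[str, Set[str]] = defaultdict(set)  # encoding -> values
        self.sim_index: Dict[str, Dict[str, float]] = defaultdict(dict)  # value -> {value: sim}

    def insert(self, rec_id: int, value: str) -> None:
        """Build time: a new value is compared only with values already in its block."""
        if value not in self.record_index:
            block = self.block_index[self.encode(value)]
            for other in block:
                s = self.sim(value, other)
                self.sim_index[value][other] = s  # store both directions
                self.sim_index[other][value] = s
            block.add(value)
        self.record_index[value].add(rec_id)

    def query(self, value: str) -> Dict[int, float]:
        """Query time: map each candidate record id to its similarity with `value`."""
        if value in self.record_index:  # known value: reuse precomputed similarities
            neighbours = dict(self.sim_index.get(value, {}))
            neighbours[value] = 1.0
        else:  # unseen value: compare it with the values in its block only
            block = self.block_index.get(self.encode(value), set())
            neighbours = {v: self.sim(value, v) for v in block}
        scores: Dict[int, float] = defaultdict(float)
        for v, s in neighbours.items():
            for rid in self.record_index[v]:
                scores[rid] = max(scores[rid], s)
        return dict(scores)


def match(indexes: Dict[str, SimilarityAwareIndex], query: Dict[str, str],
          top_k: int = 3) -> List[Tuple[int, float]]:
    """Assumed scoring: sum per-attribute similarities, return the top_k records."""
    totals: Dict[int, float] = defaultdict(float)
    for attr, value in query.items():
        for rid, s in indexes[attr].query(value).items():
            totals[rid] += s
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Small made-up example: two attributes, three records.
records = {
    1: {"given": "john", "surname": "smith"},
    2: {"given": "jon", "surname": "smyth"},
    3: {"given": "mary", "surname": "smith"},
}
indexes = {a: SimilarityAwareIndex(jaccard_bigram_sim, prefix_encode)
           for a in ("given", "surname")}
for rid, rec in records.items():
    for attr, val in rec.items():
        indexes[attr].insert(rid, val)

# "johnny" is not in the index, so its block ("jo") is compared at query time.
print([rid for rid, _ in match(indexes, {"given": "johnny", "surname": "smith"})])  # → [1, 3, 2]
```

    Because the comparison and encoding functions are constructor parameters, a domain-specific similarity (for example an edit-distance measure) or a phonetic encoding such as Soundex could be swapped in without changing the index. The trade-off in this sketch is that build cost grows with block sizes, since each new value is compared with every value already in its block.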

    Original language: English
    Title of host publication: ACM 18th International Conference on Information and Knowledge Management, CIKM 2009
    Pages: 1565-1568
    Number of pages: 4
    Publication status: Published - 2009
    Event: ACM 18th International Conference on Information and Knowledge Management, CIKM 2009 - Hong Kong, China
    Duration: 2 Nov 2009 to 6 Nov 2009

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: ACM 18th International Conference on Information and Knowledge Management, CIKM 2009
    Country/Territory: China
    City: Hong Kong
    Period: 2/11/09 to 6/11/09
