Dynamic multimodal fusion in video search

Lexing Xie*, Apostol Natsev, Jelena Tešić

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

We propose effective multimodal fusion strategies for video search. Multimodal search is a widely applicable information-retrieval problem, and fusion strategies are essential for exploiting all available retrieval experts and boosting performance. Prior work has focused on hard and soft modeling of query classes and on learning weights for each class; the class partition is either manually defined or learned from data, but in both cases it remains insensitive to the test query. We propose a query-dependent fusion strategy that dynamically generates a class from the training queries closest to the test query, based on lightweight query features derived from semantic analysis of the query text. A set of optimal weights is then learned on the dynamic class, which aims to model both co-occurring query features and unusual test queries. Used in conjunction with the rest of our multimodal retrieval system, dynamic query classes compare favorably with hard and soft query classes, and the system improves upon the best automatic search runs of TRECVID05 and TRECVID06 by 34% and 8%, respectively.
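The query-dependent fusion idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the query features, the distance measure, the class size, or the weight-learning procedure. The sketch assumes that query features are sets of labels, that closeness is Jaccard similarity, that the dynamic class is the k nearest training queries, and that weights are found by a coarse grid search maximizing mean average precision. All function names and the toy data are hypothetical.

```python
from itertools import product

def jaccard(a, b):
    """Similarity between two sets of query features (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def average_precision(ranked_labels):
    """Average precision of a ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def fuse(scores, weights):
    """Weighted sum of expert scores; scores[e][d] is expert e's score for doc d."""
    n_docs = len(scores[0])
    return [sum(w * s[d] for w, s in zip(weights, scores)) for d in range(n_docs)]

def dynamic_class(test_features, training, k):
    """Return the k training queries most similar to the test query."""
    ranked = sorted(training,
                    key=lambda q: jaccard(test_features, q["features"]),
                    reverse=True)
    return ranked[:k]

def learn_weights(queries, n_experts, steps=4):
    """Grid search over weight vectors summing to 1 (assumed learner),
    keeping the first vector with the highest mean average precision."""
    best, best_map = None, -1.0
    for combo in product(range(steps + 1), repeat=n_experts):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        aps = []
        for q in queries:
            fused = fuse(q["scores"], weights)
            order = sorted(range(len(fused)), key=lambda d: fused[d], reverse=True)
            aps.append(average_precision([q["relevant"][d] for d in order]))
        mean_ap = sum(aps) / len(aps)
        if mean_ap > best_map:
            best, best_map = weights, mean_ap
    return best

# Hypothetical toy data: two experts (index 0 = text, index 1 = visual),
# three documents per training query, with 0/1 relevance labels.
training = [
    {"id": "q1", "features": {"person", "named"},
     "scores": [[0.9, 0.1, 0.8], [0.2, 0.9, 0.1]], "relevant": [1, 0, 1]},
    {"id": "q2", "features": {"person"},
     "scores": [[0.7, 0.2, 0.1], [0.1, 0.8, 0.3]], "relevant": [1, 0, 0]},
    {"id": "q3", "features": {"scene", "outdoor"},
     "scores": [[0.9, 0.1, 0.2], [0.1, 0.8, 0.7]], "relevant": [0, 1, 1]},
]

# A test query about a named person draws its class from q1 and q2.
person_class = dynamic_class({"person", "named", "sports"}, training, k=2)
person_weights = learn_weights(person_class, n_experts=2)

# A test query about a scene draws its class from q3 only.
scene_class = dynamic_class({"scene"}, training, k=1)
scene_weights = learn_weights(scene_class, n_experts=2)

print([q["id"] for q in person_class], person_weights)
print([q["id"] for q in scene_class], scene_weights)
```

In this toy setup the two test queries receive different weight vectors because each one is matched to a different neighborhood of training queries, which is the behavior a fixed, query-independent class partition cannot provide. The learned weights would then be passed to `fuse` together with the test query's own expert scores.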

Original language: English
Title of host publication: Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
Publisher: IEEE Computer Society
Pages: 1499-1502
Number of pages: 4
ISBN (Print): 1424410177, 9781424410170
Publication status: Published - 2007
Externally published: Yes
Event: IEEE International Conference on Multimedia and Expo, ICME 2007 - Beijing, China
Duration: 2 Jul 2007 → 5 Jul 2007

Publication series

Name: Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007

Conference

Conference: IEEE International Conference on Multimedia and Expo, ICME 2007
Country/Territory: China
City: Beijing
Period: 2/07/07 → 5/07/07
