TY - GEN
T1 - DORi
T2 - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
AU - Rodriguez-Opazo, Cristian
AU - Marrese-Taylor, Edison
AU - Fernando, Basura
AU - Li, Hongdong
AU - Gould, Stephen
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/1
Y1 - 2021/1
N2 - This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned in the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.
AB - This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned in the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.
UR - http://www.scopus.com/inward/record.url?scp=85106097711&partnerID=8YFLogxK
U2 - 10.1109/WACV48630.2021.00112
DO - 10.1109/WACV48630.2021.00112
M3 - Conference contribution
T3 - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
SP - 1078
EP - 1087
BT - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 January 2021 through 9 January 2021
ER -