TY - GEN
T1 - Ordered pooling of optical flow sequences for action recognition
AU - Wang, Jue
AU - Cherian, Anoop
AU - Porikli, Fatih
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/5/11
Y1 - 2017/5/11
N2 - Training of Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive due to the substantial memory requirements and the massive number of parameters that deep architectures demand. Early fusion of video frames is thus a standard technique, in which several consecutive frames are first agglomerated into a compact representation, and then fed into the CNN as an input sample. For this purpose, a summarization approach that represents a set of consecutive RGB frames by a single dynamic image to capture pixel dynamics was recently proposed. In this paper, we introduce a novel ordered representation of consecutive optical flow frames as an alternative and argue that this representation captures the action dynamics more efficiently than RGB frames. We provide intuitions on why such a representation is better for action recognition. We validate our claims on standard benchmark datasets and demonstrate that using summaries of flow images leads to significant improvements over RGB frames while achieving accuracy comparable to the state-of-the-art on UCF101 and HMDB datasets.
AB - Training of Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive due to the substantial memory requirements and the massive number of parameters that deep architectures demand. Early fusion of video frames is thus a standard technique, in which several consecutive frames are first agglomerated into a compact representation, and then fed into the CNN as an input sample. For this purpose, a summarization approach that represents a set of consecutive RGB frames by a single dynamic image to capture pixel dynamics was recently proposed. In this paper, we introduce a novel ordered representation of consecutive optical flow frames as an alternative and argue that this representation captures the action dynamics more efficiently than RGB frames. We provide intuitions on why such a representation is better for action recognition. We validate our claims on standard benchmark datasets and demonstrate that using summaries of flow images leads to significant improvements over RGB frames while achieving accuracy comparable to the state-of-the-art on UCF101 and HMDB datasets.
UR - http://www.scopus.com/inward/record.url?scp=85020207080&partnerID=8YFLogxK
U2 - 10.1109/WACV.2017.26
DO - 10.1109/WACV.2017.26
M3 - Conference contribution
T3 - Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017
SP - 168
EP - 176
BT - Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE Winter Conference on Applications of Computer Vision, WACV 2017
Y2 - 24 March 2017 through 31 March 2017
ER -