TY - JOUR
T1 - Knowledge memorization and generation for action recognition in still images
AU - Dong, Jian
AU - Yang, Wankou
AU - Yao, Yazhou
AU - Porikli, Fatih
N1 - Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/12
Y1 - 2021/12
N2 - Human action recognition in visual data is one of the most fundamental challenges in computer vision. Existing approaches to this goal have been based on video data, often incorporating both color and dynamic flow information. Nevertheless, the majority of visual data consists of still images, so being able to recognize actions in still images is an ultimate objective of visual understanding with a long list of applications. In this paper, we present a novel method that transfers the knowledge learned from action videos onto images to enable recognition of the principal action depicted in a still image. Our intuition is that a generative model for knowledge transfer can be learned by taking advantage of the action videos available in the training stage to bridge images to videos. Based on this, we propose two complementary knowledge-transfer models that utilize fully connected networks to deliver the knowledge extracted from color and motion flow sequences to still images. We introduce a weighted reconstruction and classification loss to steer the generation procedure of the networks. In addition, we describe and analyze the influence of different data augmentation techniques, initialization strategies, and weighting coefficients on performance. We observe that knowledge transferred from both color sequences and motion flow sequences improves still-image-based human action recognition, and that the latter, which provides complementary dynamic information, yields a considerably larger improvement. We evaluate our models on two publicly available video-based human action recognition datasets, UCF101 and HMDB51. To further validate the generalization ability of the proposed solution, we test the models learned on the UCF101 dataset on two still-image-based human action recognition benchmarks, Willow 7 Actions and Sports.
Our results demonstrate that the proposed method outperforms the baseline approaches by more than 2% accuracy, 3% accuracy, 3% accuracy, and 5% mAP on the UCF101, HMDB51, Sports, and Willow 7 Actions datasets, respectively.
AB - Human action recognition in visual data is one of the most fundamental challenges in computer vision. Existing approaches to this goal have been based on video data, often incorporating both color and dynamic flow information. Nevertheless, the majority of visual data consists of still images, so being able to recognize actions in still images is an ultimate objective of visual understanding with a long list of applications. In this paper, we present a novel method that transfers the knowledge learned from action videos onto images to enable recognition of the principal action depicted in a still image. Our intuition is that a generative model for knowledge transfer can be learned by taking advantage of the action videos available in the training stage to bridge images to videos. Based on this, we propose two complementary knowledge-transfer models that utilize fully connected networks to deliver the knowledge extracted from color and motion flow sequences to still images. We introduce a weighted reconstruction and classification loss to steer the generation procedure of the networks. In addition, we describe and analyze the influence of different data augmentation techniques, initialization strategies, and weighting coefficients on performance. We observe that knowledge transferred from both color sequences and motion flow sequences improves still-image-based human action recognition, and that the latter, which provides complementary dynamic information, yields a considerably larger improvement. We evaluate our models on two publicly available video-based human action recognition datasets, UCF101 and HMDB51. To further validate the generalization ability of the proposed solution, we test the models learned on the UCF101 dataset on two still-image-based human action recognition benchmarks, Willow 7 Actions and Sports.
Our results demonstrate that the proposed method outperforms the baseline approaches by more than 2% accuracy, 3% accuracy, 3% accuracy, and 5% mAP on the UCF101, HMDB51, Sports, and Willow 7 Actions datasets, respectively.
KW - Action recognition
KW - Deep learning
KW - Knowledge-transfer
UR - http://www.scopus.com/inward/record.url?scp=85111023779&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2021.108188
DO - 10.1016/j.patcog.2021.108188
M3 - Article
SN - 0031-3203
VL - 120
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108188
ER -