Knowledge memorization and generation for action recognition in still images

Jian Dong, Wankou Yang*, Yazhou Yao, Fatih Porikli

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    12 Citations (Scopus)

    Abstract

    Human action recognition in visual data is one of the most fundamental challenges in computer vision. Existing approaches to this task have been based on video data, often incorporating both color and dynamic flow information. Nevertheless, the majority of visual data consists of still images, and for this reason, recognizing actions in still images is an ultimate objective of visual understanding with an extended list of applications. In this paper, we present a novel method that transfers the knowledge learned from action videos onto images to allow recognition of the principal action depicted in a still image. Our intuition is that a generative model for knowledge transfer can be learned by taking advantage of the action videos available in the training stage to bridge images to videos. Based on this, we propose two complementary knowledge-transfer models that utilize fully connected networks to deliver the knowledge extracted from color and motion flow sequences to still images. We introduce a weighted reconstruction and classification loss to steer the generation procedure of the networks. In addition, we describe and analyze the influence of different data augmentation techniques, initialization strategies, and weighting coefficients on performance. We observe that knowledge transferred from both color sequences and motion flow sequences improves still-image-based human action recognition, and that the latter, which provides complementary dynamic information, yields a substantially larger improvement. We evaluate our models on two publicly available video-based human action recognition datasets: UCF101 and HMDB51. To further validate the generalization ability of the proposed solution, we test the models learned from the UCF101 dataset on two still-image-based human action recognition benchmarks: Willow 7 Actions and Sports.
    Our results demonstrate that the proposed method outperforms the baseline approaches by more than 2%, 3%, and 3% in accuracy and 5% in mAP on the UCF101, HMDB51, Sports, and Willow 7 Actions datasets, respectively.

    Original language: English
    Article number: 108188
    Journal: Pattern Recognition
    Volume: 120
    DOIs
    Publication status: Published - Dec 2021

