Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

Bo Li, Yuchao Dai, Xuelian Cheng, Huahui Chen, Yi Lin, Mingyi He

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    222 Citations (Scopus)

    Abstract

    We present an image-classification-based approach to large-scale action recognition from 3D skeleton videos. First, we map the 3D skeleton videos to color images such that the transformed action images are translation- and scale-invariant and dataset independent. Second, we propose a multi-scale deep convolutional neural network (CNN) for the image classification task, which enhances the temporal frequency adjustment of our model. Even though the action images are very different from natural images, fine-tuning still works well. Finally, we exploit various data augmentation methods to improve the generalization ability of the network. Experimental results on the NTU RGB-D dataset, the largest and most challenging benchmark, show that our method achieves state-of-the-art performance and outperforms other methods by a large margin.
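The abstract does not spell out the skeleton-to-image mapping, so the following is only a minimal sketch of one way such a translation- and scale-invariant mapping could look, not the authors' implementation. It assumes a sequence stored as a NumPy array of shape (frames, joints, 3); the function name `skeleton_to_image`, the joints-as-rows layout, and the choice of a single shared scale factor are all illustrative assumptions.

```python
import numpy as np


def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """Map a skeleton sequence (frames, joints, 3) to a uint8 color image.

    Hypothetical layout: rows = joints, columns = frames, channels = (x, y, z).
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Subtracting the per-axis minimum over the whole sequence
    # removes any global translation of the skeleton.
    mins = seq.min(axis=(0, 1), keepdims=True)
    # Dividing by one shared span (the largest per-axis range) removes
    # global scale while keeping the proportions between x, y and z.
    span = (seq.max(axis=(0, 1), keepdims=True) - mins).max()
    if span == 0:
        span = 1.0  # degenerate case: every joint sits at the same point
    img = np.rint(255.0 * (seq - mins) / span)
    # Reorder (frames, joints, 3) -> (joints, frames, 3) so time runs along the width.
    return img.transpose(1, 0, 2).astype(np.uint8)
```

Under these assumptions, shifting or uniformly rescaling every joint in a sequence leaves the resulting image unchanged, and because the normalization uses only statistics of the sequence itself, it does not depend on dataset-wide constants. The image can then be resized and fed to any standard image classifier.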

    Original language: English
    Title of host publication: 2017 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2017
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 601-604
    Number of pages: 4
    ISBN (Electronic): 9781538605608
    DOIs
    Publication status: Published - 5 Sept 2017
    Event: 2017 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2017 - Hong Kong, Hong Kong
    Duration: 10 Jul 2017 to 14 Jul 2017

    Publication series

    Name: 2017 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2017

    Conference

    Conference: 2017 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2017
    Country/Territory: Hong Kong
    City: Hong Kong
    Period: 10/07/17 to 14/07/17
