Higher-order pooling of CNN features via kernel linearization for action recognition

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    29 Citations (Scopus)

    Abstract

    Most successful deep learning algorithms for action recognition extend models designed for image-based tasks, such as object recognition, to video. Such extensions are typically trained for actions on single video frames or very short clips, and their predictions from sliding windows over the video sequence are then pooled to recognize the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations. Specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization: a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps and used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets, and show that our scheme leads to state-of-the-art results.
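
The contrast between first-order pooling and pooling over co-occurrences of linearized kernel features can be illustrated with a small sketch. This is not the paper's HOK construction: it swaps in random Fourier features as one common way to linearize an RBF kernel, and it stops at second-order co-occurrences. The function names, the kernel choice, `gamma`, `num_features`, and the synthetic Dirichlet scores are all assumptions made for illustration.

```python
import numpy as np


def random_fourier_features(scores, num_features=16, gamma=1.0, seed=0):
    """Explicit feature map z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Assumption: random Fourier features stand in for the kernel
    linearization step; the abstract does not specify the scheme.
    """
    rng = np.random.default_rng(seed)
    num_classes = scores.shape[1]
    # For this RBF kernel the random frequencies follow N(0, 2 * gamma * I).
    weights = rng.normal(scale=np.sqrt(2.0 * gamma),
                         size=(num_classes, num_features))
    offsets = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(scores @ weights + offsets)


def first_order_pool(scores):
    """Baseline: average the frame-level scores over time."""
    return scores.mean(axis=0)


def second_order_pool(feature_maps):
    """Average the outer products z_t z_t^T over frames and keep the
    upper triangle, since the co-occurrence matrix is symmetric."""
    num_frames = feature_maps.shape[0]
    cooccurrence = feature_maps.T @ feature_maps / num_frames
    rows, cols = np.triu_indices(cooccurrence.shape[0])
    return cooccurrence[rows, cols]


# Hypothetical input: 30 frames, 5 action classes, softmax-like scores.
rng = np.random.default_rng(42)
frame_scores = rng.dirichlet(np.ones(5), size=30)

feature_maps = random_fourier_features(frame_scores)
first_order = first_order_pool(frame_scores)
descriptor = second_order_pool(feature_maps)
print(first_order.shape, descriptor.shape)
```

Either vector could then be passed to a video-level classifier. The second-order descriptor grows quadratically with `num_features`, which is one reason to keep the linearized feature dimension small in a sketch like this.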

    Original language: English
    Title of host publication: Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 130-138
    Number of pages: 9
    ISBN (Electronic): 9781509048229
    Publication status: Published - 11 May 2017
    Event: 17th IEEE Winter Conference on Applications of Computer Vision, WACV 2017 - Santa Rosa, United States
    Duration: 24 Mar 2017 to 31 Mar 2017

    Publication series

    Name: Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017

    Conference

    Conference: 17th IEEE Winter Conference on Applications of Computer Vision, WACV 2017
    Country/Territory: United States
    City: Santa Rosa
    Period: 24/03/17 to 31/03/17
