Few-Shot Action Recognition with Permutation-Invariant Attention

Hongguang Zhang*, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H.S. Torr, Piotr Koniusz

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    72 Citations (Scopus)

    Abstract

    Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, relation descriptors are fed to the comparator with the goal of similarity learning between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute blocks of a clip and align the resulting attention regions with similarly permuted attention regions of the non-permuted clip to train the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets.
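
The pooling and permutation-alignment ideas in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the random feature vectors stand in for C3D block encodings, and the names `orderless_attention`, `context_attention`, `pool`, and `alignment_loss`, as well as the squared-error form of the loss, are assumptions made purely for illustration. The sketch shows that attention-weighted pooling does not depend on block order when attention scores each block independently, and that an attention function which looks at neighbouring blocks yields a non-zero gap between "attention of the permuted clip" and "permuted attention of the original clip", which is the kind of gap a self-supervised alignment term could penalise.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orderless_attention(blocks, w):
    """Hypothetical attention that scores each block on its own."""
    return softmax([dot(b, w) for b in blocks])


def context_attention(blocks, w, v):
    """Hypothetical attention that also looks at the previous block,
    so its weights depend on the order of the blocks."""
    zero = [0.0] * len(w)
    prev = [zero] + blocks[:-1]
    return softmax([dot(b, w) + dot(p, v) for b, p in zip(blocks, prev)])


def pool(blocks, weights):
    """Attention-weighted sum over blocks (an order-independent aggregate)."""
    dim = len(blocks[0])
    return [sum(a * b[d] for a, b in zip(weights, blocks)) for d in range(dim)]


def alignment_loss(attend, blocks, perm):
    """Squared gap between attention computed on the permuted clip and
    the identically permuted attention of the original clip (assumed form)."""
    shuffled = [blocks[i] for i in perm]
    att_of_permuted = attend(shuffled)
    original = attend(blocks)
    permuted_att = [original[i] for i in perm]
    return sum((a - b) ** 2 for a, b in zip(att_of_permuted, permuted_att))


rng = random.Random(0)
# 6 temporal blocks with 8-dimensional features (stand-ins for encoder outputs).
blocks = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
w = [rng.gauss(0, 1) for _ in range(8)]
v = [rng.gauss(0, 1) for _ in range(8)]
perm = [2, 0, 5, 1, 4, 3]  # a fixed, non-identity block permutation

shuffled = [blocks[i] for i in perm]
pooled = pool(blocks, orderless_attention(blocks, w))
pooled_shuffled = pool(shuffled, orderless_attention(shuffled, w))

loss_orderless = alignment_loss(lambda b: orderless_attention(b, w), blocks, perm)
loss_context = alignment_loss(lambda b: context_attention(b, w, v), blocks, perm)
```

Here `pooled` and `pooled_shuffled` agree up to floating-point rounding, and `loss_orderless` is numerically zero, whereas `loss_context` is positive. In a trainable model, minimising a term like `loss_context` would push an order-sensitive attention module toward the permutation-consistent behaviour the abstract describes.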

    Original language: English
    Title of host publication: Computer Vision – ECCV 2020 - 16th European Conference, Proceedings
    Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 525-542
    Number of pages: 18
    ISBN (Print): 9783030585570
    Publication status: Published - 2020
    Event: 16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
    Duration: 23 Aug 2020 to 28 Aug 2020

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 12350 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 16th European Conference on Computer Vision, ECCV 2020
    Country/Territory: United Kingdom
    City: Glasgow
    Period: 23/08/20 to 28/08/20
