Learning end-to-end video classification with rank-pooling

Basura Fernando, Stephen Gould

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    33 Citations (Scopus)

    Abstract

    We introduce a new model for representation learning and classification of video sequences. Our model is based on a convolutional neural network coupled with a novel temporal pooling layer. The temporal pooling layer relies on an inner-optimization problem to efficiently encode temporal semantics over arbitrarily long video clips into a fixed-length vector representation. Importantly, the representation and classification parameters of our model can be estimated jointly in an end-to-end manner by formulating learning as a bilevel optimization problem. Furthermore, the model can make use of any existing convolutional neural network architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. We demonstrate our approach on action and activity recognition tasks.
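
To illustrate how a pooling layer defined by an inner optimization can map a clip of any length to a fixed-length vector, here is a minimal NumPy sketch. The abstract does not state the objective, so this sketch assumes a ridge-regression form of rank pooling: smooth the per-frame features with a running mean, then fit a vector `u` whose scores on the smoothed frames approximate the frame index. The function name `rank_pool`, the parameter `reg`, and the choice of smoothing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rank_pool(frames, reg=1.0):
    """Pool a (T, D) array of per-frame features into a single (D,) vector.

    Assumed objective (illustrative only):
        u = argmin_u  sum_t (u . v_t - t)^2 + reg * ||u||^2
    where v_t is the running mean of the first t frames.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, dim = frames.shape

    # Running mean of the frame features up to each time step.
    counts = np.arange(1, num_frames + 1)[:, None]
    smoothed = np.cumsum(frames, axis=0) / counts

    # Regression targets are the time indices 1..T.
    targets = np.arange(1, num_frames + 1, dtype=float)

    # Closed-form ridge solution: (V^T V + reg * I) u = V^T t.
    lhs = smoothed.T @ smoothed + reg * np.eye(dim)
    rhs = smoothed.T @ targets
    return np.linalg.solve(lhs, rhs)


# Clips of different lengths yield vectors of the same size.
rng = np.random.default_rng(0)
short_clip = rng.normal(size=(10, 8))   # 10 frames, 8-dim features
long_clip = rng.normal(size=(200, 8))   # 200 frames, 8-dim features
pooled_short = rank_pool(short_clip)
pooled_long = rank_pool(long_clip)
```

In an end-to-end model of the kind the abstract describes, the per-frame features would come from a convolutional network and gradients would have to pass through the inner problem. The closed-form solve above is only one setting in which that is straightforward, and the paper itself should be consulted for the formulation actually used.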

    Original language: English
    Title of host publication: 33rd International Conference on Machine Learning, ICML 2016
    Editors: Kilian Q. Weinberger, Maria Florina Balcan
    Publisher: International Machine Learning Society (IMLS)
    Pages: 1823-1832
    Number of pages: 10
    ISBN (Electronic): 9781510829008
    Publication status: Published - 2016
    Event: 33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
    Duration: 19 Jun 2016 → 24 Jun 2016

    Publication series

    Name: 33rd International Conference on Machine Learning, ICML 2016
    Volume: 3

    Conference

    Conference: 33rd International Conference on Machine Learning, ICML 2016
    Country/Territory: United States
    City: New York City
    Period: 19/06/16 → 24/06/16
