Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Damien Teney, Peter Anderson, Xiaodong He, Anton Van Den Hengel

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    258 Citations (Scopus)

    Abstract

    Deep Learning has had a transformative impact on Computer Vision, but for all of the success there is also a significant cost. This is that the models and procedures used are so complex and intertwined that it is often impossible to distinguish the impact of the individual design and engineering choices each model embodies. This ambiguity diverts progress in the field, and leads to a situation where developing a state-of-the-art model is as much an art as a science. As a step towards addressing this problem we present a massive exploration of the effects of the myriad architectural and hyperparameter choices that must be made in generating a state-of-the-art model. The model is of particular interest because it won the 2017 Visual Question Answering Challenge. We provide a detailed analysis of the impact of each choice on model performance, in the hope that it will inform others in developing models, but also that it might set a precedent that will accelerate scientific progress in the field.

    Original language: English
    Title of host publication: Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
    Publisher: IEEE Computer Society
    Pages: 4223-4232
    Number of pages: 10
    ISBN (Electronic): 9781538664209
    Publication status: Published - 14 Dec 2018
    Event: 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 - Salt Lake City, United States
    Duration: 18 Jun 2018 to 22 Jun 2018

    Publication series

    Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    ISSN (Print): 1063-6919

    Conference

    Conference: 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
    Country/Territory: United States
    City: Salt Lake City
    Period: 18/06/18 to 22/06/18
