Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Frederic Z. Zhang, Dylan Campbell, Stephen Gould

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

    45 Citations (Scopus)

    Abstract

    Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.

    Original language: English
    Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
    Publisher: IEEE Computer Society
    Pages: 20072-20080
    Number of pages: 9
    ISBN (Electronic): 9781665469463
    ISBN (Print): 978-1-6654-6947-0
    Publication status: Published - 27 Sept 2022
    Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
    Duration: 19 Jun 2022 to 24 Jun 2022

    Publication series

    Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    Volume: 2022-June
    ISSN (Print): 1063-6919

    Conference

    Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
    Country/Territory: United States
    City: New Orleans
    Period: 19/06/22 to 24/06/22
