Bilinear attention networks for person retrieval

Pengfei Fang, Jieming Zhou, Soumava Roy, Lars Petersson, Mehrtash Harandi

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > peer-review

    169 Citations (Scopus)

    Abstract

    This paper investigates a novel Bilinear attention (Bi-attention) block, which discovers and uses second-order statistical information in an input feature map, for the purpose of person retrieval. The Bi-attention block uses bilinear pooling to model the local pairwise feature interactions along each channel while preserving the spatial structural information. We propose an Attention in Attention (AiA) mechanism to build inter-dependency among the second-order local and global features, with the intent to make better use of, or pay more attention to, such higher-order statistical relationships. The proposed network, equipped with the proposed Bi-attention block, is referred to as the Bilinear ATtention network (BAT-net). Our approach outperforms the current state of the art by a considerable margin across the standard benchmark datasets (e.g., CUHK03, Market-1501, DukeMTMC-reID and MSMT17).
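
    For readers unfamiliar with the term, the snippet below is a minimal sketch of classic bilinear (second-order) pooling on a toy feature map. It is not the paper's Bi-attention block or its AiA mechanism. The function names, the [C][H][W] nested-list layout, and the toy values are all assumptions made for illustration. It shows only how an outer product of each location's channel vector captures pairwise feature interactions, and how averaging those products gives a global second-order descriptor.

```python
def local_second_order(feature_map):
    """Outer product x x^T of the channel vector at every spatial location.

    feature_map: nested list with the assumed layout [C][H][W].
    Returns a nested list of shape [H][W][C][C].
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    result = []
    for i in range(height):
        row = []
        for j in range(width):
            # Channel vector at location (i, j).
            x = [feature_map[c][i][j] for c in range(channels)]
            # Pairwise products between all channel responses.
            row.append([[a * b for b in x] for a in x])
        result.append(row)
    return result


def global_bilinear_pool(feature_map):
    """Average the local outer products over all locations into a [C][C] matrix."""
    local = local_second_order(feature_map)
    height, width = len(local), len(local[0])
    channels = len(feature_map)
    pooled = [[0.0] * channels for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            for p in range(channels):
                for q in range(channels):
                    pooled[p][q] += local[i][j][p][q] / (height * width)
    return pooled


# Hypothetical toy input: 2 channels on a 1x2 spatial grid.
toy = [[[1.0, 2.0]], [[3.0, 4.0]]]
local = local_second_order(toy)    # keeps one C x C matrix per location
pooled = global_bilinear_pool(toy)  # collapses locations into one C x C matrix
```

    Note that the per-location output keeps the spatial grid, whereas the pooled output discards it. The abstract states that the Bi-attention block preserves spatial structure, which is the property the global variant lacks.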

    Original language: English
    Title of host publication: Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 8029-8038
    Number of pages: 10
    ISBN (Electronic): 9781728148038
    DOIs
    Publication status: Published - Oct 2019
    Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
    Duration: 27 Oct 2019 to 2 Nov 2019

    Publication series

    Name: Proceedings of the IEEE International Conference on Computer Vision
    Volume: 2019-October
    ISSN (Print): 1550-5499

    Conference

    Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
    Country/Territory: Korea, Republic of
    City: Seoul
    Period: 27/10/19 to 2/11/19
