Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Yunzhong Hou, Liang Zheng*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    35 Citations (Scopus)

    Abstract

    Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behaviors might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
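The idea of view-coherent augmentation described in the abstract can be illustrated with a small sketch. It is not taken from the MVDeTr code base: it assumes each view is mapped to the ground plane by a 3x3 homography, and every name in it (`random_affine`, `augment_view`, the example matrix) is hypothetical. When a random affine transform `A` is applied to an image, the image-to-ground mapping `H` is replaced by `H · A⁻¹`, so an augmented pixel still lands on the same ground-plane location.

```python
import random


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse_3x3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_homography(h, point):
    """Map a 2D point through a 3x3 homography (with perspective divide)."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def random_affine(rng, max_shift=20.0, scale_range=(0.8, 1.2)):
    """Hypothetical augmentation: random uniform scale plus translation."""
    s = rng.uniform(*scale_range)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    return [[s, 0.0, tx], [0.0, s, ty], [0.0, 0.0, 1.0]]


def augment_view(image_to_ground, rng):
    """Sample an augmentation and return it with the compensated homography.

    The compensated mapping is H @ inv(A), so that
    (H @ inv(A)) applied to (A applied to p) equals H applied to p.
    """
    aug = random_affine(rng)
    compensated = matmul(image_to_ground, inverse_3x3(aug))
    return aug, compensated


# Assumed example values, chosen only for illustration.
rng = random.Random(0)
H = [[0.01, 0.002, -3.0], [0.001, 0.02, -5.0], [0.0, 0.0005, 1.0]]
pixel = (320.0, 240.0)

ground_before = apply_homography(H, pixel)
A, H_aug = augment_view(H, rng)
pixel_aug = apply_homography(A, pixel)
ground_after = apply_homography(H_aug, pixel_aug)
```

In this sketch `ground_before` and `ground_after` agree up to floating-point error, which is the consistency property the abstract refers to. Each camera can draw its own `A` independently, since the compensation is applied per view.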

    Original language: English
    Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
    Publisher: Association for Computing Machinery, Inc
    Pages: 1673-1682
    Number of pages: 10
    ISBN (Electronic): 9781450386517
    Publication status: Published - 17 Oct 2021
    Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
    Duration: 20 Oct 2021 to 24 Oct 2021

    Publication series

    Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

    Conference

    Conference: 29th ACM International Conference on Multimedia, MM 2021
    Country/Territory: China
    City: Virtual, Online
    Period: 20/10/21 to 24/10/21
