Multiview Detection with Feature Perspective Transformation

Yunzhong Hou, Liang Zheng*, Stephen Gould

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    56 Citations (Scopus)

    Abstract

    Incorporating multiple camera views for detection alleviates the impact of occlusions in crowded scenes. In a multiview detection system, we need to answer two important questions. First, how should we aggregate cues from multiple views? Second, how should we aggregate information from spatially neighboring locations? To address these questions, we introduce a novel multiview detector, MVDet. During multiview aggregation, for each location on the ground, existing methods use multiview anchor box features as the representation, which potentially limits performance because pre-defined anchor boxes can be inaccurate. In contrast, via feature map perspective transformation, MVDet employs anchor-free representations with feature vectors directly sampled from corresponding pixels in multiple views. For spatial aggregation, unlike previous methods that require design and operations outside of neural networks, MVDet takes a fully convolutional approach with large convolutional kernels on the multiview aggregated feature map. The proposed model is end-to-end learnable and achieves 88.2% MODA on the Wildtrack dataset, outperforming the state-of-the-art by 14.1%. We also provide a detailed analysis of MVDet on a newly introduced synthetic dataset, MultiviewX, which allows us to control the level of occlusion. Code and the MultiviewX dataset are available at https://github.com/hou-yz/MVDet.
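
The anchor-free aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function names `warp_to_ground` and `aggregate_views` are hypothetical, each view is assumed to come with a known 3x3 homography that maps ground-plane grid coordinates to feature-map pixel coordinates, and nearest-neighbor sampling is used here purely to keep the example short.

```python
import numpy as np


def warp_to_ground(feat, H_img_from_ground, grid_hw):
    """Sample a (C, H, W) feature map onto a ground-plane grid.

    H_img_from_ground is an assumed 3x3 homography taking ground-grid
    coordinates (x, y, 1) to feature-map pixel coordinates (u, v, 1).
    Ground cells that fall outside the view are left as zeros.
    """
    C, H, W = feat.shape
    gh, gw = grid_hw

    # Homogeneous coordinates of every ground-plane cell, shape (3, gh * gw).
    ys, xs = np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij")
    ground = np.stack([xs.ravel(), ys.ravel(), np.ones(gh * gw)], axis=0)

    # Project each ground cell into the image feature map.
    pix = H_img_from_ground @ ground
    w = pix[2]
    valid = w > 1e-8  # keep only cells with a positive projective scale

    u = np.full(gh * gw, -1.0)
    v = np.full(gh * gw, -1.0)
    u[valid] = pix[0, valid] / w[valid]
    v[valid] = pix[1, valid] / w[valid]

    # Nearest-neighbor lookup (a simplification chosen for this sketch).
    ui = np.rint(u).astype(int)
    vi = np.rint(v).astype(int)
    inside = valid & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    out = np.zeros((C, gh * gw), dtype=feat.dtype)
    out[:, inside] = feat[:, vi[inside], ui[inside]]
    return out.reshape(C, gh, gw)


def aggregate_views(feats, homographies, grid_hw):
    """Warp every view to the ground plane and concatenate along channels."""
    warped = [warp_to_ground(f, H, grid_hw) for f, H in zip(feats, homographies)]
    return np.concatenate(warped, axis=0)
```

Each ground cell then holds one feature vector per view, sampled at the pixel that the homography assigns to it, with no anchor boxes involved. A stack of large-kernel convolutions could be applied to the concatenated map to cover the spatial aggregation step.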

    Original language: English
    Title of host publication: Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
    Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 1-18
    Number of pages: 18
    ISBN (Print): 9783030585709
    Publication status: Published - 2020
    Event: 16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
    Duration: 23 Aug 2020 to 28 Aug 2020

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 12352 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 16th European Conference on Computer Vision, ECCV 2020
    Country/Territory: United Kingdom
    City: Glasgow
    Period: 23/08/20 to 28/08/20
