Learning a perspective-embedded deconvolution network for crowd counting

Muming Zhao, Jian Zhang, Fatih Porikli, Chongyang Zhang, Wenjun Zhang

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    8 Citations (Scopus)

    Abstract

    We present a novel deep learning framework for crowd counting by learning a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike traditional approaches that use perspective only as a separate normalization step, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injected perspective, our network learns to adaptively incorporate the underlying geometric constraints of the scene, enabling an accurate mapping from high-level feature maps to the pixel-wise crowd density map. In addition, our network generates density maps for inputs of arbitrary size in an end-to-end fashion. The proposed method achieves competitive results on the WorldExpo2010 crowd dataset.
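    The abstract does not specify the fusion operator, layer configuration or training details, so the following NumPy sketch only illustrates the data flow of layer-wise perspective fusion under stated assumptions. It is not the authors' network. The assumptions are: a hypothetical perspective map that grows linearly from the top row to the bottom row; average pooling to build perspective maps at each decoder resolution; fusion by appending the perspective map as an extra channel; nearest-neighbour upsampling in place of learned deconvolution; and random, untrained 1x1 weights. All function names are hypothetical. The crowd count is read off as the sum of the predicted density map.

```python
import numpy as np


def linear_perspective_map(height, width, top=0.5, bottom=3.0):
    """Hypothetical perspective map: the per-pixel scale grows linearly from
    the top row to the bottom row (illustrative values, not dataset values)."""
    rows = np.linspace(top, bottom, height, dtype=np.float32)
    return np.repeat(rows[:, None], width, axis=1)


def avg_pool2(x):
    """2x2 average pooling over a 2-D map (H and W assumed even)."""
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])


def perspective_pyramid(pmap, levels):
    """Perspective maps at `levels` resolutions, full resolution first."""
    pyramid = [pmap]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid


def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) tensor; a fixed
    stand-in for a learned deconvolution layer (assumption)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weights):
    """1x1 convolution: mix the channels of a (C_in, H, W) tensor
    with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weights, x)


def decode(features, pyramid, rng):
    """Walk from the coarsest to the finest resolution, fusing the matching
    perspective map at every stage by channel concatenation (assumed operator)."""
    channels = features.shape[0]
    x = features
    for i, pmap in enumerate(reversed(pyramid)):
        x = np.concatenate([x, pmap[None]], axis=0)               # layer-wise fusion
        w = rng.standard_normal((channels, channels + 1)) * 0.1  # untrained weights
        x = np.maximum(conv1x1(x, w), 0.0)                       # ReLU
        if i < len(pyramid) - 1:
            x = upsample2(x)                                     # go to next finer level
    w_out = rng.standard_normal((1, channels)) * 0.1
    return np.maximum(conv1x1(x, w_out), 0.0)[0]                 # non-negative density map


def count_from_density(density):
    """The crowd count is the integral (sum) of the density map."""
    return float(density.sum())


# Usage: H and W must be divisible by 2 ** (levels - 1) in this sketch.
rng = np.random.default_rng(0)
H, W, levels, C = 64, 96, 4, 8
pmap = linear_perspective_map(H, W)
pyramid = perspective_pyramid(pmap, levels)
# Random stand-in for the encoder's high-level feature maps.
top = rng.standard_normal((C, H >> (levels - 1), W >> (levels - 1)))
density = decode(top, pyramid, rng)
print(density.shape)  # (64, 96)
print(count_from_density(density))
```

    Because the sketch uses fixed 2x upsampling, it only accepts inputs whose sides are divisible by 2 ** (levels - 1); handling truly arbitrary sizes, as the abstract describes, would need something extra such as padding and cropping, which is not shown here.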

    Original language: English
    Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
    Publisher: IEEE Computer Society
    Pages: 403-408
    Number of pages: 6
    ISBN (Electronic): 9781509060672
    DOIs
    Publication status: Published - 28 Aug 2017
    Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
    Duration: 10 Jul 2017 to 14 Jul 2017

    Publication series

    Name: Proceedings - IEEE International Conference on Multimedia and Expo
    ISSN (Print): 1945-7871
    ISSN (Electronic): 1945-788X

    Conference

    Conference: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
    Country/Territory: Hong Kong
    City: Hong Kong
    Period: 10/07/17 to 14/07/17
