Scale-aware crowd counting via depth-embedded convolutional neural networks

Muming Zhao, Chongyang Zhang*, Jian Zhang, Fatih Porikli, Bingbing Ni, Wenjun Zhang

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    21 Citations (Scopus)

    Abstract

    Scale variation of pedestrians in a crowd image presents a significant challenge for vision-based people counting systems. Such variations are mainly caused by perspective distortion arising from the camera pose relative to the ground plane. Following the density-based counting paradigm, we postulate that generating density values adaptive to object scales plays a critical role in the accuracy of the final counting results. Motivated by this, we distill information from depth cues to obtain scale-aware representations that respond to object scales, exploiting the fact that an object's scale is inversely proportional to its depth. Specifically, we propose a depth embedding module as an add-on to existing networks. This module exploits depth cues to spatially re-calibrate the magnitude of the original features. In this way, objects of the same class attain distinct representations according to their scales, which directly benefits the estimation of scale-aware density values. We conduct a comprehensive analysis of the effects of the depth embedding module and validate that exploiting depth cues to perceive object scale variations in convolutional neural networks improves crowd counting performance. Our experiments demonstrate the effectiveness of the proposed approach on four popular benchmark datasets.
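
    The idea of spatially re-calibrating feature magnitudes with depth can be illustrated with a minimal sketch. This is not the authors' depth embedding module, whose architecture the abstract does not specify. It assumes a hypothetical per-pixel weight equal to the smallest depth divided by the local depth, so the weight follows the stated inverse relation between scale and depth. The function names `scale_weights`, `recalibrate`, and `count_from_density` are illustrative only.

```python
def scale_weights(depth):
    """Return an H x W map of weights in (0, 1], proportional to 1 / depth.

    Assumption: the nearest pixel (smallest depth, largest apparent scale)
    gets weight 1.0; this normalisation is chosen for illustration only.
    """
    d_min = min(min(row) for row in depth)
    if d_min <= 0:
        raise ValueError("depth values must be positive")
    return [[d_min / d for d in row] for row in depth]


def recalibrate(features, depth):
    """Multiply every channel of a C x H x W feature map by the weight map."""
    weights = scale_weights(depth)
    return [
        [[value * weights[i][j] for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for channel in features
    ]


def count_from_density(density):
    """Density-based counting: the estimated count is the sum of the map."""
    return sum(sum(row) for row in density)


# Toy input: one feature channel on a 2 x 2 grid, with depth growing
# toward the bottom-right corner of the image.
features = [[[1.0, 1.0], [1.0, 1.0]]]
depth = [[2.0, 2.0], [4.0, 8.0]]
recalibrated = recalibrate(features, depth)
```

    In this toy case, identical input features end up with different magnitudes depending on depth, which is the effect the abstract describes for objects of the same class at different scales.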

    Original language: English
    Article number: 8846233
    Pages (from-to): 3651-3662
    Number of pages: 12
    Journal: IEEE Transactions on Circuits and Systems for Video Technology
    Volume: 30
    Issue number: 10
    Publication status: Published - Oct 2020