TY - GEN
T1 - Multiview Detection with Cardboard Human Modeling
AU - Ma, Jiahao
AU - Duan, Zicheng
AU - Zheng, Liang
AU - Nguyen, Chuong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Multiview detection uses multiple calibrated cameras with overlapping fields of view to locate occluded pedestrians. In this field, existing methods typically adopt a “human modeling–aggregation” strategy. To find robust pedestrian representations, some intuitively incorporate 2D perception results from each frame, while others use entire-frame features projected to the ground plane. However, the former does not consider human appearance and leads to many ambiguities, while the latter suffers from projection errors due to the lack of accurate height estimates for the human torso and head. In this paper, we propose a new pedestrian representation scheme based on human point cloud modeling. Specifically, using ray tracing for holistic human depth estimation, we model pedestrians as upright, thin cardboard point clouds on the ground. Then, we aggregate the pedestrian cardboard point clouds across multiple views for a final decision. Compared with existing representations, the proposed method explicitly leverages human appearance and significantly reduces projection errors through relatively accurate height estimation. On four standard evaluation benchmarks, our method achieves very competitive results. The code and data are available at https://github.com/Jiahao-Ma/MvCHM.
KW - Multi-view detection
KW - Pedestrian detection
UR - http://www.scopus.com/inward/record.url?scp=85212941219&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-0960-4_4
DO - 10.1007/978-981-96-0960-4_4
M3 - Conference contribution
AN - SCOPUS:85212941219
SN - 9789819609598
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 53
EP - 70
BT - Computer Vision – ACCV 2024 – 17th Asian Conference on Computer Vision, Proceedings
A2 - Cho, Minsu
A2 - Laptev, Ivan
A2 - Tran, Du
A2 - Yao, Angela
A2 - Zha, Hongbin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th Asian Conference on Computer Vision, ACCV 2024
Y2 - 8 December 2024 through 12 December 2024
ER -