TY - GEN
T1 - 3D Human Pose Estimation with 2D Human Pose and Depthmap
AU - Zhou, Zhiheng
AU - Cao, Yue
AU - Zhu, Xuanying
AU - Gardner, Henry
AU - Li, Hongdong
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Three-dimensional human pose estimation models are conventionally based on RGB images or on the assumption that accurately estimated (near ground truth) 2D human pose landmarks are available. Naturally, such data contains information about only two dimensions, while 3D poses require the three dimensions of height, width, and depth. In this paper, we propose a new 3D human pose estimation model that takes an estimated 2D pose and the depthmap of that pose as input to estimate the 3D human pose. In our system, the estimated 2D pose is obtained by processing an RGB image with a 2D landmark detection network that produces noisy heatmap data. We compare our results with a Simple Linear Model (SLM) of other authors that takes accurately estimated 2D pose landmarks as input and that has reached state-of-the-art results for 3D human pose estimation on the Human3.6M dataset. Our results show that our model achieves better performance than the SLM and that it can align the 2D landmark data with the depthmap automatically. We have also tested our network using estimated 2D poses and depthmaps separately. In our model, all three conditions (depthmap+2D pose, depthmap-only, and 2D pose-only) are more accurate than the SLM, with, surprisingly, the depthmap-only condition being comparable in accuracy to the depthmap+2D pose condition.
AB - Three-dimensional human pose estimation models are conventionally based on RGB images or on the assumption that accurately estimated (near ground truth) 2D human pose landmarks are available. Naturally, such data contains information about only two dimensions, while 3D poses require the three dimensions of height, width, and depth. In this paper, we propose a new 3D human pose estimation model that takes an estimated 2D pose and the depthmap of that pose as input to estimate the 3D human pose. In our system, the estimated 2D pose is obtained by processing an RGB image with a 2D landmark detection network that produces noisy heatmap data. We compare our results with a Simple Linear Model (SLM) of other authors that takes accurately estimated 2D pose landmarks as input and that has reached state-of-the-art results for 3D human pose estimation on the Human3.6M dataset. Our results show that our model achieves better performance than the SLM and that it can align the 2D landmark data with the depthmap automatically. We have also tested our network using estimated 2D poses and depthmaps separately. In our model, all three conditions (depthmap+2D pose, depthmap-only, and 2D pose-only) are more accurate than the SLM, with, surprisingly, the depthmap-only condition being comparable in accuracy to the depthmap+2D pose condition.
KW - 3D Pose Estimation
KW - Convolutional Neural Network
KW - Depthmap
UR - http://www.scopus.com/inward/record.url?scp=85097269921&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63820-7_30
DO - 10.1007/978-3-030-63820-7_30
M3 - Conference contribution
SN - 9783030638191
T3 - Communications in Computer and Information Science
SP - 267
EP - 274
BT - Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
A2 - Yang, Haiqin
A2 - Pasupa, Kitsuchart
A2 - Leung, Andrew Chi-Sing
A2 - Kwok, James T.
A2 - Chan, Jonathan H.
A2 - King, Irwin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Neural Information Processing, ICONIP 2020
Y2 - 18 November 2020 through 22 November 2020
ER -
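
The abstract above describes a network that fuses noisy 2D landmark heatmaps with a depthmap to regress 3D joint positions. As a rough illustration only, the following is a minimal PyTorch-style sketch of such a heatmap+depthmap fusion regressor; it is not the authors' architecture. The class name HeatmapDepthFusionNet, the layer sizes, the 17-joint layout (as in Human3.6M), and the 64x64 input resolution are all assumptions made for this sketch.

# Minimal sketch (not the paper's actual model): fuse per-joint 2D heatmaps
# with a depthmap by channel-wise concatenation, then regress 3D joints.
# Joint count, resolution, and layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class HeatmapDepthFusionNet(nn.Module):
    def __init__(self, num_joints: int = 17, heatmap_size: int = 64):
        super().__init__()
        # Input: num_joints heatmap channels + 1 depthmap channel.
        in_channels = num_joints + 1
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 128, 1, 1)
        )
        # Regress (x, y, z) for every joint from the pooled features.
        self.head = nn.Linear(128, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, heatmaps: torch.Tensor, depthmap: torch.Tensor) -> torch.Tensor:
        # heatmaps: (B, num_joints, H, W) noisy 2D landmark heatmaps
        # depthmap: (B, 1, H, W) depthmap in the same image frame
        x = torch.cat([heatmaps, depthmap], dim=1)
        feats = self.backbone(x).flatten(1)
        return self.head(feats).view(-1, self.num_joints, 3)


if __name__ == "__main__":
    model = HeatmapDepthFusionNet()
    hm = torch.randn(2, 17, 64, 64)  # dummy heatmaps
    dm = torch.randn(2, 1, 64, 64)   # dummy depthmap
    print(model(hm, dm).shape)       # torch.Size([2, 17, 3])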