TY - GEN
T1 - Human Pose Forecasting via Deep Markov Models
AU - Toyer, Sam
AU - Cherian, Anoop
AU - Han, Tengda
AU - Gould, Stephen
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/19
Y1 - 2017/12/19
N2 - Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving. Usually, forecasting algorithms use 3D skeleton sequences and are trained to forecast for a few milliseconds into the future. Long-range forecasting is challenging due to the difficulty of estimating how long a person continues an activity. To this end, our contributions are threefold: (i) we propose a generative framework for poses using variational autoencoders based on Deep Markov Models (DMMs); (ii) we evaluate our pose forecasts using a pose-based action classifier, which we argue better reflects the subjective quality of pose forecasts than distance in coordinate space; (iii) last, for evaluation of the new model, we introduce a 480,000-frame video dataset called Ikea Furniture Assembly (Ikea FA), which depicts humans repeatedly assembling and disassembling furniture. We demonstrate promising results for our approach on both Ikea FA and the existing NTU RGB+D dataset.
UR - http://www.scopus.com/inward/record.url?scp=85048254184&partnerID=8YFLogxK
U2 - 10.1109/DICTA.2017.8227441
DO - 10.1109/DICTA.2017.8227441
M3 - Conference contribution
T3 - DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications
SP - 1
EP - 8
BT - DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications
A2 - Guo, Yi
A2 - Murshed, Manzur
A2 - Wang, Zhiyong
A2 - Feng, David Dagan
A2 - Li, Hongdong
A2 - Cai, Weidong Tom
A2 - Gao, Junbin
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2017
Y2 - 29 November 2017 through 1 December 2017
ER -