TY - GEN
T1 - Instance-aware detailed action labeling in videos
AU - Yang, Hongtao
AU - He, Xuming
AU - Porikli, Fatih
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/5/3
Y1 - 2018/5/3
N2 - We address the problem of detailed sequence labeling of complex activities in videos, which aims to assign an action label to every frame. Previous work typically focuses on predicting action class labels for each frame in a sequence without reasoning about action instances. However, such category-level labeling is inefficient at encoding the global constraints at the action instance level and tends to produce inconsistent results. In this work, we consider a fusion approach that exploits the synergy between action detection and sequence labeling for complex activities. To this end, we propose an instance-aware sequence labeling method that utilizes cues from action instance detection. In particular, we design an LSTM-based fusion network that integrates framewise action labeling and action instance prediction to produce a final, consistent labeling. To evaluate our method, we create GADD, a large-scale RGBD video dataset of gym activities for sequence labeling and action detection. The experimental results on the GADD dataset show that our method consistently outperforms all the state-of-the-art methods in terms of labeling accuracy.
AB - We address the problem of detailed sequence labeling of complex activities in videos, which aims to assign an action label to every frame. Previous work typically focuses on predicting action class labels for each frame in a sequence without reasoning about action instances. However, such category-level labeling is inefficient at encoding the global constraints at the action instance level and tends to produce inconsistent results. In this work, we consider a fusion approach that exploits the synergy between action detection and sequence labeling for complex activities. To this end, we propose an instance-aware sequence labeling method that utilizes cues from action instance detection. In particular, we design an LSTM-based fusion network that integrates framewise action labeling and action instance prediction to produce a final, consistent labeling. To evaluate our method, we create GADD, a large-scale RGBD video dataset of gym activities for sequence labeling and action detection. The experimental results on the GADD dataset show that our method consistently outperforms all the state-of-the-art methods in terms of labeling accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85050980536&partnerID=8YFLogxK
U2 - 10.1109/WACV.2018.00175
DO - 10.1109/WACV.2018.00175
M3 - Conference contribution
T3 - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
SP - 1577
EP - 1586
BT - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Y2 - 12 March 2018 through 15 March 2018
ER -