TY - GEN
T1 - Multi-level action detection via learning latent structure
AU - Bozorgtabar, Behzad
AU - Goecke, Roland
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/9
Y1 - 2015/12/9
N2 - Detecting actions in videos is still a demanding task due to large intra-class variation caused by varying pose, motion and scale. Conventional approaches use a Bag-of-Words model, pooling space-time motion features and then learning a classifier. However, since informative body-part motion appears only in specific regions of the body, these methods have limited capability. In this paper, we seek to learn a model of the interactions among regions of interest via a graph structure. We first discover several space-time video segments representing persistently moving body parts observed sparsely in the video. Then, by learning the hidden graph structure (a subset of the graph), we identify both spatial and temporal relations between subsets of these segments. To capture the most discriminative motion patterns and handle the different interactions between body parts, from simple to composite actions, we present a multi-level action model representation. Consequently, for action classification, the classifier learned for each action model labels the test video according to the action model that yields the highest probability score. Experiments on challenging datasets such as MSR II and UCF-Sports, which include complex motions and dynamic backgrounds, demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art methods in this context.
AB - Detecting actions in videos is still a demanding task due to large intra-class variation caused by varying pose, motion and scale. Conventional approaches use a Bag-of-Words model, pooling space-time motion features and then learning a classifier. However, since informative body-part motion appears only in specific regions of the body, these methods have limited capability. In this paper, we seek to learn a model of the interactions among regions of interest via a graph structure. We first discover several space-time video segments representing persistently moving body parts observed sparsely in the video. Then, by learning the hidden graph structure (a subset of the graph), we identify both spatial and temporal relations between subsets of these segments. To capture the most discriminative motion patterns and handle the different interactions between body parts, from simple to composite actions, we present a multi-level action model representation. Consequently, for action classification, the classifier learned for each action model labels the test video according to the action model that yields the highest probability score. Experiments on challenging datasets such as MSR II and UCF-Sports, which include complex motions and dynamic backgrounds, demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art methods in this context.
KW - Action detection
KW - latent structure
KW - multi-level video representation
KW - pooling regions
UR - http://www.scopus.com/inward/record.url?scp=84956698372&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2015.7351354
DO - 10.1109/ICIP.2015.7351354
M3 - Conference contribution
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 3004
EP - 3008
BT - 2015 IEEE International Conference on Image Processing, ICIP 2015 - Proceedings
PB - IEEE Computer Society
T2 - IEEE International Conference on Image Processing, ICIP 2015
Y2 - 27 September 2015 through 30 September 2015
ER -