TY - JOUR
T1 - Breaking video into pieces for action recognition
AU - Zheng, Ying
AU - Yao, Hongxun
AU - Sun, Xiaoshuai
AU - Jiang, Xuesong
AU - Porikli, Fatih
N1 - Publisher Copyright:
© 2017, Springer Science+Business Media, LLC.
PY - 2017/11/1
Y1 - 2017/11/1
N2 - We present a simple yet effective approach for human action recognition. Most of the existing solutions based on multi-class action classification aim to assign a class label to the input video. However, the variety and complexity of real-life videos make it very challenging to achieve high classification accuracy. To address this problem, we propose to partition the input video into small clips and formulate action recognition as a joint decision-making task. First, we partition each video into two equal segments that are processed in the same manner. We repeat this procedure to obtain three layers of video subsegments, which are then organized in a binary tree structure. We train separate classifiers for each layer. By applying the corresponding classifiers to the video subsegments, we obtain a decision value matrix (DVM). Then, we construct an aggregated representation for the original full-length video by integrating the elements of the DVM. Finally, we train a new action recognition classifier based on the DVM representation. Our extensive experimental evaluations demonstrate that the proposed method achieves significant performance improvements over several competing methods on two benchmark datasets.
KW - Action recognition
KW - Decision value matrix
KW - Video partition
KW - Video representation
UR - http://www.scopus.com/inward/record.url?scp=85026770306&partnerID=8YFLogxK
U2 - 10.1007/s11042-017-5038-6
DO - 10.1007/s11042-017-5038-6
M3 - Article
SN - 1380-7501
VL - 76
SP - 22195
EP - 22212
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 21
ER -