TY - JOUR
T1 - 3DInAction
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Ben-Shabat, Yizhak
AU - Shrout, Oren
AU - Gould, Stephen
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored despite the clear value that 3D information may bring. This is mostly due to the inherent limitation of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which makes it difficult to learn a spatio-temporal representation. To address this limitation, we propose the 3DinAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://github.com/sitzikbs/3dincaction.
KW - 3D action recognition
KW - point clouds
KW - spatio-temporal representation
KW - temporal patches
UR - http://www.scopus.com/inward/record.url?scp=85218340761&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.01888
DO - 10.1109/CVPR52733.2024.01888
M3 - Conference article
AN - SCOPUS:85218340761
SN - 1063-6919
SP - 19978
EP - 19987
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Y2 - 16 June 2024 through 22 June 2024
ER -