
Abstract

With recent developments in sensor technology and pose estimation algorithms, skeleton-based action recognition has become popular. Classical machine learning methods based on hand-crafted features fail on large-scale datasets due to their limited representation power. Recent recurrent neural network (RNN) based methods focus on the temporal evolution of body joints and neglect the geometric relations between them. In this paper, we propose eleven quadrilaterals to capture the geometric relations among joints for action recognition. An end-to-end 3-layer Bi-LSTM network is designed as the Base-Net to learn robust representations. We propose two subnets based on the Base-Net to extract discriminative spatio-temporal features. Specifically, the first subnet (SQuadNet) uses four spatial features and the second (TQuadNet) uses two temporal features. Empirical results on two benchmark datasets, NTU RGB+D and UTD-MHAD, show that our method achieves state-of-the-art performance compared to recent methods in the literature.

Keywords

Action recognition, skeleton maps, quadrilateral, geometric features, LSTM
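
The abstract does not say which joints form the eleven quadrilaterals, nor which four spatial and two temporal features are computed from them. As a rough illustration of the general idea only, the sketch below describes one quadrilateral of four 3D joints by its side and diagonal lengths, and describes motion as the frame-to-frame change of that descriptor. The function names, the feature choice, and the joint coordinates are all assumptions made for this example, not the paper's definitions.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def quad_features(j1, j2, j3, j4):
    """Assumed spatial descriptor of the quadrilateral j1-j2-j3-j4:
    four side lengths followed by the two diagonal lengths."""
    sides = [dist(j1, j2), dist(j2, j3), dist(j3, j4), dist(j4, j1)]
    diagonals = [dist(j1, j3), dist(j2, j4)]
    return sides + diagonals

def temporal_features(prev, curr):
    """Assumed temporal descriptor: per-feature change between two frames."""
    return [c - p for p, c in zip(prev, curr)]

# Hypothetical joint positions for two consecutive frames.
frame_t0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
frame_t1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

spatial_t0 = quad_features(*frame_t0)
spatial_t1 = quad_features(*frame_t1)
motion = temporal_features(spatial_t0, spatial_t1)
```

Computing one such vector per frame gives a sequence of fixed-length descriptors, which is the kind of input a recurrent model such as a Bi-LSTM consumes.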


Learning representations from quadrilateral based geometric features for skeleton-based action recognition using LSTM networks
