ﻻ يوجد ملخص باللغة العربية
Previous methods for skeleton-based gesture recognition mostly arrange the skeleton sequence into a pseudo picture or spatial-temporal graph and apply deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN) for feature extraction. Although achieving superior results, these methods have inherent limitations in dynamically capturing local features of interactive hand parts, and the computing efficiency still remains a serious issue. In this work, the self-attention mechanism is introduced to alleviate this problem. Considering the hierarchical structure of hand joints, we propose an efficient hierarchical self-attention network (HAN) for skeleton-based gesture recognition, which is based on pure self-attention without any CNN, RNN or GCN operators. Specifically, the joint self-attention module is used to capture spatial features of fingers, the finger self-attention module is designed to aggregate features of the whole hand. In terms of temporal features, the temporal self-attention module is utilized to capture the temporal dynamics of the fingers and the entire hand. Finally, these features are fused by the fusion self-attention module for gesture classification. Experiments show that our method achieves competitive results on three gesture recognition datasets with much lower computational complexity.
Wearable sensor based human activity recognition is a challenging problem due to difficulty in modeling spatial and temporal dependencies of sensor signals. Recognition models in closed-set assumption are forced to yield members of known activity cla
Deep learning approaches deliver state-of-the-art performance in recognition of spatiotemporal human motion data. However, one of the main challenges in these recognition tasks is limited available training data. Insufficient training data results in
Gesture recognition is a fundamental tool to enable novel interaction paradigms in a variety of application scenarios like Mixed Reality environments, touchless public kiosks, entertainment systems, and more. Recognition of hand gestures can be nowad
By extracting spatial and temporal characteristics in one network, the two-stream ConvNets can achieve the state-of-the-art performance in action recognition. However, such a framework typically suffers from the separately processing of spatial and t
Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of the human skeleton data. Recently, there is a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without