ﻻ يوجد ملخص باللغة العربية
Human action recognition is an active research area in computer vision. Although great process has been made, previous methods mostly recognize actions based on depth data at only one scale, and thus they often neglect multi-scale features that provide additional information action recognition in practical application scenarios. In this paper, we present a novel framework focusing on multi-scale motion information to recognize human actions from depth video sequences. We propose a multi-scale feature map called Laplacian pyramid depth motion images(LP-DMI). We employ depth motion images (DMI) as the templates to generate the multi-scale static representation of actions. Then, we caculate LP-DMI to enhance multi-scale dynamic information of motions and reduces redundant static information in human bodies. We further extract the multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we utilize extreme learning machine (ELM) for action classification. The proposed method yeilds the recognition accuracy of 93.41%, 85.12%, 91.94% on public MSRAction3D dataset, UTD-MHAD and DHA dataset. Through extensive experiments, we prove that our method outperforms state-of-the-art benchmarks.
We investigate the problem of representing an entire video using CNN features for human action recognition. Currently, limited by GPU memory, we have not been able to feed a whole video into CNN/RNNs for end-to-end learning. A common practice is to u
Learning actions from human demonstration video is promising for intelligent robotic systems. Extracting the exact section and re-observing the extracted video section in detail is important for imitating complex skills because human motions give val
Graph convolutional networks (GCNs) can effectively capture the features of related nodes and improve the performance of the model. More attention is paid to employing GCN in Skeleton-Based action recognition. But existing methods based on GCNs have
Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming. On the other hand, large amount of weakly-labeled images are uploaded to the
Getting the distance to objects is crucial for autonomous vehicles. In instances where depth sensors cannot be used, this distance has to be estimated from RGB cameras. As opposed to cars, the task of estimating depth from on-board mounted cameras is