Duodepth: Static Gesture Recognition Via Dual Depth Sensors

75 0 0.0 ( 0 )

Download Cite

Added by Ilya Chugunov

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Ilya Chugunov - Avideh Zakhor

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Static gesture recognition is an effective non-verbal communication channel between a user and their devices; however many modern methods are sensitive to the relative pose of the users hands with respect to the capture device, as parts of the gesture can become occluded. We present two methodologies for gesture recognition via synchronized recording from two depth cameras to alleviate this occlusion problem. One is a more classic approach using iterative closest point registration to accurately fuse point clouds and a single PointNet architecture for classification, and the other is a dual Point-Net architecture for classification without registration. On a manually collected data-set of 20,100 point clouds we show a 39.2% reduction in misclassification for the fused point cloud method, and 53.4% for the dual PointNet, when compared to a standard single camera pipeline.

rate research

A deep-learning--based multimodal depth-aware dynamic hand gesture recognition system

137 - Hasan Mahmud , Mashrur Mahmud Morshed , Md. Kamrul Hasan 2021

Any spatio-temporal movement or reorientation of the hand, done with the intention of conveying a specific meaning, can be considered as a hand gesture. Inputs to hand gesture recognition systems can be in several forms, such as depth images, monocular RGB, or skeleton joint points. We observe that raw depth images possess low contrasts in the hand regions of interest (ROI). They do not highlight important details to learn, such as finger bending information (whether a finger is overlapping the palm, or another finger). Recently, in deep-learning--based dynamic hand gesture recognition, researchers are tying to fuse different input modalities (e.g. RGB or depth images and hand skeleton joint points) to improve the recognition accuracy. In this paper, we focus on dynamic hand gesture (DHG) recognition using depth quantized image features and hand skeleton joint points. In particular, we explore the effect of using depth-quantized features in Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based multi-modal fusion networks. We find that our method improves existing results on the SHREC-DHG-14 dataset. Furthermore, using our method, we show that it is possible to reduce the resolution of the input images by more than four times and still obtain comparable or better accuracy to that of the resolutions used in previous methods.

Computer Vision and Pattern Recognition Human-Computer Interaction Machine Learning

WaveGlove: Transformer-based hand gesture recognition using multiple inertial sensors

193 - Matej Kralik , Marek v{S}uppa 2021

Hand Gesture Recognition (HGR) based on inertial data has grown considerably in recent years, with the state-of-the-art approaches utilizing a single handheld sensor and a vocabulary comprised of simple gestures. In this work we explore the benefits of using multiple inertial sensors. Using WaveGlove, a custom hardware prototype in the form of a glove with five inertial sensors, we acquire two datasets consisting of over $11000$ samples. To make them comparable with prior work, they are normalized along with $9$ other publicly available datasets, and subsequently used to evaluate a range of Machine Learning approaches for gesture recognition, including a newly proposed Transformer-based architecture. Our results show that even complex gestures involving different fingers can be recognized with high accuracy. An ablation study performed on the acquired datasets demonstrates the importance of multiple sensors, with an increase in performance when using up to three sensors and no significant improvements beyond that.

Human-Computer Interaction Machine Learning Signal Processing

Appearance-based Gesture recognition in the compressed domain

64 - Shaojie Xu , Anvesha Amaravati , Justin Romberg 2019

We propose a novel appearance-based gesture recognition algorithm using compressed domain signal processing techniques. Gesture features are extracted directly from the compressed measurements, which are the block averages and the coded linear combinations of the image sensors pixel values. We also improve both the computational efficiency and the memory requirement of the previous DTW-based K-NN gesture classifiers. Both simulation testing and hardware implementation strongly support the proposed algorithm.

Computer Vision and Pattern Recognition Machine Learning Machine Learning

High Five: Improving Gesture Recognition by Embracing Uncertainty

50 - Diman Zad Tootaghaj , Adrian Sampson , Todd Mytkowicz 2017

Sensors on mobile devices---accelerometers, gyroscopes, pressure meters, and GPS---invite new applications in gesture recognition, gaming, and fitness tracking. However, programming them remains challenging because human gestures captured by sensors are noisy. This paper illustrates that noisy gestures degrade training and classification accuracy for gesture recognition in state-of-the-art deterministic Hidden Markov Models (HMM). We introduce a new statistical quantization approach that mitigates these problems by (1) during training, producing gesture-specific codebooks, HMMs, and error models for gesture sequences; and (2) during classification, exploiting the error model to explore multiple feasible HMM state sequences. We implement classification in Uncertain<t>, a probabilistic programming system that encapsulates HMMs and error models and then automates sampling and inference in the runtime. Uncertain<T> developers directly express a choice of application-specific trade-off between recall and precision at gesture recognition time, rather than at training time. We demonstrate benefits in configurability, precision, recall, and recognition on two data sets with 25 gestures from 28 people and 4200 total gestures. Incorporating gesture error more accurately in modeling improves the average recognition rate of 20 gestures from 34% in prior work to 62%. Incorporating the error model during classification further improves the average gesture recognition rate to 71%. As far as we are aware, no prior work shows how to generate an HMM error model during training and use it to improve classification rates.

Computer Vision and Pattern Recognition

CNN based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors

128 - Zeeshan Ahmad , Naimul khan 2020

Convolutional Neural Network (CNN) provides leverage to extract and fuse features from all layers of its architecture. However, extracting and fusing intermediate features from different layers of CNN structure is still uninvestigated for Human Action Recognition (HAR) using depth and inertial sensors. To get maximum benefit of accessing all the CNNs layers, in this paper, we propose novel Multistage Gated Average Fusion (MGAF) network which extracts and fuses features from all layers of CNN using our novel and computationally efficient Gated Average Fusion (GAF) network, a decisive integral element of MGAF. At the input of the proposed MGAF, we transform the depth and inertial sensor data into depth images called sequential front view images (SFI) and signal images (SI) respectively. These SFI are formed from the front view information generated by depth data. CNN is employed to extract feature maps from both input modalities. GAF network fuses the extracted features effectively while preserving the dimensionality of fused feature as well. The proposed MGAF network has structural extensibility and can be unfolded to more than two modalities. Experiments on three publicly available multimodal HAR datasets demonstrate that the proposed MGAF outperforms the previous state of the art fusion methods for depth-inertial HAR in terms of recognition accuracy while being computationally much more efficient. We increase the accuracy by an average of 1.5 percent while reducing the computational cost by approximately 50 percent over the previous state of the art.

Computer Vision and Pattern Recognition Machine Learning Multimedia