Coherency in One-Shot Gesture Recognition

Publication date 2017

Fields Informatics Engineering

Research language English

Authors Maria Cabrera - Richard Voyles - Juan Wachs

Human-Computer Interaction

Users' intentions may be expressed through spontaneous gestures that have been seen only a few times or never before. Recognizing such gestures involves one-shot gesture learning. While most research has focused on recognizing the gestures themselves, new approaches have recently been proposed that treat gesture perception and production as parts of the same problem. The framework presented in this work focuses on learning the process that leads to gesture generation, rather than mining the features associated with the gestures. This is achieved using kinematic, cognitive, and biomechanical characteristics of human interaction. These factors enable the artificial production of realistic gesture samples from a single observation. The generated samples are then used as training sets for different state-of-the-art classifiers. Performance is first measured as the machines' gesture recognition percentages. It is then measured as human recognition of gestures performed by robots. Based on these two scenarios, a new composite metric of coherency is proposed, which relates to the amount of agreement between the two conditions. Experimental results show an average recognition performance of 89.2% for the trained classifiers and 92.5% for the participants. Coherency in recognition was determined to be 93.6%. While this new metric is not directly comparable to raw accuracy or other purely performance-based standard metrics, it provides a quantifier for validating how realistic the machine-generated samples are and how accurate the resulting mimicry is.
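The abstract does not state the formula behind the coherency metric, so the following is only a minimal sketch of one plausible reading: coherency as the fraction of trials on which the machine outcome and the human outcome agree (both correct or both incorrect). The function names, the trial-level pairing of machine and human results, and the toy labels are all assumptions made for illustration, not the authors' definition.

```python
from typing import Sequence


def recognition_rate(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of trials whose predicted label matches the true label."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def coherency(
    truth: Sequence[str],
    machine_pred: Sequence[str],
    human_pred: Sequence[str],
) -> float:
    """Hypothetical coherency: share of trials where machine and human
    recognition outcomes agree (both right or both wrong)."""
    if not (len(truth) == len(machine_pred) == len(human_pred)) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    agreements = sum(
        (m == t) == (h == t)
        for t, m, h in zip(truth, machine_pred, human_pred)
    )
    return agreements / len(truth)


# Toy data (invented for illustration only).
truth = ["wave", "point", "circle", "wave"]
machine = ["wave", "point", "wave", "wave"]      # classifier outputs
human = ["wave", "point", "circle", "circle"]    # participant answers

print(recognition_rate(truth, machine))
print(recognition_rate(truth, human))
print(coherency(truth, machine, human))
```

Under this reading, coherency can differ from both recognition rates, which is consistent with the abstract's remark that the metric is not directly comparable to raw accuracy.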

WaveGlove: Transformer-based hand gesture recognition using multiple inertial sensors

Matej Kralik, Marek Šuppa 2021

Hand Gesture Recognition (HGR) based on inertial data has grown considerably in recent years, with the state-of-the-art approaches utilizing a single handheld sensor and a vocabulary comprised of simple gestures. In this work we explore the benefits of using multiple inertial sensors. Using WaveGlove, a custom hardware prototype in the form of a glove with five inertial sensors, we acquire two datasets consisting of over 11,000 samples. To make them comparable with prior work, they are normalized along with 9 other publicly available datasets, and subsequently used to evaluate a range of Machine Learning approaches for gesture recognition, including a newly proposed Transformer-based architecture. Our results show that even complex gestures involving different fingers can be recognized with high accuracy. An ablation study performed on the acquired datasets demonstrates the importance of multiple sensors, with an increase in performance when using up to three sensors and no significant improvements beyond that.

Human-Computer Interaction Machine Learning Signal Processing

AirWare: Utilizing Embedded Audio and Infrared Signals for In-Air Hand-Gesture Recognition

Nibhrat Lohia, Raunak Mundada, Arya D. McCarthy 2021

We introduce AirWare, an in-air hand-gesture recognition system that uses the speaker and microphone already embedded in most electronic devices, together with embedded infrared proximity sensors. Gestures identified by AirWare are performed in the air above a touchscreen or a mobile phone. AirWare utilizes convolutional neural networks to classify a large vocabulary of hand gestures using multi-modal audio Doppler signatures and infrared (IR) sensor information. As opposed to other systems, which use high-frequency Doppler radars or depth cameras to uniquely identify in-air gestures, AirWare does not require any external sensors. In our analysis, we use openly available APIs to interface with the Samsung Galaxy S5 audio and proximity sensors for data collection. We find that AirWare is not reliable enough for a deployable interaction system when trying to classify a set of 21 gestures, with an average true positive rate of only 50.5% per gesture. To improve performance, we train AirWare to identify subsets of the 21-gesture vocabulary based on possible usage scenarios. We find that AirWare can identify three gesture sets with an average true positive rate greater than 80% using 4 to 7 gestures per set, which together comprise a vocabulary of 16 unique in-air gestures.

Human-Computer Interaction

Real-Time Head Gesture Recognition on Head-Mounted Displays using Cascaded Hidden Markov Models

Jingbo Zhao, Robert S. Allison 2017

Head gesture is a natural means of face-to-face communication between people, but the recognition of head gestures in the context of virtual reality, and the use of head gestures as an interface for interacting with virtual avatars and virtual environments, have rarely been investigated. In the current study, we present an approach for real-time head gesture recognition on head-mounted displays using Cascaded Hidden Markov Models. We conducted two experiments to evaluate our proposed approach. In experiment 1, we trained the Cascaded Hidden Markov Models and assessed the offline classification performance using collected head motion data. In experiment 2, we characterized the real-time performance of the approach by estimating the latency to recognize a head gesture with recorded real-time classification data. Our results show that the proposed approach is effective in recognizing head gestures. The method can be integrated into a virtual reality system as a head gesture interface for interacting with virtual worlds.

Human-Computer Interaction

Integrated Speech and Gesture Synthesis

Siyang Wang, Simon Alexanderson, Joakim Gustafson 2021

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline. This can lead to modeling inefficiencies and may introduce inconsistencies that limit the achievable naturalness. We propose to instead synthesize the two modalities in a single model, a new problem we call integrated speech and gesture synthesis (ISG). We also propose a set of models modified from state-of-the-art neural speech-synthesis engines to achieve this goal. We evaluate the models in three carefully designed user studies, two of which evaluate the synthesized speech and gesture in isolation, plus a combined study that evaluates the models as they would be used in real-world applications, with speech and gesture presented together. The results show that participants rate one of the proposed integrated synthesis models as being as good as the state-of-the-art pipeline system we compare against, in all three tests. The model is able to achieve this with faster synthesis time and greatly reduced parameter count compared to the pipeline system, illustrating some of the potential benefits of treating speech and gesture synthesis together as a single, unified problem. Videos and code are available on our project page at https://swatsw.github.io/isg_icmi21/

Human-Computer Interaction Graphics Machine Learning

Gesture Agreement Assessment Using Description Vectors

Naveen Madapana, Glebys Gonzalez, Juan Wachs 2019

Participatory design is a popular design technique that involves the end users in the early stages of the design process to obtain user-friendly gestural interfaces. Guessability studies followed by agreement analyses are often used to elicit and comprehend the preferences (or gestures/proposals) of the participants. Previous approaches to assessing agreement grouped the gestures into equivalence classes and ignored the integral properties that are shared between them. In this work, we represent the gestures using binary description vectors to allow them to be partially similar. In this context, we introduce a new metric referred to as soft agreement rate (SAR) to quantify the level of consensus between the participants. In addition, we performed computational experiments to study the behavior of our partial agreement formula and mathematically show that existing agreement metrics are a special case of our approach. Our methodology was evaluated through a gesture elicitation study conducted with a group of neurosurgeons. Nevertheless, our formulation can be applied to any other user-elicitation study. Results show that the level of agreement obtained with the SAR metric is 2.64 times higher than that obtained with existing metrics. In addition to the most agreed-upon gesture, the SAR formulation also provides the most agreed-upon descriptors, which can potentially help designers arrive at a final gesture set.
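To make the idea of partial agreement concrete, here is a minimal sketch that averages a pairwise similarity over all pairs of proposals for one command. The abstract does not give the SAR formula, so the use of Jaccard similarity on binary description vectors, the function names, and the toy vectors are assumptions for illustration. Swapping in an exact-match similarity yields the fraction of identical pairs, which is the sense in which a hard, equivalence-class agreement rate can be viewed as a special case.

```python
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[int]


def jaccard(a: Vector, b: Vector) -> float:
    """Jaccard similarity of two binary vectors (assumed similarity choice)."""
    if len(a) != len(b):
        raise ValueError("description vectors must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # two all-zero vectors match


def exact_match(a: Vector, b: Vector) -> float:
    """Hard similarity: 1 only when the two proposals are identical."""
    return 1.0 if tuple(a) == tuple(b) else 0.0


def pairwise_agreement(
    proposals: Sequence[Vector],
    similarity: Callable[[Vector, Vector], float],
) -> float:
    """Mean similarity over all unordered pairs of proposals."""
    pairs = list(combinations(proposals, 2))
    if not pairs:
        raise ValueError("need at least two proposals")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical proposals for one command, four descriptors each.
proposals = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0)]

soft = pairwise_agreement(proposals, jaccard)       # partial credit
hard = pairwise_agreement(proposals, exact_match)   # identical pairs only
print(soft, hard)
```

In this toy case the soft score exceeds the hard one because the third proposal shares a descriptor with the other two even though it is not identical to them.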

Human-Computer Interaction