A Dynamic Modelling Framework for Human Hand Gesture Task Recognition

85 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sara Masoud

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sara Masoud - Bijoy Chowdhury - Young-Jun Son

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Gesture recognition and hand motion tracking are important tasks in advanced gesture based interaction systems. In this paper, we propose to apply a sliding windows filtering approach to sample the incoming streams of data from data gloves and a decision tree model to recognize the gestures in real time for a manual grafting operation of a vegetable seedling propagation facility. The sequence of these recognized gestures defines the tasks that are taking place, which helps to evaluate individuals performances and to identify any bottlenecks in real time. In this work, two pairs of data gloves are utilized, which reports the location of the fingers, hands, and wrists wirelessly (i.e., via Bluetooth). To evaluate the performance of the proposed framework, a preliminary experiment was conducted in multiple lab settings of tomato grafting operations, where multiple subjects wear the data gloves while performing different tasks. Our results show an accuracy of 91% on average, in terms of gesture recognition in real time by employing our proposed framework.

قيم البحث

79 - Yi Zhang , Chong Wang , Ye Zheng 2019

The purpose of gesture recognition is to recognize meaningful movements of human bodies, and gesture recognition is an important issue in computer vision. In this paper, we present a multimodal gesture recognition method based on 3D densely convoluti onal networks (3D-DenseNets) and improved temporal convolutional networks (TCNs). The key idea of our approach is to find a compact and effective representation of spatial and temporal features, which orderly and separately divide task of gesture video analysis into two parts: spatial analysis and temporal analysis. In spatial analysis, we adopt 3D-DenseNets to learn short-term spatio-temporal features effectively. Subsequently, in temporal analysis, we use TCNs to extract temporal features and employ improved Squeeze-and-Excitation Networks (SENets) to strengthen the representational power of temporal features from each TCNs layers. The method has been evaluated on the VIVA and the NVIDIA Gesture Dynamic Hand Gesture Datasets. Our approach obtains very competitive performance on VIVA benchmarks with the classification accuracies of 91.54%, and achieve state-of-the art performance with 86.37% accuracy on NVIDIA benchmark.

الرؤية الحاسوبية وتمييز الأنماط معالجة الصور والفيديو

A deep-learning--based multimodal depth-aware dynamic hand gesture recognition system

137 - Hasan Mahmud , Mashrur Mahmud Morshed , Md. Kamrul Hasan 2021

Any spatio-temporal movement or reorientation of the hand, done with the intention of conveying a specific meaning, can be considered as a hand gesture. Inputs to hand gesture recognition systems can be in several forms, such as depth images, monocul ar RGB, or skeleton joint points. We observe that raw depth images possess low contrasts in the hand regions of interest (ROI). They do not highlight important details to learn, such as finger bending information (whether a finger is overlapping the palm, or another finger). Recently, in deep-learning--based dynamic hand gesture recognition, researchers are tying to fuse different input modalities (e.g. RGB or depth images and hand skeleton joint points) to improve the recognition accuracy. In this paper, we focus on dynamic hand gesture (DHG) recognition using depth quantized image features and hand skeleton joint points. In particular, we explore the effect of using depth-quantized features in Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based multi-modal fusion networks. We find that our method improves existing results on the SHREC-DHG-14 dataset. Furthermore, using our method, we show that it is possible to reduce the resolution of the input images by more than four times and still obtain comparable or better accuracy to that of the resolutions used in previous methods.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب التعلم الآلي

The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition

152 - Junxiao Shen , John Dudley , Per Ola Kristensson 2021

Deep learning approaches deliver state-of-the-art performance in recognition of spatiotemporal human motion data. However, one of the main challenges in these recognition tasks is limited available training data. Insufficient training data results in over-fitting and data augmentation is one approach to address this challenge. Existing data augmentation strategies, such as transformations including scaling, shifting and interpolating, require hyperparameter optimization that can easily cost hundreds of GPU hours. In this paper, we present a novel automatic data augmentation model, the Imaginative Generative Adversarial Network (GAN) that approximates the distribution of the input data and samples new data from this distribution. It is automatic in that it requires no data inspection and little hyperparameter tuning and therefore it is a low-cost and low-effort approach to generate synthetic data. The proposed data augmentation strategy is fast to train and the synthetic data leads to higher recognition accuracy than using data augmented with a classical approach.

الرؤية الحاسوبية وتمييز الأنماط تفاعل الإنسان والحاسوب

An IoT Based Framework For Activity Recognition Using Deep Learning Technique

180 - Ashwin Geet DSa , B. G. Prasad 2019

Activity recognition is the ability to identify and recognize the action or goals of the agent. The agent can be any object or entity that performs action that has end goals. The agents can be a single agent performing the action or group of agents p erforming the actions or having some interaction. Human activity recognition has gained popularity due to its demands in many practical applications such as entertainment, healthcare, simulations and surveillance systems. Vision based activity recognition is gaining advantage as it does not require any human intervention or physical contact with humans. Moreover, there are set of cameras that are networked with the intention to track and recognize the activities of the agent. Traditional applications that were required to track or recognize human activities made use of wearable devices. However, such applications require physical contact of the person. To overcome such challenges, vision based activity recognition system can be used, which uses a camera to record the video and a processor that performs the task of recognition. The work is implemented in two stages. In the first stage, an approach for the Implementation of Activity recognition is proposed using background subtraction of images, followed by 3D- Convolutional Neural Networks. The impact of using Background subtraction prior to 3D-Convolutional Neural Networks has been reported. In the second stage, the work is further extended and implemented on Raspberry Pi, that can be used to record a stream of video, followed by recognizing the activity that was involved in the video. Thus, a proof-of-concept for activity recognition using small, IoT based device, is provided, which can enhance the system and extend its applications in various forms like, increase in portability, networking, and other capabilities of the device.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

Gated Task Interaction Framework for Multi-task Sequence Tagging

260 - Isaac K. E. Ampomah , Sally McClean , Zhiwei Lin 2019

Recent studies have shown that neural models can achieve high performance on several sequence labelling/tagging problems without the explicit use of linguistic features such as part-of-speech (POS) tags. These models are trained only using the charac ter-level and the word embedding vectors as inputs. Others have shown that linguistic features can improve the performance of neural models on tasks such as chunking and named entity recognition (NER). However, the change in performance depends on the degree of semantic relatedness between the linguistic features and the target task; in some instances, linguistic features can have a negative impact on performance. This paper presents an approach to jointly learn these linguistic features along with the target sequence labelling tasks with a new multi-task learning (MTL) framework called Gated Tasks Interaction (GTI) network for solving multiple sequence tagging tasks. The GTI network exploits the relations between the multiple tasks via neural gate modules. These gate modules control the flow of information between the different tasks. Experiments on benchmark datasets for chunking and NER show that our framework outperforms other competitive baselines trained with and without external training resources.

التعلم الآلي الحساب واللغة التعلم الالي