Human Action Recognition and Assessment via Deep Neural Network Self-Organization

90 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل German I. Parisi

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف German I. Parisi

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The robust recognition and assessment of human actions are crucial in human-robot interaction (HRI) domains. While state-of-the-art models of action perception show remarkable results in large-scale action datasets, they mostly lack the flexibility, robustness, and scalability needed to operate in natural HRI scenarios which require the continuous acquisition of sensory information as well as the classification or assessment of human body patterns in real time. In this chapter, I introduce a set of hierarchical models for the learning and recognition of actions from depth maps and RGB images through the use of neural network self-organization. A particularity of these models is the use of growing self-organizing networks that quickly adapt to non-stationary distributions and implement dedicated mechanisms for continual learning from temporally correlated input.

قيم البحث

132 - Behnoosh Parsa , Athma Narayanan , Behzad Dariush 2019

Recognition of human actions and associated interactions with objects and the environment is an important problem in computer vision due to its potential applications in a variety of domains. The most versatile methods can generalize to various envir onments and deal with cluttered backgrounds, occlusions, and viewpoint variations. Among them, methods based on graph convolutional networks that extract features from the skeleton have demonstrated promising performance. In this paper, we propose a novel Spatio-Temporal Pyramid Graph Convolutional Network (ST-PGN) for online action recognition for ergonomic risk assessment that enables the use of features from all levels of the skeleton feature hierarchy. The proposed algorithm outperforms state-of-art action recognition algorithms tested on two public benchmark datasets typically used for postural assessment (TUM and UW-IOM). We also introduce a pipeline to enhance postural assessment methods with online action recognition techniques. Finally, the proposed algorithm is integrated with a traditional ergonomic risk index (REBA) to demonstrate the potential value for assessment of musculoskeletal disorders in occupational safety.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching

89 - Wei Peng , Xiaopeng Hong , Haoyu Chen 2019

Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted lots of attention, due to its powerful capability of modeling non-Euclidean structure data. However, many existing GCN methods provide a pre-d efined graph and fix it through the entire network, which can loss implicit joint correlations. Besides, the mainstream spectral GCN is approximated by one-order hop, thus higher-order connections are not well involved. Therefore, huge efforts are required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by one-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search an optimal architecture for this task. The resulted architecture proves the effectiveness of the higher-order approximation and the dynamic graph modeling mechanism with temporal interactions, which is barely discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large scaled datasets and the results show that our model gets the state-of-the-art results.

الرؤية الحاسوبية وتمييز الأنماط

Deep Neural Network Perception Models and Robust Autonomous Driving Systems

102 - Mohammad Javad Shafiee , Ahmadreza Jeddi , Amir Nazemi 2020

This paper analyzes the robustness of deep learning models in autonomous driving applications and discusses the practical solutions to address that.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

88 - Fahad Shahbaz Khan , Joost van de Weijer , Rao Muhammad Anwer 2016

Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and the recently pop ular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a Bag of Deep Features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state-of-the-art.

الرؤية الحاسوبية وتمييز الأنماط

RSA: Randomized Simulation as Augmentation for Robust Human Action Recognition

101 - Yi Zhang , Xinyue Wei , Weichao Qiu 2019

Despite the rapid growth in datasets for video activity, stable robust activity recognition with neural networks remains challenging. This is in large part due to the explosion of possible variation in video -- including lighting changes, object vari ation, movement variation, and changes in surrounding context. An alternative is to make use of simulation data, where all of these factors can be artificially controlled. In this paper, we propose the Randomized Simulation as Augmentation (RSA) framework which augments real-world training data with synthetic data to improve the robustness of action recognition networks. We generate large-scale synthetic datasets with randomized nuisance factors. We show that training with such extra data, when appropriately constrained, can significantly improve the performance of the state-of-the-art I3D networks or, conversely, reduce the number of labeled real videos needed to achieve good performance. Experiments on two real-world datasets NTU RGB+D and VIRAT demonstrate the effectiveness of our method.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو