ترغب بنشر مسار تعليمي؟ اضغط هنا

Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

83   0   0.0 ( 0 )
 نشر من قبل Fabien Baradel
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a method for human activity recognition from RGB data that does not rely on any pose information during test time and does not explicitly calculate pose information internally. Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities. No spatial coherence is forced on the glimpse locations, which gives the module liberty to explore different points at each frame and better optimize the process of scrutinizing visual information. Tracking and sequentially integrating this kind of unstructured data is a challenge, which we address by separating the set of glimpses from a set of recurrent tracking/recognition workers. These workers receive glimpses, jointly performing subsequent motion tracking and activity prediction. The glimpses are soft-assigned to the workers, optimizing coherence of the assignments in space, time and feature space using an external memory module. No hard decisions are taken, i.e. each glimpse point is assigned to all existing workers, albeit with different importance. Our methods outperform state-of-the-art methods on the largest human activity recognition dataset available to-date; NTU RGB+D Dataset, and on a smaller human action recognition dataset Northwestern-UCLA Multiview Action 3D Dataset. Our code is publicly available at https://github.com/fabienbaradel/glimpse_clouds.



قيم البحث

اقرأ أيضاً

Privacy protection from surreptitious video recordings is an important societal challenge. We desire a computer vision system (e.g., a robot) that can recognize human activities and assist our daily life, yet ensure that it is not recording video tha t may invade our privacy. This paper presents a fundamental approach to address such contradicting objectives: human activity recognition while only using extreme low-resolution (e.g., 16x12) anonymized videos. We introduce the paradigm of inverse super resolution (ISR), the concept of learning the optimal set of image transformations to generate multiple low-resolution (LR) training videos from a single video. Our ISR learns different types of sub-pixel transformations optimized for the activity classification, allowing the classifier to best take advantage of existing high-resolution videos (e.g., YouTube videos) by creating multiple LR training videos tailored for the problem. We experimentally confirm that the paradigm of inverse super resolution is able to benefit activity recognition from extreme low-resolution videos.
Wearable sensor-based human activity recognition (HAR) has been a research focus in the field of ubiquitous and mobile computing for years. In recent years, many deep models have been applied to HAR problems. However, deep learning methods typically require a large amount of data for models to generalize well. Significant variances caused by different participants or diverse sensor devices limit the direct application of a pre-trained model to a subject or device that has not been seen before. To address these problems, we present an invariant feature learning framework (IFLF) that extracts common information shared across subjects and devices. IFLF incorporates two learning paradigms: 1) meta-learning to capture robust features across seen domains and adapt to an unseen one with similarity-based data selection; 2) multi-task learning to deal with data shortage and enhance overall performance via knowledge sharing among different subjects. Experiments demonstrated that IFLF is effective in handling both subject and device diversion across popular open datasets and an in-house dataset. It outperforms a baseline model of up to 40% in test accuracy.
Human Activity Recognition from body-worn sensor data poses an inherent challenge in capturing spatial and temporal dependencies of time-series signals. In this regard, the existing recurrent or convolutional or their hybrid models for activity recog nition struggle to capture spatio-temporal context from the feature space of sensor reading sequence. To address this complex problem, we propose a self-attention based neural network model that foregoes recurrent architectures and utilizes different types of attention mechanisms to generate higher dimensional feature representation used for classification. We performed extensive experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD. Our model achieve significant performance improvement over recent state-of-the-art models in both benchmark test subjects and Leave-one-subject-out evaluation. We also observe that the sensor attention maps produced by our model is able capture the importance of the modality and placement of the sensors in predicting the different activity classes.
Human Activity Recognition~(HAR) is the classification of human movement, captured using one or more sensors either as wearables or embedded in the environment~(e.g. depth cameras, pressure mats). State-of-the-art methods of HAR rely on having access to a considerable amount of labelled data to train deep architectures with many train-able parameters. This becomes prohibitive when tasked with creating models that are sensitive to personal nuances in human movement, explicitly present when performing exercises. In addition, it is not possible to collect training data to cover all possible subjects in the target population. Accordingly, learning personalised models with few data remains an interesting challenge for HAR research. We present a meta-learning methodology for learning to learn personalised HAR models for HAR; with the expectation that the end-user need only provides a few labelled data but can benefit from the rapid adaptation of a generic meta-model. We introduce two algorithms, Personalised MAML and Personalised Relation Networks inspired by existing Meta-Learning algorithms but optimised for learning HAR models that are adaptable to any person in health and well-being applications. A comparative study shows significant performance improvements against the state-of-the-art Deep Learning algorithms and the Few-shot Meta-Learning algorithms in multiple HAR domains.
Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labele d activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا