
Recognizing Activities of Daily Living with a Wrist-mounted Camera

Posted by Katsunori Ohnishi
Publication date: 2015
Research field: Informatics Engineering
Paper language: English





We present a novel dataset and a novel algorithm for recognizing activities of daily living (ADL) from a first-person wearable camera. Handled objects are crucially important for egocentric ADL recognition. To examine objects related to the user's actions separately from other objects in the environment, many previous works have addressed the detection of handled objects in images captured by head-mounted and chest-mounted cameras. Nevertheless, detecting handled objects is not always easy because they tend to appear small in images and can be occluded by the user's body. As described herein, we instead mount a camera on the user's wrist. A wrist-mounted camera captures handled objects at a large scale, which lets us skip the object detection process. To compare wrist-mounted and head-mounted cameras, we also develop a novel, publicly available dataset that includes videos and annotations of daily activities captured simultaneously by both cameras. Additionally, we propose a discriminative video representation that retains spatial and temporal information after encoding frame descriptors extracted by Convolutional Neural Networks (CNNs).
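
The abstract leaves the encoding unspecified. As a rough illustration, the sketch below implements a generic spatio-temporal grid encoding over pre-extracted CNN feature maps, one common way to retain coarse spatial and temporal structure in a single clip descriptor; it is not the paper's method, and the function name, grid sizes, and L2 normalization are all assumptions.

```python
import numpy as np

def st_grid_encode(feats, t_bins=3, s_bins=2):
    """Encode per-frame CNN feature maps into one clip descriptor.

    feats: (T, H, W, D) conv-layer descriptors, one map per frame.
    The clip is divided into a t_bins x s_bins x s_bins grid; each cell
    is average-pooled and the cells are concatenated, so coarse temporal
    order and spatial layout survive the encoding.
    """
    cells = []
    for t_chunk in np.array_split(feats, t_bins, axis=0):
        for h_chunk in np.array_split(t_chunk, s_bins, axis=1):
            for cell in np.array_split(h_chunk, s_bins, axis=2):
                cells.append(cell.mean(axis=(0, 1, 2)))
    v = np.concatenate(cells)               # (t_bins * s_bins**2 * D,)
    return v / (np.linalg.norm(v) + 1e-8)   # L2-normalize for a linear classifier

# Example: 30 frames of 7x7x512 conv descriptors -> one 6144-d clip vector.
print(st_grid_encode(np.random.rand(30, 7, 7, 512)).shape)  # (6144,)
```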




Read also

Srijan Das, Rui Dai, Di Yang (2021)
Many attempts have been made to combine RGB and 3D poses for recognizing Activities of Daily Living (ADL). ADL may look very similar and often require modeling fine-grained details to distinguish them. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D Poses. But the cost of computing 3D poses from an RGB stream is high in the absence of appropriate sensors, which limits the use of such approaches in real-world applications requiring low latency. How, then, can 3D Poses best be exploited for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, the Video-Pose Network (VPN), exploring two distinct directions: transferring pose knowledge into RGB through feature-level distillation, and mimicking pose-driven attention through attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. We show that VPN++ is not only effective but also provides a high speed-up and high resilience to noisy poses. VPN++, with or without 3D Poses, outperforms the representative baselines on 4 public datasets. Code is available at https://github.com/srijandas07/vpnplusplus.
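
A hedged PyTorch sketch of the two distillation directions described above: MSE for the feature-level term and KL divergence over normalized attention maps for the attention-level term. Both loss choices, the weights lambda_f and lambda_a, and all function names are assumptions; the paper's actual objectives may differ.

```python
import torch
import torch.nn.functional as F

def feature_distill_loss(student_rgb_feat, teacher_pose_feat):
    # Feature-level distillation: pull the RGB (student) embedding toward
    # the pose (teacher) embedding. MSE is an assumption.
    return F.mse_loss(student_rgb_feat, teacher_pose_feat.detach())

def attention_distill_loss(student_attn, teacher_attn, eps=1e-8):
    # Attention-level distillation: make the student's spatial attention
    # mimic the pose-driven teacher attention (KL over normalized maps).
    s = student_attn.flatten(1)
    t = teacher_attn.detach().flatten(1)
    s = s / (s.sum(dim=1, keepdim=True) + eps)
    t = t / (t.sum(dim=1, keepdim=True) + eps)
    return F.kl_div((s + eps).log(), t, reduction="batchmean")

def combined_loss(logits, labels, s_feat, t_feat, s_attn, t_attn,
                  lambda_f=1.0, lambda_a=1.0):
    # Task loss plus both distillation terms; lambda_f and lambda_a are
    # hypothetical weights.
    return (F.cross_entropy(logits, labels)
            + lambda_f * feature_distill_loss(s_feat, t_feat)
            + lambda_a * attention_distill_loss(s_attn, t_attn))

# Shape check with random tensors (batch of 2, 10 classes).
logits, labels = torch.randn(2, 10), torch.tensor([1, 3])
s_feat, t_feat = torch.randn(2, 256), torch.randn(2, 256)
s_attn, t_attn = torch.rand(2, 7, 7), torch.rand(2, 7, 7)
print(combined_loss(logits, labels, s_feat, t_feat, s_attn, t_attn))
```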
Over the years, activity sensing and recognition has been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and leveraging new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on a combination of oversampling and deep learning approaches, our framework requires no further feature processing or outlier filtering as in prior work. We evaluated our approach in the context of Activities of Daily Living (ADL) by recognizing 15 everyday activities with 14 participants in their own homes, achieving 64.2% and 83.6% averaged within-subject accuracy in terms of top-1 and top-3 classification, respectively. Individual class performance is also examined to further study the co-occurrence characteristics of the activities and the robustness of the framework.
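
A minimal sketch of the oversampling-plus-deep-learning idea, assuming VGGish-style 128-d audio embeddings and naive random oversampling; the embedding source, the MLP, and every hyperparameter are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def oversample(X, y, seed=0):
    """Repeat minority-class rows until every class matches the
    majority-class count (one common scheme; the paper's may differ)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c),
                                     size=counts.max(), replace=True)
                          for c in classes])
    return X[idx], y[idx]

# Hypothetical data: 128-d audio embeddings for 15 everyday activities.
X = np.random.rand(3000, 128).astype(np.float32)
y = np.random.randint(0, 15, size=3000)

Xb, yb = oversample(X, y)
clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=50)
clf.fit(Xb, yb)

proba = clf.predict_proba(X[:5])
top3 = np.argsort(-proba, axis=1)[:, :3]  # top-3, as evaluated in the paper
print(top3)
```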
We present a method to analyze images taken from a passive egocentric wearable camera, along with contextual information such as time and day of week, to learn and predict the everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person's activity across the 19 activity classes. We also demonstrate promising results on two additional users by fine-tuning the classifier with one day of training data.
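
A minimal sketch of one plausible late fusion scheme: per-image CNN class scores are concatenated with encoded time-of-day and day-of-week features, and a second-stage classifier is trained on top. The context encoding and the fusion classifier are assumptions; the paper's ensemble may combine the signals differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def context_features(hour, weekday):
    """Cyclic time-of-day encoding plus one-hot day of week (an assumption;
    the paper only states that time and day of week are used)."""
    ang = 2 * np.pi * hour / 24.0
    return np.concatenate([[np.sin(ang), np.cos(ang)], np.eye(7)[weekday]])

# Hypothetical inputs: CNN class scores over the 19 activities, plus the
# capture time of each image.
n, n_classes = 500, 19
cnn_scores = np.random.dirichlet(np.ones(n_classes), size=n)
hours = np.random.randint(0, 24, size=n)
weekdays = np.random.randint(0, 7, size=n)
labels = np.random.randint(0, n_classes, size=n)

# Late fusion: stack the CNN's scores with the context features and learn
# a second-stage classifier, rather than fusing at the input level.
X = np.hstack([cnn_scores,
               np.stack([context_features(h, d)
                         for h, d in zip(hours, weekdays)])])
fusion = LogisticRegression(max_iter=1000).fit(X, labels)
print(fusion.predict(X[:3]))
```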
Parkinson's disease (PD) is characterized by disorders of motor function such as freezing of gait, rest tremor, rigidity, and slowed and hyposcaled movements. Dopaminergic medication may alleviate those motor symptoms; however, side-effects may include uncontrolled movements known as dyskinesia. In this paper, an automatic PD motor-state assessment in free-living conditions is proposed using an accelerometer in a wrist-worn wearable sensor. In particular, an ensemble of convolutional neural networks (CNNs) is applied to capture the large variability of daily-living activities and to overcome the dissimilarity between training and test patients caused by inter-patient variability. In addition, class activation mapping (CAM), a visualization technique for CNNs, is applied to provide an interpretation of the results.
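
Class activation maps themselves are standard (Zhou et al., 2016); the sketch below adapts the computation to a hypothetical 1-D CNN over accelerometer windows. All shapes and the final rectify-and-normalize step are assumptions.

```python
import numpy as np

def class_activation_map(conv_feats, fc_weights, cls):
    """CAM for a 1-D CNN: weight the last conv layer's feature maps by the
    final linear layer's weights for the target class.

    conv_feats: (C, T) activations before global average pooling.
    fc_weights: (n_classes, C) weights of the final linear layer.
    Returns a length-T relevance profile over the input window.
    """
    cam = fc_weights[cls] @ conv_feats  # (T,)
    cam = np.maximum(cam, 0)            # keep positive evidence only
    return cam / (cam.max() + 1e-8)

# Hypothetical shapes for a wrist-accelerometer model:
# 64 channels, 125 time steps, 4 motor states.
feats = np.random.rand(64, 125)
w = np.random.randn(4, 64)
print(class_activation_map(feats, w, cls=2).shape)  # (125,)
```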
The world is often stricken by catastrophic disasters. On-demand drone-mounted visible light communication (VLC) networks are suitable for monitoring disaster-stricken areas to support disaster-response operations. The concept of image sensor-based VLC has also attracted attention in recent years for establishing stable links with unstably moving drones. However, existing works have not sufficiently considered one-to-many image sensor-based VLC systems. This paper therefore proposes a one-to-many image sensor-based VLC system between a camera and multiple drone-mounted LED lights, with a drone-positioning algorithm to avoid interference among VLC links. Multiple drones are deployed on demand in a disaster-stricken area to monitor the ground and continuously send image data to a camera over image sensor-based VLC links. The proposed idea is demonstrated with a proof-of-concept (PoC) implemented with drones equipped with LED panels and a 4K camera. As a result, we confirmed the feasibility of the proposed system.
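
The positioning algorithm is not detailed in the abstract. As one plausible reading, the sketch below greedily accepts drone hover points only when their LED projections on the image sensor stay a minimum pixel distance apart, so the per-drone regions of interest on the camera cannot overlap; the pinhole model, focal length, and threshold are all assumptions.

```python
import numpy as np

def project(p, f=1000.0):
    """Pinhole projection of a 3-D point (camera frame, meters) to pixels.
    f is a hypothetical focal length in pixels."""
    x, y, z = p
    return np.array([f * x / z, f * y / z])

def accept_positions(candidates, min_px=50.0):
    """Greedy positioning: keep a candidate only if its LED projects at
    least min_px pixels from every drone accepted so far."""
    accepted = []
    for p in candidates:
        uv = project(p)
        if all(np.linalg.norm(uv - project(q)) >= min_px for q in accepted):
            accepted.append(p)
    return accepted

# Hypothetical hover candidates (x, y, z) 30 m in front of the camera.
cands = [np.array([x, y, 30.0])
         for x in (-2.0, -1.0, 0.0, 1.0, 2.0)
         for y in (-1.0, 0.0, 1.0)]
print(len(accept_positions(cands)))  # number of non-interfering drones
```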