Detecting Hands and Recognizing Physical Contact in the Wild


Added by Supreeth Narasimhaswamy

Publication date 2020

Fields Informatics Engineering

Language English

Authors Supreeth Narasimhaswamy - Trung Nguyen - Minh Hoai

Computer Vision and Pattern Recognition


We investigate a new problem of detecting hands and recognizing their physical contact state in unconstrained conditions. This is a challenging inference task given the need to reason beyond the local appearance of hands. The lack of training annotations indicating which object or parts of an object the hand is in contact with further complicates the task. To address this problem, we propose a novel convolutional network based on Mask-RCNN that can jointly learn to localize hands and predict their physical contact. The network uses outputs from another object detector to obtain locations of objects present in the scene. It uses these outputs and hand locations to recognize the hand's contact state using two attention mechanisms. The first attention mechanism is based on the affinity between the hand and a region enclosing the hand and the object, and densely pools features from this region to the hand region. The second attention module adaptively selects salient features from this plausible region of contact. To develop and evaluate our method's performance, we introduce a large-scale dataset called ContactHands, containing unconstrained images annotated with hand locations and contact states. The proposed network, including the parameters of attention modules, is end-to-end trainable. This network achieves approximately 7% relative improvement over a baseline network that was built on the vanilla Mask-RCNN architecture and trained for recognizing hand contact states.
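The abstract describes a region "enclosing the hand and the object" that serves as the plausible region of contact. A minimal sketch of how such a region could be formed from a hand box and the boxes returned by an object detector is shown below. The names `Box`, `union_box`, and `candidate_regions` are hypothetical and chosen only for illustration; the abstract does not specify how the region is computed, so the smallest enclosing box used here is an assumption, and the affinity and feature pooling steps are not modeled.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def union_box(hand: Box, obj: Box) -> Box:
    """Smallest axis-aligned box enclosing both the hand and the object.

    Assumption: this stands in for the 'region enclosing the hand and the
    object' mentioned in the abstract; the paper may define it differently.
    """
    return Box(
        min(hand.x1, obj.x1),
        min(hand.y1, obj.y1),
        max(hand.x2, obj.x2),
        max(hand.y2, obj.y2),
    )


def candidate_regions(hand: Box, objects: List[Box]) -> List[Box]:
    """One candidate contact region per detected object (hypothetical helper)."""
    return [union_box(hand, obj) for obj in objects]


if __name__ == "__main__":
    hand = Box(10, 20, 50, 60)
    detected = [Box(40, 10, 90, 55), Box(200, 200, 240, 260)]
    for region in candidate_regions(hand, detected):
        print(region)
```

In a full model, features pooled from each candidate region would then be weighted by an attention score before being combined with the hand features; that part depends on details not given in the abstract.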


Detecting natural disasters, damage, and incidents in the wild

Ethan Weber, Nuria Marzo, Dim P. Papadopoulos - 2020

Responding to natural disasters, such as earthquakes, floods, and wildfires, is a laborious task performed by on-the-ground emergency responders and analysts. Social media has emerged as a low-latency data source to quickly understand disaster situations. While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes. However, no large-scale image datasets for incident detection exist. In this work, we present the Incidents Dataset, which contains 446,684 images annotated by humans that cover 43 incidents across a variety of scenes. We employ a baseline classification model that mitigates false-positive errors and we perform image filtering experiments on millions of social media images from Flickr and Twitter. Through these experiments, we show how the Incidents Dataset can be used to detect images with incidents in the wild. Code, data, and models are available online at http://incidentsdataset.csail.mit.edu.

Computer Vision and Pattern Recognition

Understanding Human Hands in Contact at Internet Scale

Dandan Shan, Jiaqi Geng, Michelle Shu - 2020

Hands are the central means by which humans manipulate their world, and being able to reliably extract hand state information from Internet videos of humans using their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes: hand location, side, contact state, and a box around the object in contact. To support this effort, we gather a large-scale dataset of hands in contact with objects consisting of 131 days of footage as well as a 100K annotated hand-contact video frame dataset. The model learned on this dataset can serve as a foundation for hand-contact understanding in videos. We quantitatively evaluate it both on its own and in service of predicting and learning from 3D meshes of human hands.

Computer Vision and Pattern Recognition

Detecting Changed-Hands Online Review Accounts

Geli Fei, Shuai Wang, Bing Liu - 2021

A reputable social media or review account can be a good cover for spamming activities. It has become prevalent that spammers buy/sell such accounts openly on the Web. We call these sold/bought accounts the changed-hands (CH) accounts. They are hard to detect by existing spam detection algorithms as their spamming activities are under the disguise of clean histories. In this paper, we first propose the problem of detecting CH accounts, and then design an effective detection algorithm which exploits changes in content and writing styles of individual accounts, and a proposed novel feature selection method that works at a fine-grained level within each individual account. The proposed method not only determines if an account has changed hands, but also pinpoints the change point. Experimental results with online review accounts demonstrate the high effectiveness of our approach.

Social and Information Networks

Detecting and analysing spontaneous oral cancer speech in the wild

Bence Mark Halpern, Rob van Son, Michiel van den Brekel - 2020

Oral cancer is a disease which impacts more than half a million people worldwide every year. Analysis of oral cancer speech has so far focused on read speech. In this paper, we 1) present and 2) analyse a three-hour-long spontaneous oral cancer speech dataset collected from YouTube. 3) We set baselines for an oral cancer speech detection task on this dataset. The analysis of these explainable machine learning baselines shows that sibilants and stop consonants are the most important indicators for spontaneous oral cancer speech detection.

Audio and Speech Processing Machine Learning Sound

Recovering and Simulating Pedestrians in the Wild

Ze Yang, Siva Manivasagam, Ming Liang - 2020

Sensor simulation is a key component for testing the performance of self-driving vehicles and for data augmentation to better train perception systems. Typical approaches rely on artists to create both 3D assets and their animations to generate a new scenario. This, however, does not scale. In contrast, we propose to recover the shape and motion of pedestrians from sensor readings captured in the wild by a self-driving car driving around. Towards this goal, we formulate the problem as energy minimization in a deep structured model that exploits human shape priors, reprojection consistency with 2D poses extracted from images, and a ray-caster that encourages the reconstructed mesh to agree with the LiDAR readings. Importantly, we do not require any ground-truth 3D scans or 3D pose annotations. We then incorporate the bank of reconstructed pedestrian assets in a realistic LiDAR simulation system by performing motion retargeting, and show that the simulated LiDAR data can be used to significantly reduce the amount of annotated real-world data required for visual perception tasks.

Computer Vision and Pattern Recognition Robotics