Event-based Robotic Grasping Detection with Neuromorphic Vision Sensor and Event-Stream Dataset

Added by Hu Cao

Publication date 2020

Fields: Informatics Engineering

Research language: English

Authors: Bin Li, Hu Cao, Zhongnan Qu

Computer Vision and Pattern Recognition

Abstract

Robotic grasping plays an important role in robotics. Current state-of-the-art robotic grasping detection systems are usually built on conventional vision sensors, such as RGB-D cameras. Compared with traditional frame-based computer vision, neuromorphic vision is a small and young research community. Currently, few event-based datasets are available because annotating the asynchronous event stream is troublesome. Annotating large-scale vision datasets often requires substantial computational resources, especially for video-level annotation. In this work, we consider the problem of detecting robotic grasps in a moving camera view of a scene containing objects. To obtain more agile robotic perception, a neuromorphic vision sensor (DAVIS) attached to the robot gripper is introduced to explore its potential for grasping detection. We construct a robotic grasping dataset, named the Event-Stream Dataset, with 91 objects. A spatio-temporal mixed particle filter (SMP Filter) is proposed to track LED-based grasp rectangles, which enables video-level annotation of a single grasp rectangle per object. Because the LEDs blink at high frequency, the Event-Stream Dataset is annotated at a high frequency of 1 kHz. Based on the Event-Stream Dataset, we develop a deep neural network for grasping detection that treats angle learning as a classification problem instead of regression. The method achieves high detection accuracy on the Event-Stream Dataset, with 93% precision at the object-wise level. This work provides a large-scale, well-annotated dataset and promotes neuromorphic vision applications in agile robotics.
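
The abstract states that grasp-angle learning is treated as classification rather than regression, but it does not give the discretization. As an illustration only, the sketch below assumes that angles are wrapped into [0°, 180°) (a parallel-jaw grasp rectangle rotated by 180° describes the same grasp) and split into 18 bins of 10°. `NUM_BINS`, `angle_to_class`, `class_to_angle` and `predict_angle` are hypothetical names, not the authors' code.

```python
NUM_BINS = 18  # hypothetical choice: 18 bins of 10 degrees each


def angle_to_class(angle_deg, num_bins=NUM_BINS):
    """Map a grasp angle in degrees to a discrete class index.

    Assumption: a grasp rectangle rotated by 180 degrees is the same grasp,
    so the angle is wrapped into [0, 180) before binning.
    """
    wrapped = angle_deg % 180.0
    bin_width = 180.0 / num_bins
    # The final modulo guards against floating-point wrap results of exactly 180.0.
    return int(wrapped // bin_width) % num_bins


def class_to_angle(cls, num_bins=NUM_BINS):
    """Return the center angle (degrees) of a class bin."""
    bin_width = 180.0 / num_bins
    return (cls + 0.5) * bin_width


def predict_angle(logits):
    """Decode per-class scores into an angle by taking the highest-scoring bin."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return class_to_angle(best, num_bins=len(logits))
```

Framing the angle as a class sidesteps the discontinuity a regression target has at the 0°/180° wrap-around (179° and 1° are nearly the same grasp but distant numbers); the trade-off is a quantization error of up to half a bin width when decoding to the bin center.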

Neuromorphic Event-Based Slip Detection and suppression in Robotic Grasping and Manipulation

Rajkumar Muthusamy, Xiaoqian Huang, Yahya Zweiri (2020)

Slip detection is essential for robots to achieve robust grasping and fine manipulation. In this paper, a novel dynamic vision-based finger system for slip detection and suppression is proposed. We also present a baseline- and feature-based approach to detect object slip under illumination and vibration uncertainty. A threshold method is devised to autonomously sample noise in real time to improve slip detection. Moreover, a fuzzy-based suppression strategy using incipient slip feedback is proposed for regulating the grip force. A comprehensive experimental study of the proposed approaches under uncertainty, and of the system for high-performance precision manipulation, is presented. We also propose a slip metric to evaluate such performance quantitatively. Results indicate that the system can effectively detect incipient slip events at a sampling rate of 2 kHz ($\Delta t = 500\,\mu s$) and suppress them before a gross slip occurs. The event-based approach holds promise for the high-precision manipulation tasks required in industrial manufacturing and household services.

Robotics Computer Vision and Pattern Recognition
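
The slip-detection abstract above does not say how its noise threshold is computed. A minimal sketch, assuming that slip shows up as a burst of events and that the threshold is the mean plus k standard deviations of per-window event counts sampled while the object is static (k = 3 is an arbitrary choice). The 500 µs window matches the stated 2 kHz rate (1 s / 2000 = 500 µs). All function names are hypothetical.

```python
import statistics

WINDOW_US = 500  # 1 s / 2000 windows = 500 microseconds per window (2 kHz)


def bin_event_counts(timestamps_us, window_us=WINDOW_US):
    """Count events in consecutive fixed-length windows.

    Assumes timestamps are integers in microseconds and already sorted.
    """
    if not timestamps_us:
        return []
    start = timestamps_us[0]
    n_windows = (timestamps_us[-1] - start) // window_us + 1
    counts = [0] * n_windows
    for t in timestamps_us:
        counts[(t - start) // window_us] += 1
    return counts


def noise_threshold(noise_counts, k=3.0):
    """Hypothetical rule: mean + k * std of counts sampled with no slip."""
    mean = statistics.fmean(noise_counts)
    std = statistics.pstdev(noise_counts)
    return mean + k * std


def detect_slip(counts, threshold):
    """Return the indices of windows whose event count exceeds the threshold."""
    return [i for i, c in enumerate(counts) if c > threshold]
```

Re-sampling the noise counts periodically would let the threshold track changes in illumination or vibration, which is the motivation the abstract gives for sampling noise in real time.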

Neutron-Induced, Single-Event Effects on Neuromorphic Event-based Vision Sensor: A First Step Towards Space Applications

Seth Roffe, Himanshu Akolkar, Alan D. George (2021)

This paper studies the suitability of neuromorphic event-based vision cameras for spaceflight and the effects of neutron radiation on their performance. Neuromorphic event-based vision cameras are novel sensors that implement asynchronous, clockless data acquisition, reporting changes in illuminance over a dynamic range greater than 120 dB with sub-millisecond temporal precision. These sensors have great potential for space applications because they provide an extremely sparse representation of visual dynamics while removing redundant information, thereby meeting low-resource requirements. An event-based sensor was irradiated with wide-spectrum neutrons at the Los Alamos Neutron Science Center, and the effects were classified. We found that the sensor recovered very quickly during irradiation, with noise event bursts highly correlated with the source macro-pulses. No significant differences were observed in the number of events induced at different angles of incidence, but significant differences were found in the spatial structure of the noise events at different angles. The results show that event-based cameras are capable of functioning in a space-like, radiative environment with a signal-to-noise ratio of 3.355. They also show that radiation-induced noise does not affect event-level computation. We also introduce the Event-based Radiation-Induced Noise Simulation Environment (Event-RINSE), a simulation environment based on our noise modelling that can inject the effects of radiation-induced noise from the collected data into any stream of events, to ensure that developed code can operate in a radiative environment. To the best of our knowledge, this is the first time such an analysis of neutron-induced noise has been performed on a neuromorphic vision sensor, and this study shows the advantage of using such sensors for space applications.

Instrumentation and Detectors
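
Event-RINSE builds its noise model from the collected irradiation data, which the abstract does not detail. The sketch below only illustrates the general idea of injecting noise into an existing event stream, under a deliberately simple, hypothetical model: each source macro-pulse triggers a burst of events at uniformly random pixels and polarities within a short window after the pulse. `Event`, `inject_burst_noise` and its parameters are illustrative, not the Event-RINSE API.

```python
import random
from typing import NamedTuple


class Event(NamedTuple):
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def inject_burst_noise(events, pulse_times_us, events_per_burst, burst_us,
                       width, height, seed=0):
    """Merge synthetic noise bursts into an event stream (hypothetical model).

    For every macro-pulse time, events_per_burst noise events are placed at
    uniformly random pixels and polarities within burst_us microseconds after
    the pulse. The result is sorted by timestamp.
    """
    rng = random.Random(seed)  # seeded for reproducible noise
    noise = [
        Event(p + rng.randrange(burst_us), rng.randrange(width),
              rng.randrange(height), rng.choice((-1, 1)))
        for p in pulse_times_us
        for _ in range(events_per_burst)
    ]
    return sorted(list(events) + noise, key=lambda e: e.t_us)
```

The uniform spatial distribution ignores the angle-dependent spatial structure the authors report; a faithful simulator would sample burst sizes and pixel locations from the measured distributions instead.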

Event-based Vision: A Survey

Guillermo Gallego, Tobi Delbruck, Garrick Orchard (2019)

Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared with traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, or high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras by covering their working principle, the actual sensors that are available, and the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.

Computer Vision and Pattern Recognition Artificial Intelligence Machine Learning
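
The survey describes each event as encoding the time, pixel location and sign of a brightness change. One simple and widely used way to hand such a stream to frame-based algorithms is to accumulate events over a time window into a 2D histogram of polarities. The sketch assumes events arrive as `(t_us, x, y, polarity)` tuples with polarity ±1; the names are illustrative.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp in microseconds, x, y, polarity in {+1, -1})
Event = Tuple[int, int, int, int]


def accumulate_events(events: List[Event], width: int, height: int,
                      t_start_us: int, t_end_us: int) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start_us, t_end_us)."""
    frame = [[0] * width for _ in range(height)]  # frame[y][x]
    for t, x, y, p in events:
        if t_start_us <= t < t_end_us:
            frame[y][x] += p
    return frame
```

Summing polarities discards the timing inside the window, which is exactly the microsecond-level information the survey highlights; representations such as time surfaces or voxel grids keep more of it at the cost of extra memory.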

Lightweight Convolutional Neural Network with Gaussian-based Grasping Representation for Robotic Grasping Detection

Hu Cao, Guang Chen, Zhijun Li (2021)

Deep learning has achieved excellent results in improving the performance of robotic grasping detection. However, the deep learning methods used in general object detection are not suitable for robotic grasping detection: current object detectors struggle to strike a balance between high accuracy and fast inference speed. In this paper, we present an efficient and robust fully convolutional neural network model that performs robotic grasping pose estimation from an n-channel input image of the real grasping scene. The proposed network is a lightweight, one-stage generative architecture for grasping detection. Specifically, a grasping representation based on a Gaussian kernel is introduced to encode training samples, which embodies the principle that grasping confidence is maximal at the central point. Meanwhile, to extract multi-scale information and enhance feature discriminability, a receptive field block (RFB) is incorporated into the bottleneck of our grasping detection architecture. In addition, pixel attention and channel attention are combined to automatically learn to fuse context information of varying shapes and sizes, suppressing noisy features and highlighting features of the object to be grasped. Extensive experiments on two public grasping datasets, Cornell and Jacquard, demonstrate the state-of-the-art performance of our method in balancing accuracy and inference speed. The network is an order of magnitude smaller than other high-performing algorithms while achieving better performance, with accuracies of 98.9% and 95.6% on the Cornell and Jacquard datasets, respectively.

Computer Vision and Pattern Recognition
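
The abstract above says training samples are encoded with a Gaussian kernel so that grasp confidence peaks at the grasp center. The sketch below assumes an isotropic 2D Gaussian with a hand-picked standard deviation `sigma`; the paper may scale or orient the kernel differently, and `gaussian_grasp_map` and `argmax_2d` are hypothetical names.

```python
import math


def gaussian_grasp_map(width, height, center, sigma):
    """Target confidence map that equals 1.0 at the grasp center and decays
    with an isotropic 2D Gaussian of standard deviation sigma (pixels).

    Assumption: isotropic kernel; the paper's exact kernel shape is not given.
    """
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(grid):
    """Return (x, y) of the highest-confidence pixel in grid[y][x]."""
    _, x, y = max((v, x, y) for y, row in enumerate(grid)
                  for x, v in enumerate(row))
    return x, y
```

At inference time the predicted grasp center would be read off as the peak of the network's confidence map, which is what `argmax_2d` does here for the target map; a smooth target also gives nearby pixels partial credit instead of a single hard positive.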

Spatiotemporal Feature Learning for Event-Based Vision

Rohan Ghosh, Anupam Gupta, Siyi Tang (2019)

Unlike conventional frame-based sensors, event-based visual sensors output information through spikes at a high temporal resolution. By encoding only changes in pixel intensity, they offer a low-power, low-latency approach to visual sensing. To use this information for higher-level tasks such as object recognition and tracking, an essential simplification step is the extraction and learning of features. An ideal feature descriptor must be robust to changes involving (i) local transformations and (ii) re-appearances of a local event pattern. To that end, we propose a novel spatiotemporal feature representation learning algorithm based on slow feature analysis (SFA). Using SFA, smoothly changing linear projections are learnt that are robust to local visual transformations. To determine whether the features can learn to be invariant to various visual transformations, feature point tracking tasks are used for evaluation. Extensive experiments across two datasets demonstrate the adaptability of the spatiotemporal feature learner to translation, scaling and rotational transformations of the feature points. More importantly, we find that the learned feature representations are able to exploit the high temporal resolution of event-based cameras to generate better feature tracks.

Computer Vision and Pattern Recognition
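
Slow feature analysis looks for projections of the input whose output varies as slowly as possible over time while keeping unit variance. The sketch below is a textbook linear SFA on a two-dimensional signal (center, whiten, then take the direction that minimizes the second moment of the finite-difference derivative), written in pure Python with a closed-form 2x2 eigen-decomposition. It is not the authors' spatiotemporal feature learner; the function names are hypothetical.

```python
import math


def _eig_sym2(a, b, c):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, c]].

    Returns [(largest eigenvalue, unit eigenvector),
             (smallest eigenvalue, unit eigenvector)].
    """
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    r = math.hypot((a - c) / 2.0, b)
    mid = (a + c) / 2.0
    return [
        (mid + r, (math.cos(theta), math.sin(theta))),
        (mid - r, (-math.sin(theta), math.cos(theta))),
    ]


def _second_moment(points):
    """Entries (a, b, c) of the 2x2 matrix mean(p p^T) for 2-D points."""
    n = len(points)
    a = sum(p[0] * p[0] for p in points) / n
    b = sum(p[0] * p[1] for p in points) / n
    c = sum(p[1] * p[1] for p in points) / n
    return a, b, c


def linear_sfa_2d(signal):
    """Return the slowest unit-variance linear feature of a 2-D time series."""
    n = len(signal)
    m0 = sum(p[0] for p in signal) / n
    m1 = sum(p[1] for p in signal) / n
    centered = [(p[0] - m0, p[1] - m1) for p in signal]
    # Whitening: project onto covariance eigenvectors, scale to unit variance.
    eig = _eig_sym2(*_second_moment(centered))
    z = [
        tuple((p[0] * v[0] + p[1] * v[1]) / math.sqrt(lam) for lam, v in eig)
        for p in centered
    ]
    # Temporal derivative approximated by forward differences.
    dz = [(z[t + 1][0] - z[t][0], z[t + 1][1] - z[t][1]) for t in range(n - 1)]
    # Slowest direction = eigenvector with the smallest derivative eigenvalue.
    _, v_slow = _eig_sym2(*_second_moment(dz))[1]
    return [p[0] * v_slow[0] + p[1] * v_slow[1] for p in z]
```

Mixing a slow and a fast sinusoid and running `linear_sfa_2d` on the mixture should recover a signal almost perfectly correlated (up to sign) with the slow source, because whitening leaves only a rotation to choose and the slow source has by far the smallest derivative variance.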