We introduce the DROW detector, a deep-learning-based detector for 2D range data. Laser scanners are lighting-invariant, provide accurate range data, and typically cover a large field of view, making them interesting sensors for robotics applications. So far, research on detection in laser range data has been dominated by hand-crafted features and boosted classifiers, potentially losing performance due to suboptimal design choices. We propose a Convolutional Neural Network (CNN) based detector for this task. We show how to effectively apply CNNs for detection in 2D range data, and propose a depth preprocessing step and voting scheme that significantly improve CNN performance. We demonstrate our approach on wheelchairs and walkers, obtaining state-of-the-art detection results. Apart from the training data, however, none of our design choices limits the detector to these two classes. We provide a ROS node for our detector and release our dataset containing 464k laser scans, of which 24k are annotated.
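To make the depth-preprocessing and voting idea concrete, the sketch below shows one plausible way such a pipeline can be structured: a fixed-width window is re-sampled around each laser beam and normalised by its centre depth, and per-beam network outputs vote into a 2D grid. The window size, grid resolution, vote threshold, and the `net` callable are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of depth preprocessing (per-beam cutouts) and a voting
# scheme for a CNN detector on 2D range data. Assumes `angles` is sorted in
# increasing order and `net` maps a cutout to (dx, dy, confidence).
import numpy as np

def cutout(scan, angles, i, window=1.0, n_points=48):
    """Re-sample a window of fixed metric width centred on beam i,
    normalised by the centre depth."""
    r = scan[i]
    half = np.arctan(0.5 * window / max(r, 1e-3))        # half opening angle
    sample_angles = np.linspace(angles[i] - half, angles[i] + half, n_points)
    depths = np.interp(sample_angles, angles, scan)
    return np.clip(depths - r, -window, window)          # centre and clip

def detect(scan, angles, net, grid_res=0.1, vote_thresh=1.0):
    """Run the per-beam regressor and accumulate votes in a 2D grid."""
    votes = {}
    for i in range(len(scan)):
        dx, dy, conf = net(cutout(scan, angles, i))      # offset vote + confidence
        x = scan[i] * np.cos(angles[i]) + dx
        y = scan[i] * np.sin(angles[i]) + dy
        cell = (int(round(x / grid_res)), int(round(y / grid_res)))
        votes[cell] = votes.get(cell, 0.0) + conf
    # cells with enough accumulated confidence become detections
    return [cell for cell, v in votes.items() if v > vote_thresh]
```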
Detecting humans is a key skill for mobile robots and intelligent vehicles in a large variety of applications. While the problem is well studied for certain sensory modalities such as image data, few works exist that address this detection task using 2D range data. However, a widespread sensory setup for many mobile robots in service and domestic applications contains a horizontally mounted 2D laser scanner. Detecting people from 2D range data is challenging due to the speed and dynamics of human leg motion and the high levels of occlusion and self-occlusion, particularly in crowds of people. While previous approaches mostly relied on hand-crafted features, we recently developed the deep-learning-based wheelchair and walker detector DROW. In this paper, we show how DROW generalizes to people, including small modifications that significantly boost its performance. Additionally, by including a small, fully online temporal window in our network, we further boost our score. We extend the DROW dataset with person annotations, making this the largest dataset of person annotations in 2D range data, recorded over several days in a real-world environment with high diversity. Extensive experiments with three current baseline methods indicate that it is a challenging dataset, on which our improved DROW detector beats the current state of the art.
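One simple way to realise a "small, fully online temporal window" is to keep a short FIFO of past per-beam cutouts and stack them as input channels, as sketched below. The window length, the padding strategy at start-up, and the tensor layout are assumptions made for illustration.

```python
# Minimal sketch of an online temporal window over per-beam cutouts.
from collections import deque
import numpy as np

class TemporalCutoutBuffer:
    def __init__(self, T=5):
        self.T = T
        self.buf = deque(maxlen=T)

    def push(self, cutouts):
        """cutouts: (num_beams, n_points) array for the current scan."""
        self.buf.append(cutouts)
        while len(self.buf) < self.T:        # pad with the first scan at start-up
            self.buf.append(cutouts)
        return np.stack(self.buf, axis=1)    # (num_beams, T, n_points) network input
```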
Simulating realistic radar data has the potential to significantly accelerate the development of data-driven approaches to radar processing. However, it is fraught with difficulty due to the notoriously complex image formation process. Here we propose to learn a radar sensor model capable of synthesising faithful radar observations based on simulated elevation maps. In particular, we adopt an adversarial approach to learning a forward sensor model from unaligned radar examples. In addition, modelling the backward process encourages the output to remain aligned with the world state through a cycle-consistency criterion. The backward model is further constrained to predict elevation maps from real radar data that are grounded by partial measurements obtained from corresponding lidar scans. Both models are trained in a joint optimisation. We demonstrate the efficacy of our approach by evaluating a downstream segmentation model, trained purely on simulated data, in a real-world deployment. This achieves performance within four percentage points of the same model trained entirely on real data.
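A rough sketch of the kind of joint objective this describes is given below: an adversarial term on the forward sensor model, a cycle-consistency term through the backward model, and a partial supervision term where lidar provides ground-truth elevation. The module interfaces, loss choices (BCE/L1), and weights are placeholders, not the paper's exact formulation.

```python
# Hedged sketch of a joint objective: adversarial forward model, cycle
# consistency, and lidar-grounded backward model. All names are assumptions.
import torch
import torch.nn.functional as nnf

def joint_loss(G, F_back, D_radar, elev, radar_real, lidar_elev, lidar_mask,
               w_cyc=10.0, w_lidar=5.0):
    radar_fake = G(elev)                                  # simulated elevation -> radar
    d_out = D_radar(radar_fake)
    adv = nnf.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    cyc = nnf.l1_loss(F_back(radar_fake), elev)           # cycle consistency
    elev_pred = F_back(radar_real)                        # backward model on real radar
    grounded = nnf.l1_loss(elev_pred * lidar_mask,        # supervise only where lidar
                           lidar_elev * lidar_mask)       # measurements exist
    return adv + w_cyc * cyc + w_lidar * grounded
```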
Force control is essential for medical robots when touching and contacting a patient's body. To increase stability and efficiency in force control, an Adaption Module can be used to adjust the control parameters for different contact situations. We propose an adaptive controller with an Adaption Module that produces control parameters based on force feedback and real-time stiffness detection. We develop methods for learning optimal policies by value iteration and use the data generated from those policies to train the Adaption Module. We test this controller on different zones of a person's arm. All the parameters used in practice are learned from data. The experiments show that the proposed adaptive controller can exert various target forces on different zones of the arm with fast convergence and good stability.
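The sketch below illustrates the general structure of such a control loop: an online stiffness estimate and the force error feed a learned Adaption Module that outputs controller gains. The stiffness estimator, the gain interface, and the control law here are illustrative assumptions rather than the authors' controller.

```python
# Illustrative adaptive force-control step with a learned parameter schedule.
import numpy as np

def estimate_stiffness(force_hist, pos_hist):
    """Least-squares slope of force vs. displacement over a short window."""
    dF = np.diff(force_hist)
    dx = np.diff(pos_hist)
    return max(float(np.dot(dF, dx)) / (float(np.dot(dx, dx)) + 1e-9), 0.0)

def control_step(adaption_module, f_target, f_meas, force_hist, pos_hist):
    k_hat = estimate_stiffness(force_hist, pos_hist)      # real-time stiffness estimate
    kp, ki = adaption_module(f_target - f_meas, k_hat)    # learned gains (placeholder)
    # simple admittance-style position increment driven by the force error
    integral = np.sum(f_target - np.asarray(force_hist))
    return kp * (f_target - f_meas) + ki * integral
```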
Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks such as vision-based robotic grasping without the need to manually engineer or pre-train a perception system. However, data for RL is collected by running an agent in the desired environment, and for applications such as robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation transfer effectively to the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotic grasping tasks, we show that it offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.
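As a minimal sketch of the Q-value invariance idea, one can penalise differences between the Q-values of an image, its translation, and its cycled reconstruction, as below. The choice of MSE, the set of image pairs compared, and the module interfaces are assumptions for illustration, not the exact RL-CycleGAN loss.

```python
# Sketch of an RL-scene consistency term: translation should not change Q-values.
import torch.nn.functional as nnf

def rl_scene_consistency(q_net, action, sim_img, G_sim2real, G_real2sim):
    """Penalise Q-value changes introduced by translation and its cycle."""
    fake_real = G_sim2real(sim_img)          # sim -> "real"
    cycled_sim = G_real2sim(fake_real)       # back to sim
    q_sim = q_net(sim_img, action)
    q_fake = q_net(fake_real, action)
    q_cyc = q_net(cycled_sim, action)
    return nnf.mse_loss(q_sim, q_fake) + nnf.mse_loss(q_sim, q_cyc)
```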
Traffic near-crash events serve as critical data sources for various smart transportation applications, for example as surrogate safety measures in traffic safety research and as corner-case data for automated vehicle testing. However, near-crash detection poses several key challenges. First, extracting near-crashes from original data sources requires significant computing, communication, and storage resources. In addition, existing methods lack the efficiency and transferability needed for prospective large-scale applications. To this end, this paper leverages the power of edge computing to address these challenges by processing the video streams from existing dashcams onboard the vehicle in real time. We design a multi-thread system architecture that runs on edge devices and models the bounding boxes generated by object detection and tracking with linear complexity. The method is insensitive to camera parameters and backward compatible with different vehicles. The edge computing system has been evaluated with recorded videos and in real-world tests on two cars and four buses for over ten thousand hours. It filters out irrelevant videos in real time, thereby saving labor cost, processing time, network bandwidth, and data storage. It collects not only event videos but also other valuable data such as road user type, event location, time to collision, vehicle trajectory, vehicle speed, brake switch, and throttle. The experiments demonstrate the promising performance of the system in terms of efficiency, accuracy, reliability, and transferability. It is among the first efforts to apply edge computing for real-time traffic video analytics and is expected to benefit multiple sub-fields in smart transportation research and applications.
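One common way a near-crash cue can be derived from tracked bounding boxes in a single linear pass is a scale-expansion estimate of time to collision, sketched below. The track data layout, the expansion-based TTC formula, and the threshold are illustrative assumptions, not the authors' exact model.

```python
# Hedged sketch: flag tracks whose bounding boxes expand fast enough that the
# scale-based time-to-collision estimate falls below a threshold. O(n) in tracks.
def near_crash_tracks(tracks, dt, ttc_threshold=2.0):
    """tracks: {track_id: (prev_box, curr_box)} with boxes as (x1, y1, x2, y2)."""
    flagged = []
    for tid, (prev, curr) in tracks.items():          # single pass over tracks
        h_prev = prev[3] - prev[1]
        h_curr = curr[3] - curr[1]
        if h_prev <= 0 or h_curr <= h_prev:
            continue                                   # not approaching
        expansion_rate = (h_curr - h_prev) / (h_prev * dt)
        ttc = 1.0 / expansion_rate                     # scale-based TTC estimate
        if ttc < ttc_threshold:
            flagged.append((tid, ttc))
    return flagged
```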