This research addresses the challenging problem of visual collision detection in very complex and dynamic real physical scenes, specifically, the vehicle driving scenarios. This research takes inspiration from a large-field looming sensitive neuron, i.e., the lobula giant movement detector (LGMD) in the locusts visual pathways, which represents high spike frequency to rapid approaching objects. Building upon our previous models, in this paper we propose a novel inhibition mechanism that is capable of adapting to different levels of background complexity. This adaptive mechanism works effectively to mediate the local inhibition strength and tune the temporal latency of local excitation reaching the LGMD neuron. As a result, the proposed model is effective to extract colliding cues from complex dynamic visual scenes. We tested the proposed method using a range of stimuli including simulated movements in grating backgrounds and shifting of a natural panoramic scene, as well as vehicle crash video sequences. The experimental results demonstrate the proposed method is feasible for fast collision perception in real-world situations with potential applications in future autonomous vehicles.
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research communitys contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Advancements in artificial intelligence (AI) gives a great opportunity to develop an autonomous devices. The contribution of this work is an improved convolutional neural network (CNN) model and its implementation for the detection of road cracks, potholes, and yellow lane in the road. The purpose of yellow lane detection and tracking is to realize autonomous navigation of unmanned aerial vehicle (UAV) by following yellow lane while detecting and reporting the road cracks and potholes to the server through WIFI or 5G medium. The fabrication of own data set is a hectic and time-consuming task. The data set is created, labeled and trained using default and an improved model. The performance of both these models is benchmarked with respect to accuracy, mean average precision (mAP) and detection time. In the testing phase, it was observed that the performance of the improved model is better in respect of accuracy and mAP. The improved model is implemented in UAV using the robot operating system for the autonomous detection of potholes and cracks in roads via UAV front camera vision in real-time.
In the field of autonomous driving, camera sensors are extremely prone to soiling because they are located outside of the car and interact with environmental sources of soiling such as rain drops, snow, dust, sand, mud and so on. This can lead to either partial or complete vision degradation. Hence detecting such decay in vision is very important for safety and overall to preserve the functionality of the autonomous components in autonomous driving. The contribution of this work involves: 1) Designing a Deep Convolutional Neural Network (DCNN) based baseline network, 2) Exploiting several network remodelling techniques such as employing static and dynamic group convolution, channel reordering to compress the baseline architecture and make it suitable for low power embedded systems with nearly 1 TOPS, 3) Comparing various result metrics of all interim networks dedicated for soiling degradation detection at tile level of size 64 x 64 on input resolution 1280 x 768. The compressed network, is called SoildNet (Sand, snOw, raIn/dIrt, oiL, Dust/muD) that uses only 9.72% trainable parameters of the base network and reduces the model size by more than 7 times with no loss in accuracy
Human drivers produce a vast amount of data which could, in principle, be used to improve autonomous driving systems. Unfortunately, seemingly straightforward approaches for creating end-to-end driving models that map sensor data directly into driving actions are problematic in terms of interpretability, and typically have significant difficulty dealing with spurious correlations. Alternatively, we propose to use this kind of action-based driving data for learning representations. Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery and outperform pure end-to-end driving models, while being more interpretable. Further, we demonstrate how this strategy outperforms previous methods based on learning inverse dynamics models as well as other methods based on heavy human supervision (ImageNet).