A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from raw input sensor data to vehicle control signals. The latter are less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show that early fusion multimodality indeed outperforms single-modality approaches.
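To make the fusion schemes concrete, the following is a minimal sketch, not the authors' exact architecture, of how early and mid fusion of RGB and depth could be wired into a CIL-style network with command-conditioned control branches. It assumes a PyTorch setting; all module names, layer sizes, and the number of commands/controls are illustrative assumptions.

    # Illustrative sketch of RGBD fusion for conditional imitation learning (CIL).
    # Assumption: PyTorch; encoder sizes and branch counts are placeholders.
    import torch
    import torch.nn as nn

    def conv_encoder(in_channels):
        # Small convolutional encoder; a real perception backbone would be deeper.
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    class EarlyFusionCIL(nn.Module):
        # Early fusion: RGB (3 ch) and depth (1 ch) are concatenated into one
        # 4-channel RGBD input processed by a single encoder.
        def __init__(self, num_commands=4, num_controls=3):
            super().__init__()
            self.encoder = conv_encoder(in_channels=4)
            # One control branch per high-level navigation command (CIL-style).
            self.branches = nn.ModuleList(
                nn.Linear(64, num_controls) for _ in range(num_commands)
            )

        def forward(self, rgb, depth, command):
            x = torch.cat([rgb, depth], dim=1)  # fusion at the raw-input level
            feat = self.encoder(x)
            out = torch.stack([b(feat) for b in self.branches], dim=1)
            # Select the branch matching each sample's navigation command.
            return out[torch.arange(out.size(0)), command]

    class MidFusionCIL(nn.Module):
        # Mid fusion: one encoder per modality, features concatenated before
        # the command-conditioned branches.
        def __init__(self, num_commands=4, num_controls=3):
            super().__init__()
            self.rgb_enc, self.depth_enc = conv_encoder(3), conv_encoder(1)
            self.branches = nn.ModuleList(
                nn.Linear(128, num_controls) for _ in range(num_commands)
            )

        def forward(self, rgb, depth, command):
            feat = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
            out = torch.stack([b(feat) for b in self.branches], dim=1)
            return out[torch.arange(out.size(0)), command]

    # Late fusion (not shown) would instead run two full single-modality
    # networks and combine their predicted control signals at the output.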