No Arabic abstract
The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task was mostly measured on curated datasets, which are not always representative of human spoken language. In this work, we deviate from recent, popular task settings and consider the problem under an autonomous vehicle scenario. In particular, we consider a situation where passengers can give free-form natural language commands to a vehicle which can be associated with an object in the street scene. To stimulate research on this topic, we have organized the emph{Commands for Autonomous Vehicles} (C4AV) challenge based on the recent emph{Talk2Car} dataset (URL: https://www.aicrowd.com/challenges/eccv-2020-commands-4-autonomous-vehicles). This paper presents the results of the challenge. First, we compare the used benchmark against existing datasets for visual grounding. Second, we identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding, in addition to detecting potential failure cases by evaluating on carefully selected subsets. Finally, we discuss several possibilities for future work.
The Commands For Autonomous Vehicles (C4AV) challenge requires participants to solve an object referral task in a real-world setting. More specifically, we consider a scenario where a passenger can pass free-form natural language commands to a self-driving car. This problem is particularly challenging, as the language is much less constrained compared to existing benchmarks, and object references are often implicit. The challenge is based on the recent texttt{Talk2Car} dataset. This document provides a technical overview of a model that we released to help participants get started in the competition. The code can be found at https://github.com/talk2car/Talk2Car.
The topical workshop {it Strong QCD from Hadron Structure Experiments} took place at Jefferson Lab from Nov. 6-9, 2019. Impressive progress in relating hadron structure observables to the strong QCD mechanisms has been achieved from the {it ab initio} QCD description of hadron structure in a diverse array of methods in order to expose emergent phenomena via quasi-particle formation. The wealth of experimental data and the advances in hadron structure theory make it possible to gain insight into strong interaction dynamics in the regime of large quark-gluon coupling (the strong QCD regime), which will address the most challenging problems of the Standard Model on the nature of the dominant part of hadron mass, quark-gluon confinement, and the emergence of the ground and excited state hadrons, as well as atomic nuclei, from QCD. This workshop aimed to develop plans and to facilitate the future synergistic efforts between experimentalists, phenomenologists, and theorists working on studies of hadron spectroscopy and structure with the goal to connect the properties of hadrons and atomic nuclei available from data to the strong QCD dynamics underlying their emergence from QCD. These results pave the way for a future breakthrough extension in the studies of QCD with an Electron-Ion Collider in the U.S.
In this report, we introduce our real-time 2D object detection system for the realistic autonomous driving scenario. Our detector is built on a newly designed YOLO model, called YOLOX. On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, which surpassed second place by 7.8/6.1 on detection-only track/fully track, respectively. Moreover, equipped with TensorRT, our model achieves the 30FPS inference speed with a high-resolution input size (e.g., 1440-2304). Code and models will be available at https://github.com/Megvii-BaseDetection/YOLOX
Pedestrians are arguably one of the most safety-critical road users to consider for autonomous vehicles in urban areas. In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes from a single image. These encompass visual appearance and behavior, and also include the forecasting of road crossing, which is a main safety concern. For this, we introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. Each field spatially locates pedestrian instances and aggregates attribute predictions over them. This formulation naturally leverages spatial context, making it well suited to low resolution scenarios such as autonomous driving. By increasing the number of attributes jointly learned, we highlight an issue related to the scales of gradients, which arises in MTL with numerous tasks. We solve it by normalizing the gradients coming from different objective functions when they join at the fork in the network architecture during the backward pass, referred to as fork-normalization. Experimental validation is performed on JAAD, a dataset providing numerous attributes for pedestrian analysis from autonomous vehicles, and shows competitive detection and attribute recognition results, as well as a more stable MTL training.
Autonomous Vehicles (AVs) raise important social and ethical concerns, especially about accountability, dignity, and justice. We focus on the specific concerns arising from how AV technology will affect the lives and livelihoods of professional and semi-professional drivers. Whereas previous studies of such concerns have focused on the opinions of experts, we seek to understand these ethical and societal challenges from the perspectives of the drivers themselves. To this end, we adopted a qualitative research methodology based on semi-structured interviews. This is an established social science methodology that helps understand the core concerns of stakeholders in depth by avoiding the biases of superficial methods such as surveys. We find that whereas drivers agree with the experts that AVs will significantly impact transportation systems, they are apprehensive about the prospects for their livelihoods and dismiss the suggestions that driving jobs are unsatisfying and their profession does not merit protection. By showing how drivers differ from the experts, our study has ramifications beyond AVs to AI and other advanced technologies. Our findings suggest that qualitative research applied to the relevant, especially disempowered, stakeholders is essential to ensuring that new technologies are introduced ethically.