أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Sergio Casas

IntentNet: Learning to Predict Intention from Raw Sensor Data

218 - Sergio Casas , Wenjie Luo , Raquel Urtasun 2021

In order to plan a safe maneuver, self-driving vehicles need to understand the intent of other traffic participants. We define intent as a combination of discrete high-level behaviors as well as continuous trajectories describing future motion. In th is paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment. Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.

علم الروبوتات الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

MP3: A Unified Model to Map, Perceive, Predict and Plan

288 - Sergio Casas , Abbas Sadat , Raquel Urtasun 2021

High-definition maps (HD maps) are a key component of most modern self-driving systems due to their valuable semantic and geometric information. Unfortunately, building HD maps has proven hard to scale due to their cost as well as the requirements th ey impose in the localization system that has to work everywhere with centimeter-level accuracy. Being able to drive without an HD map would be very beneficial to scale self-driving solutions as well as to increase the failure tolerance of existing ones (e.g., if localization fails or the map is not up-to-date). Towards this goal, we propose MP3, an end-to-end approach to mapless driving where the input is raw sensor data and a high-level command (e.g., turn left at the intersection). MP3 predicts intermediate representations in the form of an online map and the current and future state of dynamic agents, and exploits them in a novel neural motion planner to make interpretable decisions taking into account uncertainty. We show that our approach is significantly safer, more comfortable, and can follow commands better than the baselines in challenging long-term closed-loop simulations, as well as when compared to an expert driver in a large-scale real-world dataset.

علم الروبوتات الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving

107 - Alexander Cui , Sergio Casas , Abbas Sadat 2021

In this paper, we present LookOut, a novel autonomy system that perceives the environment, predicts a diverse set of futures of how the scene might unroll and estimates the trajectory of the SDV by optimizing a set of contingency plans over these fut ure realizations. In particular, we learn a diverse joint distribution over multi-agent future trajectories in a traffic scene that covers a wide range of future modes with high sample efficiency while leveraging the expressive power of generative models. Unlike previous work in diverse motion forecasting, our diversity objective explicitly rewards sampling future scenarios that require distinct reactions from the self-driving vehicle for improved safety. Our contingency planner then finds comfortable and non-conservative trajectories that ensure safe reactions to a wide range of future scenarios. Through extensive evaluations, we show that our model demonstrates significantly more diverse and sample-efficient motion forecasting in a large-scale self-driving dataset as well as safer and less-conservative motion plans in long-term closed-loop simulations when compared to current state-of-the-art models.

علم الروبوتات الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting

119 - Katie Luo , Sergio Casas , Renjie Liao 2021

In this paper, we address the important problem in self-driving of forecasting multi-pedestrian motion and their shared scene occupancy map, critical for safe navigation. Our contributions are two-fold. First, we advocate for predicting both the indi vidual motions as well as the scene occupancy map in order to effectively deal with missing detections caused by postprocessing, e.g., confidence thresholding and non-maximum suppression. Second, we propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians via 2D convolution, and captures the interactions among pedestrians within the same scene, including those that have not been detected, via message passing. On two large-scale real-world datasets, nuScenes and ATG4D, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods, while also matching their performance in pedestrian motion forecasting metrics.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

StrObe: Streaming Object Detection from LiDAR Packets

106 - Davi Frossard , Simon Suo , Sergio Casas 2020

Many modern robotics systems employ LiDAR as their main sensing modality due to its geometrical richness. Rolling shutter LiDARs are particularly common, in which an array of lasers scans the scene from a rotating base. Points are emitted as a stream of packets, each covering a sector of the 360{deg} coverage. Modern perception algorithms wait for the full sweep to be built before processing the data, which introduces an additional latency. For typical 10Hz LiDARs this will be 100ms. As a consequence, by the time an output is produced, it no longer accurately reflects the state of the world. This poses a challenge, as robotics applications require minimal reaction times, such that maneuvers can be quickly planned in the event of a safety-critical situation. In this paper we propose StrObe, a novel approach that minimizes latency by ingesting LiDAR packets and emitting a stream of detections without waiting for the full sweep to be built. StrObe reuses computations from previous packets and iteratively updates a latent spatial representation of the scene, which acts as a memory, as new evidence comes in, resulting in accurate low-latency perception. We demonstrate the effectiveness of our approach on a large scale real-world dataset, showing that StrObe far outperforms the state-of-the-art when latency is taken into account, and matches the performance in the traditional setting.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

89 - Abbas Sadat , Sergio Casas , Mengye Ren 2020

In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, ou r motion planning costs are consistent with our perception and prediction estimates. This is achieved by a novel differentiable semantic occupancy representation that is explicitly used as cost by the motion planning process. Our network is learned end-to-end from human demonstrations. The experiments in a large-scale manual-driving dataset and closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners in imitating the human behaviors while producing much safer trajectories.

علم الروبوتات الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Implicit Latent Variable Model for Scene-Consistent Motion Forecasting

221 - Sergio Casas , Cole Gulino , Simon Suo 2020

In order to plan a safe maneuver an autonomous vehicle must accurately perceive its environment, and understand the interactions among traffic participants. In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic dir ectly from sensor data. In particular, we propose to characterize the joint distribution over future trajectories via an implicit latent variable model. We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene. Coupled with a deterministic decoder, we obtain trajectory samples that are consistent across traffic participants, achieving state-of-the-art results in motion forecasting and interaction understanding. Last but not least, we demonstrate that our motion forecasts result in safer and more comfortable motion planning.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

The Importance of Prior Knowledge in Precise Multimodal Prediction

119 - Sergio Casas , Cole Gulino , Simon Suo 2020

Roads have well defined geometries, topologies, and traffic rules. While this has been widely exploited in motion planning methods to produce maneuvers that obey the law, little work has been devoted to utilize these priors in perception and motion f orecasting methods. In this paper we propose to incorporate these structured priors as a loss function. In contrast to imposing hard constraints, this approach allows the model to handle non-compliant maneuvers when those happen in the real world. Safe motion planning is the end goal, and thus a probabilistic characterization of the possible future developments of the scene is key to choose the plan with the lowest expected cost. Towards this goal, we design a framework that leverages REINFORCE to incorporate non-differentiable priors over sample trajectories from a probabilistic model, thus optimizing the whole distribution. We demonstrate the effectiveness of our approach on real-world self-driving datasets containing complex road topologies and multi-agent interactions. Our motion forecasts not only exhibit better precision and map understanding, but most importantly result in safer motion plans taken by our self-driving vehicle. We emphasize that despite the importance of this evaluation, it has been often overlooked by previous perception and motion forecasting works.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data

218 - Sergio Casas , Cole Gulino , Renjie Liao 2019

In this paper, we tackle the problem of relational behavior forecasting from sensor data. Towards this goal, we propose a novel spatially-aware graph neural network (SpAGNN) that models the interactions between agents in the scene. Specifically, we e xploit a convolutional neural network to detect the actors and compute their initial states. A graph neural network then iteratively updates the actor states via a message passing process. Inspired by Gaussian belief propagation, we design the messages to be spatially-transformed parameters of the output distributions from neighboring agents. Our model is fully differentiable, thus enabling end-to-end training. Importantly, our probabilistic predictions can model uncertainty at the trajectory level. We demonstrate the effectiveness of our approach by achieving significant improvements over the state-of-the-art on two real-world self-driving datasets: ATG4D and nuScenes.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction

126 - Ajay Jain , Sergio Casas , Renjie Liao 2019

Self-driving vehicles plan around both static and dynamic objects, applying predictive models of behavior to estimate future locations of the objects in the environment. However, future behavior is inherently uncertain, and models of motion that prod uce deterministic outputs are limited to short timescales. Particularly difficult is the prediction of human behavior. In this work, we propose the discrete residual flow network (DRF-Net), a convolutional neural network for human motion prediction that captures the uncertainty inherent in long-range motion forecasting. In particular, our learned network effectively captures multimodal posteriors over future human motion by predicting and updating a discretized distribution over spatial locations. We compare our model against several strong competitors and show that our model outperforms all baselines.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد