No Arabic abstract
Trajectory prediction in urban mixed-traffic zones (a.k.a. shared spaces) is critical for many intelligent transportation systems, such as intent detection for autonomous driving. However, there are many challenges to predict the trajectories of heterogeneous road agents (pedestrians, cyclists and vehicles) at a microscopical level. For example, an agent might be able to choose multiple plausible paths in complex interactions with other agents in varying environments. To this end, we propose an approach named Multi-Context Encoder Network (MCENET) that is trained by encoding both past and future scene context, interaction context and motion information to capture the patterns and variations of the future trajectories using a set of stochastic latent variables. In inference time, we combine the past context and motion information of the target agent with samplings of the latent variables to predict multiple realistic trajectories in the future. Through experiments on several datasets of varying scenes, our method outperforms some of the recent state-of-the-art methods for mixed traffic trajectory prediction by a large margin and more robust in a very challenging environment. The impact of each context is justified via ablation studies.
Multi-agent interacting systems are prevalent in the world, from pure physical systems to complicated social dynamic systems. In many applications, effective understanding of the situation and accurate trajectory prediction of interactive agents play a significant role in downstream tasks, such as decision making and planning. In this paper, we propose a generic trajectory forecasting framework (named EvolveGraph) with explicit relational structure recognition and prediction via latent interaction graphs among multiple heterogeneous, interactive agents. Considering the uncertainty of future behaviors, the model is designed to provide multi-modal prediction hypotheses. Since the underlying interactions may evolve even with abrupt changes, and different modalities of evolution may lead to different outcomes, we address the necessity of dynamic relational reasoning and adaptively evolving the interaction graphs. We also introduce a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance. The proposed framework is evaluated on both synthetic physics simulations and multiple real-world benchmark datasets in various areas. The experimental results illustrate that our approach achieves state-of-the-art performance in terms of prediction accuracy.
Trajectory prediction is critical for applications of planning safe future movements and remains challenging even for the next few seconds in urban mixed traffic. How an agent moves is affected by the various behaviors of its neighboring agents in different environments. To predict movements, we propose an end-to-end generative model named Attentive Maps Encoder Network (AMENet) that encodes the agents motion and interaction information for accurate and realistic multi-path trajectory prediction. A conditional variational auto-encoder module is trained to learn the latent space of possible future paths based on attentive dynamic maps for interaction modeling and then is used to predict multiple plausible future trajectories conditioned on the observed past trajectories. The efficacy of AMENet is validated using two public trajectory prediction benchmarks Trajnet and InD.
To accurately predict future positions of different agents in traffic scenarios is crucial for safely deploying intelligent autonomous systems in the real-world environment. However, it remains a challenge due to the behavior of a target agent being affected by other agents dynamically and there being more than one socially possible paths the agent could take. In this paper, we propose a novel framework, named Dynamic Context Encoder Network (DCENet). In our framework, first, the spatial context between agents is explored by using self-attention architectures. Then, the two-stream encoders are trained to learn temporal context between steps by taking the respective observed trajectories and the extracted dynamic spatial context as input. The spatial-temporal context is encoded into a latent space using a Conditional Variational Auto-Encoder (CVAE) module. Finally, a set of future trajectories for each agent is predicted conditioned on the learned spatial-temporal context by sampling from the latent space, repeatedly. DCENet is evaluated on one of the most popular challenging benchmarks for trajectory forecasting Trajnet and reports a new state-of-the-art performance. It also demonstrates superior performance evaluated on the benchmark inD for mixed traffic at intersections. A series of ablation studies is conducted to validate the effectiveness of each proposed module. Our code is available at https://github.com/wtliao/DCENet.
We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant time inference regardless of number of agents. Existing trajectory predictions are fundamentally limited by lack of diversity in training data, which is difficult to acquire with sufficient coverage of possible modes. Our first contribution is an automatic method to simulate diverse trajectories in the top-view. It uses pre-existing datasets and maps as initialization, mines existing trajectories to represent realistic driving behaviors and uses a multi-agent vehicle dynamics simulator to generate diverse new trajectories that cover various modes and are consistent with scene layout constraints. Our second contribution is a novel method that generates diverse predictions while accounting for scene semantics and multi-agent interactions, with constant-time inference independent of the number of agents. We propose a convLSTM with novel state pooling operations and losses to predict scene-consistent states of multiple agents in a single forward pass, along with a CVAE for diversity. We validate our proposed multi-agent trajectory prediction approach by training and testing on the proposed simulated dataset and existing real datasets of traffic scenes. In both cases, our approach outperforms SOTA methods by a large margin, highlighting the benefits of both our diverse dataset simulation and constant-time diverse trajectory prediction methods.
It is essential but challenging to predict future trajectories of various agents in complex scenes. Whether it is internal personality factors of agents, interactive behavior of the neighborhood, or the influence of surroundings, it will have an impact on their future behavior styles. It means that even for the same physical type of agents, there are huge differences in their behavior preferences. Although recent works have made significant progress in studying agents multi-modal plannings, most of them still apply the same prediction strategy to all agents, which makes them difficult to fully show the multiple styles of vast agents. In this paper, we propose the Multi-Style Network (MSN) to focus on this problem by divide agents preference styles into several hidden behavior categories adaptively and train each categorys prediction network separately, therefore giving agents all styles of predictions simultaneously. Experiments demonstrate that our deterministic MSN-D and generative MSN-G outperform many recent state-of-the-art methods and show better multi-style characteristics in the visualized results.