ترغب بنشر مسار تعليمي؟ اضغط هنا

Social Attention for Autonomous Decision-Making in Dense Traffic

70   0   0.0 ( 0 )
 نشر من قبل Edouard Leurent
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

We study the design of learning architectures for behavioural planning in a dense traffic setting. Such architectures should deal with a varying number of nearby vehicles, be invariant to the ordering chosen to describe them, while staying accurate and compact. We observe that the two most popular representations in the literature do not fit these criteria, and perform badly on an complex negotiation task. We propose an attention-based architecture that satisfies all these properties and explicitly accounts for the existing interactions between the traffic participants. We show that this architecture leads to significant performance gains, and is able to capture interactions patterns that can be visualised and qualitatively interpreted. Videos and code are available at https://eleurent.github.io/social-attention/.

قيم البحث

اقرأ أيضاً

239 - Wenjun Zeng , Yi Liu 2021
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence. Different from general sequential decision making process, the use cases have a simpler flow where customers per seei ng recommended content on each page can only return feedback as moving forward in the process or dropping from it until a termination state. We refer to this type of problems as sequential decision making in linear--flow. We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix. At recommendation time, we use Thompson sampling (TS) to sample the transition probabilities and allocate the best series of actions with analytical solution through exact dynamic programming. The way that we formulate the problem allows us to leverage TSs efficiency in balancing exploration and exploitation and Bandits convenience in modeling actions incompatibility. In the simulation study, we observe the proposed MDP with Bandits algorithm outperforms Q-learning with $epsilon$-greedy and decreasing $epsilon$, independent Bandits, and interaction Bandits. We also find the proposed algorithms performance is the most robust to changes in the across-page interdependence strength.
Many sequential decision-making tasks require choosing at each decision step the right action out of the vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the times, the high-dimensionality of actions and data makes learning of the optimal actions by traditional learning methods impracticable. In this work, we investigate how to discover and leverage sparsity in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way. We depart from the prior work by assuming a high-dimensional, continuum set of arms, and allow relevant context dimensions to vary for each arm. We propose a new online learning algorithm called {em CMAB with Relevance Learning} (CMAB-RL) and prove that its time-averaged regret asymptotically goes to zero when the expected reward varies smoothly in contexts and arms. CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms whose regrets depend on dimensions $d_x$ and $d_a$ of the context and arm sets. Importantly, we show that when the learner has prior knowledge on sparsity, given in terms of upper bounds $overline{d}_x$ and $overline{d}_a$ on the number of relevant dimensions, then CMAB-RL achieves $tilde{O}(T^{1-1/(2+2overline{d}_x +overline{d}_a)})$ regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.
94 - Renzhe Xu , Peng Cui , Kun Kuang 2020
Nowadays fairness issues have raised great concerns in decision-making systems. Various fairness notions have been proposed to measure the degree to which an algorithm is unfair. In practice, there frequently exist a certain set of variables we term as fair variables, which are pre-decision covariates such as users choices. The effects of fair variables are irrelevant in assessing the fairness of the decision support algorithm. We thus define conditional fairness as a more sound fairness metric by conditioning on the fairness variables. Given different prior knowledge of fair variables, we demonstrate that traditional fairness notations, such as demographic parity and equalized odds, are special cases of our conditional fairness notations. Moreover, we propose a Derivable Conditional Fairness Regularizer (DCFR), which can be integrated into any decision-making model, to track the trade-off between precision and fairness of algorithmic decision making. Specifically, an adversarial representation based conditional independence loss is proposed in our DCFR to measure the degree of unfairness. With extensive experiments on three real-world datasets, we demonstrate the advantages of our conditional fairness notation and DCFR.
We propose SLTD (`Sequential Learning-to-Defer) a framework for learning-to-defer pre-emptively to an expert in sequential decision-making settings. SLTD measures the likelihood of improving value of deferring now versus later based on the underlying uncertainty in dynamics. In particular, we focus on the non-stationarity in the dynamics to accurately learn the deferral policy. We demonstrate our pre-emptive deferral can identify regions where the current policy has a low probability of improving outcomes. SLTD outperforms existing non-sequential learning-to-defer baselines, whilst reducing overall uncertainty on multiple synthetic and real-world simulators with non-stationary dynamics. We further derive and decompose the propagated (long-term) uncertainty for interpretation by the domain expert to provide an indication of when the models performance is reliable.
Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonab le, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention-mechanism to handle an arbitrary number of other agents and includes a driver-type parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا