أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Dong Yang

A differential graded approach to the silting theorem

113 - Zongzhen Xie , Dong Yang , Houjun Zhang 2021

A silting theorem was established by Buan and Zhou as a generalisation of the classical tilting theorem of Brenner and Butler. In this paper, we give an alternative proof of the theorem by using differential graded algebras.

نظرية التمثيل حلقات وجبر

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

137 - Chenxu Luo , Xiaodong Yang , Alan Yuille 2021

3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles. Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a heuristic matching step for the detection associatio n. In this paper, we present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds. Our key design is to predict the first-appear location of each object in a given snippet to get the tracking identity and then update the location based on motion estimation. In the inference, the heuristic matching step can be completely waived by a simple read-off operation. SimTrack integrates the tracked object association, newborn object detection, and dead track killing in a single unified model. We conduct extensive evaluations on two large-scale datasets: nuScenes and Waymo Open Dataset. Experimental results reveal that our simple approach compares favorably with the state-of-the-art methods while ruling out the heuristic matching rules.

الرؤية الحاسوبية وتمييز الأنماط

Settling the Variance of Multi-Agent Policy Gradients

110 - Jakub Grudzien Kuba , Muning Wen , Yaodong Yang 2021

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectivenes s of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.

التعلم الآلي الذكاء الاصطناعي أنظمة متعددة العملاء

Neural Network Repair with Reachability Analysis

82 - Xiaodong Yang , Tom Yamaguchi , Hoang-Dung Tran 2021

Safety is a critical concern for the next generation of autonomy that is likely to rely heavily on deep neural networks for perception and control. Formally verifying the safety and robustness of well-trained DNNs and learning-enabled systems under a ttacks, model uncertainties, and sensing errors is essential for safe autonomy. This research proposes a framework to repair unsafe DNNs in safety-critical systems with reachability analysis. The repair process is inspired by adversarial training which has demonstrated high effectiveness in improving the safety and robustness of DNNs. Different from traditional adversarial training approaches where adversarial examples are utilized from random attacks and may not be representative of all unsafe behaviors, our repair process uses reachability analysis to compute the exact unsafe regions and identify sufficiently representative examples to enhance the efficacy and efficiency of the adversarial training. The performance of our framework is evaluated on two types of benchmarks without safe models as references. One is a DNN controller for aircraft collision avoidance with access to training data. The other is a rocket lander where our framework can be seamlessly integrated with the well-known deep deterministic policy gradient (DDPG) reinforcement learning algorithm. The experimental results show that our framework can successfully repair all instances on multiple safety specifications with negligible performance degradation. In addition, to increase the computational and memory efficiency of the reachability analysis algorithm, we propose a depth-first-search algorithm that combines an existing exact analysis method with an over-approximation approach based on a new set representation. Experimental results show that our method achieves a five-fold improvement in runtime and a two-fold improvement in memory usage compared to exact analysis.

التعلم الآلي

Reachability Analysis of Convolutional Neural Networks

98 - Xiaodong Yang , Tomoya Yamaguchi , Hoang-Dung Tran 2021

Deep convolutional neural networks have been widely employed as an effective technique to handle complex and practical problems. However, one of the fundamental problems is the lack of formal methods to analyze their behavior. To address this challen ge, we propose an approach to compute the exact reachable sets of a network given an input domain, where the reachable set is represented by the face lattice structure. Besides the computation of reachable sets, our approach is also capable of backtracking to the input domain given an output reachable set. Therefore, a full analysis of a networks behavior can be realized. In addition, an approach for fast analysis is also introduced, which conducts fast computation of reachable sets by considering selected sensitive neurons in each layer. The exact pixel-level reachability analysis method is evaluated on a CNN for the CIFAR10 dataset and compared to related works. The fast analysis method is evaluated over a CNN CIFAR10 dataset and VGG16 architecture for the ImageNet dataset.

الرؤية الحاسوبية وتمييز الأنماط

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games

156 - Xidong Feng , Oliver Slumbers , Yaodong Yang 2021

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of who to compete with (i.e., the opponent mixture) and how to beat them (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper we introduce a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

الذكاء الاصطناعي أنظمة متعددة العملاء

Learning in Nonzero-Sum Stochastic Games with Potentials

128 - David Mguni , Yutong Wu , Yali Du 2021

Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agen t systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs are capable of modelling real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically our learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time. We demonstrate our framework tackles previously unsolvable tasks such as Coordination Navigation and large selfish routing games and that it outperforms the state of the art MARL baselines such as MADDPG and COMIX in such scenarios.

أنظمة متعددة العملاء

Learning to Shape Rewards using a Game of Switching Controls

223 - David Mguni , Jianhong Wang , Taher Jafferjee 2021

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming a nd error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimal Shaping Algorithm (ROSA), an automated RS framework in which the shaping-reward function is constructed in a novel Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards and their optimal values while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which easily adopts existing RL algorithms, learns to construct a shaping-reward function that is tailored to the task thus ensuring efficient convergence to high performance policies. We demonstrate ROSAs congenial properties in three carefully designed experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments.

التعلم الآلي الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب

Modelling Behavioural Diversity for Learning in Open-Ended Games

223 - Nicolas Perez Nieves , Yaodong Yang , Oliver Slumbers 2021

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the gamescape -- convex polytopes spanned by agents mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.

الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب أنظمة متعددة العملاء

Online Double Oracle

118 - Le Cong Dinh , Yaodong Yang , Zheng Tian 2021

Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where t he number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method -- emph{Online Double Oracle (ODO)} -- is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO methods, ODO is emph{rationale} in the sense that each agent in ODO can exploit strategic adversary with a regret bound of $mathcal{O}(sqrt{T k log(k)})$ where $k$ is not the total number of pure strategies, but rather the size of emph{effective strategy set} that is linearly dependent on the support size of the NE. On tens of different real-world games, ODO outperforms DO, PSRO methods, and no-regret algorithms such as Multiplicative Weight Update by a significant margin, both in terms of convergence rate to a NE and average payoff against strategic adversaries.

الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد