
A Framework for Automatic Behavior Generation in Multi-Function Swarms

Published by Sondre Engebråten, MSc
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Multi-function swarms are swarms that solve multiple tasks at once. For example, a quadcopter swarm could be tasked with exploring an area of interest while simultaneously functioning as ad-hoc relays. This kind of multi-functionality brings the challenge of handling potentially conflicting requirements at the same time. A framework for automatic behavior generation in multi-function swarms is proposed, using the Quality-Diversity algorithm MAP-Elites in combination with a suitable controller structure. The framework is tested on a scenario with three simultaneous tasks: exploration, communication network creation, and geolocation of RF emitters. A repertoire is evolved, consisting of a wide range of controllers, or behavior primitives, with different characteristics and trade-offs across the tasks. This repertoire would enable the swarm to transition between behavior trade-offs online, according to the situational requirements. Furthermore, the effect of noise on the behavior characteristics in MAP-Elites is investigated. A moderate number of re-evaluations is found to increase robustness while keeping the computational requirements relatively low. A few selected controllers are examined, and the dynamics of transitioning between these controllers are explored. Finally, the study develops a methodology for analyzing the makeup of the resulting controllers, through a parameter variation study that assesses the importance of individual inputs to the swarm controllers.
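The MAP-Elites loop with re-evaluation that the abstract relies on can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the genome (a short weight vector), the noisy `simulate` function, the two behavior descriptors, the 10 x 10 grid, and the clipped Gaussian mutation are all invented for illustration, since the abstract does not specify the controller structure or the behavior characteristics used. Each candidate is evaluated `n_evals` times and its fitness and behavior descriptor are averaged, which is one simple way (assumed here) to realize re-evaluation against noise; a candidate replaces the elite of its grid cell only if it scores higher.

```python
import random
from statistics import mean

# All problem details below are hypothetical stand-ins, not taken from the paper.
GRID = 10  # cells per behavior dimension (assumed)
DIM = 4    # controller genome length (assumed)


def simulate(genome, rng):
    """Hypothetical noisy swarm evaluation returning (fitness, behavior descriptor)."""
    b1 = min(1.0, max(0.0, mean(genome[:2]) + rng.gauss(0, 0.05)))
    b2 = min(1.0, max(0.0, mean(genome[2:]) + rng.gauss(0, 0.05)))
    fitness = -sum((g - 0.5) ** 2 for g in genome) + rng.gauss(0, 0.01)
    return fitness, (b1, b2)


def evaluate(genome, rng, n_evals):
    """Re-evaluate n_evals times and average fitness and behavior to damp noise."""
    runs = [simulate(genome, rng) for _ in range(n_evals)]
    fitness = mean(f for f, _ in runs)
    behavior = tuple(mean(b[i] for _, b in runs) for i in range(2))
    return fitness, behavior


def cell_of(behavior):
    """Map a behavior descriptor in [0, 1]^2 to a discrete grid cell."""
    return tuple(min(GRID - 1, int(x * GRID)) for x in behavior)


def map_elites(iterations=2000, n_init=100, n_evals=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genome) of the best controller found so far
    for it in range(iterations):
        if it < n_init or not archive:
            # Bootstrap the archive with random controllers.
            genome = [rng.random() for _ in range(DIM)]
        else:
            # Pick a random elite and apply clipped Gaussian mutation.
            _, parent = rng.choice(list(archive.values()))
            genome = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) for g in parent]
        fitness, behavior = evaluate(genome, rng, n_evals)
        cell = cell_of(behavior)
        # Keep the candidate only if its cell is empty or it beats the current elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)
    return archive


archive = map_elites()
print(f"{len(archive)} of {GRID * GRID} cells filled")
```

Raising `n_evals` makes each elite's cell assignment less sensitive to evaluation noise at a proportional cost in simulations, which mirrors the robustness versus compute trade-off the abstract describes.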


Read also

Chao Wen, Miao Xu, Zhilin Zhang (2021)
In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works consider the design of auto-bidding agents from a single-agent view, without modeling the mutual influence between agents. In this paper, we instead consider the problem from the perspective of a distributed multi-agent system, and propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn auto-bidding strategies. First, we investigate the competition and cooperation relations among auto-bidding agents, and propose temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully balancing competition and cooperation among the agents, we can reach an equilibrium state that guarantees not only individual advertisers' utility but also system performance (social welfare). Second, because we observe collusion behaviors of bidding low prices underlying the cooperation, we further propose bar agents that set a personalized bidding bar for each agent, alleviating the degradation of revenue. Third, to deploy MAAB in a large-scale advertising system with millions of advertisers, we propose a mean-field approach. By grouping advertisers with the same objective into a mean auto-bidding agent, the interactions among advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and guarantees the ad platform's revenue.
Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods nested with reinforcement learning (RL) algorithms, which produce a self-generated sequence of tasks arising from coupled population dynamics. By leveraging auto-curricula to induce a population of distinct emergent strategies, PB-MARL has achieved impressive success in tackling multi-agent tasks. Despite remarkable prior art in distributed RL frameworks, PB-MARL poses new challenges for parallelizing training frameworks, due to the additional complexity of multiple nested workloads among sampling, training, and evaluation involving heterogeneous policy interactions. To solve these problems, we present MALib, a scalable and efficient computing framework for PB-MARL. Our framework consists of three key components: (1) a centralized task dispatching model, which supports self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling and meets the evaluation requirements of auto-curriculum learning; and (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployment on different distributed computing paradigms. Experiments on a series of complex tasks such as multi-agent Atari games show that MALib achieves throughput higher than 40K FPS on a single machine with 32 CPU cores, a 5x speedup over RLlib, and at least a 3x speedup over OpenSpiel in multi-agent training tasks. MALib is publicly available at https://github.com/sjtu-marl/malib.
In many scenarios, accurate and effective system identification is a common challenge in the model predictive control (MPC) formulation. As a consequence, overall system performance can be significantly degraded when the traditional MPC algorithm is adopted in circumstances where such accuracy is lacking. To address this major shortcoming, this paper investigates a non-parametric behavior learning method for multi-agent decision making, which underpins an alternative data-driven predictive control framework. Using closed-loop input/output measurements of the unknown system, the behavior of the system is learned from the collected dataset, and the resulting non-parametric predictive model can then be used to determine optimal control actions. A key advantage of this non-parametric predictive control framework is that it alleviates the heavy computational burden of the optimization procedures that existing methodologies require for open-loop input/output data collection and parametric system identification. Then, with a conservative approximation of the probabilistic chance constraints in the MPC problem, a deterministic optimization problem is formulated and solved effectively. This data-driven approach is also shown to preserve good robustness properties, even in the presence of the parametric uncertainties that naturally arise in the typical system identification process. Finally, a multi-drone system is used to demonstrate the practical appeal and effectiveness of this development.
UAV swarms have attracted wide attention in recent years due to their potential application value. While architecture designs for UAV swarms have been proposed, two main challenges remain: (1) scalability, i.e., supporting a large number of vehicles; and (2) versatility, i.e., integrating diversified missions. To this end, this paper presents a multi-layered, distributed architecture for mission-oriented miniature fixed-wing UAV swarms. The proposed architecture is built on the concept of modularity. It divides the overall system into five layers (low-level control, high-level control, coordination, communication, and human interaction) and into many modules that can be viewed as black boxes with input and output interfaces. In this way, not only is the complexity of developing a large system reduced, but the versatility to support diversified missions is also ensured. Furthermore, the architecture is fully distributed, in that each UAV performs the decision-making procedure autonomously, which improves scalability. Moreover, the architecture can readily be extended to different kinds of aerial platforms by using control allocation matrices and the integrated hardware box. A prototype swarm system based on the proposed architecture is built and evaluated through field experiments with 21 fixed-wing UAVs. In particular, to the best of our knowledge, this paper is the first work to successfully demonstrate formation flight, target recognition, and tracking missions within an integrated architecture for fixed-wing UAV swarms through field experiments.
Recent work from the reinforcement learning community has shown that Evolution Strategies are a fast and scalable alternative to other reinforcement learning methods. In this paper, we show that Evolution Strategies are a special case of model-based stochastic search methods. This class of algorithms has good asymptotic convergence properties and known convergence rates. We show how these methods can be used to solve both cooperative and competitive multi-agent problems efficiently. We demonstrate the effectiveness of this approach on two complex multi-agent UAV swarm combat scenarios: one where a team of fixed-wing aircraft must attack a well-defended base, and one where two teams of agents go head to head to defeat each other.
