Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

Published by: Wenhao Li
Publication date: 2021
Research field: Informatics Engineering
Research language: English





When solving a complex task, humans spontaneously form teams, with each team completing a different part of the whole task. Meanwhile, cooperation between teammates improves efficiency. However, in current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end black-box optimization. In order to improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named Rochico, based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module, based on hierarchical intentions with a consensus constraint, is introduced after team formation. Simultaneously, utilizing the hierarchical consensus module and a decision module enhanced by a self-supervised intrinsic reward, the proposed cooperative MARL algorithm Rochico outputs the final diversified multi-agent cooperative policy. All three modules are organically combined to promote structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that Rochico is significantly better than the current SOTA algorithms in terms of exploration efficiency and cooperation strength.


Read also

Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}\left(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k}\right)$, $e_2=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}}\right)$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.
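To get a feel for how the three error terms in the abstract above scale with the class sizes, the expressions inside the $\mathcal{O}(\cdot)$ can be evaluated directly. The following is only an illustrative sketch: the function name `mfc_error_scalings` is hypothetical, the constants hidden by the big-O notation are ignored, and `A` and `B` default to 1 purely as placeholders, since the abstract does not give their values.

```python
import math


def mfc_error_scalings(class_sizes, n_states, n_actions, A=1.0, B=1.0):
    """Evaluate the expressions inside O(.) for e_1, e_2, e_3.

    class_sizes: list of N_k, one entry per class (all positive).
    n_states, n_actions: |X| and |U|, the per-agent state/action space sizes.
    A, B: unknown constants from the abstract; 1.0 is a placeholder assumption.
    """
    n_pop = sum(class_sizes)                      # N_pop = sum_k N_k
    root = math.sqrt(n_states * n_actions)        # sqrt(|X||U|)
    sum_sqrt = sum(math.sqrt(n) for n in class_sizes)

    e1 = root / n_pop * sum_sqrt
    e2 = root * sum(1.0 / math.sqrt(n) for n in class_sizes)
    e3 = root * (A / n_pop * sum_sqrt + B / math.sqrt(n_pop))
    return e1, e2, e3


if __name__ == "__main__":
    # Two classes of 100 agents each, with 2 states and 2 actions per agent.
    print(mfc_error_scalings([100, 100], n_states=2, n_actions=2))
```

With equal class sizes, the first expression shrinks as the population grows, while the second is driven by the smallest classes, since each class contributes a $1/\sqrt{N_k}$ term.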
The recommender system is an important form of intelligent application, which helps users cope with information redundancy. Among the metrics used to evaluate a recommender system, the metric of conversion has become increasingly important. The majority of existing recommender systems perform poorly on the metric of conversion due to its extremely sparse feedback signal. To tackle this challenge, we propose a deep hierarchical reinforcement learning based recommendation framework, which consists of two components, i.e., a high-level agent and a low-level agent. The high-level agent captures long-term sparse conversion signals and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and interacts with the real-time environment. To solve the inherent problem in hierarchical reinforcement learning, we propose a novel deep hierarchical reinforcement learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm has three characteristics: 1) the high-level agent generates multiple goals to guide the low-level agent in different stages, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) an appropriate benefit assignment function is designed to allocate rewards in each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm based on a real-world e-commerce dataset and validate its effectiveness.
Model-based methods are the dominant paradigm for controlling robotic systems, though their efficacy depends heavily on the accuracy of the model used. Deep neural networks have been used to learn models of robot dynamics from data, but they suffer from data inefficiency and the difficulty of incorporating prior knowledge. We introduce Structured Mechanical Models, a flexible model class for mechanical systems that is data-efficient, easily amenable to prior knowledge, and easily usable with model-based control techniques. The goal of this work is to demonstrate the benefits of using Structured Mechanical Models in lieu of black-box neural networks when modeling robot dynamics. We demonstrate that they generalize better from limited data and yield more reliable model-based controllers on a variety of simulated robotic domains.
Meng Zhang, Ermin Wei, 2021
Federated learning enables machine learning algorithms to be trained over a network of multiple decentralized edge devices without requiring the exchange of local datasets. Successfully deploying federated learning requires ensuring that agents (e.g., mobile devices) faithfully execute the intended algorithm, which has been largely overlooked in the literature. In this study, we first use risk bounds to analyze how the key feature of federated learning, unbalanced and non-i.i.d. data, affects agents' incentives to voluntarily participate and obediently follow traditional federated learning algorithms. To be more specific, our analysis reveals that agents with less typical data distributions and relatively more samples are more likely to opt out of or tamper with federated learning algorithms. To this end, we formulate the first faithful implementation problem of federated learning and design two faithful federated learning mechanisms which satisfy economic properties, scalability, and privacy. Further, the time complexity of computing all agents' payments in the number of agents is $\mathcal{O}(1)$. First, we design a Faithful Federated Learning (FFL) mechanism which approximates the Vickrey-Clarke-Groves (VCG) payments via an incremental computation. We show that it achieves (probably approximate) optimality, faithful implementation, voluntary participation, and some other economic properties (such as budget balance). Second, by partitioning agents into several subsets, we present a scalable VCG mechanism approximation. We further design a scalable and Differentially Private FFL (DP-FFL) mechanism, the first differentially private faithful mechanism, that maintains the economic properties. Our mechanism enables one to make three-way performance tradeoffs among privacy, the iterations needed, and payment accuracy loss.
Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed. First, in real-world settings, when an agent attempts a task and fails, the environment must somehow reset so that the agent can attempt the task again. While easy in simulation, this could require considerable human effort in the real world, especially if the number of trials is very large. Second, real-world learning often involves complex, temporally extended behavior that is often difficult to acquire with random exploration. While these two problems may at first appear unrelated, in this work, we show how a single method can allow an agent to acquire skills with minimal supervision while removing the need for resets. We do this by exploiting the insight that the need to reset an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of reset-skills. We propose a general-sum game formulation that balances the objectives of resetting and learning skills, and demonstrate that this approach improves performance on reset-free tasks, and additionally show that the skills we obtain can be used to significantly accelerate downstream learning.
