
Optimal communication and control strategies in a multi-agent MDP problem

Published by Sagar Sudhakara
Publication date: 2021
Research field: Informatics Engineering
Language: English





The problem of controlling multi-agent systems under different models of information sharing among agents has received significant attention in the recent literature. In this paper, we consider a setup where, rather than committing to a fixed information-sharing protocol (e.g., periodic sharing or no sharing), agents can dynamically decide at each time step whether to share information with each other and incur the resulting communication cost. This setup requires a joint design of the agents' communication and control strategies in order to optimize the trade-off between communication costs and the control objective. We first show that agents can ignore a large part of their private information without compromising system performance. We then provide a solution to the strategy optimization problem based on the common information approach. This approach relies on constructing a fictitious POMDP whose solution (obtained via a dynamic program) characterizes the optimal strategies for the agents. We also show that our solution can be easily modified to incorporate constraints on when and how frequently agents can communicate.




Read also

We consider the optimal coverage problem where a multi-agent network is deployed in an environment with obstacles to maximize a joint event detection probability. The objective function of this problem is non-convex, and no global optimum is guaranteed by the gradient-based algorithms developed to date. We first show that the objective function is monotone submodular, a class of functions for which a simple greedy algorithm is known to achieve at least 0.63 of the optimal value. We then derive two tighter lower bounds by exploiting the curvature information (total curvature and elemental curvature) of the objective function. We further show that the tightness of these lower bounds is complementary with respect to the sensing capabilities of the agents. The greedy algorithm solution can subsequently be used as an initial point for a gradient-based algorithm to obtain solutions even closer to the global optimum. Simulation results show that this approach leads to significantly better performance relative to previously used algorithms.
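As a rough illustration of the greedy step mentioned in this abstract, the sketch below repeatedly adds the candidate with the largest marginal gain. It is not the authors' coverage model: the function names, the `COVERAGE` table, and the cell-counting objective are all hypothetical stand-ins for a monotone submodular detection objective.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_maximize(
    candidates: Iterable[Hashable],
    objective: Callable[[FrozenSet[Hashable]], float],
    budget: int,
) -> List[Hashable]:
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    chosen: List[Hashable] = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        current = objective(frozenset(chosen))
        # Marginal gain of adding candidate c to the current selection.
        best = max(
            remaining,
            key=lambda c: objective(frozenset(chosen + [c])) - current,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: each candidate site covers a set of grid cells.
COVERAGE = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}


def covered_cells(selected: FrozenSet[Hashable]) -> float:
    """Number of distinct cells covered: a monotone submodular set function."""
    cells = set()
    for site in selected:
        cells |= COVERAGE[site]
    return float(len(cells))


picked = greedy_maximize(COVERAGE, covered_cells, budget=2)
print(picked, covered_cells(frozenset(picked)))
```

In this toy instance the greedy choice happens to cover every cell; in general, for a monotone submodular objective under a cardinality budget, the guarantee is only the 0.63 fraction quoted above.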
Yutao Tang, 2020
This paper studies an optimal consensus problem for a group of heterogeneous high-order agents with unknown control directions. Compared with existing consensus results, the consensus point is further required to be an optimal solution to some distributed optimization problem. To solve this problem, we first augment each agent with an optimal signal generator to reproduce the global optimal point of the given distributed optimization problem, and then complete the global optimal consensus design by developing adaptive tracking controllers for these augmented agents. Moreover, we present an extension for the case where only real-time gradients are available. The trajectories of all agents in both cases are shown to be well-defined and to achieve the expected consensus on the optimal point. Two numerical examples are given to verify the efficacy of our algorithms.
We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm involves neither a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.
Distributed algorithms for both discrete-time and continuous-time linearly solvable optimal control (LSOC) problems of networked multi-agent systems (MASs) are investigated in this paper. A distributed framework is proposed to partition the optimal control problem of a networked MAS into several local optimal control problems in factorial subsystems, such that each (central) agent behaves optimally to minimize the joint cost function of a subsystem that comprises a central agent and its neighboring agents, and the local control actions (policies) rely only on the knowledge of local observations. Under this framework, we not only preserve the correlations between neighboring agents, but also moderate the communication and computational complexities by decentralizing the sampling and computational processes over the network. For discrete-time systems modeled by Markov decision processes, the joint Bellman equation of each subsystem is transformed into a system of linear equations and solved using parallel programming. For continuous-time systems modeled by Itô diffusion processes, the joint optimality equation of each subsystem is converted into a linear partial differential equation, whose solution is approximated by a path integral formulation and a sample-efficient relative entropy policy search algorithm, respectively. The learned control policies are generalized to solve the unlearned tasks by resorting to the compositionality principle, and illustrative examples of cooperative UAV teams are provided to verify the effectiveness and advantages of these algorithms.
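To make the phrase "transformed into a system of linear equations" more concrete, here is a minimal single-agent sketch of a first-exit linearly solvable MDP in the standard formulation, where the desirability z(x) = exp(-v(x)) satisfies the linear relation z(x) = exp(-q(x)) Σ p(x'|x) z(x'). This is only an assumption about the kind of linearity involved: it does not reproduce the paper's distributed, factorial-subsystem framework, it uses plain fixed-point iteration rather than a parallel linear solve, and all names and the 3-state chain are hypothetical.

```python
import math
from typing import List, Set


def solve_desirability(
    state_cost: List[float],
    passive: List[List[float]],
    terminal: Set[int],
    tol: float = 1e-12,
    max_iters: int = 10_000,
) -> List[float]:
    """Iterate z(x) = exp(-q(x)) * sum_y p(y|x) z(y) on non-terminal states."""
    n = len(state_cost)
    # Terminal desirability is fixed at exp(-q); others start at 1.
    z = [math.exp(-state_cost[x]) if x in terminal else 1.0 for x in range(n)]
    for _ in range(max_iters):
        new_z = list(z)
        delta = 0.0
        for x in range(n):
            if x in terminal:
                continue
            expected = sum(passive[x][y] * z[y] for y in range(n))
            new_z[x] = math.exp(-state_cost[x]) * expected
            delta = max(delta, abs(new_z[x] - z[x]))
        z = new_z
        if delta < tol:
            break
    return z


def optimal_transitions(z: List[float], passive: List[List[float]], x: int) -> List[float]:
    """Optimal controlled transition: u*(y|x) proportional to p(y|x) * z(y)."""
    weights = [passive[x][y] * z[y] for y in range(len(z))]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 3-state chain: states 0 and 1 cost 1 per step, state 2 is the goal.
q = [1.0, 1.0, 0.0]
P = [
    [0.5, 0.5, 0.0],  # passive dynamics from state 0
    [0.5, 0.0, 0.5],  # passive dynamics from state 1
    [0.0, 0.0, 1.0],  # goal state is absorbing
]
z = solve_desirability(q, P, terminal={2})
value = [-math.log(zi) for zi in z]  # cost-to-go v(x) = -log z(x)
print(value, optimal_transitions(z, P, 1))
```

The optimal policy reweights the passive dynamics toward states with higher desirability, so from state 1 most of the probability mass shifts to the goal state.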
Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation and with a real-time experiment in a scaled testbed.