ترغب بنشر مسار تعليمي؟ اضغط هنا

A Neural Network Approach Applied to Multi-Agent Optimal Control

130   0   0.0 ( 0 )
 نشر من قبل Derek Onken
 تاريخ النشر 2020
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approachs effectiveness on a 150-dimensional multi-agent problem with obstacles.



قيم البحث

اقرأ أيضاً

We propose a neural network approach for solving high-dimensional optimal control problems arising in real-time applications. Our approach yields controls in a feedback form and can therefore handle uncertainties such as perturbations to the systems state. We accomplish this by fusing the Pontryagin Maximum Principle (PMP) and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizing the value function with a neural network. We train our neural network model using the objective function of the control problem and penalty terms that enforce the HJB equations. Therefore, our training algorithm does not involve data generated by another algorithm. By training on a distribution of initial states, we ensure the controls optimality on a large portion of the state-space. Our grid-free approach scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate the effectiveness of our approach on several multi-agent collision-avoidance problems in up to 150 dimensions. Furthermore, we empirically observe that the number of parameters in our approach scales linearly with the dimension of the control problem, thereby mitigating the curse of dimensionality.
205 - Yutao Tang 2020
This paper studies an optimal consensus problem for a group of heterogeneous high-order agents with unknown control directions. Compared with existing consensus results, the consensus point is further required to an optimal solution to some distribut ed optimization problem. To solve this problem, we first augment each agent with an optimal signal generator to reproduce the global optimal point of the given distributed optimization problem, and then complete the global optimal consensus design by developing some adaptive tracking controllers for these augmented agents. Moreover, we present an extension when only real-time gradients are available. The trajectories of all agents in both cases are shown to be well-defined and achieve the expected consensus on the optimal point. Two numerical examples are given to verify the efficacy of our algorithms.
Control of complex systems involves both system identification and controller design. Deep neural networks have proven to be successful in many identification tasks, however, from model-based control perspective, these networks are difficult to work with because they are typically nonlinear and nonconvex. Therefore many systems are still identified and controlled based on simple linear models despite their poor representation capability. In this paper we bridge the gap between model accuracy and control tractability faced by neural networks, by explicitly constructing networks that are convex with respect to their inputs. We show that these input convex networks can be trained to obtain accurate models of complex physical systems. In particular, we design input convex recurrent neural networks to capture temporal behavior of dynamical systems. Then optimal controllers can be achieved via solving a convex model predictive control problem. Experiment results demonstrate the good potential of the proposed input convex neural network based approach in a variety of control applications. In particular we show that in the MuJoCo locomotion tasks, we could achieve over 10% higher performance using 5* less time compared with state-of-the-art model-based reinforcement learning method; and in the building HVAC control example, our method achieved up to 20% energy reduction compared with classic linear models.
178 - Yutao Tang , Ding Wang 2020
In this paper, we investigate a constrained optimal coordination problem for a class of heterogeneous nonlinear multi-agent systems described by high-order dynamics subject to both unknown nonlinearities and external disturbances. Each agent has a pr ivate objective function and a constraint about its output. A neural network-based distributed controller is developed for each agent such that all agent outputs can reach the constrained minimal point of the aggregate objective function with bounded residual errors. Two examples are finally given to demonstrate the effectiveness of the algorithm.
We consider the optimal coverage problem where a multi-agent network is deployed in an environment with obstacles to maximize a joint event detection probability. The objective function of this problem is non-convex and no global optimum is guarantee d by gradient-based algorithms developed to date. We first show that the objective function is monotone submodular, a class of functions for which a simple greedy algorithm is known to be within 0.63 of the optimal solution. We then derive two tighter lower bounds by exploiting the curvature information (total curvature and elemental curvature) of the objective function. We further show that the tightness of these lower bounds is complementary with respect to the sensing capabilities of the agents. The greedy algorithm solution can be subsequently used as an initial point for a gradient-based algorithm to obtain solutions even closer to the global optimum. Simulation results show that this approach leads to significantly better performance relative to previously used algorithms.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا