Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation and with a real-time experiment in a scaled testbed.
Distributed algorithms for both discrete-time and continuous-time linearly solvable optimal control (LSOC) problems of networked multi-agent systems (MASs) are investigated in this paper. A distributed framework is proposed to partition the optimal control problem of a networked MAS into several local optimal control problems in factorial subsystems, such that each (central) agent behaves optimally to minimize the joint cost function of a subsystem that comprises a central agent and its neighboring agents, and the local control actions (policies) rely only on the knowledge of local observations. Under this framework, we not only preserve the correlations between neighboring agents, but also moderate the communication and computational complexities by decentralizing the sampling and computational processes over the network. For discrete-time systems modeled by Markov decision processes, the joint Bellman equation of each subsystem is transformed into a system of linear equations and solved using parallel programming. For continuous-time systems modeled by Itô diffusion processes, the joint optimality equation of each subsystem is converted into a linear partial differential equation, whose solution is approximated by a path integral formulation and by a sample-efficient relative entropy policy search algorithm. The learned control policies are generalized to solve unlearned tasks by resorting to the compositionality principle, and illustrative examples of cooperative UAV teams are provided to verify the effectiveness and advantages of these algorithms.
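As an illustration of how a Bellman equation can become a system of linear equations, the following is a minimal single-agent sketch of the standard first-exit linearly solvable MDP, in which the desirability z = exp(-v) satisfies z = exp(-q) * (P z) on non-terminal states. It is not the distributed, factorial-subsystem algorithm of the paper; the function names, the five-state chain, and the cost values are assumptions chosen only for the example.

```python
import numpy as np


def solve_desirability(P, q, terminal):
    """Solve z = exp(-q) * (P @ z) on the interior states of a first-exit LSMDP.

    P        : (n, n) passive (uncontrolled) transition matrix, rows sum to 1
    q        : (n,) state costs
    terminal : (n,) boolean mask of absorbing goal states
    """
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    interior = ~terminal

    z = np.empty(len(q))
    z[terminal] = np.exp(-q[terminal])  # boundary condition at the goal states

    # Rearranged linear system: (I - G P_ii) z_i = G P_it z_t, with G = diag(exp(-q_i))
    G = np.diag(np.exp(-q[interior]))
    P_ii = P[np.ix_(interior, interior)]
    P_it = P[np.ix_(interior, terminal)]
    A = np.eye(int(interior.sum())) - G @ P_ii
    b = G @ P_it @ z[terminal]
    z[interior] = np.linalg.solve(A, b)
    return z


def optimal_transitions(P, z):
    """Optimal controlled transitions: u*(x'|x) proportional to p(x'|x) * z(x')."""
    U = np.asarray(P, dtype=float) * z[None, :]
    return U / U.sum(axis=1, keepdims=True)


def chain_example(n=5):
    """Hypothetical toy problem: random walk on a chain with the goal at the right end."""
    P = np.zeros((n, n))
    P[0, 0] = P[0, 1] = 0.5  # reflecting left boundary
    for i in range(1, n - 1):
        P[i, i - 1] = P[i, i + 1] = 0.5
    P[n - 1, n - 1] = 1.0  # absorbing goal state
    q = np.ones(n)
    q[-1] = 0.0
    terminal = np.zeros(n, dtype=bool)
    terminal[-1] = True
    return P, q, terminal
```

Calling `solve_desirability(*chain_example())` returns the desirability vector; the value function follows as `-np.log(z)`, and `optimal_transitions` biases the passive random walk toward the goal.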
The problem of controlling multi-agent systems under different models of information sharing among agents has received significant attention in the recent literature. In this paper, we consider a setup where, rather than committing to a fixed information sharing protocol (e.g., periodic sharing or no sharing), agents can dynamically decide at each time step whether to share information with each other and incur the resulting communication cost. This setup requires a joint design of the agents' communication and control strategies in order to optimize the trade-off between communication costs and the control objective. We first show that agents can ignore a large part of their private information without compromising the system performance. We then provide a solution to the strategy optimization problem based on the common information approach. This approach relies on constructing a fictitious POMDP whose solution (obtained via a dynamic program) characterizes the optimal strategies for the agents. We also show that our solution can be easily modified to incorporate constraints on when and how frequently agents can communicate.
In this paper, a novel methodology for feasible motion planning in multi-agent systems is developed. On the basis of the characteristics of velocity obstacles, chance constraints are formulated in the receding horizon control (RHC) problem, and geometric information about collision cones is used to generate the feasible regions of velocities for the host agent. With this approach, motion planning is conducted at the velocity level instead of the position level. It therefore yields safer collision-free trajectories for the multi-agent system, especially for systems with high-speed moving agents. Moreover, a probability threshold on potential collisions can be satisfied during the motion planning process. To validate the effectiveness of the methodology, different scenarios with multiple agents are investigated, and the simulation results show that the proposed approach can effectively avoid potential collisions with a collision probability below a specified threshold.
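The collision-cone idea behind velocity obstacles can be sketched with a deterministic geometric test: the host's velocity relative to an obstacle is unsafe if the ray it defines passes within the combined radius of the two agents. This is only the classical deterministic check for disc-shaped agents with constant velocities; the chance-constrained RHC formulation of the paper is not reproduced, and the function name and arguments are assumptions made for the example.

```python
import numpy as np


def in_collision_cone(p_host, v_host, p_obs, v_obs, radius_sum):
    """Return True if the relative velocity lies inside the collision cone.

    Assumes disc-shaped agents and constant velocities (illustrative only).
    radius_sum is the sum of the host and obstacle radii.
    """
    p = np.asarray(p_obs, dtype=float) - np.asarray(p_host, dtype=float)
    v = np.asarray(v_host, dtype=float) - np.asarray(v_obs, dtype=float)

    dist2 = p @ p
    if dist2 <= radius_sum**2:
        return True  # the discs already overlap

    closing = p @ v
    if closing <= 0.0:
        return False  # moving apart, or no relative motion

    # Squared distance from the obstacle center to the relative-velocity ray
    miss2 = dist2 - closing**2 / (v @ v)
    return bool(miss2 < radius_sum**2)
```

For example, two agents approaching head-on along the same line fall inside the cone, whereas an obstacle offset well to the side of the host's path does not. A planner working at the velocity level would restrict candidate host velocities to those for which this test is False for every neighbor.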
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily due to the dynamical environments and the decentralized information among agents. We address this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret that is sub-linear in $T$, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
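To make the notion of linear versus sub-linear regret concrete, the sketch below works through a scalar single-agent LQ problem. It assumes the usual average-cost convention, in which regret over horizon $T$ is the accumulated cost minus $T$ times the optimal average cost, and it uses steady-state expressions only. The scalar system, the function names, and the parameter values are assumptions for illustration; this is not the MARL algorithm of the paper.

```python
def scalar_riccati(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati equation for
    x_{t+1} = a x_t + b u_t + w_t with stage cost q x^2 + r u^2.

    Returns (P, K): the Riccati solution and the optimal gain in u = -K x.
    """
    P = q
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    K = a * b * P / (r + b * b * P)
    return P, K


def average_cost(a, b, q, r, K, noise_var):
    """Steady-state average cost of the fixed linear policy u = -K x."""
    a_cl = a - b * K
    if abs(a_cl) >= 1.0:
        raise ValueError("closed loop is unstable")
    state_var = noise_var / (1.0 - a_cl**2)  # stationary variance of x
    return (q + r * K * K) * state_var


def regret_growth_rate(a, b, q, r, K, noise_var):
    """Per-step excess cost of gain K over the optimal average cost P * noise_var."""
    P, _ = scalar_riccati(a, b, q, r)
    return average_cost(a, b, q, r, K, noise_var) - P * noise_var
```

A fixed gain that differs from the optimal one has a strictly positive growth rate, so its expected regret grows linearly in $T$; a learning algorithm with sub-linear regret must therefore drive its gain toward the optimal one over time.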
This paper presents a novel control protocol for robust distance-based formation control with prescribed performance in which agents are subjected to unknown external disturbances. Connectivity maintenance and collision avoidance among neighboring agents are also handled by the appropriate design of certain performance bounds that constrain the inter-agent distance errors. As an extension to the proposed scheme, distance-based formation centroid maneuvering is also studied for disturbance-free agents, in which the formation centroid tracks a desired time-varying velocity. The proposed control laws are decentralized, in the sense that each agent employs local relative information regarding its neighbors to calculate its control signal. Therefore, the control scheme is implementable in the agents' local coordinate frames. Using rigid graph theory, input-to-state stability, and Lyapunov-based analysis, the results are established for minimally and infinitesimally rigid formations in 2-D or 3-D space. Furthermore, it is argued that the proposed approach increases formation robustness against shape distortions and can prevent formation convergence to incorrect shapes, which is likely to happen in conventional distance-based formation control methods. Finally, extensive simulation studies clarify and verify the proposed approach.
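For context, the conventional distance-based formation law that the abstract contrasts against can be sketched as a gradient controller for single-integrator agents, in which each agent uses only the relative positions of its neighbors. This is the baseline law, not the prescribed-performance protocol proposed in the paper; the agent model, step size, and triangle example are assumptions made for the illustration.

```python
import numpy as np


def formation_step(p, edges, d, dt):
    """One Euler step of the gradient law u_i = -sum_j e_ij (p_i - p_j),
    where e_ij = ||p_i - p_j||^2 - d_ij^2 is the squared-distance error."""
    u = np.zeros_like(p)
    for (i, j), dij in zip(edges, d):
        rel = p[i] - p[j]  # relative position, measurable in a local frame
        err = rel @ rel - dij**2
        u[i] -= err * rel
        u[j] += err * rel
    return p + dt * u


def simulate(p0, edges, d, dt=0.01, steps=5000):
    """Integrate the single-integrator agents from initial positions p0."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = formation_step(p, edges, d, dt)
    return p


def edge_lengths(p, edges):
    """Inter-agent distances along the edges of the formation graph."""
    return np.array([np.linalg.norm(p[i] - p[j]) for i, j in edges])
```

With three agents, the edges `[(0, 1), (1, 2), (0, 2)]`, and unit desired distances, a non-collinear initial triangle settles into an equilateral one while the centroid stays fixed, since the pairwise control terms cancel in the sum.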