In this paper we present a decentralized algorithm to estimate the eigenvalues of the Laplacian matrix that encodes the network topology of a multi-agent system. We consider network topologies modeled by undirected graphs. The basic idea is to provide a local interaction rule among agents so that their state trajectories are linear combinations of sinusoids oscillating only at frequencies that are functions of the eigenvalues of the Laplacian matrix. In this way, the problem of decentralized eigenvalue estimation is recast as a standard signal-processing problem in which the unknowns are the finitely many frequencies at which the signal oscillates.
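To make the frequency-domain mapping concrete, here is a minimal sketch, assuming (purely for illustration; the paper's actual interaction rule may differ) that the local rule is the second-order dynamics $\ddot{x} = -Lx$, under which each agent's trajectory oscillates at angular frequencies $\sqrt{\lambda_i}$ for the nonzero eigenvalues $\lambda_i$ of $L$:

```python
import numpy as np
from scipy.signal import find_peaks

# Illustrative assumption: local rule xdd = -L x, so agent trajectories are
# sums of sinusoids with angular frequencies sqrt(lambda_i), lambda_i > 0.
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)   # path graph on 3 nodes

dt, steps = 1e-3, 2**18
x = np.random.default_rng(0).standard_normal(3)  # random start excites all modes
v = np.zeros(3)
traj = np.empty(steps)
for k in range(steps):            # symplectic Euler integration of xdd = -L x
    v -= dt * (L @ x)
    x += dt * v
    traj[k] = x[0]                # agent 0 records only its own state

# Spectral peaks sit at f_i = sqrt(lambda_i)/(2*pi); invert to recover lambda_i.
spec = np.abs(np.fft.rfft(traj * np.hanning(steps)))
freqs = np.fft.rfftfreq(steps, d=dt)
peaks, _ = find_peaks(spec, height=0.1 * spec.max())
print("estimated nonzero eigenvalues:", np.sort((2 * np.pi * freqs[peaks])**2))
print("true eigenvalues:             ", np.sort(np.linalg.eigvalsh(L)))
```

Each agent needs only its own scalar trajectory to run the FFT step, which is what makes the estimation decentralized.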
Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information-theoretic framework based on rate-distortion theory, which facilitates analysis of how well the resulting fully decentralized policies can reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.
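For reference, the classical rate-distortion function underlying such an analysis is, in standard form (identifying $X$ with the centralized optimal solution and $\hat{X}$ with its decentralized reconstruction is our illustrative reading of the abstract):
\[
R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\;\mathbb{E}[d(X,\hat{X})]\le D} I(X;\hat{X}),
\]
where $d(\cdot,\cdot)$ is a distortion measure (e.g., squared error) and $I(X;\hat{X})$ is the mutual information between the optimal solution and its reconstruction; $R(D)$ then quantifies how much information the local policies must carry to keep the expected distortion below $D$.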
We propose a method to efficiently estimate the Laplacian eigenvalues of an arbitrary, unknown network of interacting dynamical agents. The inputs to our estimation algorithm are measurements of the evolution of a collection of agents (potentially a single one) over a finite time horizon; notably, we do not require knowledge of which agents contribute to our measurements. We propose a scalable algorithm that exactly recovers a subset of the Laplacian eigenvalues from these measurements; these eigenvalues correspond directly to the Laplacian modes observable from our measurements. We show how our technique can be applied to multi-agent systems with arbitrary dynamics, in both continuous and discrete time. Finally, we illustrate our results with numerical simulations.
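As an illustration of how the observable modes can be recovered exactly from a single anonymous trajectory, here is a minimal sketch, assuming (our choice, not necessarily the paper's algorithm) discrete-time consensus dynamics $x_{k+1} = (I - \epsilon L)x_k$ and a classical Prony-style annihilating-filter step:

```python
import numpy as np

# Illustrative assumption: discrete-time consensus x[k+1] = (I - eps*L) x[k].
# One agent's scalar trajectory y[k] is a sum of modes mu_i^k with
# mu_i = 1 - eps*lambda_i, so recovering the mu_i yields the observable lambda_i.
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)   # path graph on 4 nodes
n, eps = 4, 0.2
A = np.eye(n) - eps * L

x = np.random.default_rng(1).standard_normal(n)
y = np.empty(3 * n)
for k in range(len(y)):
    y[k] = x[0]                    # anonymous measurement of a single agent
    x = A @ x

# Prony / annihilating-filter step: find the minimal linear recurrence that
# y satisfies; its characteristic roots are the observable modes mu_i.
m = n                              # upper bound on the number of modes
H = np.column_stack([y[i:i + m] for i in range(m)])   # H[k, i] = y[k + i]
a = np.linalg.lstsq(H, -y[m:2 * m], rcond=None)[0]
mu = np.roots(np.concatenate(([1.0], a[::-1])))
print("estimated eigenvalues:", np.sort((1 - mu.real) / eps))
print("true eigenvalues:     ", np.sort(np.linalg.eigvalsh(L)))
```

Note that only the modes with nonzero weight in the observed trajectory are recoverable, mirroring the observability caveat in the abstract.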
In this paper we study multi-agent discrete-event systems where the agents can be divided into several groups, and within each group the agents have similar or identical state transition structures. We employ a relabeling map to generate a template structure for each group, and synthesize a scalable supervisor whose state size and computational process are independent of the number of agents. This scalability allows the supervisor to remain invariant, with no recomputation or reconfiguration needed, if and when agents are removed due to failure or added to increase productivity. The constant computational effort of synthesizing the scalable supervisor also makes our method promising for handling large-scale multi-agent systems. Moreover, based on the scalable supervisor we design scalable local controllers, one for each component agent, to establish a purely distributed control architecture. Three examples illustrate the proposed scalable supervisory synthesis and the resulting scalable supervisors and local controllers.
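To illustrate the relabeling idea (the event names and data structures here are hypothetical; the paper defines the relabeling map formally on languages), the sketch below erases agent indices so that all agents of a group collapse to one template structure:

```python
# Illustrative sketch: a relabeling map that erases agent identities,
# sending each indexed event like 'acquire_3' to a template event 'acquire'.
# Agents in one group share a transition structure, so their relabeled
# automata coincide and a single template represents the whole group.

def relabel(event):
    """Map an indexed event such as 'acquire_3' to its template 'acquire'."""
    name, _, idx = event.rpartition('_')
    return name if idx.isdigit() else event

def relabel_automaton(delta):
    """Relabel every transition (state, event) -> state of one agent."""
    return {(s, relabel(e)): t for (s, e), t in delta.items()}

# Two agents of the same group: identical structure up to event indices.
agent1 = {('idle', 'acquire_1'): 'busy', ('busy', 'release_1'): 'idle'}
agent2 = {('idle', 'acquire_2'): 'busy', ('busy', 'release_2'): 'idle'}

template = relabel_automaton(agent1)
assert template == relabel_automaton(agent2)   # one template for the group
print(template)
```

Because a supervisor is synthesized over the template rather than the full product of agents, its state size does not grow when agents join or leave the group.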
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily due to the dynamic nature of the environment and the decentralized information available to the agents. We address this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each controlled by one agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret sublinear in $T$, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem serves as an implicit coordination mechanism between the two learning agents, allowing them to achieve regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results on single-agent LQ regret, our algorithm yields an $\tilde{O}(\sqrt{T})$ regret bound, where $\tilde{O}(\cdot)$ hides constants and logarithmic factors. Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
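A minimal sketch of the auxiliary construction, with all matrices hypothetical: the two decoupled systems are stacked into one state, the coupling enters only through the cost, and the resulting single-agent LQ problem is solved in the standard way:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, block_diag

# Illustrative sketch (all matrices hypothetical): two dynamically decoupled
# scalar systems coupled only through the quadratic cost. Stacking them gives
# the auxiliary single-agent LQ problem that the MARL algorithm coordinates
# toward.
A1, B1 = np.array([[0.9]]), np.array([[1.0]])
A2, B2 = np.array([[0.8]]), np.array([[1.0]])

A = block_diag(A1, A2)             # decoupled dynamics
B = block_diag(B1, B2)
Q = np.array([[2.0, 1.0],          # cost couples the two states ...
              [1.0, 2.0]])
R = np.eye(2)                      # ... while control costs stay separate

# Optimal controller u = -K x of the auxiliary single-agent problem
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("auxiliary single-agent gain K =\n", K)
```

The off-diagonal entries of $K$ show why coordination matters: each agent's optimal input depends on the other system's state, which the one-directional communication channel supplies.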
We study the new concept of relative coobservability in decentralized supervisory control of discrete-event systems under partial observation. This extends our previous work on relative observability from a centralized setup to a decentralized one. A fundamental concept in decentralized supervisory control is coobservability (and its several variants); this property is not, however, closed under set union, and hence the supremal coobservable sublanguage generally does not exist. Our proposed relative coobservability, although stronger than coobservability, is algebraically well-behaved, and the supremal relatively coobservable sublanguage of a given language exists. We present an algorithm to compute this supremal sublanguage. Moreover, relative coobservability is weaker than conormality, which is also closed under set union; unlike conormality, however, relative coobservability imposes no constraint on the disablement of unobservable controllable events.
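For illustration (the notation $\sup\mathcal{CO}(K)$ is ours), closure under arbitrary union is exactly what guarantees the supremal element: it can be written as the union of all relatively coobservable sublanguages,
\[
\sup\mathcal{CO}(K) \;=\; \bigcup \{\, K' \subseteq K \,:\, K' \text{ is relatively coobservable} \,\},
\]
which is itself relatively coobservable by closure under union and contains every other such sublanguage by construction. Plain coobservability lacks this closure property, which is why no analogous supremal element exists in general.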