No Arabic abstract
Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.
In this paper we present a decentralized algorithm to estimate the eigenvalues of the Laplacian matrix that encodes the network topology of a multi-agent system. We consider network topologies modeled by undirected graphs. The basic idea is to provide a local interaction rule among agents so that their state trajectory is a linear combination of sinusoids oscillating only at frequencies function of the eigenvalues of the Laplacian matrix. In this way, the problem of decentralized estimation of the eigenvalues is mapped into a standard signal processing problem in which the unknowns are the finite number of frequencies at which the signal oscillates.
We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.
We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor-Critic (SAC) framework that exploits the network structure and finds a localized policy that is a $O(rho^kappa)$-approximation of a stationary point of the objective for some $rhoin(0,1)$, with complexity that scales with the local state-action space size of the largest $kappa$-hop neighborhood of the network.
Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arms Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multiple objectives such as performance and energy. This paper proposes a novel information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives. PaRMIS specifies parametric policies to manage resources and learns statistical models from candidate policy evaluation data in the form of target design objective values. The key idea is to select a candidate policy for evaluation in each iteration guided by statistical models that maximize the information gain about the true Pareto front. Experiments on a commercial heterogeneous SoC show that PaRMIS achieves better Pareto fronts and is easily usable to optimize complex objectives (e.g., performance per Watt) when compared to prior methods.
Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agents payoff is also affected by other agents adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines.