The paper is concerned with distributed learning and optimization in large-scale settings. The well-known Fictitious Play (FP) algorithm has been shown to achieve Nash equilibrium learning in certain classes of multi-agent games. However, FP can be computationally difficult to implement when the number of players is large. Sampled FP is a variant of FP that mitigates the computational difficulties arising in FP by using a Monte-Carlo (i.e., sampling-based) approach. The Sampled FP algorithm has been studied both as a tool for distributed learning and as an optimization heuristic for large-scale problems. Despite its computational advantages, a shortcoming of Sampled FP is that the number of samples that must be drawn in each round of the algorithm grows without bound (on the order of $\sqrt{t}$, where $t$ is the round of the repeated play). In this paper we propose Computationally Efficient Sampled FP (CESFP)---a variant of Sampled FP in which only one sample need be drawn each round of the algorithm (a substantial reduction from $O(\sqrt{t})$ samples per round, as required in Sampled FP). CESFP operates using a stochastic-approximation type rule to estimate the expected utility from round to round. It is proven that the CESFP algorithm achieves Nash equilibrium learning in the same sense as classical FP and Sampled FP. Simulation results suggest that the convergence rate of CESFP (in terms of repeated-play iterations) is similar to that of Sampled FP.
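The one-sample idea described above can be illustrated with a minimal sketch. It assumes a hypothetical two-player game with a common utility function, a step size of 1/(t+1), and a plain running-average update; the name `cesfp_sketch` and these details are illustrative assumptions, not the exact rule from the paper.

```python
import random


def cesfp_sketch(utility, n_actions, rounds=2000, seed=0):
    """One-sample-per-round fictitious play sketch (two players, common utility).

    `utility(a0, a1)` is a hypothetical common payoff; the step size 1/(t+1)
    is an assumption made for illustration.
    """
    rng = random.Random(seed)
    acts = range(n_actions)
    # Empirical distribution of each player's past actions (uniform start).
    freq = [[1.0 / n_actions] * n_actions for _ in range(2)]
    # Running estimate of each player's expected utility per own action.
    u_hat = [[0.0] * n_actions for _ in range(2)]
    for t in range(1, rounds + 1):
        step = 1.0 / (t + 1)
        chosen = []
        for i in range(2):
            # Draw ONE sample of the opponent's action from its empirical distribution.
            s = rng.choices(acts, weights=freq[1 - i])[0]
            for a in acts:
                payoff = utility(a, s) if i == 0 else utility(s, a)
                # Stochastic-approximation update of the utility estimate.
                u_hat[i][a] += step * (payoff - u_hat[i][a])
            # Best response to the current estimates (ties go to the lowest index).
            chosen.append(max(acts, key=lambda a: u_hat[i][a]))
        for i in range(2):
            for a in acts:
                hit = 1.0 if a == chosen[i] else 0.0
                # Fold the chosen action into the empirical distribution.
                freq[i][a] += step * (hit - freq[i][a])
    return freq, u_hat


if __name__ == "__main__":
    # Hypothetical coordination game: matching on action 0 pays the most.
    coord = lambda a0, a1: 2.0 if a0 == a1 == 0 else (1.0 if a0 == a1 else 0.0)
    print(cesfp_sketch(coord, n_actions=2)[0])
```

The point of the sketch is that the per-round cost stays constant: each player draws a single sample and reuses the previous utility estimate, rather than averaging a growing batch of samples.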
The paper studies the highly prototypical Fictitious Play (FP) algorithm, as well as a broad class of learning processes based on best-response dynamics, which we refer to as FP-type algorithms. A well-known shortcoming of FP is that, while players may learn an equilibrium strategy in some abstract sense, there are no guarantees that the period-by-period strategies generated by the algorithm actually converge to equilibrium themselves. This issue is fundamentally related to the discontinuous nature of the best response correspondence and is inherited by many FP-type algorithms. Not only does it cause problems in the interpretation of such algorithms as a mechanism for economic and social learning, but it also greatly diminishes the practical value of these algorithms for use in distributed control. We refer to forms of learning in which players learn equilibria in some abstract sense only (to be defined more precisely in the paper) as weak learning, and we refer to forms of learning where players' period-by-period strategies converge to equilibrium as strong learning. An approach is presented for modifying an FP-type algorithm that achieves weak learning in order to construct a variant that achieves strong learning. Theoretical convergence results are proved.
Empirical Centroid Fictitious Play (ECFP) is a generalization of the well-known Fictitious Play (FP) algorithm designed for implementation in large-scale games. In ECFP, the set of players is subdivided into equivalence classes with players in the same class possessing similar properties. Players choose a next-stage action by tracking and responding to aggregate statistics related to each equivalence class. This setup alleviates the difficult task of tracking and responding to the statistical behavior of every individual player, as is the case in traditional FP. Aside from ECFP, many useful modifications have been proposed to classical FP, e.g., rules allowing for network-based implementation, increased computational efficiency, and stronger forms of learning. Such modifications tend to be of great practical value; however, their effectiveness relies heavily on two fundamental properties of FP: robustness to alterations in the empirical distribution step size process, and robustness to best-response perturbations. The main contribution of the paper is to show that similar robustness properties also hold for the ECFP algorithm. This result serves as a first step in enabling practical modifications to ECFP, similar to those already developed for FP.
This paper studies the problem of sequential Gaussian shift-in-mean hypothesis testing in a distributed multi-agent network. A sequential probability ratio test (SPRT) type algorithm in a distributed framework of the \emph{consensus}+\emph{innovations} form is proposed, in which the agents update their decision statistics by simultaneously processing the latest observations (innovations) sensed sequentially over time and information obtained from neighboring agents (consensus). For each pre-specified set of type I and type II error probabilities, local decision parameters are derived which ensure that the algorithm achieves the desired error performance and terminates in finite time almost surely (a.s.) at each network agent. Large deviation exponents for the tail probabilities of the agent stopping time distributions are obtained, and it is shown that asymptotically (in the number of agents or in the high signal-to-noise-ratio regime) the exponents associated with the distributed algorithm approach those of the optimal centralized detector. The expected stopping time for the proposed algorithm at each network agent is evaluated and is benchmarked with respect to the optimal centralized algorithm. The efficiency of the proposed algorithm in the sense of the expected stopping times is characterized in terms of network connectivity. Finally, simulation studies are presented which illustrate and verify the analytical findings.
The trend in the electric power system is to move towards increased amounts of distributed resources, which suggests a transition from the current highly centralized control structure to a more distributed one. In this paper, we propose a method which enables a fully distributed solution of the DC Optimal Power Flow problem (DC-OPF), i.e., the generation settings which minimize cost while supplying the load and ensuring that all line flows are below their limits are determined in a distributed fashion. The approach consists of a distributed procedure that aims at solving the first order optimality conditions in which individual bus optimization variables are iteratively updated through simple local computations and information is exchanged with neighboring entities. In particular, the update for a specific bus consists of a term which takes into account the coupling between the neighboring Lagrange multiplier variables and a local innovation term that enforces the demand/supply balance. The buses exchange information on the current update of their multipliers and the bus angle with their neighboring buses. An analytical proof is given that the proposed method converges to the optimal solution of the DC-OPF. Also, the performance is evaluated using the IEEE Reliability Test System as a test case.
This paper addresses problems on the structural design of control systems taking explicitly into consideration the possible application to large-scale systems. We provide an efficient and unified framework to solve the following major minimization problems: (i) selection of the minimum number of manipulated/measured variables to achieve structural controllability/observability of the system, and (ii) selection of the minimum number of feedback interconnections between measured and manipulated variables such that the closed-loop system has no structurally fixed modes. Contrary to what would be expected, we show that it is possible to obtain a global solution for each of the aforementioned minimization problems using polynomial complexity algorithms in the number of the state variables of the system. In addition, we provide several new graph-theoretic characterizations of structural systems concepts, which, in turn, enable us to characterize all possible solutions to the above problems.
The paper is concerned with distributed learning in large-scale games. The well-known fictitious play (FP) algorithm is addressed, which, despite theoretical convergence results, might be impractical to implement in large-scale settings due to intense computation and communication requirements. An adaptation of the FP algorithm, designated as the empirical centroid fictitious play (ECFP), is presented. In ECFP, players respond to the centroid of all players' actions rather than track and respond to the individual actions of every player. Convergence of the ECFP algorithm in terms of average empirical frequency (a notion made precise in the paper) to a subset of the Nash equilibria is proven under the assumption that the game is a potential game with permutation invariant potential function. A more general formulation of ECFP is then given (which subsumes FP as a special case) and convergence results are given for the class of potential games. Furthermore, a distributed formulation of the ECFP algorithm is presented, in which players, endowed with a (possibly sparse) preassigned communication graph, engage in local, non-strategic information exchange to eventually agree on a common equilibrium. Convergence results are proven for the distributed ECFP algorithm.
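The centroid idea above can be sketched in a few lines. The sketch assumes identical players, a hypothetical pairwise payoff matrix `w`, and a step size of 1/(t+1); each player evaluates its actions as if every opponent played the centroid distribution. These modeling choices and the name `ecfp_sketch` are assumptions for illustration, not the formulation from the paper.

```python
def ecfp_sketch(w, initial_actions, rounds=200):
    """Centroid-based fictitious play sketch for identical players.

    `w[a][b]` is a hypothetical pairwise payoff for playing `a` against `b`;
    each player evaluates actions as if opponents played the centroid.
    """
    n_actions = len(w)
    n_players = len(initial_actions)
    acts = range(n_actions)
    # Each player's empirical distribution starts as a point mass on its first action.
    freq = [[1.0 if a == a0 else 0.0 for a in acts] for a0 in initial_actions]

    def centroid():
        # Average of all players' empirical distributions.
        return [sum(f[a] for f in freq) / n_players for a in acts]

    for t in range(1, rounds + 1):
        c = centroid()
        # Expected payoff of each action against the centroid.
        payoff = [sum(w[a][b] * c[b] for b in acts) for a in acts]
        # Identical players share the same best response (ties go to the lowest index).
        best = max(acts, key=lambda a: payoff[a])
        step = 1.0 / (t + 1)
        for f in freq:
            for a in acts:
                f[a] += step * ((1.0 if a == best else 0.0) - f[a])
    return centroid()
```

For example, `ecfp_sketch([[2.0, 0.0], [0.0, 1.0]], [0, 0, 1, 1])` starts from a centroid of `[0.5, 0.5]` and moves it toward action 0. Only the centroid is tracked, so the per-player bookkeeping does not grow with the number of players.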
The paper studies distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy inter-sensor communication. It introduces \emph{separably estimable} observation models that generalize the observability condition in linear centralized estimation to nonlinear distributed estimation. It studies two distributed estimation algorithms in separably estimable models, the $\mathcal{NU}$ (with its linear counterpart $\mathcal{LU}$) and the $\mathcal{NLU}$. Their update rule combines a \emph{consensus} step (where each sensor updates the state by taking a weighted average with its neighbors' states) and an \emph{innovation} step (where each sensor processes its local current observation). This makes the three algorithms of the \textit{consensus + innovations} type, very different from traditional consensus. The paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness. For $\mathcal{LU}$ and $\mathcal{NU}$, it proves asymptotic normality and provides convergence rate guarantees. The three algorithms are characterized by appropriately chosen decaying weight sequences. Algorithms $\mathcal{LU}$ and $\mathcal{NU}$ are analyzed in the framework of stochastic approximation theory; algorithm $\mathcal{NLU}$ exhibits mixed time-scale behavior and biased perturbations, and its analysis requires a different approach that is developed in the paper.
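A consensus + innovations update of the kind described above can be sketched for a simplified linear case. The sketch assumes two sensors, each observing only one component of a two-dimensional parameter, noiseless links, and the weight sequences shown in the code. All of these, including the name `consensus_innovations_sketch`, are illustrative assumptions rather than the algorithms analyzed in the paper.

```python
import random


def consensus_innovations_sketch(theta, rounds=5000, noise_std=0.1, seed=0):
    """Two sensors jointly estimate a 2-vector; sensor n observes only theta[n].

    Linear observations, noiseless links, and the weight sequences below are
    simplifying assumptions for illustration.
    """
    rng = random.Random(seed)
    x = [[0.0, 0.0], [0.0, 0.0]]  # x[n] is sensor n's estimate of theta
    for t in range(1, rounds + 1):
        alpha = 1.0 / (t + 1)    # innovation weight (decays faster)
        beta = 0.4 / (t ** 0.5)  # consensus weight (decays slower)
        new_x = []
        for n in range(2):
            m = 1 - n  # the only neighbor of sensor n
            z = theta[n] + rng.gauss(0.0, noise_std)  # local noisy observation
            est = []
            for k in range(2):
                # Consensus: pull toward the neighbor's current estimate.
                consensus = beta * (x[n][k] - x[m][k])
                # Innovation: sensor n has information only about component k == n.
                innovation = alpha * (z - x[n][k]) if k == n else 0.0
                est.append(x[n][k] - consensus + innovation)
            new_x.append(est)
        x = new_x
    return x
```

Neither sensor can recover the full vector from its own observations, so the component it does not observe is learned purely through the consensus term. Calling `consensus_innovations_sketch([1.0, -2.0])` returns both sensors' estimates of the full vector.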
The paper studies the problem of distributed average consensus in sensor networks with quantized data and random link failures. To achieve consensus, dither (small noise) is added to the sensor states before quantization. When the quantizer range is unbounded (countable number of quantizer levels), stochastic approximation shows that consensus is asymptotically achieved with probability one and in mean square to a finite random variable. We show that the mean-squared error (m.s.e.) can be made arbitrarily small by tuning the link weight sequence, at the cost of the convergence rate of the algorithm. To study dithered consensus with random links when the range of the quantizer is bounded, we establish uniform boundedness of the sample paths of the unbounded quantizer. This requires characterization of the statistical properties of the supremum taken over the sample paths of the state of the quantizer. This is accomplished by splitting the state vector of the quantizer in two components: one along the consensus subspace and the other along the subspace orthogonal to the consensus subspace. The proofs use maximal inequalities for submartingale and supermartingale sequences. From these, we derive probability bounds on the excursions of the two subsequences, from which probability bounds on the excursions of the quantizer state vector follow. The paper shows how to use these probability bounds to design the quantizer parameters and to explore tradeoffs among the number of quantizer levels, the size of the quantization steps, the desired probability of saturation, and the desired level of accuracy $\epsilon$ away from consensus. Finally, the paper illustrates the quantizer design with a numerical study.
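The dithered, quantized update with an unbounded quantizer range can be sketched as follows. The weight sequence 1/(t+4), the quantizer step `delta`, the link-up probability `p_link`, and the name `dithered_consensus_sketch` are assumptions chosen for illustration. The bounded-range quantizer design discussed above is not covered.

```python
import math
import random


def dithered_consensus_sketch(x0, edges, delta=0.1, p_link=0.8, rounds=2000, seed=0):
    """Average consensus with dithered quantization and random link failures.

    The weight sequence 1/(t+4), the uniform quantizer step `delta`, and the
    link-up probability `p_link` are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = list(x0)

    def quantize(value):
        # Uniform quantizer with unbounded range: nearest multiple of delta.
        return delta * math.floor(value / delta + 0.5)

    for t in range(1, rounds + 1):
        alpha = 1.0 / (t + 4)  # decaying link weight
        update = [0.0] * len(x)
        for i, j in edges:
            if rng.random() >= p_link:
                continue  # this link failed in the current round
            for src, dst in ((i, j), (j, i)):
                # Add dither uniform over one quantization step, then quantize.
                dither = rng.uniform(-delta / 2, delta / 2)
                received = quantize(x[src] + dither)
                update[dst] += alpha * (received - x[dst])
        # Apply all updates synchronously.
        x = [xi + ui for xi, ui in zip(x, update)]
    return x
```

With a fully connected four-sensor network and initial states `[0.0, 1.0, 2.0, 5.0]`, the states should cluster near the initial average of 2.0, offset by a small random amount. A faster-decaying weight sequence shrinks that offset but slows agreement, which is the tradeoff noted above.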
The paper studies average consensus with random topologies (intermittent links) \emph{and} noisy channels. Consensus with noise in the network links leads to the bias-variance dilemma--running consensus for a long time reduces the bias of the final average estimate but increases its variance. We present two different compromises to this tradeoff: the $\mathcal{A-ND}$ algorithm modifies conventional consensus by forcing the weights to satisfy a \emph{persistence} condition (slowly decaying to zero); and the $\mathcal{A-NC}$ algorithm, where the weights are constant but consensus is run for a fixed number of iterations $\hat{\imath}$, then restarted and rerun for a total of $\hat{p}$ runs, with the final states of the $\hat{p}$ runs averaged at the end (Monte Carlo averaging). We use controlled Markov processes and stochastic approximation arguments to prove almost sure convergence of $\mathcal{A-ND}$ to the desired average (asymptotic unbiasedness) and compute explicitly the m.s.e. (variance) of the consensus limit. We show that $\mathcal{A-ND}$ represents the best of both worlds--low bias and low variance--at the cost of a slow convergence rate; rescaling the weights...

