Decentralized optimization to minimize a finite sum of functions over a network of nodes has been a significant focus within control and signal processing research due to its natural relevance to optimal control and signal estimation problems. More recently, the emergence of sophisticated computing and large-scale data science needs has led to a resurgence of activity in this area. In this article, we discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems, where their simplicity makes them the first method of choice for many complex inference and training tasks. In particular, we provide a general framework of decentralized first-order methods that is applicable to undirected and directed communication networks alike, and show that much of the existing work on optimization and consensus can be related explicitly to this framework. We further extend the discussion to decentralized stochastic first-order methods that rely on stochastic gradients at each node and describe how local variance reduction schemes, previously shown to have promise in centralized settings, improve the performance of decentralized methods when combined with what is known as gradient tracking. We motivate and demonstrate the effectiveness of the corresponding methods in the context of machine learning and signal processing problems that arise in decentralized environments.
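As a rough illustration of the gradient-tracking idea mentioned above, the following minimal sketch runs a deterministic gradient-tracking recursion on an undirected ring. The quadratic local costs f_i(x) = 0.5 (x - b_i)^2, the ring weights (1/2 for the node itself, 1/4 for each neighbor), the step-size, and the function name `gradient_tracking` are all assumptions made for illustration and are not taken from the article.

```python
def gradient_tracking(b, alpha=0.1, iters=500):
    """Toy gradient tracking on an undirected ring (requires len(b) >= 3).

    Node i holds the hypothetical local cost f_i(x) = 0.5 * (x - b[i])**2,
    so the minimizer of the sum is the average of b.
    """
    n = len(b)

    def mix(v):
        # Doubly stochastic ring weights: 1/2 self, 1/4 to each neighbor.
        return [0.5 * v[i] + 0.25 * v[(i - 1) % n] + 0.25 * v[(i + 1) % n]
                for i in range(n)]

    def grad(x):
        # Local gradients, each evaluated at the node's own estimate.
        return [x[i] - b[i] for i in range(n)]

    x = [0.0] * n      # local estimates
    g = grad(x)        # last local gradients
    y = list(g)        # gradient trackers, initialized to the local gradients
    for _ in range(iters):
        wx = mix(x)
        x_new = [wx[i] - alpha * y[i] for i in range(n)]
        g_new = grad(x_new)
        wy = mix(y)
        # The tracker follows the network-average gradient.
        y = [wy[i] + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x
```

Calling `gradient_tracking([1.0, 2.0, 3.0, 4.0, 10.0])` should drive every node's estimate toward the average of `b`, which minimizes the sum of these particular quadratics.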
In this report, we study decentralized stochastic optimization to minimize a sum of smooth and strongly convex cost functions when the functions are distributed over a directed network of nodes. In contrast to the existing work, we use gradient tracking to improve certain aspects of the resulting algorithm. In particular, we propose the~\textbf{\texttt{S-ADDOPT}} algorithm that assumes a stochastic first-order oracle at each node and show that for a constant step-size~$\alpha$, each node converges linearly inside an error ball around the optimal solution, the size of which is controlled by~$\alpha$. For decaying step-sizes~$\mathcal{O}(1/k)$, we show that~\textbf{\texttt{S-ADDOPT}} reaches the exact solution sublinearly at~$\mathcal{O}(1/k)$ and its convergence is asymptotically network-independent. Thus the asymptotic behavior of~\textbf{\texttt{S-ADDOPT}} is comparable to the centralized stochastic gradient descent. Numerical experiments over both strongly convex and non-convex problems illustrate the convergence behavior and the performance comparison of the proposed algorithm.
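The combination of push-sum style mixing over a directed graph with gradient tracking and a noisy gradient oracle can be sketched as follows. This is a minimal illustration in the spirit of the description above and may differ from the exact update in the report. The quadratic local costs, the Gaussian noise model, the equal-split column-stochastic weights, and the names `s_addopt_sketch` and `out_neighbors` are assumptions.

```python
import random


def s_addopt_sketch(b, out_neighbors, alpha=0.02, sigma=0.0, iters=3000, seed=0):
    """Push-sum gradient tracking with noisy gradients on a directed graph.

    Node i holds the hypothetical cost f_i(z) = 0.5 * (z - b[i])**2.
    out_neighbors[j] lists the nodes that receive from j (including j itself).
    """
    rng = random.Random(seed)
    n = len(b)

    def push(v):
        # Column-stochastic mixing: node j splits v[j] equally among receivers.
        r = [0.0] * n
        for j in range(n):
            share = v[j] / len(out_neighbors[j])
            for i in out_neighbors[j]:
                r[i] += share
        return r

    def stoch_grad(z):
        # Assumed stochastic first-order oracle: exact gradient plus noise.
        return [z[i] - b[i] + sigma * rng.gauss(0.0, 1.0) for i in range(n)]

    x = [0.0] * n          # push-sum numerators
    y = [1.0] * n          # push-sum weights that correct the graph imbalance
    z = list(x)            # de-biased estimates z = x / y
    g = stoch_grad(z)
    w = list(g)            # gradient trackers
    for _ in range(iters):
        bx = push(x)
        x = [bx[i] - alpha * w[i] for i in range(n)]
        y = push(y)
        z = [x[i] / y[i] for i in range(n)]
        g_new = stoch_grad(z)
        bw = push(w)
        w = [bw[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return z
```

With `sigma=0.0` the sketch reduces to a deterministic recursion whose estimates settle at the average of `b`. With `sigma > 0` and a constant `alpha`, the estimates are expected to fluctuate in a neighborhood of that point, mirroring the error-ball behavior described above.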
A novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.
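The bi-level structure described above (an outer expectation across data classes and an inner worst case within each class) can be illustrated with a deliberately small toy problem. Everything in this sketch is an assumption for illustration: the capacity-planning cost, the interval uncertainty sets, the grid search, and the name `solve_toy_ddsro`. It does not reproduce the Dirichlet process mixture modeling or the decomposition algorithm of the framework.

```python
def solve_toy_ddsro(class_probs, demand_intervals, candidates,
                    unit_cost=1.0, penalty=3.0):
    """Toy 'expected worst case' problem solved by grid search.

    Outer level: expectation over data classes with the given probabilities.
    Inner level: worst case over each class's demand interval (lo, hi).
    """
    def cost(x, u):
        # Hypothetical cost: pay for capacity x, plus a penalty for shortfall.
        return unit_cost * x + penalty * max(u - x, 0.0)

    def worst_case(x, interval):
        lo, hi = interval
        # The cost is convex in u, so its maximum over an interval is
        # attained at one of the two endpoints.
        return max(cost(x, lo), cost(x, hi))

    def expected_worst_case(x):
        return sum(p * worst_case(x, iv)
                   for p, iv in zip(class_probs, demand_intervals))

    best_x = min(candidates, key=expected_worst_case)
    return best_x, expected_worst_case(best_x)
```

For example, `solve_toy_ddsro([0.5, 0.5], [(0.0, 10.0), (0.0, 20.0)], range(0, 31))` searches integer capacities from 0 to 30 and weighs the worst case of each class by its probability.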
How can we efficiently gather information to optimize an unknown function, when presented with multiple, mutually dependent information sources with different costs? For example, when optimizing a robotic system, intelligently trading off computer simulations and tests on the real robot can lead to significant savings. Existing methods, such as multi-fidelity GP-UCB or Entropy Search-based approaches, either make simplistic assumptions about the interaction among different fidelities or use simple heuristics that lack theoretical guarantees. In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem. In particular, we model different fidelities using additive Gaussian processes based on shared latent structures with the target function. Then we use cost-sensitive mutual information gain for efficient Bayesian global optimization. We propose a simple notion of regret which incorporates the cost of different fidelities, and prove that MF-MI-Greedy achieves low regret. We demonstrate the strong empirical performance of our algorithm on both synthetic and real-world datasets.
In computer science, there exist a large number of optimization problems defined on graphs, in which the goal is to find the node state configuration or network structure that optimizes a designed objective function under some constraints. However, these problems are notoriously hard to solve because most of them are NP-hard or NP-complete. Although traditional general-purpose methods such as simulated annealing (SA) and genetic algorithms (GA) have been devised for these hard problems, their accuracy and running time are not satisfactory in practice. In this work, we propose a simple, fast, and general algorithmic framework based on the automatic differentiation techniques provided by deep learning frameworks. By introducing the Gumbel-softmax technique, we can optimize the objective function directly by gradient descent despite the discrete nature of the variables. We also introduce an evolution strategy to obtain a parallel version of our algorithm. We test our algorithm on three representative optimization problems on graphs: modularity optimization from network science, the Sherrington-Kirkpatrick (SK) model from statistical physics, and the maximum independent set (MIS) and minimum vertex cover (MVC) problems from combinatorial optimization on graphs. High-quality solutions can be obtained in much less time than with traditional approaches.
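To make the Gumbel-softmax relaxation mentioned above concrete, the following sketch draws one relaxed sample for a single categorical variable using only the Python standard library. It shows the sampling step only. The gradient-based optimization of a graph objective would require an automatic differentiation framework and is not reproduced here. The function name `gumbel_softmax_sample` and its parameters are assumptions for illustration.

```python
import math
import random


def gumbel_softmax_sample(probs, tau, rng):
    """Draw one relaxed (soft) one-hot sample for a categorical variable.

    probs: strictly positive category probabilities.
    tau:   temperature; smaller values give samples closer to one-hot.
    rng:   a random.Random instance.
    """
    eps = 1e-12
    logits = []
    for p in probs:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), eps), 1.0 - eps)
        gumbel = -math.log(-math.log(u))       # standard Gumbel noise
        logits.append((math.log(p) + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The output is a probability vector rather than a hard assignment, which is what makes the objective differentiable with respect to the underlying probabilities. The index of its largest entry is distributed according to `probs` (the Gumbel-max trick), so lowering `tau` recovers nearly discrete samples.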
Designing decentralized policies for wireless communication networks is a crucial problem, which has only been partially solved in the literature so far. In this paper, we propose the Decentralized Markov Decision Process (Dec-MDP) framework to analyze a wireless sensor network with multiple users that access a common wireless channel. We consider devices with energy harvesting capabilities, which aim at balancing the energy arrivals with the data departures and with the probability of colliding with other nodes. Randomly over time, an access point triggers a SYNC slot, wherein it recomputes the optimal transmission parameters of the whole network and distributes this information. Every node receives its own policy, which specifies how it should access the channel in the future, and thereafter proceeds in a fully decentralized fashion, without interacting with other entities in the network. We propose a multi-layer Markov model, where an external MDP manages the jumps between SYNC slots, and an internal Dec-MDP computes the optimal policy in the near future. We numerically show that, because of energy harvesting, a fully orthogonal scheme (e.g., TDMA-like) is suboptimal, and the optimal trade-off lies between an orthogonal and a random access system.