Mirror descent (MD) is a powerful first-order optimization technique that subsumes several optimization algorithms, including gradient descent (GD). In this work, we study the exact convergence rate of MD in both the centralized and distributed cases for strongly convex and smooth problems. We view MD through a dynamical-system lens and leverage quadratic constraints (QCs) to provide convergence guarantees based on Lyapunov stability. For centralized MD, we establish a semi-definite program (SDP) that certifies exponentially fast convergence of MD subject to a linear matrix inequality (LMI). We prove that the SDP always has a feasible solution that recovers the optimal GD rate. Next, we analyze the exponential convergence of distributed MD and characterize the rate using two LMIs. To the best of our knowledge, the exact (exponential) rate of distributed MD has not been previously explored in the literature. We present numerical results as a verification of our theory and observe that the richness of the Lyapunov function entails better (worst-case) convergence rates compared to existing works on distributed GD.
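The abstract does not include pseudocode, but the centralized MD iteration is standard. Below is a minimal sketch of MD with the negative-entropy mirror map (exponentiated gradient) on the probability simplex; with the Euclidean mirror map the same template reduces to plain GD, which is the sense in which MD subsumes GD. The quadratic objective, step size `eta`, and iteration count are illustrative choices, not the paper's.

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, eta=0.1, n_iters=100):
    """Mirror descent with the negative-entropy mirror map (exponentiated
    gradient) on the probability simplex; the Euclidean mirror map would
    recover ordinary gradient descent."""
    x = x0.copy()
    for _ in range(n_iters):
        g = grad_f(x)
        x = x * np.exp(-eta * g)   # gradient step in the dual space
        x /= x.sum()               # Bregman projection back onto the simplex
    return x

# Illustrative strongly convex quadratic restricted to the simplex.
A = np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 0.5, 0.25])
grad = lambda x: A @ x - b
x_star = mirror_descent_simplex(grad, np.ones(3) / 3)
```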
We propose a novel GAN framework for non-parametric density estimation with high-dimensional data. This framework is based on a novel density estimator, called the hyperbolic cross density estimator, which enjoys nice convergence properties in the mixed Sobolev spaces. As modifications of the usual Sobolev spaces, the mixed Sobolev spaces are more suitable for describing high-dimensional density functions. We prove that, unlike other existing approaches, the proposed GAN framework does not suffer from the curse of dimensionality and can achieve the optimal convergence rate of $O_p(n^{-1/2})$, with $n$ data points in an arbitrary fixed dimension. We also study the universality of GANs in terms of the existence of ReLU networks that can approximate the density functions in the mixed Sobolev spaces up to any accuracy level.
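For intuition on why mixed Sobolev smoothness helps in high dimension, the sketch below enumerates a (nonnegative) hyperbolic cross index set, the standard frequency set behind hyperbolic cross approximation; its cardinality grows like $n(\log n)^{d-1}$ rather than the $n^d$ of a full tensor grid. The brute-force enumeration and the cutoff `n` are for illustration only; the paper's estimator is not reproduced here.

```python
from itertools import product

def hyperbolic_cross(n, d):
    """Enumerate the nonnegative hyperbolic cross index set
    {k in N^d : prod_j max(1, k_j) <= n}. Its size grows like
    n * (log n)^(d-1), versus n^d for the full grid, which is the
    combinatorial reason mixed-Sobolev constructions can evade the
    curse of dimensionality. Brute force; illustration only."""
    indices = []
    for k in product(range(n + 1), repeat=d):
        weight = 1
        for kj in k:
            weight *= max(1, kj)
        if weight <= n:
            indices.append(k)
    return indices

print(len(hyperbolic_cross(16, 2)))  # far fewer than the 17**2 grid points
```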
Despite their success, kernel methods suffer from a massive computational cost in practice. In this paper, in lieu of the commonly used kernel expansion with respect to $N$ inputs, we develop a novel optimal design maximizing the entropy among kernel features. This procedure results in a kernel expansion with respect to entropic optimal features (EOF), improving the data representation dramatically due to the dissimilarity of the features. Under mild technical assumptions, our generalization bound shows that with only $O(N^{\frac{1}{4}})$ features (disregarding logarithmic factors), we can achieve the optimal statistical accuracy (i.e., $O(1/\sqrt{N})$). The salient feature of our design is its sparsity, which significantly reduces the time and space costs. Our numerical experiments on benchmark datasets verify the superiority of EOF over the state-of-the-art in kernel approximation.
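The abstract does not spell out the entropic design, so the sketch below shows one generic instantiation, under the assumption that "entropy among kernel features" is measured by the log-determinant (Gaussian entropy) of the selected features' Gram matrix, which a greedy loop then maximizes. The random Fourier feature pool and the ridge term are illustrative, not the paper's exact procedure.

```python
import numpy as np

def greedy_entropic_features(Z, m, ridge=1e-6):
    """Greedily pick m feature columns of Z (an n x D candidate pool)
    maximizing the log-determinant (Gaussian entropy) of the selected
    features' Gram matrix; dissimilar features increase the determinant."""
    n, D = Z.shape
    selected = []
    for _ in range(m):
        best_j, best_val = None, -np.inf
        for j in range(D):
            if j in selected:
                continue
            S = Z[:, selected + [j]]
            G = S.T @ S + ridge * np.eye(len(selected) + 1)
            val = np.linalg.slogdet(G)[1]
            if val > best_val:
                best_j, best_val = j, val
        selected.append(best_j)
    return selected

# Candidate pool: random Fourier features for an RBF kernel (illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
W = rng.standard_normal((5, 500))
Z = np.cos(X @ W + rng.uniform(0, 2 * np.pi, 500))
idx = greedy_entropic_features(Z, m=20)
```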
Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of features required to achieve optimal learning bounds. Leverage scores introduce an optimized distribution for features based on an infinite-dimensional integral operator (depending on the input distribution), which is impractical to sample from. Focusing on empirical leverage scores in this paper, we establish an out-of-sample performance bound, revealing an interesting trade-off between the approximated kernel and the eigenvalue decay of another kernel in the domain of random features defined based on the data distribution. Our experiments verify that the empirical algorithm consistently outperforms vanilla Monte Carlo sampling, and with a minor modification the method is even competitive with supervised data-dependent kernel learning, without using the output (label) information.
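A minimal sketch of one common empirical scheme follows, assuming random Fourier features for an RBF kernel and the ridge leverage score $z_j^\top (\hat{K} + n\lambda I)^{-1} z_j$ for candidate feature $j$, where $z_j$ collects the feature evaluated at the $n$ inputs and $\hat{K}$ is the approximate kernel matrix. The pool size, $\lambda$, and resampling without replacement are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def empirical_leverage_resampling(X, n_pool=500, m=50, lam=1e-2, seed=0):
    """Resample random Fourier features proportionally to empirical ridge
    leverage scores computed from the data, instead of plain Monte Carlo."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, n_pool))           # i.i.d. pool (RBF kernel)
    b = rng.uniform(0, 2 * np.pi, n_pool)
    Z = np.sqrt(2.0 / n_pool) * np.cos(X @ W + b)  # n x n_pool feature matrix
    K_hat = Z @ Z.T                                # approximate kernel matrix
    Kinv = np.linalg.inv(K_hat + n * lam * np.eye(n))
    # Empirical score of feature j: z_j^T (K_hat + n*lam*I)^{-1} z_j.
    scores = np.sum(Z * (Kinv @ Z), axis=0)
    probs = scores / scores.sum()
    keep = rng.choice(n_pool, size=m, replace=False, p=probs)
    return W[:, keep], b[keep], probs[keep]
```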
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated with the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features that can scale poly-logarithmically with the data. Our empirical results show that, given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.
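The correlation metric suggests a matching-pursuit-style loop; the sketch below is one plausible reading. Candidates are random Fourier features drawn from several RBF bandwidths (standing in for "multiple kernels"), and at each step the feature most correlated with the current least-squares residual is selected. The bandwidths, pool sizes, and refitting scheme are assumptions for illustration only.

```python
import numpy as np

def greedy_multikernel_features(X, y, bandwidths=(0.5, 1.0, 2.0),
                                pool_per_kernel=200, m=30, seed=0):
    """Greedy explicit-feature selection across multiple kernels: at each
    step pick the candidate feature with the largest absolute correlation
    with the current residual, then refit by least squares."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Candidate pool: random Fourier features from several RBF bandwidths.
    cols = []
    for s in bandwidths:
        W = rng.standard_normal((d, pool_per_kernel)) / s
        b = rng.uniform(0, 2 * np.pi, pool_per_kernel)
        cols.append(np.cos(X @ W + b))
    Z = np.hstack(cols)
    selected, residual = [], y.astype(float)
    for _ in range(m):
        corr = np.abs(Z.T @ residual)   # correlation metric
        corr[selected] = -np.inf        # select without replacement
        selected.append(int(np.argmax(corr)))
        S = Z[:, selected]
        coef, *_ = np.linalg.lstsq(S, y, rcond=None)
        residual = y - S @ coef
    return selected, coef
```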
The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm, based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function, with high probability, recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error compared to the state-of-the-art, while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters.
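A minimal sketch of the explore-then-exploit idea follows, under the assumption that the score of a candidate feature $\omega$ is the absolute empirical alignment $|\frac{1}{n}\sum_i y_i\,\phi(x_i;\omega)|$ between the feature and the labels; the paper's exact score function may differ. The pool is a plain Monte Carlo draw that the score then reweights.

```python
import numpy as np

def eerf_features(X, y, n_pool=1000, m=50, seed=0):
    """Energy-based exploration of random features (sketch): score each
    candidate feature by its absolute empirical alignment with the labels,
    then keep the top-m scorers."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, n_pool))   # explore: i.i.d. candidate pool
    b = rng.uniform(0, 2 * np.pi, n_pool)
    Z = np.cos(X @ W + b)                  # n x n_pool feature evaluations
    score = np.abs(Z.T @ y) / n            # energy/alignment score
    top = np.argsort(score)[::-1][:m]      # exploit: promising regions
    return W[:, top], b[top]
```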
We address the M-best-arm identification problem in multi-armed bandits. A player has a limited budget to explore $K$ arms ($M<K$), and once pulled, each arm yields a reward drawn (independently) from a fixed, unknown distribution. The goal is to find the top $M$ arms in the sense of expected reward. We develop an algorithm that proceeds in rounds to deactivate arms iteratively. At each round, the budget is divided by a nonlinear function of the remaining arms, and the arms are pulled correspondingly. Based on a decision rule, the deactivated arm at each round may be accepted or rejected. The algorithm outputs the accepted arms, which should ideally be the top $M$ arms. We characterize the decay rate of the misidentification probability and establish that nonlinear budget allocation proves useful for different problem environments (described by the number of competitive arms). We provide comprehensive numerical experiments showing that our algorithm outperforms the state-of-the-art under suitable nonlinearity.
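The round structure resembles the successive accepts/rejects template, so the sketch below follows it: at round $r$ each surviving arm's total pull count is set by a nonlinear split proportional to $1/(K+1-r)^p$, and one arm per round is accepted or rejected by comparing empirical gaps. The normalizer `C`, the gap-based decision rule, and the reward oracle `pull(a)` are illustrative assumptions; the sketch also assumes the budget is large enough that every arm is pulled in round 1.

```python
import numpy as np

def top_m_nonlinear_elimination(pull, K, M, budget, p=1.0):
    """Round-based top-M identification (sketch): nonlinear budget split
    across rounds; one arm is accepted or rejected per round."""
    C = sum((K + 1 - r) ** (-p) for r in range(1, K))  # allocation normalizer
    sums = np.zeros(K)
    active, accepted, n_prev = list(range(K)), [], 0
    for r in range(1, K):
        n_r = int(np.ceil((budget - K) / (C * (K + 1 - r) ** p)))
        for a in active:                       # top up each arm to n_r pulls
            for _ in range(n_r - n_prev):
                sums[a] += pull(a)
        n_prev = n_r
        means = {a: sums[a] / n_r for a in active}
        order = sorted(active, key=means.get, reverse=True)
        need = M - len(accepted)               # accepts still required
        gap_top = (means[order[0]] - means[order[need]]
                   if need < len(order) else np.inf)
        gap_bot = means[order[need - 1]] - means[order[-1]]
        if gap_top >= gap_bot:                 # accept the clear leader...
            accepted.append(order[0])
            active.remove(order[0])
        else:                                  # ...or reject the clear loser
            active.remove(order[-1])
        if len(accepted) == M:
            return accepted
    return accepted + active                   # last survivor completes the set
```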
This paper addresses tracking of a moving target in a multi-agent network. The target follows linear dynamics corrupted by adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated with one agent. The agents' noisy observations may be nonlinear. We formulate this problem as a distributed online optimization where agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide a non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm versus its offline counterpart in the centralized setting. We prove that the bound on the dynamic regret scales inversely in the network spectral gap and captures the deviation from the linear dynamics caused by the adversarial noise. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be readily implemented for multi-agent tracking with nonlinear observations.
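The implementation details are not in the abstract, so the sketch below is one plausible arrangement of the three ingredients it names: iterates are assumed to live on the probability simplex with the negative-entropy mirror map, `P` is a doubly stochastic consensus matrix, and the known linear dynamics `A` are applied after the mirror step (followed by a re-projection, since `A` need not preserve the simplex). `grads[t][i]` is a hypothetical local gradient oracle for agent $i$ at time $t$.

```python
import numpy as np

def decentralized_md_tracking(grads, P, A, x0, eta=0.1):
    """Sketch of decentralized mirror descent for tracking: each agent mixes
    neighbors' iterates with weights P, takes an exponentiated-gradient step
    on its local loss, then propagates its iterate through the dynamics A."""
    T = len(grads)                 # grads[t][i]: agent i's gradient oracle
    N, d = x0.shape
    X = x0.copy()                  # row i = agent i's iterate on the simplex
    history = [X.copy()]
    for t in range(T):
        Xc = P @ X                                    # consensus step
        G = np.stack([grads[t][i](Xc[i]) for i in range(N)])
        X = Xc * np.exp(-eta * G)                     # mirror (entropy) step
        X /= X.sum(axis=1, keepdims=True)
        X = X @ A.T                                   # known target dynamics
        X = np.clip(X, 1e-12, None)
        X /= X.sum(axis=1, keepdims=True)             # re-project to simplex
        history.append(X.copy())
    return history
```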
This work addresses decentralized online optimization in non-stationary environments. A network of agents aims to track the minimizer of a global time-varying convex function. The minimizer evolves according to known dynamics corrupted by an unknown, unstructured noise. At each time, the global function can be cast as a sum of a finite number of local functions, each of which is assigned to one agent in the network. Moreover, the local functions become available to agents sequentially, and agents do not have prior knowledge of the future cost functions. Therefore, agents must communicate with each other to build an online approximation of the global function. We propose a decentralized variation of the celebrated Mirror Descent algorithm, developed by Nemirovski and Yudin. Using the notion of Bregman divergence in lieu of Euclidean distance for projection, Mirror Descent has been shown to be a powerful tool in large-scale optimization. Our algorithm builds on Mirror Descent, while ensuring that agents perform a consensus step to follow the global function and take into account the dynamics of the global minimizer. To measure the performance of the proposed online algorithm, we compare it to its offline counterpart, where the global functions are available a priori. The gap between the two is called dynamic regret. We establish a regret bound that scales inversely in the spectral gap of the network and, more notably, captures the deviation of the minimizer sequence from the given dynamics. We then show that our results subsume a number of results in distributed optimization. We demonstrate the application of our method to decentralized tracking of dynamic parameters and verify the results via numerical experiments.
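The precise update is not given in the abstract; one plausible per-agent update consistent with the description (consensus with doubly stochastic weights $[P]_{ij}$, a Bregman step on the local loss $f_{i,t}$, then the known dynamics $A$) can be written as:

```latex
\[
\begin{aligned}
\hat{x}_{i,t} &= \textstyle\sum_{j=1}^{N} [P]_{ij}\, x_{j,t}
  && \text{(consensus step)}\\
y_{i,t} &= \arg\min_{x \in \mathcal{X}}
  \Big\{ \eta_t \,\langle \nabla f_{i,t}(\hat{x}_{i,t}),\, x \rangle
  + D_{\psi}(x, \hat{x}_{i,t}) \Big\}
  && \text{(mirror step)}\\
x_{i,t+1} &= A\, y_{i,t}
  && \text{(known dynamics)}
\end{aligned}
\]
```

Choosing $\psi(x) = \tfrac{1}{2}\|x\|_2^2$ makes $D_\psi$ the squared Euclidean distance, so the mirror step reduces to a projected gradient step, illustrating how the Bregman machinery generalizes the Euclidean case.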
We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed, unknown distribution. The player aims to identify the arm with the largest expected reward. We propose a general framework to unify sequential elimination algorithms, where the arms are dismissed iteratively until a unique arm is left. Our analysis reveals a novel performance measure expressed in terms of the sampling mechanism and the number of eliminated arms at each round. Based on this result, we develop an algorithm that divides the budget according to a nonlinear function of the remaining arms at each round. We provide theoretical guarantees for the algorithm, characterizing the suitable nonlinearity for different problem environments described by the number of competitive arms. Matching the theoretical results, our experiments show that the nonlinear algorithm outperforms the state-of-the-art. We finally study the side-observation model, where pulling an arm reveals the rewards of its related arms, and we establish improved theoretical guarantees in the pure-exploration setting.
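As a worked example of "dividing the budget according to a nonlinear function of the remaining arms," one natural allocation, generalizing the linear split of successive rejects with an exponent $p$ that tunes the nonlinearity, is:

```latex
% Illustrative nonlinear budget split over K-1 elimination rounds with
% budget B; C(p) normalizes so the total number of pulls stays within B
% (up to rounding and constants).
\[
n_r = \left\lceil \frac{1}{C(p)} \cdot \frac{B - K}{(K + 1 - r)^{p}} \right\rceil,
\qquad
C(p) = \sum_{r=1}^{K-1} \frac{1}{(K + 1 - r)^{p}},
\qquad r = 1, \dots, K-1 .
\]
```

With $p = 1$ this essentially recovers the linear allocation of successive rejects; larger $p$ shifts pulls toward later rounds, i.e., toward distinguishing the few surviving (competitive) arms, which is one reading of how the suitable nonlinearity depends on the number of competitive arms.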