No Arabic abstract
One of the challenges in analyzing a learning algorithm is the circular entanglement between the objective value and the stochastic noise. This is also known as the chicken and egg phenomenon. Traditionally, people tackle this issue with the special structure of the problem and hence the analysis is difficult to generalize. In this paper, we present a general framework for analyzing high-probability bounds for stochastic dynamics in learning algorithms. Our framework composes standard techniques from probability theory to give a streamlined three-step recipe with a general and flexible principle to tackle the chicken and egg problem. We demonstrate the power and the flexibility of our framework by giving unifying analysis for three very different learning problems with both the last iterate and the strong uniform high probability convergence guarantee. The problems are stochastic gradient descent for strongly convex functions, streaming principal component analysis and linear bandit with stochastic gradient descent updates. We either improve or match the state-of-the-art bounds on all three dynamics.
Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others emph{Individual Fairness} in clustering and emph{Must-link} constraints in semi-supervised learning. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.
We investigate sublinear classical and quantum algorithms for matrix games, a fundamental problem in optimization and machine learning, with provable guarantees. Given a matrix $Ainmathbb{R}^{ntimes d}$, sublinear algorithms for the matrix game $min_{xinmathcal{X}}max_{yinmathcal{Y}} y^{top} Ax$ were previously known only for two special cases: (1) $mathcal{Y}$ being the $ell_{1}$-norm unit ball, and (2) $mathcal{X}$ being either the $ell_{1}$- or the $ell_{2}$-norm unit ball. We give a sublinear classical algorithm that can interpolate smoothly between these two cases: for any fixed $qin (1,2]$, we solve the matrix game where $mathcal{X}$ is a $ell_{q}$-norm unit ball within additive error $epsilon$ in time $tilde{O}((n+d)/{epsilon^{2}})$. We also provide a corresponding sublinear quantum algorithm that solves the same task in time $tilde{O}((sqrt{n}+sqrt{d})textrm{poly}(1/epsilon))$ with a quadratic improvement in both $n$ and $d$. Both our classical and quantum algorithms are optimal in the dimension parameters $n$ and $d$ up to poly-logarithmic factors. Finally, we propose sublinear classical and quantum algorithms for the approximate Caratheodory problem and the $ell_{q}$-margin support vector machines as applications.
We present Free-MESSAGEp, the first zeroth-order algorithm for convex mean-semideviation-based risk-aware learning, which is also the first three-level zeroth-order compositional stochastic optimization algorithm, whatsoever. Using a non-trivial extension of Nesterovs classical results on Gaussian smoothing, we develop the Free-MESSAGEp algorithm from first principles, and show that it essentially solves a smoothed surrogate to the original problem, the former being a uniform approximation of the latter, in a useful, convenient sense. We then present a complete analysis of the Free-MESSAGEp algorithm, which establishes convergence in a user-tunable neighborhood of the optimal solutions of the original problem, as well as explicit convergence rates for both convex and strongly convex costs. Orderwise, and for fixed problem parameters, our results demonstrate no sacrifice in convergence speed compared to existing first-order methods, while striking a certain balance among the condition of the problem, its dimensionality, as well as the accuracy of the obtained results, naturally extending previous results in zeroth-order risk-neutral learning.
We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_star$ with bias $delta$, variance $O(log(1/delta))$, and an expected sampling cost of $O(log(1/delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
We present a data-driven model predictive control (MPC) scheme for chance-constrained Markov jump systems with unknown switching probabilities. Using samples of the underlying Markov chain, ambiguity sets of transition probabilities are estimated which include the true conditional probability distributions with high probability. These sets are updated online and used to formulate a time-varying, risk-averse optimal control problem. We prove recursive feasibility of the resulting MPC scheme and show that the original chance constraints remain satisfied at every time step. Furthermore, we show that under sufficient decrease of the confidence levels, the resulting MPC scheme renders the closed-loop system mean-square stable with respect to the true-but-unknown distributions, while remaining less conservative than a fully robust approach. Finally, we show that the data-driven value function converges to its nominal counterpart as the sample size grows to infinity. We illustrate our approach on a numerical example.