ترغب بنشر مسار تعليمي؟ اضغط هنا

Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.
235 - Wenlong Mou , Zheng Wen , Xi Chen 2020
We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator.
We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $bar{A} theta = bar{b}$. When the matrix $bar{A}$ is Hurwitz, we prove a cen tral limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asymptotic covariance matrix, which is the sum of the classical Polyak-Ruppert covariance and a correction term that scales with the step size. Under assumptions on the tail of the noise distribution, we prove a non-asymptotic concentration inequality whose main term matches the covariance in CLT in any direction, up to universal constants. When the matrix $bar{A}$ is not Hurwitz but only has non-negative real parts in its eigenvalues, we prove that the averaged LSA procedure actually achieves an $O(1/T)$ rate in mean-squared error. Our results provide a more refined understanding of linear stochastic approximation in both the asymptotic and non-asymptotic settings. We also show various applications of the main results, including the study of momentum-based stochastic gradient methods as well as temporal difference algorithms in reinforcement learning.
We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential m ixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + Vert theta_{0} Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + Vert theta_{0} Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincar{e} inequalities for conditional and marginal densities.
We consider the problem of sampling from a density of the form $p(x) propto exp(-f(x)- g(x))$, where $f: mathbb{R}^d rightarrow mathbb{R}$ is a smooth and strongly convex function and $g: mathbb{R}^d rightarrow mathbb{R}$ is a convex and Lipschitz fu nction. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $varepsilon$ of the target density in at most $O(d log (d/varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.
We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we deve lop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $varepsilon > 0$ in Wasserstein distance from the target distribution in $Oleft(frac{d^{1/4}}{ varepsilon^{1/2}} right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $alpha$-th order smoothness, we prove that the mixing time scales as $O left(frac{d^{1/4}}{varepsilon^{1/2}} + frac{d^{1/2}}{varepsilon^{1/(alpha - 1)}} right)$.
We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion. Our analysis does not require global contractivity, and yields polynomial dependence on the time horizon. Compared to existing approaches, we make an addi tional smoothness assumption, and improve the existing rate from $O(eta)$ to $O(eta^2)$ in terms of the KL divergence. This result matches the correct order for numerical SDEs, without suffering from exponential time dependence. When applied to algorithms for sampling and learning, this result simultaneously improves all those methods based on Dalayans approach.
Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also critical to generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed with non-asymptotic discrete-time analysis, using Stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $Oleft(frac{1}{n}Lsqrt{beta T_k}right)$, where $L$ is uniform Lipschitz parameter, $beta$ is inverse temperature, and $T_k$ is aggregated step sizes. For PAC-Bayesian theory, though the bound has a slower $O(1/sqrt{n})$ rate, the contribution of each step is shown with an exponentially decaying factor by imposing $ell^2$ regularization, and the uniform Lipschitz constant is also replaced by actual norms of gradients along trajectory. Our bounds have no implicit dependence on dimensions, norms or other capacity measures of parameter, which elegantly characterizes the phenomenon of Fast Training Guarantees Generalization in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and has important implications to statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا