
Variance reduction for Random Coordinate Descent-Langevin Monte Carlo

Posted by Zhiyan Ding
Publication date: 2020
Paper language: English





Sampling from a log-concave distribution is a core problem with wide applications in Bayesian statistics and machine learning. While most gradient-free methods have slow convergence rates, Langevin Monte Carlo (LMC) converges fast but requires the computation of gradients. In practice, finite-difference approximations are used as surrogates, which makes the method expensive in high dimensions. A natural strategy to reduce the computational cost of each iteration is to use random gradient approximations, such as random coordinate descent (RCD) or simultaneous perturbation stochastic approximation (SPSA). We show by a counter-example that blindly applying RCD does not achieve this goal in the most general setting. The high variance induced by the randomness means a larger number of iterations is needed, which balances out the saving in each iteration. We then introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it into both overdamped and underdamped LMC; the resulting methods are termed RCAD-O-LMC and RCAD-U-LMC, respectively. The methods still sit in the random gradient approximation framework, so the computational cost of each iteration remains low. With RCAD, however, the variance is reduced, and the methods converge within the same number of iterations as the classical overdamped and underdamped LMC, leading to an overall computational saving.
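To make the comparison concrete, below is a minimal Python sketch of the three updates the abstract contrasts: the classical overdamped LMC step, the naive RCD surrogate whose rescaled single-coordinate gradient is unbiased but high-variance, and a variance-reduced step that keeps a per-coordinate memory of partial derivatives and refreshes one coordinate per iteration. The function names, the toy target, and the SAGA-like memory scheme are illustrative assumptions; the exact RCAD recursion and step-size choices are specified in the paper, not here.

import numpy as np

def lmc_step(x, grad_f, h, rng):
    # Classical overdamped LMC: one full gradient, i.e. d partial derivatives per step.
    return x - h * grad_f(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.size)

def rcd_lmc_step(x, partial_f, h, rng):
    # Naive RCD surrogate: one random coordinate, rescaled by d so the estimator
    # is unbiased (E[g] = grad f(x)), but its variance grows with the dimension.
    d = x.size
    i = rng.integers(d)
    g = np.zeros(d)
    g[i] = d * partial_f(x, i)
    return x - h * g + np.sqrt(2.0 * h) * rng.standard_normal(d)

def variance_reduced_step(x, g_mem, partial_f, h, rng):
    # Variance-reduced sketch: keep a length-d memory of partial derivatives,
    # refresh one randomly chosen coordinate per iteration, and use the whole
    # (partly stale) memory as the gradient estimate. This is a SAGA-like
    # stand-in for the RCAD update; the paper's exact recursion may differ.
    d = x.size
    i = rng.integers(d)
    g_mem[i] = partial_f(x, i)
    x_new = x - h * g_mem + np.sqrt(2.0 * h) * rng.standard_normal(d)
    return x_new, g_mem

# Usage sketch on a standard Gaussian target, f(x) = ||x||^2 / 2, whose i-th
# partial derivative is simply x[i]:
rng = np.random.default_rng(0)
d = 100
partial_f = lambda x, i: x[i]
x = rng.standard_normal(d)
g_mem = x.copy()                  # one initial full-gradient pass
for _ in range(10_000):
    x, g_mem = variance_reduced_step(x, g_mem, partial_f, h=1e-2, rng=rng)

The point of the comparison is that variance_reduced_step costs one partial-derivative evaluation per iteration, like rcd_lmc_step, while its gradient estimate has far lower variance than the rescaled single-coordinate surrogate.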




Read also

Zhiyan Ding, Qin Li (2020)
Langevin Monte Carlo (LMC) is a popular Bayesian sampling method. For log-concave distribution functions, the method converges exponentially fast, up to a controllable discretization error. However, the method requires the evaluation of a full gradient in each iteration, and for a problem on $\mathbb{R}^d$ this amounts to $d$ partial-derivative evaluations per iteration. The cost is high when $d \gg 1$. In this paper, we investigate how to enhance computational efficiency by applying RCD (random coordinate descent) to LMC. There are two sides to the theory: (1) by blindly applying RCD to LMC, one surrogates the full gradient by a randomly selected directional derivative per iteration; although the cost per iteration is reduced, the total number of iterations increases to achieve a preset error tolerance, so ultimately there is no computational gain. (2) We then incorporate variance reduction techniques, such as SAGA (stochastic average gradient) and SVRG (stochastic variance reduced gradient), into RCD-LMC. We prove that the cost is reduced compared with the classical LMC, and in the underdamped case, convergence is achieved with the same number of iterations, while each iteration requires merely one directional derivative. This means we obtain the best possible computational cost in the underdamped-LMC framework.
Langevin Monte Carlo (LMC) is a popular Markov chain Monte Carlo sampling method. One drawback is that it requires the computation of the full gradient at each iteration, an expensive operation if the dimension of the problem is high. We propose a new sampling method: Random Coordinate LMC (RC-LMC). At each iteration, a single coordinate is randomly selected to be updated by a multiple of the partial derivative along this direction plus noise, and all other coordinates remain untouched (a minimal sketch of this coordinate-wise update appears after the abstracts below). We investigate the total complexity of RC-LMC and compare it with the classical LMC for log-concave probability distributions. When the gradient of the log-density is Lipschitz, RC-LMC is less expensive than the classical LMC if the log-density is highly skewed in high-dimensional problems, and when both the gradient and the Hessian of the log-density are Lipschitz, RC-LMC is always cheaper than the classical LMC, by a factor proportional to the square root of the problem dimension. In the latter case, our estimate of complexity is sharp with respect to the dimension.
Zhiyan Ding, Qin Li, Jianfeng Lu (2020)
The Underdamped Langevin Monte Carlo (ULMC) is a popular Markov chain Monte Carlo sampling method. It requires the computation of the full gradient of the log-density at each iteration, an expensive operation if the dimension of the problem is high. We propose a sampling method called Random Coordinate ULMC (RC-ULMC), which selects a single coordinate at each iteration to be updated and leaves the other coordinates untouched. We investigate the computational complexity of RC-ULMC and compare it with the classical ULMC for strongly log-concave probability distributions. We show that RC-ULMC is always cheaper than the classical ULMC, with a significant cost reduction when the problem is highly skewed and high dimensional. Our complexity bound for RC-ULMC is also tight in terms of dimension dependence.
Zhiyan Ding, Qin Li (2021)
The classical Langevin Monte Carlo method looks for samples from a target distribution by descending the samples along the gradient of the target distribution. The method enjoys a fast convergence rate. However, the numerical cost is sometimes high because each iteration requires the computation of a gradient. One approach to eliminate the gradient computation is to employ the concept of an ensemble: a large number of particles are evolved together so that neighboring particles provide gradient information to each other. In this article, we discuss two algorithms that integrate the ensemble feature into LMC, and their associated properties. In particular, we find that if one directly surrogates the gradient using the ensemble approximation, the algorithm, termed Ensemble Langevin Monte Carlo, is unstable due to a high-variance term. If the gradients are replaced by the ensemble approximations only in a constrained manner, to protect against the unstable points, the algorithm, termed Constrained Ensemble Langevin Monte Carlo, resembles the classical LMC up to an ensemble error but removes most of the gradient computation.
This paper investigates a novel a-posteriori variance reduction approach in Monte Carlo image synthesis. Unlike most established methods based on lateral filtering in the image space, our proposition is to produce the best possible estimate for each pixel separately, from all the samples drawn for it. To enable this, we systematically study the per-pixel sample distributions for diverse scene configurations. Noting that these are too complex to be characterized by standard statistical distributions (e.g. Gaussians), we identify patterns recurring in them and exploit those for training a variance-reduction model based on neural nets. As a result, we obtain numerically better estimates compared to simple averaging of samples. This method is compatible with existing image-space denoising methods, as the improved estimates of our model can be used for further processing. We conclude by discussing how the proposed model could in the future be extended for fully progressive rendering with constant memory footprint and scene-sensitive output.
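The RC-LMC abstract above describes an update in which a single randomly selected coordinate moves by a multiple of its partial derivative plus noise, while all other coordinates are left untouched. A minimal Python sketch of that coordinate-wise step follows; `partial_f(x, i)` stands for the i-th partial derivative of the negative log-density, and the flat step size `h` is an illustrative stand-in for the coordinate-dependent scaling analysed in the paper.

import numpy as np

def rc_lmc_step(x, partial_f, h, rng):
    # RC-LMC-style step (sketch): pick one coordinate uniformly at random and
    # move only that coordinate along its partial derivative, plus Gaussian
    # noise. All other coordinates are left untouched, so each iteration costs
    # a single partial-derivative evaluation instead of d.
    d = x.size
    i = rng.integers(d)
    x = x.copy()
    x[i] += -h * partial_f(x, i) + np.sqrt(2.0 * h) * rng.standard_normal()
    return x

The same idea drives RC-ULMC in the underdamped setting, where the single directional derivative enters the velocity update rather than the position update.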
