
Non-asymptotic convergence bounds for Wasserstein approximation using point clouds

Posted by Quentin Merigot
Publication date: 2021
Research field: Mathematical Statistics
Language: English





Several issues in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm -- in which Voronoi cells are replaced by Power cells -- leads to configurations with small Wasserstein error. This is surprising because of the non-convex nature of the problem and the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. These bounds already hold after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. They naturally lead to a modified Polyak-Łojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between the Dirac masses in the discrete distribution.
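The fixed-point step of this Lloyd-type algorithm can be illustrated as follows. This is a minimal sketch, not the paper's implementation: it assumes the POT library (import ot), replaces the model distribution by a fine empirical sample, and stands in for the power-cell (semi-discrete) transport with a discrete optimal-transport plan.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def lloyd_wasserstein_step(X, Y):
    """One Lloyd-type step: move each atom X[i] to the barycenter of the mass
    it receives from the fine sample Y under an optimal transport plan
    (a discrete stand-in for its power cell)."""
    N, M = X.shape[0], Y.shape[0]
    a = np.full(N, 1.0 / N)      # uniform weights on the N atoms
    b = np.full(M, 1.0 / M)      # uniform weights on the sample of the model distribution
    C = ot.dist(X, Y)            # squared Euclidean cost matrix
    P = ot.emd(a, b, C)          # optimal transport plan (rows sum to 1/N)
    return (P @ Y) / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Y = rng.normal(size=(2000, 2))        # fine sample standing in for the model distribution
X = rng.uniform(-3, 3, size=(20, 2))  # initial cloud of N = 20 atoms (ideally well separated)
for _ in range(10):                   # a few Lloyd-type iterations
    X = lloyd_wasserstein_step(X, Y)
```

Each iteration is exactly the fixed-point map discussed above: compute the transport cells of the current atoms, then move every atom to the barycenter of its cell.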


Read also

We prove a rate of convergence for the $N$-particle approximation of a second-order partial differential equation in the space of probability measures, such as the Master equation or the Bellman equation of a mean-field control problem under common noise. The rate is of order $1/N$ for the pathwise error on the solution $v$ and of order $1/\sqrt{N}$ for the $L^2$-error on its $L$-derivative $\partial_\mu v$. The proof relies on backward stochastic differential equation techniques.
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time-inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods through their continuous counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than those considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.
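As a concrete illustration of the recursion analyzed above, here is a minimal sketch of SGD with non-increasing step sizes; the quadratic objective, the Gaussian gradient noise and the step-size schedule gamma_k = gamma0 / (k + 1)^alpha are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgd_decaying_steps(grad, x0, gamma0=0.5, alpha=0.6, n_iter=1000, noise=0.1, seed=0):
    """SGD recursion x_{k+1} = x_k - gamma_k * (grad f(x_k) + noise)
    with non-increasing step sizes gamma_k = gamma0 / (k + 1)**alpha."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        gamma_k = gamma0 / (k + 1) ** alpha                  # non-increasing step size
        g = grad(x) + noise * rng.standard_normal(x.shape)   # noisy gradient oracle
        x = x - gamma_k * g
    return x

# Toy example: f(x) = ||x||^2 / 2, so grad f(x) = x and the minimizer is 0.
x_final = sgd_decaying_steps(lambda x: x, x0=np.ones(5))
```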
Darina Dvinskikh (2020)
In the machine learning and optimization communities there are two main approaches to the convex risk minimization problem, namely the Stochastic Approximation (SA) and the Sample Average Approximation (SAA). In terms of oracle complexity (required number of stochastic gradient evaluations), both approaches are considered equivalent on average (up to a logarithmic factor). The total complexity depends on the specific problem; however, starting from the work of Nemirovski et al. (2009) it was generally accepted that SA is better than SAA. Nevertheless, in the case of large-scale problems SA may run out of memory, as storing all data on one machine and organizing online access to it can be impossible without communication with other machines. SAA, in contradistinction to SA, allows parallel/distributed calculations. In this paper, we shed new light on the comparison of SA and SAA for the particular problem of calculating the population (regularized) Wasserstein barycenter of discrete measures. The conclusion is valid even for the non-parallel (non-decentralized) setup.
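On a toy convex risk, the distinction between the two approaches can be sketched as follows; the objective E[(x - Z)^2 / 2] and the step sizes are illustrative assumptions, and the Wasserstein barycenter setting of the paper is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=10_000)  # i.i.d. draws of Z

# Stochastic Approximation (SA): one stochastic gradient step per sample, online.
x_sa = 0.0
for k, z in enumerate(samples, start=1):
    x_sa -= (1.0 / k) * (x_sa - z)      # gradient of (x - z)^2 / 2 with step 1/k

# Sample Average Approximation (SAA): minimize the empirical risk over the whole
# sample at once; for this objective the minimizer is simply the sample mean.
x_saa = samples.mean()

print(x_sa, x_saa)  # both approach the true minimizer, here 2.0
```

SA touches each sample once and stores only the current iterate, whereas SAA needs the whole sample in memory but is easy to parallelize, which is the trade-off discussed above.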
We show that sparsity constrained optimization problems over low dimensional spaces tend to have a small duality gap. We use the Shapley-Folkman theorem to derive both data-driven bounds on the duality gap, and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.
Gersende Fort (2021)
A novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER) is introduced. It is a stochastic variance-reduced proximal-gradient type algorithm built on the Stochastic Path Integral Differential EstimatoR (SPIDER), an algorithm known to achieve a near-optimal first-order oracle inequality for nonconvex and nonsmooth optimization. Compared to the vanilla prox-SPIDER, 3P-SPIDER uses preconditioned gradient estimators. Preconditioning can either be applied explicitly to a gradient estimator or be introduced implicitly, as in applications to the EM algorithm. 3P-SPIDER also covers the case where the preconditioned gradients are possibly not known in closed analytical form and must therefore be approximated, which adds an additional degree of perturbation. Studying the convergence in expectation, we show that 3P-SPIDER achieves a near-optimal oracle inequality $O(n^{1/2}/\epsilon)$, where $n$ is the number of observations and $\epsilon$ the target precision, even when the gradient is estimated by Monte Carlo methods. We illustrate the algorithm on an application to the minimization of a penalized empirical loss.
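To make the construction concrete, here is a minimal sketch of the vanilla SPIDER-type variance-reduced estimator that 3P-SPIDER builds on, without the preconditioning, the proximal step or the extra perturbation; the finite-sum objective and all parameter names are illustrative assumptions.

```python
import numpy as np

def spider_descent(grad_batch, n, x0, step=0.05, epoch_len=20, batch=16, n_iter=200, seed=0):
    """grad_batch(x, idx): average gradient over the losses indexed by idx, at x."""
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    v = grad_batch(x, np.arange(n))               # full gradient to initialize the estimator
    for t in range(1, n_iter + 1):
        if t % epoch_len == 0:
            v = grad_batch(x, np.arange(n))       # periodic full refresh
        else:
            idx = rng.integers(0, n, size=batch)
            # SPIDER recursion: correct the running estimate with a mini-batch difference.
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
        x_prev, x = x, x - step * v               # plain gradient step (no prox, no preconditioner)
    return x

# Toy finite sum: f(x) = (1/n) * sum_i (a_i . x - b_i)^2 / 2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(500, 10)), rng.normal(size=500)
grad_batch = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = spider_descent(grad_batch, n=500, x0=np.zeros(10))
```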