Zeroth-order randomized block methods for constrained minimization of expectation-valued Lipschitz continuous functions

Published by Farzad Yousefian
Publication date: 2021
Research language: English





We consider the minimization of an $L_0$-Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(x) \triangleq \mathbb{E}[\tilde{f}(x,\omega)]$, over a Cartesian product of closed and convex sets with a view towards obtaining both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense). We adopt a smoothing-based approach reliant on minimizing $f_{\eta}$ where $f_{\eta}(x) \triangleq \mathbb{E}_{u}[f(x+\eta u)]$, $u$ is a random variable defined on a unit sphere, and $\eta > 0$. In fact, it is observed that a stationary point of the $\eta$-smoothed problem is a $2\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we derive a suitable residual function that provides a metric for stationarity for the smoothed problem. By leveraging a zeroth-order framework reliant on utilizing sampled function evaluations implemented in a block-structured regime, we make two sets of contributions for the sequence generated by the proposed scheme. (i) The residual function of the smoothed problem tends to zero almost surely along the generated sequence; (ii) To compute an $x$ that ensures that the expected norm of the residual of the $\eta$-smoothed problem is within $\epsilon$ requires no greater than $\mathcal{O}(\tfrac{1}{\eta \epsilon^2})$ projection steps and $\mathcal{O}\left(\tfrac{1}{\eta^2 \epsilon^4}\right)$ function evaluations. These statements appear to be novel and there appear to be few results to contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.
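As a rough illustration of the kind of scheme the abstract describes, the sketch below combines a sphere-sampled, two-point zeroth-order gradient estimate with a projected update on one randomly chosen block. It is a minimal sketch under stated assumptions, not the authors' algorithm: the estimator form $(n/\eta)\,(f(x+\eta v)-f(x))\,v$ is a commonly used one, the feasible set is assumed to be a box so that the block projections reduce to clipping, and the names `sphere_sample`, `zo_gradient`, and `zo_block_step`, along with the fixed step size `gamma`, are hypothetical.

```python
import math
import random


def sphere_sample(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def zo_gradient(f, x, eta, rng):
    """Two-point zeroth-order estimate built from function values only.

    Assumed form: (n / eta) * (f(x + eta * v) - f(x)) * v, with v on the sphere.
    """
    n = len(x)
    v = sphere_sample(n, rng)
    shifted = [xi + eta * vi for xi, vi in zip(x, v)]
    scale = (n / eta) * (f(shifted) - f(x))
    return [scale * vi for vi in v]


def zo_block_step(f, x, blocks, eta, gamma, lo, hi, rng):
    """Update one randomly chosen block with a projected zeroth-order step.

    The feasible set is assumed to be the box [lo, hi]^n, so projecting a
    block onto its set is a coordinate-wise clip.
    """
    g = zo_gradient(f, x, eta, rng)
    block = rng.choice(blocks)
    x_new = list(x)
    for j in block:
        x_new[j] = min(hi, max(lo, x[j] - gamma * g[j]))
    return x_new


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda z: sum(abs(c) for c in z)  # nonsmooth, Lipschitz test function
    x = [0.9, -0.8, 0.7, -0.6]
    blocks = [[0, 1], [2, 3]]
    for _ in range(200):
        x = zo_block_step(f, x, blocks, eta=0.1, gamma=0.01, lo=-1.0, hi=1.0, rng=rng)
    print(x, f(x))
```

Each step uses two function evaluations and one block projection, which is the accounting behind stating separate bounds for projection steps and function evaluations. In a stochastic setting, `f` would be replaced by a sampled evaluation $\tilde{f}(x,\omega)$.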


Read also

Xin Chen, Jorge I. Poveda, Na Li (2021)
In power distribution systems, the growing penetration of renewable energy resources brings new challenges to maintaining voltage safety, which is further complicated by the limited model information of distribution systems. To address these challenges, we develop a model-free optimal voltage control algorithm based on projected primal-dual gradient dynamics and a continuous-time zeroth-order method (extremum seeking control). The proposed algorithm i) operates purely based on voltage measurements and does not require any other model information, ii) can drive the voltage magnitudes back to the acceptable range, iii) satisfies the power capacity constraints at all times, iv) minimizes the total operating cost, and v) is implemented in a decentralized fashion where the privacy of controllable devices is preserved and plug-and-play operation is enabled. We prove that the proposed algorithm is semi-globally practically asymptotically stable and is structurally robust to measurement noise. Lastly, the performance of the proposed algorithm is further demonstrated via numerical simulations.
Yair Carmon, John C. Duchi (2020)
We consider minimization of indefinite quadratics with either trust-region (norm) constraints or cubic regularization. Despite the nonconvexity of these problems we prove that, under mild assumptions, gradient descent converges to their global solutions, and give a non-asymptotic rate of convergence for the cubic variant. We also consider Krylov subspace solutions and establish sharp convergence guarantees to the solutions of both trust-region and cubic-regularized problems. Our rates mirror the behavior of these methods on convex quadratics and eigenvector problems, highlighting their scalability. When we use Krylov subspace solutions to approximate the cubic-regularized Newton step, our results recover the strongest known convergence guarantees to approximate second-order stationary points of general smooth nonconvex functions.
We consider the zeroth-order optimization problem in the huge-scale setting, where the dimension of the problem is so large that performing even basic vector operations on the decision variables is infeasible. In this paper, we propose a novel algorithm, coined ZO-BCD, that exhibits favorable overall query complexity and has a much smaller per-iteration computational complexity. In addition, we discuss how the memory footprint of ZO-BCD can be reduced even further by the clever use of circulant measurement matrices. As an application of our new method, we propose the idea of crafting adversarial attacks on neural network based classifiers in a wavelet domain, which can result in problem dimensions of over 1.7 million. In particular, we show that crafting adversarial examples to audio classifiers in a wavelet domain can achieve the state-of-the-art attack success rate of 97.9%.
We derive two upper bounds for the probability of deviation of a vector-valued Lipschitz function of a collection of random variables from its expected value. The resulting upper bounds can be tighter than bounds obtained by a direct application of a classical theorem due to Bobkov and Götze.
Ganzhao Yuan (2021)
Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has found rich applications in statistical learning and has been studied extensively for decades. However, existing methods are primarily based on multi-stage convex relaxation, only leading to weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparison, we also include a naive variant of coordinate descent methods based on sequential convex approximation in our study. When the objective function satisfies an additional regularity condition called sharpness, coordinate descent methods with an appropriate initialization converge linearly to the optimal solution set. Also, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be computed exactly and efficiently using a breakpoint searching method. We present some discussions and extensions of our proposed method. Finally, we have conducted extensive experiments on several statistical learning tasks to show the superiority of our approach. Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.