We present theoretical results on the convergence of \emph{non-convex} accelerated gradient descent in matrix factorization models with $\ell_2$-norm loss. The purpose of this work is to study the effects of acceleration in non-convex settings, where provable convergence with acceleration should not be considered a \emph{de facto} property. The technique is applied to matrix sensing problems, for the estimation of a rank-$r$ optimal solution $X^\star \in \mathbb{R}^{n \times n}$. Our contributions can be summarized as follows. $i)$ We show that acceleration in factored gradient descent converges at a linear rate; this fact is novel for non-convex matrix factorization settings, under common assumptions. $ii)$ Our proof technique requires the acceleration parameter to be carefully selected, based on the properties of the problem, such as the condition number of $X^\star$ and the condition number of the objective function. $iii)$ Currently, our proof yields the same dependence on the condition number(s) in the contraction parameter as recent results on non-accelerated algorithms. $iv)$ Acceleration is observed in practice, both in synthetic examples and in two real applications: neuronal multi-unit activity recovery from single-electrode recordings, and quantum state tomography on quantum computing simulators.
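As a rough illustration of the accelerated factored update described above, here is a minimal Python sketch. It assumes symmetric sensing matrices $A_i$ with $(\mathcal{A}(X))_i = \langle A_i, X \rangle$ and a Nesterov-style extrapolation with momentum parameter `mu`; all names are hypothetical, and the paper's exact update, initialization, and parameter choices (which it ties to the condition numbers of $X^\star$ and of the objective) are not reproduced here.

```python
import numpy as np

def accelerated_factored_gd(A_ops, y, n, r, eta, mu, iters=500):
    """Nesterov-style momentum on the factored objective
    f(U) = 0.5 * ||A(U U^T) - y||_2^2  (illustrative sketch).

    A_ops : (m, n, n) array of symmetric sensing matrices A_i.
    eta   : step size; mu : momentum (acceleration) parameter.
    """
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((n, r))  # small random init; analyses
    U_prev = U.copy()                       # typically assume a spectral init
    for _ in range(iters):
        Z = U + mu * (U - U_prev)           # extrapolation (momentum) step
        residual = np.einsum('mij,ij->m', A_ops, Z @ Z.T) - y
        grad = 2.0 * np.einsum('m,mij->ij', residual, A_ops) @ Z  # grad f(Z)
        U_prev, U = U, Z - eta * grad       # accelerated gradient update
    return U
```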
We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the ordinary differential equation (o.d.e.) approach and point out certain theoretical issues. We then propose a modified scheme called Full Gradient DQN (FG-DQN).
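For intuition about what "full gradient" changes, here is a minimal Python sketch using linear function approximation (an illustrative setup, not the paper's deep-network scheme): the usual DQN-style semi-gradient treats the bootstrapped target as a constant, while a full-gradient update also differentiates through the max term.

```python
import numpy as np

def bellman_error_step(theta, phi_sa, phi_next, r, gamma, alpha,
                       semi_gradient=False):
    """One stochastic update on the squared Bellman error
    L(theta) = 0.5 * (r + gamma * max_a' Q(s', a') - Q(s, a))^2,
    with linear Q(s, a) = phi(s, a) @ theta for brevity.

    phi_next : (num_actions, d) features of every action at s'.
    semi_gradient=True gives the usual DQN-style update (target held
    fixed); semi_gradient=False also differentiates through the max.
    """
    q_next = phi_next @ theta
    a_star = int(np.argmax(q_next))                # greedy action at s'
    delta = r + gamma * q_next[a_star] - phi_sa @ theta  # Bellman error
    if semi_gradient:
        grad = -delta * phi_sa                     # target treated as constant
    else:
        # subgradient of the max picks the argmax action's features
        grad = delta * (gamma * phi_next[a_star] - phi_sa)
    return theta - alpha * grad
```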
One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies this surprising phenomenon.
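A small self-contained experiment in this spirit (the width, step size, and data here are illustrative, not the paper's exact setting): a wide two-layer ReLU network with fixed random output signs, trained by plain gradient descent from random initialization, drives the squared training loss toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000          # few samples, very wide hidden layer
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))        # random first-layer init
a = rng.choice([-1.0, 1.0], size=m)    # fixed random output signs; only W trains

def forward(W):
    H = np.maximum(X @ W.T, 0.0)       # ReLU activations, shape (n, m)
    return H @ a / np.sqrt(m)

eta = 0.1
for t in range(10000):
    err = forward(W) - y                            # residuals, shape (n,)
    act = (X @ W.T > 0).astype(float)               # ReLU gates, shape (n, m)
    # dL/dW[k] = sum_i err_i * (a_k / sqrt(m)) * 1{w_k . x_i > 0} * x_i
    grad = ((err[:, None] * act) * a / np.sqrt(m)).T @ X
    W -= eta * grad

print("final training loss:", 0.5 * np.sum((forward(W) - y) ** 2))
```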
We propose a provably convergent method, called the Efficient Learned Descent Algorithm (ELDA), for low-dose CT (LDCT) reconstruction. ELDA is a highly interpretable neural network architecture with learned parameters that nevertheless retains convergence guarantees.
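To convey the flavor of a learned descent scheme, here is a generic unrolled iteration in Python (a hypothetical sketch: ELDA's actual architecture, its handling of the regularizer, and the safeguarding rules behind its convergence guarantee are specified in the paper).

```python
import numpy as np

def learned_descent_recon(A, b, grad_reg, alpha=1e-3, K=50):
    """Generic unrolled descent for min_x 0.5 * ||A x - b||^2 + R_theta(x),
    where grad_reg(x) returns the gradient of a learned regularizer.

    A : (m, n) system matrix (e.g. a discretized CT projector)
    b : (m,) measured sinogram
    """
    x = A.T @ b                                    # crude back-projection init
    for _ in range(K):
        data_grad = A.T @ (A @ x - b)              # gradient of the data-fit term
        x = x - alpha * (data_grad + grad_reg(x))  # descent with learned prior
    return x

# Toy usage with a quadratic stand-in for the learned regularizer:
# x_hat = learned_descent_recon(A, b, grad_reg=lambda x: 0.1 * x)
```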
Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization r
We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM).
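The two-timescale structure can be sketched schematically in Python: the critic uses a faster (more slowly decaying) step size than the actor, so the actor effectively tracks a converged critic. This is a generic stochastic-approximation sketch, not the paper's GEM or emphasis-critic updates.

```python
import numpy as np

def two_timescale(T, critic_grad, actor_grad, dim=4):
    """Schematic two-timescale stochastic approximation.

    Both step sizes satisfy the usual conditions (sum = inf, sum of
    squares < inf), and alpha_t / beta_t -> 0, so the critic (step
    beta_t) moves on a faster timescale than the actor (step alpha_t).
    """
    w = np.zeros(dim)        # critic parameters
    theta = np.zeros(dim)    # actor parameters
    for t in range(1, T + 1):
        beta_t = 1.0 / t ** 0.6   # faster timescale (critic)
        alpha_t = 1.0 / t ** 0.9  # slower timescale (actor)
        w -= beta_t * critic_grad(w, theta)
        theta -= alpha_t * actor_grad(w, theta)
    return w, theta
```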