The paper proposes and justifies a new algorithm of the proximal Newton type to solve a broad class of nonsmooth composite convex optimization problems without strong convexity assumptions. Based on advanced notions and techniques of variational analysis, we establish implementable results on the global convergence of the proposed algorithm as well as its local convergence with superlinear and quadratic rates. For certain structured problems, the obtained local convergence conditions do not require the local Lipschitz continuity of the corresponding Hessian mappings, which is a crucial assumption used in the literature to ensure superlinear convergence of other proximal Newton-type algorithms. Numerical experiments on the $\ell_1$-regularized logistic regression model illustrate the applicability of the proposed algorithm to practically important problems.
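To make the proximal Newton template concrete, the following minimal Python sketch applies a proximal Newton-type step to the $\ell_1$-regularized logistic regression model mentioned in the experiments. It is not the paper's algorithm: a diagonal Hessian surrogate is assumed so that the scaled proximal subproblem reduces to coordinatewise soft-thresholding, and the names (`prox_newton_l1_diag`, `soft_threshold`) are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal mapping of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_newton_l1_diag(A, b, lam, iters=100):
    """Illustrative proximal Newton-type loop for
        min_x  (1/n) * sum_i log(1 + exp(-b_i * a_i^T x)) + lam * ||x||_1,
    with labels b_i in {-1, +1}.  A *diagonal* Hessian surrogate is used so
    that the scaled proximal subproblem has a closed-form soft-thresholding
    solution; the paper's method works with the full Hessian metric."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(b * (A @ x)))            # sigmoid(-b_i a_i^T x)
        grad = -(A * (b * s)[:, None]).mean(axis=0)      # gradient of the smooth part
        h = (A**2 * (s * (1.0 - s))[:, None]).mean(axis=0) + 1e-8  # diagonal Hessian
        # Proximal Newton step in the diagonal metric (closed form).
        x = soft_threshold(x - grad / h, lam / h)
    return x
```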
Minimax optimization problems are an important class of optimization problems arising in modern machine learning and in traditional research areas. While there are many numerical algorithms for solving smooth convex-concave minimax problems, numerical algorithms for nonsmooth convex-concave minimax problems are very rare. This paper aims to develop an efficient numerical algorithm for a structured nonsmooth convex-concave minimax problem. A majorized semi-proximal alternating coordinate method (mspACM) is proposed, in which a majorized quadratic convex-concave function is adopted to approximate the smooth part of the objective function and semi-proximal terms are added to each subproblem. This construction ensures that the subproblems at each iteration are solvable, and even easily solved, when the semi-proximal terms are chosen cleverly. We prove the global convergence of the algorithm mspACM under mild assumptions, without requiring a strong convexity-concavity condition. Under the local metric subregularity of the solution mapping, we prove that the algorithm mspACM converges at a linear rate. Preliminary numerical results are reported to verify the efficiency of the algorithm mspACM.
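As a structural illustration of alternating majorized/semi-proximal updates, and only that, the following Python sketch performs alternating proximal gradient descent-ascent on a bilinearly coupled, strongly convex-strongly concave toy saddle-point problem with $\ell_1$ terms. This is not the mspACM scheme itself, the problem data are made up, and small fixed step sizes are assumed; each subproblem is a quadratic majorization plus a proximal term and is solved in closed form by soft-thresholding.

```python
import numpy as np

def prox_l1(v, t):
    """Prox of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alt_prox_gda(K, a, b, lam, mu, tau=0.1, sigma=0.1, iters=2000):
    """Toy alternating proximal gradient descent-ascent for the saddle point of
        L(x, y) = 0.5*||x - a||^2 + x^T K y - 0.5*||y - b||^2
                  + lam*||x||_1 - mu*||y||_1.
    Each update minimizes (or maximizes) a quadratic majorization of the
    smooth part plus a proximal term, and has a closed-form solution.
    Structural sketch only; not the mspACM method."""
    x = np.zeros(K.shape[0])
    y = np.zeros(K.shape[1])
    for _ in range(iters):
        gx = (x - a) + K @ y              # gradient of the smooth part in x
        x = prox_l1(x - tau * gx, tau * lam)
        gy = K.T @ x - (y - b)            # gradient of the smooth part in y
        y = prox_l1(y + sigma * gy, sigma * mu)
    return x, y
```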
In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems whose objective function involves a smooth nonconvex term and a nonsmooth convex term. We assume that only noisy gradient and Hessian information about the smooth part of the objective function is available, via calls to stochastic first- and second-order oracles. The proposed method can be seen as a hybrid approach combining stochastic semismooth Newton steps and stochastic proximal gradient steps. Two inexact growth conditions are incorporated to monitor convergence and the acceptance of the semismooth Newton steps, and it is shown that the algorithm converges globally to stationary points in expectation. Moreover, under standard assumptions and utilizing random matrix concentration inequalities, we prove that the proposed approach locally turns into a pure stochastic semismooth Newton method and converges R-superlinearly with high probability. We present numerical results and comparisons on $\ell_1$-regularized logistic regression and nonconvex binary classification that demonstrate the efficiency of our algorithm.
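The hybrid structure described above can be sketched in Python as follows. This is a simplified illustration under assumptions not taken from the paper: minibatch gradients and Hessians of an $\ell_1$-regularized logistic loss, a semismooth Newton step applied to the proximal gradient residual, and a plain residual-decrease test standing in for the paper's inexact growth conditions. All names are illustrative.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stoch_semismooth_newton_l1(A, b, lam, alpha=1.0, batch=64,
                               theta=0.5, iters=200, seed=0):
    """Hybrid sketch: stochastic proximal gradient steps, with a
    semismooth-Newton-type candidate step on the prox-gradient residual
        F(x) = x - prox_{alpha*lam*||.||_1}(x - alpha * grad f_S(x)),
    accepted only if it shrinks the residual by a factor theta."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)

    def grad_hess(x, idx):
        As, bs = A[idx], b[idx]
        s = 1.0 / (1.0 + np.exp(bs * (As @ x)))        # sigmoid(-b_i a_i^T x)
        g = -(As * (bs * s)[:, None]).mean(axis=0)     # minibatch gradient
        H = (As.T * (s * (1.0 - s))) @ As / len(idx)   # minibatch Hessian
        return g, H

    def residual(x, g):
        return x - soft(x - alpha * g, alpha * lam)

    for _ in range(iters):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        g, H = grad_hess(x, idx)
        F = residual(x, g)
        # Generalized Jacobian of F: the l1-prox has a diagonal 0/1 pattern.
        active = (np.abs(x - alpha * g) > alpha * lam).astype(float)
        J = np.eye(d) - active[:, None] * (np.eye(d) - alpha * H)
        try:
            step = np.linalg.solve(J + 1e-8 * np.eye(d), -F)
            x_newton = x + step
            g2, _ = grad_hess(x_newton, idx)
            if np.linalg.norm(residual(x_newton, g2)) <= theta * np.linalg.norm(F):
                x = x_newton                            # accept Newton-type step
                continue
        except np.linalg.LinAlgError:
            pass
        x = soft(x - alpha * g, alpha * lam)            # fallback: prox-gradient step
    return x
```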
Newton's method for polynomial root finding is one of mathematics' best-known algorithms. The method also has its shortcomings: it is undefined at critical points, it can exhibit chaotic behavior, and it is only guaranteed to converge locally. Based on the {\it Geometric Modulus Principle} for a complex polynomial $p(z)$, together with a {\it Modulus Reduction Theorem} proved here, we develop the {\it Robust Newton's method} (RNM), defined everywhere with a step-size that guarantees an {\it a priori} reduction of the polynomial modulus in each iteration. Furthermore, we prove that RNM iterates converge globally, either to a root or to a critical point. Specifically, given $\varepsilon$ and any seed $z_0$, in $t=O(1/\varepsilon^{2})$ iterations of RNM, independent of the degree of $p(z)$, either $|p(z_t)| \leq \varepsilon$ or $|p(z_t)\, p'(z_t)| \leq \varepsilon$. By adjusting the iterates at {\it near-critical points}, we describe a {\it modified} RNM that necessarily converges to a root. In combination with Smale's point estimation, RNM results in a globally convergent Newton's method with a locally quadratic rate. We present sample polynomiographs that demonstrate how, in contrast with Newton's method, RNM smooths out the fractal boundaries of the basins of attraction of roots. RNM also shows potential for computing all roots of arbitrary-degree polynomials. A particular consequence of RNM is a simple algorithm for solving cubic equations.
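The modulus-reduction idea can be illustrated by a damped complex Newton iteration that backtracks until $|p|$ decreases. This is only a stand-in for RNM's explicit step-size rule derived from the Geometric Modulus Principle, and, like basic RNM, it may terminate near a critical point rather than a root; the function name and parameters are illustrative.

```python
import numpy as np

def damped_newton_poly(coeffs, z0, tol=1e-10, max_iter=500):
    """Damped complex Newton iteration for the polynomial with coefficients
    `coeffs` (highest degree first, as in numpy.polyval).  The Newton step is
    shrunk by backtracking until the polynomial modulus |p| decreases, a
    simple surrogate for RNM's modulus-reducing step-size rule."""
    p = np.poly1d(coeffs)
    dp = p.deriv()
    z = complex(z0)
    for _ in range(max_iter):
        pz, dpz = p(z), dp(z)
        if abs(pz) <= tol or abs(dpz) <= tol:    # at a root or (near-)critical point
            return z
        step = -pz / dpz                          # full Newton step
        t = 1.0
        while abs(p(z + t * step)) >= abs(pz) and t > 1e-12:
            t *= 0.5                              # backtrack until |p| decreases
        z = z + t * step
    return z

# Example: damped_newton_poly([1, 0, 0, -1], 1.0 + 0.5j) seeks a root of z^3 - 1.
```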
The Fast Proximal Gradient Method (FPGM) and the Monotone FPGM (MFPGM) for the minimization of nonsmooth convex functions are introduced and applied to tomographic image reconstruction. Convergence properties of the sequence of objective function values are derived, including a $O\left(1/k^{2}\right)$ non-asymptotic bound. The presented theory broadens current knowledge and explains the convergence behavior of certain methods that are known to show good practical performance. Numerical experimentation involving computerized tomography image reconstruction shows the methods to be competitive in practical scenarios. Experimental comparisons with Algebraic Reconstruction Techniques are performed, uncovering certain behaviors of accelerated Proximal Gradient algorithms that apparently have not yet been noticed when these are applied to tomographic image reconstruction.
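For reference, the following is a minimal textbook FISTA-style fast proximal gradient loop for $\min_x f(x)+g(x)$ with a known Lipschitz constant $L$ of $\nabla f$. FPGM and MFPGM as analyzed in the paper belong to this family, but the sketch below is not their exact form, and the callable names are illustrative.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, iters=100):
    """Textbook FISTA: gradient step on the smooth f (Lipschitz constant L of
    grad f), prox step on the nonsmooth g, plus Nesterov momentum."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        x_next = prox_g(y - grad_f(y) / L, 1.0 / L)          # proximal gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))    # momentum parameter update
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)     # extrapolation
        x, t = x_next, t_next
    return x

# Example use for l1-regularized least squares  min 0.5*||Ax - b||^2 + lam*||x||_1:
#   grad_f = lambda x: A.T @ (A @ x - b);   L = np.linalg.norm(A, 2) ** 2
#   prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
```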
Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based proximal decentralized methods are known to converge to the optimal solution only at sublinear rates, and it remains unclear whether this family of methods can achieve global linear convergence. To address this question, this work assumes that the non-smooth regularization term is common across all networked agents, which is the case for many machine learning problems. Under this condition, we design a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer. We then provide a concise proof that establishes its linear convergence. In the absence of the non-smooth term, our analysis technique covers the well-known EXTRA algorithm and provides useful bounds on its convergence rate and step-size.
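As a point of reference for this setting (a common non-smooth regularizer $h$ shared by all agents and a doubly stochastic mixing matrix $W$), here is a generic decentralized proximal gradient sketch in Python. It is the basic, sublinearly convergent scheme that this line of work improves upon, not the paper's linearly convergent algorithm, and all names are illustrative.

```python
import numpy as np

def decentralized_prox_grad(grads, prox_h, W, alpha, X0, iters=200):
    """Generic decentralized proximal gradient iteration
        X_i^{k+1} = prox_{alpha*h}( sum_j W_ij X_j^k - alpha * grad f_i(X_i^k) ),
    with a doubly stochastic mixing matrix W and a regularizer h common to all
    agents.  Sketch of the basic scheme only; the fixed-point correction that
    yields linear convergence (as in EXTRA-type methods) is not reproduced.

    grads : list of callables, grads[i](x_i) = grad f_i(x_i)
    X0    : (n_agents, d) array of initial local iterates
    """
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iters):
        G = np.stack([grads[i](X[i]) for i in range(len(grads))])
        X = W @ X - alpha * G                              # mix with neighbors, local gradient step
        X = np.stack([prox_h(x, alpha) for x in X])        # common prox applied locally
    return X
```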