No Arabic abstract
Exploring the relationship among multiple sets of data from one same group enables practitioners to make better decisions in medical science and engineering. In this paper, we propose a sparse collaborative learning (SCL) model, an optimization with double-sparsity constraints, to process the problem with two sets of data and a shared response variable. It is capable of dealing with the classification problems or the regression problems dependent on the discreteness of the response variable as well as exploring the relationship between two datasets simultaneously. To solve SCL, we first present some necessary and sufficient optimality conditions and then design a gradient projection Newton algorithm which has proven to converge to a unique locally optimal solution globally with at least a quadratic convergence rate. Finally, the reported numerical experiments illustrate the efficiency of the proposed method.
The smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP) penalized regression models are two important and widely used nonconvex sparse learning tools that can handle variable selection and parameter estimation simultaneously, and thus have potential applications in various fields such as mining biological data in high-throughput biomedical studies. Theoretically, these two models enjoy the oracle property even in the high-dimensional settings, where the number of predictors $p$ may be much larger than the number of observations $n$. However, numerically, it is quite challenging to develop fast and stable algorithms due to their non-convexity and non-smoothness. In this paper we develop a fast algorithm for SCAD and MCP penalized learning problems. First, we show that the global minimizers of both models are roots of the nonsmooth equations. Then, a semi-smooth Newton (SSN) algorithm is employed to solve the equations. We prove that the SSN algorithm converges locally and superlinearly to the Karush-Kuhn-Tucker (KKT) points. Computational complexity analysis shows that the cost of the SSN algorithm per iteration is $O(np)$. Combined with the warm-start technique, the SSN algorithm can be very efficient and accurate. Simulation studies and a real data example suggest that our SSN algorithm, with comparable solution accuracy with the coordinate descent (CD) and the difference of convex (DC) proximal Newton algorithms, is more computationally efficient.
Sparse optimization is a central problem in machine learning and computer vision. However, this problem is inherently NP-hard and thus difficult to solve in general. Combinatorial search methods find the global optimal solution but are confined to small-sized problems, while coordinate descent methods are efficient but often suffer from poor local minima. This paper considers a new block decomposition algorithm that combines the effectiveness of combinatorial search methods and the efficiency of coordinate descent methods. Specifically, we consider a random strategy or/and a greedy strategy to select a subset of coordinates as the working set, and then perform a global combinatorial search over the working set based on the original objective function. We show that our method finds stronger stationary points than Amir Beck et al.s coordinate-wise optimization method. In addition, we establish the convergence rate of our algorithm. Our experiments on solving sparse regularized and sparsity constrained least squares optimization problems demonstrate that our method achieves state-of-the-art performance in terms of accuracy. For example, our method generally outperforms the well-known greedy pursuit method.
An important method to optimize a function on standard simplex is the active set algorithm, which requires the gradient of the function to be projected onto a hyperplane, with sign constraints on the variables that lie in the boundary of the simplex. We propose a new algorithm to efficiently project the gradient for this purpose. Furthermore, we apply the proposed gradient projection method to quadratic programs (QP) with standard simplex constraints, where gradient projection is used to explore the feasible region and, when we believe the optimal active set is identified, we switch to constrained conjugate gradient to accelerate convergence. Specifically, two different directions of gradient projection are used to explore the simplex, namely, the projected gradient and the reduced gradient. We choose one of the two directions according to the angle between the directions. Moreover, we propose two conditions for guessing the optimal active set heuristically. The first condition is that the working set remains unchanged for many iterations, and the second condition is that the angle between the projected gradient and the reduced gradient is small enough. Based on these strategies, a new active set algorithm for solving quadratic programs on standard simplex is proposed.
This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the line search can behave erratically. The proposed method addresses these difficulties and ensures that the BFGS update is stable by employing a lengthening procedure that spaces out the points at which gradient differences are collected. A new line search, designed to tolerate errors, guarantees that the Armijo-Wolfe conditions are satisfied under most reasonable conditions, and works in conjunction with the lengthening procedure. The proposed methods are shown to enjoy convergence guarantees for strongly convex functions. Detailed implementations of the methods are presented, together with encouraging numerical results.
Novel sparse reconstruction algorithms are proposed for beamspace channel estimation in massive multiple-input multiple-output systems. The proposed algorithms minimize a least-squares objective having a nonconvex regularizer. This regularizer removes the penalties on a few large-magnitude elements from the conventional l1-norm regularizer, and thus it only forces penalties on the remaining elements that are expected to be zeros. Accurate and fast reconstructions can be achieved by performing gradient projection updates within the framework of difference of convex functions (DC) programming. A double-loop algorithm and a single-loop algorithm are proposed via different DC decompositions, and these two algorithms have distinct computation complexities and convergence rates. Then, an extension algorithm is further proposed by designing the step sizes of the single-loop algorithm. The extension algorithm has a faster convergence rate and can achieve approximately the same level of accuracy as the proposed double-loop algorithm. Numerical results show significant advantages of the proposed algorithms over existing reconstruction algorithms in terms of reconstruction accuracies and runtimes. Compared to the benchmark channel estimation techniques, the proposed algorithms also achieve smaller mean squared error and higher achievable spectral efficiency.