We compute approximate solutions to L0-regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 (Lass-zero), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results establishing consistency under orthogonality and appropriate handling of redundant features. Empirically, we use synthetic data to demonstrate that Lass-0 solutions are closer to the true sparse support than L1-regularized models. Additionally, on real-world data Lass-0 finds more parsimonious solutions than L1 regularization while maintaining similar predictive accuracy.
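A minimal sketch of the two-stage idea described above, assuming a squared-error loss, scikit-learn's Lasso for the initialization, and a generic add/drop hill-climb as the stepwise search; the names (`lass_zero`, `lam0`, `lam1`) are illustrative and this is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def l0_objective(X, y, support, lam0):
    # Least-squares fit restricted to `support`, plus an L0 penalty on its size.
    if len(support) == 0:
        return np.mean((y - y.mean()) ** 2)
    model = LinearRegression().fit(X[:, support], y)
    resid = y - model.predict(X[:, support])
    return np.mean(resid ** 2) + lam0 * len(support)

def lass_zero(X, y, lam1=0.1, lam0=0.05):
    # Step 1: Lasso initialization gives a candidate support.
    support = list(np.flatnonzero(Lasso(alpha=lam1).fit(X, y).coef_))
    best = l0_objective(X, y, support, lam0)
    improved = True
    while improved:  # Step 2: greedy add/drop local search on the support.
        improved = False
        for j in range(X.shape[1]):
            cand = [k for k in support if k != j] if j in support else support + [j]
            val = l0_objective(X, y, cand, lam0)
            if val < best - 1e-12:
                support, best, improved = cand, val, True
    return sorted(support), best
```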
Quantifying uncertainty in predictions or, more generally, estimating the posterior conditional distribution, is a core challenge in machine learning and statistics. We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for coping with this task. CNR involves a convex optimization of a posterior defined via a rich dictionary of predefined nonlinear transformations of Gaussians. It can fit an arbitrary conditional distribution, including multimodal and non-symmetric posteriors. For the special but powerful case of a piecewise linear dictionary, we provide a closed-form expression for the posterior mean, which can be used for pointwise prediction. Finally, we demonstrate the advantages of CNR over classical competitors using synthetic and real-world data.
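The closed-form posterior mean mentioned above rests on the fact that expectations of piecewise linear functions of a Gaussian are analytic. The sketch below illustrates only that identity (not the CNR model itself), with a Monte Carlo check; the function name and constants are illustrative.

```python
import numpy as np
from scipy.stats import norm

def hinge_gaussian_mean(mu, sigma, knot):
    # Closed-form E[max(Z - knot, 0)] for Z ~ N(mu, sigma^2):
    # (mu - knot) * Phi(a) + sigma * phi(a), with a = (mu - knot) / sigma.
    a = (mu - knot) / sigma
    return (mu - knot) * norm.cdf(a) + sigma * norm.pdf(a)

# Monte Carlo check of the identity.
mu, sigma, knot = 0.5, 1.3, -0.2
z = np.random.default_rng(0).normal(mu, sigma, size=1_000_000)
print(hinge_gaussian_mean(mu, sigma, knot), np.maximum(z - knot, 0).mean())
```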
Non-convex sparse minimization (NSM), or $\ell_0$-constrained minimization of convex loss functions, is an important optimization problem with many machine learning applications. NSM is generally NP-hard, so it cannot be expected to be solved exactly in polynomial time. For quadratic objective functions, exact algorithms based on mixed-integer quadratic programming (MIP) have been studied, but no existing exact methods can handle more general objective functions, including the Huber and logistic losses; this is unfortunate since those functions are prevalent in practice. In this paper, we consider NSM with $\ell_2$-regularized convex objective functions and develop an algorithm by leveraging the efficiency of best-first search (BFS). Our BFS can compute solutions with objective errors of at most $\Delta \ge 0$, where $\Delta$ is a controllable hyperparameter that balances the trade-off between the guarantee on objective errors and the computation cost. Experiments demonstrate that our BFS is useful for solving moderate-size NSM instances with non-quadratic objectives and that it is also faster than the MIP-based method when applied to quadratic objectives.
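As a rough illustration of the search structure only (not the authors' bounding scheme with error guarantee $\Delta$), the sketch below runs a best-first search over feature supports for an $\ell_2$-regularized logistic loss, assuming labels in {0, 1}; it is exhaustive over supports of size at most $k$, so it is only practical for small dimensions, and the helper names are hypothetical.

```python
import heapq
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

def l2_logistic_objective(X, y, support, C=1.0):
    # l2-regularized logistic objective restricted to the given support
    # (the objective scikit-learn minimizes, up to a factor of C).
    if not support:
        return len(y) * np.log(2.0)
    cols = sorted(support)
    clf = LogisticRegression(C=C).fit(X[:, cols], y)
    p = clf.predict_proba(X[:, cols])[:, 1]
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll + 0.5 / C * np.sum(clf.coef_ ** 2)

def best_first_nsm(X, y, k, C=1.0):
    # Best-first search over supports of size <= k, ordered by objective value.
    counter = itertools.count()  # heap tie-breaker
    start = frozenset()
    heap = [(l2_logistic_objective(X, y, start, C), next(counter), start)]
    best_val, best_sup, seen = np.inf, start, {start}
    while heap:
        val, _, sup = heapq.heappop(heap)
        if val < best_val:
            best_val, best_sup = val, sup
        if len(sup) == k:
            continue  # do not expand beyond the cardinality constraint
        for j in range(X.shape[1]):
            child = sup | {j}
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    heap, (l2_logistic_objective(X, y, child, C), next(counter), child)
                )
    return sorted(best_sup), best_val
```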
In this paper, we propose \texttt{FedGLOMO}, the first (first-order) federated learning (FL) algorithm that achieves the optimal iteration complexity (i.e., matching the known lower bound) on smooth non-convex objectives -- without using the clients' full gradients in each round. Our key algorithmic idea that enables attaining this optimal complexity is applying judicious momentum terms that promote variance reduction in both the local updates at the clients and the global update at the server. Our algorithm is also provably optimal even with compressed communication between the clients and the server, which is an important consideration in the practical deployment of FL algorithms. Our experiments illustrate the intrinsic variance-reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client drift in heterogeneous data distribution settings and promotes communication efficiency. As a prequel to \texttt{FedGLOMO}, we propose \texttt{FedLOMO}, which applies momentum only in the local client updates. We establish that \texttt{FedLOMO} enjoys improved convergence rates under common non-convex settings compared to prior work, and with fewer assumptions.
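A simulated-FL sketch of the local/global momentum structure described above, assuming quadratic client losses; the update rule, parameter names, and constants are illustrative and omit \texttt{FedGLOMO}'s variance-reduction and compression components.

```python
import numpy as np

def fed_momentum_sketch(client_data, rounds=50, local_steps=5,
                        lr=0.1, beta_local=0.9, beta_global=0.9):
    # client_data: list of (A_i, b_i); client i minimizes ||A_i w - b_i||^2 / n_i.
    d = client_data[0][0].shape[1]
    w = np.zeros(d)
    server_mom = np.zeros(d)
    for _ in range(rounds):
        deltas = []
        for A, b in client_data:
            w_loc, mom = w.copy(), np.zeros(d)
            for _ in range(local_steps):
                grad = 2 * A.T @ (A @ w_loc - b) / len(b)
                mom = beta_local * mom + (1 - beta_local) * grad  # local momentum
                w_loc -= lr * mom
            deltas.append(w_loc - w)                              # client model delta
        avg_delta = np.mean(deltas, axis=0)
        server_mom = beta_global * server_mom + (1 - beta_global) * avg_delta
        w += server_mom                                           # global momentum step
    return w
```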
For classification with high-dimensional covariates, $\ell_{1}$-penalized logistic regression has become an important and popular tool. However, Lasso estimates can be problematic when every coefficient receives the same penalty, unrelated to the data. We propose two types of weighted Lasso estimates whose covariate-dependent weights are constructed via the McDiarmid inequality. Given sample size $n$ and covariate dimension $p$, the finite-sample behavior of our proposed methods with a diverging number of predictors is characterized by non-asymptotic oracle inequalities, including bounds on the $\ell_{1}$-estimation error and the squared prediction error of the unknown parameters. We compare the performance of our methods with previously proposed weighted estimates on simulated data, and then apply these methods to real data analysis.
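A sketch of weighted $\ell_{1}$-penalized logistic regression via the standard feature-rescaling trick, using scikit-learn; the placeholder weights shown at the end are not the McDiarmid-based choice analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_l1_logistic(X, y, weights, C=1.0):
    # Per-coefficient penalty weights w_j via rescaling: penalizing w_j * |beta_j|
    # on X[:, j] equals penalizing |gamma_j| on X[:, j] / w_j, with
    # beta_j = gamma_j / w_j.
    weights = np.asarray(weights, dtype=float)
    X_scaled = X / weights                       # column-wise rescaling
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X_scaled, y)
    beta = clf.coef_.ravel() / weights
    return beta, clf.intercept_[0]

# Placeholder covariate-dependent weights (NOT the paper's McDiarmid-based choice),
# e.g., proportional to empirical column scales:
# weights = np.sqrt((X ** 2).mean(axis=0))
# beta, b0 = weighted_l1_logistic(X, y, weights)
```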
This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the regression coefficients, and the proposed algorithm jointly learns the low-dimensional structure of the data and a linear regressor with sparse coefficients. The proposed stochastic optimization method, Sparse Linear Regression with Missing Data (SLRM), performs an alternating minimization procedure and scales well with the problem size. Large deviation inequalities shed light on the impact of the various problem-dependent parameters on the expected squared loss of the learned regressor. Extensive simulations on both synthetic and real datasets show that SLRM performs better than competing algorithms in a variety of contexts.
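A simplified stand-in for the alternating structure described above, assuming NaN-coded missing entries, a truncated-SVD low-rank step, and a Lasso regression step; SLRM itself is a stochastic procedure that learns both components jointly, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_regression_missing(X, y, rank=5, alpha=0.1, iters=20):
    # X has NaNs at missing entries: impute with a low-rank model of the data,
    # then fit a sparse (Lasso) regressor on the completed matrix.
    mask = ~np.isnan(X)
    X_hat = np.where(mask, X, np.nanmean(X, axis=0))   # mean-impute to start
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        low_rank = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
        X_hat = np.where(mask, X, low_rank)            # keep observed entries fixed
    model = Lasso(alpha=alpha).fit(X_hat, y)
    return model.coef_, X_hat
```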