No Arabic abstract
The idea of unfolding iterative algorithms as deep neural networks has been widely applied in solving sparse coding problems, providing both solid theoretical analysis in convergence rate and superior empirical performance. However, for sparse nonlinear regression problems, a similar idea is rarely exploited due to the complexity of nonlinearity. In this work, we bridge this gap by introducing the Nonlinear Learned Iterative Shrinkage Thresholding Algorithm (NLISTA), which can attain a linear convergence under suitable conditions. Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on $L_1$ norm or even sub-linear potentials corresponding to quasinorms $L_p$ ($0<p<1$). The back side of these approaches is increase in computational cost for optimization. Till so far, no approaches have been suggested to deal with {it arbitrary} error functionals, in a flexible and computationally efficient framework. In this paper, we develop a theory and basic universal data approximation algorithms ($k$-means, principal components, principal manifolds and graphs, regularized and sparse regression), based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials). We develop a new and universal framework to minimize {it arbitrary sub-quadratic error potentials} using an algorithm with guaranteed fast convergence to the local or global error minimum. The theory of PQSQ potentials is based on the notion of the cone of minorant functions, and represents a natural approximation formalism based on the application of min-plus algebra. The approach can be applied in most of existing machine learning methods, including methods of data approximation and regularized and sparse regression, leading to the improvement in the computational cost/accuracy trade-off. We demonstrate that on synthetic and real-life datasets PQSQ-based machine learning methods achieve orders of magnitude faster computational performance than the corresponding state-of-the-art methods.
Differentially private (DP) learning, which aims to accurately extract patterns from the given dataset without exposing individual information, is an important subfield in machine learning and has been extensively explored. However, quantum algorithms that could preserve privacy, while outperform their classical counterparts, are still lacking. The difficulty arises from the distinct priorities in DP and quantum machine learning, i.e., the former concerns a low utility bound while the latter pursues a low runtime cost. These varied goals request that the proposed quantum DP algorithm should achieve the runtime speedup over the best known classical results while preserving the optimal utility bound. The Lasso estimator is broadly employed to tackle the high dimensional sparse linear regression tasks. The main contribution of this paper is devising a quantum DP Lasso estimator to earn the runtime speedup with the privacy preservation, i.e., the runtime complexity is $tilde{O}(N^{3/2}sqrt{d})$ with a nearly optimal utility bound $tilde{O}(1/N^{2/3})$, where $N$ is the sample size and $d$ is the data dimension with $Nll d$. Since the optimal classical (private) Lasso takes $Omega(N+d)$ runtime, our proposal achieves quantum speedups when $N<O(d^{1/3})$. There are two key components in our algorithm. First, we extend the Frank-Wolfe algorithm from the classical Lasso to the quantum scenario, {where the proposed quantum non-private Lasso achieves a quadratic runtime speedup over the optimal classical Lasso.} Second, we develop an adaptive privacy mechanism to ensure the privacy guarantee of the non-private Lasso. Our proposal opens an avenue to design various learning tasks with both the proven runtime speedups and the privacy preservation.
We propose a sparse and low-rank tensor regression model to relate a univariate outcome to a feature tensor, in which each unit-rank tensor from the CP decomposition of the coefficient tensor is assumed to be sparse. This structure is both parsimonious and highly interpretable, as it implies that the outcome is related to the features through a few distinct pathways, each of which may only involve subsets of feature dimensions. We take a divide-and-conquer strategy to simplify the task into a set of sparse unit-rank tensor regression problems. To make the computation efficient and scalable, for the unit-rank tensor regression, we propose a stagewise estimation procedure to efficiently trace out its entire solution path. We show that as the step size goes to zero, the stagewise solution paths converge exactly to those of the corresponding regularized regression. The superior performance of our approach is demonstrated on various real-world and synthetic examples.
Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian $N(0,Sigma)$ with $Sigma : n times n$, and seek estimators $hat{w}$ minimizing $(hat{w}-w^*)^TSigma(hat{w}-w^*)$, where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can achieve strong error bounds with $O(k log n)$ samples for arbitrary $Sigma$ and $w^*$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $Sigma$ or $w^*$. As far as hardness, computational lower bounds are only known with worst-case design matrices. Random-design instances are known which are hard for the Lasso, but these instances can generally be solved by Lasso after a simple change-of-basis (i.e. preconditioning). In this work, we give upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if $Sigma$ is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $Omega(t^{1/20})$ samples to recover $O(log n)$-sparse signals when covariates are drawn from this model.
The Wagner function in classical unsteady aerodynamic theory represents the response in lift on an airfoil that is subject to a sudden change in conditions. While it plays a fundamental role in the development and application of unsteady aerodynamic methods, explicit expressions for this function are difficult to obtain. The Wagner function requires computation of an inverse Laplace transform, or similar inversion, of a non-rational function in the Laplace domain, which is closely related to the Theodorsen function. This has led to numerous proposed approximations to the Wagner function, which facilitate convenient and rapid computations. While these approximations can be sufficient for many purposes, their behavior is often noticeably different from the true Wagner function, especially for long-time asymptotic behavior. In particular, while many approximations have small maximum absolute error across all times, the relative error of the asymptotic behavior can be substantial. As well as documenting this error, we propose an alternative approximation methodology that is accurate for all times, for a variety of accuracy measures. This methodology casts the Wagner function as the solution of a nonlinear scalar ordinary differential equation, which is identified using a variant of the sparse identification of nonlinear dynamics (SINDy) algorithm. We show that this approach can give accurate approximations using either first- or second-order differential equations. We additionally show that this method can be applied to model the analogous lift response for a more realistic aerodynamic system, featuring a finite thickness airfoil and a nonplanar wake.