Truncated Linear Regression in High Dimensions

92 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Dhruv Rohatgi

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Constantinos Daskalakis - Dhruv Rohatgi - Manolis Zampetakis

التعلم الآلي بنى وهياكل البيانات والخوارزميات نظرية الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

As in standard linear regression, in truncated linear regression, we are given access to observations $(A_i, y_i)_i$ whose dependent variable equals $y_i= A_i^{rm T} cdot x^* + eta_i$, where $x^$ is some fixed unknown vector of interest and $eta_i$ is independent noise; except we are only given an observation if its dependent variable $y_i$ lies in some truncation set $S subset mathbb{R}$. The goal is to recover $x^$ under some favorable conditions on the $A_i$s and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples, which attains an optimal $ell_2$ reconstruction error of $O(sqrt{(k log n)/m})$. As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low-dimensionality of the data, and (2) [Wainright 2009], where the objective function is simple due to the absence of truncation. In order to deal with both truncation and high-dimensionality at the same time, we develop new techniques that not only generalize the existing ones but we believe are of independent interest.

قيم البحث

149 - Jonathan Kelner , Frederic Koehler , Raghu Meka 2021

Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting , where the covariates are independently drawn from a multivariate Gaussian $N(0,Sigma)$ with $Sigma : n times n$, and seek estimators $hat{w}$ minimizing $(hat{w}-w^*)^TSigma(hat{w}-w^*)$, where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can achieve strong error bounds with $O(k log n)$ samples for arbitrary $Sigma$ and $w^*$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $Sigma$ or $w^*$. As far as hardness, computational lower bounds are only known with worst-case design matrices. Random-design instances are known which are hard for the Lasso, but these instances can generally be solved by Lasso after a simple change-of-basis (i.e. preconditioning). In this work, we give upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if $Sigma$ is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $Omega(t^{1/20})$ samples to recover $O(log n)$-sparse signals when covariates are drawn from this model.

التعلم الآلي بنى وهياكل البيانات والخوارزميات نظرية الإحصاء

Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression

77 - Shiva Prasad Kasiviswanathan , Mark Rudelson 2017

High-dimensional settings, where the data dimension ($d$) far exceeds the number of observations ($n$), are common in many statistical and machine learning applications. Methods based on $ell_1$-relaxation, such as Lasso, are very popular for sparse recovery in these settings. Restricted Eigenvalue (RE) condition is among the weakest, and hence the most general, condition in literature imposed on the Gram matrix that guarantees nice statistical properties for the Lasso estimator. It is natural to ask: what families of matrices satisfy the RE condition? Following a line of work in this area, we construct a new broad ensemble of dependent random design matrices that have an explicit RE bound. Our construction starts with a fixed (deterministic) matrix $X in mathbb{R}^{n times d}$ satisfying a simple stable rank condition, and we show that a matrix drawn from the distribution $X Phi^top Phi$, where $Phi in mathbb{R}^{m times d}$ is a subgaussian random matrix, with high probability, satisfies the RE condition. This construction allows incorporating a fixed matrix that has an easily {em verifiable} condition into the design process, and allows for generation of {em compressed} design matrices that have a lower storage requirement than a standard design matrix. We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$.

التعلم الالي بنى وهياكل البيانات والخوارزميات نظرية الإحصاء

New approach to Bayesian high-dimensional linear regression

177 - Shirin Jalali , Arian Maleki 2016

Consider the problem of estimating parameters $X^n in mathbb{R}^n $, generated by a stationary process, from $m$ response variables $Y^m = AX^n+Z^m$, under the assumption that the distribution of $X^n$ is known. This is the most general version of th e Bayesian linear regression problem. The lack of computationally feasible algorithms that can employ generic prior distributions and provide a good estimate of $X^n$ has limited the set of distributions researchers use to model the data. In this paper, a new scheme called Q-MAP is proposed. The new method has the following properties: (i) It has similarities to the popular MAP estimation under the noiseless setting. (ii) In the noiseless setting, it achieves the asymptotically optimal performance when $X^n$ has independent and identically distributed components. (iii) It scales favorably with the dimensions of the problem and therefore is applicable to high-dimensional setups. (iv) The solution of the Q-MAP optimization can be found via a proposed iterative algorithm which is provably robust to the error (noise) in the response variables.

نظرية المعلومات نظرية المعلومات نظرية الإحصاء

Truthful Linear Regression

431 - Rachel Cummings , Stratis Ioannidis , 2015

We consider the problem of fitting a linear model to data held by individuals who are concerned about their privacy. Incentivizing most players to truthfully report their data to the analyst constrains our design to mechanisms that provide a privacy guarantee to the participants; we use differential privacy to model individuals privacy losses. This immediately poses a problem, as differentially private computation of a linear model necessarily produces a biased estimation, and existing approaches to design mechanisms to elicit data from privacy-sensitive individuals do not generalize well to biased estimators. We overcome this challenge through an appropriate design of the computation and payment scheme.

علوم الكمبيوتر ونظرية الألعاب بنى وهياكل البيانات والخوارزميات التعلم الالي

Causal Strategic Linear Regression

67 - Yonadav Shavit , Benjamin Edelman , Brian Axelrod 2020

In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents propensity to game the decision rule by changing their features so as to receive better decisi ons. Whereas the strategic classification literature has previously assumed that agents outcomes are not causally affected by their features (and thus that strategic agents goal is deceiving the decision-maker), we join concurrent work in modeling agents outcomes as a function of their changeable attributes. As our main contribution, we provide efficient algorithms for learning decision rules that optimize three distinct decision-maker objectives in a realizable linear setting: accurately predicting agents post-gaming outcomes (prediction risk minimization), incentivizing agents to improve these outcomes (agent outcome maximization), and estimating the coefficients of the true underlying model (parameter estimation). Our algorithms circumvent a hardness result of Miller et al. (2020) by allowing the decision maker to test a sequence of decision rules and observe agents responses, in effect performing causal interventions through the decision rules.

التعلم الآلي الذكاء الاصطناعي التعلم الالي