Near-optimal inference in adaptive linear regression

Added by: Koulik Khamaru

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Koulik Khamaru, Yash Deshpande, Lester Mackey

Abstract

When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline against which to compare estimators. Under various conditions, our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
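
A small simulation can make the first sentence of the abstract concrete. The sketch below only illustrates the problem; it is not the authors' online debiasing estimator. It assumes a two-armed bandit with equal arm means, unit Gaussian noise, and a pure greedy sampling rule (all of these are our choices, and the function names are hypothetical). With one-hot arm indicators as covariates, the per-arm sample mean is exactly the ordinary least squares estimate, so under non-adaptive sampling with a fixed pull count the statistic $\sqrt{n_a}(\hat\mu_a - \mu_a)$ would be exactly standard normal.

```python
import math
import random
import statistics


def greedy_two_arm_bandit(rng, horizon, means=(0.0, 0.0), noise_sd=1.0):
    """Collect data adaptively (hypothetical helper).

    Pull each arm once, then always pull the arm with the larger running
    sample mean (pure greedy; ties go to arm 0). Returns per-arm pull
    counts and per-arm reward sums.
    """
    if horizon < 2:
        raise ValueError("horizon must be at least 2 so both arms are pulled")
    counts = [0, 0]
    sums = [0.0, 0.0]
    for t in range(horizon):
        if t < 2:
            arm = t  # initial round-robin: one pull per arm
        else:
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        reward = means[arm] + rng.gauss(0.0, noise_sd)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def standardized_errors(reps, horizon, seed=0, arm=0):
    """Monte Carlo draws of sqrt(n_arm) * (sample_mean_arm - true_mean).

    The per-arm sample mean equals the OLS estimate with arm-indicator
    covariates. With a fixed, non-adaptive n_arm and unit Gaussian noise,
    these draws would be exactly N(0, 1).
    """
    rng = random.Random(seed)
    means = (0.0, 0.0)  # assumption: both arms have the same true mean
    z = []
    for _ in range(reps):
        counts, sums = greedy_two_arm_bandit(rng, horizon, means)
        estimate = sums[arm] / counts[arm]
        z.append(math.sqrt(counts[arm]) * (estimate - means[arm]))
    return z


if __name__ == "__main__":
    z = standardized_errors(reps=2000, horizon=100, seed=1)
    # N(0, 1) benchmarks: mean 0, sd 1, and about 95% of draws within +/-1.96.
    inside = sum(abs(v) <= 1.96 for v in z) / len(z)
    print(f"mean z = {statistics.fmean(z):.3f}, sd z = {statistics.stdev(z):.3f}")
    print(f"fraction with |z| <= 1.96: {inside:.3f}")
```

Running the script prints the empirical mean, standard deviation, and coverage of the nominal 95% interval; comparing them with 0, 1, and 0.95 shows how far this adaptive design moves the least squares statistic away from the normal benchmark. The exact numbers depend on the seed, horizon, and sampling rule, so none are quoted here.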

Variational Inference in high-dimensional linear regression

Sumit Mukherjee, Subhabrata Sen (2021)

We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional separation condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this separation condition. Finally, we illustrate our results on concrete examples with specific design matrices.

Statistics Theory, Probability, Machine Learning

Inference in High-Dimensional Linear Regression via Lattice Basis Reduction and Integer Relation Detection

David Gamarnik, Eren C. Kızıldağ, Ilias Zadik (2019)

We focus on the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: in the noiseless setting, we assume $\beta^*$ consists of entries that are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality) or irrational numbers supported on a rationally independent set of bounded cardinality known to the learner; together these are called the mixed-support assumption. Using a novel combination of the PSLQ integer relation detection and LLL lattice basis reduction algorithms, we propose a polynomial-time algorithm that provably recovers a $\beta^*\in\mathbb{R}^p$ satisfying the mixed-support assumption from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$, for a large class of distributions of the random entries of $X$, even with a single measurement $(n=1)$. In the noisy setting, we propose a polynomial-time, lattice-based algorithm that recovers a $\beta^*\in\mathbb{R}^p$ satisfying $Q$-rationality from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even with a single sample $(n=1)$. We further establish that, for large $Q$ and normal noise, this algorithm tolerates an information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample $(n=1)$ regime, where sparsity-based methods such as LASSO and Basis Pursuit are known to fail. Furthermore, our results reveal an algorithmic connection between the high-dimensional linear regression problem and the integer relation detection, randomized subset-sum, and shortest vector problems.

Statistics Theory, Probability, Machine Learning

Statistical inference for the slope parameter in functional linear regression

Tim Kutta, Gauthier Dierickx, Holger Dette (2021)

In this paper we consider the linear regression model $Y = SX + \varepsilon$ with functional regressors and responses. We develop new inference tools to quantify deviations of the true slope $S$ from a hypothesized operator $S_0$ with respect to the Hilbert--Schmidt norm $\|S - S_0\|^2$, as well as the prediction error $\mathbb{E}\|SX - S_0X\|^2$. Our analysis is applicable to functional time series and based on asymptotically pivotal statistics. This makes it particularly user-friendly, because it avoids the choice of tuning parameters inherent in long-run variance estimation or bootstrap of dependent data. We also discuss two-sample problems as well as change point detection. Finite-sample properties are investigated by means of a simulation study. Mathematically, our approach is based on a sequential version of the popular spectral cut-off estimator $\hat S_N$ for $S$. It is well known that the $L^2$-minimax rates in the functional regression model, both in estimation and prediction, are substantially slower than $1/\sqrt{N}$ (where $N$ denotes the sample size) and that standard estimators for $S$ do not converge weakly to non-degenerate limits. However, we demonstrate that simple plug-in estimators, such as $\|\hat S_N - S_0\|^2$ for $\|S - S_0\|^2$, are $\sqrt{N}$-consistent and its sequenti…

Statistics Theory

Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression

Siyi Deng, Yang Ning, Jiwei Zhao (2020)

There are many scenarios, such as electronic health records, where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure in the high-dimensional setting. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation and inference of the regression parameters in linear models, especially in light of the fact that such linear models may be misspecified in data analysis. In particular, we address the following two important questions. (1) Can we use the labeled data as well as the unlabeled data to construct a semi-supervised estimator whose convergence rate is faster than that of the supervised estimators? (2) Can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or powerful than the supervised estimators? To address the first question, we establish the minimax lower bound for parameter estimation in the semi-supervised setting. We show that the upper bound from the supervised estimators that only use the labeled data cannot attain this lower bound. We close this gap by proposing a new semi-supervised estimator which attains the lower bound. To address the second question, based on our proposed semi-supervised estimator, we propose two additional estimators for semi-supervised inference: the efficient estimator and the safe estimator. The former is fully efficient if the unknown conditional mean function is estimated consistently, but may not be more efficient than the supervised approach otherwise. The latter usually does not aim to provide fully efficient inference, but is guaranteed to be no worse than the supervised approach, regardless of whether the linear model is correctly specified or the conditional mean function is consistently estimated.

Methodology, Statistics Theory, Machine Learning

Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference

T. Tony Cai, Anru Zhang, Yuchen Zhou (2019)

In this paper, we study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model, an actively studied topic in statistics and machine learning. In the noiseless case, we provide matching upper and lower bounds on sample complexity for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, we develop upper and matching minimax lower bounds for estimation error. We also consider the debiased sparse group Lasso and investigate its asymptotic properties for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.

Statistics Theory, Machine Learning
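
The sparse group Lasso abstract above describes parameters that are simultaneously element-wise and group-wise sparse; the penalty achieves this by combining an entrywise $\ell_1$ term with a sum of group-wise $\ell_2$ norms. As a hedged sketch, assuming the common unweighted form $\lambda_1\|\beta\|_1 + \lambda_2\sum_g\|\beta_g\|_2$ with non-overlapping groups (the paper's exact weighting may differ), the proximal operator used by proximal-gradient solvers can be computed group by group: soft-threshold each entry, then shrink the group's Euclidean norm toward zero. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, tau: float) -> float:
    """Entrywise soft-thresholding: the prox of tau * |x|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_group_prox(beta: Sequence[float], groups: Sequence[Sequence[int]],
                      lam1: float, lam2: float, step: float = 1.0) -> List[float]:
    """Prox of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2) evaluated at beta.

    Assumes the groups partition the coordinates. Within each group, every
    entry is soft-thresholded by step * lam1 (element-wise sparsity), then the
    group is shrunk by step * lam2 in Euclidean norm, or zeroed entirely when
    its norm is at most step * lam2 (group-wise sparsity).
    """
    out = [0.0] * len(beta)
    for g in groups:
        s = [soft_threshold(beta[j], step * lam1) for j in g]
        norm = math.sqrt(sum(v * v for v in s))
        if norm <= step * lam2:
            continue  # whole group stays at zero
        scale = 1.0 - step * lam2 / norm
        for j, v in zip(g, s):
            out[j] = scale * v
    return out
```

For instance, with input [4.0, 0.5, -5.0, 0.2, 0.1], groups [[0, 1, 2], [3, 4]], and $\lambda_1 = \lambda_2 = 1$, the output is approximately [2.4, 0.0, -3.2, 0.0, 0.0]: the first group survives with its middle entry zeroed, while the second group is set entirely to zero.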
