We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional separation condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this separation condition. Finally, we illustrate our results on concrete examples with specific design matrices.
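For orientation, the naive mean-field approximation can be recalled through the Gibbs variational principle; the notation below (a generic log-likelihood term $f$ and product prior $\mu = \otimes_{j=1}^{p}\mu_j$, with $f(\beta) = -\|y - X\beta\|_2^2/(2\sigma^2)$ in the Gaussian-noise linear model) is illustrative and need not match the paper's conventions:
\[
\log Z_p \;=\; \sup_{Q \ll \mu}\Big\{\mathbb{E}_Q\big[f(\beta)\big] - D_{\mathrm{KL}}(Q\,\|\,\mu)\Big\}
\;\ge\; \sup_{Q = \otimes_{j=1}^{p} Q_j}\Big\{\mathbb{E}_Q\big[f(\beta)\big] - \sum_{j=1}^{p} D_{\mathrm{KL}}(Q_j\,\|\,\mu_j)\Big\},
\]
where $Z_p$ denotes the normalizing constant of the posterior and the right-hand side is the naive mean-field free energy; leading-order correctness means the gap between the two sides is of smaller order than $\log Z_p$ itself.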
We focus on the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: in the noiseless setting, we assume the entries of $\beta^*$ are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality), or irrational numbers supported on a rationally independent set of bounded cardinality known to the learner; we collectively call this the mixed-support assumption. Using a novel combination of the PSLQ integer relation detection and LLL lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ satisfying the mixed-support assumption from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$, for a large class of distributions of the random entries of $X$, even with one measurement $(n=1)$. In the noisy setting, we propose a polynomial-time, lattice-based algorithm which recovers a $\beta^*\in\mathbb{R}^p$ satisfying $Q$-rationality from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even with a single sample $(n=1)$. We further establish that, for large $Q$ and normal noise, this algorithm tolerates an information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample $(n=1)$ regime, where sparsity-based methods such as the LASSO and Basis Pursuit are known to fail. Furthermore, our results reveal an algorithmic connection between the high-dimensional linear regression problem and the integer relation detection, randomized subset-sum, and shortest vector problems.
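As a toy illustration of the noiseless, single-sample $(n=1)$, $Q$-rational setting (not the full PSLQ-plus-LLL pipeline of the abstract), the sketch below recovers $\beta^*$ from one measurement via integer relation detection using mpmath's PSLQ routine; the dimension, denominator, working precision, and coefficient bound are assumptions chosen for illustration.

```python
# Toy sketch of the noiseless, single-sample (n = 1), Q-rational setting.
# NOT the paper's full PSLQ + LLL pipeline; it only illustrates the reduction
# to integer relation detection, using mpmath's PSLQ implementation.
# Dimension p, denominator Q, precision, and coefficient bound are assumptions.
import random
from mpmath import mp, mpf, pslq

mp.dps = 60                                                  # high working precision for PSLQ
p, Q = 4, 7                                                  # dimension and (known) common denominator
z = [random.choice([-1, 1]) * random.randint(1, 5) for _ in range(p)]  # numerators of beta* = z / Q
x = [mpf(random.random()) for _ in range(p)]                 # a single random measurement row
y = sum(xi * zi for xi, zi in zip(x, z)) / Q                 # one sample: y = <x, beta*>

# (z_1, ..., z_p, -Q) is an integer relation for (x_1, ..., x_p, y), since
#   z_1 x_1 + ... + z_p x_p - Q y = 0.
relation = pslq(x + [y], maxcoeff=10**6, maxsteps=1000)
if relation is not None and relation[-1] != 0:
    beta_hat = [-mpf(r) / relation[-1] for r in relation[:-1]]
    print("true beta*:", [zi / Q for zi in z])
    print("recovered :", [float(b) for b in beta_hat])
```

With a random measurement row and high enough precision, the relation returned by PSLQ is, with high probability, proportional to $(z, -Q)$, so dividing by the last coordinate recovers $\beta^*$ up to numerical precision.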
We study high-dimensional regression with missing entries in the covariates. A common strategy in practice is to \emph{impute} the missing entries with an appropriate substitute and then implement a standard statistical procedure acting as if the covariates were fully observed. Recent literature on this subject proposes instead to design a specific, often complicated or non-convex, algorithm tailored to the case of missing covariates. We investigate a simpler approach where we fill in the missing entries with their conditional mean given the observed covariates. We show that this imputation scheme coupled with standard off-the-shelf procedures such as the LASSO and square-root LASSO retains the minimax estimation rate in the random-design setting where the covariates are i.i.d. sub-Gaussian. We further show that the square-root LASSO remains \emph{pivotal} in this setting. It is often the case that the conditional expectation cannot be computed exactly and must be approximated from data. We study two cases where the covariates either follow an autoregressive (AR) process, or are jointly Gaussian with a sparse precision matrix. We propose tractable estimators for the conditional expectation and then perform linear regression via the LASSO, and show similar estimation rates in both cases. We complement our theoretical results with simulations on synthetic and semi-synthetic examples, illustrating not only the sharpness of our bounds, but also the broader utility of this strategy beyond our theoretical assumptions.
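A minimal sketch of this strategy, assuming jointly Gaussian covariates with a known covariance (in practice the covariance, or the conditional expectation itself, would be estimated, e.g. under the AR or sparse-precision models mentioned above); scikit-learn's LASSO serves as the off-the-shelf second stage.

```python
# Minimal sketch: impute missing covariate entries by their conditional mean
# given the observed entries (here under an assumed known Gaussian covariance),
# then run an off-the-shelf LASSO on the completed design matrix.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 50
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)-type covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.normal(size=n)

mask = rng.random((n, p)) < 0.2            # 20% of entries missing completely at random
X_imp = np.where(mask, np.nan, X)

for i in range(n):
    m = np.isnan(X_imp[i])                 # indices of missing entries in row i
    if m.any():
        o = ~m
        # Conditional mean of X_m given X_o for a zero-mean Gaussian vector:
        #   E[X_m | X_o] = Sigma_mo Sigma_oo^{-1} X_o
        X_imp[i, m] = Sigma[np.ix_(m, o)] @ np.linalg.solve(Sigma[np.ix_(o, o)], X_imp[i, o])

beta_hat = Lasso(alpha=0.1).fit(X_imp, y).coef_
print("estimation error:", np.linalg.norm(beta_hat - beta))
```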
When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. We identify various conditions under which our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
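The distributional anomaly in question is easy to reproduce in the autoregressive application; the simulation below illustrates the problem being corrected (it is not the online debiasing estimator itself): the OLS estimate of an AR(1) coefficient is downward biased and left-skewed in finite samples, so normal-theory confidence intervals centered at it can be unreliable.

```python
# Illustration of the problem being corrected (not the online debiasing estimator):
# under adaptively collected data such as an AR(1) time series, the OLS estimate
# of the autoregressive coefficient is biased and non-normal in finite samples.
import numpy as np

rng = np.random.default_rng(1)
theta, T, reps = 0.9, 50, 5000
estimates = np.empty(reps)
for r in range(reps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = theta * y[t - 1] + rng.normal()
    # OLS of y_t on y_{t-1}: theta_hat = sum y_{t-1} y_t / sum y_{t-1}^2
    estimates[r] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

print("true theta      :", theta)
print("mean OLS est.   :", estimates.mean())   # noticeably below 0.9 (downward bias)
print("skewness (rough):", ((estimates - estimates.mean()) ** 3).mean() / estimates.std() ** 3)
```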
Spike-and-slab priors are popular Bayesian solutions for high-dimensional linear regression problems. Previous theoretical studies on spike-and-slab methods focus on specific prior formulations and use prior-dependent conditions and analyses, and thus cannot be generalized directly. In this paper, we propose a class of generic spike-and-slab priors and develop a unified framework to rigorously assess their theoretical properties. Technically, we provide general conditions under which generic spike-and-slab priors can achieve the nearly-optimal posterior contraction rate and the model selection consistency. Our results include those of Narisetty and He (2014) and Castillo et al. (2015) as special cases.
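As a concrete point of reference (an illustrative instance rather than the paper's generic conditions), a spike-and-slab prior places on each coordinate the mixture
\[
\beta_j \mid w \;\overset{\text{i.i.d.}}{\sim}\; (1-w)\,\pi_{\mathrm{spike}} + w\,\pi_{\mathrm{slab}}, \qquad j = 1, \dots, p,
\]
where $\pi_{\mathrm{spike}}$ is a point mass at zero or a density tightly concentrated around zero, $\pi_{\mathrm{slab}}$ is a heavier-tailed density such as a Laplace, and the mixing weight $w$ is either taken of order $p^{-a}$ for some $a>1$ or given a Beta hyperprior. In the standard sparse setting, the nearly-optimal posterior contraction rate referred to above is of order $\sqrt{s\log p\,/\,n}$ for an $s$-sparse truth, up to constants and conditions on the design.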
We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the design matrix, oracle inequalities are derived for the mean-field VB approximation, implying that it converges to the sparse truth at the optimal rate and gives optimal prediction of the response vector. The empirical performance of our algorithm is studied, showing that it performs comparably to other state-of-the-art Bayesian variable selection methods. We also numerically demonstrate that the widely used coordinate-ascent variational inference (CAVI) algorithm can be highly sensitive to the parameter updating order, leading to potentially poor performance. To mitigate this, we propose a novel prioritized updating scheme that uses a data-driven updating order and performs better in simulations. The variational algorithm is implemented in the R package sparsevb.
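To make the order-dependence concrete, the sketch below implements a schematic CAVI for a mean-field spike-and-slab approximation with a Gaussian slab (chosen so the coordinate updates are closed form; the paper and sparsevb use a Laplace slab, and the actual prioritized rule may differ from the simple data-driven ordering assumed here):

```python
# Schematic CAVI for a mean-field spike-and-slab VB approximation with a
# Gaussian slab, run with two different coordinate updating orders.
# This is an illustration of the updating-order issue, not the sparsevb algorithm.
import numpy as np

def cavi_spike_slab(X, y, order, w=0.1, tau2=1.0, sigma2=1.0, iters=50):
    n, p = X.shape
    alpha = np.full(p, 0.5)                      # variational inclusion probabilities
    mu = np.zeros(p)                             # variational means given inclusion
    xx = (X ** 2).sum(axis=0)
    s2 = 1.0 / (xx / sigma2 + 1.0 / tau2)        # variational variances given inclusion
    r = y - X @ (alpha * mu)                     # residual under current posterior means
    for _ in range(iters):
        for j in order:
            r = r + X[:, j] * (alpha[j] * mu[j])         # remove coordinate j from the fit
            mu[j] = s2[j] * (X[:, j] @ r) / sigma2
            logit = np.log(w / (1 - w)) + 0.5 * np.log(s2[j] / tau2) + mu[j] ** 2 / (2 * s2[j])
            alpha[j] = 1.0 / (1.0 + np.exp(-logit))
            r = r - X[:, j] * (alpha[j] * mu[j])         # put the updated coordinate back
    return alpha, mu

rng = np.random.default_rng(2)
n, p = 100, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

fixed_order = np.arange(p)                       # standard CAVI sweep 1, ..., p
data_driven = np.argsort(-np.abs(X.T @ y))       # one possible prioritized (data-driven) order
for name, order in [("fixed", fixed_order), ("prioritized", data_driven)]:
    alpha, mu = cavi_spike_slab(X, y, order)
    print(name, "selected:", np.where(alpha > 0.5)[0])
```

Because the mean-field objective for spike-and-slab priors is non-convex, different sweep orders can converge to different local optima, which is precisely the sensitivity the prioritized updating scheme is meant to mitigate.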