This paper is about the ability and means to root-n consistently and efficiently estimate linear, mean square continuous functionals of a high dimensional, approximately sparse regression. Such objects include a wide variety of interesting parameters such as the covariance between two regression residuals, a coefficient of a partially linear model, an average derivative, and the average treatment effect. We give lower bounds on the convergence rate of estimators of such objects and find that these bounds are substantially larger than in a low dimensional, semiparametric setting. We also give automatic debiased machine learners that are $1/\sqrt{n}$ consistent and asymptotically efficient under minimal conditions. These estimators use no cross-fitting or a special kind of cross-fitting to attain efficiency with faster than $n^{-1/4}$ convergence of the regression. This rate condition is substantially weaker than the product of convergence rates of two functions being faster than $1/\sqrt{n}$, as required for many other debiased machine learners.
We analyze the combination of multiple predictive distributions for time series data when all forecasts are misspecified. We show that a specific dynamic form of Bayesian predictive synthesis -- a general and coherent Bayesian framework for ensemble methods -- produces exact minimax predictive densities with regard to Kullback-Leibler loss, providing theoretical support for finite sample predictive performance over existing ensemble methods. A simulation study that highlights this theoretical result is presented, showing that dynamic Bayesian predictive synthesis is superior to other ensemble methods using multiple metrics.
Many popular methods for building confidence intervals on causal effects under high-dimensional confounding require strong ultra-sparsity assumptions that may be difficult to validate in practice. To alleviate this difficulty, we study a new method for average treatment effect estimation that yields asymptotically exact confidence intervals assuming that either the conditional response surface or the conditional probability of treatment allows for an ultra-sparse representation (but not necessarily both). This guarantee allows us to provide valid inference for the average treatment effect in high dimensions under considerably more generality than available baselines. In addition, we show that our results are semi-parametrically efficient.
We consider the problem of recovering clustered sparse signals with no prior knowledge of the sparsity pattern. Beyond simple sparsity, signals of interest often exhibit an underlying sparsity pattern which, if leveraged, can improve the reconstruction performance. However, the sparsity pattern is usually unknown a priori. Inspired by the k-nearest neighbor (k-NN) algorithm, we propose an efficient algorithm termed approximate message passing with nearest neighbor sparsity pattern learning (AMP-NNSPL), which learns the sparsity pattern adaptively. AMP-NNSPL specifies a flexible spike and slab prior on the unknown signal and, after each AMP iteration, sets the sparse ratios as the average of the nearest neighbor estimates via expectation maximization (EM). Experimental results on both synthetic and real data demonstrate the superiority of our proposed algorithm in terms of both reconstruction performance and computational complexity.
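The nearest neighbor averaging step described above can be illustrated with a minimal sketch. It assumes a one-dimensional signal, two neighbors per entry (left and right), and that the quantity being averaged is each neighbor's posterior support probability; the abstract fixes none of these details. The function name `nn_sparsity_update` is hypothetical and the AMP iteration itself is not shown.

```python
import numpy as np

def nn_sparsity_update(post_support):
    """Hypothetical helper: set each entry's sparse ratio to the mean of its
    left and right neighbors' posterior support probabilities (1-D case).
    Edge entries use their single available neighbor."""
    pi = np.asarray(post_support, dtype=float)
    n = pi.size
    if n < 2:
        # No neighbors to average over; return the input unchanged.
        return pi.copy()
    # Zero-pad so the left/right neighbor sums can be formed by slicing.
    padded = np.pad(pi, 1, mode="constant", constant_values=0.0)
    neighbor_sum = padded[:-2] + padded[2:]
    # Interior entries have two neighbors, the two edge entries have one.
    counts = np.full(n, 2.0)
    counts[0] = counts[-1] = 1.0
    return neighbor_sum / counts
```

Under these assumptions, an entry surrounded by neighbors that are likely nonzero receives a high sparse ratio even if its own current estimate is small, which is how clustered structure would be encouraged without knowing the pattern in advance.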
Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory identifies equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings and focuses on prediction. In particular, we study optimally weighted model-averaged as well as composite $l_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization, without the assumption of perfect selection, as is often used in practice. Such weights are then optimal for prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. A real-data application demonstrates the methods' practical use through the reconstruction of compressed audio signals.
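As a generic illustration of choosing weights by minimizing a mean squared error criterion, the sketch below assumes the criterion can be written as a quadratic form $w^\top \Sigma w$ with weights constrained to sum to one. Both the quadratic form and the constraint are assumptions made for illustration and are not taken from the abstract; `sigma` and `min_mse_weights` are hypothetical names. Under these assumptions the minimizer has the closed form $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$.

```python
import numpy as np

def min_mse_weights(sigma):
    """Hypothetical helper: minimize w' sigma w subject to sum(w) == 1,
    for a symmetric positive definite matrix sigma (an assumed stand-in
    for an asymptotic MSE matrix across candidate estimators)."""
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones(sigma.shape[0])
    # Solve sigma @ x = 1 rather than forming the inverse explicitly.
    solved = np.linalg.solve(sigma, ones)
    # Normalize so the weights sum to one.
    return solved / (ones @ solved)
```

With a diagonal `sigma`, the weights are proportional to the inverse of each diagonal entry, so the estimator with the smaller error receives the larger weight.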
We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when this distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample, or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, or known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.