Two-stage least squares (TSLS) estimators and variants thereof are widely used to infer the effect of an exposure on an outcome using instrumental variables (IVs). They belong to a wider class of two-stage IV estimators, which are based on fitting a conditional mean model for the exposure, and then using the fitted exposure values along with the covariates as predictors in a linear model for the outcome. We show that standard TSLS estimators enjoy greater robustness to model misspecification than more general two-stage estimators. However, by potentially using a wrong exposure model, e.g. when the exposure is binary, they tend to be inefficient. In view of this, we study double-robust G-estimators instead. These use working models for the exposure, IV and outcome but only require correct specification of either the IV model or the outcome model to guarantee consistent estimation of the exposure effect. As the finite sample performance of the locally efficient G-estimator can be poor, we further develop G-estimation procedures with improved efficiency and robustness properties under misspecification of some or all working models. Simulation studies and a data analysis demonstrate drastic improvements, with remarkably good performance even when one or more working models are misspecified.
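The two-stage recipe described above can be sketched in a few lines. This is a minimal illustration only, assuming a single instrument, a linear first stage, and a simulated data-generating process chosen here for demonstration; the function names `tsls` and `simulate` and all coefficients are hypothetical and do not come from the paper. It does not implement the G-estimators the abstract studies.

```python
import numpy as np


def tsls(y, x, z, c):
    """Two-stage least squares with a linear first stage.

    y: outcome (n,), x: exposure (n,), z: instrument (n,), c: covariates (n, p).
    Returns the coefficient on the fitted exposure in the second stage.
    """
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: regress the exposure on the instrument and covariates.
    w1 = np.hstack([ones, z.reshape(-1, 1), c])
    gamma, *_ = np.linalg.lstsq(w1, x, rcond=None)
    x_hat = w1 @ gamma
    # Stage 2: regress the outcome on the fitted exposure and covariates.
    w2 = np.hstack([ones, x_hat.reshape(-1, 1), c])
    beta, *_ = np.linalg.lstsq(w2, y, rcond=None)
    return beta[1]


def simulate(n=20000, effect=1.5, seed=0):
    """Hypothetical data with an unmeasured confounder u (assumption)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=(n, 1))
    u = rng.normal(size=n)  # unmeasured confounder of exposure and outcome
    z = rng.binomial(1, 0.5, size=n).astype(float)
    x = 0.8 * z + 0.5 * c[:, 0] + u + rng.normal(size=n)
    y = effect * x + 0.3 * c[:, 0] + 2.0 * u + rng.normal(size=n)
    return y, x, z, c


y, x, z, c = simulate()
tsls_estimate = tsls(y, x, z, c)

# Naive OLS of the outcome on the exposure and covariates, for comparison.
w = np.hstack([np.ones((len(y), 1)), x.reshape(-1, 1), c])
ols_estimate = np.linalg.lstsq(w, y, rcond=None)[0][1]

print(f"TSLS: {tsls_estimate:.3f}  OLS: {ols_estimate:.3f}")
```

In this simulated setting the OLS coefficient absorbs the confounding through `u`, whereas the TSLS coefficient should land near the simulated effect. Standard errors read directly off the second-stage regression are not valid and are deliberately omitted from the sketch.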
We present a general framework for using existing data to estimate the efficiency gain from using a covariate-adjusted estimator of a marginal treatment effect in a future randomized trial. We describe conditions under which it is possible to define a mapping from the distribution that generated the existing external data to the relative efficiency of a covariate-adjusted estimator compared to an unadjusted estimator. Under certain conditions, these relative efficiencies approximate the ratio of the sample sizes needed to achieve a desired power. We consider two situations where the outcome is either fully or partially observed and several treatment effect estimands that are of particular interest in most trials. For each such estimand, we develop a semiparametrically efficient estimator of the relative efficiency that allows for the application of flexible statistical learning tools to estimate the nuisance functions and an analytic form of a corresponding Wald-type confidence interval. We also propose a double bootstrap scheme for constructing confidence intervals. We demonstrate the performance of the proposed methods through simulation studies and apply these methods to data to estimate the relative efficiency of using covariate adjustment in Covid-19 therapeutic trials.
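To make the idea of mapping external data to a relative efficiency concrete, here is a deliberately simplified sketch. It assumes a fully observed continuous outcome, 1:1 randomization, linear adjustment, and no treatment effect heterogeneity, in which case the variance ratio of the adjusted to the unadjusted estimator is approximately one minus the R-squared of the outcome on the covariates. It is not the semiparametrically efficient estimator described above, and the function name `plug_in_relative_efficiency` and the simulated data are hypothetical.

```python
import numpy as np


def plug_in_relative_efficiency(y, w):
    """Plug-in estimate of var(adjusted) / var(unadjusted) as 1 - R^2.

    y: outcome from the external data (n,), w: baseline covariates (n, p).
    Valid only under the simplifying assumptions stated in the lead-in.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), w])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Residual variance over total variance equals 1 - R^2.
    return resid.var() / y.var()


# Hypothetical external data: two prognostic covariates, unit noise variance.
rng = np.random.default_rng(3)
n = 5000
w = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * w[:, 0] - 1.0 * w[:, 1] + rng.normal(size=n)

re_hat = plug_in_relative_efficiency(y, w)
print(f"estimated relative efficiency: {re_hat:.3f}")
```

Under these assumptions, a value of `re_hat` near 0.17 would suggest that an adjusted analysis needs roughly 17% of the sample size of an unadjusted one for the same power; real trials will typically show far smaller gains than this simulated example.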
Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naive hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
This paper considers the instrumental variable quantile regression model (Chernozhukov and Hansen, 2005, 2013) with a binary endogenous treatment. It offers two identification results when the treatment status is not directly observed. The first result is that, remarkably, the reduced-form quantile regression of the outcome variable on the instrumental variable provides a lower bound on the structural quantile treatment effect under the stochastic monotonicity condition (Small and Tan, 2007; DiNardo and Lee, 2011). This result is relevant not only when the treatment variable is subject to misclassification, but also when no measurement of the treatment variable is available. The second result is for the structural quantile function when the treatment status is measured with error; I obtain the sharp identified set by deriving moment conditions under widely used assumptions on the measurement error. Furthermore, I propose an inference method in the presence of other covariates.
Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases. This allows one to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the natural covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an $\ell_1$ penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an $\ell_2$ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets.
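The squared-error case can be illustrated with a short sketch. It assumes the augmented objective $\frac{1}{2}\sum_i (y_i - x_i^\top\beta - \gamma_i)^2 + \lambda\sum_i |\gamma_i|$, where $\gamma_i$ is the case-specific parameter; this scaling, the alternating scheme, and the names `soft_threshold` and `case_specific_l1_fit` are assumptions made here for illustration, not the paper's stated algorithm. For fixed $\beta$ the optimal $\gamma_i$ is the soft-thresholded residual, so the effective loss in $\beta$ is of Huber type.

```python
import numpy as np


def soft_threshold(r, lam):
    """Minimizer over g of 0.5 * (r - g)**2 + lam * |g|, elementwise."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)


def case_specific_l1_fit(X, y, lam, n_iter=200):
    """Alternate least squares in beta with soft-thresholding in gamma."""
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        # Beta step: ordinary least squares on the outlier-corrected response.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Gamma step: each case-specific shift absorbs the large residuals.
        gamma = soft_threshold(y - X @ beta, lam)
    return beta, gamma


# Hypothetical data: a clean linear trend plus ten gross outliers.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:10] += 30.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_robust, gamma = case_specific_l1_fit(X, y, lam=1.0)

print("OLS intercept:   ", round(float(beta_ols[0]), 3))
print("robust intercept:", round(float(beta_robust[0]), 3))
print("cases flagged (gamma != 0):", int(np.count_nonzero(gamma)))
```

The beta step reuses an unmodified least-squares solver, which is in the spirit of the computational strategy mentioned above of adapting existing software; the fixed iteration count stands in for a proper convergence check.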
Instrumental variable methods can identify causal effects even when the treatment and outcome are confounded. We study the problem of imperfect measurements of the binary instrumental variable, treatment or outcome. We first consider non-differential measurement errors, that is, the mis-measured variable does not depend on other variables given its true value. We show that the measurement error of the instrumental variable does not bias the estimate, the measurement error of the treatment biases the estimate away from zero, and the measurement error of the outcome biases the estimate toward zero. Moreover, we derive sharp bounds on the causal effects without additional assumptions. These bounds are informative because they exclude zero. We then consider differential measurement errors, and focus on sensitivity analyses in those settings.
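The three bias directions under non-differential error can be checked numerically with a small simulation. This is an illustrative sketch only: the compliance proportions, the outcome model, the symmetric 10% flip probability, and the helper names `wald` and `misclassify` are assumptions chosen here, and the estimator shown is the simple Wald ratio rather than the bounds or sensitivity analyses developed in the paper.

```python
import numpy as np


def wald(y, d, z):
    """Wald ratio: effect of z on y divided by effect of z on d."""
    num = y[z == 1].mean() - y[z == 0].mean()
    den = d[z == 1].mean() - d[z == 0].mean()
    return num / den


def misclassify(v, flip_prob, rng):
    """Flip a binary variable with a probability that ignores everything else."""
    flip = rng.random(v.shape[0]) < flip_prob
    return np.where(flip, 1 - v, v)


rng = np.random.default_rng(2)
n = 400_000
z = rng.binomial(1, 0.5, size=n)
# Hypothetical strata: 0 = complier, 1 = always-taker, 2 = never-taker.
stratum = rng.choice(3, size=n, p=[0.6, 0.2, 0.2])
d = np.where(stratum == 0, z, np.where(stratum == 1, 1, 0))
# Binary outcome: treatment raises the success probability by 0.3;
# always-takers differ at baseline, which confounds a naive comparison.
p_y = 0.2 + 0.3 * d + 0.1 * (stratum == 1)
y = rng.binomial(1, p_y)

wald_clean = wald(y, d, z)
wald_z_err = wald(y, d, misclassify(z, 0.1, rng))  # mismeasured instrument
wald_d_err = wald(y, misclassify(d, 0.1, rng), z)  # mismeasured treatment
wald_y_err = wald(misclassify(y, 0.1, rng), d, z)  # mismeasured outcome

for name, est in [("clean", wald_clean), ("Z error", wald_z_err),
                  ("D error", wald_d_err), ("Y error", wald_y_err)]:
    print(f"{name:8s} {est:.3f}")
```

In this setup, misclassifying the instrument rescales numerator and denominator by the same factor and leaves the ratio near the simulated effect of 0.3, misclassifying the treatment shrinks only the denominator, and misclassifying the outcome shrinks only the numerator.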