Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naive hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
Covariate-specific treatment effects (CSTEs) represent heterogeneous treatment effects across subpopulations defined by certain selected covariates. In this article, we consider marginal structural models where CSTEs are linearly represented using a set of basis functions of the selected covariates. We develop a new approach in high-dimensional settings to obtain not only doubly robust point estimators of CSTEs, but also model-assisted confidence intervals, which are valid when a propensity score model is correctly specified but an outcome regression model may be misspecified. With a linear outcome model and subpopulations defined by discrete covariates, both point estimators and confidence intervals are doubly robust for CSTEs. In contrast, confidence intervals from existing high-dimensional methods are valid only when both the propensity score and outcome models are correctly specified. We establish asymptotic properties of the proposed point estimators and the associated confidence intervals. We present simulation studies and empirical applications which demonstrate the advantages of the proposed method compared with competing ones.
Two-stage least squares (TSLS) estimators and variants thereof are widely used to infer the effect of an exposure on an outcome using instrumental variables (IVs). They belong to a wider class of two-stage IV estimators, which are based on fitting a conditional mean model for the exposure, and then using the fitted exposure values along with the covariates as predictors in a linear model for the outcome. We show that standard TSLS estimators enjoy greater robustness to model misspecification than more general two-stage estimators. However, by potentially using a wrong exposure model, e.g. when the exposure is binary, they tend to be inefficient. In view of this, we study double-robust G-estimators instead. These use working models for the exposure, IV and outcome but only require correct specification of either the IV model or the outcome model to guarantee consistent estimation of the exposure effect. As the finite sample performance of the locally efficient G-estimator can be poor, we further develop G-estimation procedures with improved efficiency and robustness properties under misspecification of some or all working models. Simulation studies and a data analysis demonstrate drastic improvements, with remarkably good performance even when one or more working models are misspecified.
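As a rough illustration of the basic two-stage construction described above (not the double-robust G-estimators the abstract goes on to study), the following sketch fits a linear model for the exposure and then regresses the outcome on the fitted exposure and the covariates. The function names, the simulated data-generating model and the true effect of 2.0 are all assumptions made for this example.

```python
import numpy as np


def two_stage_least_squares(y, a, z, x):
    """Basic TSLS estimate of the effect of exposure `a` on outcome `y`.

    z: instrument(s), x: measured covariates (both without an intercept column).
    """
    n = len(y)
    ones = np.ones((n, 1))
    z = np.asarray(z, dtype=float).reshape(n, -1)
    x = np.asarray(x, dtype=float).reshape(n, -1)

    # Stage 1: linear conditional mean model for the exposure.
    w1 = np.hstack([ones, z, x])
    gamma, *_ = np.linalg.lstsq(w1, a, rcond=None)
    a_hat = w1 @ gamma

    # Stage 2: linear outcome model using the fitted exposure and covariates.
    w2 = np.hstack([ones, a_hat[:, None], x])
    beta, *_ = np.linalg.lstsq(w2, y, rcond=None)
    return beta[1]  # coefficient on the fitted exposure


def naive_ols(y, a, x):
    """Ordinary least squares of y on (a, x); biased under unmeasured confounding."""
    n = len(y)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    w = np.hstack([np.ones((n, 1)), a[:, None], x])
    beta, *_ = np.linalg.lstsq(w, y, rcond=None)
    return beta[1]


def simulate_iv_data(n=20000, seed=0):
    """Hypothetical data: u is an unmeasured confounder, true exposure effect is 2.0."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)  # instrument
    x = rng.normal(size=n)  # measured covariate
    u = rng.normal(size=n)  # unmeasured confounder
    a = z + 0.5 * x + u + rng.normal(size=n)
    y = 2.0 * a + x + u + rng.normal(size=n)
    return y, a, z, x


if __name__ == "__main__":
    y, a, z, x = simulate_iv_data()
    print("TSLS estimate:", two_stage_least_squares(y, a, z, x))
    print("Naive OLS estimate:", naive_ols(y, a, x))
```

In this simulated setting the TSLS estimate should lie close to the assumed true effect, while the naive regression is pulled away from it by the unmeasured confounder.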
In this paper, we propose a propensity score adapted variable selection procedure to select covariates for inclusion in propensity score models, in order to eliminate confounding bias and improve statistical efficiency in observational studies. Our variable selection approach is specifically designed for causal inference: it requires only that the propensity scores be $\sqrt{n}$-consistently estimated through a parametric model, and it does not require correct specification of potential outcome models. By using estimated propensity scores as inverse probability of treatment weights in an adaptive lasso on the outcome, it successfully excludes instrumental variables while including confounders and outcome predictors. We establish its oracle properties under linear association conditions. We also perform numerical simulations to illustrate our propensity score adapted covariate selection procedure and to evaluate its performance under model misspecification. We further compare it with other covariate selection methods on artificial data and find that it is more powerful in excluding instrumental variables and spurious covariates.
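The general idea of weighting an adaptive lasso by estimated inverse propensity scores can be sketched as follows. This is a minimal illustration under assumptions of our own, not the authors' exact procedure: the propensity model is a logistic regression fitted by Newton's method, the treatment indicator enters the outcome regression unpenalised, the penalty level `lam` is fixed rather than tuned, and the simulated covariates (a confounder, an outcome predictor, an instrument and a noise variable) are hypothetical.

```python
import numpy as np


def fit_propensity(X, a, n_iter=25):
    """Propensity scores from a logistic regression fitted by Newton's method."""
    Xd = np.column_stack([np.ones(len(a)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd
        beta += np.linalg.solve(hess, Xd.T @ (a - p))
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def ipw_adaptive_lasso(X, a, y, lam=0.05, gamma=1.0, n_sweeps=200):
    """Adaptive lasso on the outcome, weighted by inverse propensity scores.

    Returns (treatment coefficient, covariate coefficients).
    """
    e = fit_propensity(X, a)
    w = a / e + (1.0 - a) / (1.0 - e)  # inverse probability of treatment weights
    w = w / w.mean()                   # normalise so the weights average to one

    # Design matrix: treatment first (unpenalised, an assumption), then covariates.
    D = np.column_stack([a, X])
    Dc = D - np.average(D, axis=0, weights=w)  # weighted centring absorbs the intercept
    yc = y - np.average(y, weights=w)

    # Initial weighted least squares fit supplies the adaptive penalty weights.
    sw = np.sqrt(w)
    b_init, *_ = np.linalg.lstsq(Dc * sw[:, None], yc * sw, rcond=None)
    pen = lam / np.abs(b_init) ** gamma
    pen[0] = 0.0  # do not penalise the treatment coefficient

    # Coordinate descent with soft-thresholding for the weighted lasso.
    b = b_init.copy()
    n = len(y)
    for _ in range(n_sweeps):
        for j in range(D.shape[1]):
            r = yc - Dc @ b + Dc[:, j] * b[j]  # partial residual excluding column j
            rho = np.sum(w * Dc[:, j] * r) / n
            zj = np.sum(w * Dc[:, j] ** 2) / n
            b[j] = np.sign(rho) * max(abs(rho) - pen[j], 0.0) / zj
    return b[0], b[1:]


def simulate_selection_data(n=4000, seed=0):
    """Hypothetical data; columns of X: confounder, outcome predictor, instrument, noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    logit = 0.5 * X[:, 0] + 0.5 * X[:, 2]
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)
    y = 1.0 * a + X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, a, y


if __name__ == "__main__":
    effect, coefs = ipw_adaptive_lasso(*simulate_selection_data())
    print("treatment coefficient:", effect)
    print("covariate coefficients:", coefs)
```

In this toy setting the coefficients of the instrument and the noise variable are typically shrunk exactly to zero, while the confounder and the outcome predictor are retained. In practice `lam` would be chosen by a data-driven criterion rather than fixed.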
Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this paper, we consider variable selection from a new perspective motivated by the frequently occurring phenomenon that relevant variables are not completely distinguishable from noise variables on the solution path. We propose to characterize the positions of the first noise variable and the last relevant variable on the path. We then develop a new variable selection procedure that controls over-selection of the noise variables ranked after the last relevant variable and, at the same time, retains a high proportion of the relevant variables ranked before the first noise variable. Our procedure utilizes the recently developed covariance test statistic and Q statistic in post-selection inference. In numerical examples, our method compares favorably with existing methods in selection accuracy and in the interpretability of its results.
With the availability of high-dimensional genetic biomarkers, it is of interest to identify heterogeneous effects of these predictors on patients' survival, along with proper statistical inference. Censored quantile regression has emerged as a powerful tool for detecting heterogeneous effects of covariates on survival outcomes. To our knowledge, there is little work available on inference for the effects of high-dimensional predictors in censored quantile regression. This paper proposes a novel procedure to draw inference on all predictors within the framework of global censored quantile regression, which investigates covariate-response associations over an interval of quantile levels instead of at a few discrete values. The proposed estimator combines a sequence of low-dimensional model estimates that are based on multi-sample splittings and variable selection. We show that, under some regularity conditions, the estimator is consistent and asymptotically follows a Gaussian process indexed by the quantile level. Simulation studies indicate that our procedure can properly quantify the uncertainty of the estimates in high-dimensional settings. We apply our method to analyze the heterogeneous effects of SNPs residing in lung cancer pathways on patients' survival, using the Boston Lung Cancer Survival Cohort, a cancer epidemiology study on the molecular mechanism of lung cancer.