In medical research, continuous markers are widely employed in diagnostic tests to distinguish diseased from non-diseased subjects. The accuracy of such diagnostic tests is commonly assessed using the receiver operating characteristic (ROC) curve. The Youden index is widely used to summarize an ROC curve and determine its optimal cut-point. In the literature, estimation of the Youden index has been widely studied via various statistical modeling strategies for the conditional density. This paper proposes a new model-free estimation method, which directly estimates the covariate-adjusted cut-point without estimating the conditional density. Consequently, the covariate-adjusted Youden index can be estimated based on the estimated cut-point. The proposed method formulates the estimation problem in a large-margin classification framework, which allows flexible modeling of the covariate-adjusted Youden index through kernel machines. The advantage of the proposed method is demonstrated in a variety of simulated experiments as well as a real application to the Pima Indians diabetes study.
Datasets from field experiments with covariate-adaptive randomizations (CARs) usually contain extra baseline covariates in addition to the strata indicators. We propose to incorporate these extra covariates via auxiliary regressions in the estimation and inference of unconditional quantile treatment effects (QTEs) under CARs. We establish the consistency and limiting distribution of the regression-adjusted QTE estimator, as well as the validity of the multiplier bootstrap. The auxiliary regression may be estimated parametrically, nonparametrically, or via regularization when the data are high-dimensional. Even when the auxiliary regression is misspecified, the proposed bootstrap inferential procedure still achieves the nominal rejection probability in the limit under the null. When the auxiliary regression is correctly specified, the regression-adjusted estimator achieves the minimum asymptotic variance. We also derive the optimal pseudo true values for the potentially misspecified parametric model that minimize the asymptotic variance of the corresponding QTE estimator. We demonstrate the finite sample performance of the new estimation and inferential methods using simulations and provide an empirical application to a well-known dataset in education.
The Youden index is a popular summary statistic for the receiver operating characteristic curve. It gives the optimal cutoff point of a biomarker for distinguishing diseased from healthy individuals. In this paper, we propose to model the distributions of a biomarker for individuals in the healthy and diseased groups via a semiparametric density ratio model. Based on this model, we use the maximum empirical likelihood method to estimate the Youden index and the optimal cutoff point. We further establish the asymptotic normality of the proposed estimators and construct valid confidence intervals for the Youden index and the corresponding optimal cutoff point. The proposed method automatically covers both the case with no lower limit of detection (LLOD) and the case with a fixed and finite LLOD for the biomarker. Extensive simulation studies and a real data example are used to illustrate the effectiveness of the proposed method and its advantages over the existing methods.
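For orientation, the Youden index at a cutoff c is sensitivity(c) + specificity(c) - 1, and the optimal cutoff is the value of c that maximizes it. The sketch below computes the simple empirical (plug-in) version of this quantity; it is not the semiparametric density ratio estimator proposed in the abstract. The function name `empirical_youden` and the convention that larger biomarker values indicate disease are assumptions made for illustration.

```python
import numpy as np


def empirical_youden(healthy, diseased):
    """Empirical Youden index and cutoff (illustrative plug-in estimate).

    Assumption: larger biomarker values indicate disease, so a subject
    is classified as diseased when the biomarker exceeds the cutoff.
    """
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.asarray(diseased, dtype=float)

    # Candidate cutoffs: every distinct observed biomarker value.
    cutoffs = np.unique(np.concatenate([healthy, diseased]))

    # Specificity: share of healthy subjects at or below the cutoff.
    spec = np.array([(healthy <= c).mean() for c in cutoffs])
    # Sensitivity: share of diseased subjects above the cutoff.
    sens = np.array([(diseased > c).mean() for c in cutoffs])

    j = sens + spec - 1.0
    best = int(np.argmax(j))  # first maximizer if there are ties
    return float(j[best]), float(cutoffs[best])
```

With perfectly separated groups, for example `empirical_youden([1, 2], [3, 4])`, the index equals 1 and the cutoff is the largest healthy value. When several cutoffs tie, this sketch returns the smallest one, which is an arbitrary choice.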
Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naive hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
Two-stage least squares (TSLS) estimators and variants thereof are widely used to infer the effect of an exposure on an outcome using instrumental variables (IVs). They belong to a wider class of two-stage IV estimators, which are based on fitting a conditional mean model for the exposure, and then using the fitted exposure values along with the covariates as predictors in a linear model for the outcome. We show that standard TSLS estimators enjoy greater robustness to model misspecification than more general two-stage estimators. However, by potentially using a wrong exposure model, e.g. when the exposure is binary, they tend to be inefficient. In view of this, we study double-robust G-estimators instead. These use working models for the exposure, IV and outcome but only require correct specification of either the IV model or the outcome model to guarantee consistent estimation of the exposure effect. As the finite sample performance of the locally efficient G-estimator can be poor, we further develop G-estimation procedures with improved efficiency and robustness properties under misspecification of some or all working models. Simulation studies and a data analysis demonstrate drastic improvements, with remarkably good performance even when one or more working models are misspecified.
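The two-stage construction described above can be sketched as follows, assuming a single exposure, a linear first-stage model, and ordinary least squares at both stages. The function name `two_stage_least_squares` and its arguments are hypothetical, and the sketch returns only the point estimate; it does not implement the G-estimators studied in the abstract, and it omits standard errors, which cannot be read off the second-stage regression directly.

```python
import numpy as np


def two_stage_least_squares(y, x, z, covariates=None):
    """Point estimate of the exposure effect via textbook TSLS (sketch).

    y: outcome, x: exposure, z: instrument(s), covariates: optional
    baseline covariates included in both stages.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.shape[0]

    # Intercept plus any covariates, shared by both stages.
    ones = np.ones((n, 1))
    c = ones if covariates is None else np.column_stack([ones, covariates])

    # Stage 1: linear conditional mean model for the exposure.
    first_design = np.column_stack([c, z])
    gamma, *_ = np.linalg.lstsq(first_design, x, rcond=None)
    x_hat = first_design @ gamma

    # Stage 2: regress the outcome on fitted exposure and covariates.
    second_design = np.column_stack([x_hat, c])
    beta, *_ = np.linalg.lstsq(second_design, y, rcond=None)

    # The first coefficient multiplies the fitted exposure.
    return float(beta[0])
```

On simulated data with an unmeasured confounder of the exposure and outcome, this estimate should land near the true effect while a direct regression of the outcome on the exposure is biased.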
Robust estimation and variable selection procedures are developed for the extended t-process regression model with functional data. Statistical properties such as consistency of the estimators and predictions are obtained. Numerical studies show that the proposed method performs well.