136 - Yifan Cui, Min-ge Xie 2021
This paper introduces readers to the new concept and methodology of confidence distribution and to modern-day distributional inference in statistics. This discussion should be of interest to people who would like to go into the depth of statistical inference methodology and to utilize distribution estimators in practice. We also include in the discussion the topic of generalized fiducial inference, a special type of modern distributional inference, and relate it to the concept of confidence distribution. Several real data examples are also provided for practitioners. We hope that the selected content covers the greater part of the developments on this subject.
Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function's local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package.
148 - Yifan Cui, Hongming Pu, Xu Shi 2020
Skepticism about the assumption of no unmeasured confounding, also known as exchangeability, is often warranted in making causal inferences from observational data, because exchangeability hinges on an investigator's ability to accurately measure covariates that capture all potential sources of confounding. In practice, the most one can hope for is that covariate measurements are at best proxies of the true underlying confounding mechanism operating in a given observational study. In this paper, we consider the framework of proximal causal inference introduced by Tchetgen Tchetgen et al. (2020), which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. We make a number of contributions to proximal inference including (i) an alternative set of conditions for nonparametric proximal identification of the average treatment effect; (ii) general semiparametric theory for proximal estimation of the average treatment effect including efficiency bounds for key semiparametric models of interest; (iii) a characterization of proximal doubly robust and locally efficient estimators of the average treatment effect. Moreover, we provide analogous identification and efficiency results for the average treatment effect on the treated. Our approach is illustrated via simulation studies and a data application on evaluating the effectiveness of right heart catheterization in the intensive care unit of critically ill patients.
Unmeasured confounding is a threat to causal inference and individualized decision making. Similar to Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a), we consider the problem of identification of optimal individualized treatment regimes with a valid instrumental variable. Han (2020a) provided an alternative identifying condition of optimal treatment regimes using the conditional Wald estimand of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020) when treatment assignment is subject to endogeneity and a valid binary instrumental variable is available. In this note, we provide a necessary and sufficient condition for identification of optimal treatment regimes using the conditional Wald estimand. Our novel condition is necessarily implied by those of Cui and Tchetgen Tchetgen (2020); Qiu et al. (2020); Han (2020a) and may continue to hold in a variety of potential settings not covered by prior results.
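For orientation, the conditional Wald estimand for a binary instrument is commonly written as below. The symbols $Z$ (binary instrument), $A$ (binary treatment), $Y$ (outcome), and $L$ (measured covariates) are notation assumed here for illustration and are not taken from the abstract.

$$\delta(L) = \frac{E[Y \mid Z=1, L] - E[Y \mid Z=0, L]}{E[A \mid Z=1, L] - E[A \mid Z=0, L]}$$

Under identifying conditions of the kind the note discusses, a regime based on this quantity would assign treatment when $\delta(L) > 0$; whether that regime is optimal depends on those conditions holding.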
A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators' ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely, if ever, be learned with certainty from measured covariates. One can therefore only ever hope that covariate measurements are at best proxies of true underlying confounding mechanisms operating in an observational study, thus invalidating causal claims made on the basis of standard exchangeability conditions. Causal learning from proxies is a challenging inverse problem which has to date remained unresolved. In this paper, we introduce a formal potential outcome framework for proximal causal learning, which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. Sufficient conditions for nonparametric identification are given, leading to the proximal g-formula and corresponding proximal g-computation algorithm for estimation. These may be viewed as generalizations of Robins' foundational g-formula and g-computation algorithm, which account explicitly for bias due to unmeasured confounding. Both point treatment and time-varying treatment settings are considered, and an application of proximal g-computation of causal effects is given for illustration.
93 - Yifan Cui, Jan Hannig 2020
Fiducial inference, as generalized by Hannig et al. (2016), is applied to nonparametric g-modeling (Efron, 2016) in the discrete case. We propose a computationally efficient algorithm to sample from the fiducial distribution, and use generated samples to construct point estimates and confidence intervals. We study the theoretical properties of the fiducial distribution and perform extensive simulations in various scenarios. The proposed approach gives rise to surprisingly good statistical performance in terms of the mean squared error of point estimators and coverage of confidence intervals. Furthermore, we apply the proposed fiducial method to estimate the probability of each satellite site being malignant using gastric adenocarcinoma data with 844 patients (Efron, 2016).
Robins (1997) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. In his work, identification of MSM parameters is established under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time. We consider sufficient conditions for identification of the parameters of a subclass, Marginal Structural Mean Models (MSMMs), when sequential randomization fails to hold due to unmeasured confounding, using instead a time-varying instrumental variable. Our identification conditions require that no unobserved confounder predicts compliance type for the time-varying treatment. We describe a simple weighted estimator and examine its finite-sample properties in a simulation study. We apply the proposed estimator to examine the effect of delivery hospital on neonatal survival probability.
Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects. In our experiments, we find our approach to perform well relative to a number of baselines.
There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish identification of both the value function $E[Y_{\mathcal{D}(L)}]$ for a given regime $\mathcal{D}$ and optimal regimes $\text{argmax}_{\mathcal{D}} E[Y_{\mathcal{D}(L)}]$ with the aid of a binary instrumental variable, when no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. Furthermore, we propose to identify and estimate optimal treatment regimes among those who would comply with the assigned treatment under a standard monotonicity assumption. In this latter case, we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation.
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce two new selection criteria for bias reduction in estimating the functional of interest, each based on a novel definition of pseudo-risk for the functional that embodies the double robustness property and thus is used to select the pair of learners that is nearest to fulfilling this property. We establish an oracle property for a multi-fold cross-validation version of the new selection criteria which states that our empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. We also describe a smooth approximation to the selection criteria which allows for valid post-selection inference. Finally, we apply the approach to model selection of a semiparametric estimator of average treatment effect given an ensemble of candidate machine learners to account for confounding in an observational study.