We study the problem of estimating a functional or a parameter when the outcome is subject to nonignorable missingness. We completely avoid modeling the regression relation, while allowing the propensity to be modeled by a semiparametric logistic relation in which the dependence on covariates is unspecified. We discover a surprising phenomenon: both the estimation of the parameter in the propensity model and the functional estimation can be carried out without assessing the missingness dependence on covariates. This allows us to propose a general class of estimators for both model parameter estimation and functional estimation, including estimating the outcome mean. The robustness of the estimators is nonstandard; it is established rigorously through theoretical derivations and is supported by simulations and a data application.
This work is motivated by learning the individualized minimal clinically important difference, a vital concept for assessing clinical importance in various biomedical studies. We formulate the scientific question as a high-dimensional statistical problem where the parameter of interest lies in an individualized linear threshold. The goal of this paper is to develop a hypothesis testing procedure for the significance of a single element of this high-dimensional parameter as well as for the significance of a linear combination of this parameter. The difficulty in developing such a testing procedure is due to the high dimensionality of the nuisance component, and also stems from the fact that this high-dimensional threshold model is nonregular and the limiting distribution of the corresponding estimator is nonstandard. To deal with these challenges, we construct a test statistic via a new bias-corrected smoothed decorrelated score approach, and establish its asymptotic distributions under both the null and local alternative hypotheses. In addition, we propose a double-smoothing approach to select the optimal bandwidth parameter in our test statistic and provide theoretical guarantees for the selected bandwidth. We conduct comprehensive simulation studies to demonstrate how our proposed procedure can be applied in empirical studies. Finally, we apply the proposed method to a clinical trial where the scientific goal is to assess the clinical importance of a surgical procedure.
There are many scenarios, such as electronic health records, where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure in the high-dimensional setting. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation and inference of the regression parameters in linear models, especially in light of the fact that such linear models may be misspecified in data analysis. In particular, we address the following two important questions. (1) Can we use the labeled data as well as the unlabeled data to construct a semi-supervised estimator whose convergence rate is faster than that of the supervised estimators? (2) Can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or powerful than those based on the supervised estimators? To address the first question, we establish the minimax lower bound for parameter estimation in the semi-supervised setting. We show that the upper bound from the supervised estimators that only use the labeled data cannot attain this lower bound. We close this gap by proposing a new semi-supervised estimator which attains the lower bound. To address the second question, based on our proposed semi-supervised estimator, we propose two additional estimators for semi-supervised inference, the efficient estimator and the safe estimator. The former is fully efficient if the unknown conditional mean function is estimated consistently, but may not be more efficient than the supervised approach otherwise. The latter usually does not aim to provide fully efficient inference, but is guaranteed to be no worse than the supervised approach, regardless of whether the linear model is correctly specified or the conditional mean function is consistently estimated.
Jiwei Zhao, Yanyuan Ma (2019)
We consider the estimation problem in a regression setting where the outcome variable is subject to nonignorable missingness and identifiability is ensured by the shadow variable approach. We propose a versatile estimation procedure in which modeling of the missingness mechanism is completely bypassed. We show that our estimator is easy to implement, and we derive the asymptotic theory of the proposed estimator. We also investigate some alternative estimators under different scenarios. Comprehensive simulation studies are conducted to demonstrate the finite-sample performance of the method. We apply the estimator to a children's mental health study to illustrate its usefulness.
The regularization approach to variable selection has been well developed for completely observed data sets over the past two decades. In the presence of missing values, this approach needs to be tailored to different missing data mechanisms. In this paper, we focus on a flexible and generally applicable missing data mechanism, which contains both ignorable and nonignorable missing data mechanism assumptions. We show how the regularization approach for variable selection can be adapted to the situation under this missing data mechanism. The computational and theoretical properties for variable selection consistency are established. The proposed method is further illustrated by comprehensive simulation studies and real data analyses, for both low- and high-dimensional settings.