ترغب بنشر مسار تعليمي؟ اضغط هنا

A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

67   0   0.0 ( 0 )
 نشر من قبل Nima Hejazi
 تاريخ النشر 2017
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confounders while avoiding model misspecification remains only partially addressed. Efficient estimators constructed from data adaptive estimates of the data-generating distribution provide an avenue for avoiding model misspecification; however, in the context of high-dimensional problems requiring simultaneous estimation of numerous parameters, standard variance estimators have proven unstable, resulting in unreliable Type-I error control under standard multiple testing corrections. We present the formulation of a general approach for applying empirical Bayes shrinkage approaches to asymptotically linear estimators of parameters defined in the nonparametric model. The proposal applies existing shrinkage estimators to the estimated variance of the influence function, allowing for increased inferential stability in high-dimensional settings. A methodology for nonparametric variable importance analysis for use with high-dimensional biological datasets with modest sample sizes is introduced and the proposed technique is demonstrated to be robust in small samples even when relying on data adaptive estimators that eschew parametric forms. Use of the proposed variance moderation strategy in constructing stabilized variable importance measures of biomarkers is demonstrated by application to an observational study of occupational exposure. The result is a data adaptive approach for robustly uncovering stable associations in high-dimensional data with limited sample sizes.



قيم البحث

اقرأ أيضاً

We propose an estimation methodology for a semiparametric quantile factor panel model. We provide tools for inference that are robust to the existence of moments and to the form of weak cross-sectional dependence in the idiosyncratic error term. We apply our method to daily stock return data.
175 - Qi Zheng , Limin Peng , Xuming He 2015
Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high-dimensional covariates primarily focuses on ex amination of model sparsity at a single or multiple quantile levels, which are typically pre-specified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high-dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantiles levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal
We develop new semiparametric methods for estimating treatment effects. We focus on a setting where the outcome distributions may be thick tailed, where treatment effects are small, where sample sizes are large and where assignment is completely rand om. This setting is of particular interest in recent experimentation in tech companies. We propose using parametric models for the treatment effects, as opposed to parametric models for the full outcome distributions. This leads to semiparametric models for the outcome distributions. We derive the semiparametric efficiency bound for this setting, and propose efficient estimators. In the case with a constant treatment effect one of the proposed estimators has an interesting interpretation as a weighted average of quantile treatment effects, with the weights proportional to (minus) the second derivative of the log of the density of the potential outcomes. Our analysis also results in an extension of Hubers model and trimmed mean to include asymmetry and a simplified condition on linear combinations of order statistics, which may be of independent interest.
As a competitive alternative to least squares regression, quantile regression is popular in analyzing heterogenous data. For quantile regression model specified for one single quantile level $tau$, major difficulties of semiparametric efficient estim ation are the unavailability of a parametric efficient score and the conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in $(0,1)$ respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimation for the regression coefficients of the quantile regression models assumed for multiple quantile levels, which has several advantages: it could be regarded as an optimal way to pool information across multiple/other quantiles for efficiency gain; it is computationally feasible and easy to implement, as the initial estimator is easily available; due to the nature of quantile regression models under investigation, the conditional density estimation is straightforward by plugging in an initial estimator. The resulting estimator is proved to achieve the corresponding semiparametric efficiency lower bound under regularity conditions. Numerical studies including simulations and an example of birth weight of children confirms that the proposed estimator leads to higher efficiency compared with the Koenker-Bassett quantile regression estimator for all quantiles of interest.
89 - BaoLuo Sun , Lan Liu , Wang Miao 2016
Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects which satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression-based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا