ترغب بنشر مسار تعليمي؟ اضغط هنا

A Random Forest Approach for Modeling Bounded Outcomes

71   0   0.0 ( 0 )
 نشر من قبل Leonie Weinhold
 تاريخ النشر 2019
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the unit interval, however, classical random forest approaches may severely suffer as they do not account for the heteroscedasticity in the data. A random forest approach is proposed for relating beta distributed outcomes to explanatory variables. The approach explicitly makes use of the likelihood function of the beta distribution for the selection of splits during the tree-building procedure. In each iteration of the tree-building algorithm one chooses the combination of explanatory variable and splitting rule that maximizes the log-likelihood function of the beta distribution with the parameter estimates derived from the nodes of the currently built tree. Several simulation studies demonstrate the properties of the method and compare its performance to classical random forest approaches as well as to parametric regression models.



قيم البحث

اقرأ أيضاً

This paper extends Bayesian mortality projection models for multiple populations considering the stochastic structure and the effect of spatial autocorrelation among the observations. We explain high levels of overdispersion according to adjacent loc ations based on the conditional autoregressive model. In an empirical study, we compare different hierarchical projection models for the analysis of geographical diversity in mortality between the Japanese counties in multiple years, according to age. By a Markov chain Monte Carlo (MCMC) computation, results have demonstrated the flexibility and predictive performance of our proposed model.
In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterning in their longitudinal health trajectories. For such data, growth mixture models (GMMs) enable classifying patients into different latent classes base d on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by the special missing data problem in EHRs, which manifests two patient-led missing data processes: the visit process and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that links GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome given a visit using a discrete latent class variable. Our focus is on multiple longitudinal health outcomes for which there can be a clinically prescribed visit schedule. We demonstrate our model in EHR measurements on early childhood weight and height z-scores. Using data simulations, we illustrate the statistical properties of our method with respect to subgroup-specific or marginal inferences. We built the R package EHRMiss for model fitting, selection, and checking.
We develop Bayesian models for density regression with emphasis on discrete outcomes. The problem of density regression is approached by considering methods for multivariate density estimation of mixed scale variables, and obtaining conditional densi ties from the multivariate ones. The approach to multivariate mixed scale outcome density estimation that we describe represents discrete variables, either responses or covariates, as discretis
Practical problems with missing data are common, and statistical methods have been developed concerning the validity and/or efficiency of statistical procedures. On a central focus, there have been longstanding interests on the mechanism governing da ta missingness, and correctly deciding the appropriate mechanism is crucially relevant for conducting proper practical investigations. The conventional notions include the three common potential classes -- missing completely at random, missing at random, and missing not at random. In this paper, we present a new hypothesis testing approach for deciding between missing at random and missing not at random. Since the potential alternatives of missing at random are broad, we focus our investigation on a general class of models with instrumental variables for data missing not at random. Our setting is broadly applicable, thanks to that the model concerning the missing data is nonparametric, requiring no explicit model specification for the data missingness. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties significantly differ only when missing at random does not hold. We show that our new hypothesis testing approach achieves an objective data oriented choice between missing at random or not. We demonstrate the feasibility, validity, and efficacy of the new test by theoretical analysis, simulation studies, and a real data analysis.
In relational event networks, the tendency for actors to interact with each other depends greatly on the past interactions between the actors in a social network. Both the quantity of past interactions and the time that elapsed since the past interac tions occurred affect the actors decision-making to interact with other actors in the network. Recently occurred events generally have a stronger influence on current interaction behavior than past events that occurred a long time ago--a phenomenon known as memory decay. Previous studies either predefined a short-run and long-run memory or fixed a parametric exponential memory using a predefined half-life period. In real-life relational event networks however it is generally unknown how the memory of actors about the past events fades as time goes by. For this reason it is not recommendable to fix this in an ad hoc manner, but instead we should learn the shape of memory decay from the observed data. In this paper, a novel semi-parametric approach based on Bayesian Model Averaging is proposed for learning the shape of the memory decay without requiring any parametric assumptions. The method is applied to relational event history data among socio-political actors in India.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا