
Beta and Kumaraswamy distributions as non-nested hypotheses in the modeling of continuous bounded data

Published by Wagner Barreto-Souza
Publication date 2014
Research field Mathematical statistics
Research language English





Nowadays, the beta and Kumaraswamy distributions are the most popular models for fitting continuous bounded data. These models share several characteristics, so choosing between them in a practical situation can be of great interest. With this in mind, in this paper we propose a method for selecting between the beta and Kumaraswamy distributions. We use the logarithm of the likelihood ratio statistic (denoted by $T_n$, where $n$ is the sample size) and obtain its asymptotic distribution under the hypotheses $H_{\mathcal B}$ and $H_{\mathcal K}$, where $H_{\mathcal B}$ ($H_{\mathcal K}$) denotes that the data come from the beta (Kumaraswamy) distribution. Since both models have the same number of parameters, the Akaike criterion selects the model with the greater maximized log-likelihood value. We propose using the probability of correct selection (given by $P(T_n>0)$ or $P(T_n<0)$, depending on the null hypothesis) instead of only comparing the maximized log-likelihood values. We obtain an approximation for the probability of correct selection under the hypotheses $H_{\mathcal B}$ and $H_{\mathcal K}$ and select the model that maximizes it. A simulation study is presented to evaluate the accuracy of the approximated probabilities of correct selection. We illustrate our selection method in two applications to real data sets involving proportions.
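To make the statistic $T_n$ concrete, here is a minimal sketch, not the authors' implementation. It assumes $T_n$ is the maximized beta log-likelihood minus the maximized Kumaraswamy log-likelihood, so that $T_n>0$ favors the beta model. All function names are hypothetical, the maximum likelihood fits use a small hand-written Nelder-Mead search, and the probability that the beta model is selected is estimated by plain Monte Carlo simulation, which stands in for the asymptotic approximation derived in the paper.

```python
import math
import random


def beta_loglik(params, xs):
    """Log-likelihood of Beta(a, b) for observations xs in (0, 1)."""
    a, b = params
    const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in xs)


def kuma_loglik(params, xs):
    """Log-likelihood of Kumaraswamy(a, b): f(x) = a b x^(a-1) (1 - x^a)^(b-1)."""
    a, b = params
    const = math.log(a) + math.log(b)
    return sum(const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-(x ** a)) for x in xs)


def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Small derivative-free minimizer (illustrative, not production grade)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[j] + t * (worst[j] - centroid[j]) for j in range(n)]

        xr = along(-1.0)
        fr = f(xr)
        if fr < values[0]:  # reflection is the new best: try expanding
            xe = along(-2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:  # reflection beats the second worst: accept it
            simplex[-1], values[-1] = xr, fr
        else:  # contract, and shrink toward the best vertex if that fails
            xc = along(0.5) if fr >= values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                best = simplex[0]
                simplex = [best] + [[(best[j] + p[j]) / 2 for j in range(n)] for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    return simplex[0], values[0]


def fit_mle(loglik, xs):
    """Maximize loglik over (a, b) > 0 by searching over (log a, log b)."""
    def objective(theta):
        try:
            return -loglik((math.exp(theta[0]), math.exp(theta[1])), xs)
        except (ValueError, OverflowError):
            return math.inf  # treat numerically invalid parameters as infeasible
    theta, value = nelder_mead(objective, [0.0, 0.0])
    return (math.exp(theta[0]), math.exp(theta[1])), -value


def log_likelihood_ratio(xs):
    """T_n = maximized beta log-likelihood minus maximized Kumaraswamy log-likelihood."""
    beta_hat, ll_beta = fit_mle(beta_loglik, xs)
    kuma_hat, ll_kuma = fit_mle(kuma_loglik, xs)
    t_n = ll_beta - ll_kuma
    return {"beta_hat": beta_hat, "kuma_hat": kuma_hat, "ll_beta": ll_beta,
            "ll_kuma": ll_kuma, "T_n": t_n,
            "choice": "beta" if t_n > 0 else "kumaraswamy"}


def mc_prob_beta_selected(a, b, n, reps=200, seed=0):
    """Monte Carlo estimate of P(T_n > 0) when the data truly come from Beta(a, b)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        xs = [rng.betavariate(a, b) for _ in range(n)]
        wins += log_likelihood_ratio(xs)["T_n"] > 0
    return wins / reps


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.betavariate(2.0, 5.0) for _ in range(200)]  # synthetic data, for illustration only
    print(log_likelihood_ratio(data))
    print(mc_prob_beta_selected(2.0, 5.0, n=50, reps=100))
```

Because both models have two parameters, the sign of `T_n` reproduces the Akaike-based choice. The simulated probability indicates how often that choice would be correct for a given sample size, under the assumed generating parameters.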


Read also

Randomization (a.k.a. permutation) inference is typically interpreted as testing Fisher's sharp null hypothesis that all effects are exactly zero. This hypothesis is often criticized as uninteresting and implausible. We show, however, that many randomization tests are also valid for a bounded null hypothesis under which effects are all negative (or positive) for all units but otherwise heterogeneous. The bounded null is closely related to important concepts such as monotonicity and Pareto efficiency. Inverting tests of this hypothesis yields confidence intervals for the maximum (or minimum) individual treatment effect. We then extend randomization tests to infer other quantiles of individual effects, which can be used to infer the proportion of units with effects larger (or smaller) than any threshold. The proposed confidence intervals for all quantiles of individual effects are simultaneously valid, in the sense that no correction due to multiple analyses is needed. In sum, we provide a broader justification for Fisher randomization tests, and develop exact nonparametric inference for quantiles of heterogeneous individual effects. We illustrate our methods with simulations and applications, where we find that Stephenson rank statistics often provide the most informative results.
Zhichao Jiang, Peng Ding (2016)
Models based on multivariate t distributions are widely applied to analyze data with heavy tails. However, all the marginal distributions of the multivariate t distributions are restricted to have the same degrees of freedom, making these models unable to describe different marginal heavy-tailedness. We generalize the traditional multivariate t distributions to non-elliptically contoured multivariate t distributions, allowing for different marginal degrees of freedom. We apply the non-elliptically contoured multivariate t distributions to three widely used models: the Heckman selection model with different degrees of freedom for selection and outcome equations, the multivariate Robit model with different degrees of freedom for marginal responses, and the linear mixed-effects model with different degrees of freedom for random effects and within-subject errors. Based on the Normal mixture representation of our t distribution, we propose efficient Bayesian inferential procedures for the model parameters based on data augmentation and parameter expansion. We show via simulation studies and real examples that the conclusions are sensitive to the existence of different marginal heavy-tailedness.
Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the unit interval, however, classical random forest approaches may severely suffer as they do not account for the heteroscedasticity in the data. A random forest approach is proposed for relating beta distributed outcomes to explanatory variables. The approach explicitly makes use of the likelihood function of the beta distribution for the selection of splits during the tree-building procedure. In each iteration of the tree-building algorithm one chooses the combination of explanatory variable and splitting rule that maximizes the log-likelihood function of the beta distribution with the parameter estimates derived from the nodes of the currently built tree. Several simulation studies demonstrate the properties of the method and compare its performance to classical random forest approaches as well as to parametric regression models.
Statistical models of real-world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic database systems. Yet, the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only recently, we set out to develop the mathematical theory of infinite probabilistic databases. The present paper is an exposition of two recent papers which are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we propose a very general framework for probabilistic databases, possibly involving continuous probability distributions, and show that queries have a well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner; PODS 2020) we extend the declarative probabilistic programming language Generative Datalog, proposed by (Barany et al. 2017) for discrete probability distributions, to continuous probability distributions and show that such programs yield generative models of continuous probabilistic databases.
Z. Bai, D. Jiang, J. Yao (2012)
For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $p\le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations are proposed in the literature for Wilks' $\Lambda$, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null, and simulations demonstrate that the corrected LRT has very satisfactory size and power, surely in the large $p$ and large $n$ context, but also for moderately large data dimensions like $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data.