
Box-Cox symmetric distributions and applications to nutritional data

Published by Giovana Fumes
Publication date: 2016
Research field: Mathematical statistics
Language: English





We introduce the Box-Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed, data. The new class includes the Box-Cox t, Box-Cox Cole-Green, and Box-Cox power exponential distributions, as well as the class of log-symmetric distributions, as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes. Additionally, it provides enough flexibility to handle outliers. The usefulness of the Box-Cox symmetric models is illustrated in applications to nutritional data.
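The abstract does not state the defining transformation, so the following is only a minimal sketch. It assumes the Box-Cox-type map commonly used for the Box-Cox t and Box-Cox Cole-Green families, z = ((y/mu)^lambda - 1) / (sigma * lambda), with the logarithmic limit at lambda = 0. The function names and the parameter names `mu`, `sigma`, and `lam` are illustrative assumptions, not taken from the paper.

```python
import math

def box_cox_z(y, mu, sigma, lam):
    """Map a positive observation y to the z-scale (assumed Box-Cox-type form)."""
    if y <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("y, mu and sigma must be positive")
    ratio = y / mu
    if lam == 0:
        # Logarithmic limit of the power transform as lam -> 0.
        return math.log(ratio) / sigma
    return (ratio ** lam - 1.0) / (sigma * lam)

def box_cox_inverse(z, mu, sigma, lam):
    """Map z back to the original positive scale."""
    if lam == 0:
        return mu * math.exp(sigma * z)
    base = 1.0 + sigma * lam * z
    if base <= 0:
        # For lam != 0 the z-scale is truncated, so not every z is attainable.
        raise ValueError("z lies outside the range implied by (sigma, lam)")
    return mu * base ** (1.0 / lam)

# y = mu always maps to z = 0, which is why mu acts as a scale (location-like) parameter.
z_at_mu = box_cox_z(2.0, 2.0, 0.5, 0.3)
```

Under this assumed form, lam = 0 corresponds to modeling log(y) with a symmetric distribution, which is consistent with the log-symmetric class being a special case.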




Read also

We propose and study the class of Box-Cox elliptical distributions. It provides alternative distributions for modeling multivariate positive, marginally skewed and possibly heavy-tailed data. This new class of distributions has as a special case the class of log-elliptical distributions, and reduces to the Box-Cox symmetric class of distributions in the univariate setting. The parameters are interpretable in terms of quantiles and relative dispersions of the marginal distributions and of associations between pairs of variables. The relation between the scale parameters and quantiles makes the Box-Cox elliptical distributions attractive for regression modeling purposes. Applications to data on vitamin intake are presented and discussed.
Tractable generalizations of the Gaussian distribution play an important role in the analysis of high-dimensional data. One very general super-class of Normal distributions is the class of $\nu$-spherical distributions, whose random variables can be represented as the product $x = r\cdot u$ of a uniformly distributed random variable $u$ on the $1$-level set of a positively homogeneous function $\nu$ and an arbitrary positive radial random variable $r$. Prominent subclasses of $\nu$-spherical distributions are spherically symmetric distributions ($\nu(x)=\|x\|_2$), which have been further generalized to the class of $L_p$-spherically symmetric distributions ($\nu(x)=\|x\|_p$). Both of these classes contain the Gaussian as a special case. In general, however, $\nu$-spherical distributions are computationally intractable since, for instance, the normalization constant or fast sampling algorithms are unknown for an arbitrary $\nu$. In this paper we introduce a new subclass of $\nu$-spherical distributions by choosing $\nu$ to be a nested cascade of $L_p$-norms. This class is still computationally tractable, but includes all the aforementioned subclasses as special cases. We derive a general expression for $L_p$-nested symmetric distributions as well as the uniform distribution on the $L_p$-nested unit sphere, including an explicit expression for the normalization constant. We state several general properties of $L_p$-nested symmetric distributions, investigate their marginals and maximum likelihood fitting, and discuss their tight links to well-known machine learning methods such as Independent Component Analysis (ICA), Independent Subspace Analysis (ISA) and mixed norm regularizers. Finally, we derive a fast and exact sampling algorithm for arbitrary $L_p$-nested symmetric distributions, and introduce the Nested Radial Factorization algorithm (NRF), which is a form of non-linear ICA.
Wenpo Yao, Wenli Yao (2021)
We compare the two basic ordinal patterns, i.e., the original and amplitude permutations, used to characterize vector structures. The original permutation consists of the indexes of reorganized values in the original vector. By contrast, the amplitude permutation comprises the positions of values in the reordered vector, and it directly reflects the temporal structure. To accurately convey the structural characteristics of vectors, we modify indexes of equal values in permutations to be the same as, for example, the smallest or largest indexes in each group of equalities. Overall, we clarify the relationship between the original and amplitude permutations. The results have implications for time- and amplitude-symmetric vectors and will lead to further theoretical and experimental studies.
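As a rough illustration of the two patterns described in this abstract, the sketch below computes both permutations for a short vector. It reflects one reading of the abstract only: 1-based indexes, ascending order, and ties resolved by copying the smallest index in each group of equal values. The function names and the `tie_equal` flag are hypothetical.

```python
def original_permutation(x, tie_equal=False):
    """Indexes of the original vector, listed in ascending order of value."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # stable sort
    perm = [i + 1 for i in order]
    if tie_equal:
        # Give equal values the smallest index of their group.
        for k in range(1, len(order)):
            if x[order[k]] == x[order[k - 1]]:
                perm[k] = perm[k - 1]
    return perm

def amplitude_permutation(x, tie_equal=False):
    """Position of each element in the reordered (sorted) vector."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for pos, i in enumerate(order):
        if tie_equal and pos > 0 and x[i] == x[order[pos - 1]]:
            # Give equal values the smallest position of their group.
            ranks[i] = ranks[order[pos - 1]]
        else:
            ranks[i] = pos + 1
    return ranks

x = [0.9, 0.1, 0.4]
print(original_permutation(x))   # [2, 3, 1]
print(amplitude_permutation(x))  # [3, 1, 2]
```

Without ties the two lists are inverse permutations of each other, which is one way to see the relationship the abstract refers to.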
Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow-up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the most commonly employed models. We describe a new approach for survival analysis regression models, based on learning mixtures of Cox regressions to model individual survival distributions. We propose an approximation to the Expectation Maximization algorithm for this model that does hard assignments to mixture groups to make optimization efficient. In each group assignment, we fit the hazard ratios within each group using deep neural networks, and the baseline hazard for each mixture component non-parametrically. We perform experiments on multiple real-world datasets, and look at the mortality rates of patients across ethnicity and gender. We emphasize the importance of calibration in healthcare settings and demonstrate that our approach outperforms classical and modern survival analysis baselines, both in terms of discriminative performance and calibration, with large gains in performance on the minority demographics.
Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
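To make the notion of spurious correlation concrete, here is a small simulation sketch for the simplest case only, $s = 1$: the largest absolute sample correlation between a response and each of $p$ covariates that are independent of it by construction. It does not implement the multiplier bootstrap or the general-$s$ results of the abstract; the sample size, the number of covariates, the seed, and all function names are arbitrary illustrative choices.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def max_spurious_correlation(y, columns):
    """Largest absolute correlation between y and any single covariate (s = 1)."""
    return max(abs(pearson(y, col)) for col in columns)

rng = random.Random(0)  # fixed seed so the run is reproducible
n, p = 50, 200
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Covariates are drawn independently of y, so any correlation found is spurious.
X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

few = max_spurious_correlation(y, X[:5])
many = max_spurious_correlation(y, X)
print(f"max |corr| over 5 covariates:   {few:.3f}")
print(f"max |corr| over 200 covariates: {many:.3f}")
```

Because the first five covariates are a subset of the 200, the second maximum can never be smaller than the first, which shows how searching over more predictors inflates the best observed correlation even when no true association exists.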
