
On modelling asymmetric data using two-piece sinh-arcsinh distributions

Published by Francisco Javier Rubio
Publication date: 2013
Research field: Mathematical Statistics
Language: English





We introduce the univariate two-piece sinh-arcsinh distribution, which contains two shape parameters that separately control skewness and kurtosis. We show that this new model can capture higher levels of asymmetry than the original sinh-arcsinh distribution (Jones and Pewsey, 2009), in terms of some asymmetry measures, while keeping flexibility of the tails and tractability. We illustrate the performance of the proposed model with real data, and compare it to appropriate alternatives. Although we focus on the study of the univariat
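To make the construction in the abstract concrete, here is a minimal Python sketch of a two-piece density built from the symmetric sinh-arcsinh density of Jones and Pewsey (2009). It assumes the common two-piece recipe of using a different scale on each side of the mode; the function names and the parameterization (`mu`, `sigma1`, `sigma2`, `delta`) are illustrative assumptions and may not match the paper's own notation.

```python
import math


def sas_symmetric_pdf(z, delta):
    """Symmetric sinh-arcsinh density (skewness parameter fixed at zero).

    delta > 0 controls tail weight; delta = 1 gives the standard normal.
    """
    t = delta * math.asinh(z)
    s = math.sinh(t)
    c = math.cosh(t)
    phi = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    return delta * c / math.sqrt(1.0 + z * z) * phi


def two_piece_sas_pdf(x, mu, sigma1, sigma2, delta):
    """Two-piece density: scale sigma1 left of mu, sigma2 right of mu.

    Assumed parameterization (hypothetical, for illustration only).
    The factor 2 / (sigma1 + sigma2) makes the density integrate to one
    and keeps it continuous at the mode mu.
    """
    sigma = sigma1 if x < mu else sigma2
    return 2.0 / (sigma1 + sigma2) * sas_symmetric_pdf((x - mu) / sigma, delta)
```

In this sketch the ratio of `sigma1` to `sigma2` governs skewness (the probability mass to the left of `mu` is `sigma1 / (sigma1 + sigma2)`), while `delta` governs the tails on both sides, which is one way the two shape roles can be kept separate.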




Read also

Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of the 3D organization is known as topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions playing important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally inspires application of community detection methods. However, one of the drawbacks of community detection is that most methods take exchangeability of the nodes in the network for granted; whereas the nodes in this case, i.e. the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes into account this non-exchangeability. In addition, our model explicitly makes use of cell-type specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate the domains identified have desirable epigenetic features and compare them across different cell types.
In this technical note, we address an unresolved challenge in neuroimaging statistics: how to determine which of several datasets is the best for inferring neuronal responses. Comparisons of this kind are important for experimenters when choosing an imaging protocol, and for developers of new acquisition methods. However, the hypothesis that one dataset is better than another cannot be tested using conventional statistics (based on likelihood ratios), as these require the data to be the same under each hypothesis. Here we present Bayesian data comparison, a principled framework for evaluating the quality of functional imaging data, in terms of the precision with which neuronal connectivity parameters can be estimated and competing models can be disambiguated. For each of several candidate datasets, neuronal responses are inferred using Dynamic Causal Modelling (DCM), a commonly used Bayesian procedure for modelling neuroimaging data. Next, the parameters from subject-specific models are summarised at the group level using a Bayesian General Linear Model (GLM). A series of measures, which we introduce here, are then used to evaluate each dataset in terms of the precision of (group-level) parameter estimates and the ability of the data to distinguish similar models. To exemplify the approach, we compared four datasets that were acquired in a study evaluating multiband fMRI acquisition schemes. To enable people to reproduce these analyses using their own data and experimental paradigms, we provide general-purpose Matlab code via the SPM software.
Spatio-temporal modelling of tree defoliation data from the German forest condition survey is presented. In the present study generalized additive mixed models were used to estimate the spatio-temporal trends of defoliation of the main tree species from 1989 to 2015 and to examine the suitability of different monitoring grid resolutions. Although data have been collected since 1989, this is the first time that spatio-temporal modelling has been carried out for the whole of Germany. Besides the space-time component, stand age showed a significant effect on defoliation. The mean age and the species-specific relation between defoliation and age determined the general level of defoliation whereas fluctuations of defoliation were primarily related to weather conditions. The study indicates a strong association between drought stress and defoliation of all four main tree species. Besides direct effects of weather conditions, indirect effects seem to play a further role. Defoliation of the comparably drought-tolerant species pine and oak was primarily affected by insect infestations following drought whereas considerable time for regeneration was required by beech following drought stress and recurring substantial fructification. South-eastern Germany has emerged as the region with the highest defoliation since the drought year 2003. This region was characterized by the strongest water deficits in 2003 compared to the long-term reference period. The present study gives evidence that the focus has moved from air pollution to climate change. Furthermore, the spatio-temporal model was used to carry out a simulation study to compare different survey grid resolutions. This grid examination indicated that an 8 x 8 km grid instead of the standard 16 x 16 km grid is necessary for spatio-temporal trend estimation and for detecting hot-spots in defoliation in space and time, especially regarding oak.
Because of its mathematical tractability, the Gaussian mixture model holds a special place in the literature for clustering and classification. For all its benefits, however, the Gaussian mixture model poses problems when the data is skewed or contains outliers. Because of this, methods have been developed over the years for handling skewed data, and fall into two general categories. The first is to consider a mixture of more flexible skewed distributions, and the second is based on incorporating a transformation to near normality. Although these methods have been compared in their respective papers, there has yet to be a detailed comparison to determine when one method might be more suitable than the other. Herein, we provide a detailed comparison on many benchmarking datasets, as well as describe a novel method to assess cluster separation.
We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by a football application, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which estimates simultaneously all the parameters of the joint models. We investigate the proposals' empirical performance in two simulation studies, the first one designed for arbitrary count data, the other one reflecting football-specific settings. Finally, the method is applied to FIFA World Cup data, showing its competitiveness to the standard approach with regard to predictive performance.