The joint modeling of mean and dispersion (JMMD) provides an efficient method to obtain useful models for the mean and dispersion, especially in problems of robust design experiments. However, in the literature on JMMD there are few works dedicated to variable selection, and this theme is still a challenge. In this article, we propose a procedure for selecting variables in JMMD based on hypothesis testing and the quality of the model's fit. A goodness-of-fit criterion is used, in each iteration of the selection process, as a filter for choosing the terms that will be evaluated by a hypothesis test. Three types of criteria were considered for checking the quality of the model fit in our variable selection procedure: the extended Akaike information criterion, the corrected Akaike information criterion, and a criterion specific to the JMMD, proposed by us, which is a type of extended adjusted coefficient of determination. Simulation studies were carried out to verify the efficiency of our variable selection procedure. In all situations considered, the proposed procedure proved to be effective and quite satisfactory. The variable selection process was also applied to a real example from an industrial experiment.
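As a rough illustration of how an information criterion can act as a filter when comparing candidate mean models, here is a minimal Python sketch with simulated data using the corrected Akaike information criterion; this is not the authors' JMMD procedure, and all names and values are hypothetical:

```python
import math
import random

# Illustrative sketch (not the authors' JMMD procedure): using the
# corrected Akaike information criterion (AICc) to decide whether a
# candidate term improves a simple mean model. Data are simulated.
random.seed(2)
n = 60
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.5 * xi + random.gauss(0, 1) for xi in x]

def aicc(rss, n, k):
    # Gaussian log-likelihood up to a constant; k counts estimated
    # parameters including the error variance.
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Intercept-only model versus a model including the candidate term x.
ybar = sum(y) / n
rss0 = sum((yi - ybar) ** 2 for yi in y)

xbar = sum(x) / n
beta = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / sum((xi - xbar) ** 2 for xi in x))
rss1 = sum((yi - ybar - beta * (xi - xbar)) ** 2 for xi, yi in zip(x, y))

print(f"AICc, intercept only: {aicc(rss0, n, 2):.1f}")
print(f"AICc, with term x:    {aicc(rss1, n, 3):.1f}")
```

A term whose inclusion lowers the criterion passes the filter and would then be examined by the hypothesis test.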
In a network meta-analysis, some of the collected studies may deviate markedly from the others, for example by having very unusual effect sizes. These deviating studies can be regarded as outlying with respect to the rest of the network and can be influential on the pooled results. Thus, it could be inappropriate to synthesize those studies without further investigation. In this paper, we propose two Bayesian methods to detect outliers in a network meta-analysis via (a) a mean-shifted outlier model and (b) posterior predictive p-values constructed from ad hoc discrepancy measures. The former method uses Bayes factors to formally test each study for being an outlier, while the latter provides a score of outlyingness for each study in the network, which allows one to quantify numerically the uncertainty associated with being an outlier. Furthermore, we present a simple method, based on informative priors, as part of the network meta-analysis model to down-weight the detected outliers. We conduct extensive simulations to evaluate the effectiveness of the proposed methodology while comparing it to some alternative, available outlier diagnostic tools. Two real networks of interventions are then used to demonstrate our methods in practice.
Missing data are a common problem that has consistently plagued statisticians and applied analytical researchers. While replacement methods like mean-based or hot deck imputation have been well researched, emerging imputation techniques enabled by improved computational resources have had limited formal assessment. This study formally considers five more recently developed imputation methods (Amelia, Mice, mi, Hmisc, and missForest) and compares their performance, using RMSE against actual values, with the well-established mean-based replacement approach. The RMSE measure was consolidated by method using a ranking approach. Our results indicate that the missForest algorithm performed best and the mi algorithm performed worst.
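The intuition behind such a comparison can be sketched in a toy setting: when a covariate predicts the missing values, a model-based imputer (in the spirit of missForest or Mice) should beat mean imputation on RMSE against the true values. Below is a minimal Python analogue with simulated data, using simple regression as a stand-in for the model-based methods; it is illustrative only and not the study's benchmark:

```python
import random

# Toy comparison (illustrative only): y depends linearly on x, 20% of
# y is masked, and we compare mean imputation with a regression-based
# imputer, scoring both by RMSE against the true masked values.
random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 0.5) for xi in x]

missing = set(random.sample(range(n), n // 5))
observed = [i for i in range(n) if i not in missing]

def rmse(pred):
    return (sum((pred[i] - y[i]) ** 2 for i in missing) / len(missing)) ** 0.5

# Mean imputation: every missing value gets the observed mean.
my = sum(y[i] for i in observed) / len(observed)
rmse_mean = rmse({i: my for i in missing})

# Regression imputation: fit y ~ x on the observed pairs.
mx = sum(x[i] for i in observed) / len(observed)
beta = (sum((x[i] - mx) * (y[i] - my) for i in observed)
        / sum((x[i] - mx) ** 2 for i in observed))
rmse_reg = rmse({i: my + beta * (x[i] - mx) for i in missing})

print(f"RMSE, mean imputation:       {rmse_mean:.3f}")
print(f"RMSE, regression imputation: {rmse_reg:.3f}")
```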
We introduce and illustrate through numerical examples the R package SIHR, which handles statistical inference for (1) linear and quadratic functionals in high-dimensional linear regression and (2) linear functionals in high-dimensional logistic regression. The focus of the proposed algorithms is on point estimation, confidence interval construction, and hypothesis testing. The inference methods are extended to multiple regression models. We include real data applications to demonstrate the package's performance and practicality.
Mine Dogucu, Jingchen Hu (2021)
With the advances in tools and the rise in popularity, Bayesian statistics is becoming more important for undergraduates. In this study, we surveyed whether an undergraduate Bayesian course is offered in our sample of 152 high-ranking research universities and liberal arts colleges. For each identified Bayesian course, we examined how it fits into the institution's undergraduate curricula, such as majors and prerequisites. Through a series of course syllabi analyses, we explored the topics covered and their popularity in these courses, as well as the adopted teaching and learning tools, such as software. This paper presents our findings on the current practices of Bayesian education at the undergraduate level. Based on our findings, we provide recommendations for programs that may consider offering Bayesian education to their students.
Pete Philipson (2021)
Assessing the relative merits of sportsmen and women whose careers took place far apart in time via a suitable statistical model is a complex task: any comparison is compromised by fundamental changes to the sport and society, and often handicapped by the popularity of inappropriate traditional metrics. In this work we focus on cricket and the ranking of Test match bowlers, using bowling data from the first Test in 1877 onwards. A truncated, mean-parameterised Conway-Maxwell-Poisson model is developed to handle the under- and overdispersed nature of the data, which are in the form of small counts, and to extract the innate ability of individual bowlers. Inferences are made using a Bayesian approach, deploying a Markov chain Monte Carlo algorithm to obtain parameter estimates and credible intervals. The model offers a good fit and indicates that the commonly used bowling average is a flawed measure.
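A minimal sketch of why the Conway-Maxwell-Poisson family can capture both under- and overdispersion, using an untruncated, rate-parameterised toy version in Python (not the truncated, mean-parameterised model of the paper; lam, nu, and the cutoff kmax are assumed illustrative values):

```python
import math

# Illustrative, untruncated Conway-Maxwell-Poisson pmf. Weights are
# computed in log space via lgamma to avoid overflow; kmax is an
# assumed truncation point for the numerical normalisation.
def com_poisson_pmf(lam, nu, kmax=200):
    logw = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(kmax)]
    w = [math.exp(lw) for lw in logw]
    z = sum(w)
    return [wi / z for wi in w]

def mean_var(p):
    m = sum(k * pk for k, pk in enumerate(p))
    v = sum((k - m) ** 2 * pk for k, pk in enumerate(p))
    return m, v

# nu = 1 recovers the Poisson (variance equals mean); nu > 1 gives
# underdispersion, nu < 1 overdispersion.
for nu in (0.5, 1.0, 2.0):
    m, v = mean_var(com_poisson_pmf(lam=3.0, nu=nu))
    print(f"nu = {nu}: mean = {m:.2f}, variance = {v:.2f}")
```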
Some years ago, Snapinn and Jiang [1] considered the interpretation and pitfalls of absolute versus relative treatment effect measures in analyses of time-to-event outcomes. Through specific examples and analytical considerations based solely on the exponential and the Weibull distributions, they reach two conclusions: 1) that the commonly used criteria for clinical effectiveness, the absolute risk reduction (ARR) and the median survival time difference (MD), directly contradict each other, and 2) that cost-effectiveness depends only on the hazard ratio (HR) and the shape parameter (in the Weibull case) but not on the overall baseline risk of the population. Though provocative, the first conclusion does not apply to either of the two special cases considered, or even more generally, while the second conclusion is strictly correct only in the exponential case. Therefore, the implication drawn by the authors, i.e. that all measures of absolute treatment effect are of little value compared with the relative measure of the hazard ratio, is not of general validity, and hence both absolute and relative measures should continue to be used when appraising clinical evidence.
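For the exponential case the dependence on baseline risk is easy to verify numerically: with survival S(t) = exp(-lambda*t), the ARR at a fixed horizon and the median difference MD = ln(2)*(1/lambda_1 - 1/lambda_0) both change with the baseline hazard even when the HR is held fixed. A short Python sketch with assumed illustrative hazard values:

```python
import math

# Two exponential survival scenarios with the same hazard ratio
# (HR = 0.5, an assumed illustrative value) but different baseline
# hazards: ARR at a 5-unit time horizon and the median difference
# both move with baseline risk, while the HR stays fixed.
def summarize(lam0, hr, t=5.0):
    lam1 = hr * lam0
    arr = math.exp(-lam1 * t) - math.exp(-lam0 * t)  # survival gain at t
    md = math.log(2) * (1 / lam1 - 1 / lam0)         # difference in medians
    return arr, md

for lam0 in (0.1, 0.4):
    arr, md = summarize(lam0, hr=0.5)
    print(f"baseline hazard {lam0}: ARR(5) = {arr:.3f}, MD = {md:.2f}")
```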
In this paper we face the problem of representing functional data with the tools of algebraic topology. We represent functions by means of merge trees, and this representation is compared with that offered by persistence diagrams. We show that these two tree structures, although not equivalent, are both invariant under homeomorphic re-parametrizations of the functions they represent, thus allowing for a statistical analysis which is indifferent to functional misalignment. We employ a novel metric for merge trees and prove a few theoretical results related to its specific implementation when merge trees represent functions. To showcase the good properties of our topological approach to functional data analysis, we first go through a few examples using data generated in silico, employed to illustrate and compare the different representations provided by merge trees and persistence diagrams, and then we test it on the Aneurisk65 dataset, replicating, from our different perspective, the supervised classification analysis which contributed to making this dataset a benchmark for methods dealing with misaligned functional data.
Akisato Suzuki (2021)
Which type of statistical uncertainty -- Frequentist statistical (in)significance with a p-value, or a Bayesian probability -- better helps evidence-based policymaking? To investigate this, I ran a survey experiment on a sample from the population of Ireland and obtained 517 responses. The experiment asked participants to decide whether or not to introduce a new bus line as a policy to reduce traffic jams. The treatment was the type of statistical uncertainty information presented: statistical (in)significance with a p-value, or the probability that the estimate is correct. In each type, uncertainty was set either low or non-low. It turned out that participants shown the Frequentist information exhibited a much more deterministic tendency toward adopting or not adopting the policy than those shown the Bayesian information, given the actual difference between the low-uncertainty and non-low-uncertainty conditions implied by the experimental scenarios. This finding suggests that policy-relevant quantitative research should present the uncertainty of statistical estimates using the probability of associated policy effects rather than statistical (in)significance, to allow the general public and policymakers to correctly evaluate the continuous nature of statistical uncertainty.
Amanda Ng (2021)
In this paper, I propose a general procedure for multivariate distribution-free nonparametric testing, derived from the concept of ranks based upon measure transportation, in the context of multiple change point analysis. I use this algorithm to estimate both the number of change points and their locations within an observed multivariate time series. The change point problem is considered in a general setting in which both the underlying distribution and the number of change points are unknown, rather than assuming the observed time series follows a specific distribution or contains only one change point, as many works in this area do. The intention is to develop a technique that accurately identifies changes in a distribution while making as few suppositions as possible. The rank energy statistic used here is based on energy statistics and has the potential to detect any change in a distribution. I present the properties of this new algorithm, which can be applied to various problems, including hierarchical clustering, testing multivariate normality, gene selection, and microarray data analysis. The algorithm has also been implemented in the R package recp, which is available on GitHub.
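The building block of such a procedure, measuring how different two segments of a series are, can be sketched with the sample energy distance; a change-point scan then maximizes this statistic over candidate split points. Below is a minimal univariate Python illustration (not the measure-transportation rank version used in the paper; the simulated samples are hypothetical):

```python
import random

# Sample energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'|. Values near
# zero suggest a common distribution; large values suggest a
# distributional change, which a change-point scan exploits.
def energy_distance(xs, ys):
    def mean_abs(a, b):
        return sum(abs(ai - bj) for ai in a for bj in b) / (len(a) * len(b))
    return 2 * mean_abs(xs, ys) - mean_abs(xs, xs) - mean_abs(ys, ys)

random.seed(1)
stable = [random.gauss(0, 1) for _ in range(100)]
shifted = [random.gauss(3, 1) for _ in range(100)]

print(energy_distance(stable[:50], stable[50:]))  # near zero: no change
print(energy_distance(stable, shifted))           # clearly positive: shift
```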