We propose a Bayesian nonparametric model to infer population admixture, extending the Hierarchical Dirichlet Process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the model allows classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population, and identifying the population of origin of chromosomal regions. Our model does not assume any specific mutation process and can be applied to most commonly used genetic markers. We present an MCMC algorithm to perform posterior inference under the model and discuss methods to summarise the MCMC output for the analysis of population admixture. We demonstrate the performance of the proposed model in simulations and in a real application, using genetic data from the EDAR gene, which is considered ancestry-informative due to well-known variation in allele frequency as well as phenotypic effects across ancestries. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans.
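As a rough illustration of the Bayesian nonparametric idea (not the paper's full HDP model with linkage disequilibrium), the sketch below runs a collapsed Gibbs sampler for a Dirichlet process mixture of product-Bernoulli distributions, clustering individuals into ancestral groups from binary SNP data while letting the number of groups be inferred. The hyperparameters alpha, a, b, the function name gibbs_dp_cluster and the toy data are illustrative assumptions.

```python
# Minimal sketch: DP mixture of product-Bernoulli distributions, collapsed Gibbs.
# Ignores within-individual admixture and LD, unlike the paper's model.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_dp_cluster(X, alpha=1.0, a=1.0, b=1.0, n_iter=200):
    """X: (n_individuals, n_loci) binary genotype matrix; returns cluster labels."""
    n, L = X.shape
    z = np.zeros(n, dtype=int)               # start with everyone in one cluster
    counts = {0: n}                           # cluster sizes
    ones = {0: X.sum(axis=0).astype(float)}   # per-locus counts of allele "1"
    for _ in range(n_iter):
        for i in range(n):
            k = z[i]
            counts[k] -= 1
            ones[k] -= X[i]
            if counts[k] == 0:
                del counts[k], ones[k]
            labels = list(counts)
            logp = []
            for kk in labels:
                p1 = (ones[kk] + a) / (counts[kk] + a + b)   # predictive allele freq
                logp.append(np.log(counts[kk]) +
                            np.sum(X[i] * np.log(p1) + (1 - X[i]) * np.log(1 - p1)))
            p_new = a / (a + b)                               # prior mean for a new cluster
            logp.append(np.log(alpha) +
                        np.sum(X[i] * np.log(p_new) + (1 - X[i]) * np.log(1 - p_new)))
            logp = np.array(logp)
            prob = np.exp(logp - logp.max())
            prob /= prob.sum()
            choice = rng.choice(len(prob), p=prob)
            k_new = labels[choice] if choice < len(labels) else (max(counts) + 1 if counts else 0)
            z[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            ones[k_new] = ones.get(k_new, np.zeros(L)) + X[i]
    return z

# toy data: two populations with very different allele frequencies
X = np.vstack([rng.binomial(1, 0.15, size=(30, 60)),
               rng.binomial(1, 0.85, size=(30, 60))])
print(gibbs_dp_cluster(X))
```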
Many have argued that statistics students need additional facility to express statistical computations. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science, and applying these to real-world scenarios, we prepare them to think statistically. In an era of increasingly big data, it is imperative that students develop data-related capacities, beginning with the introductory course. We believe that the integration of these precursors to data science into our curricula, early and often, will help statisticians be part of the dialogue regarding Big Data and Big Questions.
A learning environment, the tutor-web (http://tutor-web.net), has been developed and used for educational research. The system is accessible and free to use for anyone with access to the Web. It is based on open-source software, and the teaching material is licensed under the Creative Commons Attribution-ShareAlike License. The system has been used for computer-assisted education in statistics and mathematics. It offers a unique way to structure and link together teaching material and includes interactive quizzes whose primary purpose is to increase learning rather than merely to evaluate. The system was used in a course on basic statistics at the University of Iceland in spring 2013. A randomized trial was conducted to investigate the difference in learning between students doing regular homework and students using the system. The difference between the groups was not found to be significant.
Long memory plays an important role in many fields by determining the behaviour and predictability of systems; for instance, in climate, hydrology, finance, networks and DNA sequencing. In particular, it is important to test whether a process exhibits long memory, since this affects the accuracy and confidence with which one may predict future events on the basis of a small amount of historical data. A major force in the development and study of long memory was the late Benoit B. Mandelbrot. Here we discuss the original motivation for the development of long memory and Mandelbrot's influence on this fascinating field. We also elucidate the sometimes contrasting approaches to long memory in different scientific communities.
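One classical long-memory diagnostic in the spirit of Mandelbrot's early work is rescaled-range (R/S) analysis, which estimates the Hurst exponent from the log-log slope of the rescaled range against block size. The sketch below is a minimal illustration; the block sizes and the synthetic AR(1) test series are assumptions, not taken from the article.

```python
# Minimal sketch: Hurst exponent by rescaled-range (R/S) analysis.
import numpy as np

def hurst_rs(x, block_sizes=(16, 32, 64, 128, 256)):
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in block_sizes:
        rs = []
        for start in range(0, len(x) - n + 1, n):
            block = x[start:start + n]
            dev = np.cumsum(block - block.mean())
            r = dev.max() - dev.min()            # range of cumulative deviations
            s = block.std(ddof=1)
            if s > 0:
                rs.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs)))
    slope, _ = np.polyfit(log_n, log_rs, 1)      # slope estimates the Hurst exponent
    return slope

rng = np.random.default_rng(1)
# short-memory AR(1) series: the estimated H should be roughly 0.5
x = np.zeros(4096)
for t in range(1, len(x)):
    x[t] = 0.3 * x[t - 1] + rng.standard_normal()
print(f"estimated H: {hurst_rs(x):.2f}")
```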
A freely available educational application (a mobile website) is presented. It provides access to educational material and drilling on selected topics within mathematics and statistics, with an emphasis on tablets and mobile phones. The application adapts to the student's performance, for example by selecting easier or more difficult questions or by returning to older material. These adaptations are based on statistical models and analyses of data from testing precursors of the system within several courses, from calculus and introductory statistics through multiple linear regression. The application can be used in both on-line and off-line modes. The behavior of the application is determined by parameters, the effects of which can be estimated statistically. Results presented include analyses of how the internal algorithms relate to passing a course and to general incremental improvement in knowledge during a semester.
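The article's own item-allocation algorithm is parameterised and estimated from data; the following is only a hypothetical illustration of the general idea of adapting question difficulty to recent performance while occasionally revisiting older material. The function name, weights and review probability are all assumptions.

```python
# Hypothetical sketch: pick the next drill item from recent success rate.
import random

DIFFICULTIES = ["easy", "medium", "hard"]

def next_item(recent_results, review_prob=0.15):
    """recent_results: list of 0/1 outcomes for the last few questions."""
    if random.random() < review_prob:
        return "review"                       # occasionally revisit older material
    score = sum(recent_results) / max(len(recent_results), 1)
    if score < 0.5:
        weights = [0.6, 0.3, 0.1]             # struggling: mostly easy items
    elif score < 0.8:
        weights = [0.2, 0.5, 0.3]
    else:
        weights = [0.1, 0.3, 0.6]             # doing well: mostly hard items
    return random.choices(DIFFICULTIES, weights=weights, k=1)[0]

print(next_item([1, 1, 0, 1, 1]))
```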
We investigate a Poisson sampling design with unknown selection probabilities, applied to a population of unknown size over multiple sampling occasions. The fixed-population model is adopted and extended for inference. The complete minimal sufficient statistic is derived for the sampling model parameters and the fixed-population parameter vector. The Rao-Blackwell versions of estimators of population quantities are detailed. The approach is applied to an empirical population. The extended inferential framework is found to have considerable potential and utility for empirical studies.
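For orientation only, the sketch below shows the basic design: Poisson sampling with known inclusion probabilities and the standard Horvitz-Thompson estimator of a population total. The paper's setting (unknown selection probabilities, unknown population size, multiple occasions, Rao-Blackwellised estimators) is not reproduced; the population, study variable and probabilities are simulated assumptions.

```python
# Minimal sketch: Poisson sampling and the Horvitz-Thompson total estimator.
import numpy as np

rng = np.random.default_rng(2)

N = 1000
y = rng.gamma(shape=2.0, scale=50.0, size=N)      # study variable
pi = np.clip(y / y.sum() * 200, 0.01, 0.9)        # inclusion probabilities

# Poisson sampling: each unit enters the sample independently with prob pi_i
sampled = rng.random(N) < pi

# Horvitz-Thompson estimator of the population total
ht_total = np.sum(y[sampled] / pi[sampled])
print(f"true total {y.sum():.0f}, HT estimate {ht_total:.0f}")
```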
Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, at Électricité de France (EDF), class load profiles are estimated using the point-wise mean function. Unfortunately, it is well known that the mean is highly sensitive to the presence of outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose a more robust alternative to the mean profile: the $L_1$-median profile. When dealing with large datasets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile while avoiding storing the whole dataset. We propose estimators of the median trajectory using several sampling strategies and estimators, and compare them on a test population. We develop a stratification based on the linearized variable which substantially improves the accuracy of the estimator compared to simple random sampling without replacement. We also suggest an improved estimator that takes auxiliary information into account. Some potential areas for future research are also highlighted.
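To fix ideas, the sketch below computes the $L_1$-median (spatial median) of a set of load curves with Weiszfeld's algorithm on the full data, and contrasts it with the point-wise mean in the presence of a few extreme consumers. The survey-sampling estimators and the stratification on the linearized variable discussed in the paper are not shown, and the simulated curves are illustrative.

```python
# Minimal sketch: L1-median of load curves via Weiszfeld's algorithm.
import numpy as np

def l1_median(curves, n_iter=100, eps=1e-8):
    """curves: (n_customers, n_time_points) array; returns the median curve."""
    m = curves.mean(axis=0)                       # starting point
    for _ in range(n_iter):
        d = np.linalg.norm(curves - m, axis=1)
        d = np.maximum(d, eps)                    # avoid division by zero
        w = 1.0 / d
        m_new = (w[:, None] * curves).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < eps:
            break
        m = m_new
    return m

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 48)                         # half-hourly daily grid
base = 1.0 + 0.5 * np.sin(2 * np.pi * t)
curves = base + 0.1 * rng.standard_normal((500, 48))
curves[:5] *= 20                                  # a few extreme consumers
print("median peak:", l1_median(curves).max())    # barely affected by the outliers
print("mean   peak:", curves.mean(axis=0).max())  # pulled up by the outliers
```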
This paper proposes consistent and asymptotically Gaussian estimators for the drift, the diffusion coefficient and the Hurst exponent of the discretely observed fractional Ornstein-Uhlenbeck process. For the estimation of the drift, the results are obtained only in the case when 1/2 < H < 3/4. This paper also provides ready-to-use software for the R statistical environment based on the YUIMA package.
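The paper's ready-to-use software targets the R environment via the YUIMA package; the Python sketch below only illustrates one ingredient of such estimation, namely recovering the Hurst exponent from discrete observations by comparing quadratic variations at two sampling frequencies (for small steps the drift of the fractional OU process is negligible, so increments scale like fractional Gaussian noise with variance proportional to $\delta^{2H}$). The simulation settings (lambda, sigma, H, N) are arbitrary assumptions.

```python
# Illustrative sketch: scale-ratio estimate of H from a discretely observed path.
import numpy as np

rng = np.random.default_rng(4)

def fgn(n, H, delta):
    """Fractional Gaussian noise via Cholesky of its Toeplitz covariance."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H)) * delta ** (2 * H)
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def hurst_ratio(x):
    """H estimate from the ratio of quadratic variations at lags 2 and 1."""
    v1 = np.mean(np.diff(x) ** 2)
    v2 = np.mean(np.diff(x[::2]) ** 2)
    return 0.5 * np.log2(v2 / v1)

# Euler scheme for dX = -lam * X dt + sigma * dB^H
n, delta, lam, sigma, H = 2000, 1e-3, 1.0, 1.0, 0.7
dB = fgn(n, H, delta)
X = np.zeros(n + 1)
for i in range(n):
    X[i + 1] = X[i] - lam * X[i] * delta + sigma * dB[i]
print(f"true H = {H}, estimated H = {hurst_ratio(X):.2f}")
```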
In disease mapping, the aim is to estimate the spatial pattern in disease risk over an extended geographical region, so that areas with elevated risks can be identified. A Bayesian hierarchical approach is typically used to produce such maps, which models the risk surface with a set of spatially smooth random effects. However, in complex urban settings there are likely to be boundaries in the risk surface, which separate populations that are geographically adjacent but have very different risk profiles. Therefore this paper proposes an approach for detecting such risk boundaries, and tests its effectiveness by simulation. Finally, the model is applied to lung cancer incidence data in Greater Glasgow, Scotland, between 2001 and 2005.
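As a hypothetical illustration of boundary detection (not the model fitted in the paper), one can flag a boundary between two geographically adjacent areas when the posterior probability that their relative risks differ by more than a chosen factor is high. The 1.5-fold threshold, 0.95 probability cut-off and simulated posterior samples below are assumptions for the sketch.

```python
# Hypothetical sketch: flag risk boundaries from posterior samples of area risks.
import numpy as np

rng = np.random.default_rng(5)

def risk_boundaries(risk_samples, adjacency, factor=1.5, prob=0.95):
    """risk_samples: (n_samples, n_areas); adjacency: list of (i, j) pairs."""
    boundaries = []
    for i, j in adjacency:
        ratio = risk_samples[:, i] / risk_samples[:, j]
        p_diff = np.mean((ratio > factor) | (ratio < 1 / factor))
        if p_diff > prob:
            boundaries.append((i, j))
    return boundaries

# toy example: area 2 has clearly elevated risk relative to its neighbours
samples = np.column_stack([rng.lognormal(0.0, 0.1, 5000),
                           rng.lognormal(0.0, 0.1, 5000),
                           rng.lognormal(0.8, 0.1, 5000)])
print(risk_boundaries(samples, adjacency=[(0, 1), (1, 2), (0, 2)]))
```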
For many decades, statisticians have attempted to prepare the Bayesian omelette without breaking the Bayesian eggs; that is, to obtain probabilistic likelihood-based inferences without relying on informative prior distributions. A recent example is Murray Aitkin's book, Statistical Inference, which presents an approach to statistical hypothesis testing based on comparisons of posterior distributions of likelihoods under competing models. Aitkin develops and illustrates his method using some simple examples of inference from iid data and two-way tests of independence. In this note we analyze some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.