
Multiscale Analysis of Count Data through Topic Alignment

Published by Kris Sankaran
Publication date: 2021
Research field: Mathematical Statistics
Language: English





Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop techniques to study the relationships across models with different $K$. This can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits in two when $K$ increases. This strategy gives more insight into the process generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, which we call a topic alignment, and present three diagnostics based on it. We show the effectiveness of these tools for interpreting the topics on simulated and real data, and we release an accompanying R package, alto (https://lasy.github.io/alto).
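
As a rough illustration of how topics from two models with different $K$ can be related, the following Python sketch scores each pair of topics by how much document membership they share. It is not the alto package's API and not necessarily the authors' exact method: the function names, the inner-product weighting, and the 0.25 threshold are all assumptions made for this example.

```python
def alignment_weights(gamma_a, gamma_b):
    """Hypothetical helper: weight between topic k of model A and topic l of
    model B, computed as sum over documents of gamma_a[d][k] * gamma_b[d][l].

    gamma_a and gamma_b are lists of per-document topic memberships
    (one row per document, same documents in the same order).
    """
    if len(gamma_a) != len(gamma_b):
        raise ValueError("both models must describe the same documents")
    k_a, k_b = len(gamma_a[0]), len(gamma_b[0])
    weights = [[0.0] * k_b for _ in range(k_a)]
    for row_a, row_b in zip(gamma_a, gamma_b):
        for k, a in enumerate(row_a):
            for l, b in enumerate(row_b):
                weights[k][l] += a * b
    return weights


def children(weights, threshold=0.25):
    """Hypothetical helper: for each topic of the coarser model, list the
    topics of the finer model that receive at least `threshold` of its
    total outgoing weight (an assumed cutoff, chosen for illustration)."""
    result = {}
    for k, row in enumerate(weights):
        total = sum(row)
        result[k] = [l for l, v in enumerate(row)
                     if total > 0 and v / total >= threshold]
    return result


# Toy memberships for four documents under K = 2 and K = 3 (made-up data).
gamma_2 = [[1, 0], [1, 0], [0, 1], [0, 1]]
gamma_3 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

w = alignment_weights(gamma_2, gamma_3)
print(children(w))
```

In this toy data, topic 0 of the two-topic model maps to a single topic of the three-topic model, while topic 1 maps to two topics, which is the kind of split the abstract describes.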


Read also

The celebrated Abakaliki smallpox data have appeared numerous times in the epidemic modelling literature, but in almost all cases only a specific subset of the data is considered. There is one previous analysis of the full data set, but this relies on approximation methods to derive a likelihood. The data themselves continue to be of interest due to concerns about the possible re-emergence of smallpox as a bioterrorism weapon. We present the first full Bayesian analysis using data-augmentation Markov chain Monte Carlo methods which avoid the need for likelihood approximations. Results include estimates of basic model parameters as well as reproduction numbers and the likely path of infection. Model assessment is carried out using simulation-based methods.
Generalized autoregressive moving average (GARMA) models are a class of models developed to extend the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. This work presents a Bayesian approach for GARMA models with Poisson, binomial and negative binomial distributions. A simulation study was carried out to investigate the performance of Bayesian estimation and Bayesian model selection criteria. Three real datasets were also analysed using the Bayesian approach on GARMA models.
Infectious diseases on farms pose both public and animal health risks, so understanding how they spread between farms is crucial for developing disease control strategies to prevent future outbreaks. We develop novel Bayesian nonparametric methodology to fit spatial stochastic transmission models in which the infection rate between any two farms is a function that depends on the distance between them, but without assuming a specified parametric form. Making nonparametric inference in this context is challenging since the likelihood function of the observed data is intractable because the underlying transmission process is unobserved. We adopt a fully Bayesian approach by assigning a transformed Gaussian Process prior distribution to the infection rate function, and then develop an efficient data augmentation Markov Chain Monte Carlo algorithm to perform Bayesian inference. We use the posterior predictive distribution to simulate the effect of different disease control methods and their economic impact. We analyse a large outbreak of Avian Influenza in the Netherlands and infer the between-farm infection rate, as well as the unknown infection status of farms which were pre-emptively culled. We use our results to analyse ring-culling strategies, and conclude that although effective, ring-culling has limited impact in high density areas.
Pete Philipson, 2021
Assessing the relative merits of sportsmen and women whose careers took place far apart in time via a suitable statistical model is a complex task as any comparison is compromised by fundamental changes to the sport and society and often handicapped by the popularity of inappropriate traditional metrics. In this work we focus on cricket and the ranking of Test match bowlers using bowling data from the first Test in 1877 onwards. A truncated, mean-parameterised Conway-Maxwell-Poisson model is developed to handle the under- and overdispersed nature of the data, which are in the form of small counts, and to extract the innate ability of individual bowlers. Inferences are made using a Bayesian approach by deploying a Markov Chain Monte Carlo algorithm to obtain parameter estimates and confidence intervals. The model offers a good fit and indicates that the commonly used bowling average is a flawed measure.
In the political decision process and control of COVID-19 (and other epidemic diseases), mathematical models play an important role. It is crucial to understand and quantify the uncertainty in models and their predictions in order to take the right decisions and to communicate results and limitations in a trustworthy way. We propose to do uncertainty quantification in SIR-type models using the efficient framework of generalized Polynomial Chaos. Through two particular case studies based on Danish data for the spread of COVID-19 we demonstrate the applicability of the technique. The test cases are related to peak time estimation and superspreading and illustrate how very few model evaluations can provide insightful statistics.