The tau statistic $\tau$ uses geolocation and, usually, symptom onset time to assess global spatiotemporal clustering from epidemiological data. Using a baseline analysis of an open-access measles dataset for comparison, we test factors that could affect graphical hypothesis tests of clustering, or bias clustering range estimates, based on the statistic. Re-analysing these data, we find that both the spatial bootstrap sampling method used to construct the confidence interval (CI) for the tau estimate and the CI type can bias clustering range estimates. We suggest that the bias-corrected and accelerated (BCa) CI is essential for asymmetric sample bootstrap distributions of tau estimates. We also find evidence against no spatiotemporal clustering, $p$-value $\in [0, 0.014]$ (global envelope test). We develop a tau-specific modification of the Loh & Stein spatial bootstrap sampling method, which gives more precise bootstrapped tau estimates and a 20% higher estimated clustering endpoint than previously published (36.0 m; 95% BCa CI (14.9, 46.6), vs 30 m), equivalent to a 44% increase in the clustering area of elevated disease odds. What appears a modest radial bias in the range estimate is thus more than doubled on the areal scale, to which public health resources are proportional. This difference could have important consequences for control. Correct practice for hypothesis testing of no clustering and for clustering range estimation with the tau statistic is illustrated in the graphical abstract. We advocate proper implementation of this useful statistic, ultimately to reduce inaccuracies in control policy decisions made during disease clustering analysis.
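As an illustration of the estimation and CI construction discussed above, here is a minimal sketch (not the paper's code) that computes the odds form of the tau statistic on synthetic case data and wraps it in a BCa bootstrap CI using scipy. It resamples cases naively with replacement, a simplification of the Loh & Stein-style spatial bootstrap the paper modifies; the distance cut-off, time-relatedness interval, and synthetic data are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(0)

# Synthetic cases: coordinates (m) and onset days; values are illustrative.
n = 200
xy = rng.uniform(0, 500, size=(n, 2))
t = rng.uniform(0, 60, size=n)

def tau_odds(x, y, onset, d_max=30.0, t_rel=14.0):
    """Odds form of the tau statistic: the odds that a pair of cases within
    d_max metres is time-related (onsets <= t_rel days apart), divided by
    the same odds over all pairs regardless of distance."""
    pts = np.column_stack([x, y])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    related = np.abs(onset[:, None] - onset[None, :]) <= t_rel
    iu = np.triu_indices(len(x), k=1)          # unique pairs only
    close, rel = dist[iu] <= d_max, related[iu]
    odds_near = rel[close].mean() / (1 - rel[close].mean())
    odds_all = rel.mean() / (1 - rel.mean())
    return odds_near / odds_all

# BCa CI from a naive case-resampling bootstrap; paired resampling keeps
# each case's location and onset time together.
res = bootstrap((xy[:, 0], xy[:, 1], t), tau_odds, paired=True,
                vectorized=False, n_resamples=999, method='BCa',
                random_state=rng)
print(tau_odds(xy[:, 0], xy[:, 1], t), res.confidence_interval)
```

Under these uniform synthetic data the estimate should sit near 1 (no clustering) and the BCa interval should cover it; on real clustered data the interval's asymmetry is exactly why the BCa type matters.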
Introduction: The tau statistic is a recent second-order correlation function that can assess the magnitude and range of global spatiotemporal clustering from epidemiological data containing geolocations of individual cases and, usually, disease onset times. This is the first review of its use, and of the aspects of its computation and presentation that could affect the inferences drawn or bias estimates of the statistic.

Methods: Using Google Scholar, we searched for papers or preprints that cited the papers that first defined or reformulated the statistic. We tabulated their key characteristics to understand the statistic's development since 2012.

Results: Only half of the 16 studies found were considered to use true tau statistics, but including the remainder in the review still provided important insights into analysis motivations. All papers that used graphical hypothesis testing or parameter estimation used incorrect methods. There is a lack of clarity over how to choose the time-relatedness interval used to relate cases and the set of distance bands, both of which are required to calculate the statistic. Some studies demonstrated nuanced applications of the tau statistic in settings with unusual data or time-relation variables, which enriched understanding of its possibilities. A gap was noticed in the estimators available to account for variable person-time at risk.

Discussion: Our review comprehensively covers current uses of the tau statistic for descriptive analysis, graphical hypothesis testing, and parameter estimation of spatiotemporal clustering. We also define a new estimator of the tau statistic for disease rates. Open questions remain about the statistic's implementation, and we hope this review inspires others to research them.
It will be recalled that the classical bivariate normal distributions have normal marginals and normal conditionals. It is natural to ask whether a similar phenomenon can be encountered involving Poisson marginals and conditionals. Reference to Arnold, Castillo and Sarabia's (1999) book on conditionally specified models confirms that Poisson marginals, together with both conditionals being of the Poisson form, are encountered only in the case in which the variables are independent. Instead, in the present article we focus on bivariate distributions with one marginal and the other family of conditionals being of the Poisson form. Such distributions are called Pseudo-Poisson distributions. We discuss distributional features of such models, explore inferential aspects, and include an example of the application of the Pseudo-Poisson model to sets of over-dispersed data.
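To make the structure concrete, the following sketch simulates from one simple member of the Pseudo-Poisson family, taking $X$ Poisson and $Y \mid X = x$ Poisson with a linear conditional mean; the particular mean function and parameter values are illustrative assumptions, not the paper's. The marginal of $Y$ is then not Poisson and is over-dispersed whenever the conditional mean depends on $x$.

```python
import numpy as np

rng = np.random.default_rng(1)

# One simple Pseudo-Poisson specification (illustrative parameters):
#   X ~ Poisson(lam1),  Y | X = x ~ Poisson(lam2 + lam3 * x)
lam1, lam2, lam3 = 2.0, 1.0, 0.5
n = 100_000

x = rng.poisson(lam1, size=n)
y = rng.poisson(lam2 + lam3 * x)

# X stays Poisson (mean == variance), while Y is over-dispersed:
# Var(Y) = E[lam2 + lam3*X] + lam3**2 * Var(X) = 2.0 + 0.5 = 2.5 > E[Y] = 2.0
print("X mean/var:", x.mean(), x.var())   # ~2.0 / ~2.0
print("Y mean/var:", y.mean(), y.var())   # ~2.0 / ~2.5
```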
We propose a framework for Bayesian non-parametric estimation of the rate at which new infections occur, assuming that the epidemic is partially observed. The methodology relies on modelling the rate at which new infections occur as a function that depends only on time. Two types of prior distribution are proposed, based on step functions and on B-splines. The methodology is illustrated using both simulated and real datasets, and we show that certain aspects of the epidemic, such as seasonality and super-spreading events, are picked up without having to incorporate them explicitly into a parametric model.
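As a sketch of the second prior type, the snippet below represents a time-varying infection rate as a cubic B-spline expansion using scipy; the knot placement, coefficient values, and exponential link are illustrative assumptions, and the actual Bayesian inference over the coefficients (e.g. by MCMC) is omitted.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline basis on [0, 100] days with equally spaced interior knots.
degree = 3
interior = np.linspace(0, 100, 8)
knots = np.r_[[0.0] * degree, interior, [100.0] * degree]  # clamped knots

# These coefficients would carry the prior and be sampled by MCMC; here
# they are fixed illustrative values. An exp link keeps the rate positive.
coef = np.array([-1.0, -0.5, 0.3, 1.2, 0.8, 0.1, -0.4, -0.9, -1.5, -2.0])
log_rate = BSpline(knots, coef, degree)

tt = np.linspace(0, 100, 201)
beta = np.exp(log_rate(tt))    # infection rate beta(t) > 0 on the grid
print(beta[:5])
```

The step-function prior works analogously, with piecewise-constant basis functions in place of the smooth spline basis.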
We propose a new method for clustering functional data using a $k$-means framework. We work within the elastic functional data analysis framework, which allows the overall variation in functional data to be decomposed into amplitude and phase components. We use the amplitude component to partition functions into shape clusters via an automated approach. To select an appropriate number of clusters, we additionally propose a novel Bayesian Information Criterion, defined using a mixture model on principal component scores estimated via functional Principal Component Analysis. The proposed method is motivated by the problem of posterior exploration, wherein samples obtained from Markov chain Monte Carlo algorithms are naturally represented as functions. We evaluate our approach on a simulated dataset, and apply it to a study of acute respiratory infection dynamics in San Luis Potosí, Mexico.
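A minimal sketch of the model-selection idea follows, with simplifications: it approximates functional PCA by ordinary PCA on densely sampled curves and scores each candidate number of clusters with the standard Gaussian-mixture BIC from scikit-learn, rather than the paper's amplitude-phase decomposition and bespoke criterion; the toy data and parameter choices are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Toy functional data: two shape clusters of noisy curves on a common grid.
grid = np.linspace(0, 1, 50)
curves = np.vstack([
    np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((30, 50)),
    np.sin(4 * np.pi * grid) + 0.1 * rng.standard_normal((30, 50)),
])

# Approximate functional PCA by PCA on the discretised curves.
scores = PCA(n_components=3).fit_transform(curves)

# Choose the number of clusters by minimising the GMM BIC on the scores.
bics = {k: GaussianMixture(n_components=k, random_state=0)
           .fit(scores).bic(scores) for k in range(1, 6)}
k_best = min(bics, key=bics.get)
print(bics, "-> chosen k =", k_best)
```

On this toy example the BIC should bottom out at $k = 2$, matching the two generating shapes.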
Copulas provide a modular parameterization of multivariate distributions that decouples the modeling of the marginals from the dependencies between them. The Gaussian Mixture Copula Model (GMCM) is a highly flexible copula that can model many kinds of multi-modal dependencies, as well as asymmetric and tail dependencies. GMCMs have been used effectively for clustering non-Gaussian data and in Reproducibility Analysis, a meta-analysis method designed to verify the reliability and consistency of multiple high-throughput experiments. Parameter estimation for the GMCM is challenging due to its intractable likelihood. The best previous methods maximize a proxy-likelihood through a Pseudo Expectation Maximization (PEM) algorithm, but offer no guarantees of convergence, or of convergence to the correct parameters. In this paper, we use Automatic Differentiation (AD) tools to develop a method, called AD-GMCM, that can maximize the exact GMCM likelihood. In our simulation studies and experiments with real data, AD-GMCM finds more accurate parameter estimates than PEM and yields better performance in clustering and Reproducibility Analysis. We discuss the advantages of an AD-based approach for addressing problems related to monotonic increase of the likelihood and to parameter identifiability in the GMCM. We also analyze, for the GMCM, two well-known cases of degeneracy of the maximum likelihood in Gaussian mixture models (GMMs) that can lead to spurious clustering solutions. Our analysis shows that, unlike the GMM, the GMCM is not affected in one of these cases.
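The GMCM likelihood itself requires differentiable inverse marginal CDFs, which is beyond a short sketch, so the snippet below illustrates only the underlying idea of AD-GMCM, gradient-based maximisation of an exact mixture likelihood via automatic differentiation, on a plain 1-D Gaussian mixture in PyTorch; everything here is an illustrative stand-in, not the paper's algorithm.

```python
import torch

torch.manual_seed(3)

# Toy data from a two-component 1-D Gaussian mixture.
x = torch.cat([torch.randn(300) * 0.5 - 2.0, torch.randn(700) * 1.0 + 1.5])

# Unconstrained parameters; softmax/exp keep weights and scales valid.
logit_w = torch.zeros(2, requires_grad=True)
mu = torch.tensor([-1.0, 1.0], requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

opt = torch.optim.Adam([logit_w, mu, log_sigma], lr=0.05)
for step in range(500):
    opt.zero_grad()
    mix = torch.distributions.MixtureSameFamily(
        torch.distributions.Categorical(logits=logit_w),
        torch.distributions.Normal(mu, log_sigma.exp()),
    )
    nll = -mix.log_prob(x).mean()   # exact log-likelihood; AD supplies grads
    nll.backward()
    opt.step()

print(torch.softmax(logit_w, -1), mu, log_sigma.exp())
```

The appeal of the AD route, as the abstract argues for the GMCM, is that it ascends the exact likelihood rather than a proxy, so the objective increases monotonically up to optimizer noise.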