It will be recalled that the classical bivariate normal distributions have normal marginals and normal conditionals. It is natural to ask whether a similar phenomenon can be encountered involving Poisson marginals and conditionals. Reference to the Arnold, Castillo and Sarabia (1999) book on conditionally specified models will confirm that Poisson marginals, together with both families of conditionals being of the Poisson form, are encountered only in the case in which the variables are independent. In the present article we instead focus on bivariate distributions with one marginal and the other family of conditionals being of the Poisson form. Such distributions are called Pseudo-Poisson distributions. We discuss distributional features of such models, explore inferential aspects and include an example of application of the Pseudo-Poisson model to a set of over-dispersed data.
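As an illustration (the parameterization below is chosen for concreteness and is not necessarily the one used in the paper), a bivariate pseudo-Poisson model can be specified as
$$ X \sim \mathrm{Poisson}(\lambda_1), \qquad Y \mid X = x \sim \mathrm{Poisson}(\lambda_2 + \lambda_3 x), $$
so that $X$ is marginally Poisson and the conditionals of $Y$ given $X$ are Poisson, while the marginal distribution of $Y$ is over-dispersed relative to a Poisson law whenever $\lambda_3 \neq 0$, since $\mathrm{Var}(Y) = \lambda_2 + \lambda_3\lambda_1 + \lambda_3^2\lambda_1 > \mathbb{E}(Y)$.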
The tau statistic $\tau$ uses geolocation and, usually, symptom onset time to assess global spatiotemporal clustering from epidemiological data. We test different factors that could affect graphical hypothesis tests of clustering or bias clustering range estimates based on the statistic, by comparison with a baseline analysis of an open access measles dataset. From re-analysing these data we find that both the spatial bootstrap sampling method used to construct the confidence interval (CI) for the tau estimate and the CI type can bias clustering range estimates. We suggest that the bias-corrected and accelerated (BCa) CI is essential for asymmetric sample bootstrap distributions of tau estimates. We also find evidence against no spatiotemporal clustering, $p$-value $\in [0, 0.014]$ (global envelope test). We develop a tau-specific modification of the Loh & Stein spatial bootstrap sampling method, which gives more precise bootstrapped tau estimates, a 20% higher estimated clustering endpoint than previously published (36.0 m; 95% BCa CI (14.9, 46.6), vs 30 m) and a corresponding 44% increase in the estimated area of elevated disease odds. What appears a modest radial bias in the range estimate is more than doubled on the areal scale, to which public health resources are proportional. This difference could have important consequences for control. Correct practice of hypothesis testing of no clustering and of clustering range estimation with the tau statistic is illustrated in the Graphical abstract. We advocate proper implementation of this useful statistic, ultimately to reduce inaccuracies in control policy decisions made during disease clustering analysis.
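The contrast between percentile and BCa intervals can be sketched with a generic bootstrap, for example as below. This is not the paper's code: `toy_tau` is a hypothetical stand-in statistic, and the paper uses a spatial (Loh & Stein style) resampling scheme rather than this i.i.d. one; only the CI machinery is illustrated.

```python
# Minimal sketch: percentile vs BCa bootstrap CIs for a skewed statistic.
# `toy_tau` is a placeholder; the real tau statistic uses pairwise distances
# and symptom onset times, and requires spatial (block) resampling.
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=10.0, size=200)   # skewed toy data

def toy_tau(sample, axis=-1):
    # placeholder statistic with an asymmetric sampling distribution
    return np.mean(sample, axis=axis) / np.median(sample, axis=axis)

for method in ("percentile", "BCa"):
    res = bootstrap((x,), toy_tau, n_resamples=2000, method=method,
                    random_state=rng)
    ci = res.confidence_interval
    print(f"{method:10s} 95% CI: ({ci.low:.3f}, {ci.high:.3f})")
```

For asymmetric bootstrap distributions the two interval types can differ noticeably, which is the point made above about the choice of CI type biasing clustering range estimates.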
Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and under an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas from Survival Analysis, and over-dispersion is quantified through an additional parameter. The Poisson regression model is hidden within this over-dispersed Poisson regression model and is recovered as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of the proposed procedures and mentions open research problems, as well as the dangers and pitfalls of forecasting on a long horizon, with focus on this pandemic where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.
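A gamma-frailty construction along these general lines (illustrative only; the paper's exact parameterization may differ) makes the limiting behaviour easy to see numerically:

```python
# Illustrative gamma-frailty over-dispersed Poisson model (parameterization
# chosen here for illustration, not necessarily the paper's):
#   Z ~ Gamma(alpha, scale=1/alpha)   (mean 1, variance 1/alpha)
#   Y | Z ~ Poisson(Z * mu)
# so E[Y] = mu and Var(Y) = mu + mu^2 / alpha; as alpha -> infinity the
# frailty degenerates at 1 and the ordinary Poisson model is recovered.
import numpy as np

rng = np.random.default_rng(1)
mu, n = 5.0, 200_000

for alpha in (0.5, 5.0, 50.0, 5e6):
    z = rng.gamma(shape=alpha, scale=1.0 / alpha, size=n)
    y = rng.poisson(z * mu)
    print(f"alpha={alpha:>9}: mean={y.mean():.3f}, var={y.var():.3f}, "
          f"mu + mu^2/alpha={mu + mu**2 / alpha:.3f}")
```

As the over-dispersion parameter grows, the simulated variance approaches the Poisson value $\mu$, matching the statement that the Poisson regression model is a limiting case.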
Statistical uncertainty has many components, such as measurement errors, temporal variation, or sampling. Not all of these sources are relevant in a specific application, since practitioners might view some attributes of the observations as fixed. We study the statistical inference problem arising when data are drawn conditionally on some attributes. These attributes are assumed to be sampled from a super-population but are viewed as fixed when conducting uncertainty quantification. The estimand is thus defined as a parameter of a conditional distribution. We propose methods to construct conditionally valid p-values and confidence intervals for these conditional estimands based on asymptotically linear estimators. In this setting, a given estimator is conditionally unbiased for potentially many conditional estimands, which can be seen as parameters of different populations. Testing different populations raises questions of multiple testing. We discuss simple procedures that control novel conditional error rates. In addition, we introduce a bias correction technique that enables the transfer of estimators across conditional distributions arising from the same super-population. This can be used to infer parameters and estimators on future datasets based on new data. The validity and applicability of the proposed methods are demonstrated on simulated and real-world data.
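As a concrete illustration of a conditional estimand (our notation, not necessarily the paper's): if pairs $(W_i, Y_i)$ are drawn i.i.d. from a super-population but the attributes $W_1, \dots, W_n$ are treated as fixed, one conditional estimand is
$$ \theta_n = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[Y_i \mid W_i\right], $$
and a confidence interval $\mathrm{CI}_n$ is conditionally valid if $\mathbb{P}\left(\theta_n \in \mathrm{CI}_n \mid W_1, \dots, W_n\right) \ge 1 - \alpha$, in contrast to marginal coverage taken over the joint draw of attributes and outcomes.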
In this work we introduce the concept of the Bures-Wasserstein barycenter $Q_*$, which is essentially a Fréchet mean of some distribution $\mathbb{P}$ supported on a subspace of the positive semi-definite Hermitian operators $\mathbb{H}_{+}(d)$. We allow the barycenter to be restricted to some affine subspace of $\mathbb{H}_{+}(d)$ and provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of $Q_*$ in both the Frobenius norm and the Bures-Wasserstein distance, and explain how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.
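For reference, the standard Bures-Wasserstein distance on $\mathbb{H}_{+}(d)$ (a textbook definition, not specific to this paper) is
$$ d_{BW}^2(Q_1, Q_2) = \operatorname{tr}(Q_1) + \operatorname{tr}(Q_2) - 2 \operatorname{tr}\!\left(\left(Q_1^{1/2} Q_2\, Q_1^{1/2}\right)^{1/2}\right), $$
and a barycenter of $\mathbb{P}$ is any minimizer $Q_* \in \arg\min_{Q} \int d_{BW}^2(Q, S)\, \mathbb{P}(dS)$, i.e., a Fréchet mean with respect to $d_{BW}$.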
Analyses of environmental phenomena are often concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimand, the weighted quantile treatment (or exposure) effect (WQTE), of which the population quantile treatment effect (QTE) is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
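A generic inverse-propensity-weighted quantile sketch for a binary exposure is shown below. This is illustrative only: the paper's WQTE estimators cover a broader class of balancing weights and continuous exposures, and the function names here are hypothetical.

```python
# Generic IPW quantile treatment effect sketch for a binary exposure
# (illustrative; not the paper's WQTE estimators).
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_quantile(y, w, q):
    """Quantile of y under normalized weights w (step-function inverse CDF)."""
    order = np.argsort(y)
    y, w = y[order], w[order]
    cdf = np.cumsum(w) / np.sum(w)
    return y[np.searchsorted(cdf, q)]

def ipw_qte(y, a, x, q=0.95):
    """Estimate the q-quantile effect of binary exposure a on outcome y."""
    ps = LogisticRegression(max_iter=1000).fit(x, a).predict_proba(x)[:, 1]
    w1 = a / ps                  # inverse-propensity weights, exposed units
    w0 = (1 - a) / (1 - ps)      # inverse-propensity weights, unexposed units
    return weighted_quantile(y, w1, q) - weighted_quantile(y, w0, q)
```

More general balancing weights would replace `w1` and `w0` above, with the plain population QTE corresponding to the inverse-propensity choice shown here.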