Comparing the differences in outcomes (that is, in dependent variables) between two subpopulations is often most informative when comparing outcomes only for individuals from the subpopulations who are similar according to independent variables. The independent variables are generally known as scores, as in propensity scores for matching or as in the probabilities predicted by statistical or machine-learned models, for example. If the outcomes are discrete, then some averaging is necessary to reduce the noise arising from the outcomes varying randomly over those discrete values in the observed data. The traditional method of averaging is to bin the data according to the scores and plot the average outcome in each bin against the average score in the bin. However, such binning can be rather arbitrary, yet it greatly affects both the interpretation of the displayed deviation between the subpopulations and the assessment of its statistical significance. Fortunately, such binning is entirely unnecessary in plots of cumulative differences and in the associated scalar summary metrics, which are analogous to the workhorse statistics for comparing probability distributions: those due to Kolmogorov and Smirnov, and their refinements due to Kuiper. The present paper develops such cumulative methods for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.
The aim of this paper is to prove a Large Deviation Principle (LDP) for cumulative processes, also known as compound renewal processes. These processes cumulate independent random variables occurring in time intervals given by a renewal process. Our result extends the one obtained in Lefevere et al. (2011) in the sense that we impose no specific dependency between the cumulated random variables and the renewal process. The proof is inspired by Lefevere et al. (2011) but deals with additional difficulties due to the more general framework considered here. In the companion paper Cattiaux-Costa-Colombani (2021), we apply this principle to Hawkes processes with inhibition. Under some assumptions, Hawkes processes are indeed cumulative processes, but they do not fall within the framework of Lefevere et al. (2011).
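To make the object concrete, here is a minimal simulation sketch of a cumulative process Z_t. It sums the variables X_i attached to the renewal epochs T_1 + ... + T_i that fall in [0, t]. The exponential inter-arrival times and the mark X_i = tau_i**2 + N(0, 1) are hypothetical choices. They are made only to show a mark that depends on its own inter-arrival time, which is the kind of dependence the general framework allows. Nothing here reproduces the LDP proof.

```python
import random
from itertools import accumulate


def cumulative_process(t, interarrivals, marks):
    """Z_t = sum of marks[i] over renewal epochs (partial sums of interarrivals) <= t."""
    total = 0.0
    for epoch, mark in zip(accumulate(interarrivals), marks):
        if epoch > t:
            break
        total += mark
    return total


def simulate_cumulative_process(t, rng, mean_gap=1.0):
    """Simulate Z_t with (hypothetical) exponential inter-arrival times and marks
    X_i = tau_i**2 + N(0, 1) that depend on their own inter-arrival time tau_i."""
    elapsed, total = 0.0, 0.0
    while True:
        gap = rng.expovariate(1.0 / mean_gap)  # rate = 1 / mean
        elapsed += gap
        if elapsed > t:
            return total
        total += gap ** 2 + rng.gauss(0.0, 1.0)
```

By the renewal-reward theorem, Z_t / t converges almost surely to E[X_1] / E[tau_1]. This equals 2 for Exp(1) gaps with these marks, since E[tau**2] = 2. An LDP quantifies how fast the probability of deviations of Z_t / t from that limit decays exponentially in t.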
When reporting the results of clinical studies, some researchers may choose the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation, particularly for skewed data. When such studies are included in a meta-analysis, it is often desirable to convert the five-number summary back to the sample mean and standard deviation. Several methods have been proposed for this purpose in the recent literature, and they are increasingly used. In this paper, we further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample standard deviation. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods dramatically improve on the existing methods for data transformation and can serve as rules of thumb in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are also provided for implementing our optimal estimators.
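The general shape of a weighted standard deviation estimator from a five-number summary (a, q1, m, q3, b) with sample size n can be sketched as a convex combination of two estimates. The first comes from the range and the second from the interquartile range, each divided by a normal-quantile approximation to its expected value for n standard normal draws. The denominators below are the commonly used approximations xi(n) = 2 * Phi^{-1}((n - 0.375) / (n + 0.25)) and eta(n) = 2 * Phi^{-1}((0.75n - 0.125) / (n + 0.25)). The paper's approximation formula for the optimal weight is not reproduced here, so `weight` is a caller-supplied value in [0, 1]. The function names are hypothetical.

```python
from statistics import NormalDist


def xi(n):
    """Normal approximation to the expected range of n standard normal draws (n >= 2)."""
    return 2 * NormalDist().inv_cdf((n - 0.375) / (n + 0.25))


def eta(n):
    """Normal approximation to the expected interquartile range of n standard normal draws."""
    return 2 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))


def weighted_sd(a, q1, q3, b, n, weight):
    """Hypothetical sketch of a weighted SD estimate from the five-number summary:
    weight * (b - a) / xi(n) + (1 - weight) * (q3 - q1) / eta(n).
    The median does not enter this form of the SD estimate."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * (b - a) / xi(n) + (1 - weight) * (q3 - q1) / eta(n)
```

With `weight=0.5` this reduces to an equal-weight average of the range-based and IQR-based estimates. Letting the weight vary smoothly with n reflects that the range and the interquartile range carry different amounts of information at different sample sizes.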
Population size estimation based on capture-recapture experiments under a triple-record system is an interesting problem in various fields, including epidemiology and population studies. In many real-life scenarios, there is an inherent dependency between capture and recapture attempts. We propose a novel model that incorporates this possible dependency and whose parameters have natural interpretations. We provide an estimation methodology for the population size and the associated model parameters based on the maximum likelihood method. The proposed model is applied to analyze real data sets from public health and from a census coverage evaluation study. The performance of the proposed estimator is evaluated through an extensive simulation study, and the results are compared with those of existing competitors. The results show the superiority of the proposed model over the existing competitors in both the real data analysis and the simulation study.
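For orientation, here is a sketch of the classical baseline that dependence models generalize. It gives maximum likelihood estimation of the population size N for three lists under the independence model M_t, where each list captures each individual independently with its own probability. This is a swapped-in standard technique, not the proposed model. Profiling out the capture probabilities p_j = n_j / N leads to Darroch's equation 1 - M/N = prod_j (1 - n_j/N), where M is the number of distinct individuals observed and n_j the number caught on list j. The function name and input layout are hypothetical.

```python
from math import prod


def mt_population_mle(counts, tol=1e-9):
    """MLE of N for three lists under the independence model M_t.

    counts maps capture histories (three 0/1 flags) to observed frequencies;
    the unobservable (0, 0, 0) cell is ignored. Solves Darroch's equation
    1 - M/N = prod_j (1 - n_j/N) for N > M by bisection."""
    observed = {h: c for h, c in counts.items() if any(h)}
    m = sum(observed.values())
    n = [sum(c for h, c in observed.items() if h[j]) for j in range(3)]
    if sum(n) <= m:
        raise ValueError("no overlap between lists: the MLE is not finite")

    def f(N):
        return (1 - m / N) - prod(1 - nj / N for nj in n)

    lo, hi = float(m), 2.0 * m  # invariant: f(lo) <= 0 < f(hi)
    while f(hi) <= 0:
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

If all seven observable histories have frequency 125, then M = 875 and n_j = 500. The estimate is then N = 1000, since 1 - 875/1000 = 0.125 = 0.5**3. A model with capture dependence would change this likelihood, which is precisely what the independence baseline cannot capture.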
This study proposes a novel hierarchical prior for inferring possibly low-rank matrices measured with noise. We consider three-component matrix factorization, as in singular value decomposition, and its fully Bayesian inference. The proposed prior is specified by a scale mixture of exponential distributions that has spike and slab components. The weights for the spike/slab parts are inferred using a special prior based on a cumulative shrinkage process. The proposed prior is designed to push less important, or essentially redundant, singular values toward zero increasingly aggressively, leading to more accurate estimates of low-rank matrices. To ensure parameter identification, we simulate posterior draws from an approximate posterior, in which the constraints are slightly relaxed, using a No-U-Turn sampler. Through a set of simulation studies, we show that our proposal is competitive with alternative prior specifications and does not incur a significant additional computational burden. We apply the proposed approach to sectoral industrial production in the United States to analyze the structural change during the Great Moderation period.
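The "increasingly aggressive" shrinkage of a cumulative shrinkage process comes from stick-breaking. Given fractions v_l ~ Beta(1, alpha), the weights omega_l = v_l * prod_{m<l} (1 - v_m) are accumulated into pi_h = sum_{l<=h} omega_l. Here pi_h is the prior probability that the h-th component is assigned to the spike, and it is non-decreasing in h. The sketch below computes pi_h and then draws each singular-value scale from either a spike or a slab exponential distribution. The spike/slab rates and the function names are hypothetical illustrations, not the paper's exact parameterization, and the NUTS-based posterior sampling is not shown.

```python
import random


def cusp_spike_probabilities(v):
    """Cumulative shrinkage weights from stick-breaking fractions v_1..v_H:
    omega_l = v_l * prod_{m<l} (1 - v_m), pi_h = sum_{l<=h} omega_l."""
    probs, remaining, cumulative = [], 1.0, 0.0
    for v_l in v:
        omega = v_l * remaining  # mass broken off the remaining stick
        remaining *= 1.0 - v_l
        cumulative += omega
        probs.append(cumulative)
    return probs


def sample_singular_value_scales(H, alpha, spike_rate, slab_rate, rng):
    """Hypothetical sketch: draw v_l ~ Beta(1, alpha), compute pi_h, then draw the
    h-th scale from Exp(spike_rate) (mass near zero when spike_rate is large) with
    probability pi_h, and from Exp(slab_rate) otherwise."""
    v = [rng.betavariate(1.0, alpha) for _ in range(H)]
    pi = cusp_spike_probabilities(v)
    scales = []
    for p in pi:
        rate = spike_rate if rng.random() < p else slab_rate
        scales.append(rng.expovariate(rate))
    return pi, scales
```

For example, `cusp_spike_probabilities([0.5, 0.5, 0.5])` returns `[0.5, 0.75, 0.875]`, so later components are ever more likely to be shrunk. This is how the prior concentrates on a low effective rank without fixing it in advance.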