In a statistical analysis in Particle Physics, nuisance parameters can be introduced to take into account various types of systematic uncertainties. The best estimate of such a parameter is often modeled as a Gaussian distributed variable with a given standard deviation (the corresponding systematic error). Although the assigned systematic errors are usually treated as constants, in general they are themselves uncertain. A type of model is presented where the uncertainty in the assigned systematic errors is taken into account. Estimates of the systematic variances are modeled as gamma distributed random variables. The resulting confidence intervals show interesting and useful properties. For example, when averaging measurements to estimate their mean, the size of the confidence interval increases for decreasing goodness-of-fit, and averages have reduced sensitivity to outliers. The basic properties of the model are presented and several examples relevant for Particle Physics are explored.
We derive a Bayesian framework for incorporating selection effects into population analyses. We allow for both measurement uncertainty in individual measurements and, crucially, for selection biases on the population of measurements, and show how to extract the parameters of the underlying distribution based on a set of observations sampled from this distribution. We illustrate the performance of this framework with an example from gravitational-wave astrophysics, demonstrating that the mass ratio distribution of merging compact-object binaries can be extracted from Malmquist-biased observations with substantial measurement uncertainty.
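As a rough illustration of why selection effects must enter the likelihood, the following toy sketch compares a naive estimate with a selection-corrected one. It is not the framework of the abstract above: it assumes a one-dimensional Gaussian population with known width, a hard detection threshold applied to the true value, no measurement uncertainty, and a flat prior on the mean. All names (`log_like`, `threshold`, `grid`) are hypothetical. Under these assumptions each detected event contributes the population density divided by the detection probability.

```python
import numpy as np
from math import erfc, log, pi, sqrt

def log_like(mu, x_det, sigma, threshold):
    """Log-likelihood of detected samples from N(mu, sigma) truncated at threshold.

    Each detected sample contributes N(x | mu, sigma) / P_det(mu), where
    P_det(mu) is the fraction of the population lying above the threshold.
    """
    n = len(x_det)
    log_pdf = -0.5 * np.sum(((x_det - mu) / sigma) ** 2) - n * log(sigma * sqrt(2.0 * pi))
    p_det = 0.5 * erfc((threshold - mu) / (sigma * sqrt(2.0)))
    return log_pdf - n * log(p_det)

# Simulated population (assumed values, for illustration only).
rng = np.random.default_rng(1)
true_mu, sigma, threshold = 0.0, 1.0, 0.5
x_all = rng.normal(true_mu, sigma, 20000)
x_det = x_all[x_all > threshold]  # only the bright tail is "observed"

# Naive estimate: ignore the selection and average what was detected.
mu_naive = float(x_det.mean())

# Corrected estimate: maximize the selection-aware likelihood on a grid
# (with a flat prior this is also the posterior mode).
grid = np.linspace(-1.5, 1.5, 601)
ll = np.array([log_like(m, x_det, sigma, threshold) for m in grid])
mu_corrected = float(grid[np.argmax(ll)])

print(f"naive mean: {mu_naive:.3f}, selection-corrected: {mu_corrected:.3f}")
```

The naive mean is pulled toward the detected tail, while the corrected estimate lands close to the population value. A realistic analysis would also marginalize over per-event measurement uncertainty and apply the selection to the observed data rather than to the true values.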
There are many uses for linear fitting; the context here is interpolation and denoising of data, as when you have calibration data and you want to fit a smooth, flexible function to those data. Or you want to fit a flexible function to de-trend a time series or normalize a spectrum. In these contexts, investigators often choose a polynomial basis, or a Fourier basis, or wavelets, or something equally general. They also choose an order, or number of basis functions to fit, and (often) some kind of regularization. We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof. We emphasize that it is often valuable to choose far more parameters than data points, despite folk rules to the contrary: Suitably regularized models with enormous numbers of parameters generalize well and make good predictions for held-out data; over-fitting is not (mainly) a problem of having too many parameters. It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process. We recommend cross-validation as a good empirical method for model selection (for example, setting the number of parameters and the form of the regularization), and jackknife resampling as a good empirical method for estimating the uncertainties of the predictions made by the model. We also give advice for building stable computational implementations.
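A minimal sketch of the over-parameterized regime described above, assuming a Fourier basis, a plain ridge (L2) penalty, and leave-one-out cross-validation to choose the regularization strength. The helper names (`fourier_design`, `ridge_fit`, `loo_cv_error`), the synthetic data, and the candidate penalty values are all illustrative assumptions, and the uniform ridge penalty is a stand-in rather than the specific regularization any particular author recommends.

```python
import numpy as np

def fourier_design(x, n_basis, period):
    """Design matrix: a constant column followed by cos/sin pairs (use odd n_basis)."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        omega = 2.0 * np.pi * k / period
        cols.append(np.cos(omega * x))
        cols.append(np.sin(omega * x))
    return np.stack(cols, axis=1)

def ridge_fit(A, y, lam):
    """Minimize ||y - A b||^2 + lam ||b||^2.

    Uses the data-space form b = A^T (A A^T + lam I)^{-1} y, which only
    requires an n-by-n solve and is therefore cheap when p >> n.
    """
    n = A.shape[0]
    alpha = np.linalg.solve(A @ A.T + lam * np.eye(n), y)
    return A.T @ alpha

def loo_cv_error(A, y, lam):
    """Mean squared prediction error over leave-one-out folds."""
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta = ridge_fit(A[keep], y[keep], lam)
        errs.append((y[i] - A[i] @ beta) ** 2)
    return float(np.mean(errs))

# Synthetic data: 23 noisy points, fit with 201 parameters (p >> n).
rng = np.random.default_rng(0)
n, p = 23, 201
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=n)
A = fourier_design(x, p, period=2.0)

# Choose the regularization strength by cross-validation.
lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
cv = {lam: loo_cv_error(A, y, lam) for lam in lams}
best_lam = min(cv, key=cv.get)
beta = ridge_fit(A, y, best_lam)

# Predict on a fine grid using the same basis.
x_fine = np.linspace(0.0, 1.0, 200)
y_fine = fourier_design(x_fine, p, period=2.0) @ beta
print(f"best lambda: {best_lam}, LOO error: {cv[best_lam]:.4f}")
```

The data-space solve is algebraically equivalent to the usual normal-equations ridge solution, so the number of basis functions can grow without increasing the size of the linear system. The same held-out loop could be reused to compare numbers of basis functions or alternative penalties.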
We perform a statistical analysis of the single-vehicle data measured on the Dutch freeway A9 and discussed in Ref. [2]. Using tools originating from Random Matrix Theory, we show that the significant changes in the statistics of the traffic data can be explained by applying the equilibrium statistical physics of interacting particles.
In this paper, after a discussion of general properties of statistical tests, we present the construction of the most powerful hypothesis test for determining the existence of a new phenomenon in counting-type experiments where the observed Poisson process is subject to a Poisson distributed background with unknown mean.
We examine the problem of constructing confidence intervals within the basic single-parameter, single-iteration variant of the method of quasi-optimal weights. Two kinds of distortion of such intervals, both caused by insufficiently large samples and both amenable to analytical investigation, are considered. First, a criterion is developed for the validity of the assumption of asymptotic normality, together with a recipe for the corresponding corrections. Second, a method is derived to take into account the systematic shift of the confidence interval due to the non-linearity of the theoretical mean of the weight as a function of the parameter to be estimated. A numerical example illustrates the two corrections.