PyUnfold is a Python package for incorporating imperfections of the measurement process into a data analysis pipeline. In an ideal world, we would have access to the perfect detector: an apparatus that makes no error in measuring a desired quantity. However, in real life, detectors have finite resolutions, characteristic biases that cannot be eliminated, less than full detection efficiencies, and statistical and systematic uncertainties. By building a matrix that encodes a detector's smearing of the desired true quantity into the measured observable(s), a deconvolution can be performed that provides an estimate of the true variable. This deconvolution process is known as unfolding. The unfolding method implemented in PyUnfold accomplishes this deconvolution via an iterative procedure, providing results based on physical expectations of the desired quantity. Furthermore, the package handles the tedious bookkeeping of both statistical and systematic errors, producing precise final uncertainty estimates.
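The iterative procedure described above can be illustrated with a minimal NumPy sketch of a D'Agostini-style update. This is not PyUnfold's API: the function name `iterative_unfold`, its arguments, and the toy response matrix are assumptions made for illustration, and no uncertainty propagation is included. The response matrix is assumed to be stored as `response[j, i] = P(effect j | cause i)`, so its column sums give the detection efficiencies.

```python
import numpy as np


def iterative_unfold(observed, response, prior, n_iter=4):
    """Hypothetical helper: D'Agostini-style iterative unfolding.

    response[j, i] is assumed to be P(effect j | cause i).
    """
    observed = np.asarray(observed, dtype=float)
    response = np.asarray(response, dtype=float)
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()              # normalise the starting prior
    efficiencies = response.sum(axis=0)      # P(detected | cause i)
    unfolded = prior * observed.sum()
    for _ in range(n_iter):
        # Probability of each effect bin under the current prior
        folded = response @ prior
        # Bayes' theorem: P(cause i | effect j), shape (n_effects, n_causes)
        posterior = response * prior / folded[:, None]
        # Redistribute observed counts to causes, correcting for efficiency
        unfolded = (posterior.T @ observed) / efficiencies
        # The unfolded spectrum becomes the prior of the next iteration
        prior = unfolded / unfolded.sum()
    return unfolded


# Toy example (assumed values): 3 bins, nearest-neighbour smearing, full efficiency
response = np.array([[0.8, 0.2, 0.0],
                     [0.2, 0.6, 0.2],
                     [0.0, 0.2, 0.8]])
true_counts = np.array([100.0, 300.0, 200.0])
observed = response @ true_counts            # noise-free folded spectrum
unfolded = iterative_unfold(observed, response, prior=np.ones(3), n_iter=50)
print(unfolded)
```

The number of iterations acts as the regularisation strength in this sketch: few iterations keep the result close to the prior, while many iterations approach the unregularised solution.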
A data-driven convergence criterion for the D'Agostini (Richardson-Lucy) iterative unfolding is presented. It relies on the unregularized spectrum (infinite number of iterations), and allows a safe estimation of the bias and undercoverage induced by truncating the algorithm. In addition, situations where the response matrix is not perfectly known are also discussed, and it is shown that in most cases the unregularized spectrum is not an unbiased estimator of the true distribution. Whenever a bias is introduced, either by truncation or by poor knowledge of the response, a way to retrieve appropriate coverage properties is proposed.
Since Bandt and Pompe's seminal work, permutation entropy has been used in several applications and is now an essential tool for time series analysis. Beyond becoming a popular and successful technique, permutation entropy inspired a framework for mapping time series into symbolic sequences that triggered the development of many other tools, including an approach for creating networks from time series known as ordinal networks. Despite the increasing popularity, the computational development of these methods is fragmented, and no effort has yet focused on creating a unified software package. Here we present ordpy, a simple and open-source Python module that implements permutation entropy and several of the principal methods related to Bandt and Pompe's framework to analyze time series and two-dimensional data. In particular, ordpy implements permutation entropy, Tsallis and Rényi permutation entropies, complexity-entropy plane, complexity-entropy curves, missing ordinal patterns, ordinal networks, and missing ordinal transitions for one-dimensional (time series) and two-dimensional (images) data as well as their multiscale generalizations. We review some theoretical aspects of these tools and illustrate the use of ordpy by replicating several literature results.
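As an illustration of the symbolic mapping behind permutation entropy, the following is a minimal standalone sketch for one-dimensional data. It does not use ordpy; the function name `permutation_entropy` and the parameters `order` and `delay` are assumptions chosen for this example. Each window of `order` points is replaced by the permutation that sorts it, and the Shannon entropy of the resulting pattern frequencies is computed.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(series, order=3, delay=1, normalized=True):
    """Hypothetical helper: Bandt-Pompe permutation entropy of a 1D series."""
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series is too short for this order and delay")
    # Map each embedding window to its ordinal pattern (the sorting permutation)
    patterns = Counter(
        tuple(np.argsort(series[start:start + order * delay:delay], kind="stable"))
        for start in range(n_windows)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_windows
    entropy = -np.sum(probs * np.log(probs))   # Shannon entropy in nats
    if normalized:
        # Divide by the maximum entropy log(order!) so the result lies in [0, 1]
        entropy /= math.log(math.factorial(order))
    return float(entropy)


# Short illustrative series
series = [4, 7, 9, 10, 6, 11, 3]
h2 = permutation_entropy(series, order=2)
h3 = permutation_entropy(series, order=3, normalized=False)
print(h2, h3)
```

A strictly monotonic series produces a single ordinal pattern and therefore zero entropy, while a series in which all `order!` patterns are equally frequent gives the maximum normalized value of 1.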
We present a universal method to include residual unmodeled background shape uncertainties in likelihood-based statistical tests for high energy physics and astroparticle physics. This approach provides a simple and natural protection against mismodeling, thus lowering the chances of a false discovery or of an overconstrained confidence interval, and allows a natural transition to unbinned space. Unbinned likelihood allows optimal usage of the information in the data and the models, and enhances the sensitivity. We show that the asymptotic behavior of the test statistic can be regained in cases where the model fails to describe the true background behavior, and present 1D and 2D case studies for model-driven and data-driven background models. The resulting penalty on sensitivities follows the actual discrepancy between the data and the models, and is asymptotically reduced to zero with increasing knowledge.
A method for correcting for detector smearing effects using machine learning techniques is presented. Compared to the standard approaches, the method can use more than one reconstructed variable to infer the value of the unsmeared quantity on an event-by-event basis. The method is implemented using a sequential neural network with categorical cross entropy as the loss function. It is tested on a toy example and is shown to satisfy basic closure tests. Possible applications of the method to the analysis of data from high energy physics experiments are discussed.
A selection of unfolding methods commonly used in High Energy Physics is compared. The methods discussed here are: bin-by-bin correction factors, matrix inversion, template fit, Tikhonov regularisation and two examples of iterative methods. Two procedures to choose the strength of the regularisation are tested, namely the L-curve scan and a scan of global correlation coefficients. The advantages and disadvantages of the unfolding methods and choices of the regularisation strength are discussed using a toy example.
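Two of the methods listed above, matrix inversion and Tikhonov regularisation, can be contrasted in a short NumPy sketch. This is a simplified illustration and not the procedure of the paper: the function names, the unweighted least-squares form (no data covariance), the curvature-based penalty, and the toy response matrix are all assumptions. The parameter `tau` stands for the regularisation strength, which the L-curve scan or the scan of global correlation coefficients would choose in practice.

```python
import numpy as np


def unfold_inversion(observed, response):
    """Hypothetical helper: unfold by solving response @ x = observed."""
    return np.linalg.solve(response, observed)


def unfold_tikhonov(observed, response, tau):
    """Hypothetical helper: least squares with a curvature penalty.

    Minimises |response @ x - observed|^2 + tau * |L @ x|^2, where L is the
    discrete second-derivative operator (an assumed choice of penalty).
    """
    n_bins = response.shape[1]
    curvature = np.diff(np.eye(n_bins), n=2, axis=0)   # rows of (1, -2, 1)
    lhs = response.T @ response + tau * curvature.T @ curvature
    return np.linalg.solve(lhs, response.T @ observed)


# Toy example (assumed values): 5 bins with nearest-neighbour migrations
n_bins = 5
response = (0.6 * np.eye(n_bins)
            + 0.2 * np.eye(n_bins, k=1)
            + 0.2 * np.eye(n_bins, k=-1))
truth = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
observed = response @ truth                            # noise-free folded spectrum

x_inverted = unfold_inversion(observed, response)
x_regularised = unfold_tikhonov(observed, response, tau=10.0)
print(x_inverted, x_regularised)
```

With `tau = 0` the regularised solution reduces to plain inversion. In this noise-free toy the truth is linear in the bin index, so the curvature penalty vanishes and both estimates coincide; with statistical fluctuations in `observed`, inversion typically amplifies them while a larger `tau` trades variance for bias.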