StatPatternRecognition: A C++ Package for Statistical Analysis of High Energy Physics Data

61 0 0.0 ( 0 )

Download Cite

Added by Ilya Narsky

Publication date 2005

fields Physics

and research's language is English

Authors I. Narsky

Data Analysis Statistics and Probability

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Modern analysis of high energy physics (HEP) data needs advanced statistical tools to separate signal from background. A C++ package has been implemented to provide such tools for the HEP community. The package includes linear and quadratic discriminant analysis, decision trees, bump hunting (PRIM), boosting (AdaBoost), bagging and random forest algorithms, and interfaces to the standard backpropagation neural net and radial basis function neural net implemented in the Stuttgart Neural Network Simulator. Supplemental tools such as bootstrap, estimation of data moments, and a test of zero correlation between two variables with a joint elliptical distribution are also provided. The package offers a convenient set of tools for imposing requirements on input data and displaying output. Integrated in the BaBar computing environment, the package maintains a minimal set of external dependencies and therefore can be easily adapted to any other environment. It has been tested on many idealistic and realistic examples.

rate research

ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization

314 - Ilka Antcheva , Maarten Ballintijn , Bertrand Bellenot 2015

ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can chose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, ROOT offers packages for complex data modeling and fitting, as well as multivariate classification based on machine learning techniques. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.

Data Analysis Statistics and Probability Distributed Parallel and Cluster Computing

Data Unfolding Methods in High Energy Physics

77 - Stefan Schmitt 2016

A selection of unfolding methods commonly used in High Energy Physics is compared. The methods discussed here are: bin-by-bin correction factors, matrix inversion, template fit, Tikhonov regularisation and two examples of iterative methods. Two procedures to choose the strength of the regularisation are tested, namely the L-curve scan and a scan of global correlation coefficients. The advantages and disadvantages of the unfolding methods and choices of the regularisation strength are discussed using a toy example.

Data Analysis Statistics and Probability High Energy Physics - Experiment Nuclear Experiment

Performance of an Operating High Energy Physics Data Grid: D0SAR-Grid

61 - B. Abbott , P. Baringer , T. Bolton 2005

The D0 experiment at Fermilabs Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. Computing resources required to analyze these data far exceed capabilities of any one institution. Moreover, the widely scattered geographical distribution of D0 collaborators poses further serious difficulties for optimal use of human and computing resources. These difficulties will exacerbate in future high energy physics experiments, like the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality to end users in D0 by developing a grid in the D0 Southern Analysis Region (D0SAR), D0SAR-Grid, using all available resources within it and a home-grown local task manager, McFarm. We will present the architecture in which the D0SAR-Grid is implemented, the use of technology and the functionality of the grid, and the experience from operating the grid in simulation, reprocessing and data analyses for a currently running HEP experiment.

Data Analysis Statistics and Probability Distributed Parallel and Cluster Computing Instrumentation and Detectors

Bayesian data analysis tools for atomic physics

380 - Martino Trassinelli 2016

We present an introduction to some concepts of Bayesian data analysis in the context of atomic physics. Starting from basic rules of probability, we present the Bayes theorem and its applications. In particular we discuss about how to calculate simple and joint probability distributions and the Bayesian evidence, a model dependent quantity that allows to assign probabilities to different hypotheses from the analysis of a same data set. To give some practical examples, these methods are applied to two concrete cases. In the first example, the presence or not of a satellite line in an atomic spectrum is investigated. In the second example, we determine the most probable model among a set of possible profiles from the analysis of a statistically poor spectrum. We show also how to calculate the probability distribution of the main spectral component without having to determine uniquely the spectrum modeling. For these two studies, we implement the program Nested fit to calculate the different probability distributions and other related quantities. Nested fit is a Fortran90/Python code developed during the last years for analysis of atomic spectra. As indicated by the name, it is based on the nested algorithm, which is presented in details together with the program itself.

Data Analysis Statistics and Probability High Energy Physics - Experiment Atomic Physics

ordpy: A Python package for data analysis with permutation entropy and ordinal network methods

86 - Arthur A. B. Pessa , Haroldo V. Ribeiro 2021

Since Bandt and Pompes seminal work, permutation entropy has been used in several applications and is now an essential tool for time series analysis. Beyond becoming a popular and successful technique, permutation entropy inspired a framework for mapping time series into symbolic sequences that triggered the development of many other tools, including an approach for creating networks from time series known as ordinal networks. Despite the increasing popularity, the computational development of these methods is fragmented, and there were still no efforts focusing on creating a unified software package. Here we present ordpy, a simple and open-source Python module that implements permutation entropy and several of the principal methods related to Bandt and Pompes framework to analyze time series and two-dimensional data. In particular, ordpy implements permutation entropy, Tsallis and Renyi permutation entropies, complexity-entropy plane, complexity-entropy curves, missing ordinal patterns, ordinal networks, and missing ordinal transitions for one-dimensional (time series) and two-dimensional (images) data as well as their multiscale generalizations. We review some theoretical aspects of these tools and illustrate the use of ordpy by replicating several literature results.

Data Analysis Statistics and Probability