GNA: new framework for statistical data analysis



Added by Maxim Gonchar

Publication date: 2019

Fields: Informatics Engineering

Research language: English

Authors: Anna Fatkina, Maxim Gonchar, Anastasia Kalitkina

Mathematical Software


We report on the status of GNA, a new framework for fitting large-scale physical models. GNA uses the data flow concept, in which a model is represented by a directed acyclic graph. Each node is an operation on an array (matrix multiplication, derivative or cross section calculation, etc.). The framework enables the user to create flexible and efficient large-scale lazily evaluated models, handle large numbers of parameters, propagate parameter uncertainties while taking into account possible correlations between them, fit models, and perform statistical analysis. The main goal of the paper is to give an overview of the main concepts and methods, as well as the reasoning behind their design. Detailed technical information will be published in future works.
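The data flow idea can be illustrated with a minimal Python sketch, assuming NumPy. Everything here is hypothetical: the `Parameter` and `Node` classes, the cache-invalidation scheme, and `propagate_covariance` are illustrative names and do not reproduce GNA's API. Each node applies one array operation to the outputs of its inputs, is evaluated only when its output is requested, and reuses a cached result until a parameter it depends on changes. The uncertainty step uses the standard linear approximation C_out = J C J^T with a central-difference Jacobian J; the abstract does not say how GNA itself propagates uncertainties, so treat this as an assumption.

```python
import numpy as np


class Parameter:
    """A named scalar; assigning a new value invalidates every node that reads it."""

    def __init__(self, name, value):
        self.name = name
        self._value = float(value)
        self._dependents = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = float(new_value)
        for node in self._dependents:
            node.invalidate()


class Node:
    """One operation in the DAG, evaluated lazily and cached until invalidated."""

    def __init__(self, name, func, inputs=(), params=()):
        self.name = name
        self.func = func            # called as func(*input_arrays, *param_values)
        self.inputs = list(inputs)
        self.params = list(params)
        self._cache = None
        self._dependents = []
        self.evaluations = 0        # number of actual recomputations
        for source in self.inputs + self.params:
            source._dependents.append(self)

    def invalidate(self):
        # An empty cache implies every downstream cache is empty too,
        # so propagation can stop early (avoids revisiting shared subgraphs).
        if self._cache is not None:
            self._cache = None
            for node in self._dependents:
                node.invalidate()

    def data(self):
        if self._cache is None:
            args = [node.data() for node in self.inputs]
            values = [p.value for p in self.params]
            self._cache = np.asarray(self.func(*args, *values), dtype=float)
            self.evaluations += 1
        return self._cache


def propagate_covariance(output, params, cov, step=1e-6):
    """Linear error propagation C_out = J C J^T, with J from central differences."""
    columns = []
    for p in params:
        original = p.value
        p.value = original + step
        upper = output.data().copy()
        p.value = original - step
        lower = output.data().copy()
        p.value = original                      # restore the nominal value
        columns.append((upper - lower) / (2.0 * step))
    jac = np.column_stack(columns)              # shape: (output size, n_params)
    return jac @ np.asarray(cov, dtype=float) @ jac.T


# Toy model (made-up): spectrum = norm * energy + offset, and its sum.
energy = Node("energy", lambda: np.array([1.0, 2.0, 3.0]))
norm = Parameter("norm", 2.0)
offset = Parameter("offset", 0.5)
spectrum = Node("spectrum", lambda e, a, b: a * e + b,
                inputs=[energy], params=[norm, offset])
total = Node("total", lambda s: s.sum(), inputs=[spectrum])

print(total.data())          # 13.5; first request evaluates energy -> spectrum -> total
norm.value = 3.0             # invalidates spectrum and total, not energy
print(spectrum.data(), energy.evaluations)
cov = np.array([[0.01, 0.002],   # covariance of (norm, offset), made-up values
                [0.002, 0.04]])
print(propagate_covariance(spectrum, [norm, offset], cov))
```

Changing `norm` recomputes only the `spectrum` and `total` nodes; the `energy` node keeps its cached array, which is the payoff of lazy evaluation in a large graph. The early stop in `invalidate` relies on the invariant that a node with an empty cache has no cached dependents.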


CUDA Support in GNA Data Analysis Framework

Anna Fatkina, Maxim Gonchar, Liudmila Kolupaeva (2018)

Using GPUs as co-processors is a well-established approach to accelerating costly algorithms that operate on matrices and vectors. We aim to further improve the performance of the Global Neutrino Analysis framework (GNA) by adding GPU support in a way that is transparent to the end user. To achieve this goal we use CUDA, a state-of-the-art technology providing GPGPU programming methods. In this paper we describe new features of GNA related to CUDA support. Some specific framework features that influence GPGPU integration are also explained. The paper investigates the feasibility of applying GPU technology and shows an example of the acceleration achieved for an algorithm implemented within the framework. Benchmarks show a significant performance increase when using GPU transformations. The project is currently in the development phase. Our plans include implementing the set of transformations necessary for data analysis in the GNA framework and testing the expediency of GPUs in the complete analysis chain.
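The goal of GPU support that is transparent to the end user can be sketched in Python by writing each transformation once against an array-module argument `xp` and choosing the backend at run time. This is a swapped-in technique for illustration, not the paper's implementation: it uses CuPy, a NumPy-compatible CUDA array library, where the paper describes CUDA code inside GNA, and the names `select_backend`, `weighted_spectrum`, and `to_host` are hypothetical. On a machine without CuPy or a CUDA device, the sketch falls back to NumPy and produces the same result.

```python
import numpy as np


def select_backend(prefer_gpu=True):
    """Return an array module: CuPy when a CUDA device is usable, otherwise NumPy."""
    if prefer_gpu:
        try:
            import cupy  # optional dependency; absent on CPU-only machines
            if cupy.cuda.runtime.getDeviceCount() > 0:
                return cupy
        except Exception:  # ImportError, or CUDA runtime/driver errors
            pass
    return np


def weighted_spectrum(xp, response, spectrum, weights):
    """A hypothetical 'transformation' written once against the xp interface:
    apply a response matrix to a spectrum, then scale by per-bin weights."""
    response = xp.asarray(response, dtype=xp.float64)
    spectrum = xp.asarray(spectrum, dtype=xp.float64)
    weights = xp.asarray(weights, dtype=xp.float64)
    return (response @ spectrum) * weights


def to_host(xp, array):
    """Copy a result back to NumPy so calling code never sees device arrays."""
    return array.get() if xp is not np else np.asarray(array)


# Usage: identical calling code on either backend.
xp = select_backend()
response = [[0.9, 0.1], [0.1, 0.9]]   # toy 2x2 detector response (made-up numbers)
result = to_host(xp, weighted_spectrum(xp, response, [10.0, 20.0], [1.0, 0.5]))
print(xp.__name__, result)            # result is approximately [11.0, 9.5]
```

Because callers only receive NumPy arrays from `to_host`, switching backends changes performance, not the calling code. A practical GPU path would also try to keep intermediate results on the device between transformations, since host-device copies can dominate the cost of small operations.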

Distributed Parallel and Cluster Computing Computational Engineering

HistFitter software framework for statistical data analysis

M. Baak, G.J. Besjes, D. Cote (2014)

We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fitted to data and interpreted with statistical tests. A key innovation of HistFitter is its design, which is rooted in core analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its very fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple data models at once, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication-quality style through a simple command-line interface.

High Energy Physics - Experiment

Hydra: a C++11 framework for data analysis in massively parallel platforms

A. A. Alves Jr, M. D. Sokoloff (2017)

Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA and TBB enabled devices. This contribution summarizes the main features of Hydra. A basic description of the overall design, functionality and user interface is provided, along with some code examples and measurements of performance.

Mathematical Software High Energy Physics - Experiment Computational Physics

ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization

Ilka Antcheva, Maarten Ballintijn, Bertrand Bellenot (2015)

ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data efficiently. Any instance of a C++ class can be stored in a ROOT file in a machine-independent compressed binary format. In ROOT, the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. To analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, ROOT offers packages for complex data modeling and fitting, as well as multivariate classification based on machine learning techniques. The histogram classes, which provide binning of one- and multi-dimensional data, are a central piece of these analysis tools. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored in ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which takes care of optimally distributing the work over the available resources in a transparent way.

Data Analysis Statistics and Probability Distributed Parallel and Cluster Computing

A new data analysis framework for the search of continuous gravitational wave signals

O. J. Piccinni, S. Frasca, P. Astone (2018)

Continuous gravitational wave signals, like those expected from asymmetric spinning neutron stars, are among the most promising targets for the LIGO and Virgo detectors. The development of fast and robust data analysis methods is crucial to increase the chances of a detection. We have developed a new and flexible general data analysis framework for the search for this kind of signal, which reduces the computational cost of the analysis by about two orders of magnitude with respect to current procedures. At fixed computing cost, this can correspond to a sensitivity gain of up to 10%-20%, depending on the search parameter space. Some possible applications are discussed, with a particular focus on a directed search for sources in the Galactic center. Validation through the injection of artificial signals into the data of Advanced LIGO's first observational science run is also shown.

General Relativity and Quantum Cosmology