Grain is a data analysis framework developed for use with the novel Total Data Readout data acquisition system. In Total Data Readout, all electronics channels are read out asynchronously in singles mode and each data item is timestamped. Event building and analysis therefore have to be done entirely in software, by post-processing the data stream. A flexible and efficient event parser and the accompanying software framework have been written entirely in Java. The design and implementation of the software are discussed, along with the experience gained in running real experiments.
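As an illustration of what software event building over a timestamped singles stream involves, here is a minimal Python sketch. It assumes that an event is every hit falling within a fixed coincidence window of that event's first hit. The `Hit` record, the `build_events` function and the `window` parameter are hypothetical and do not reflect Grain's actual Java parser, whose event definition is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    channel: int
    timestamp: int  # clock ticks (unit is an assumption)
    value: int


def build_events(hits: List[Hit], window: int) -> List[List[Hit]]:
    """Group asynchronous, timestamped singles into events.

    Hypothetical rule: a hit joins the current event if it arrives no more
    than `window` ticks after the event's first hit; otherwise it opens a
    new event.
    """
    events: List[List[Hit]] = []
    current: List[Hit] = []
    # Singles arrive unordered across channels, so sort by time first
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if current and hit.timestamp - current[0].timestamp > window:
            events.append(current)  # close the event that is now complete
            current = []
        current.append(hit)
    if current:
        events.append(current)  # flush the last open event
    return events


hits = [Hit(1, 100, 5), Hit(2, 103, 7), Hit(3, 250, 1), Hit(1, 105, 2), Hit(4, 260, 9)]
events = build_events(hits, window=10)
```

A fixed window opened by the first hit is the simplest choice. Parsers that trigger on a specific detector or use a sliding window would change only the comparison inside the loop.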
The REST-for-Physics (Rare Event Searches Toolkit for Physics) framework is a ROOT-based solution providing the means to process and analyze experimental or Monte Carlo event data. Special care has been taken over the traceability of the code and the validation of the results produced within the framework, as well as over the link between code and stored data, which is registered through specific version metadata members. The development of the framework was originally motivated by the needs of Rare Event Searches experiments (experiments looking for phenomena with an extremely low probability of occurrence, such as dark matter or neutrino interactions or rare nuclear decays), and its components naturally implement tools to address the challenges of these kinds of experiments; the integration of a detector physics response, the implementation of signal processing routines, and topological algorithms for physical event identification are some examples. Despite this specialization, the framework was conceived with scalability in mind, and other event-oriented applications could benefit from the data processing routines and/or metadata description implemented in REST, since the generic framework tools are completely decoupled from the dedicated libraries. REST-for-Physics is a consolidated piece of software already serving the needs of different physics experiments that use gaseous Time Projection Chambers (TPCs) as detection technology, for background data analysis and detector characterization, as well as for generic detector R&D. Even though REST has been exploited mainly with gaseous TPCs, the code could easily be applied or adapted to other detection technologies. In this work we present an overview of REST-for-Physics, providing a broad perspective on the infrastructure and organization of the project as a whole, and describe the framework and its different components.
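Two design ideas in REST can be pictured with a small Python sketch: generic chaining of event processes, and the recording of version metadata for traceability. The `Process` and `ProcessChain` classes and the two toy processes are hypothetical illustrations, not REST's ROOT/C++ classes. They only show a chain of event processes that records which version of each process produced the output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Process:
    name: str
    version: str  # recorded so that results can be traced back to the code
    func: Callable[[Event], Event]


@dataclass
class ProcessChain:
    processes: List[Process]
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Store the version of every process alongside the processed data
        for p in self.processes:
            self.metadata[p.name] = p.version

    def run(self, events: List[Event]) -> List[Event]:
        results = []
        for event in events:
            for p in self.processes:
                event = p.func(event)  # each process sees the previous output
            results.append(event)
        return results


def subtract_baseline(event: Event) -> Event:
    # Toy signal-processing step: use the minimum sample as the baseline
    samples = event["samples"]
    base = min(samples)
    return {**event, "samples": [s - base for s in samples]}


def integrate(event: Event) -> Event:
    # Toy analysis step: integrated signal as an "energy" observable
    return {**event, "energy": sum(event["samples"])}


chain = ProcessChain([Process("baseline", "1.0", subtract_baseline),
                      Process("integral", "2.1", integrate)])
out = chain.run([{"samples": [10, 10, 15, 20, 10]}])
```

Because the chain knows nothing about what each process does, the generic machinery stays decoupled from the dedicated, experiment-specific steps.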
A fast physics analysis framework based on SNiPER has been developed to process the increasingly large data samples collected by BESIII. In this framework, a reconstructed event data model with SmartRef is designed to improve the speed of Input/Output operations, and the necessary physics analysis tools are migrated from BOSS to SNiPER. A real physics analysis, $e^{+}e^{-} \rightarrow \pi^{+}\pi^{-}J/\psi$, is used to test the new framework, which achieves a factor of 10.3 improvement in Input/Output speed compared to BOSS. Further tests show that the improvement is mainly attributable to the new reconstructed event data model and the lazy-loading functionality provided by SmartRef.
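The lazy-loading idea attributed to SmartRef can be sketched as a reference that stores only how to fetch its target and performs the read on first access. The Python `LazyRef` class and the simulated `io_reads` log below are hypothetical stand-ins, not the SNiPER or BESIII API. They show why an analysis that touches only a few objects avoids most of the I/O.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyRef(Generic[T]):
    """Hypothetical SmartRef-like handle: holds a loader, reads on first use."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._obj: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        if not self._loaded:  # first dereference triggers the read
            self._obj = self._loader()
            self._loaded = True
        return self._obj  # later calls reuse the cached object


io_reads = []  # log of simulated storage reads


def make_loader(event_id: int) -> Callable[[], dict]:
    def load() -> dict:
        io_reads.append(event_id)  # stands in for an expensive disk read
        return {"event": event_id, "tracks": [float(event_id)]}
    return load


events = [LazyRef(make_loader(i)) for i in range(1000)]
tracks = events[7].get()["tracks"]  # only event 7 is actually read
```

Creating 1000 references costs no reads; only the dereferenced object is loaded, and only once.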
X-ray scattering experiments using Free Electron Lasers (XFELs) are a powerful tool to determine the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments challenge computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experimental protocols; ii) data collection rates are growing exponentially, requiring new scalable algorithms. Here we report our experience analyzing data from two experiments at the Linac Coherent Light Source (LCLS) in September 2020. Raw data were analyzed on NERSC's Cori XC40 system using the Superfacility paradigm: our workflow automatically moves raw data between LCLS and NERSC, where they are analyzed using the software package CCTBX. We achieved real-time data analysis, with a turnaround time from data acquisition to full molecular reconstruction of as little as 10 min, which is sufficient time for the experiment operators to make informed decisions. By hosting the data analysis on Cori and automating LCLS-NERSC interoperability, we achieved a data analysis rate that matches the data acquisition rate. Completing data analysis within 10 min is a first for XFEL experiments and an important milestone if we are to keep up with data collection trends.
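Matching the analysis rate to the acquisition rate amounts to a simple throughput condition. If items arrive at rate λ and each takes t seconds to analyze on one worker, at least ⌈λ·t⌉ workers must run in parallel; with fewer, the backlog grows without bound. The Python helpers below encode this back-of-the-envelope check. The function names and the example numbers are illustrative assumptions, not figures from the LCLS experiments.

```python
import math


def workers_needed(acq_rate_hz: float, seconds_per_item: float) -> int:
    """Smallest worker count whose combined throughput
    (workers / seconds_per_item) is at least the acquisition rate."""
    return math.ceil(acq_rate_hz * seconds_per_item)


def keeps_up(acq_rate_hz: float, workers: int, seconds_per_item: float) -> bool:
    """True if analysis throughput matches or exceeds acquisition,
    so the backlog of unanalyzed data stays bounded."""
    return workers / seconds_per_item >= acq_rate_hz


# Illustrative numbers only: 120 images/s, 2.5 s of compute per image
n = workers_needed(120, 2.5)
```

The same relation also bounds turnaround: once the system keeps up, latency is set by transfer and per-item processing time rather than by a growing queue.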
Orchestrating parametric fitting of multicomponent spectra at scale is an essential yet underappreciated task in the high-throughput quantification of materials and chemical composition. To automate the annotation of spectroscopic and diffraction data numbering in the hundreds to thousands, we present a systematic approach, compatible with high-performance computing infrastructures, that uses the MapReduce model and task-based parallelization. We implement the approach in software and demonstrate linear computational scaling with respect to the number of spectral components, using multidimensional experimental materials characterization datasets from photoemission spectroscopy and powder electron diffraction as benchmarks. Our approach enables efficient generation of high-quality data annotations and online spectral analysis, and is applicable to a variety of analytical techniques in materials science and chemistry as a building block for closed-loop experimental systems.
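Here is a minimal sketch of the MapReduce pattern applied to spectral fitting: the map step fits each spectrum independently, and the reduce step gathers the fitted parameters into one table. To stay self-contained, it assumes two components with fixed, known shapes (hypothetical Gaussian peaks), so only their amplitudes are fitted, by linear least squares. The software described above fits general parametric models and distributes tasks on HPC infrastructure rather than in a local thread pool.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


GRID = [i * 0.1 for i in range(100)]
# Two fixed-shape components (hypothetical peak positions and widths)
COMPONENTS = [[gaussian(x, 3.0, 0.4) for x in GRID],
              [gaussian(x, 6.0, 0.6) for x in GRID]]


def fit_amplitudes(spectrum):
    """Map step: least-squares amplitudes of the two components,
    solving the 2x2 normal equations with Cramer's rule."""
    a, b = COMPONENTS
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    sya = sum(y * u for y, u in zip(spectrum, a))
    syb = sum(y * v for y, v in zip(spectrum, b))
    det = saa * sbb - sab * sab
    return ((sya * sbb - syb * sab) / det, (syb * saa - sya * sab) / det)


def fit_all(spectra, workers=4):
    """Map the fit over spectra in parallel, then reduce into a table by index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fit_amplitudes, spectra)
        return {i: amp for i, amp in enumerate(results)}


spectra = [[2.0 * u + 0.5 * v for u, v in zip(*COMPONENTS)],
           [1.0 * u + 3.0 * v for u, v in zip(*COMPONENTS)]]
table = fit_all(spectra)
```

Threads keep the sketch dependency-free but are limited by Python's GIL for pure-Python arithmetic. Swapping in `ProcessPoolExecutor` or a cluster task scheduler keeps the same map/reduce structure.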
We present here Nested_fit, a Bayesian data analysis code developed for investigations of atomic spectra and other physical data. It is based on the nested sampling algorithm, with an upgraded lawn mower robot method for finding new live points. For a given data set and a chosen model, the program provides the Bayesian evidence, for the comparison of different hypotheses/models, and the probability distributions of the different parameters. A large database of spectral profiles is already available (Gaussian, Lorentz, Voigt, Log-normal, etc.), and additional ones can easily be added. The code is written in Fortran for optimized parallel computation and is accompanied by a Python library for visualizing the results.
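To make the nested sampling idea concrete, here is a bare-bones Python sketch. It estimates the Bayesian evidence Z for a one-dimensional Gaussian likelihood under a uniform prior on [-5, 5], where Z ≈ 0.1. The sketch replaces the worst live point by plain rejection sampling from the prior. This is a simple stand-in, not Nested_fit's lawn mower robot method. Prior masses use the deterministic shrinkage X_i = exp(-i/N). The model, function names and parameters are illustrative assumptions.

```python
import math
import random


def log_likelihood(x: float, sigma: float = 1.0) -> float:
    # Normalized Gaussian likelihood centred at 0 (illustrative model)
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def nested_sampling(n_live=100, lo=-5.0, hi=5.0, tol=1e-2, seed=1):
    """Return (log evidence, iterations) for a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    z = 0.0
    x_prev = 1.0  # prior mass still enclosed by the live points
    i = 0
    while True:
        i += 1
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_worst = live_logl[worst]
        x_i = math.exp(-i / n_live)  # expected shrinkage of the prior mass
        z += math.exp(l_worst) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point with a prior draw satisfying L > L_worst
        # (plain rejection sampling, NOT the lawn mower robot method)
        while True:
            cand = rng.uniform(lo, hi)
            cand_logl = log_likelihood(cand)
            if cand_logl > l_worst:
                break
        live[worst] = cand
        live_logl[worst] = cand_logl
        # Stop once the live points can no longer change Z appreciably
        if math.exp(max(live_logl)) * x_prev < tol * z:
            break
    # Add the remaining prior mass weighted by the mean live likelihood
    z += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return math.log(z), i


log_z, n_iter = nested_sampling()
```

Rejection sampling becomes exponentially inefficient as the constrained prior volume shrinks. This is why dedicated codes use smarter strategies to find new live points.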