The Collaborative Analysis Versioning Environment System (CAVES) project concentrates on the interactions between users performing data- and/or computing-intensive analyses on large data sets, as encountered in many contemporary scientific disciplines. In modern science, increasingly large groups of researchers collaborate on a given topic over extended periods of time. Logging and sharing knowledge about how analyses are performed and how results are obtained is important throughout the lifetime of a project. This is where virtual data concepts play a major role. The ability to seamlessly log, exchange and reproduce results, together with the methods, algorithms and computer programs used to obtain them, qualitatively enhances the level of collaboration within a group or between groups in larger organizations. The CAVES project takes a pragmatic approach to assessing the needs of a community of scientists by building a series of prototypes of increasing sophistication. By extending the functionality of existing data analysis packages with virtual data capabilities, these prototypes provide an easy and familiar entry point for researchers to explore virtual data concepts in real-life applications and to provide valuable feedback for refining the system design. The architecture is modular, based on Web, Grid and other services which can be plugged in as desired. As a proof of principle we build a first system by extending the very popular data analysis framework ROOT, widely used in high energy physics and other fields, making it virtual data enabled.
A key feature of collaboration in science and software development is to have a log of what is being done and how, both for private use and reuse and for sharing selected parts with collaborators, who today are most often distributed geographically on an ever larger scale. It is even better if this log is automatic, created on the fly while a scientist or software developer is working in a habitual way, without the need for extra effort. The CAVES and CODESH projects address this problem in a novel way, building on the concepts of virtual state and virtual transition to provide an automatic persistent logbook for sessions of data analysis or software development in a collaborating group. A repository of sessions can be configured dynamically to record and make available the knowledge accumulated in the course of a scientific or software endeavor. Access can be controlled to define logbooks of private sessions and sessions shared within or between collaborating groups.
We present an introduction to some concepts of Bayesian data analysis in the context of atomic physics. Starting from the basic rules of probability, we present Bayes' theorem and its applications. In particular, we discuss how to calculate simple and joint probability distributions and the Bayesian evidence, a model-dependent quantity that allows probabilities to be assigned to different hypotheses from the analysis of the same data set. To give some practical examples, these methods are applied to two concrete cases. In the first example, the presence or absence of a satellite line in an atomic spectrum is investigated. In the second example, we determine the most probable model among a set of possible profiles from the analysis of a statistically poor spectrum. We also show how to calculate the probability distribution of the main spectral component without having to uniquely determine the spectrum model. For these two studies, we use the program Nested fit to calculate the different probability distributions and other related quantities. Nested fit is a Fortran90/Python code developed in recent years for the analysis of atomic spectra. As indicated by the name, it is based on the nested sampling algorithm, which is presented in detail together with the program itself.
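To make the role of the Bayesian evidence in the preceding abstract concrete, the following is a minimal sketch of nested sampling on a toy one-dimensional problem. It is not the Nested fit implementation: the uniform prior on [0, 1], the Gaussian likelihood, the rejection-sampling replacement step, and all function and parameter names are assumptions chosen for illustration. The last function converts evidences into model probabilities under the additional assumption of equal prior probabilities for the models.

```python
import math
import random


def log_likelihood(theta, mu=0.5, sigma=0.1):
    """Toy Gaussian log-likelihood (hypothetical; stands in for a spectrum fit)."""
    return (-0.5 * ((theta - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))


def nested_sampling_evidence(log_like, n_live=400, n_iter=2400, seed=0):
    """Estimate Z = integral of L(theta) over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]      # draws from the prior
    live_logl = [log_like(t) for t in live]
    evidence = 0.0
    x_prev = 1.0                                      # enclosed prior volume
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_min = live_logl[worst]
        # Deterministic approximation of the shrinking prior volume.
        x_i = math.exp(-i / n_live)
        evidence += math.exp(l_min) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point by a prior draw with higher likelihood.
        # Plain rejection sampling: simple, but only viable for toy problems.
        while True:
            candidate = rng.random()
            cand_logl = log_like(candidate)
            if cand_logl > l_min:
                break
        live[worst] = candidate
        live_logl[worst] = cand_logl
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return evidence


def model_probabilities(evidences):
    """Posterior model probabilities, assuming equal prior model probabilities."""
    total = sum(evidences)
    return [z / total for z in evidences]


z_toy = nested_sampling_evidence(log_likelihood)
print(f"estimated evidence: {z_toy:.3f}")
```

For this toy likelihood the exact evidence is very close to 1, because the Gaussian lies almost entirely inside the prior range, so the estimate can be checked directly. In a real comparison, such as a spectrum with and without a satellite line, one evidence would be computed per model and passed to `model_probabilities`.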
VISPA is a novel development environment for high energy physics analyses, based on a combination of graphical and textual steering. The primary aim of VISPA is to support physicists in prototyping, performing, and verifying a data analysis of any complexity. We present example screenshots, and describe the underlying software concepts.
The GERDA and Majorana experiments will search for neutrinoless double-beta decay of germanium-76 using isotopically enriched high-purity germanium detectors. Although the experiments differ in conceptual design, they share many aspects, and in particular will employ similar data analysis techniques. The collaborations are jointly developing a C++ software library, MGDO, which contains a set of data objects and interfaces to encapsulate, store and manage physical quantities of interest, such as waveforms and high-purity germanium detector geometries. These data objects define a common format for persistent data, whether generated by Monte Carlo simulations or by an experimental apparatus, to reduce code duplication and to ease the exchange of information between detector systems. MGDO also includes general-purpose analysis tools that can be used to process measured or simulated digital signals. The MGDO design is based on the object-oriented programming paradigm and is very flexible, allowing for easy extension and customization of the components. The tools provided by the MGDO libraries are used by both GERDA and Majorana.
Psychological bias towards, or away from, a prior measurement or a theory prediction is an intrinsic threat to any data analysis. While various methods can be used to avoid the bias, e.g. deliberately not looking at the result, only data blinding is a traceable and thus trustworthy method to circumvent the bias and to convince a public audience that there is not even an accidental psychological bias. Data blinding is nowadays standard practice in particle physics, but it is particularly difficult for experiments searching for the neutron electric dipole moment, as several cross measurements, in particular of the magnetic field, create a self-consistent network into which it is hard to inject a fake signal. We present an algorithm that modifies the data without influencing the experiment. Results of an automated analysis of the data are used to change the recorded spin state of a few neutrons in each measurement cycle. The flexible algorithm is applied twice to the data, to provide different data to various analysis teams. This gives us the option to sequentially apply various blinding offsets for separate analysis steps with independent teams. The subtle modification of the data allows us to modify the algorithm and to produce a re-blinded data set without revealing the blinding secret. The method was designed for the 2015/2016 measurement campaign of the nEDM experiment at the Paul Scherrer Institute. However, it can be re-used with minor modifications for the follow-up experiment n2EDM, and may be suitable for comparable efforts.
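The idea of changing the recorded spin state of a few neutrons per cycle can be sketched as follows. This is a simplified illustration, not the collaboration's algorithm: it assumes that each cycle is summarized by two counts, that the quantity of interest is the asymmetry (n_up - n_down) / (n_up + n_down), and that the blinding secret is a fixed asymmetry shift. The abstract describes flips driven by the results of an automated analysis, which this sketch does not model. The names `Cycle`, `blind_cycle` and `secret_shift` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    """Recorded neutron counts per spin state for one measurement cycle."""
    n_up: int
    n_down: int

    @property
    def asymmetry(self) -> float:
        return (self.n_up - self.n_down) / (self.n_up + self.n_down)


def blind_cycle(cycle: Cycle, secret_shift: float) -> Cycle:
    """Relabel a few neutrons so the asymmetry moves by about secret_shift.

    Moving k neutrons from 'down' to 'up' changes the asymmetry by 2k / N,
    so k = secret_shift * N / 2 (rounded). The total count is unchanged.
    """
    total = cycle.n_up + cycle.n_down
    k = round(secret_shift * total / 2)
    # Clamp so that neither count can become negative.
    k = max(-cycle.n_up, min(cycle.n_down, k))
    return Cycle(cycle.n_up + k, cycle.n_down - k)


raw = Cycle(n_up=5000, n_down=5000)
blinded = blind_cycle(raw, secret_shift=0.002)
print(blinded, blinded.asymmetry)
```

Because the function maps a cycle to a cycle, it can be applied a second time with a different secret to produce a separately blinded data set for another analysis team, in the spirit of the two-stage blinding described above.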