
Reproducible Experiment Platform

Published by: Tatiana Likhomanenko
Publication date: 2015
Research field: Physics
Paper language: English





Data analysis in the fundamental sciences is nowadays an essential process that pushes the frontiers of our knowledge and leads to new discoveries. At the same time, the complexity of these analyses is increasing rapidly due to (a) the enormous volumes of the datasets being analyzed, (b) the variety of techniques and algorithms one has to check within a single analysis, and (c) the distributed nature of research teams, which requires dedicated communication media for exchanging knowledge and information between individual researchers. There is considerable resemblance between the techniques and problems arising in industrial information retrieval and in particle physics. To address these problems we propose the Reproducible Experiment Platform (REP), a software infrastructure supporting a collaborative ecosystem for computational science. It is a Python-based solution for research teams that allows running computational experiments on shared datasets, obtaining repeatable results, and making consistent comparisons of those results. We present some key features of REP based on case studies, which include trigger optimization and physics analysis studies at the LHCb experiment.
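The abstract does not show REP's actual API, but the workflow it describes rests on two ingredients: pinned randomness and a shared dataset split, so that every model is scored under identical conditions. The following is a minimal sketch of that idea using plain scikit-learn; the dataset, models, and seed are illustrative assumptions, not REP's own interface.

```python
# A minimal sketch of the kind of reproducible comparison REP supports:
# a fixed seed, one shared dataset split, and a uniform metric for every
# model. Illustrative only; this is not REP's actual API.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

SEED = 42  # pinning the seed makes the experiment repeatable

X, y = make_classification(n_samples=5000, n_features=20, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
    "gradient_boosting": GradientBoostingClassifier(random_state=SEED),
}

# every model sees the same split and is scored with the same metric,
# so results are directly comparable across researchers and across runs
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC AUC = {auc:.4f}")
```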




Read also

CERN IT provides a set of Hadoop clusters featuring more than 5 PB of raw storage, with various open-source, user-level tools available for analytical purposes. The CMS experiment has been collecting a large set of computing meta-data, e.g. dataset and file access logs, since 2015. These records represent a valuable, yet scarcely investigated, set of information that needs to be cleaned, categorized and analyzed. CMS can use this information to discover useful patterns and enhance the overall efficiency of the distributed data, improve CPU and site utilization, and reduce task completion time. Here we present an evaluation of the Apache Spark platform for CMS needs. We discuss two main use cases, CMS analytics and ML studies, where efficient processing of billions of records stored on HDFS plays an important role. We demonstrate that both the Scala and Python (PySpark) APIs can be successfully used to execute extremely I/O-intensive queries and provide valuable insight from the collected meta-data.
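As a hedged sketch of the kind of I/O-intensive query the paper describes, the snippet below aggregates file-access records stored on HDFS with PySpark. The HDFS path and the column names (`dataset`, `user`) are assumptions for illustration; the actual CMS meta-data layout is not given in the abstract.

```python
# Aggregate per-dataset access statistics from file-access logs on HDFS.
# Path and schema are hypothetical, not the real CMS records.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cms-meta-data").getOrCreate()

# hypothetical location and layout of the access-log records
logs = spark.read.parquet("hdfs:///cms/access_logs/2015/*")

popularity = (logs
              .groupBy("dataset")                      # one row per dataset
              .agg(F.count("*").alias("accesses"),     # total file accesses
                   F.countDistinct("user").alias("users"))
              .orderBy(F.desc("accesses")))

popularity.show(20)
```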
The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure.
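The abstract does not spell out a placement policy, but a minimal sketch of how popularity could drive dynamic data placement might look as follows; the access counts, replica counts, and thresholds are all invented for illustration.

```python
# Toy popularity-driven placement decision: replicate hot datasets,
# reclaim spare replicas of cold ones. All numbers are illustrative.
import pandas as pd

# hypothetical weekly access counts derived from computing meta-data
accesses = pd.DataFrame({
    "dataset":  ["/A/RECO", "/B/AOD", "/C/MINIAOD"],
    "accesses": [12000, 45, 3100],
    "replicas": [2, 3, 1],
})

REPLICATE_ABOVE = 10000  # add a replica for very popular datasets
RECLAIM_BELOW = 100      # drop spare replicas of cold datasets

def placement_action(row):
    if row.accesses > REPLICATE_ABOVE:
        return "add replica"
    if row.accesses < RECLAIM_BELOW and row.replicas > 1:
        return "remove replica"
    return "keep"

accesses["action"] = accesses.apply(placement_action, axis=1)
print(accesses)
```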
N. J. Ayres, G. Ban, G. Bison et al., 2019
Psychological bias towards, or away from, a prior measurement or a theory prediction is an intrinsic threat to any data analysis. While various methods can be used to avoid the bias, e.g. actively not looking at the result, only data blinding is a traceable and thus trustworthy method to circumvent the bias and to convince a public audience that there is not even an accidental psychological bias. Data blinding is nowadays a standard practice in particle physics, but it is particularly difficult for experiments searching for the neutron electric dipole moment, as several cross measurements, in particular of the magnetic field, create a self-consistent network into which it is hard to inject a fake signal. We present an algorithm that modifies the data without influencing the experiment. Results of an automated analysis of the data are used to change the recorded spin state of a few neutrons of each measurement cycle. The flexible algorithm is applied twice to the data, to provide different data to various analysis teams. This gives us the option to sequentially apply various blinding offsets for separate analysis steps with independent teams. The subtle modification of the data allows us to modify the algorithm and to produce a re-blinded data set without revealing the blinding secret. The method was designed for the 2015/2016 measurement campaign of the nEDM experiment at the Paul Scherrer Institute. However, it can be re-used with minor modification for the follow-up experiment n2EDM, and may be suitable for comparable efforts.
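To make the core idea concrete, here is a hedged sketch of flipping the recorded spin state of a few neutrons per cycle under a secret seed, which injects a small reproducible offset into the apparent asymmetry without touching the experiment. The flip fraction, event counts, and function name are illustrative assumptions, not the published algorithm.

```python
# Toy blinding step: flip a secret-seeded handful of recorded up-spins to
# down, biasing the apparent asymmetry. Numbers are illustrative only.
import numpy as np

def blind_cycle(spins, secret_seed, flip_fraction=0.002):
    """Return a blinded copy of one cycle's spin records (+1/-1 array)."""
    rng = np.random.default_rng(secret_seed)  # seed known only to blinders
    spins = np.asarray(spins).copy()
    # pick a few up-spin neutrons and flip them down; the same seed always
    # reproduces the same modification, so the blinding is traceable
    up = np.flatnonzero(spins == 1)
    n_flip = int(len(spins) * flip_fraction)
    flip_idx = rng.choice(up, size=n_flip, replace=False)
    spins[flip_idx] = -1
    return spins

cycle = np.where(np.random.default_rng(0).random(10000) < 0.5, 1, -1)
blinded = blind_cycle(cycle, secret_seed=123456)
print("asymmetry before:", cycle.mean(), "after:", blinded.mean())
```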
GERDA is an experiment searching for the neutrinoless ββ decay of Ge-76. The experiment uses an array of high-purity germanium detectors, enriched in Ge-76, directly immersed in liquid argon. GERDA recently started physics data taking using eight enriched coaxial detectors. The status of the experiment has to be closely monitored in order to promptly identify possible instabilities or problems. The on-line slow-control system is complemented by regular off-line monitoring of data quality. This ensures that data are qualified for use in the physics analysis and allows rejecting data sets which do not meet the minimum quality standards. The off-line data monitoring is performed entirely within the software framework GELATIO. In addition, a relational database, complemented by a web-based interface, was developed to support the off-line monitoring and to automatically provide information for the daily assessment of data quality. The concept and performance of the off-line monitoring tools were tested and validated during the one-year commissioning phase.
The Visual Physics Analysis (VISPA) project defines a toolbox for accessing software via the web. It is based on the latest web technologies and provides a powerful extension mechanism that makes it possible to interface a wide range of applications. Beyond basic applications such as a code editor, a file browser, or a terminal, it meets the demands of sophisticated experiment-specific use cases that focus on physics data analyses and typically require a high degree of interactivity. As an example, we developed a data inspector that is capable of browsing interactively through the event content of several data formats, e.g., MiniAOD, which is utilized by the CMS collaboration. The VISPA extension mechanism can also be used to embed external web-based applications that benefit from dynamic allocation of user-defined computing resources via SSH. For example, by wrapping the JSROOT project, ROOT files located on any remote machine can be inspected directly through a VISPA server instance. We introduced domains that combine groups of users with role-based permissions. This enables tailored projects, e.g. for teaching, where access to students' homework is restricted to a team of tutors, or for experiment-specific data that may only be accessible to members of the collaboration. We present the extension mechanism, including corresponding applications, and give an outlook on the new permission system.