Building a scalable python distribution for HEP data analysis

113 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل David Lange

تاريخ النشر 2018

مجال البحث فيزياء

والبحث باللغة English

تأليف David Lange

الفيزياء الحسابية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

There are numerous approaches to building analysis applications across the high-energy physics community. Among them are Python-based, or at least Python-driven, analysis workflows. We aim to ease the adoption of a Python-based analysis toolkit by making it easier for non-expert users to gain access to Python tools for scientific analysis. Experimental software distributions and individual user analysis have quite different requirements. Distributions tend to worry most about stability, usability and reproducibility, while the users usually strive to be fast and nimble. We discuss how we built and now maintain a python distribution for analysis while satisfying requirements both a large software distribution (in our case, that of CMSSW) and user, or laptop, level analysis. We pursued the integration of tools used by the broader data science community as well as HEP developed (e.g., histogrammar, root_numpy) Python packages. We discuss concepts we investigated for package integration and testing, as well as issues we encountered through this process. Distribution and platform support are important topics. We discuss our approach and progress towards a sustainable infrastructure for supporting this Python stack for the CMS user community and for the broader HEP user community.

قيم البحث

139 - Scott A. Norris 2014

We introduce a Python framework designed to automate the most common tasks associated with the extraction and upscaling of the statistics of single-impact crater functions to inform coefficients of continuum equations describing surface morphology ev olution. Designed with ease-of-use in mind, the framework allows users to extract meaningful statistical estimates with very short Python programs. Wrappers to interface with specific simulation packages, routines for statistical extraction of output, and fitting and differentiation libraries are all hidden behind simple, high-level user-facing functions. In addition, the framework is extensible, allowing advanced users to specify the collection of specialized statistics or the creation of customized plots. The framework is hosted on the BitBucket service under an open-source license, with the aim of helping non-specialists easily extract preliminary estimates of relevant crater function results associated with a particular experimental system.

الفيزياء الحسابية علم المواد الهندسة الحاسوبية، المالية،العلوم

HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation

434 - Lothar Bauerdick , Riccardo Maria Bianchi , Brian Bockelman 2018

At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and inte rpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.

الفيزياء الحسابية فيزياء الطاقة العالية - التجربة

ProMC: Input-output data format for HEP applications using varint encoding

115 - S.V. Chekanov , E. May , K. Strand 2013

A new data format for Monte Carlo (MC) events, or any structural data, including experimental data, is discussed. The format is designed to store data in a compact binary form using variable-size integer encoding as implemented in the Googles Protoco l Buffers package. This approach is implemented in the ProMC library which produces smaller file sizes for MC records compared to the existing input-output libraries used in high-energy physics (HEP). Other important features of the proposed format are a separation of abstract data layouts from concrete programming implementations, self-description and random access. Data stored in ProMC files can be written, read and manipulated in a number of programming languages, such C++, JAVA, FORTRAN and PYTHON.

الفيزياء الحسابية علوم الكمبيوتر فيزياء الطاقة العالية - التجربة

Using Big Data Technologies for HEP Analysis

88 - Matteo Cremonesi , Claudio Bellini , Bianny Bian 2019

The HEP community is approaching an era were the excellent performances of the particle accelerators in delivering collision at high rate will force the experiments to record a large amount of information. The growing size of the datasets could poten tially become a limiting factor in the capability to produce scientific results timely and efficiently. Recently, new technologies and new approaches have been developed in industry to answer to the necessity to retrieve information as quickly as possible to analyze PB and EB datasets. Providing the scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we are presenting the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the application of the new tools both quantitatively, by measuring the performances, and qualitatively, focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data in 5 hours, collected by the CMS experiment, to 1 TB of data in a format suitable for physics analysis. The second goal is achieved by implementing multiple physics use-cases in Apache Spark using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability and portability are compared to the ones of a traditional ROOT-based workflow.

النظم الموزعة والتوازية والحوسبة العنقودية

HEP Software Foundation Community White Paper Working Group -- Conditions Data

144 - Paul Laycock , Marko Bracko , Marco Clemencic 2019

To produce the best physics results, high energy physics experiments require access to calibration and other non-event data during event data processing. These conditions data are typically stored in databases that provide versioning functionality, a llowing physicists to make improvements while simultaneously guaranteeing the reproducibility of their results. With the increased complexity of modern experiments, and the evolution of computing models that demand large scale access to conditions data, the solutions for managing this access have evolved over time. In this white paper we give an overview of the conditions data access problem, present convergence on a common solution and present some considerations for the future.

الفيزياء الحسابية فيزياء الطاقة العالية - التجربة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة المستنصرية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Building a scalable python distribution for HEP data analysis

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً