بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

ROOT I/O compression improvements for HEP analysis

83 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Oksana Shadura

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Oksana Shadura

علوم الكمبيوتر

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We overview recent changes in the ROOT I/O system, increasing performance and enhancing it and improving its interaction with other data analysis ecosystems. Both the newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly to improve experiments software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduce storage space.

قيم البحث

66 - Oksana Shadura 2019

The LHCs Run3 will push the envelope on data-intensive workflows and, since at the lowest level this data is managed using the ROOT software framework, preparations for managing this data are starting already. At the beginning of LHC Run 1, all ROOT data was compressed with the ZLIB algorithm; since then, ROOT has added support for additional algorithms such as LZMA and LZ4, each with unique strengths. This work must continue as industry introduces new techniques - ROOT can benefit saving disk space or reducing the I/O and bandwidth for online and offline needs of experiments by introducing better compression algorithms. In addition to alternate algorithms, we have been exploring alternate techniques to improve parallelism and apply pre-conditioners to the serialized data. We have performed a survey of the performance of the new compression techniques. Our survey includes various use cases of data compression of ROOT files provided by different LHC experiments. We also provide insight into solutions applied to resolve bottlenecks in compression algorithms, resulting in improved ROOT performance.

الأداء النظم الموزعة والتوازية والحوسبة العنقودية

Evolution of the ROOT Tree I/O

69 - Jakob Blomer , Philippe Canal , Axel Naumann 2020

The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts (branches) that are really used in a given analysis need to be read from storage. Its u nique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust, where possible compile-time safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOTs new experimental I/O subsystem that combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as file-less storage systems such as object stores.

قواعد البيانات فيزياء الطاقة العالية - التجربة

Increasing Parallelism in the ROOT I/O Subsystem

49 - Guilherme Amadio , Brian Bockelman , Philippe Canal 2018

When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of LHC experiments software frameworks a nd the analysis of the ever increasing amount of collision data collected by experiments further emphasized this issue underlying the need of increasing the implicit parallelism expressed within the ROOT I/O. In this contribution we highlight the improvements of the ROOT I/O subsystem which targeted a satisfactory scaling behaviour in a multithreaded context. The effect of parallelism on the individual steps which are chained by ROOT to read and write data, namely (de)compression, (de)serialisation, access to storage backend, are discussed. Performance measurements are discussed through real life examples coming from CMS production workflows on traditional server platforms and highly parallel architectures such as Intel Xeon Phi.

النظم الموزعة والتوازية والحوسبة العنقودية

A New Rational Approach to the Square Root of 5

103 - Shenghui Su , Jianhua Zheng , 2021

In this paper, authors construct a new type of sequence which is named an extra-super increasing sequence, and give the definitions of the minimal super increasing sequence {a[0], a[1], ..., a[n]} and minimal extra-super increasing sequence {z[0], z[ 1], ..., z[n]}. Find that there always exists a fit n which makes (z[n] / z[n-1] - a[n] / a[n-1])= PHI, where PHI is the golden ratio conjugate with a finite precision in the range of computer expression. Further, derive the formula radic(5) = 2(z[n] / z[n-1] - a[n] / a[n-1]) + 1, where n corresponds to the demanded precision. Experiments demonstrate that the approach to radic(5) through a term ratio difference is more smooth and expeditious than through a Taylor power series, and convince the authors that lim(n to infinity) (z[n] / z[n-1] - a[n] / a[n-1]) = PHI holds.

علوم الكمبيوتر

Optimizing ROOT IO For Analysis

53 - Brian Bockelman , Zhe Zhang , Jim Pivarski 2017

The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiments framework and is used to serialize the e xperiments data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists will further use RIO to perform their end-stage analysis, reading from intermediate files they generate from experiment data. RIO is thus incredibly flexible: it must serve as a file format for archival (optimized for space) and for working data (optimized for read speed). To date, most of the technical work has focused on improving the former use case. We present work designed to help improve RIO for analysis. We analyze the real-world impact of LZ4 to decrease decompression times (and the corresponding cost in disk space). We introduce new APIs that read RIO data in bulk, removing the per-event overhead of a C++ function call. We compare the performance with the existing RIO APIs for simple structure data and show how this can be complimentary with efforts to improve the parallelism of the RIO stack.

النظم الموزعة والتوازية والحوسبة العنقودية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الأكاديمية العربية للعلوم والتكنولوجيا والنقل البحري

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

ROOT I/O compression improvements for HEP analysis

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً