
Evolution of the ROOT Tree I/O

Published by Jakob Blomer
Publication date: 2020
Research field: Informatics Engineering
Language: English





The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts (branches) that are really used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust and, where possible, compile-time safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT's new experimental I/O subsystem that combines the best of TTree with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage-backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as file-less storage systems such as object stores.
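As a minimal sketch of what writing event data through the experimental ROOT 7 I/O can look like, the snippet below uses the RNTuple classes under ROOT::Experimental; the header names, field names and file name are assumptions based on the API at the time of writing and may differ between ROOT releases.

#include <ROOT/RNTuple.hxx>
#include <ROOT/RNTupleModel.hxx>

#include <memory>
#include <utility>

// Sketch (assumed API): the model declares the logical data layout;
// the columnar physical layout is handled by the I/O subsystem underneath.
void WriteEvents()
{
   auto model = ROOT::Experimental::RNTupleModel::Create();
   // Typed, compile-time checked handles into the default entry.
   auto fldPt  = model->MakeField<float>("pt");
   auto fldEta = model->MakeField<float>("eta");

   auto writer = ROOT::Experimental::RNTupleWriter::Recreate(
      std::move(model), "Events", "events.root");

   for (int i = 0; i < 1000; ++i) {
      *fldPt  = 20.f + 0.01f * i;
      *fldEta = -2.5f + 0.005f * i;
      writer->Fill(); // serializes one entry into columnar pages
   }
} // the writer flushes and closes the file when it goes out of scope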




Read also

When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of the LHC experiments' software frameworks and the analysis of the ever-increasing amount of collision data collected by the experiments further emphasised this issue, underlining the need to increase the implicit parallelism expressed within the ROOT I/O. In this contribution we highlight the improvements of the ROOT I/O subsystem which targeted a satisfactory scaling behaviour in a multithreaded context. The effect of parallelism on the individual steps which are chained by ROOT to read and write data, namely (de)compression, (de)serialisation and access to the storage backend, is discussed. Performance measurements are presented through real-life examples coming from CMS production workflows on traditional server platforms and highly parallel architectures such as Intel Xeon Phi.
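As one hedged illustration of that implicit parallelism, the sketch below enables ROOT's implicit multi-threading before filling a TTree, so that work such as compressing the baskets of different branches can be performed concurrently; the file, tree and branch names are illustrative, and which steps actually run in parallel depends on the ROOT version and configuration.

#include <TFile.h>
#include <TROOT.h>
#include <TTree.h>

// Sketch: opt into ROOT's implicit multi-threading and fill a simple tree.
// "events.root", "Events", "pt" and "eta" are placeholder names.
int main()
{
   ROOT::EnableImplicitMT(); // use the available cores for ROOT-internal work

   TFile f("events.root", "RECREATE");
   TTree tree("Events", "demo tree");

   float pt = 0.f, eta = 0.f;
   tree.Branch("pt",  &pt,  "pt/F");
   tree.Branch("eta", &eta, "eta/F");

   for (int i = 0; i < 100000; ++i) {
      pt  = 20.f + 0.001f * i;
      eta = -2.5f + 0.00005f * i;
      tree.Fill(); // basket compression for different branches may overlap
   }

   tree.Write();
   f.Close();
   return 0;
}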
Oksana Shadura (2020)
We overview recent changes in the ROOT I/O system, increasing its performance and improving its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve the experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.
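The CPU-versus-storage trade-off mentioned above is steered by the compression settings of the output file. Below is a minimal sketch, assuming the TFile::SetCompressionAlgorithm / SetCompressionLevel setters and the ROOT::kLZ4 enumerator (spellings have varied across ROOT versions); the file, tree and branch names are illustrative.

#include <Compression.h>
#include <TFile.h>
#include <TTree.h>

// Sketch: choose a faster algorithm (LZ4) at a moderate level; this typically
// trades a somewhat larger file for cheaper reading and writing than LZMA.
int main()
{
   TFile f("lz4_events.root", "RECREATE");
   f.SetCompressionAlgorithm(ROOT::kLZ4); // assumed enumerator, see lead-in
   f.SetCompressionLevel(4);              // higher level: smaller file, more CPU

   TTree tree("Events", "LZ4-compressed demo");
   float pt = 0.f;
   tree.Branch("pt", &pt, "pt/F");
   for (int i = 0; i < 10000; ++i) {
      pt = 0.01f * i;
      tree.Fill();
   }

   tree.Write();
   f.Close();
   return 0;
}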
Dong Wen, Lu Qin, Ying Zhang (2015)
Core decomposition is a fundamental graph problem with a large number of applications. Most existing approaches for core decomposition assume that the graph is kept in the memory of a machine. Nevertheless, many real-world graphs are big and may not reside in memory. In the literature, there is only one work for I/O efficient core decomposition that avoids loading the whole graph in memory. However, this approach is not scalable to handle big graphs because it cannot bound the memory size and may load most parts of the graph in memory. In addition, this approach can hardly handle graph updates. In this paper, we study I/O efficient core decomposition following a semi-external model, which only allows node information to be loaded in memory. This model works well in many web-scale graphs. We propose a semi-external algorithm and two optimized algorithms for I/O efficient core decomposition using very simple structures and a simple data access model. To handle dynamic graph updates, we show that our algorithm can be naturally extended to handle edge deletion. We also propose an I/O efficient core maintenance algorithm to handle edge insertion, and an improved algorithm to further reduce I/O and CPU cost by investigating some new graph properties. We conduct extensive experiments on 12 real large graphs. Our optimized algorithms significantly outperform the existing I/O efficient algorithm in terms of both processing time and memory consumption. In many memory-resident graphs, our algorithms for both core decomposition and maintenance can even outperform the in-memory algorithm due to the simple structures and data access model used. Our algorithms are very scalable to handle web-scale graphs. As an example, we are the first to handle a web graph with 978.5 million nodes and 42.6 billion edges using less than 4.2 GB memory.
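For orientation only, the sketch below shows the standard in-memory peeling algorithm for core decomposition, i.e. the baseline that the semi-external approach improves on; it is not the paper's I/O-efficient algorithm, which keeps only per-node state in memory and streams the edges from disk.

#include <cstddef>
#include <cstdio>
#include <vector>

// Baseline sketch: in-memory core decomposition by repeatedly removing the
// vertex with the smallest residual degree (O(n^2 + m), for illustration only).
std::vector<int> CoreDecomposition(const std::vector<std::vector<int>> &adj)
{
   const int n = static_cast<int>(adj.size());
   std::vector<int> core(n);
   std::vector<bool> removed(n, false);
   for (int v = 0; v < n; ++v)
      core[v] = static_cast<int>(adj[v].size()); // start from the degrees

   for (int step = 0; step < n; ++step) {
      // Pick the unremoved vertex with the smallest residual degree.
      int v = -1;
      for (int u = 0; u < n; ++u)
         if (!removed[u] && (v < 0 || core[u] < core[v]))
            v = u;
      removed[v] = true; // core[v] is now the final coreness of v
      for (int w : adj[v])
         if (!removed[w] && core[w] > core[v])
            --core[w]; // peel v away from its remaining neighbours
   }
   return core;
}

int main()
{
   // Tiny example: a triangle (coreness 2) with one pendant vertex (coreness 1).
   std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
   const auto core = CoreDecomposition(adj);
   for (std::size_t v = 0; v < core.size(); ++v)
      std::printf("vertex %zu: core %d\n", v, core[v]);
   return 0;
}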
ROOT is a large code base with a complex set of build-time dependencies; there is a significant difference in compilation time between the core of ROOT and the full-fledged deployment. We present results on a delayed build for internal ROOT packages and external packages. This gives the ability to offer a lightweight core of ROOT, later extended by building additional modules to extend the functionality of ROOT. As part of this work, we have improved the separation of ROOT code into distinct modules and packages with minimal dependencies. This approach gives users better flexibility and the possibility to combine various build features without rebuilding from scratch. Dependency hell is a common problem found in software, and particularly in the HEP software ecosystem. We would like to discuss an improved artifact management (lazy-install) system as a solution to the dependency hell problem. A HEP software stack usually consists of multiple sub-projects with dependencies. The development model is often distributed, independent and non-coherent among the sub-projects. We believe that software should be designed to take advantage of other software components that are already available, or have already been designed and implemented for use elsewhere, rather than reinventing the wheel. In our contribution, we will present our approach to the artifact management system of ROOT together with a set of examples and use cases.
Oksana Shadura (2019)
The LHC's Run 3 will push the envelope on data-intensive workflows and, since at the lowest level this data is managed using the ROOT software framework, preparations for managing this data are starting already. At the beginning of LHC Run 1, all ROOT data was compressed with the ZLIB algorithm; since then, ROOT has added support for additional algorithms such as LZMA and LZ4, each with unique strengths. This work must continue as industry introduces new techniques: by adopting better compression algorithms, ROOT can save disk space or reduce the I/O and bandwidth required for the online and offline needs of the experiments. In addition to alternate algorithms, we have been exploring alternate techniques to improve parallelism and apply pre-conditioners to the serialized data. We have performed a survey of the performance of the new compression techniques. Our survey includes various use cases of data compression of ROOT files provided by different LHC experiments. We also provide insight into solutions applied to resolve bottlenecks in compression algorithms, resulting in improved ROOT performance.