ﻻ يوجد ملخص باللغة العربية
New techniques in X-ray scattering science experiments produce large data sets that can require millions of high-performance processing hours per week of computation for analysis. In such applications, data is typically moved from X-ray detectors to a large parallel file system shared by all nodes of a petascale supercomputer and then is read repeatedly as different science application tasks proceed. However, this straightforward implementation causes significant contention in the file system. We propose an alternative approach in which data is instead staged into and cached in compute node memory for extended periods, during which time various processing tasks may efficiently access it. We describe here such a big data staging framework, based on MPI-IO and the Swift parallel scripting language. We discuss a range of large-scale data management issues involved in X-ray scattering science and measure the performance benefits of the new staging framework for high-energy diffraction microscopy, an important emerging application in data-intensive X-ray scattering. We show that our framework accelerates scientific processing turnaround from three months to under 10 minutes, and that our I/O technique reduces input overheads by a factor of 5 on 8K Blue Gene/Q nodes.
In this work we present kiwiPy, a Python library designed to support robust message based communication for high-throughput, big-data, applications while being general enough to be useful wherever high-volumes of messages need to be communicated in a
Transparently checkpointing MPI for fault tolerance and load balancing is a long-standing problem in HPC. The problem has been complicated by the need to provide checkpoint-restart services for all combinations of an MPI implementation over all netwo
The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiments framework and is used to serialize the e
The HEP community is approaching an era were the excellent performances of the particle accelerators in delivering collision at high rate will force the experiments to record a large amount of information. The growing size of the datasets could poten
We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure. It utilizes the MPI profiling interface (PMPI) and t