Information and data exchange is an important aspect of scientific progress. In computational materials science, a prerequisite for smooth data exchange is standardization, which means using agreed conventions for, e.g., units, zero baselines, and file formats. There are two main strategies to achieve this goal. One strategy accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, biophysics, and materials science, and accommodates its diverse ecosystem of computer codes by developing converters that translate the input and output files of all important codes into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt to write their inputs, outputs, and restart files directly in the same code-independent format. In this paper we emphasize that these two strategies can and should be regarded as complementary, if not synergistic. The main concepts and software developments behind the two strategies largely coincide, and both approaches should, of course, give the same final result. We present the format and conventions that were agreed upon by two teams, the Electronic Structure Library (ESL) of CECAM and the NOMAD (NOvel MAterials Discovery) Laboratory, a European Centre of Excellence (CoE). This discussion also includes the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.
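As a minimal illustration of what a hierarchical, code-independent record with explicit units and reference conventions can look like, consider the sketch below; the section names, keys, and values are hypothetical placeholders and far simpler than the actual metadata definitions agreed upon by ESL and NOMAD.

```python
# Illustrative sketch only: a hierarchical, code-independent record of a single
# DFT calculation. Key names and values are hypothetical, not the agreed
# NOMAD/ESL definitions; units and zero baselines are stated explicitly so that
# converters and native writers produce comparable records.
import json

calculation_record = {
    "program": {"name": "ExampleDFTCode", "version": "1.0"},  # hypothetical code
    "system": {
        "chemical_formula": "Si2",
        "lattice_vectors_angstrom": [[0.000, 2.715, 2.715],
                                     [2.715, 0.000, 2.715],
                                     [2.715, 2.715, 0.000]],
    },
    "method": {
        "electronic_structure_method": "DFT",
        "xc_functional": "PBE",
        "basis_set": {"type": "plane_waves", "cutoff_eV": 500.0},
    },
    "results": {
        # Total energies are only comparable once the zero baseline is defined.
        "energy_total_eV": -9.27,  # placeholder value
        "energy_reference": "isolated_neutral_atoms",
    },
}

print(json.dumps(calculation_record, indent=2))  # serialize to a shared format
```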
Materials Cloud is a platform designed to enable open and seamless sharing of resources for computational science, driven by applications in materials modelling. It hosts 1) archival and dissemination services for raw and curated data, together with their provenance graphs, 2) modelling services and virtual machines, 3) tools for data analytics and pre-/post-processing, and 4) educational materials. Data is citable and archived persistently, providing a comprehensive embodiment of the FAIR principles that extends to computational workflows. Materials Cloud leverages the AiiDA framework to record the provenance of entire simulation pipelines (calculations performed, codes used, data generated) in the form of graphs that make it possible to retrace and reproduce any computed result. When an AiiDA database is shared on Materials Cloud, peers can browse the interconnected record of simulations, download individual files or the full database, and start their research from the results of the original authors. The infrastructure is agnostic to the specific simulation codes used and can support diverse applications in computational science that transcend its initial materials domain.
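As a minimal sketch of how such a shared provenance graph can be explored programmatically, the snippet below uses AiiDA's public QueryBuilder interface; it assumes a configured AiiDA profile with an imported database, and the result key "energy" is purely illustrative, since it depends on how the original authors stored their outputs.

```python
# Minimal sketch, assuming an existing AiiDA profile and an imported database.
from aiida import load_profile, orm

load_profile()  # connect to the local (or imported) AiiDA database

# Traverse the provenance graph: find calculation nodes and the output
# dictionaries they created.
qb = orm.QueryBuilder()
qb.append(orm.CalcJobNode, tag="calc", project=["uuid", "label"])
qb.append(orm.Dict, with_incoming="calc", project=["*"])

for uuid, label, results in qb.all():
    # "energy" is a hypothetical key used here only for illustration.
    print(uuid, label, results.get_dict().get("energy"))
```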
The prediction of material properties through electronic-structure simulations based on density-functional theory has become routine, thanks, in part, to the steady increase in the number and robustness of available simulation packages. This plurality of codes and methods aiming to solve similar problems is both a boon and a burden. While providing great opportunities for cross-verification, these packages adopt different methods, algorithms, and paradigms, making it challenging to choose, master, and efficiently use any one of them for a given task. Leveraging recent advances in managing reproducible scientific workflows, we demonstrate how developing common interfaces for workflows that automatically compute material properties can tackle this challenge, greatly simplifying interoperability and cross-verification. We introduce design rules for reproducible and reusable, code-agnostic workflow interfaces to compute well-defined material properties, which we implement for eleven different quantum engines and use to compute three different material properties. Each implementation encodes carefully selected simulation parameters and workflow logic, making the implementers' expertise with each quantum engine directly available to non-experts. Full provenance and reproducibility of the workflows are guaranteed through the use of the AiiDA infrastructure. All workflows are made available as open source and come pre-installed with the Quantum Mobile virtual machine, making their use straightforward.
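A minimal sketch of what a code-agnostic workflow interface can look like in Python is given below; the class, method, and protocol names are hypothetical and much simpler than the interfaces introduced in the paper. The idea is that each quantum engine provides an input generator mapping a structure and a named protocol to engine-specific inputs, while callers only ever see the common signature.

```python
# Illustrative sketch of a code-agnostic workflow interface; all names are
# hypothetical and simplified with respect to the actual common-workflow design.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RelaxResult:
    """Code-independent result of a geometry relaxation."""
    total_energy_eV: float
    relaxed_positions: list


class CommonRelaxInputGenerator(ABC):
    """One subclass per quantum engine encodes the implementers' parameter choices."""

    @abstractmethod
    def get_builder(self, structure, protocol: str = "moderate"):
        """Return engine-specific inputs for a common, named protocol."""


class ExampleEngineRelaxInputGenerator(CommonRelaxInputGenerator):
    """Hypothetical implementation for one quantum engine."""

    def get_builder(self, structure, protocol: str = "moderate"):
        # Map the common protocol name to engine-specific settings
        # (cutoffs, k-point densities, convergence thresholds, ...).
        settings = {"fast": {"cutoff_eV": 300.0}, "moderate": {"cutoff_eV": 500.0}}
        return {"structure": structure, **settings[protocol]}
```

A non-expert can then request, e.g., `get_builder(structure, protocol="moderate")` for any engine and obtain sensible, expert-chosen inputs without knowing the engine's native parameters.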
As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis as new data are added and as scientific pipelines are refined. We describe our experience developing data commons, i.e., interoperable infrastructure that co-locates data, storage, and compute with common analysis tools, and present several case studies. Across these case studies, several common requirements emerge, including the need for persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Though many challenges remain, including sustainability and the development of appropriate standards, interoperable data commons bring us one step closer to effective Data Science as a Service for the scientific research community.
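As a purely hypothetical illustration of how several of these requirements can surface together, the sketch below shows the kind of metadata record a data commons might associate with a persistent identifier; all field names, identifiers, and URLs are invented placeholders, not the API of any existing commons.

```python
# Hypothetical sketch of a data-commons metadata record; every field name,
# identifier, and URL below is an invented placeholder for illustration only.
dataset_record = {
    "persistent_id": "doi:10.XXXX/example-dataset",        # placeholder identifier
    "title": "Example dataset",
    "checksums": {"sha256": "<digest>"},                    # supports data portability
    "locations": [
        "https://commons-a.example.org/data/example-dataset",   # hypothetical host
        "https://commons-b.example.org/data/example-dataset",   # peered copy
    ],
    "access": {"storage": "object-store", "compute": "pay-for-compute"},
}
```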
Combinatorial experiments involve the synthesis of sample libraries with lateral composition gradients, requiring spatially resolved characterization of structure and properties. Due to the maturation of combinatorial methods and their successful application in many fields, the modern combinatorial laboratory produces diverse and complex data sets requiring advanced analysis and visualization techniques. In order to utilize these large data sets to uncover new knowledge, the combinatorial scientist must engage in data science. For data science tasks, most laboratories adopt general-purpose data management and visualization software. However, processing and cross-correlating data from various measurement tools is no small task for such generic programs. Here we describe COMBIgor, a purpose-built open-source software package written in the commercial Igor Pro environment, designed to offer a systematic approach to loading, storing, processing, and visualizing combinatorial data sets. It includes (1) methods for loading and storing data sets from combinatorial libraries, (2) routines for streamlined data processing, and (3) data analysis and visualization features to construct figures. Most importantly, COMBIgor is designed to be easily customized by a laboratory, group, or individual in order to integrate additional instruments and data-processing algorithms. Utilizing the capabilities of COMBIgor can significantly reduce the burden of data management on the combinatorial scientist.
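For illustration only, the following Python sketch (not Igor Pro, and not COMBIgor's actual interface) shows the kind of task such a package systematizes: aligning two spatially resolved measurements of one library on common coordinates and cross-correlating them. Column names and values are made up.

```python
# Generic sketch of a combinatorial data-science task: join two instrument
# exports on library coordinates and cross-correlate composition with a
# measured property. All column names and numbers are illustrative.
import numpy as np
import pandas as pd

# Each instrument exports its own spatially resolved map of the library.
composition = pd.DataFrame({"x_mm": [0, 5, 10], "y_mm": [0, 0, 0],
                            "Zn_fraction": [0.10, 0.25, 0.40]})
conductivity = pd.DataFrame({"x_mm": [0, 5, 10], "y_mm": [0, 0, 0],
                             "sigma_S_per_cm": [1.2e-3, 4.7e-2, 8.9e-1]})

# Align on library coordinates so properties can be compared point by point.
library = composition.merge(conductivity, on=["x_mm", "y_mm"])
print(library)
print("correlation:", np.corrcoef(library["Zn_fraction"],
                                  library["sigma_S_per_cm"])[0, 1])
```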
Laser plasma accelerators have the potential to reduce the size of future linacs for high-energy physics by more than an order of magnitude, owing to their high accelerating gradients. Research at current facilities, including the BELLA PetaWatt laser at LBNL, is progressing towards high-quality 10 GeV beams and the staging of multiple modules, as well as control of injection and beam quality. The path towards high-energy physics applications will likely involve hundreds of such stages, with beam transport, conditioning, and focusing. Current research focuses on addressing the physics and R&D challenges required for a detailed conceptual design of a future collider. Here, the tools used to model these accelerators and their resource requirements are summarized, both for current work and to support R&D addressing issues related to collider concepts.