Materials Cloud is a platform designed to enable open and seamless sharing of resources for computational science, driven by applications in materials modelling. It hosts 1) archival and dissemination services for raw and curated data, together with their provenance graph, 2) modelling services and virtual machines, 3) tools for data analytics and pre-/post-processing, and 4) educational materials. Data is citable and archived persistently, providing a comprehensive embodiment of the FAIR principles that extends to computational workflows. Materials Cloud leverages the AiiDA framework to record the provenance of entire simulation pipelines (calculations performed, codes used, data generated) in the form of graphs that make it possible to retrace and reproduce any computed result. When an AiiDA database is shared on Materials Cloud, peers can browse the interconnected record of simulations, download individual files or the full database, and start their research from the results of the original authors. The infrastructure is agnostic to the specific simulation codes used and can support diverse applications in computational science that transcend its initial materials domain.
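The idea of a provenance graph that lets a result be retraced can be illustrated with a minimal sketch. This is a toy directed graph written for illustration only; it is not AiiDA's data model or API, and the class name, method names, and node labels are all assumptions.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance graph (illustrative only, not AiiDA's API)."""

    def __init__(self):
        # Maps each node to the set of nodes it was directly derived from.
        self.parents = defaultdict(set)

    def add_calculation(self, calc, inputs, outputs):
        """Record that `calc` consumed `inputs` and produced `outputs`."""
        for inp in inputs:
            self.parents[calc].add(inp)
        for out in outputs:
            self.parents[out].add(calc)

    def ancestors(self, node):
        """Return every node that `node` depends on, directly or indirectly."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical two-step pipeline: relax a structure, then compute its bands.
graph = ProvenanceGraph()
graph.add_calculation(
    "relax_calc", inputs=["structure", "code_v1"], outputs=["relaxed_structure"]
)
graph.add_calculation(
    "bands_calc", inputs=["relaxed_structure", "code_v1"], outputs=["band_structure"]
)

# Everything needed to retrace the final result.
lineage = graph.ancestors("band_structure")
```

Walking the graph backwards from a result yields the calculations, codes, and input data it depends on, which is the information a reader would need to reproduce it.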
Combinatorial experiments involve synthesis of sample libraries with lateral composition gradients, requiring spatially resolved characterization of structure and properties. Owing to the maturation of combinatorial methods and their successful application in many fields, the modern combinatorial laboratory produces diverse and complex data sets requiring advanced analysis and visualization techniques. In order to utilize these large data sets to uncover new knowledge, the combinatorial scientist must engage in data science. For data science tasks, most laboratories adopt general-purpose data management and visualization software. However, processing and cross-correlating data from various measurement tools is no small task for such generic programs. Here we describe COMBIgor, a purpose-built open-source software package written in the commercial Igor Pro environment, designed to offer a systematic approach to loading, storing, processing, and visualizing combinatorial data sets. It includes (1) methods for loading and storing data sets from combinatorial libraries, (2) routines for streamlined data processing, and (3) data analysis and visualization features to construct figures. Most importantly, COMBIgor is designed to be easily customized by a laboratory, group, or individual in order to integrate additional instruments and data-processing algorithms. Utilizing the capabilities of COMBIgor can significantly reduce the burden of data management on the combinatorial scientist.
Advances in machine learning have impacted myriad areas of materials science, ranging from the discovery of novel materials to the improvement of molecular simulations, with likely many more important developments to come. Given the rapid changes in this field, it is challenging to understand both the breadth of opportunities and the best practices for their use. In this review, we address aspects of both problems by providing an overview of the areas where machine learning has recently had significant impact in materials science, followed by a more detailed discussion on determining the accuracy and domain of applicability of some common types of machine learning models. Finally, we discuss some opportunities and challenges for the materials community to fully utilize the capabilities of machine learning.
Materials informatics has emerged as a promising new paradigm for accelerating materials discovery and design. It applies machine learning methods to massive materials data from experiments or simulations to search for new materials, functionalities, principles, etc. Developing specialized facilities to generate, collect, manage, learn from, and mine large-scale materials data is crucial to materials informatics. We have developed an artificial-intelligence-aided, data-driven infrastructure named Jilin Artificial-intelligence aided Materials-design Integrated Package (JAMIP), an open-source Python framework that meets the research requirements of computational materials informatics. It integrates a materials production factory, a high-throughput first-principles calculation engine, automatic task submission and progress monitoring, a data extraction, management, and storage system, and machine-learning-based data mining functions. We have integrated specific features, such as an inorganic crystal structure prototype database, to facilitate high-throughput calculations, as well as essential modules for machine learning studies of functional materials. We demonstrate the usefulness of the code for the materials informatics of optoelectronic semiconductors, taking halide perovskites as a typical case. By following the principles of automation, extensibility, reliability, and intelligence, the JAMIP code is a promising and powerful tool for the fast-growing field of computational materials informatics.
Information and data exchange is an important aspect of scientific progress. In computational materials science, a prerequisite for smooth data exchange is standardization, which means using agreed conventions for, e.g., units, zero baselines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, biophysics, and materials science, by complying with the diverse ecosystem of computer codes, and thus develops converters for the input and output files of all important codes. These converters then translate the data of all important codes into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files directly into the same code-independent format. We emphasize in this paper that these two strategies can and should be regarded as complementary, if not synergistic. The main concepts and software developments of both strategies are largely identical, and both approaches should give the same final result. In this paper, we present the appropriate format and conventions that were agreed upon by two teams, the Electronic Structure Library (ESL) of CECAM and the NOMAD (NOvel MAterials Discovery) Laboratory, a European Centre of Excellence (CoE). This discussion also includes the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.
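The converter strategy can be sketched as a set of per-code functions that map heterogeneous outputs onto one set of keys and units. This is a minimal illustration, not the ESL or NOMAD format: the two codes, their raw field names, and the target keys (eV and angstrom) are all hypothetical. Only the unit conversion factors are standard CODATA values.

```python
# Standard conversion factors (CODATA 2018).
HARTREE_TO_EV = 27.211386245988
BOHR_TO_ANGSTROM = 0.529177210903


def from_code_a(raw):
    """Hypothetical code A: energy in Hartree, lengths in Bohr."""
    return {
        "total_energy_eV": raw["etot_ha"] * HARTREE_TO_EV,
        "lattice_constant_angstrom": raw["alat_bohr"] * BOHR_TO_ANGSTROM,
    }


def from_code_b(raw):
    """Hypothetical code B: already eV and angstrom, but different key names."""
    return {
        "total_energy_eV": raw["E"],
        "lattice_constant_angstrom": raw["a"],
    }


# One converter per supported code; adding a code means adding one entry.
CONVERTERS = {"code_a": from_code_a, "code_b": from_code_b}


def normalize(code_name, raw):
    """Translate code-specific output into the shared, code-independent form."""
    return CONVERTERS[code_name](raw)


record_a = normalize("code_a", {"etot_ha": -1.0, "alat_bohr": 10.0})
record_b = normalize("code_b", {"E": -27.2, "a": 5.3})
```

After normalization, both records expose the same keys in the same units, so downstream tools need no knowledge of which code produced them. The library-based strategy would instead have each code write these keys directly.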
Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials (neglecting the non-synthesizable systems and those without the desired properties), thus reducing the amount of resources spent on expensive computations and/or time-consuming experimental synthesis. However, using these predictive models is not always straightforward. Often, they require a panoply of technical expertise, creating barriers for general users. AFLOW-ML (AFLOW Machine Learning) overcomes the problem by streamlining the use of the machine learning methods developed within the AFLOW consortium. The framework provides an open RESTful API to directly access the continuously updated algorithms, which can be transparently integrated into any workflow to retrieve predictions of electronic, thermal and mechanical properties. These types of interconnected cloud-based applications are envisioned to be capable of further accelerating the adoption of machine learning methods into materials development.
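How a workflow might call a RESTful prediction service of this kind can be sketched with the Python standard library. The endpoint URL, the JSON payload layout, and the response field below are placeholders chosen for illustration; they are not the actual AFLOW-ML interface, which should be taken from its own documentation. The sketch only builds the request and parses a sample response, without contacting any server.

```python
import json
import urllib.request

# Placeholder endpoint for illustration; NOT the real AFLOW-ML URL.
PREDICT_URL = "https://example.org/api/v1/predict"


def build_prediction_request(structure_text, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying a structure description.

    The JSON layout {"structure": ...} is an assumption for this sketch.
    """
    payload = json.dumps({"structure": structure_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(response_body):
    """Decode a JSON response body into a dictionary of predicted properties."""
    return json.loads(response_body)


request = build_prediction_request("hypothetical structure file contents")

# A made-up response body, standing in for what a server might return.
sample = parse_prediction('{"band_gap_eV": 1.1}')
```

In a real workflow the request would be sent with `urllib.request.urlopen(request)` and the returned body passed to `parse_prediction`.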