The Joint Automated Repository for Various Integrated Simulations (JARVIS) is an integrated infrastructure to accelerate materials discovery and design using density functional theory (DFT), classical force-fields (FF), and machine learning (ML) techniques. JARVIS is motivated by the Materials Genome Initiative (MGI) principles of developing open-access databases and tools to reduce the cost and development time of materials discovery, optimization, and deployment. The major features of JARVIS are: JARVIS-DFT, JARVIS-FF, JARVIS-ML, and JARVIS-Tools. To date, JARVIS consists of 40,000 materials and 1 million calculated properties in JARVIS-DFT, 1,500 materials and 110 force-fields in JARVIS-FF, and 25 ML models for material-property predictions in JARVIS-ML, all of which are continuously expanding. JARVIS-Tools provides scripts and workflows for running and analyzing various simulations. We compare our computational data to experiments or high-fidelity computational methods wherever applicable to evaluate error/uncertainty in predictions. In addition to the existing workflows, the infrastructure can support a wide variety of other technologically important applications as part of the data-driven materials design paradigm. The JARVIS datasets and tools are publicly available at the website: https://jarvis.nist.gov .
Materials informatics has emerged as a promising new paradigm for accelerating materials discovery and design. It applies machine learning methods to massive materials data from experiments or simulations to search for new materials, functionalities, and principles. Developing specialized facilities to generate, collect, manage, learn from, and mine large-scale materials data is crucial to materials informatics. We have developed an artificial-intelligence-aided, data-driven infrastructure named Jilin Artificial-intelligence aided Materials-design Integrated Package (JAMIP), an open-source Python framework that meets the research requirements of computational materials informatics. It integrates a materials production factory, a high-throughput first-principles calculation engine, automatic task submission and progress monitoring, a data extraction, management, and storage system, and machine-learning-based data mining functions. We have integrated specific features, such as an inorganic crystal structure prototype database, to facilitate high-throughput calculations, along with essential modules for machine learning studies of functional materials. We demonstrate how the code is useful in exploring the materials informatics of optoelectronic semiconductors, taking halide perovskites as a typical case. By following the principles of automation, extensibility, reliability, and intelligence, JAMIP is a promising tool for the fast-growing field of computational materials informatics.
Combinatorial experiments involve the synthesis of sample libraries with lateral composition gradients, which require spatially resolved characterization of structure and properties. Owing to the maturation of combinatorial methods and their successful application in many fields, the modern combinatorial laboratory produces diverse and complex data sets that require advanced analysis and visualization techniques. To use these large data sets to uncover new knowledge, the combinatorial scientist must engage in data science. For data science tasks, most laboratories adopt general-purpose data management and visualization software. However, processing and cross-correlating data from various measurement tools is no small task for such generic programs. Here we describe COMBIgor, a purpose-built open-source software package written in the commercial Igor Pro environment, designed to offer a systematic approach to loading, storing, processing, and visualizing combinatorial data sets. It includes (1) methods for loading and storing data sets from combinatorial libraries, (2) routines for streamlined data processing, and (3) data analysis and visualization features to construct figures. Most importantly, COMBIgor is designed to be easily customized by a laboratory, group, or individual in order to integrate additional instruments and data-processing algorithms. Using the capabilities of COMBIgor can significantly reduce the burden of data management on the combinatorial scientist.
In recent years, data-driven approaches have been proposed as effective strategies for the inverse design and optimization of photonic structures. To assist data-driven methods for the topology design of photonic devices, we propose a topological encoding method that transforms photonic structures represented by binary images into a continuous sparse representation. This sparse representation can be used for dimensionality reduction and dataset generation, enabling effective analysis and optimization of photonic topologies with data-driven approaches. As a proof of principle, we apply our encoding method to the design of two-dimensional non-paraxial diffractive optical elements with various diffraction intensity distributions. We show that our encoding method can assist a machine-learning-based inverse design approach in achieving accurate and global optimization.
The recent surge in the adoption of machine learning techniques for materials design, discovery, and characterization has resulted in increased interest in, and application of, Image Driven Machine Learning (IDML) approaches. In this work, we review the application of IDML to the field of materials characterization. A hierarchy of six action steps is defined, which compartmentalizes a problem statement into well-defined modules. The studies reviewed in this work are analyzed through the decisions they adopt at each of these steps. Such a review permits a granular assessment of the field, for example the impact of IDML on materials characterization at the nanoscale, the number of images in a typical dataset required to train a semantic segmentation model on electron microscopy images, and the prevalence of transfer learning in the domain. Finally, we discuss the importance of interpretability and explainability, and provide an overview of two emerging techniques in the field: semantic segmentation and generative adversarial networks.
In materials science and engineering, one typically searches for materials that exhibit exceptional performance for a certain function, and the number of such materials is extremely small. Thus, statistically speaking, we are interested in identifying *rare phenomena*, and scientific discovery typically resembles the proverbial hunt for the needle in a haystack.
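The statistical consequence of rarity can be illustrated with a short calculation. The sketch below is not taken from the work itself: the function name `precision_at_base_rate` and all numerical values (base rates, recall, false-positive rate) are hypothetical, chosen only to show how the fraction of true hits among flagged candidates falls as the sought-after materials become rarer, even when the screening model stays the same.

```python
def precision_at_base_rate(base_rate, recall, false_positive_rate):
    """Fraction of flagged candidates that are true hits (Bayes' rule).

    base_rate: fraction of all candidates that are truly exceptional
    recall: fraction of exceptional candidates the screen flags
    false_positive_rate: fraction of ordinary candidates the screen flags
    """
    true_pos = base_rate * recall
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical screening model: 90% recall, 5% false-positive rate.
# Only the rarity of the target materials changes between rows.
for base_rate in (0.5, 0.01, 0.0001):
    p = precision_at_base_rate(base_rate, recall=0.9, false_positive_rate=0.05)
    print(f"base rate {base_rate:.2%} -> precision {p:.4f}")
```

Under these assumed numbers, a screen that looks reliable on a balanced set flags mostly ordinary materials once the exceptional ones make up 1% of the candidates (roughly 15% of flagged candidates are true hits), which is one way to quantify the needle-in-a-haystack picture.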