Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet


Published by Pierre-Paul De Breuck

Publication date: 2021

Research field: Physics

Paper language: English

Authors: Pierre-Paul De Breuck, Matthew L. Evans, Gian-Marco Rignanese

Materials Science


As the number of novel data-driven approaches to material science continues to grow, it is crucial to perform consistent quality, reliability and applicability assessments of model performance. In this paper, we benchmark the Materials Optimal Descriptor Network (MODNet) method and architecture against the recently released MatBench v0.1, a curated test suite of materials datasets. MODNet is shown to outperform current leaders on 6 of the 13 tasks, whilst closely matching the current leaders on a further 2 tasks; MODNet performs particularly well when the number of samples is below 10,000. Attention is paid to two topics of concern when benchmarking models. First, we encourage the reporting of a more diverse set of metrics as it leads to a more comprehensive and holistic comparison of model performance. Second, an equally important task is the uncertainty assessment of a model towards a target domain. Significant variations in validation errors can be observed, depending on the imbalance and bias in the training set (i.e., similarity between training and application space). By using an ensemble MODNet model, confidence intervals can be built and the uncertainty on individual predictions can be quantified. Imbalance and bias issues are often overlooked, and yet are important for successful real-world applications of machine learning in materials science and condensed matter.
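The ensemble-based uncertainty estimate and the call for a more diverse set of metrics can be illustrated with a minimal sketch. It does not use the MODNet package or its API: the bootstrap ensemble of linear fits, the helper names (`fit_linear`, `predict_linear`, `ensemble_predict`, `regression_metrics`), the synthetic data and the 1.96-sigma interval (which assumes roughly normal ensemble spread) are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear fit with an intercept (stand-in for a real model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on new inputs."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def ensemble_predict(X_train, y_train, X_new, n_models=50, seed=0):
    """Fit n_models on bootstrap resamples; return mean and spread of predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        coef = fit_linear(X_train[idx], y_train[idx])
        preds.append(predict_linear(coef, X_new))
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

def regression_metrics(y_true, y_pred):
    """Report several error metrics instead of a single number."""
    err = y_pred - y_true
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "max_error": float(np.max(np.abs(err))),
    }

# Synthetic training data restricted to x in [0, 1] (hypothetical).
rng = np.random.default_rng(42)
X_train = rng.uniform(0.0, 1.0, size=(40, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1, size=40)

# One query inside the training range and one far outside it.
X_new = np.array([[0.5], [5.0]])
mean, std = ensemble_predict(X_train, y_train, X_new)

# Approximate 95% intervals, assuming near-normal ensemble spread.
lower, upper = mean - 1.96 * std, mean + 1.96 * std

# Several metrics for a single fit, evaluated on the training set.
metrics = regression_metrics(
    y_train, predict_linear(fit_linear(X_train, y_train), X_train)
)
```

In this toy setting the ensemble spread at the out-of-range query (`std[1]`) exceeds the spread at the in-range query (`std[0]`), which mirrors the abstract's point that uncertainty depends on how similar the application space is to the training space.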


Qiaohao Liang, Aldair E. Gongora, Zekun Ren (2021)

In the field of machine learning (ML) for materials optimization, active learning algorithms, such as Bayesian Optimization (BO), have been leveraged for guiding autonomous and high-throughput experimentation systems. However, very few studies have evaluated the efficiency of BO as a general optimization algorithm across a broad range of experimental materials science domains. In this work, we evaluate the performance of BO algorithms with a collection of surrogate model and acquisition function pairs across five diverse experimental materials systems, namely carbon nanotube polymer blends, silver nanoparticles, lead-halide perovskites, as well as additively manufactured polymer structures and shapes. By defining acceleration and enhancement metrics for general materials optimization objectives, we find that for surrogate model selection, Gaussian Process (GP) with anisotropic kernels (automatic relevance detection, ARD) and Random Forests (RF) have comparable performance and both outperform the commonly used GP without ARD. We discuss the implicit distributional assumptions of RF and GP, and the benefits of using GP with anisotropic kernels in detail. We provide practical insights for experimentalists on surrogate model selection of BO during materials optimization campaigns.
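The difference between an isotropic and an anisotropic (ARD) GP kernel can be sketched in a few lines. This is an illustrative NumPy example and not the code used in the study: the squared-exponential form, the unit prior variance, the function names (`ard_rbf_kernel`, `gp_posterior`) and the toy data are assumptions. An ARD kernel assigns one length scale per input dimension, so a very large length scale effectively switches a dimension off.

```python
import numpy as np

def ard_rbf_kernel(A, B, length_scales):
    """Squared-exponential kernel with one length scale per input dimension."""
    ls = np.asarray(length_scales, dtype=float)
    A_s = np.atleast_2d(A) / ls
    B_s = np.atleast_2d(B) / ls
    sq = (A_s ** 2).sum(1)[:, None] + (B_s ** 2).sum(1)[None, :] - 2.0 * A_s @ B_s.T
    return np.exp(-0.5 * np.clip(sq, 0.0, None))  # clip guards against round-off

def gp_posterior(X_train, y_train, X_new, length_scales, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with unit prior variance."""
    K = ard_rbf_kernel(X_train, X_train, length_scales) + noise * np.eye(len(X_train))
    K_s = ard_rbf_kernel(X_train, X_new, length_scales)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v ** 2).sum(0)
    return mean, np.clip(var, 0.0, None)

# Toy data (hypothetical): the target depends only on the first input dimension.
X_train = np.array([[0.0, 0.3], [1.0, 2.1], [2.0, -1.0], [3.0, 0.7], [4.0, 1.5]])
y_train = np.sin(X_train[:, 0])

# ARD: a large length scale on dimension 2 makes the kernel ignore it.
ls_ard = [1.0, 1e3]
mean_train, var_train = gp_posterior(X_train, y_train, X_train, ls_ard)
mean_far, var_far = gp_posterior(X_train, y_train, np.array([[50.0, 0.0]]), ls_ard)

# Two points that differ only in the second dimension.
k_iso = ard_rbf_kernel([[0.0, 0.0]], [[0.0, 5.0]], [1.0, 1.0])[0, 0]
k_ard = ard_rbf_kernel([[0.0, 0.0]], [[0.0, 5.0]], ls_ard)[0, 0]
```

With equal length scales the two points look almost uncorrelated (`k_iso` is close to 0), while the ARD setting treats them as nearly identical (`k_ard` is close to 1). In practice the length scales would be learned from data rather than fixed by hand as they are here.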

Materials Science; Machine Learning; Data Analysis, Statistics and Probability

COMBIgor: data analysis package for combinatorial materials science

Kevin R. Talley, Sage R. Bauers, Celeste L. Melamed (2019)

Combinatorial experiments involve synthesis of sample libraries with lateral composition gradients requiring spatially-resolved characterization of structure and properties. Due to maturation of combinatorial methods and their successful application in many fields, the modern combinatorial laboratory produces diverse and complex data sets requiring advanced analysis and visualization techniques. In order to utilize these large data sets to uncover new knowledge, the combinatorial scientist must engage in data science. For data science tasks, most laboratories adopt common-purpose data management and visualization software. However, processing and cross-correlating data from various measurement tools is no small task for such generic programs. Here we describe COMBIgor, a purpose-built open-source software package written in the commercial Igor Pro environment, designed to offer a systematic approach to loading, storing, processing, and visualizing combinatorial data sets. It includes (1) methods for loading and storing data sets from combinatorial libraries, (2) routines for streamlined data processing, and (3) data analysis and visualization features to construct figures. Most importantly, COMBIgor is designed to be easily customized by a laboratory, group, or individual in order to integrate additional instruments and data-processing algorithms. Utilizing the capabilities of COMBIgor can significantly reduce the burden of data management on the combinatorial scientist.

Materials Science; Computational Physics

Computation and data driven discovery of topological phononic materials

Jiangxu Li, Jiaxi Liu, Stanley A. Baronett (2020)

The discovery of topological quantum states marks a new chapter in both condensed matter physics and materials sciences. By analogy to spin electronic system, topological concepts have been extended into phonons, boosting the birth of topological phononics (TPs). Here, we present a high-throughput screening and data-driven approach to compute and evaluate TPs among over 10,000 materials. We have clarified 5014 TP materials and classified them into single Weyl, high degenerate Weyl, and nodal-line (ring) TPs. Among them, three representative cases of TPs have been discussed in detail. Furthermore, we suggest 322 TP materials with potential clean nontrivial surface states, which are favorable for experimental characterizations. This work significantly increases the current library of TP materials, which enables an in-depth investigation of their structure-property relations and opens new avenues for future device design related to TPs.

Materials Science; Mesoscale and Nanoscale Physics

Multiscale modeling of materials: Computing, data science, uncertainty and goal-oriented optimization

Nikola Kovachki, Burigede Liu, Xingsheng Sun (2021)

The recent decades have seen various attempts at accelerating the process of developing materials targeted towards specific applications. The performance required for a particular application leads to the choice of a particular material system whose properties are optimized by manipulating its underlying microstructure through processing. The specific configuration of the structure is then designed by characterizing the material in detail, and using this characterization along with physical principles in system level simulations and optimization. These have been advanced by multiscale modeling of materials, high-throughput experimentations, materials data-bases, topology optimization and other ideas. Still, developing materials for extreme applications involving large deformation, high strain rates and high temperatures remains a challenge. This article reviews a number of recent methods that advance the goal of designing materials targeted by specific applications.

Materials Science

Importance of feature engineering and database selection in a machine learning model: A case study on carbon crystal structures

Franz M. Rohrhofer, Santanu Saha, Simone Di Cataldo (2021)

Drive towards improved performance of machine learning models has led to the creation of complex features representing a database of condensed matter systems. The complex features, however, do not offer an intuitive explanation on which physical attributes do improve the performance. The effect of the database on the performance of the trained model is often neglected. In this work we seek to understand in depth the effect that the choice of features and the properties of the database have on a machine learning application. In our experiments, we consider the complex phase space of carbon as a test case, for which we use a set of simple, human understandable and cheaply computable features for the aim of predicting the total energy of the crystal structure. Our study shows that (i) the performance of the machine learning model varies depending on the set of features and the database, (ii) is not transferable to every structure in the phase space and (iii) depends on how well structures are represented in the database.

Materials Science; Machine Learning; Computational Physics
