
Importance of feature engineering and database selection in a machine learning model: A case study on carbon crystal structures

Published by Santanu Saha
Publication date: 2021
Language: English





The drive towards improved performance of machine learning models has led to the creation of complex features representing a database of condensed matter systems. These complex features, however, do not offer an intuitive explanation of which physical attributes improve the performance. The effect of the database on the performance of the trained model is also often neglected. In this work we seek to understand in depth the effect that the choice of features and the properties of the database have on a machine learning application. In our experiments, we consider the complex phase space of carbon as a test case, for which we use a set of simple, human-understandable and cheaply computable features to predict the total energy of the crystal structure. Our study shows that the performance of the machine learning model (i) varies depending on the set of features and the database, (ii) is not transferable to every structure in the phase space and (iii) depends on how well structures are represented in the database.




Read also

We propose a novel active learning scheme for automatically sampling a minimum number of uncorrelated configurations for fitting the Gaussian Approximation Potential (GAP). Our active learning scheme consists of an unsupervised machine learning (ML) scheme coupled to a Bayesian optimization technique that evaluates the GAP model. We apply this scheme to a hafnium dioxide (HfO2) dataset generated from a melt-quench ab initio molecular dynamics (AIMD) protocol. Our results show that the active learning scheme, with no prior knowledge of the dataset, is able to extract a configuration that reaches the required energy fit tolerance. Further, molecular dynamics (MD) simulations performed using this actively learned GAP model on 6144-atom systems in the amorphous and liquid states elucidate the structural properties of HfO2 with near ab initio precision and at quench rates (i.e. 1.0 K/ps) not accessible via AIMD. The melt and amorphous x-ray structure factors generated from our simulation are in good agreement with experiment. Additionally, the calculated diffusion constants are in good agreement with previous ab initio studies.
Longitudinal dispersion (LD) is the dominant process of scalar transport in natural streams. An accurate prediction of the LD coefficient (Dl) can produce a performance leap in related simulations. Emerging machine learning (ML) techniques provide a self-adaptive tool for this problem. However, most existing studies use an unproven quaternion feature set obtained through simple theoretical deduction, and few have examined its reliability and rationality. Moreover, because comparative studies are lacking, the proper choice of ML model in different scenarios remains unknown. In this study, the Feature Gradient selector was first adopted to distill locally optimal feature sets directly from multivariable data. Then, a globally optimal feature set (the channel width, the flow velocity, the channel slope and the cross-sectional area) was proposed by numerically comparing the performance of the distilled local optima with representative ML models. The channel slope is identified as the key parameter for predicting the LD coefficient. Further, we designed a weighted evaluation metric that enables comprehensive model comparison. With a simple linear model as the baseline, a benchmark of single and ensemble learning models is provided, and the advantages and disadvantages of the methods involved are discussed. Results show that the support vector machine performs significantly better than the other models. The decision tree is not suitable for this problem because of its poor generalization ability. Notably, simple models outperform complicated models on this low-dimensional problem, owing to their better balance between regression and generalization.
We describe the first open-access database of experimentally investigated hybrid organic-inorganic materials with a two-dimensional (2D) perovskite-like crystal structure. The database includes 515 compounds, containing the 180 different organic cations, 10 metals (Pb, Sn, Bi, Cd, Cu, Fe, Ge, Mn, Pd, Sb) and 3 halogens (I, Br, Cl) known so far, and will be regularly updated. The database contains a geometrical and crystal chemical analysis of the structures, which is useful for revealing quantitative structure-property relationships for this class of compounds. We show that the penetration depth of the spacer organic cation into the inorganic layer and the M-X-M bond angles increase with the number of inorganic layers (n). A machine learning model is developed and trained on the database to predict the band gap with an accuracy within 0.1 eV. Another machine learning model is trained to predict atomic partial charges with an accuracy within 0.01 e. We show that the predicted band gaps decrease with increasing n and, for single-layered perovskites, with increasing M-X-M angles. In general, the proposed database and machine learning models are shown to be useful tools for the rational design of new 2D hybrid perovskite materials.
Along with the democratization of AI, machine learning approaches, in particular neural networks, have been applied to a wide range of applications. In different application scenarios, the neural network is accelerated on a tailored computing platform. The acceleration of neural networks on classical computing platforms, such as CPU, GPU, FPGA and ASIC, has been widely studied; however, as the scale of the application keeps growing, the memory bottleneck, widely known as the memory wall, becomes obvious. In response to this challenge, quantum computing, which can represent 2^N states with N quantum bits (qubits), is regarded as a promising solution. It is urgent to understand how to design quantum circuits for accelerating neural networks. Most recently, initial works have studied how to map neural networks to actual quantum processors. To better understand the state-of-the-art design and inspire new design methodology, this paper carries out a case study to demonstrate an end-to-end implementation. On the neural network side, we employ the multilayer perceptron to complete image classification tasks using the standard and widely used MNIST dataset. On the quantum computing side, we target IBM Quantum processors, which can be programmed and simulated using IBM Qiskit. This work targets the acceleration of the inference phase of a trained neural network on the quantum processor. Along with the case study, we demonstrate the typical procedure for mapping neural networks to quantum circuits.
In this study, we present a novel approach along with the needed computational strategies for efficient and scalable feature engineering of the crystal structure in compounds of different chemical compositions. This approach utilizes a versatile and extensible framework for the quantification of a three-dimensional (3-D) voxelized crystal structure in the form of 2-point spatial correlations of multiple atomic attributes and performs principal component analysis to extract the low-dimensional features that could be used to build surrogate models for material properties of interest. An application of the proposed feature engineering framework is demonstrated on a case study involving the prediction of the formation energies of crystalline compounds using two vastly different surrogate model building strategies - local Gaussian process regression and neural networks. Specifically, it is shown that the top 25 features (i.e., principal component scores) identified by the proposed framework serve as good regressors for the formation energy of the crystalline substance for both model building strategies.
