Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

Submitted by Felipe Oviedo

Publication date: 2018

Research field: Physics

Paper language: English

Authors: Felipe Oviedo, Zekun Ren, Shijing Sun

Data Analysis, Statistics and Probability; Materials Science; Machine Learning

X-ray diffraction (XRD) data acquisition and analysis are among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce-data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal halides spanning 3 dimensionalities and 7 space groups are synthesized and classified. After testing various algorithms, we develop and implement an all-convolutional neural network, with cross-validated accuracies for dimensionality and space-group classification of 93% and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16°, which enables an XRD pattern to be obtained and classified in 5.5 minutes or less.
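The abstract names three technical ingredients: augmentation of a small set of XRD patterns, class activation maps (CAMs) computed through a global average pooling layer, and coarsening of the 2θ step size. The sketch below illustrates each under stated assumptions; it is not the authors' code. Patterns are rendered as sums of Gaussian peaks (the abstract instead uses ICSD-simulated and experimental patterns). The augmentation operations (a common 2θ shift, per-peak intensity scaling, random peak removal) and all parameter values are illustrative assumptions. The CAM follows the standard construction, in which the dense-layer weights of one class weight the last convolutional feature maps. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the ideas in the abstract; not the authors' code.


def simulate_pattern(two_theta, peaks, width=0.1):
    """Render a 1D XRD pattern as a sum of Gaussian peaks.

    peaks: list of (position in degrees 2-theta, relative intensity).
    width: Gaussian standard deviation in degrees (assumed value).
    """
    pattern = np.zeros_like(two_theta, dtype=float)
    for position, intensity in peaks:
        pattern += intensity * np.exp(-0.5 * ((two_theta - position) / width) ** 2)
    return pattern


def augment_peaks(peaks, rng, max_shift=0.2, scale_range=(0.5, 1.5), drop_prob=0.1):
    """Randomly perturb a peak list (assumed operations and parameters).

    - a common 2-theta offset, standing in for e.g. sample displacement or strain
    - per-peak intensity scaling, standing in for e.g. preferred orientation
    - random peak removal, standing in for reflections too weak to detect
    """
    shift = rng.uniform(-max_shift, max_shift)
    augmented = []
    for position, intensity in peaks:
        if rng.random() < drop_prob:
            continue  # this reflection is dropped from the augmented pattern
        augmented.append((position + shift, intensity * rng.uniform(*scale_range)))
    return augmented


def coarsen(two_theta, pattern, step):
    """Resample a pattern onto a coarser 2-theta grid to mimic a larger step size."""
    coarse_grid = np.arange(two_theta[0], two_theta[-1], step)
    return coarse_grid, np.interp(coarse_grid, two_theta, pattern)


def average_class_activation_map(feature_maps, class_weights):
    """Average CAM over a batch of samples for one class.

    feature_maps: (n_samples, channels, length) activations of the last conv layer.
    class_weights: (channels,) weights linking the globally average-pooled
    features to the output unit of the class of interest.
    """
    cams = np.einsum("c,ncl->nl", class_weights, feature_maps)  # one CAM per sample
    return cams.mean(axis=0)  # average over samples -> (length,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    two_theta = np.arange(10.0, 60.0, 0.04)  # fine grid, 0.04 degree step (assumed)
    peaks = [(20.0, 1.0), (35.0, 0.6)]       # toy peak list (assumed)
    training_set = [simulate_pattern(two_theta, augment_peaks(peaks, rng)) for _ in range(100)]
    grid, coarse = coarsen(two_theta, simulate_pattern(two_theta, peaks), step=0.16)
    print(len(training_set), grid.shape, coarse.shape)
```

Because a CAM is linear in the feature maps, averaging per-sample CAMs over all patterns of one class gives a single 2θ profile showing which angular regions drive that class, which is the kind of summary an experimentalist can compare against known reflections.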

Weizong Xu, James M. LeBeau (2017)

We establish a series of deep convolutional neural networks to automatically analyze position averaged convergent beam electron diffraction patterns. The networks first calibrate the zero-order disk size, center position, and rotation without the need to pretreat the data. With the aligned data, additional networks then measure the sample thickness and tilt. The performance of the network is explored as a function of a variety of variables, including thickness, tilt, and dose. A methodology to explore the response of the neural network to various pattern features is also presented. Processing patterns at a rate of ~0.1 s/pattern, the network is shown to be orders of magnitude faster than a brute-force method while maintaining accuracy. The approach is thus suitable for automatically processing large 4D-STEM datasets. We also discuss the generality of the method to other materials and orientations, as well as a hybrid approach that combines the features of the neural network with least-squares fitting for even more robust analysis. The source code is available at https://github.com/subangstrom/DeepDiffraction.

Data Analysis, Statistics and Probability; Materials Science

A deep learned nanowire segmentation model using synthetic data augmentation

Binbin Lin, Nima Emami, Bai-Xiang Xu (2021)

Automated object identification and feature analysis of experimental image data are indispensable for data-driven materials science; deep-learning-based segmentation algorithms have been shown to be a promising technique to achieve this goal. However, acquiring high-resolution experimental images and assigning labels in order to train such algorithms is challenging and costly in terms of both time and labor. In the present work, we apply synthetic images, which resemble the experimental image data in terms of geometrical and visual features, to train state-of-the-art deep-learning-based Mask R-CNN algorithms to segment vanadium pentoxide (V2O5) nanowires, a canonical cathode material, within optical intensity-based images from spectromicroscopy. The performance evaluation demonstrates that even though the deep learning model is trained on purely synthetically generated structures, it can segment real optical intensity-based spectromicroscopy images of complex V2O5 nanowire structures in overlapping particle networks, thus providing reliable statistical information. The model can further be used to segment nanowires in scanning electron microscopy (SEM) images, which are fundamentally different from the training dataset known to the model. The proposed methodology of using a purely synthetic dataset to train the deep learning model can be extended to any optical intensity-based images of variable particle morphology, extent of agglomeration, material class, and beyond.

Data Analysis, Statistics and Probability; Image and Video Processing

Fragment Graphical Variational AutoEncoding for Screening Molecules with Small Data

John Armitage, Leszek J. Spalek, Malgorzata Nguyen (2019)

In the majority of molecular optimization tasks, predictive machine learning (ML) models are limited by the unavailability and cost of generating large experimental datasets for the specific task. To circumvent this limitation, ML models are trained on large theoretical datasets or on experimental indicators of molecular suitability that are either publicly available or inexpensive to acquire. These approaches produce a set of candidate molecules that have to be ranked using limited experimental data or expert knowledge. Under the assumption that structure is related to functionality, here we use a molecular fragment-based graphical autoencoder to generate unique structural fingerprints to efficiently search through the candidate set. We demonstrate that fragment-based graphical autoencoding reduces the error in predicting physical characteristics such as solubility and partition coefficient in the small-data regime compared with extended circular fingerprints and string-based approaches. We further demonstrate that this approach is capable of providing insight into real-world molecular optimization problems, such as searching for stabilization additives in organic semiconductors, by accurately predicting 92% of test molecules given 69 training examples. This task is a model example of black-box molecular optimization, as there is minimal theoretical and experimental knowledge to accurately predict the suitability of the additives.

Data Analysis, Statistics and Probability; Materials Science; Machine Learning

Training artificial neural networks for precision orientation and strain mapping using 4D electron diffraction datasets

Renliang Yuan, Jiong Zhang, Lingfeng He (2021)

Techniques for training artificial neural networks (ANNs) and convolutional neural networks (CNNs) using simulated dynamical electron diffraction patterns are described. The premise is based on the following facts. First, given a suitable crystal structure model and scattering potential, electron diffraction patterns can be simulated accurately using dynamical diffraction theory. Second, using simulated diffraction patterns as input, ANNs can be trained to determine crystal structural properties, such as crystal orientation and local strain. Further, by applying the trained ANNs to four-dimensional diffraction datasets (4D-DD) collected using the scanning electron nanodiffraction (SEND) or 4D scanning transmission electron microscopy (4D-STEM) techniques, the crystal structural properties can be mapped at high spatial resolution. Here, we demonstrate the ANN-enabled possibilities for the analysis of crystal orientation and strain at high precision and benchmark the performance of ANNs and CNNs by comparing them with previous methods. A thirtyfold improvement in angular resolution, to 0.009 degrees (0.16 mrad), for orientation mapping, a sensitivity of 0.04% or less for strain mapping, and improvements in computational performance are demonstrated.

Mesoscale and Nanoscale Physics; Materials Science

Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation

Anabel Gomez-Rios, Siham Tabik, Julian Luengo (2018)

The recognition of coral species based on underwater texture images poses a significant difficulty for machine learning algorithms because of the following three challenges embedded in the nature of this data: 1) datasets do not include information about the global structure of the coral; 2) several species of coral have very similar characteristics; and 3) defining the spatial borders between classes is difficult, as many corals tend to appear together in groups. For this reason, the classification of coral species has always required the aid of a domain expert. The objective of this paper is to develop an accurate classification model for coral texture images. Current datasets contain a large number of imbalanced classes, while the images are subject to inter-class variation. We have analyzed 1) several convolutional neural network (CNN) architectures, 2) data augmentation techniques, and 3) transfer learning. We have achieved state-of-the-art accuracies using different variations of ResNet on the two current coral texture datasets, EILAT and RSMAS.
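The abstract does not say which data augmentation techniques were analyzed. As one generic illustration, assumed here rather than taken from the paper: texture patches have no preferred orientation, so the eight rotations and reflections of an image (the dihedral group of the square) are label-preserving transforms. The function name is hypothetical.

```python
import numpy as np


def dihedral_augment(image):
    """Return the 8 rotations/reflections of an image array of shape (H, W) or (H, W, C).

    Hypothetical helper; a generic texture augmentation, not the paper's pipeline.
    """
    views = []
    for k in range(4):
        rotated = np.rot90(image, k, axes=(0, 1))  # rotate by k * 90 degrees
        views.append(rotated)
        views.append(np.flip(rotated, axis=1))     # mirror the rotated view left-right
    return views


if __name__ == "__main__":
    patch = np.arange(6).reshape(2, 3)  # toy "image" with distinct pixel values
    for view in dihedral_augment(patch):
        print(view.shape)
```

Each view keeps the texture statistics of the original patch, so the training set grows eightfold without new labels; whether such transforms help on a given coral dataset is an empirical question.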

Computer Vision and Pattern Recognition