In the majority of molecular optimization tasks, predictive machine learning (ML) models are limited by the scarcity and cost of large experimental datasets for the specific task. To circumvent this limitation, ML models are trained on large theoretical datasets or on experimental indicators of molecular suitability that are either publicly available or inexpensive to acquire. These approaches produce a set of candidate molecules which have to be ranked using limited experimental data or expert knowledge. Under the assumption that structure is related to functionality, here we use a molecular fragment-based graphical autoencoder to generate unique structural fingerprints to efficiently search through the candidate set. We demonstrate that fragment-based graphical autoencoding reduces the error in predicting physical characteristics such as the solubility and partition coefficient in the small-data regime compared to extended circular fingerprints and string-based approaches. We further demonstrate that this approach is capable of providing insight into real-world molecular optimization problems, such as searching for stabilization additives in organic semiconductors, by accurately predicting 92% of test molecules given 69 training examples. This task is a model example of black-box molecular optimization, as there is minimal theoretical and experimental knowledge to accurately predict the suitability of the additives.
X-ray diffraction (XRD) data acquisition and analysis are among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce-data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal halides spanning 3 dimensionalities and 7 space groups are synthesized and classified. After testing various algorithms, we develop and implement an all-convolutional neural network, with cross-validated accuracies for dimensionality and space-group classification of 93% and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16°, which enables an XRD pattern to be obtained and classified in 5.5 minutes or less.
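The step-size evaluation mentioned in the preceding abstract can be illustrated by resampling a finely measured pattern onto a coarser 2θ grid before it is passed to a classifier. The following is only a minimal sketch under assumptions of our own: the function name `resample_pattern`, the synthetic single-peak pattern, and the use of linear interpolation are illustrative and are not taken from the work itself.

```python
import numpy as np


def resample_pattern(two_theta, intensity, step):
    """Resample an XRD pattern onto a uniform grid with the given step (degrees).

    Linear interpolation is an assumption made for this sketch.
    """
    span = two_theta[-1] - two_theta[0]
    # Number of grid points that fit in the measured range (small tolerance
    # guards against floating-point round-off in the division).
    n_points = int(np.floor(span / step + 1e-9)) + 1
    coarse_grid = two_theta[0] + step * np.arange(n_points)
    return coarse_grid, np.interp(coarse_grid, two_theta, intensity)


# Synthetic pattern: one Gaussian peak at 30 degrees on a 0.04 degree grid.
fine_grid = np.linspace(10.0, 50.0, 1001)
fine_intensity = np.exp(-0.5 * ((fine_grid - 30.0) / 0.2) ** 2)

# Coarsen to the 0.16 degree step size quoted in the abstract.
coarse_grid, coarse_intensity = resample_pattern(fine_grid, fine_intensity, 0.16)
```

Repeating the resampling for a range of step sizes and re-evaluating a trained classifier on each coarsened set is one way such a threshold could be located.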
Monte-Carlo (MC) methods, based on random updates and the trial-and-error principle, are well suited to retrieve particle size distributions from small-angle scattering patterns of dilute solutions of scatterers. The size sensitivity of size determination methods in relation to the range of scattering vectors covered by the data is discussed. Improvements are presented to existing MC methods in which the particle shape is assumed to be known. A discussion of the problems with the ambiguous convergence criteria of the MC methods is given, and a convergence criterion is proposed which also allows the determination of uncertainties in the determined size distributions.
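The trial-and-error principle described in the preceding abstract can be sketched for the simplest case of homogeneous spheres. This is an illustrative sketch only, not the published algorithm: the function names, the fixed number of steps used in place of a convergence criterion, the unweighted least-squares misfit, and all numerical values are assumptions made here.

```python
import numpy as np


def sphere_intensity(q, radius):
    """Scattering intensity of one homogeneous sphere (arbitrary units)."""
    qr = q * radius
    amplitude = 3.0 * (np.sin(qr) - qr * np.cos(qr)) / qr**3
    volume = 4.0 / 3.0 * np.pi * radius**3
    return (volume * amplitude) ** 2


def misfit(total, target):
    """Sum of squared residuals after an optimal overall scaling factor."""
    scale = np.dot(total, target) / np.dot(total, total)
    return np.sum((target - scale * total) ** 2)


def mc_fit(q, target, n_spheres=20, r_bounds=(1.0, 20.0), n_steps=2000, seed=0):
    """Replace one sphere radius at a time; keep the change only if it helps."""
    rng = np.random.default_rng(seed)
    radii = rng.uniform(*r_bounds, size=n_spheres)
    contributions = np.array([sphere_intensity(q, r) for r in radii])
    total = contributions.sum(axis=0)
    best = misfit(total, target)
    history = [best]
    for _ in range(n_steps):  # fixed step count: an assumption of this sketch
        i = rng.integers(n_spheres)
        trial_radius = rng.uniform(*r_bounds)
        trial_contribution = sphere_intensity(q, trial_radius)
        trial_total = total - contributions[i] + trial_contribution
        trial_misfit = misfit(trial_total, target)
        if trial_misfit < best:  # accept only improving moves
            radii[i] = trial_radius
            contributions[i] = trial_contribution
            total = trial_total
            best = trial_misfit
        history.append(best)
    return radii, np.array(history)


# Synthetic target: monodisperse spheres with radius 8 (arbitrary length units).
q = np.linspace(0.01, 0.5, 100)
target = sphere_intensity(q, 8.0)
radii, history = mc_fit(q, target)
```

The resulting set of radii can be histogrammed to give a size distribution; repeating the run with different seeds gives a spread between runs, which is one plausible route to uncertainty estimates.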
We present general algorithms to convert scattering data of linear and area detectors recorded in various scattering geometries to reciprocal space coordinates. The presented algorithms work for any goniometer configuration, including popular four-circle, six-circle and kappa goniometers. We avoid the use of commonly employed approximations and therefore provide algorithms which also work for large detectors at small sample-to-detector distances. A recipe for determining the necessary detector parameters, including mostly ignored misalignments, is given. The algorithms are implemented in a freely available open-source package.
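As a point of reference for the conversion described in the preceding abstract, the simplest special case is a coplanar geometry with a point detector, where only the incidence angle ω and the scattering angle 2θ enter. The sketch below covers that textbook case only and is not the general goniometer algorithm; the function name `angles_to_q` and the example values are assumptions made here.

```python
import numpy as np


def angles_to_q(omega_deg, two_theta_deg, wavelength):
    """Convert coplanar angles to in-plane (qx) and out-of-plane (qz) momentum transfer.

    Uses Q = k_f - k_i with |k| = 2*pi/wavelength; the result has units of
    inverse wavelength units.
    """
    k = 2.0 * np.pi / wavelength
    omega = np.radians(omega_deg)
    two_theta = np.radians(two_theta_deg)
    qx = k * (np.cos(two_theta - omega) - np.cos(omega))
    qz = k * (np.sin(two_theta - omega) + np.sin(omega))
    return qx, qz


# Symmetric scan (omega = theta) with Cu K-alpha1 radiation, wavelength in angstrom.
qx, qz = angles_to_q(15.0, 30.0, 1.5406)
```

In the symmetric case the in-plane component vanishes and qz reduces to 4π sin(θ)/λ, which gives a quick sanity check on the signs.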
We present an open-source program, free to download for academic use, with a full user-friendly graphical interface for performing flexible and robust background subtraction and dipole fitting on magnetization data. For magnetic samples with small moment sizes or sample environments with large or asymmetric magnetic backgrounds, it can become necessary to separate background and sample contributions to each raw voltage measurement before fitting the dipole signal to extract magnetic moments. Originally designed for use with pressure cells on a Quantum Design MPMS3 SQUID magnetometer, SquidLab is a modular object-oriented platform implemented in Matlab with a range of importers for different widely available magnetometer systems (including MPMS, MPMS-XL, MPMS-IQuantum, MPMS3 and S700X models), and has been tested with a broad variety of background and signal types. The software allows background subtraction of baseline signals, signal preprocessing, and fits to dipole data using Levenberg-Marquardt non-linear least squares, or a singular value decomposition linear algebra algorithm which excels at picking out noisy or weak dipole signals. A plugin system allows users to easily extend the built-in functionality with their own importers, processes or fitting algorithms. SquidLab can be downloaded, under Academic License, from the University of Warwick depository (wrap.warwick.ac.uk/129665).
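The linear-algebra route to dipole fitting mentioned in the preceding abstract can be illustrated as follows. If the sample position is taken as known, the dipole amplitude, a constant offset and a linear drift all enter the voltage linearly and can be obtained in one least-squares solve. This is a sketch under our own assumptions, written in Python rather than Matlab, and is not SquidLab code: the second-order gradiometer response function, the coil radius and coil spacing, and all function names are illustrative.

```python
import numpy as np


def gradiometer_response(z, center, radius, baseline):
    """Idealised second-order gradiometer response to a point dipole at `center`."""

    def coil(offset):
        return (radius**2 + (z - center + offset) ** 2) ** -1.5

    return 2.0 * coil(0.0) - coil(baseline) - coil(-baseline)


def fit_dipole_linear(z, voltage, center, radius, baseline):
    """Fit amplitude, offset and linear drift with the dipole position fixed.

    np.linalg.lstsq solves the problem with an SVD-based LAPACK routine.
    """
    design = np.column_stack(
        [
            gradiometer_response(z, center, radius, baseline),
            np.ones_like(z),  # constant voltage offset
            z,  # linear drift along the scan
        ]
    )
    coeffs, *_ = np.linalg.lstsq(design, voltage, rcond=None)
    return coeffs


# Synthetic scan; geometry values (in mm) are illustrative assumptions.
z = np.linspace(-20.0, 20.0, 201)
true_coeffs = np.array([3.0, 0.1, 0.02])
voltage = (
    true_coeffs[0] * gradiometer_response(z, 0.0, 9.7, 15.19)
    + true_coeffs[1]
    + true_coeffs[2] * z
)
fitted = fit_dipole_linear(z, voltage, center=0.0, radius=9.7, baseline=15.19)
```

When the sample position is not known, it enters the response non-linearly, which is where an iterative method such as Levenberg-Marquardt becomes necessary.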
Bayesian inference is a widely used and powerful analytical technique in fields such as astronomy and particle physics but has historically been underutilized in some other disciplines including semiconductor devices. In this work, we introduce Bayesim, a Python package that utilizes adaptive grid sampling to efficiently generate a probability distribution over multiple input parameters to a forward model using a collection of experimental measurements. We discuss the implementation choices made in the code, showcase two examples in photovoltaics, and discuss general prerequisites for the approach to apply to other systems.
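The core idea in the preceding abstract, evaluating a posterior over model parameters on a grid by comparing forward-model output with measurements, can be sketched with plain NumPy. This sketch does not use the Bayesim API and uses a fixed rather than adaptive grid; the function names, the uniform prior, the Gaussian noise model, and the linear forward model are all assumptions chosen for illustration.

```python
import numpy as np


def grid_posterior(param_grid, forward_model, x_obs, y_obs, sigma):
    """Posterior over a 1-D parameter grid with a uniform prior and Gaussian noise."""
    log_post = np.zeros(len(param_grid))
    for i, param in enumerate(param_grid):
        residual = y_obs - forward_model(x_obs, param)
        log_post[i] = -0.5 * np.sum((residual / sigma) ** 2)
    log_post -= log_post.max()  # subtract the maximum to avoid underflow
    posterior = np.exp(log_post)
    return posterior / posterior.sum()


def linear_model(x, slope):
    """Hypothetical forward model: output proportional to the input."""
    return slope * x


# Noise-free synthetic measurements generated with slope = 2.
x_obs = np.linspace(0.0, 1.0, 20)
y_obs = linear_model(x_obs, 2.0)

grid = np.linspace(0.0, 4.0, 81)
posterior = grid_posterior(grid, linear_model, x_obs, y_obs, sigma=0.1)
best_slope = grid[np.argmax(posterior)]
```

An adaptive scheme would then subdivide only those grid cells that carry appreciable probability, so that forward-model evaluations are concentrated where the posterior is not negligible.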