ترغب بنشر مسار تعليمي؟ اضغط هنا

Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks

166   0   0.0 ( 0 )
 نشر من قبل Aditi Krishnapriyan
 تاريخ النشر 2020
والبحث باللغة English




اسأل ChatGPT حول البحث

Machine learning has emerged as a powerful approach in materials discovery. Its major challenge is selecting features that create interpretable representations of materials, useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a materials structure and chemistry. This approach builds on computational topology techniques (namely, persistent homology) and word embeddings from natural language processing. It automatically encapsulates geometric and chemical information directly from the material system. We demonstrate our approach on multiple nanoporous metal-organic framework datasets by predicting methane and carbon dioxide adsorption across different conditions. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly-used, manually-curated features, consistently achieving an average 25-30% decrease in root-mean-squared-deviation and an average increase of 40-50% in R2 scores. A key advantage of our approach is interpretability: Our model identifies the pores that correlate best to adsorption at different pressures, which contributes to understanding atomic-level structure--property relationships for materials design.



قيم البحث

اقرأ أيضاً

We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal-organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities. We ob tain over 2,000 solvent removal stability measures from text mining and 3,000 thermal decomposition temperatures from thermogravimetric analysis data. We assess the validity of our NLP methods and the accuracy of our extracted data by comparing to a hand-labeled subset. Machine learning (ML, i.e. artificial neural network) models trained on this data using graph- and pore-geometry-based representations enable prediction of stability on new MOFs with quantified uncertainty. Our web interface, MOFSimplify, provides users access to our curated data and enables them to harness that data for predictions on new MOFs. MOFSimplify also encourages community feedback on existing data and on ML model predictions for community-based active learning for improved MOF stability models.
158 - Aditya Nandy , Chenru Duan , 2021
Although the tailored metal active sites and porous architectures of MOFs hold great promise for engineering challenges ranging from gas separations to catalysis, a lack of understanding of how to improve their stability limits their use in practice. To overcome this limitation, we extract thousands of published reports of the key aspects of MOF stability necessary for their practical application: the ability to withstand high temperatures without degrading and the capacity to be activated by removal of solvent molecules. From nearly 4,000 manuscripts, we use natural language processing and automated image analysis to obtain over 2,000 solvent-removal stability measures and 3,000 thermal degradation temperatures. We analyze the relationships between stability properties and the chemical and geometric structures in this set to identify limits of prior heuristics derived from smaller sets of MOFs. By training predictive machine learning (ML, i.e., Gaussian process and artificial neural network) models to encode the structure-property relationships with graph- and pore-structure-based representations, we are able to make predictions of stability orders of magnitude faster than conventional physics-based modeling or experiment. Interpretation of important features in ML models provides insights that we use to identify strategies to engineer increased stability into typically unstable 3d-containing MOFs that are frequently targeted for catalytic applications. We expect our approach to accelerate the time to discovery of stable, practical MOF materials for a wide range of applications.
We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for cond ucting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
The enormous structural and chemical diversity of metal-organic frameworks (MOFs) forces researchers to actively use simulation techniques on an equal footing with experiments. MOFs are widely known for outstanding adsorption properties, so precise d escription of host-guest interactions is essential for high-throughput screening aimed at ranking the most promising candidates. However, highly accurate ab initio calculations cannot be routinely applied to model thousands of structures due to the demanding computational costs. On the other side, methods based on force field (FF) parametrization suffer from low transferability. To resolve this accuracy-efficiency dilemma, we apply the machine learning (ML) approach. The trained models reproduce atom-in-material quantities, including partial charges, polarizabilities, dispersion coefficients, quantum Drude oscillator and electron cloud parameters within the accuracy of underlying density functional theory method. The aforementioned FF precursors make it possible to thoroughly describe non-covalent interactions typical for MOF-adsorbate systems: electrostatic, dispersion, polarization, and short-range repulsion. The presented approach can also significantly facilitate hybrid atomistic simulations/ML workflows.
In topological data analysis, persistent homology is used to study the shape of data. Persistent homology computations are completely characterized by a set of intervals called a bar code. It is often said that the long intervals represent the topolo gical signal and the short intervals represent noise. We give evidence to dispute this thesis, showing that the short intervals encode geometric information. Specifically, we prove that persistent homology detects the curvature of disks from which points have been sampled. We describe a general computational framework for solving inverse problems using the average persistence landscape, a continuous mapping from metric spaces with a probability measure to a Hilbert space. In the present application, the average persistence landscapes of points sampled from disks of constant curvature results in a path in this Hilbert space which may be learned using standard tools from statistical and machine learning.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا