Identifying the fragment structure of the organic compounds by deeply learning the original NMR data

Published by: Weihua Deng, Professor

Publication date: 2021

Research field: Biology, Informatics Engineering

Paper language: English

Authors: Chongcan Li, Yong Cong

Subjects: Quantitative Methods, Machine Learning

Abstract

We preprocess the raw NMR spectrum and extract key characteristic features using two different methodologies, equidistant sampling and peak sampling, for subsequent substructure pattern recognition. These methods may also provide an alternative strategy for addressing the dataset imbalance frequently encountered when collecting NMR data for statistical modeling. We establish conventional SVM and KNN models to assess the capability of each of the two feature selection methods. Our results show that the models using features selected by peak sampling outperform those using features from equidistant sampling. We then build a Recurrent Neural Network (RNN) model trained on Data B, which was collected by peak sampling. Furthermore, through detailed comparison with the traditional machine learning SVM and KNN models, we illustrate the easier hyperparameter optimization and the better generalization ability of the RNN deep learning model.
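The abstract does not spell out how the two feature extraction schemes work, so the following is only a minimal sketch of one plausible reading. It assumes the spectrum is a plain list of intensities, that equidistant sampling keeps intensities at evenly spaced indices, and that peak sampling keeps the strongest local maxima as (index, intensity) pairs padded to a fixed length. The function names and the padding rule are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence


def equidistant_sample(spectrum: Sequence[float], n_features: int) -> List[float]:
    """Keep n_features intensities taken at evenly spaced indices."""
    if n_features < 2 or n_features > len(spectrum):
        raise ValueError("n_features must be between 2 and len(spectrum)")
    step = (len(spectrum) - 1) / (n_features - 1)
    return [spectrum[round(i * step)] for i in range(n_features)]


def peak_sample(spectrum: Sequence[float], n_peaks: int) -> List[float]:
    """Keep the n_peaks strongest local maxima as a flat feature vector.

    The result is [index, intensity, index, intensity, ...] ordered by
    index and zero-padded to length 2 * n_peaks (an assumed convention).
    """
    # A point is a local maximum if it rises above its left neighbour
    # and is not exceeded by its right neighbour.
    peaks = [
        (i, spectrum[i])
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]
    ]
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    strongest.sort()  # restore left-to-right order along the spectrum
    flat = [value for pair in strongest for value in pair]
    flat += [0.0] * (2 * n_peaks - len(flat))
    return flat


toy_spectrum = [0, 1, 0, 0, 5, 0, 0, 2, 0]
print(equidistant_sample(toy_spectrum, 3))  # [0, 5, 0]
print(peak_sample(toy_spectrum, 2))         # [4, 5, 7, 2]
```

On this toy input the equidistant scheme happens to hit one peak and miss the other two, while the peak scheme keeps the two strongest signals. The resulting fixed-length vectors could then be passed to any classifier, such as the SVM, KNN, or RNN models named in the abstract.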

Stefan Kuhn, Eda Tumer, Simon Colreavy-Donnelly (2021)

This paper presents a method to identify substructures in NMR spectra of mixtures, specifically 2D spectra, using a bespoke image-based Convolutional Neural Network application. This is done using HSQC and HMBC spectra separately and in combination. The application can reliably detect substructures in pure compounds, using a simple network. It can work for mixtures when trained on pure compounds only. HMBC data and the combination of HMBC and HSQC show better results than HSQC alone.

Subjects: Quantitative Methods, Artificial Intelligence, Machine Learning

Estimate Metabolite Taxonomy and Structure with a Fragment-Centered Database and Fragment Network

Hansen Zhao, Xu Zhao, Huan Yao (2021)

Metabolite structure identification has become the major bottleneck of mass spectrometry based metabolomics research. To date, a number of mass spectra databases and search algorithms have been developed to address this issue. However, two critical problems still exist: the low coverage of chemical component records in databases, and significant MS/MS spectra variations related to experimental equipment and parameter settings. In this work, we considered molecule fragments as the basic building blocks of metabolic components, since they have relatively consistent signatures in MS/MS spectra. From a bottom-up point of view, we built a fragment-centered database, MSFragDB, by reorganizing the data from the Human Metabolome Database (HMDB), and developed an intensity-free searching algorithm to search and rank the most related metabolites according to the user's input. We also proposed the concept of a fragment network, a graph structure that encodes the relationships between molecule fragments in order to find close motifs that indicate a specific chemical structure. Although MSFragDB is based on the same dataset as the HMDB, validation results implied that it had a higher hit ratio and, furthermore, could estimate the possible taxonomy that a query spectrum belongs to when the corresponding chemical component was missing from the database. Aided by the Fragment Network, MSFragDB was also shown to be able to estimate the right structure when the MS/MS spectrum suffers from precursor contamination. The proposed strategy is general and can be adopted in existing databases. We believe MSFragDB and the Fragment Network can improve the performance of structure identification with existing data. The beta version of the database is freely available at www.xrzhanglab.com/msfragdb/.
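The abstract does not describe the search algorithm in detail. As a rough illustration of what "intensity-free" ranking can mean, the sketch below scores each candidate only by how many of its fragment m/z values appear in the query within a tolerance, ignoring peak intensities entirely. The scoring rule, the tolerance, and the toy database are assumptions made for illustration, not the MSFragDB algorithm.

```python
from typing import Dict, List, Sequence


def matched_fraction(query_mz: Sequence[float],
                     fragment_mz: Sequence[float],
                     tol: float = 0.01) -> float:
    """Fraction of a candidate's fragments found in the query spectrum.

    Only m/z positions are compared; intensities are never used.
    """
    hits = sum(
        any(abs(q - f) <= tol for q in query_mz)
        for f in fragment_mz
    )
    return hits / len(fragment_mz)


def rank_candidates(query_mz: Sequence[float],
                    database: Dict[str, List[float]],
                    tol: float = 0.01) -> List[str]:
    """Return candidate names ordered from best to worst match."""
    scored = [
        (matched_fraction(query_mz, frags, tol), name)
        for name, frags in database.items()
        if frags  # skip candidates with no recorded fragments
    ]
    # Highest score first; ties are broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical fragment lists, for illustration only.
toy_database = {
    "A": [50.0, 77.0, 105.0],
    "B": [50.0, 91.0],
    "C": [200.0],
}
print(rank_candidates([50.004, 77.001, 120.0], toy_database))  # ['A', 'B', 'C']
```

Because the score depends only on which fragments are present, it is unaffected by intensity changes between instruments or collision energies, which is the kind of variation the abstract identifies as a problem.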

Subjects: Quantitative Methods, Molecular Networks

Predicting Patient COVID-19 Disease Severity by means of Statistical and Machine Learning Analysis of Blood Cell Transcriptome Data

Sakifa Aktar, Md. Martuza Ahamad, Md. Rashed-Al-Mahfuz (2020)

Introduction: For COVID-19 patients, accurate prediction of disease severity and mortality risk would greatly improve care delivery and resource allocation. There are many patient-related factors, such as pre-existing comorbidities, that affect disease severity. Since rapid automated profiling of peripheral blood samples is widely available, we investigated how such data from the peripheral blood of COVID-19 patients might be used to predict clinical outcomes. Methods: We investigated such clinical datasets from COVID-19 patients with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, K-nearest neighbour and deep learning methods. Results: Our work revealed several clinical parameters measurable in blood samples that discriminated between healthy people and COVID-19 positive patients and showed predictive value for later severity of COVID-19 symptoms. We developed a number of analytic methods that showed accuracy and precision above 90% for disease severity and mortality outcome predictions. Conclusions: In sum, we developed methodologies to analyse routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. This type of approach could, by employing standard hospital laboratory analyses of patient blood, be utilised to identify COVID-19 patients at high risk of mortality and so enable their treatment to be optimised.

Subjects: Quantitative Methods, Machine Learning

Deep Learning for Identifying Metastatic Breast Cancer

Dayong Wang, Aditya Khosla, Rishab Gargeya (2016)

The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
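The 85 percent figure can be reproduced from the two AUC values if the error rate is taken to be 1 − AUC. The abstract does not state that definition, so it is an assumption here, and the function name is hypothetical.

```python
def relative_error_reduction(auc_before: float, auc_after: float) -> float:
    """Relative drop in error, with error defined as 1 - AUC (assumed)."""
    error_before = 1.0 - auc_before
    error_after = 1.0 - auc_after
    return (error_before - error_after) / error_before


# Pathologist alone (0.966) versus pathologist plus deep learning (0.995).
reduction = relative_error_reduction(0.966, 0.995)
print(f"{reduction:.1%}")  # 85.3%
```

Under this reading the error falls from 0.034 to 0.005, which is consistent with the "approximately 85 percent" quoted in the abstract.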

Subjects: Quantitative Methods, Computer Vision and Pattern Recognition

A mathematical model for breath gas analysis of volatile organic compounds with special emphasis on acetone

Julian King, Karl Unterkofler, Gerald Teschl (2010)

Recommended standardized procedures for determining exhaled lower respiratory nitric oxide and nasal nitric oxide have been developed by task forces of the European Respiratory Society and the American Thoracic Society. These recommendations have paved the way for the measurement of nitric oxide to become a diagnostic tool for specific clinical applications. It would be desirable to develop similar guidelines for the sampling of other trace gases in exhaled breath, especially volatile organic compounds (VOCs) which reflect ongoing metabolism. The concentrations of water-soluble, blood-borne substances in exhaled breath are influenced by: (i) breathing patterns affecting gas exchange in the conducting airways; (ii) the concentrations in the tracheo-bronchial lining fluid; (iii) the alveolar and systemic concentrations of the compound. The classical Farhi equation takes only the alveolar concentrations into account. Real-time measurements of acetone in end-tidal breath under an ergometer challenge show characteristics which cannot be explained within the Farhi setting. Here we develop a compartment model that reliably captures these profiles and is capable of relating breath to the systemic concentrations of acetone. By comparison with experimental data it is inferred that the major part of variability in breath acetone concentrations (e.g., in response to moderate exercise or altered breathing patterns) can be attributed to airway gas exchange, with minimal changes of the underlying blood and tissue concentrations. Moreover, it is deduced that measured end-tidal breath concentrations of acetone determined during resting conditions and free breathing will be rather poor indicators for endogenous levels. In particular, the current formulation includes the classical Farhi and the Scheid series inhomogeneity model as special limiting cases.
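For orientation, the classical Farhi equation is usually written as C_A = C_v / (λ + V_A/Q_c), where C_v is the mixed venous concentration, λ the blood:air partition coefficient, and V_A/Q_c the ventilation-perfusion ratio. The abstract does not give this formula, so the sketch below relies on that standard form. The numerical values are illustrative placeholders, with a large λ chosen to represent a highly blood-soluble compound.

```python
def farhi_alveolar_concentration(c_mixed_venous: float,
                                 partition_coeff: float,
                                 ventilation: float,
                                 perfusion: float) -> float:
    """Alveolar concentration from the classical Farhi equation.

    C_A = C_v / (lambda + V_A / Q_c), with ventilation and perfusion
    given in the same flow units (for example litres per minute).
    """
    if perfusion <= 0:
        raise ValueError("perfusion must be positive")
    return c_mixed_venous / (partition_coeff + ventilation / perfusion)


# Illustrative values only: lambda = 340, unit venous concentration.
rest = farhi_alveolar_concentration(1.0, 340.0, ventilation=6.0, perfusion=6.0)
high_ventilation = farhi_alveolar_concentration(1.0, 340.0, ventilation=18.0, perfusion=6.0)

relative_change = (rest - high_ventilation) / rest
print(f"{relative_change:.2%}")  # 0.58%
```

With a partition coefficient this large, tripling the ventilation-perfusion ratio changes the predicted alveolar concentration by well under one percent. This suggests why a model restricted to the Farhi setting would struggle to reproduce pronounced changes in breath concentration for such compounds.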

Subjects: Quantitative Methods