This paper presents a method to identify substructures in NMR spectra of mixtures, specifically 2D spectra, using a bespoke image-based convolutional neural network (CNN). HSQC and HMBC spectra are used both separately and in combination. A simple network reliably detects substructures in pure compounds, and it also works on mixtures when trained on pure compounds only. HMBC data, and the combination of HMBC and HSQC, give better results than HSQC alone.
Obstructive sleep apnea (OSA) is a form of sleep-disordered breathing characterized by frequent episodes of upper airway collapse during sleep. Pediatric OSA occurs in 1-5% of children and can be related to other serious health conditions such as high blood pressure, behavioral issues, or altered growth. OSA is often diagnosed by studying the patient's sleep cycle, the pattern with which they progress through sleep states such as wakefulness, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep. The sleep state data are obtained from an overnight polysomnography test that the patient undergoes at a hospital or sleep clinic, where a technician manually labels each 30-second interval, called an epoch, with the current sleep state. This process is laborious and prone to human error. We seek an automatic method for classifying the sleep state, as well as a method for analyzing sleep cycles. This article is a pilot study in sleep state classification using two approaches: first, we use methods from topological data analysis to classify the sleep state, and second, we model sleep states as a Markov chain and visually analyze the sleep patterns. In future work, we will continue to build on this study to improve our methods.
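Modeling sleep states as a Markov chain starts from a transition matrix: for each state, the probability that the next 30-second epoch is in each other state. The abstract does not describe how the authors build their chain, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical three-state label set (`W` for wake, `REM`, `NREM`) and a hypothetical function name, `transition_matrix`. It estimates first-order transition probabilities by counting consecutive epoch pairs, which is the maximum-likelihood estimate.

```python
from collections import Counter

# Hypothetical label set: W = wake, REM = rapid eye movement, NREM = non-REM.
STATES = ("W", "REM", "NREM")


def transition_matrix(epochs, states=STATES):
    """Estimate first-order Markov transition probabilities from a list of
    per-epoch sleep-state labels (one label per 30 s epoch).

    P[a][b] is the fraction of epochs in state a that are immediately
    followed by an epoch in state b (maximum-likelihood estimate).
    """
    # Count every consecutive (current, next) pair of epoch labels.
    pair_counts = Counter(zip(epochs, epochs[1:]))
    matrix = {}
    for a in states:
        row_total = sum(pair_counts[(a, b)] for b in states)
        # A state with no outgoing transitions in the data gets an all-zero row.
        matrix[a] = {
            b: pair_counts[(a, b)] / row_total if row_total else 0.0
            for b in states
        }
    return matrix


# Example hypnogram of eight epochs (made-up labels, not patient data).
hypnogram = ["W", "W", "NREM", "NREM", "NREM", "REM", "NREM", "W"]
P = transition_matrix(hypnogram)
print(P["NREM"])  # {'W': 0.25, 'REM': 0.25, 'NREM': 0.5}
```

Each non-empty row sums to 1. Comparing such matrices across patients, or plotting them as heatmaps, is one simple way to inspect sleep patterns visually.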
We present an automated method for tracking and identifying neurons in C. elegans, called fast Deep Learning Correspondence (fDLC), based on the transformer network architecture. The model is trained once on empirically derived synthetic data and then predicts neural correspondence across held-out real animals via transfer learning. The same pre-trained model both tracks neurons across time and identifies corresponding neurons across individuals. Performance is evaluated against hand-annotated datasets, including NeuroPAL [1]. Using only position information, the method achieves 80.0% accuracy at tracking neurons within an individual and 65.8% accuracy at identifying neurons across individuals. Accuracy is even higher on a published dataset [2]. Accuracy reaches 76.5% when color information from NeuroPAL is also used. Unlike previous methods, fDLC does not require straightening the animal or transforming it into a canonical coordinate system. The method is fast, predicting correspondence in 10 ms, which makes it suitable for future real-time applications.
Metaproteomics is becoming widely used in microbiome research to gain insight into the functional state of microbial communities. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. Identifying peptides and proteins from MS data requires searching MS/MS spectra against a predefined protein sequence database and assigning the top-scoring peptides to the spectra. Existing computational tools are still far from extracting all the information in the large MS/MS datasets acquired from metaproteome samples. In this paper, we propose DeepFilter, a deep-learning-based algorithm that increases the rate of confident peptide identifications from a collection of tandem mass spectra. Compared with other post-processing tools, including Percolator, Q-ranker, PeptideProphet, and iProphet, DeepFilter identified 20% more peptide-spectrum matches (PSMs) and 10% more proteins on marine and soil microbial metaproteome samples at a false discovery rate (FDR) of 1%.
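A 1% false discovery rate criterion is commonly enforced with a target-decoy search, where spectra are also matched against reversed or shuffled "decoy" sequences and the decoy hit count estimates the number of false target hits. The abstract does not say how DeepFilter or the compared tools estimate FDR, so the following is only a minimal sketch under that assumption. It uses the simple estimator decoys / targets above a score cutoff. The function name `accepted_targets_at_fdr` and its `(score, is_decoy)` input format are hypothetical.

```python
def accepted_targets_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at the loosest score cutoff whose
    target-decoy FDR estimate (decoys / targets above the cutoff) is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate cutoffs between distinct scores, so tied PSMs are
        # accepted or rejected together.
        at_cutoff = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if at_cutoff and targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


example = [(9.1, False), (8.7, False), (8.2, True), (7.9, False)]
print(accepted_targets_at_fdr(example, max_fdr=0.5))  # 3
```

Under this framing, a post-processing tool that "identifies more PSMs at 1% FDR" is one that reorders PSMs so that more targets rank above the decoys, which raises the count returned at a fixed `max_fdr`.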
We preprocess raw NMR spectra and extract characteristic features for substructure pattern recognition using two methods, called equidistant sampling and peak sampling. These methods may also offer an alternative strategy for addressing class imbalance, a frequent issue when NMR datasets are collected for statistical modeling. To assess the two feature-selection methods, we build two conventional models, a support vector machine (SVM) and k-nearest neighbors (KNN). Our results show that models using features from peak sampling outperform those using features from equidistant sampling. We then build a recurrent neural network (RNN) model trained on Data B, which was collected by peak sampling. Finally, a detailed comparison with the traditional SVM and KNN models shows that the RNN deep learning model has hyperparameters that are easier to optimize and generalizes better.
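The abstract names two feature-extraction methods without defining them. One plausible reading, sketched here purely as an assumption, is that equidistant sampling reads the intensity at evenly spaced points along the spectrum, while peak sampling keeps only the positions and heights of peaks. The function names and the intensity threshold below are hypothetical and are not the authors' implementation.

```python
def equidistant_sample(intensities, n_points):
    """Equidistant sampling (assumed definition): keep the intensity at
    n_points evenly spaced indices, giving a fixed-length feature vector."""
    if n_points < 2 or n_points > len(intensities):
        raise ValueError("need 2 <= n_points <= len(intensities)")
    last = len(intensities) - 1
    indices = [round(i * last / (n_points - 1)) for i in range(n_points)]
    return [intensities[i] for i in indices]


def peak_sample(shifts, intensities, threshold):
    """Peak sampling (assumed definition): keep (chemical shift, intensity)
    pairs at local maxima reaching threshold. Output length varies per spectrum."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        # Strict local maxima only; flat-topped peaks are skipped in this sketch.
        if y >= threshold and y > intensities[i - 1] and y > intensities[i + 1]:
            peaks.append((shifts[i], y))
    return peaks


shifts = [0.5 * i for i in range(10)]            # hypothetical ppm axis
intensities = [0, 1, 5, 1, 0, 0, 2, 8, 2, 0]     # made-up spectrum
print(equidistant_sample(intensities, 4))        # [0, 1, 2, 0]
print(peak_sample(shifts, intensities, 3))       # [(1.0, 5), (3.5, 8)]
```

In this toy spectrum, equidistant sampling misses both peaks, while peak sampling captures them. Peak sampling also yields a variable-length sequence, which an RNN can consume directly. That may be one reason the RNN was trained on peak-sampled data, though the abstract does not state this.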
Purpose: In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and on evaluating algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer. Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. We created training materials and workflows to crowdsource pathologist image annotations in two modes: on an optical microscope and on two digital platforms. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating sTIL density, and, if it is appropriate, the sTIL density value for that ROI. Results: The pilot study yielded many cases with only nominal sTIL infiltration. We also found that sTIL densities are correlated within a case and that pathologist variability is notable. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods that account for ROI correlations within a case and for pathologist variability when validating an algorithm. Conclusion: We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the FDA via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.