
Spectral Machine Learning for Pancreatic Mass Imaging Classification

Published by Yiming Liu
Publication date: 2021
Language: English





We present a novel spectral machine learning (SML) method for pancreatic mass screening using CT imaging. Our algorithm is trained on approximately 30,000 images from 250 patients (50 patients with normal pancreas and 200 patients with abnormal pancreas findings) drawn from public data sources. A test accuracy of 94.6 percent was achieved in out-of-sample diagnosis classification on approximately 15,000 images from 113 patients, whereby 26 out of 32 patients with normal pancreas and all 81 patients with abnormal pancreas findings were correctly diagnosed. SML automatically chooses fundamental images (on average 5 or 9 images for each patient) for the diagnosis classification and achieves the above-mentioned accuracy. The computational time is 75 seconds for diagnosing 113 patients on a laptop with a standard CPU environment. Factors contributing to the high performance of this integration of spectral learning and machine learning include: 1) use of eigenvectors corresponding to several of the largest eigenvalues of the sample covariance matrix (spike eigenvectors) to choose input attributes for classification training, taking into account only the fundamental information of the raw images with less noise; 2) removal of irrelevant pixels based on a mean-level spectral test to reduce memory requirements and improve computational efficiency while maintaining superior classification accuracy; 3) adoption of state-of-the-art machine learning classifiers, namely gradient boosting and random forest. Our methodology showcases the practical utility and improved accuracy of image diagnosis in pancreatic mass screening in the era of AI.
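The first two factors listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mean-level spectral test, so a plain threshold on mean pixel intensity stands in for it, and the function names (`screen_pixels`, `spike_features`), the threshold, the number of spikes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def screen_pixels(images, threshold):
    """Keep only pixels whose mean intensity exceeds `threshold`.

    Stand-in (assumption) for the mean-level spectral test named in the
    abstract, whose exact form is not given there.
    """
    keep = images.mean(axis=0) > threshold
    return images[:, keep], keep


def spike_features(images, n_spikes):
    """Project images onto the leading eigenvectors of the sample covariance.

    Returns the projected features, the `n_spikes` largest eigenvalues, and
    the corresponding eigenvectors (one per column).
    """
    centered = images - images.mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors of
    # the sample covariance matrix; this avoids forming a pixels-by-pixels matrix.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular_values ** 2 / (images.shape[0] - 1)
    spikes = vt[:n_spikes].T
    return centered @ spikes, eigenvalues[:n_spikes], spikes


# Synthetic stand-in for flattened CT slices: 200 images of 64 pixels each.
rng = np.random.default_rng(0)
n_images, n_pixels = 200, 64
labels = rng.integers(0, 2, size=n_images)

# Background pixels are near-zero noise; the first 16 pixels carry the signal.
images = 0.05 * rng.standard_normal((n_images, n_pixels))
images[:, :16] += 1.0 + 2.0 * labels[:, None]

screened, kept = screen_pixels(images, threshold=0.5)
features, eigenvalues, _ = spike_features(screened, n_spikes=2)
```

In a fuller pipeline, `features` and `labels` would be passed to a classifier such as scikit-learn's `GradientBoostingClassifier` or `RandomForestClassifier`, matching the third factor in the abstract.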




Read also

Machine learning (ML) offers a collection of powerful approaches for detecting and modeling associations, often applied to data having a large number of features and/or complex associations. Currently, there are many tools to facilitate implementing custom ML analyses (e.g. scikit-learn). Interest is also increasing in automated ML packages, which can make it easier for non-experts to apply ML and have the potential to improve model performance. ML permeates most subfields of biomedical research with varying levels of rigor and correct usage. The tremendous opportunities offered by ML are frequently offset by the challenge of assembling comprehensive analysis pipelines and by the ease of ML misuse. In this work we have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification (i.e. case/control prediction), and applied this pipeline to both simulated and real-world data. At a high level, this automated but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms, each with hyperparameter optimization, and e) thorough evaluation, including appropriate metrics, statistical analyses, and novel visualizations. This pipeline organizes the many subtle complexities of ML pipeline assembly to illustrate best practices to avoid bias and ensure reproducibility. Additionally, this pipeline is the first to compare established ML algorithms to ExSTraCS, a rule-based ML algorithm with the unique capability of interpretably modeling heterogeneous patterns of association. While designed to be widely applicable, we apply this pipeline to an epidemiological investigation of established and newly identified risk factors for pancreatic cancer to evaluate how different sources of bias might be handled by ML algorithms.
In this paper, we propose an AdaBoost-assisted extreme learning machine for efficient online sequential classification (AOS-ELM). To achieve better accuracy in online sequential learning scenarios, we use the cost-sensitive AdaBoost algorithm, which diversifies the weak classifiers, and add a forgetting mechanism, which stabilizes performance during training. Hence, AOS-ELM adapts better to sequentially arriving data than other voting-based methods. The experimental results show that AOS-ELM achieves 94.41% accuracy on the MNIST dataset, which is the theoretical accuracy bound set by the original batch learning algorithm, AdaBoost-ELM. Moreover, with the forgetting mechanism, the standard deviation of accuracy during the online sequential learning process is reduced to 8.26x.
Current Flash X-ray single-particle diffraction Imaging (FXI) experiments, which operate on modern X-ray Free Electron Lasers (XFELs), can record millions of interpretable diffraction patterns from individual biomolecules per day. Due to the stochastic nature of the XFELs, those patterns will to a varying degree include scatterings from contaminated samples. Also, the heterogeneity of the sample biomolecules is unavoidable and complicates data processing. Reducing the data volumes and selecting high-quality single-molecule patterns are therefore critical steps in the experimental set-up. In this paper, we present two supervised template-based learning methods for classifying FXI patterns. Our Eigen-Image and Log-Likelihood classifier can find the best-matched template for a single-molecule pattern within a few milliseconds. It is also straightforward to parallelize them so as to fully match the XFEL repetition rate, thereby enabling processing at site.
An explainable machine learning method for point cloud classification, called the PointHop method, is proposed in this work. The PointHop method consists of two stages: 1) local-to-global attribute building through iterative one-hop information exchange, and 2) classification and ensembles. In the attribute building stage, we address the problem of unordered point cloud data by using a space partitioning procedure and developing a robust descriptor that characterizes the relationship between a point and its one-hop neighbors in a PointHop unit. When multiple PointHop units are cascaded, the attributes of a point grow by iteratively taking its relationship with one-hop neighbor points into account. Furthermore, to control the rapid dimension growth of the attribute vector associated with a point, we use the Saab transform to reduce the attribute dimension in each PointHop unit. In the classification and ensemble stage, we feed the feature vector obtained from multiple PointHop units to a classifier. We explore ensemble methods to further improve classification performance. Experimental results show that the PointHop method offers classification performance comparable with state-of-the-art methods while demanding much lower training complexity.
This paper proposes a new baseline deep learning model for image classification. Unlike convolutional neural network (CNN) practice, where filters are trained by backpropagation to represent different patterns of an image, we are inspired by the PCANet method, introduced in "PCANet: A Simple Deep Learning Baseline for Image Classification?", to choose filter vectors from basis vectors in the frequency domain, such as Fourier coefficients or wavelets, without backpropagation. Researchers have demonstrated that such frequency-domain bases can usually provide physical insights, which adds to the interpretability of the model through analysis of the selected frequencies. In addition, the training process is more time-efficient, mathematically clear, and interpretable than the black-box training process of a CNN.