Transfer learning has gained attention in medical image analysis because annotated 3D medical datasets for training data-driven deep learning models are limited in the real world. Existing 3D-based methods transfer pre-trained models to downstream tasks and achieve promising results with only a small number of training samples. However, they require a massive number of parameters to train a model on 3D medical images. In this work, we propose a novel transfer learning framework, called Medical Transformer, that effectively models 3D volumetric images as a sequence of 2D image slices. To better capture spatial relations in the high-level 3D representation, we take a multi-view approach that leverages information from the three planes of the 3D volume while providing parameter-efficient training. To build a source model generally applicable to various tasks, we pre-train the model in a self-supervised manner, using masked encoding vector prediction as a proxy task, on a large-scale dataset of normal, healthy brain magnetic resonance imaging (MRI) scans. Our pre-trained model is evaluated on three downstream tasks that are actively studied in brain MRI research: (i) brain disease diagnosis, (ii) brain age prediction, and (iii) brain tumor segmentation. The experimental results show that our Medical Transformer outperforms state-of-the-art transfer learning methods while reducing the number of parameters by up to about 92% for classification and
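As a rough illustration of the multi-view, slice-sequence idea described in this abstract, the sketch below decomposes a 3D MRI volume into 2D slices along the three anatomical planes, encodes each slice with a shared 2D backbone, and passes the resulting slice embeddings to a transformer encoder. All module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiViewSliceEncoder(nn.Module):
    """Hypothetical sketch: 3D volume -> sequence of 2D slice embeddings -> transformer."""
    def __init__(self, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Shared per-slice encoder (a small CNN stand-in for a 2D backbone).
        self.slice_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, volume):            # volume: (B, 1, D, H, W)
        tokens = []
        for axis in (2, 3, 4):            # axial / coronal / sagittal planes
            slices = volume.unbind(dim=axis)           # list of 2D slices
            feats = [self.slice_encoder(s) for s in slices]
            tokens.append(torch.stack(feats, dim=1))   # (B, n_slices, d_model)
        seq = torch.cat(tokens, dim=1)    # one sequence over all three views
        return self.transformer(seq)      # (B, total_slices, d_model)
```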
The performance of deep neural networks typically increases with the number of training images. However, not all images contribute equally to improved performance and robustness. In fetal brain MRI, abnormalities exacerbate the variability of the developing brain anatomy compared to non-pathological cases. A small number of abnormal cases, as is typically available in clinical datasets used for training, is unlikely to fairly represent the rich variability of abnormal developing brains. As a result, machine learning systems trained by maximizing average performance are biased toward non-pathological cases. This problem was recently referred to as hidden stratification. To be suited for clinical use, automatic segmentation methods need to reliably achieve high-quality segmentation outcomes for pathological cases as well. In this paper, we show that the state-of-the-art deep learning pipeline nnU-Net has difficulty generalizing to unseen abnormal cases. To mitigate this problem, we propose to train a deep neural network to minimize a percentile of the distribution of per-volume loss over the dataset. We show that this can be achieved using Distributionally Robust Optimization (DRO). DRO automatically reweights the training samples with lower performance, encouraging nnU-Net to perform more consistently across all cases. We validated our approach on a dataset of 368 fetal brain T2w MRIs, including 124 MRIs of open spina bifida cases and 51 MRIs of cases with other severe abnormalities of brain development.
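The "minimize a percentile of the per-volume loss" objective can be read, in simplified form, as a CVaR-style loss in which only the worst-performing fraction of volumes in a batch contributes to the gradient. The minimal sketch below illustrates that reading; it is not the authors' exact DRO reweighting algorithm.

```python
import torch

def percentile_loss(per_volume_losses: torch.Tensor, q: float = 0.9) -> torch.Tensor:
    """per_volume_losses: shape (B,), one scalar loss per training volume.

    Only losses at or above the q-th percentile (the hardest volumes)
    contribute to the objective.
    """
    threshold = torch.quantile(per_volume_losses.detach(), q)
    worst = per_volume_losses[per_volume_losses >= threshold]
    return worst.mean()

# Usage inside a training step, assuming losses are kept per volume
# rather than averaged over the batch:
#   loss = percentile_loss(per_volume_losses, q=0.9)
#   loss.backward()
```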
Deep neural networks have become a prevailing technique in the field of medical image processing. However, the most popular convolutional neural network (CNN)-based methods for medical image segmentation are imperfect because they can only model long-range dependencies by stacking layers or enlarging filters. Transformers and the self-attention mechanism were recently proposed to learn long-range dependencies effectively by modeling all pairs of word-to-word attention regardless of position. The idea has also been extended to computer vision by treating image patches as embeddings. Given the computational complexity of self-attention over whole images, current transformer-based models settle for a rigid partitioning scheme that potentially loses informative relations. Moreover, current medical transformers model global context on full-resolution images, leading to unnecessary computation costs. To address these issues, we developed a novel method that integrates multi-scale attention and CNN feature extraction in a pyramidal network architecture, namely the Pyramid Medical Transformer (PMTrans). PMTrans captures multi-range relations by working on multi-resolution images. An adaptive partitioning scheme was implemented to retain informative relations and to access different receptive fields efficiently. Experimental results on three medical image datasets (gland segmentation, MoNuSeg, and HECKTOR) showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
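A hedged sketch of the multi-scale attention idea follows: self-attention is applied to progressively downsampled versions of a CNN feature map, and the per-scale outputs are upsampled and fused. The layer names, pooling-based scale pyramid, and averaging fusion are illustrative assumptions rather than the actual PMTrans design, whose adaptive partitioning scheme is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSelfAttention(nn.Module):
    """Illustrative only: self-attention over a pyramid of pooled feature maps."""
    def __init__(self, channels=64, n_heads=4, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(channels, n_heads, batch_first=True)
            for _ in scales
        )

    def forward(self, x):                        # x: (B, C, H, W) CNN features
        B, C, H, W = x.shape
        fused = 0
        for attn, s in zip(self.attn, self.scales):
            xs = F.avg_pool2d(x, s) if s > 1 else x      # coarser token grid
            h, w = xs.shape[2], xs.shape[3]
            tokens = xs.flatten(2).transpose(1, 2)       # (B, h*w, C)
            out, _ = attn(tokens, tokens, tokens)        # self-attention per scale
            out = out.transpose(1, 2).reshape(B, C, h, w)
            fused = fused + F.interpolate(out, size=(H, W), mode="bilinear",
                                          align_corners=False)
        return fused / len(self.scales)                  # simple averaging fusion
```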
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation. The convolutional operations used in these networks, however, are inherently limited in modeling long-range dependency due to their inductive biases of locality and weight sharing. Although the Transformer was designed to address this issue, it suffers from extreme computational and spatial complexity when processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. Different from the vanilla Transformer, which treats all image positions equally, our DeTrans pays attention only to a small set of key positions by introducing a deformable self-attention mechanism. Thus, the computational and spatial complexities of DeTrans are greatly reduced, making it possible to process the multi-scale and high-resolution feature maps that are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset, which covers 11 major human organs. The results indicate that our CoTr leads to a substantial performance improvement over other CNN-based, transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at https://github.com/YtongXie/CoTr
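The "attend only to a small set of key positions" idea can be sketched roughly as below: each query predicts a few sampling offsets and attention weights, features are read at those offsets with grid_sample, and combined by a weighted sum. CoTr's DeTrans operates on multi-scale 3D feature maps; this single-scale 2D version is only an illustration of the principle, with all names and shapes assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSelfAttention2D(nn.Module):
    """Simplified 2D sketch of deformable self-attention (illustrative only)."""
    def __init__(self, channels=64, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offsets = nn.Linear(channels, 2 * n_points)   # (dx, dy) per sampling point
        self.weights = nn.Linear(channels, n_points)        # attention weight per point
        self.value = nn.Conv2d(channels, channels, 1)
        self.out = nn.Linear(channels, channels)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        queries = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        v = self.value(x)

        # Reference grid in normalized [-1, 1] coordinates, one point per query.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        ref = torch.stack((xs, ys), dim=-1).reshape(1, H * W, 1, 2)

        offsets = self.offsets(queries).reshape(B, H * W, self.n_points, 2)
        locs = (ref + offsets).clamp(-1, 1)                    # sampling locations
        sampled = F.grid_sample(v, locs, align_corners=False)  # (B, C, H*W, P)

        attn = self.weights(queries).softmax(dim=-1)           # (B, H*W, P)
        agg = (sampled * attn.unsqueeze(1)).sum(dim=-1)        # (B, C, H*W)
        agg = self.out(agg.transpose(1, 2)).transpose(1, 2)
        return agg.reshape(B, C, H, W)
```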
Perivascular Spaces (PVS) are a recently recognised feature of Small Vessel Disease (SVD), also indicating neuroinflammation, and are an important part of the brain's circulation and glymphatic drainage system. Quantitative analysis of PVS on Magnetic Resonance Images (MRI) is important for understanding their relationship with neurological diseases. In this work, we propose a segmentation technique based on 3D Frangi filtering for extraction of PVS from MRI. Based on prior knowledge from neuroradiological ratings of PVS, we used ordered logit models to optimise Frangi filter parameters in response to the variability in scanner parameters and study protocols. We optimised and validated our proposed models on two independent cohorts, a dementia sample (N=20) and patients who previously had mild to moderate stroke (N=48). Results demonstrate the robustness and generalisability of our segmentation method. Segmentation-based PVS burden estimates correlated with neuroradiological assessments (Spearman's ρ = 0.74, p < 0.001), suggesting the great potential of our proposed method.
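A minimal sketch of vesselness-based PVS candidate extraction with the 3D Frangi filter from scikit-image, followed by thresholding, is shown below. The sigma range, filter settings, and threshold are placeholders; in the proposed method these parameters are tuned per cohort with ordered logit models against neuroradiological ratings.

```python
import numpy as np
from skimage.filters import frangi

def pvs_candidates(t2_volume: np.ndarray, threshold: float = 1e-4) -> np.ndarray:
    """t2_volume: 3D MRI array; returns a binary mask of tubular (vessel-like) structures."""
    vesselness = frangi(
        t2_volume,
        sigmas=np.arange(0.5, 2.5, 0.5),  # candidate scales (in voxels) of tubular structures
        alpha=0.5, beta=0.5,              # sensitivity to plate-like vs blob-like deviations
        black_ridges=False,               # PVS appear bright on T2-weighted images
    )
    return vesselness > threshold          # placeholder threshold, tuned in practice
```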
Magnetic resonance imaging (MRI) at high spatial resolution provides detailed anatomical information and is often necessary for accurate quantitative analysis. However, high spatial resolution typically comes at the expense of longer scan time, less spatial coverage, and lower signal-to-noise ratio (SNR). Single Image Super-Resolution (SISR), a technique aimed at restoring high-resolution (HR) details from a single low-resolution (LR) input image, has improved dramatically thanks to recent breakthroughs in deep learning. In this paper, we introduce a new neural network architecture, 3D Densely Connected Super-Resolution Networks (DCSRN), to restore HR features of structural brain MR images. Through experiments on a dataset with 1,113 subjects, we demonstrate that our network outperforms bicubic interpolation as well as other deep learning methods in restoring 4x resolution-reduced images.
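In the spirit of a densely connected super-resolution network, the sketch below shows a 3D dense block in which each convolutional layer receives the concatenation of all preceding feature maps, followed by a reconstruction convolution. Channel counts, depth, and the residual formulation are assumptions for illustration, not the published DCSRN configuration.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """Illustrative 3D dense block for single-image super-resolution."""
    def __init__(self, in_channels=1, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv3d(channels, growth, kernel_size=3, padding=1),
                nn.ELU(),
            ))
            channels += growth            # dense connectivity grows the input width
        self.reconstruct = nn.Conv3d(channels, 1, kernel_size=3, padding=1)

    def forward(self, lr_volume):         # lr_volume: (B, 1, D, H, W), pre-interpolated LR input
        features = [lr_volume]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # Predict a residual on top of the interpolated low-resolution input.
        return lr_volume + self.reconstruct(torch.cat(features, dim=1))
```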