Accurate characterisation of visual attributes such as spiculation, lobulation, and calcification of lung nodules is critical in cancer management. The characterisation of these attributes is often subjective, which may lead to high inter- and intra-observer variability. Furthermore, lung nodules are often heterogeneous across the cross-sectional image slices of a 3D volume. Current state-of-the-art methods that score multiple attributes rely on deep learning-based multi-task learning (MTL) schemes. These methods, however, extract shared visual features across attributes and then score each attribute separately, without explicitly leveraging their inherent intercorrelations. In addition, current methods treat each slice with equal importance, without considering slice relevance or heterogeneity, which limits performance. In this study, we address these challenges with a new convolutional neural network (CNN)-based MTL model that incorporates multiple attention-based learning modules to simultaneously score 9 visual attributes of lung nodules in computed tomography (CT) image volumes. Our model processes entire nodule volumes of arbitrary depth and uses a slice attention module to filter out irrelevant slices. We also introduce cross-attribute and attribute specialisation attention modules that learn an optimal amalgamation of meaningful representations to leverage relationships between attributes. We demonstrate that our model outperforms previous state-of-the-art methods at scoring attributes on the well-known public LIDC-IDRI dataset of pulmonary nodules from over 1,000 patients. Our model also performs competitively when repurposed for benign-malignant classification. In addition, our attention modules provide easy-to-interpret weights that offer insights into the predictions of the model.
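To make the slice attention idea concrete, the following is a minimal PyTorch sketch of a slice attention module that pools per-slice CNN features into a single nodule representation. The class name, layer sizes, and scoring MLP are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SliceAttention(nn.Module):
    """Weights per-slice feature vectors so irrelevant slices contribute less.

    Illustrative sketch only: dimensions and layer choices are assumptions.
    """
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, slice_feats: torch.Tensor):
        # slice_feats: (batch, num_slices, feat_dim); num_slices may vary per nodule
        scores = self.scorer(slice_feats)            # (batch, num_slices, 1)
        weights = torch.softmax(scores, dim=1)       # attention over slices
        pooled = (weights * slice_feats).sum(dim=1)  # (batch, feat_dim)
        return pooled, weights.squeeze(-1)           # weights are interpretable

# Example: 2 nodules, 24 slices each, 256-d per-slice features from a 2D CNN
feats = torch.randn(2, 24, 256)
pooled, w = SliceAttention(256)(feats)
```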
Multimodal positron emission tomography-computed tomography (PET-CT) is used routinely in the assessment of cancer. PET-CT combines the high sensitivity of PET for tumor detection with the anatomical information of CT. Tumor segmentation is a critical element of PET-CT analysis, but at present there is no accurate automated segmentation method. Segmentation tends to be done manually by different imaging experts, which is labor-intensive and prone to errors and inconsistency. Previous automated segmentation methods largely focused on fusing information that is extracted separately from the PET and CT modalities, with the underlying assumption that each modality contains complementary information. However, these methods do not fully exploit the high PET tumor sensitivity that can guide the segmentation. We introduce a multimodal spatial attention module (MSAM) that automatically learns to emphasize regions (spatial areas) related to tumors and suppress normal regions with physiologically high uptake. The resulting spatial attention maps are subsequently employed to target a convolutional neural network (CNN) for segmentation of areas with higher tumor likelihood. Our MSAM can be applied to common backbone architectures and trained end-to-end. Our experimental results on two clinical PET-CT datasets of non-small cell lung cancer (NSCLC) and soft tissue sarcoma (STS) validate the effectiveness of the MSAM in these different cancer types. We show that our MSAM, with a conventional U-Net backbone, surpasses the state-of-the-art lung tumor segmentation approach by a margin of 7.6% in Dice similarity coefficient (DSC).
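As an illustration of the mechanism described above, here is a minimal PyTorch sketch of a PET-driven spatial attention gate applied to CT feature maps. The channel counts, layer depth, and class name are assumptions for illustration and may differ from the published MSAM.

```python
import torch
import torch.nn as nn

class MultimodalSpatialAttention(nn.Module):
    """Sketch of a PET-driven spatial attention map gating CT feature maps.

    Illustrative assumptions: a shallow conv head on PET produces a per-pixel
    weight in [0, 1] that modulates backbone CT features.
    """
    def __init__(self, pet_channels: int = 1):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(pet_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, pet: torch.Tensor, ct_feats: torch.Tensor):
        # pet: (B, 1, H, W); ct_feats: (B, C, H, W) from the segmentation backbone
        attn_map = self.attn(pet)              # (B, 1, H, W)
        return ct_feats * attn_map, attn_map   # emphasise tumour-like regions

pet = torch.rand(1, 1, 128, 128)
ct_feats = torch.rand(1, 64, 128, 128)
gated, attn = MultimodalSpatialAttention()(pet, ct_feats)
```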
Despite the widespread availability of in-treatment room cone beam computed tomography (CBCT) imaging, the lack of reliable segmentation methods means that CBCT is used only for gross setup corrections in lung radiotherapy. Accurate and reliable auto-segmentation tools could enable volumetric response assessment and geometry-guided adaptive radiation therapy. Therefore, we developed a new deep learning CBCT lung tumor segmentation method. Methods: The key idea of our approach, called cross-modality educed distillation (CMEDL), is to use magnetic resonance imaging (MRI) to guide the training of a CBCT segmentation network to extract more informative features. We accomplish this by training an end-to-end network comprised of unpaired domain adaptation (UDA) and cross-domain segmentation distillation networks (SDN) using unpaired CBCT and MRI datasets. Feature distillation regularizes the student network to extract CBCT features that match the statistical distribution of MRI features extracted by the teacher network, thereby obtaining better differentiation of tumor from background. We also compared against an alternative framework that used UDA with an MR segmentation network, whereby segmentation was done on the synthesized pseudo-MRI representation. All networks were trained with 216 weekly CBCTs and 82 T2-weighted turbo spin echo MRIs acquired from different patient cohorts. Validation was done on 20 weekly CBCTs from patients not used in training. Independent testing was done on 38 weekly CBCTs from patients not used in training or validation. Segmentation accuracy was measured using the surface Dice similarity coefficient (SDSC) and the Hausdorff distance at the 95th percentile (HD95).
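To illustrate the feature distillation idea, the sketch below shows one plausible feature-matching term that pulls student (CBCT) features toward the statistics of teacher (MRI) features. The exact CMEDL loss formulation is not reproduced here; this is a hedged example of the general technique only.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feats: torch.Tensor,
                              teacher_feats: torch.Tensor) -> torch.Tensor:
    """Match first- and second-order channel statistics of student (CBCT)
    feature maps to those of the teacher (MRI) network.

    Hedged sketch: the published CMEDL loss may use a different formulation.
    """
    s_mean, s_std = student_feats.mean(dim=(2, 3)), student_feats.std(dim=(2, 3))
    t_mean, t_std = teacher_feats.mean(dim=(2, 3)), teacher_feats.std(dim=(2, 3))
    return F.mse_loss(s_mean, t_mean) + F.mse_loss(s_std, t_std)

# Example with random feature maps; whether the teacher is detached depends on
# the concurrent-training strategy used.
s = torch.rand(2, 64, 32, 32)   # student (CBCT) features
t = torch.rand(2, 64, 32, 32)   # teacher (MRI / pseudo-MRI) features
loss = feature_distillation_loss(s, t.detach())
```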
Diagnosis and treatment of multiple pulmonary nodules are clinically important but challenging. Prior studies on nodule characterization apply solitary-nodule approaches to patients with multiple nodules, which ignores the relations between nodules. In this study, we propose a multiple instance learning (MIL) approach and empirically demonstrate the benefit of learning the relations between multiple nodules. By treating the multiple nodules from the same patient as a whole, critical relational information between the individual nodules is extracted. To our knowledge, this is the first study to learn the relations between multiple pulmonary nodules. Inspired by recent advances in the natural language processing (NLP) domain, we introduce a self-attention transformer equipped with a 3D CNN, named NoduleSAT, to replace typical pooling-based aggregation in multiple instance learning. Extensive experiments on lung nodule false positive reduction on the LUNA16 database and malignancy classification on the LIDC-IDRI database validate the effectiveness of the proposed method.
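Below is a minimal PyTorch sketch of self-attention aggregation over per-nodule embeddings, replacing pooling-based MIL aggregation as described above. The embedding dimension, number of heads, and class name are illustrative assumptions, and the upstream 3D CNN that embeds each nodule patch is omitted.

```python
import torch
import torch.nn as nn

class NoduleSetTransformer(nn.Module):
    """Self-attention over a patient's nodule embeddings in place of MIL pooling.

    Illustrative sketch; hyper-parameters are not NoduleSAT's published values.
    """
    def __init__(self, embed_dim: int = 128, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, 1)     # per-nodule logit

    def forward(self, nodule_embeds: torch.Tensor, pad_mask: torch.Tensor = None):
        # nodule_embeds: (batch, num_nodules, embed_dim); pad_mask flags padded slots
        contextual = self.encoder(nodule_embeds, src_key_padding_mask=pad_mask)
        # relations between nodules inform each nodule's score
        return self.head(contextual).squeeze(-1)

# Example: one patient with 3 nodules embedded by an upstream 3D CNN
logits = NoduleSetTransformer()(torch.randn(1, 3, 128))
```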
Follow-up plays an important role in the management of pulmonary nodules for lung cancer. Imaging diagnostic guidelines with expert consensus have been developed to help radiologists make clinical decisions for each patient. However, tumor growth is such a complicated process that it is difficult to stratify high-risk nodules from low-risk ones based on morphologic characteristics. On the other hand, recent deep learning studies that use convolutional neural networks (CNNs) to predict the malignancy scores of nodules only provide clinicians with black-box predictions. To this end, we propose a unified framework, named Nodule Follow-Up Prediction Network (NoFoNet), which predicts the growth of pulmonary nodules with high-quality visual appearances and accurate quantitative results, given any time interval from baseline observations. This is achieved by predicting the future displacement field of each voxel with a WarpNet. A TextureNet is further developed to refine the textural details of the WarpNet outputs. We also introduce a Temporal Encoding Module and a Warp Segmentation Loss to encourage time-aware and shape-aware representation learning. We build an in-house follow-up dataset from two medical centers to validate the effectiveness of the proposed method. NoFoNet significantly outperforms direct prediction by a U-Net in terms of visual quality; more importantly, it demonstrates accurate differentiation between high- and low-risk nodules. Our promising results suggest the potential of computer-aided intervention for lung nodule management.
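To clarify how a predicted per-voxel displacement field can generate a future nodule appearance, the following is a hedged PyTorch sketch of the warping step. It assumes the field is expressed in normalised grid coordinates and is not the authors' WarpNet code; the network that predicts the field is omitted.

```python
import torch
import torch.nn.functional as F

def warp_volume(baseline: torch.Tensor, displacement: torch.Tensor) -> torch.Tensor:
    """Warp a baseline nodule volume with a per-voxel displacement field.

    Assumption: `displacement` is (B, D, H, W, 3) in normalised [-1, 1]
    coordinates, as expected by grid_sample.
    """
    B, _, D, H, W = baseline.shape
    # Identity sampling grid in normalised coordinates
    zs = torch.linspace(-1, 1, D)
    ys = torch.linspace(-1, 1, H)
    xs = torch.linspace(-1, 1, W)
    grid_z, grid_y, grid_x = torch.meshgrid(zs, ys, xs, indexing="ij")
    identity = torch.stack((grid_x, grid_y, grid_z), dim=-1)       # (D, H, W, 3), x-y-z order
    grid = identity.unsqueeze(0).expand(B, -1, -1, -1, -1) + displacement
    # Resample the baseline volume at the displaced locations
    return F.grid_sample(baseline, grid, mode="bilinear", align_corners=True)

baseline = torch.rand(1, 1, 32, 64, 64)     # (B, C, D, H, W)
disp = torch.zeros(1, 32, 64, 64, 3)        # zero field -> identity warp
followup = warp_volume(baseline, disp)
```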
Accurate and robust segmentation of lung cancers from CT scans is needed to more accurately plan and deliver radiotherapy and to measure treatment response. This is particularly difficult for tumors located close to the mediastinum, due to low soft-tissue contrast. Therefore, we developed a new cross-modality educed distillation (CMEDL) approach, using unpaired CT and MRI scans, whereby a teacher MRI network guides a student CT network to extract features that signal the difference between foreground and background. Our contribution eliminates two requirements of distillation methods: (i) paired image sets, by using an image-to-image (I2I) translation, and (ii) pre-training of the teacher network with a large training set, by using concurrent training of all networks. Our framework uses end-to-end trained unpaired I2I translation, teacher, and student segmentation networks, and can be combined with any I2I and segmentation network. We demonstrate our framework's feasibility using 3 segmentation and 2 I2I methods. All networks were trained with 377 CT and 82 T2w MRI scans from different sets of patients. Ablation tests and different strategies for incorporating MRI information into CT were performed. Accuracy was measured using the Dice similarity coefficient (DSC), surface Dice (sDSC), and Hausdorff distance at the 95$^{th}$ percentile (HD95). The CMEDL approach was significantly (p $<$ 0.001) more accurate than non-CMEDL methods, both quantitatively and visually. It produced the highest segmentation accuracy (sDSC of 0.83 $\pm$ 0.16 and HD95 of 5.20 $\pm$ 6.86 mm). CMEDL was also more accurate than using either pseudo MRIs (pMRIs) or the combination of CTs with pMRIs for segmentation.
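For reference, here is a minimal NumPy/SciPy sketch of the 95th-percentile Hausdorff distance (HD95) used for evaluation in these experiments. The surface extraction and spacing handling are standard choices and may differ in detail from the papers' evaluation code.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def hd95(pred: np.ndarray, ref: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance between two binary masks.

    Standard formulation for illustration; edge cases (e.g. empty masks) are
    not handled.
    """
    pred, ref = pred.astype(bool), ref.astype(bool)
    # Surface voxels: mask minus its erosion
    pred_surf = pred & ~binary_erosion(pred)
    ref_surf = ref & ~binary_erosion(ref)
    # Distance of every voxel to the nearest surface voxel of the other mask
    dist_to_ref = distance_transform_edt(~ref_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    all_dists = np.concatenate([dist_to_ref[pred_surf], dist_to_pred[ref_surf]])
    return float(np.percentile(all_dists, 95))
```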