Multiple instance learning (MIL) is the preferred approach for whole slide image classification. However, most MIL approaches do not exploit the interdependencies of tiles extracted from a whole slide image, which could provide valuable cues for classification. This paper presents a novel MIL approach that exploits the spatial relationship of tiles for classifying whole slide images. To do so, a sparse map is built from tile embeddings and is then classified by a sparse-input CNN. The approach achieved state-of-the-art performance compared with popular MIL approaches on cancer subtype classification involving 10,000 whole slide images. Our results suggest that the proposed approach might (i) improve the representation learning of instances and (ii) exploit the context of instance embeddings to enhance the classification performance. The code of this work is open-source at {github censored for review}.
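The sparse map described above can be illustrated with a minimal sketch. It assumes that each tile comes with the pixel coordinates of its top-left corner and that all tiles share one size; the names `build_sparse_map`, `to_dense`, and `tile_size` are hypothetical and are not taken from the paper's code.

```python
def build_sparse_map(coords, embeddings, tile_size):
    """Map each tile's pixel coordinates (x, y) to a grid cell (row, col).

    Only cells that actually contain a tile are stored, so the map stays sparse.
    """
    if len(coords) != len(embeddings):
        raise ValueError("coords and embeddings must have the same length")
    cells = {}
    for (x, y), emb in zip(coords, embeddings):
        cells[(y // tile_size, x // tile_size)] = list(emb)
    return cells


def to_dense(cells, dim):
    """Expand the sparse cells into a zero-filled grid plus an occupancy mask."""
    rows = max(r for r, _ in cells) + 1
    cols = max(c for _, c in cells) + 1
    grid = [[[0.0] * dim for _ in range(cols)] for _ in range(rows)]
    mask = [[0] * cols for _ in range(rows)]
    for (r, c), emb in cells.items():
        grid[r][c] = emb
        mask[r][c] = 1
    return grid, mask


# Two tiles of size 256 with 2-dimensional embeddings (illustrative values).
cells = build_sparse_map([(0, 0), (512, 256)], [[0.1, 0.2], [0.3, 0.4]], tile_size=256)
grid, mask = to_dense(cells, dim=2)
```

The occupancy mask is one plausible way to let a downstream network distinguish empty cells from tiles whose embedding happens to be near zero; whether the paper's sparse-input CNN uses such a mask is not stated in the abstract.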
Whole slide histopathology images (WSIs) play a critical role in gastric cancer diagnosis. However, due to the large scale of WSIs and the varying sizes of abnormal areas, selecting informative regions and analyzing them is quite challenging during automatic diagnosis. Multi-instance learning based on the most discriminative instances can be of great benefit for whole slide gastric image diagnosis. In this paper, we design a recalibrated multi-instance deep learning method (RMDL) to address this challenging problem. We first select the discriminative instances, and then utilize these instances to diagnose diseases based on the proposed RMDL approach. The designed RMDL network is capable of capturing instance-wise dependencies and recalibrating instance features according to the importance coefficients learned from the fused features. Furthermore, we build a large whole-slide gastric histopathology image dataset with detailed pixel-level annotations. Experimental results on the constructed gastric dataset demonstrate a significant improvement in the accuracy of our proposed framework compared with other state-of-the-art multi-instance learning methods. Moreover, our method is general and can be extended to other diagnosis tasks of different cancer types based on WSIs.
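The recalibration idea can be sketched as follows. This is not the RMDL network: the abstract does not specify how the importance coefficients are computed, so the sketch assumes mean fusion of the instance features and a sigmoid of the dot product between each instance and the fused vector, where a trained network would use learned layers. The function name `recalibrate` is hypothetical.

```python
import math


def recalibrate(instances):
    """Rescale each instance feature vector by an importance coefficient.

    Assumption: the fused feature is the mean over instances, and the
    coefficient is sigmoid(instance . fused). A real model would learn this.
    """
    n = len(instances)
    dim = len(instances[0])
    fused = [sum(inst[d] for inst in instances) / n for d in range(dim)]
    coeffs = [
        1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(inst, fused))))
        for inst in instances
    ]
    recalibrated = [[c * v for v in inst] for inst, c in zip(instances, coeffs)]
    return recalibrated, coeffs


# Two instances with 2-dimensional features (illustrative values).
recalibrated, coeffs = recalibrate([[1.0, 0.0], [0.0, 0.0]])
```

Instances that align with the fused representation receive a larger coefficient, which is the general behavior the abstract attributes to recalibration.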
Weakly supervised learning on classification labels has demonstrated high performance in various tasks. When a few pixel-level fine annotations are also affordable, it is natural to leverage both pixel-level (e.g., segmentation) and image-level (e.g., classification) annotations to further improve the performance. In computational pathology, however, such weak or mixed supervision learning is still a challenging task, since the high resolution of whole slide images makes end-to-end training of classification models infeasible. An alternative approach is to analyze such data by patch-based model training, i.e., using self-supervised learning to generate pixel-level pseudo labels for patches. However, such methods usually suffer from model drift, i.e., they are hard to converge, because noise accumulates during the self-training process. To handle these problems, we propose a mixed supervision learning framework for super high-resolution images to effectively utilize their various labels (e.g., sufficient image-level coarse annotations and a few pixel-level fine labels). During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning and generate high-quality pixel-level pseudo labels. A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives. Three real-world datasets with a very large number of images (i.e., more than 10,000 whole slide images) and various types of labels are used to evaluate the effectiveness of mixed supervision learning. We reduced the false positive rate by around one third compared to the state of the art while retaining 100% sensitivity in the task of image-level classification.
Convolutional Neural Networks (CNN) are state-of-the-art models for many image classification tasks. However, to recognize cancer subtypes automatically, training a CNN on gigapixel resolution Whole Slide Tissue Images (WSI) is currently computationally impossible. The differentiation of cancer subtypes is based on cellular-level visual features observed on image patch scale. Therefore, we argue that in this situation, training a patch-level classifier on image patches will perform better than or similar to an image-level classifier. The challenge becomes how to intelligently combine patch-level classification results and model the fact that not all patches will be discriminative. We propose to train a decision fusion model to aggregate patch-level predictions given by patch-level CNNs, which to the best of our knowledge has not been shown before. Furthermore, we formulate a novel Expectation-Maximization (EM) based method that automatically locates discriminative patches robustly by utilizing the spatial relationships of patches. We apply our method to the classification of glioma and non-small-cell lung carcinoma cases into subtypes. The classification accuracy of our method is similar to the inter-observer agreement between pathologists. Although it is impossible to train CNNs on WSIs, we experimentally demonstrate using a comparable non-cancer dataset of smaller images that a patch-based CNN can outperform an image-based CNN.
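The aggregation step can be sketched in a few lines. The sketch assumes that each patch has already been assigned a class index by a patch-level classifier; it computes a normalized class histogram that could serve as the input feature of a trained fusion model (not shown), alongside an untrained majority-vote baseline. The function names are hypothetical, and the EM-based patch selection is not covered.

```python
from collections import Counter


def class_histogram(patch_predictions, num_classes):
    """Fraction of patches predicted as each class; a fixed-length slide feature."""
    if not patch_predictions:
        raise ValueError("at least one patch prediction is required")
    counts = Counter(patch_predictions)
    total = len(patch_predictions)
    return [counts[k] / total for k in range(num_classes)]


def majority_vote(patch_predictions):
    """Untrained baseline: the most frequently predicted class wins."""
    return Counter(patch_predictions).most_common(1)[0][0]


# Four patches from one slide, three possible classes (illustrative values).
histogram = class_histogram([0, 1, 1, 2], num_classes=3)
vote = majority_vote([0, 1, 1, 2])
```

A learned fusion model can weight the histogram bins differently per class, which a plain majority vote cannot do when many patches are non-discriminative.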
We propose a deep learning-based weak label learning method for analysing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor cells that does not require pixel-level or tile-level annotations, using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the task of homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We utilize contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. Compared to state-of-the-art genomic label classification methods, DeepSMILE improves classification performance for HRD from $70.43 \pm 4.10\%$ to $83.79 \pm 1.25\%$ AUC and for MSI from $78.56 \pm 6.24\%$ to $90.32 \pm 3.58\%$ AUC in a multi-center breast and colorectal cancer dataset, respectively. These improvements suggest we can improve genomic label classification performance without collecting larger datasets. In the future, this may reduce the need for expensive genome sequencing techniques, provide personalized therapy recommendations based on widely available WSIs of cancer tissue, and improve patient care with quicker treatment decisions, including in medical centers without access to genome sequencing resources.
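One simple way to make tile aggregation sensitive to heterogeneity is to pool both the mean and the variance of the tile features. The sketch below illustrates that idea only; it is an assumption about what "variability-aware" pooling could look like, not the DeepSMILE aggregation function, which is learned. The name `mean_var_pool` is hypothetical.

```python
def mean_var_pool(tile_features):
    """Concatenate the per-dimension mean and population variance over tiles."""
    n = len(tile_features)
    dim = len(tile_features[0])
    means = [sum(t[d] for t in tile_features) / n for d in range(dim)]
    variances = [
        sum((t[d] - means[d]) ** 2 for t in tile_features) / n for d in range(dim)
    ]
    return means + variances


# Two tiles with 2-dimensional features (illustrative values).
slide_feature = mean_var_pool([[1.0, 2.0], [3.0, 2.0]])
```

The variance half of the pooled vector is zero for a perfectly homogeneous slide and grows as tile features diverge, which gives a downstream classifier a direct signal about heterogeneity.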
Histopathology slides are routinely marked by pathologists using permanent ink markers that should not be removed, as they form part of the medical record. Often tumour regions are marked up for the purpose of highlighting features or for downstream processing such as gene sequencing. Once the slides are digitised, there is no established method for removing this information from the whole slide images, which limits their usability in research and study. Removing marker ink from these high-resolution whole slide images is a non-trivial and complex problem because the ink contaminates different regions in an inconsistent manner. We propose an efficient pipeline using convolutional neural networks that results in ink-free images without compromising information and image resolution. Our pipeline includes a sequential classical convolutional neural network for accurate classification of contaminated image tiles, a fast region detector, and a domain-adaptive cycle-consistent adversarial generative model for restoration of foreground pixels. Both quantitative and qualitative results on four different whole slide images show that our approach yields visually coherent ink-free whole slide images.