The ability of deep learning to predict with uncertainty is recognized as key for its adoption in clinical routines. Moreover, empirical evidence shows that modelling uncertainty can itself improve performance. While previous work has widely discussed uncertainty estimation in segmentation and classification tasks, its application to bounding-box-based detection has been limited, mainly due to the challenge of bounding box alignment. In this work, we augment a 2.5D detection CNN with two different bounding-box-level (or instance-level) uncertainty estimates, i.e., predictive variance and Monte Carlo (MC) sample variance. Experiments are conducted for lung nodule detection on the LUNA16 dataset, a task where significant semantic ambiguities can exist between nodules and non-nodules. Results show that our method improves the evaluation score from 84.57% to 88.86% by utilizing a combination of both types of variances. Moreover, we show that the generated uncertainty enables superior operating points compared to using the probability threshold only, and can further boost the performance to 89.52%. Example nodule detections are visualized to further illustrate the advantages of our method.
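The two instance-level estimates named in this abstract can be illustrated with a small sketch. It assumes that each candidate box is scored over several stochastic forward passes (for example with dropout kept active), and that each pass returns a probability and a predicted variance. The function names, the summation of the two variances, and the thresholds are hypothetical; the abstract does not state how the variances are combined.

```python
from statistics import fmean, pvariance


def box_uncertainty(mc_probs, mc_pred_vars):
    """Summarise T stochastic forward passes for one candidate box.

    mc_probs     -- nodule probability from each pass
    mc_pred_vars -- variance predicted by the network in each pass
    """
    mean_prob = fmean(mc_probs)
    # MC sample variance: spread of the predictions across passes.
    mc_sample_var = pvariance(mc_probs)
    # Predictive variance: average of the variances the network outputs.
    predictive_var = fmean(mc_pred_vars)
    return mean_prob, mc_sample_var, predictive_var


def keep_detection(mean_prob, uncertainty, prob_thresh=0.5, unc_thresh=0.05):
    """Hypothetical operating point using probability AND uncertainty."""
    return mean_prob >= prob_thresh and uncertainty <= unc_thresh


mean_prob, mc_var, pred_var = box_uncertainty(
    [0.8, 0.6, 0.7, 0.9], [0.02, 0.04, 0.03, 0.03]
)
# Assumption: the two variances are simply summed into one score.
accepted = keep_detection(mean_prob, mc_var + pred_var)
```

Thresholding on both quantities gives a two-dimensional family of operating points, which is one way to read the claim that uncertainty enables better operating points than a probability threshold alone.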
Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motions, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on large-scale video datasets (BDD100K and Waymo open data). The joint training framework also establishes a new state-of-the-art on standard unsupervised domain adaptive detection benchmarks (Cityscapes, Foggy Cityscapes, and Sim10K). The code and models are available at https://github.com/xinw1012/cycle-confusion.
Despite the great progress achieved in unsupervised feature embedding, existing contrastive learning methods typically pursue view-invariant representations through attracting positive sample pairs and repelling negative sample pairs in the embedding space, while neglecting to systematically explore instance relations. In this paper, we explore instance relations including intra-instance multi-view relation and inter-instance interpolation relation for unsupervised feature embedding. Specifically, we embed intra-instance multi-view relation by aligning the distribution of the distance between an instance's different augmented samples and negative samples. We explore inter-instance interpolation relation by transferring the ratio of information for image sample interpolation from pixel space to feature embedding space. The proposed approach, referred to as EIR, is simple yet effective and can be easily inserted into existing view-invariant contrastive-learning-based methods. Experiments conducted on public benchmarks for image classification and retrieval report state-of-the-art or comparable performance.
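The idea of carrying an interpolation ratio from pixel space into embedding space can be sketched as follows. This is only an illustration: the abstract does not give the EIR loss, so the squared-distance penalty, the function names, and the two toy encoders below are all assumptions.

```python
def mix(a, b, lam):
    """Linear interpolation lam * a + (1 - lam) * b, element by element."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def interpolation_loss(encoder, img_a, img_b, lam):
    """Penalise embeddings that do not preserve the pixel-space mixing ratio.

    Assumed form: squared distance between the embedding of the mixed image
    and the same-ratio mixture of the two individual embeddings.
    """
    mixed_feat = encoder(mix(img_a, img_b, lam))
    target = mix(encoder(img_a), encoder(img_b), lam)
    return sum((m - t) ** 2 for m, t in zip(mixed_feat, target))


def linear_encoder(x):
    """Toy encoder that is linear, so the ratio is preserved exactly."""
    return [x[0] + x[1], x[0] - x[1]]


def square_encoder(x):
    """Toy non-linear encoder, which violates the relation."""
    return [v * v for v in x]


loss_linear = interpolation_loss(linear_encoder, [1.0, 0.0], [0.0, 1.0], 0.3)
loss_square = interpolation_loss(square_encoder, [1.0, 0.0], [0.0, 1.0], 0.5)
```

The linear encoder gives a loss of essentially zero, while the non-linear one is penalised, which is the behaviour such a term would be expected to encourage in a learned embedding.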
We propose a new method of instance-level microtubule (MT) tracking in time-lapse image series using recurrent attention. Our novel deep learning algorithm segments individual MTs at each frame. Segmentation results from successive frames are used to assign correspondences among MTs. This ultimately generates a distinct path trajectory for each MT through the frames. Based on these trajectories, we estimate MT velocities. To validate our proposed technique, we conduct experiments using real and simulated data. We use statistics derived from real time-lapse series of MT gliding assays to simulate realistic MT time-lapse image series for our simulated data. The simulated dataset is used for pre-training and hyperparameter optimization of our network before training on the real data. Our experimental results show that the proposed supervised learning algorithm drastically improves the precision of MT instance velocity estimation, from 29.3% (baseline) to 71.3%. We also demonstrate how the inclusion of temporal information into our deep network can reduce the false negative rate from 67.8% (baseline) down to 28.7% (proposed). Our findings in this work are expected to help biologists characterize the spatial arrangement of MTs, specifically the effects of MT-MT interactions.
To better detect pedestrians of various scales, deep multi-scale methods usually detect pedestrians of different scales using different in-network layers. However, the semantic levels of features from different layers are usually inconsistent. In this paper, we propose a multi-branch and high-level semantic network by gradually splitting a base network into multiple different branches. As a result, the different branches have the same depth and the output features of different branches have similarly high-level semantics. Due to the difference in receptive fields, the different branches are suitable for detecting pedestrians of different scales. Meanwhile, the multi-branch network does not introduce additional parameters, because the branches share convolutional weights. To further improve detection performance, skip-layer connections among different branches are used to add context to the branch with a relatively small receptive field, and dilated convolution is incorporated into some of the branches to enlarge the resolution of the output feature maps. When the network is embedded into the Faster RCNN architecture, we further propose using weighted scores of the proposal generation network and the proposal classification network. Experiments on the KITTI dataset, the Caltech pedestrian dataset, and the Citypersons dataset demonstrate the effectiveness of the proposed method. On these pedestrian datasets, the proposed method achieves state-of-the-art detection performance. Moreover, experiments on the COCO benchmark show that the proposed method is also suitable for general object detection.
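A short calculation shows why branches of equal depth that share weights can still differ in receptive field. The sketch below uses the standard receptive-field recurrence for stacked convolutions; the layer configurations are hypothetical, since the abstract does not list the actual branch designs.

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    layers -- iterable of (kernel, stride, dilation) tuples, input to output
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
        k_eff = dilation * (kernel - 1) + 1
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


def conv_params(kernel, c_in, c_out):
    """Weight count of a 2D convolution (bias ignored); dilation plays no role."""
    return kernel * kernel * c_in * c_out


# Two hypothetical branches: same depth, same 3x3 kernels, different dilation.
branch_small = [(3, 1, 1)] * 3
branch_large = [(3, 1, 2)] * 3

rf_small = receptive_field(branch_small)  # 7
rf_large = receptive_field(branch_large)  # 13
```

Because `conv_params` does not depend on dilation, the two branches could reuse the same weights while covering different spatial extents, which is consistent with the claim that sharing weights adds no parameters.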
Image-to-image translation plays a vital role in tackling various medical imaging tasks such as attenuation correction, motion correction, undersampled reconstruction, and denoising. Generative adversarial networks have been shown to achieve the state of the art in generating high-fidelity images for these tasks. However, state-of-the-art GAN-based frameworks do not estimate the uncertainty in the network's predictions. Such estimates are essential for making informed medical decisions and for subsequent revision by medical experts, and have recently been shown to improve the performance and interpretability of the model. In this work, we propose an uncertainty-guided progressive learning scheme for image-to-image translation. By incorporating aleatoric uncertainty as attention maps for GANs trained in a progressive manner, we generate images of progressively increasing fidelity. We demonstrate the efficacy of our model on three challenging medical image translation tasks, including PET to CT translation, undersampled MRI reconstruction, and MRI motion artefact correction. Our model generalizes well across the three tasks and improves performance over the state of the art under both full supervision and weak supervision with limited data. Code is released here: https://github.com/ExplainableML/UncerGuidedI2I
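Aleatoric uncertainty is often learned by letting the network predict a per-pixel variance next to each pixel value and training with a heteroscedastic negative log-likelihood. The sketch below uses the common Gaussian form of that loss as an illustration; it is an assumption, not necessarily the likelihood used in this work, and the function name is hypothetical.

```python
import math


def gaussian_nll(pred, log_var, target):
    """Mean per-pixel heteroscedastic Gaussian negative log-likelihood.

    pred    -- predicted pixel values
    log_var -- predicted log-variance per pixel (the aleatoric uncertainty)
    target  -- ground-truth pixel values
    The constant 0.5 * log(2 * pi) is dropped.
    """
    terms = [
        0.5 * math.exp(-s) * (y - m) ** 2 + 0.5 * s
        for m, s, y in zip(pred, log_var, target)
    ]
    return sum(terms) / len(terms)


# A large residual costs less when the model also reports high variance.
confident = gaussian_nll([0.0], [0.0], [2.0])
uncertain = gaussian_nll([0.0], [math.log(4.0)], [2.0])
```

The predicted variance map has the same shape as the image, so it could in principle be passed to a later stage as a spatial attention map, in the spirit of the progressive scheme described above.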