No Arabic abstract
Annotating images for semantic segmentation requires intense manual labor and is a time-consuming and expensive task especially for domains with a scarcity of experts, such as Forensic Anthropology. We leverage the evolving nature of images depicting the decay process in human decomposition data to design a simple yet effective pseudo-pixel-level label generation technique to reduce the amount of effort for manual annotation of such images. We first identify sequences of images with a minimum variation that are most suitable to share the same or similar annotation using an unsupervised approach. Given one user-annotated image in each sequence, we propagate the annotation to the remaining images in the sequence by merging it with annotations produced by a state-of-the-art CAM-based pseudo label generation technique. To evaluate the quality of our pseudo-pixel-level labels, we train two semantic segmentation models with VGG and ResNet backbones on images labeled using our pseudo labeling method and those of a state-of-the-art method. The results indicate that using our pseudo-labels instead of those generated using the state-of-the-art method in the training process improves the mean-IoU and the frequency-weighted-IoU of the VGG and ResNet-based semantic segmentation models by 3.36%, 2.58%, 10.39%, and 12.91% respectively.
This paper presents SPICE, a Semantic Pseudo-labeling framework for Image ClustEring. Instead of using indirect loss functions required by the recently proposed methods, SPICE generates pseudo-labels via self-learning and directly uses the pseudo-label-based classification loss to train a deep clustering network. The basic idea of SPICE is to synergize the discrepancy among semantic clusters, the similarity among instance samples, and the semantic consistency of local samples in an embedding space to optimize the clustering network in a semantically-driven paradigm. Specifically, a semantic-similarity-based pseudo-labeling algorithm is first proposed to train a clustering network through unsupervised representation learning. Given the initial clustering results, a local semantic consistency principle is used to select a set of reliably labeled samples, and a semi-pseudo-labeling algorithm is adapted for performance boosting. Extensive experiments demonstrate that SPICE clearly outperforms the state-of-the-art methods on six common benchmark datasets including STL10, Cifar10, Cifar100-20, ImageNet-10, ImageNet-Dog, and Tiny-ImageNet. On average, our SPICE method improves the current best results by about 10% in terms of adjusted rand index, normalized mutual information, and clustering accuracy.
We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain more reliable pseudo-labels on unlabeled video, to learn stronger video representations than from purely supervised data. Though our method capitalizes on multiple views, it nonetheless trains a model that is shared across appearance and motion input and thus, by design, incurs no additional computation overhead at inference time. On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
To safely deploy autonomous vehicles, onboard perception systems must work reliably at high accuracy across a diverse set of environments and geographies. One of the most common techniques to improve the efficacy of such systems in new domains involves collecting large labeled datasets, but such datasets can be extremely costly to obtain, especially if each new deployment geography requires additional data with expensive 3D bounding box annotations. We demonstrate that pseudo-labeling for 3D object detection is an effective way to exploit less expensive and more widely available unlabeled data, and can lead to performance gains across various architectures, data augmentation strategies, and sizes of the labeled dataset. Overall, we show that better teacher models lead to better student models, and that we can distill expensive teachers into efficient, simple students. Specifically, we demonstrate that pseudo-label-trained student models can outperform supervised models trained on 3-10 times the amount of labeled examples. Using PointPillars [24], a two-year-old architecture, as our student model, we are able to achieve state of the art accuracy simply by leveraging large quantities of pseudo-labeled data. Lastly, we show that these student models generalize better than supervised models to a new domain in which we only have unlabeled data, making pseudo-label training an effective form of unsupervised domain adaptation.
Recently, ultra-widefield (UWF) 200degree~fundus imaging by Optos cameras has gradually been introduced because of its broader insights for detecting more information on the fundus than regular 30 degree - 60 degree fundus cameras. Compared with UWF fundus images, regular fundus images contain a large amount of high-quality and well-annotated data. Due to the domain gap, models trained by regular fundus images to recognize UWF fundus images perform poorly. Hence, given that annotating medical data is labor intensive and time consuming, in this paper, we explore how to leverage regular fundus images to improve the limited UWF fundus data and annotations for more efficient training. We propose the use of a modified cycle generative adversarial network (CycleGAN) model to bridge the gap between regular and UWF fundus and generate additional UWF fundus images for training. A consistency regularization term is proposed in the loss of the GAN to improve and regulate the quality of the generated data. Our method does not require that images from the two domains be paired or even that the semantic labels be the same, which provides great convenience for data collection. Furthermore, we show that our method is robust to noise and errors introduced by the generated unlabeled data with the pseudo-labeling technique. We evaluated the effectiveness of our methods on several common fundus diseases and tasks, such as diabetic retinopathy (DR) classification, lesion detection and tessellated fundus segmentation. The experimental results demonstrate that our proposed method simultaneously achieves superior generalizability of the learned representations and performance improvements in multiple tasks.
Histopathology image analysis can be considered as a Multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e, patches) and the task is to predict a single class label to the WSI. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great interest because it provides reasons for the decision made by the system. In this paper, we propose a deep convolutional neural network (CNN) model that addresses the primary task of a bag classification on a WSI and also learns to identify the response of each instance to provide interpretable results to the final prediction. We incorporate the attention mechanism into the proposed model to operate the transformation of instances and learn attention weights to allow us to find key patches. To perform a balanced training, we introduce adaptive weighing in each training bag to explicitly adjust the weight distribution in order to concentrate more on the contribution of hard samples. Based on the learned attention weights, we further develop a solution to boost the classification performance by generating the bags with hard negative instances. We conduct extensive experiments on colon and breast cancer histopathology data and show that our framework achieves state-of-the-art performance.