No Arabic abstract
Direct contour regression for instance segmentation is a challenging task. Previous works usually achieve it by learning to progressively refine the contour prediction or adopting a shape representation with limited expressiveness. In this work, we argue that the difficulty in regressing the contour points in one pass is mainly due to the ambiguity when discretizing a smooth contour into a polygon. To address the ambiguity, we propose a novel differentiable rendering-based approach named textbf{ContourRender}. During training, it first predicts a contour generated by an invertible shape signature, and then optimizes the contour with the more stable silhouette by converting it to a contour mesh and rendering the mesh to a 2D map. This method significantly improves the quality of contour without iterations or cascaded refinements. Moreover, as optimization is not needed during inference, the inference speed will not be influenced. Experiments show the proposed ContourRender outperforms all the contour-based instance segmentation approaches on COCO, while stays competitive with the iteration-based state-of-the-art on Cityscapes. In addition, we specifically select a subset from COCO val2017 named COCO ContourHard-val to further demonstrate the contour quality improvements. Codes, models, and dataset split will be released.
In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named textbf{ESE-Seg}. It largely reduces the computational consumption of the instance segmentation by explicitly decoding the multiple object shapes with tensor operations, thus performs the instance segmentation at almost the same speed as the object detection. ESE-Seg is based on a novel shape signature Inner-center Radius (IR), Chebyshev polynomial fitting and the strong modern object detectors. ESE-Seg with YOLOv3 outperforms the Mask R-CNN on Pascal VOC 2012 at mAP$^
[email protected] while 7 times faster.
We present a novel explicit shape representation for instance segmentation. Based on how to model the object shape, current instance segmentation systems can be divided into two categories, implicit and explicit models. The implicit methods, which represent the object mask/contour by intractable network parameters, and produce it through pixel-wise classification, are predominant. However, the explicit methods, which parameterize the shape with simple and explainable models, are less explored. Since the operations to generate the final shape are light-weighted, the explicit methods have a clear speed advantage over implicit methods, which is crucial for real-world applications. The proposed USD-Seg adopts a linear model, sparse coding with dictionary, for object shapes. First, it learns a dictionary from a large collection of shape datasets, making any shape being able to be decomposed into a linear combination through the dictionary. Hence the name Universal Shape Dictionary. Then it adds a simple shape vector regression head to ordinary object detector, giving the detector segmentation ability with minimal overhead. For quantitative evaluation, we use both average precision (AP) and the proposed Efficiency of AP (AP$_E$) metric, which intends to also measure the computational consumption of the framework to cater to the requirements of real-world applications. We report experimental results on the challenging COCO dataset, in which our single model on a single Titan Xp GPU achieves 35.8 AP and 27.8 AP$_E$ at 65 fps with YOLOv4 as base detector, 34.1 AP and 28.6 AP$_E$ at 12 fps with FCOS as base detector.
Accurate segmentation of anatomical structures is vital for medical image analysis. The state-of-the-art accuracy is typically achieved by supervised learning methods, where gathering the requisite expert-labeled image annotations in a scalable manner remains a main obstacle. Therefore, annotation-efficient methods that permit to produce accurate anatomical structure segmentation are highly desirable. In this work, we present Contour Transformer Network (CTN), a one-shot anatomy segmentation method with a naturally built-in human-in-the-loop mechanism. We formulate anatomy segmentation as a contour evolution process and model the evolution behavior by graph convolutional networks (GCNs). Training the CTN model requires only one labeled image exemplar and leverages additional unlabeled data through newly introduced loss functions that measure the global shape and appearance consistency of contours. On segmentation tasks of four different anatomies, we demonstrate that our one-shot learning method significantly outperforms non-learning-based methods and performs competitively to the state-of-the-art fully supervised deep learning methods. With minimal human-in-the-loop editing feedback, the segmentation performance can be further improved to surpass the fully supervised methods.
Accurate segmenting nuclei instances is a crucial step in computer-aided image analysis to extract rich features for cellular estimation and following diagnosis as well as treatment. While it still remains challenging because the wide existence of nuclei clusters, along with the large morphological variances among different organs make nuclei instance segmentation susceptible to over-/under-segmentation. Additionally, the inevitably subjective annotating and mislabeling prevent the network learning from reliable samples and eventually reduce the generalization capability for robustly segmenting unseen organ nuclei. To address these issues, we propose a novel deep neural network, namely Contour-aware Informative Aggregation Network (CIA-Net) with multi-level information aggregation module between two task-specific decoders. Rather than independent decoders, it leverages the merit of spatial and texture dependencies between nuclei and contour by bi-directionally aggregating task-specific features. Furthermore, we proposed a novel smooth truncated loss that modulates losses to reduce the perturbation from outliers. Consequently, the network can focus on learning from reliable and informative samples, which inherently improves the generalization capability. Experiments on the 2018 MICCAI challenge of Multi-Organ-Nuclei-Segmentation validated the effectiveness of our proposed method, surpassing all the other 35 competitive teams by a significant margin.
Current instance segmentation methods can be categorized into segmentation-based methods that segment first then do clustering, and proposal-based methods that detect first then predict masks for each instance proposal using repooling. In this work, we propose a one-stage method, named EmbedMask, that unifies both methods by taking advantages of them. Like proposal-based methods, EmbedMask builds on top of detection models making it strong in detection capability. Meanwhile, EmbedMask applies extra embedding modules to generate embeddings for pixels and proposals, where pixel embeddings are guided by proposal embeddings if they belong to the same instance. Through this embedding coupling process, pixels are assigned to the mask of the proposal if their embeddings are similar. The pixel-level clustering enables EmbedMask to generate high-resolution masks without missing details from repooling, and the existence of proposal embedding simplifies and strengthens the clustering procedure to achieve high speed with higher performance than segmentation-based methods. Without any bells and whistles, EmbedMask achieves comparable performance as Mask R-CNN, which is the representative two-stage method, and can produce more detailed masks at a higher speed. Code is available at github.com/yinghdb/EmbedMask.