W-Net: A Deep Model for Fully Unsupervised Image Segmentation

98 0 0.0 ( 0 )

Download Cite

Added by Xide Xia

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Xide Xia - Brian Kulis

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

While significant attention has been recently focused on designing supervised deep semantic segmentation algorithms for vision tasks, there are many domains in which sufficient supervised pixel-level labels are difficult to obtain. In this paper, we revisit the problem of purely unsupervised image segmentation and propose a novel deep architecture for this problem. We borrow recent ideas from supervised semantic segmentation methods, in particular by concatenating two fully convolutional networks together into an autoencoder--one for encoding and one for decoding. The encoding layer produces a k-way pixelwise prediction, and both the reconstruction error of the autoencoder as well as the normalized cut produced by the encoder are jointly minimized during training. When combined with suitable postprocessing involving conditional random field smoothing and hierarchical segmentation, our resulting algorithm achieves impressive results on the benchmark Berkeley Segmentation Data Set, outperforming a number of competing methods.

rate research

Deep Superpixel Cut for Unsupervised Image Segmentation

334 - Qinghong Lin , Weichan Zhong , Jianglin Lu 2021

Image segmentation, one of the most critical vision tasks, has been studied for many years. Most of the early algorithms are unsupervised methods, which use hand-crafted features to divide the image into many regions. Recently, owing to the great success of deep learning technology, CNNs based methods show superior performance in image segmentation. However, these methods rely on a large number of human annotations, which are expensive to collect. In this paper, we propose a deep unsupervised method for image segmentation, which contains the following two stages. First, a Superpixelwise Autoencoder (SuperAE) is designed to learn the deep embedding and reconstruct a smoothed image, then the smoothed image is passed to generate superpixels. Second, we present a novel clustering algorithm called Deep Superpixel Cut (DSC), which measures the deep similarity between superpixels and formulates image segmentation as a soft partitioning problem. Via backpropagation, DSC adaptively partitions the superpixels into perceptual regions. Experimental results on the BSDS500 dataset demonstrate the effectiveness of the proposed method.

Computer Vision and Pattern Recognition

w-Net: Dual Supervised Medical Image Segmentation Model with Multi-Dimensional Attention and Cascade Multi-Scale Convolution

63 - Bo Wang , Lei Wang , Junyang Chen 2020

Deep learning-based medical image segmentation technology aims at automatic recognizing and annotating objects on the medical image. Non-local attention and feature learning by multi-scale methods are widely used to model network, which drives progress in medical image segmentation. However, those attention mechanism methods have weakly non-local receptive fields strengthened connection for small objects in medical images. Then, the features of important small objects in abstract or coarse feature maps may be deserted, which leads to unsatisfactory performance. Moreover, the existing multi-scale methods only simply focus on different sizes of view, whose sparse multi-scale features collected are not abundant enough for small objects segmentation. In this work, a multi-dimensional attention segmentation model with cascade multi-scale convolution is proposed to predict accurate segmentation for small objects in medical images. As the weight function, multi-dimensional attention modules provide coefficient modification for significant/informative small objects features. Furthermore, The cascade multi-scale convolution modules in each skip-connection path are exploited to capture multi-scale features in different semantic depth. The proposed method is evaluated on three datasets: KiTS19, Pancreas CT of Decathlon-10, and MICCAI 2018 LiTS Challenge, demonstrating better segmentation performances than the state-of-the-art baselines.

Computer Vision and Pattern Recognition

Fully Transformer Networks for Semantic Image Segmentation

386 - Sitong Wu , Tianyi Wu , Fangjian Lin 2021

Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies. Recent progress has demonstrated to combine such transformers with CNN-based semantic image segmentation models is very promising. However, it is not well studied yet on how well a pure transformer based approach can achieve for image segmentation. In this work, we explore a novel framework for semantic image segmentation, which is encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features, while reducing the computation complexity of the standard visual transformer(ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline can achieve new state-of-the-art results on multiple challenging semantic segmentation benchmarks, including PASCAL Context, ADE20K and COCO-Stuff. The source code will be released upon the publication of this work.

Computer Vision and Pattern Recognition

Model-based learning of local image features for unsupervised texture segmentation

74 - Martin Kiechle , Martin Storath , Andreas Weinmann 2017

Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.

Computer Vision and Pattern Recognition

Improved Workflow for Unsupervised Multiphase Image Segmentation

452 - Brendan A. West , Taylor S. Hodgdon , Matthew D. Parno 2017

Quantitative image analysis often depends on accurate classification of pixels through a segmentation process. However, imaging artifacts such as the partial volume effect and sensor noise complicate the classification process. These effects increase the pixel intensity variance of each constituent class, causing intensities from one class to overlap with another. This increased variance makes threshold based segmentation methods insufficient due to ambiguous overlap regions in the pixel intensity distributions. The class ambiguity becomes even more complex for systems with more than two constituents, such as unsaturated moist granular media. In this paper, we propose an image processing workflow that improves segmentation accuracy for multiphase systems. First, the ambiguous transition regions between classes are identified and removed, which allows for global thresholding of single-class regions. Then the transition regions are classified using a distance function, and finally both segmentations are combined into one classified image. This workflow includes three methodologies for identifying transition pixels and we demonstrate on a variety of synthetic images that these approaches are able to accurately separate the ambiguous transition pixels from the single-class regions. For situations with typical amounts of image noise, misclassification errors and area differences calculated between each class of the synthetic images and the resultant segmented images range from 0.69-1.48% and 0.01-0.74%, respectively, showing the segmentation accuracy of this approach. We demonstrate that we are able to accurately segment x-ray microtomography images of moist granular media using these computationally efficient methodologies.

Computer Vision and Pattern Recognition