Despite recent improvements achieved with fully convolutional networks, the segmentations produced by most state-of-the-art semantic segmentation methods still do not adhere satisfactorily to object boundaries. We propose a method to refine the segmentation results generated by such deep learning models. Our method takes as input the confidence scores generated by a pixel-dense segmentation network and re-labels pixels with low confidence. The re-labeling approach employs a region growing mechanism that aggregates these pixels into neighboring areas with high confidence scores and similar appearance. To correct the labels of pixels that were incorrectly classified with high confidence by the semantic segmentation algorithm, we generate multiple region growing steps through Monte Carlo sampling of the region seeds. Our method improves the accuracy of a state-of-the-art fully convolutional semantic segmentation approach on the publicly available COCO and PASCAL datasets, and it shows significantly better results on selected sequences of the finely-annotated DAVIS dataset.
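The core of this re-labeling loop can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the appearance-similarity term is omitted, and the 4-connectivity, confidence threshold, and majority-vote aggregation are choices made here for brevity.

```python
# Minimal sketch: confidence-thresholded region growing with Monte Carlo
# sampling of the growth order (appearance term omitted for brevity).
import numpy as np

def refine(probs, conf_thresh=0.9, n_mc=10, rng=None):
    """probs: (H, W, C) softmax scores from a segmentation network."""
    rng = rng or np.random.default_rng()
    h, w, _ = probs.shape
    labels = probs.argmax(-1)
    high = probs.max(-1) >= conf_thresh            # trusted, high-confidence pixels
    votes = np.zeros_like(probs)
    for _ in range(n_mc):                          # Monte Carlo over growth orders
        lab, known = labels.copy(), high.copy()
        frontier = list(zip(*np.nonzero(high)))    # seeds are the trusted pixels
        rng.shuffle(frontier)                      # randomized seed/growth order
        while frontier:
            y, x = frontier.pop()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not known[ny, nx]:
                    known[ny, nx] = True           # absorb low-confidence neighbor
                    lab[ny, nx] = lab[y, x]
                    frontier.append((ny, nx))
        votes[np.arange(h)[:, None], np.arange(w), lab] += 1
    return votes.argmax(-1)                        # majority vote across samples
```

Given `probs` from any softmax segmentation head, `refine(probs)` returns a map in which each low-confidence pixel adopts the label of the high-confidence region that reached it, aggregated over the randomized growth orders.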
Semantic segmentation with fine-grained pixel-level accuracy is a fundamental component of a variety of computer vision applications. However, despite the large improvements provided by recent advances in the architectures of convolutional neural networks, segmentations provided by modern state-of-the-art methods still show limited boundary adherence. We introduce a fully unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence pixel labels into regions of low-confidence classification. Our algorithm, which we call probabilistic Region Growing Refinement (pRGR), is based on a rigorous mathematical foundation in which clusters are modelled as multivariate normally distributed sets of pixels. Exploiting concepts of Bayesian estimation and variance reduction techniques, pRGR performs multiple refinement iterations at varied receptive field sizes, while updating cluster statistics to adapt to local image features. Experiments using multiple modern semantic segmentation networks and benchmark datasets demonstrate the effectiveness of our approach for the refinement of segmentation predictions at different levels of coarseness, as well as the suitability of the variance estimates obtained in the Monte Carlo iterations as uncertainty measures that are highly correlated with segmentation accuracy.
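The use of Monte Carlo variance as an uncertainty measure can be made concrete with a short sketch that repeats any stochastic refinement pass and treats the per-pixel vote frequency of the consensus label as a Bernoulli mean; pRGR's multivariate-normal cluster model, Bayesian updates, and varied receptive fields are omitted, and all names here are assumptions.

```python
# Sketch: consensus segmentation and per-pixel uncertainty from repeated
# stochastic refinement passes (an illustration, not pRGR itself).
import numpy as np

def mc_refine_with_uncertainty(probs, refine_once, n_mc=30, rng=None):
    """refine_once(probs, rng=...) -> (H, W) labels from one stochastic pass."""
    rng = rng or np.random.default_rng()
    h, w, c = probs.shape
    counts = np.zeros((h, w, c))
    for _ in range(n_mc):
        lab = refine_once(probs, rng=rng)          # one stochastic refinement
        counts[np.arange(h)[:, None], np.arange(w), lab] += 1
    freq = counts / n_mc                           # per-class vote frequencies
    consensus = freq.argmax(-1)                    # consensus segmentation
    p = freq.max(-1)
    return consensus, p * (1.0 - p)                # Bernoulli variance as uncertainty
```

For instance, `refine_once=lambda p, rng: refine(p, n_mc=1, rng=rng)` reuses the earlier region-growing sketch as the stochastic pass; pixels whose consensus frequency is near 0.5 receive the highest uncertainty.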
Image co-segmentation is an active computer vision task that aims to segment the objects common to a set of images. Recently, researchers have designed various learning-based algorithms to tackle the co-segmentation task. The main difficulty in this task is how to effectively transfer information between images to make conditional predictions. In this paper, we present CycleSegNet, a novel framework for the co-segmentation task. Our network design has two key components: a region correspondence module, which is the basic operation for exchanging information between local image regions, and a cycle refinement module, which utilizes ConvLSTMs to progressively update image representations and exchange information in a cyclic, iterative manner. Extensive experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on four popular benchmark datasets (PASCAL VOC, MSRC, Internet, and iCoseg) by 2.6%, 7.7%, 2.2%, and 2.9%, respectively.
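The basic operation of exchanging information between the local regions of an image pair can be illustrated with a plain dot-product cross-attention sketch; this conveys the general idea only and is not CycleSegNet's region correspondence module, and the tensor shapes and names are assumptions.

```python
# Sketch: enriching image A's features with matching regions of image B
# via dot-product cross-attention (illustrative, not CycleSegNet's module).
import torch

def exchange(fa, fb):
    """fa, fb: (B, C, H, W) feature maps of two images sharing an object."""
    b, c, h, w = fa.shape
    qa = fa.flatten(2).transpose(1, 2)             # (B, HW, C) queries from A
    kb = fb.flatten(2).transpose(1, 2)             # (B, HW, C) keys/values from B
    attn = torch.softmax(qa @ kb.transpose(1, 2) / c ** 0.5, dim=-1)
    fa2b = (attn @ kb).transpose(1, 2).reshape(b, c, h, w)
    return fa + fa2b                               # A enriched with B's regions
```

A cycle refinement in this spirit would apply such exchanges around the image set repeatedly, with a ConvLSTM updating each image's representation between passes.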
With the advanced LIGO and Virgo detectors taking observations, the detection of gravitational waves is expected within the next few years. Extracting astrophysical information from gravitational wave detections is a well-posed problem that has been thoroughly studied when detailed models for the waveforms are available. However, one motivation for the field of gravitational wave astronomy is the potential for new discoveries. Recognizing and characterizing unanticipated signals requires data analysis techniques that do not depend on theoretical predictions for the gravitational waveform. Past searches for short-duration, un-modeled gravitational wave signals have been hampered by transient noise artifacts, or glitches, in the detectors. In some cases, even high signal-to-noise simulated astrophysical signals have proven difficult to distinguish from glitches, so that essentially any plausible signal could be detected with at most 2-3 $\sigma$ confidence. We have put forth the BayesWave algorithm to differentiate between generic gravitational wave transients and glitches, and to provide robust waveform reconstruction and characterization of astrophysical signals. Here we study BayesWave's capabilities for rejecting glitches while assigning high confidence to detection candidates, using analytic approximations to the Bayesian evidence. The analytic results are tested with numerical experiments in which simulated gravitational wave transient signals are added to LIGO data collected between 2009 and 2010, and are found to be in good agreement.
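To illustrate evidence-based model selection of the kind described, the sketch below uses a generic Laplace/BIC-style approximation to the log evidence; this stand-in is an assumption for illustration and is not BayesWave's actual analytic approximation.

```python
# Sketch: comparing a coherent-signal model against a glitch model via a
# BIC-style approximation to the Bayesian evidence (generic stand-in).
import numpy as np

def log_evidence_bic(max_loglike, n_params, n_data):
    # Approximation: ln Z ~ ln L_max - (k / 2) ln N
    return max_loglike - 0.5 * n_params * np.log(n_data)

def ln_bayes_factor(loglike_signal, k_signal, loglike_glitch, k_glitch, n_data):
    # Positive values favor the signal model over the glitch model.
    return (log_evidence_bic(loglike_signal, k_signal, n_data)
            - log_evidence_bic(loglike_glitch, k_glitch, n_data))
```

The penalty term captures the Occam factor: a glitch model that needs many more free parameters to fit the data is disfavored even at comparable maximum likelihood.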
We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, neither of which targets pixel labeling performance at the end of the pipeline. More recent fully convolutional methods are capable of end-to-end training for the final pixel labeling, but resort to fixed patches as spatial support. We show how to modify modern region-based approaches to enable end-to-end training for semantic segmentation. This is achieved via a differentiable region-to-pixel layer and a differentiable free-form Region-of-Interest pooling layer. Our method improves the state of the art in class-average accuracy, reaching 64.0% on SIFT Flow and 49.9% on PASCAL Context, and is particularly accurate at object boundaries.
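The region-to-pixel idea can be sketched as a per-pixel max over the class scores of the regions covering that pixel, which keeps the mapping differentiable with respect to the region classifier; the shapes and the max reduction below follow the paper's idea loosely, and the details are assumptions.

```python
# Sketch: differentiable region-to-pixel mapping via a max over covering
# regions (loose illustration; not the paper's exact layer).
import torch

def region_to_pixel(region_scores, region_masks):
    """region_scores: (R, C) class scores; region_masks: (R, H, W) in {0, 1}."""
    covered = region_masks[:, None, :, :].bool()            # (R, 1, H, W)
    scores = region_scores[:, :, None, None].expand(
        -1, -1, *region_masks.shape[1:])                    # (R, C, H, W)
    scores = scores.masked_fill(~covered, float('-inf'))    # ignore non-covering regions
    pixel_scores, _ = scores.max(dim=0)                     # (C, H, W)
    return pixel_scores                                     # -inf where nothing covers
```

Because the max selects a single covering region per pixel and class, gradients from a pixel-level loss flow back only to the regions responsible for each prediction.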
The amount and quality of datasets and tools available in the research field of hand pose and shape estimation attest to the significant progress that has been made. However, even the highest-quality datasets reported to date have shortcomings in annotation. We propose a refinement approach based on differentiable ray tracing, and demonstrate how a high-quality, publicly available, multi-camera dataset of hands (InterHand2.6M) can become an even better dataset with respect to annotation quality. Differentiable ray tracing has not previously been applied to relevant problems and is hereby shown to be superior to the approximate alternatives that have been employed in the past. To tackle the lack of reliable ground truth for quantitative evaluation, we resort to realistic synthetic data to show that the improvement we induce is indeed significant. The same becomes evident in real data through visual evaluation.
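Annotation refinement by differentiable rendering can be sketched as gradient descent on a photometric loss summed over camera views; `render` below is a hypothetical stand-in for a differentiable ray tracer, and the squared-error loss and Adam optimizer are assumptions.

```python
# Sketch: refining a pose/shape annotation by backpropagating a photometric
# loss through a differentiable renderer (`render` is a hypothetical stand-in).
import torch

def refine_annotation(theta0, images, render, n_steps=200, lr=1e-2):
    """theta0: initial annotation parameters; images: list of observed views."""
    theta = theta0.clone().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = sum(((render(theta, view=i) - img) ** 2).mean()
                   for i, img in enumerate(images))          # multi-view photometric loss
        loss.backward()                                      # gradients via the renderer
        opt.step()
    return theta.detach()                                    # refined annotation
```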