No Arabic abstract
The fully connected conditional random field (CRF) with Gaussian pairwise potentials has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a linear programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization algorithm for dense CRFs. To this end, we develop a proximal minimization framework, where the dual of each proximal problem is optimized via block coordinate descent. We show that each block of variables can be efficiently optimized. Specifically, for one block, the problem decomposes into significantly smaller subproblems, each of which is defined over a single pixel. For the other block, the problem is optimized via conditional gradient descent. This has two advantages: 1) the conditional gradient can be computed in a time linear in the number of pixels and labels; and 2) the optimal step size can be computed analytically. Our experiments on standard datasets provide compelling evidence that our approach outperforms all existing baselines including the previous LP based approach for dense CRFs.
This paper proposes a novel approach for uncertainty quantification in dense Conditional Random Fields (CRFs). The presented approach, called Perturb-and-MPM, enables efficient, approximate sampling from dense multi-label CRFs via random perturbations. An analytic error analysis was performed which identified the main cause of approximation error as well as showed that the error is bounded. Spatial uncertainty maps can be derived from the Perturb-and-MPM model, which can be used to visualize uncertainty in image segmentation results. The method is validated on synthetic and clinical Magnetic Resonance Imaging data. The effectiveness of the approach is demonstrated on the challenging problem of segmenting the tumor core in glioblastoma. We found that areas of high uncertainty correspond well to wrongly segmented image regions. Furthermore, we demonstrate the potential use of uncertainty maps to refine imaging biomarkers in the case of extent of resection and residual tumor volume in brain tumor patients.
In structured prediction, a major challenge for models is to represent the interdependencies within their output structures. For the common case where outputs are structured as a sequence, linear-chain conditional random fields (CRFs) are a widely used model class which can learn local dependencies in output sequences. However, the CRFs Markov assumption makes it impossible for these models to capture nonlocal dependencies, and standard CRFs are unable to respect nonlocal constraints of the data (such as global arity constraints on output labels). We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones, by specifying the space of possible output structures as a regular language $mathcal{L}$. The resulting regular-constrained CRF (RegCCRF) has the same formal properties as a standard CRF, but assigns zero probability to all label sequences not in $mathcal{L}$. Notably, RegCCRFs can incorporate their constraints during training, while related models only enforce constraints during decoding. We prove that constrained training is never worse than constrained decoding, and show using synthetic data that it can be substantially better in practice. Additionally, we demonstrate a practical benefit on downstream tasks by incorporating a RegCCRF into a deep neural model for semantic role labeling, exceeding state-of-the-art results on a standard dataset.
In this paper, we address the problem of dynamic scene deblurring in the presence of motion blur. Restoration of images affected by severe blur necessitates a network design with a large receptive field, which existing networks attempt to achieve through simple increment in the number of generic convolution layers, kernel-size, or the scales at which the image is processed. However, these techniques ignore the non-uniform nature of blur, and they come at the expense of an increase in model size and inference time. We present a new architecture composed of region adaptive dense deformable modules that implicitly discover the spatially varying shifts responsible for non-uniform blur in the input image and learn to modulate the filters. This capability is complemented by a self-attentive module which captures non-local spatial relationships among the intermediate features and enhances the spatially-varying processing capability. We incorporate these modules into a densely connected encoder-decoder design which utilizes pre-trained Densenet filters to further improve the performance. Our network facilitates interpretable modeling of the spatially-varying deblurring process while dispensing with multi-scale processing and large filters entirely. Extensive comparisons with prior art on benchmark dynamic scene deblurring datasets clearly demonstrate the superiority of the proposed networks via significant improvements in accuracy and speed, enabling almost real-time deblurring.
We propose a novel compact linear programming (LP) relaxation for binary sub-modular MRF in the context of object segmentation. Our model is obtained by linearizing an $l_1^+$-norm derived from the quadratic programming (QP) form of the MRF energy. The resultant LP model contains significantly fewer variables and constraints compared to the conventional LP relaxation of the MRF energy. In addition, unlike QP which can produce ambiguous labels, our model can be viewed as a quasi-total-variation minimization problem, and it can therefore preserve the discontinuities in the labels. We further establish a relaxation bound between our LP model and the conventional LP model. In the experiments, we demonstrate our method for the task of interactive object segmentation. Our LP model outperforms QP when converting the continuous labels to binary labels using different threshold values on the entire Oxford interactive segmentation dataset. The computational complexity of our LP is of the same order as that of the QP, and it is significantly lower than the conventional LP relaxation.
Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annotation-sample efficient trajectory embedding for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call task programming, which uses programs to explicitly encode structured knowledge from domain experts. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results in three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.