No Arabic abstract
We propose a simple extension of residual networks that works simultaneously in multiple resolutions. Our network design is inspired by the iterative back-projection algorithm but seeks the more difficult task of learning how to enhance images. Compared to similar approaches, we propose a novel solution to make back-projections run in multiple resolutions by using a data pipeline workflow. Features are updated at multiple scales in each layer of the network. The update dynamic through these layers includes interactions between different resolutions in a way that is causal in scale, and it is represented by a system of ODEs, as opposed to a single ODE in the case of ResNets. The system can be used as a generic multi-resolution approach to enhance images. We test it on several challenging tasks with special focus on super-resolution and raindrop removal. Our results are competitive with state-of-the-arts and show a strong ability of our system to learn both global and local image features.
Multi-Grid Back-Projection (MGBP) is a fully-convolutional network architecture that can learn to restore images and videos with upscaling artifacts. Using the same strategy of multi-grid partial differential equation (PDE) solvers this multiscale architecture scales computational complexity efficiently with increasing output resolutions. The basic processing block is inspired in the iterative back-projection (IBP) algorithm and constitutes a type of cross-scale residual block with feedback from low resolution references. The architecture performs in par with state-of-the-arts alternatives for regression targets that aim to recover an exact copy of a high resolution image or video from which only a downscale image is known. A perceptual quality target aims to create more realistic outputs by introducing artificial changes that can be different from a high resolution original content as long as they are consistent with the low resolution input. For this target we propose a strategy using noise inputs in different resolution scales to control the amount of artificial details generated in the output. The noise input controls the amount of innovation that the network uses to create artificial realistic details. The effectiveness of this strategy is shown in benchmarks and it is explained as a particular strategy to traverse the perception-distortion plane.
Here, we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won the 1st place in terms of perceptual quality (MOS) similar to the ground truth and achieved the 5th place in terms of high-fidelity (PSNR). To tackle this challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2) whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative backprojections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustments of the number of network features based on the scale of operation. For inference tasks, we introduce an overlapping patch approach to further allow processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as downscaled output images. The final system can balance between high quality and high performance.
Non-line-of-sight imaging has attracted more attentions for its wide applications.Even though ultrasensitive cameras or detectors with high time-resolution are available, current back-projection methods are still powerless to acquire a satisfying reconstruction of multiple hidden objects due to severe aliasing artifacts. Here, a novel back-projection method is developed to reconstruct multiple hidden objects. Our method considers decomposing all the ellipsoids in a confidence map into several clusters belonging to different objects (namely ellipsoid mode decomposition), and then reconstructing the objects individually from their ellipsoid modes by filtering and thresholding, respectively. Importantly, the simulated and experimental results demonstrate that this method can effectively eliminate the impacts of aliasing artifacts and exhibits potential advantages in separating, locating and recovering multiple hidden objects, which might be a good base for reconstructing complex non-line-ofsight scenes.
Understanding the connectivity in the brain is an important prerequisite for understanding how the brain processes information. In the Brain/MINDS project, a connectivity study on marmoset brains uses two-photon microscopy fluorescence images of axonal projections to collect the neuron connectivity from defined brain regions at the mesoscopic scale. The processing of the images requires the detection and segmentation of the axonal tracer signal. The objective is to detect as much tracer signal as possible while not misclassifying other background structures as the signal. This can be challenging because of imaging noise, a cluttered image background, distortions or varying image contrast cause problems. We are developing MarmoNet, a pipeline that processes and analyzes tracer image data of the common marmoset brain. The pipeline incorporates state-of-the-art machine learning techniques based on artificial convolutional neural networks (CNN) and image registration techniques to extract and map all relevant information in a robust manner. The pipeline processes new images in a fully automated way. This report introduces the current state of the tracer signal analysis part of the pipeline.
We proposed a novel architecture for the problem of video super-resolution. We integrate spatial and temporal contexts from continuous video frames using a recurrent encoder-decoder module, that fuses multi-frame information with the more traditional, single frame super-resolution path for the target frame. In contrast to most prior work where frames are pooled together by stacking or warping, our model, the Recurrent Back-Projection Network (RBPN) treats each context frame as a separate source of information. These sources are combined in an iterative refinement framework inspired by the idea of back-projection in multiple-image super-resolution. This is aided by explicitly representing estimated inter-frame motion with respect to the target, rather than explicitly aligning frames. We propose a new video super-resolution benchmark, allowing evaluation at a larger scale and considering videos in different motion regimes. Experimental results demonstrate that our RBPN is superior to existing methods on several datasets.