No Arabic abstract
In this paper, we explore the role of Instance Normalization in low-level vision tasks. Specifically, we present a novel block: Half Instance Normalization Block (HIN Block), to boost the performance of image restoration networks. Based on HIN Block, we design a simple and powerful multi-stage network named HINet, which consists of two subnetworks. With the help of HIN Block, HINet surpasses the state-of-the-art (SOTA) on various image restoration tasks. For image denoising, we exceed it 0.11dB and 0.28 dB in PSNR on SIDD dataset, with only 7.5% and 30% of its multiplier-accumulator operations (MACs), 6.8 times and 2.9 times speedup respectively. For image deblurring, we get comparable performance with 22.5% of its MACs and 3.3 times speedup on REDS and GoPro datasets. For image deraining, we exceed it by 0.3 dB in PSNR on the average result of multiple datasets with 1.4 times speedup. With HINet, we won 1st place on the NTIRE 2021 Image Deblurring Challenge - Track2. JPEG Artifacts, with a PSNR of 29.70. The code is available at https://github.com/megvii-model/HINet.
Deep neural networks (DNNs) have achieved significant success in image restoration tasks by directly learning a powerful non-linear mapping from corrupted images to their latent clean ones. However, there still exist two major limitations for these deep learning (DL)-based methods. Firstly, the noises contained in real corrupted images are very complex, usually neglected and largely under-estimated in most current methods. Secondly, existing DL methods are mostly trained on one pre-assumed degradation process for all of the training image pairs, such as the widely used bicubic downsampling assumption in the image super-resolution task, inevitably leading to poor generalization performance when the true degradation does not match with such assumed one. To address these issues, we propose a unified generative model for the image restoration, which elaborately configures the degradation process from the latent clean image to the observed corrupted one. Specifically, different from most of current methods, the pixel-wisely non-i.i.d. Gaussian distribution, being with more flexibility, is adopted in our method to fit the complex real noises. Furthermore, the method is built on the general image degradation process, making it capable of adapting diverse degradations under one single model. Besides, we design a variational inference algorithm to learn all parameters involved in the proposed model with explicit form of objective loss. Specifically, beyond traditional variational methodology, two DNNs are employed to parameterize the posteriori distributions, one to infer the distribution of the latent clean image, and another to infer the distribution of the image noise. Extensive experiments demonstrate the superiority of the proposed method on three classical image restoration tasks, including image denoising, image super-resolution and JPEG image deblocking.
Recently, deep convolutional neural network (CNN) have been widely used in image restoration and obtained great success. However, most of existing methods are limited to local receptive field and equal treatment of different types of information. Besides, existing methods always use a multi-supervised method to aggregate different feature maps, which can not effectively aggregate hierarchical feature information. To address these issues, we propose an attention cube network (A-CubeNet) for image restoration for more powerful feature expression and feature correlation learning. Specifically, we design a novel attention mechanism from three dimensions, namely spatial dimension, channel-wise dimension and hierarchical dimension. The adaptive spatial attention branch (ASAB) and the adaptive channel attention branch (ACAB) constitute the adaptive dual attention module (ADAM), which can capture the long-range spatial and channel-wise contextual information to expand the receptive field and distinguish different types of information for more effective feature representations. Furthermore, the adaptive hierarchical attention module (AHAM) can capture the long-range hierarchical contextual information to flexibly aggregate different feature maps by weights depending on the global context. The ADAM and AHAM cooperate to form an attention in attention structure, which means AHAMs inputs are enhanced by ASAB and ACAB. Experiments demonstrate the superiority of our method over state-of-the-art image restoration methods in both quantitative comparison and visual analysis. Code is available at https://github.com/YCHang686/A-CubeNet.
Liquify is a common technique for image editing, which can be used for image distortion. Due to the uncertainty in the distortion variation, restoring distorted images caused by liquify filter is a challenging task. To edit images in an efficient way, distorted images are expected to be restored automatically. This paper aims at the distorted image restoration, which is characterized by seeking the appropriate warping and completion of a distorted image. Existing methods focus on the hardware assistance or the geometric principle to solve the specific regular deformation caused by natural phenomena, but they cannot handle the irregularity and uncertainty of artificial distortion in this task. To address this issue, we propose a novel generative and discriminative learning method based on deep neural networks, which can learn various reconstruction mappings and represent complex and high-dimensional data. This method decomposes the task into a rectification stage and a refinement stage. The first stage generative network predicts the mapping from the distorted images to the rectified ones. The second stage generative network then further optimizes the perceptual quality. Since there is no available dataset or benchmark to explore this task, we create a Distorted Face Dataset (DFD) by forward distortion mapping based on CelebA dataset. Extensive experimental evaluation on the proposed benchmark and the application demonstrates that our method is an effective way for distorted image restoration.
Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvement in visual performance, but also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. Then we raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When focus on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called Perceptual Image Processing Algorithms (PIPAL) dataset. Especially, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable Elo system. Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. At last, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method.
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by $textbf{up to 0.14$sim$0.45dB}$, while the total number of parameters can be reduced by $textbf{up to 67%}$.