Local and non-local attention-based methods have been well studied in various image restoration tasks and have led to promising performance. However, most existing methods focus solely on one type of attention mechanism (local or non-local). Furthermore, although they exploit the self-similarity of natural images, existing pixel-wise non-local attention operations tend to produce deviations when characterizing long-range dependencies because of image degradation. To overcome these problems, we propose a novel collaborative attention network (COLA-Net) for image restoration, the first attempt to combine local and non-local attention mechanisms to restore image content in areas with complex textures and in areas with highly repetitive details, respectively. In addition, an effective and robust patch-wise non-local attention model is developed to capture long-range feature correspondences through 3D patches. Extensive experiments on synthetic image denoising, real image denoising, and compression artifact reduction demonstrate that the proposed COLA-Net achieves state-of-the-art performance in both peak signal-to-noise ratio and visual perception while maintaining an attractive computational complexity. The source code is available at https://github.com/MC-E/COLA-Net.
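The patch-wise non-local attention idea can be illustrated with a minimal sketch. It is not the COLA-Net implementation: it works on a 1D signal instead of 3D feature patches, uses negative squared Euclidean distance as the similarity (an assumption), and has no learned embeddings. The function name `patch_nonlocal_attention` and its parameters are hypothetical.

```python
import math


def patch_nonlocal_attention(signal, patch_size=3, scale=1.0):
    """Rebuild every patch as a softmax-weighted average of all patches.

    Assumes len(signal) >= patch_size. Similarity is computed between whole
    patches rather than single samples, so one noisy sample has less
    influence on the matching.
    """
    # Extract overlapping patches with stride 1.
    num_patches = len(signal) - patch_size + 1
    patches = [signal[i:i + patch_size] for i in range(num_patches)]

    out_patches = []
    for query in patches:
        # Score every patch against the query (higher means more similar).
        scores = [-scale * sum((a - b) ** 2 for a, b in zip(query, key))
                  for key in patches]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of all patches, position by position.
        out_patches.append([sum(w * key[j] for w, key in zip(weights, patches))
                            for j in range(patch_size)])

    # Fold the patches back: average the overlapping estimates per sample.
    acc = [0.0] * len(signal)
    count = [0] * len(signal)
    for i, patch in enumerate(out_patches):
        for j, value in enumerate(patch):
            acc[i + j] += value
            count[i + j] += 1
    return [a / c for a, c in zip(acc, count)]
```

For a strictly repetitive signal and a large `scale`, each patch attends almost only to its exact repeats, so the output stays close to the input; on noisy repetitive content the averaging over similar patches acts as a denoiser.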
Recently, deep convolutional neural networks (CNNs) have been widely used in image restoration and have achieved great success. However, most existing methods are limited by a local receptive field and by equal treatment of different types of information. Moreover, existing methods typically rely on multi-supervision to aggregate different feature maps, which cannot effectively aggregate hierarchical feature information. To address these issues, we propose an attention cube network (A-CubeNet) for image restoration that offers more powerful feature expression and feature correlation learning. Specifically, we design a novel attention mechanism along three dimensions: the spatial dimension, the channel-wise dimension, and the hierarchical dimension. The adaptive spatial attention branch (ASAB) and the adaptive channel attention branch (ACAB) constitute the adaptive dual attention module (ADAM), which captures long-range spatial and channel-wise contextual information to expand the receptive field and to distinguish different types of information for more effective feature representations. Furthermore, the adaptive hierarchical attention module (AHAM) captures long-range hierarchical contextual information to flexibly aggregate different feature maps with weights that depend on the global context. The ADAM and AHAM cooperate to form an "attention in attention" structure, meaning that the AHAM's inputs are enhanced by the ASAB and ACAB. Experiments demonstrate the superiority of our method over state-of-the-art image restoration methods in both quantitative comparison and visual analysis. Code is available at https://github.com/YCHang686/A-CubeNet.
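Aggregating feature maps "by weights depending on the global context" can be sketched as follows. This is an illustrative simplification, not the AHAM itself: each feature map is a flat list, its global descriptor is simply its mean, and the weights come from a softmax over those descriptors with no learned projection. The name `hierarchical_aggregate` is hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_aggregate(feature_maps):
    """Fuse equally sized feature maps from different depths.

    Each map is summarized by global average pooling; the softmax of these
    summaries gives one weight per map, and the maps are then summed with
    those weights.
    """
    descriptors = [sum(fmap) / len(fmap) for fmap in feature_maps]
    weights = softmax(descriptors)
    size = len(feature_maps[0])
    fused = [sum(w * fmap[i] for w, fmap in zip(weights, feature_maps))
             for i in range(size)]
    return fused, weights
```

Because the weights are recomputed from the input features, different inputs yield different mixtures of shallow and deep features, unlike a fixed sum or concatenation.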
In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines Channel Attention with a Pixel Attention mechanism, considering that different channel-wise features contain totally different weighted information and that haze is distributed unevenly across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of Local Residual Learning and Feature Attention. Local Residual Learning allows less important information, such as thin-haze regions or low-frequency content, to be bypassed through multiple local residual connections, letting the main network architecture focus on more effective information. 3) An attention-based Feature Fusion (FFA) structure across different levels, in which the feature weights are adaptively learned from the FA module, giving more weight to important features. This structure can also retain the information of shallow layers and pass it into deep layers. The experimental results demonstrate that the proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available on GitHub.
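To put the reported PSNR gain in perspective, the standard definition PSNR = 10 log10(MAX^2 / MSE) can be inverted to compare mean squared errors. The short sketch below assumes pixel values normalized to [0, 1] (so MAX = 1) and treats the two PSNR figures as if they described a single image; benchmark PSNR is usually averaged over images, so the resulting ratio is only indicative. The helper names are hypothetical.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_from_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR definition to recover the mean squared error."""
    return max_value ** 2 / (10.0 ** (psnr_db / 10.0))


baseline_db, ffa_db = 30.23, 36.39          # figures quoted in the abstract
gain_db = ffa_db - baseline_db              # 6.16 dB
mse_ratio = mse_from_psnr(baseline_db) / mse_from_psnr(ffa_db)
# mse_ratio equals 10 ** (gain_db / 10), roughly 4.13
```

Under these assumptions, a 6.16 dB gain corresponds to a mean squared error about four times smaller.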
Convolutional neural networks (CNNs) have recently achieved great success in image restoration (IR) and also offer hierarchical features. However, most deep CNN-based IR models do not make full use of the hierarchical features from the original low-quality images and therefore achieve relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in IR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. The RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and to stabilize the training of wider networks, we propose local feature fusion in the RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN on several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task, both quantitatively and visually.
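The data flow inside a residual dense block (dense connections, local feature fusion, local residual learning) can be sketched on a single feature vector. This is a simplified stand-in rather than the RDN code: linear maps replace the convolutions, the weights are random, and the names `make_rdb` and `residual_dense_block` and all sizes are hypothetical.

```python
import random


def linear(x, weights):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def make_rdb(channels, growth, num_layers, seed=0, scale=0.1):
    """Create random weights for one block (illustrative initialization)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]

    # Layer i sees the block input plus the outputs of all i earlier layers.
    layers = [init(growth, channels + i * growth) for i in range(num_layers)]
    # Local feature fusion maps the full concatenation back to `channels`.
    fusion = init(channels, channels + num_layers * growth)
    return layers, fusion


def residual_dense_block(x, layers, fusion):
    features = list(x)  # running concatenation of all features so far
    for weights in layers:
        # Dense connection: each layer reads everything produced before it.
        features = features + relu(linear(features, weights))
    fused = linear(features, fusion)           # local feature fusion
    return [a + b for a, b in zip(x, fused)]   # local residual learning
```

A usage example: `layers, fusion = make_rdb(4, 2, 3)` followed by `residual_dense_block([1.0, -2.0, 0.5, 3.0], layers, fusion)` returns a vector with the same four channels as the input, which is what allows blocks to be stacked.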
In this paper, we propose a residual non-local attention network for high-quality image restoration. Because they do not consider the uneven distribution of information in corrupted images, previous methods are restricted by local convolutional operations and by equal treatment of spatial- and channel-wise features. To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts. Specifically, we design a trunk branch and a (non-)local mask branch in each (non-)local attention block. The trunk branch is used to extract hierarchical features. The local and non-local mask branches adaptively rescale these hierarchical features with mixed attention. The local mask branch concentrates on local structures using convolutional operations, while the non-local mask branch focuses on long-range dependencies across the whole feature map. Furthermore, we propose residual local and non-local attention learning to train the very deep network, which further enhances the representation ability of the network. Our proposed method can be generalized to various image restoration applications, such as image denoising, demosaicing, compression artifact reduction, and super-resolution. Experiments demonstrate that our method obtains comparable or better results than recent leading methods, both quantitatively and visually.
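The interplay of trunk branch, mask branch, and residual attention learning can be sketched on a 1D feature list. The combination rule used here, output = input + trunk(input) * mask(input), is an assumption about how the residual connection is applied; the branches themselves are toy stand-ins (a ReLU for the trunk, a 3-tap average followed by a sigmoid for the local mask), and all function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def trunk_branch(x):
    """Stand-in for the feature-extracting trunk: a plain ReLU."""
    return [max(0.0, v) for v in x]


def local_mask_branch(x):
    """Stand-in for the local mask: 3-tap average, then sigmoid to (0, 1)."""
    n = len(x)
    masks = []
    for i in range(n):
        left = x[max(i - 1, 0)]        # replicate the edge sample
        right = x[min(i + 1, n - 1)]
        masks.append(sigmoid((left + x[i] + right) / 3.0))
    return masks


def residual_attention(x):
    """Rescale trunk features by the mask and add the input back."""
    trunk = trunk_branch(x)
    mask = local_mask_branch(x)
    return [xi + ti * mi for xi, ti, mi in zip(x, trunk, mask)]
```

The identity path means that a mask close to zero leaves the input unchanged instead of erasing it, which is the usual motivation for residual attention when many blocks are stacked.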
Convolutional neural networks have enabled remarkable advances in single image super-resolution (SISR) over the last decade. Among recent advances in SISR, attention mechanisms are crucial for high-performance SR models. However, why and how attention mechanisms work in SISR remains unclear. In this work, we attempt to quantify and visualize attention mechanisms in SISR and show that not all attention modules are equally beneficial. We then propose the attention in attention network (A$^2$N) for more efficient and accurate SISR. Specifically, A$^2$N consists of a non-attention branch and a coupling attention branch. A dynamic attention module is proposed to generate weights for these two branches and dynamically suppress unwanted attention adjustments, with the weights changing adaptively according to the input features. This allows attention modules to specialize in beneficial examples without incurring penalties elsewhere, and thus greatly improves the capacity of the attention network with little parameter overhead. Experimental results demonstrate that our final model, A$^2$N, achieves a superior performance trade-off compared with state-of-the-art networks of similar size. Code is available at https://github.com/haoyuc/A2N.
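The dynamic weighting of a non-attention branch and an attention branch can be sketched as below. This is a toy illustration under stated assumptions, not the A$^2$N code: the input is summarized by global average pooling, two scalar affine maps (the hypothetical `gate_weights`) produce logits, and a softmax turns them into two branch weights that sum to one. The branches are an identity map and a sigmoid self-gating, both placeholders.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_in_attention(x, gate_weights=((1.0, 0.0), (-1.0, 0.0))):
    """Blend a non-attention and an attention branch with input-dependent weights.

    gate_weights holds one (slope, bias) pair per branch; the values are
    illustrative, whereas a trained network would learn them.
    """
    pooled = sum(x) / len(x)                       # global average pooling
    logits = [slope * pooled + bias for slope, bias in gate_weights]
    w_plain, w_attn = softmax(logits)              # branch weights, sum to 1

    plain = list(x)                                # non-attention branch
    attended = [v * sigmoid(v) for v in x]         # attention branch (self-gating)
    out = [w_plain * p + w_attn * a for p, a in zip(plain, attended)]
    return out, (w_plain, w_attn)
```

When the attention branch is not helpful for a given input, the dynamic weights can shift toward the non-attention branch, which is the sense in which unwanted attention adjustments are suppressed.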