No Arabic abstract
Image super-resolution is a process to enhance image resolution. It is widely used in medical imaging, satellite imaging, target recognition, etc. In this paper, we conduct continuous modeling and assume that the unknown image intensity function is defined on a continuous domain and belongs to a space with a redundant basis. We propose a new iterative model for single image super-resolution based on an observation: an image is consisted of smooth components and non-smooth components, and we use two classes of approximated Heaviside functions (AHFs) to represent them respectively. Due to sparsity of the non-smooth components, a $L_{1}$ model is employed. In addition, we apply the proposed iterative model to image patches to reduce computation and storage. Comparisons with some existing competitive methods show the effectiveness of the proposed method.
Single image super-resolution task has witnessed great strides with the development of deep learning. However, most existing studies focus on building a more complex neural network with a massive number of layers, bringing heavy computational cost and memory storage. Recently, as Transformer yields brilliant results in NLP tasks, more and more researchers start to explore the application of Transformer in computer vision tasks. But with the heavy computational cost and high GPU memory occupation of the vision Transformer, the network can not be designed too deep. To address this problem, we propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution. ESRT is a hybrid Transformer where a CNN-based SR network is first designed in the front to extract deep features. Specifically, there are two backbones for formatting the ESRT: lightweight CNN backbone (LCB) and lightweight Transformer backbone (LTB). Among them, LCB is a lightweight SR network to extract deep SR features at a low computational cost by dynamically adjusting the size of the feature map. LTB is made up of an efficient Transformer (ET) with a small GPU memory occupation, which benefited from the novel efficient multi-head attention (EMHA). In EMHA, a feature split module (FSM) is proposed to split the long sequence into sub-segments and then these sub-segments are applied by attention operation. This module can significantly decrease the GPU memory occupation. Extensive experiments show that our ESRT achieves competitive results. Compared with the original Transformer which occupies 16057M GPU memory, the proposed ET only occupies 4191M GPU memory with better performance.
We propose a novel single-image super-resolution approach based on the geostatistical method of kriging. Kriging is a zero-bias minimum-variance estimator that performs spatial interpolation based on a weighted average of known observations. Rather than solving for the kriging weights via the traditional method of inverting covariance matrices, we propose a supervised form in which we learn a deep network to generate said weights. We combine the kriging weight generation and kriging process into a joint network that can be learned end-to-end. Our network achieves competitive super-resolution results as other state-of-the-art methods. In addition, since the super-resolution process follows a known statistical framework, we are able to estimate bias and variance, something which is rarely possible for other deep networks.
In the recent years impressive advances were made for single image super-resolution. Deep learning is behind a big part of this success. Deep(er) architecture design and external priors modeling are the key ingredients. The internal contents of the low resolution input image is neglected with deep modeling despite the earlier works showing the power of using such internal priors. In this paper we propose a novel deep convolutional neural network carefully designed for robustness and efficiency at both learning and testing. Moreover, we propose a couple of model adaptation strategies to the internal contents of the low resolution input image and analyze their strong points and weaknesses. By trading runtime and using internal priors we achieve 0.1 up to 0.3dB PSNR improvements over best reported results on standard datasets. Our adaptation especially favors images with repetitive structures or under large resolutions. Moreover, it can be combined with other simple techniques, such as back-projection or enhanced prediction, for further improvements.
While the researches on single image super-resolution (SISR), especially equipped with deep neural networks (DNNs), have achieved tremendous successes recently, they still suffer from two major limitations. Firstly, the real image degradation is usually unknown and highly variant from one to another, making it extremely hard to train a single model to handle the general SISR task. Secondly, most of current methods mainly focus on the downsampling process of the degradation, but ignore or underestimate the inevitable noise contamination. For example, the commonly-used independent and identically distributed (i.i.d.) Gaussian noise distribution always largely deviates from the real image noise (e.g., camera sensor noise), which limits their performance in real scenarios. To address these issues, this paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations. Instead of the traditional i.i.d. Gaussian noise assumption, a novel patch-based non-i.i.d. noise modeling method is proposed to fit the complex real noise. Besides, a deep generator parameterized by a DNN is used to map the latent variable to the high-resolution image, and the conventional hyper-Laplacian prior is also elaborately embedded into such generator to further constrain the image gradients. Finally, a Monte Carlo EM algorithm is designed to solve our model, which provides a general inference framework to update the image generator both w.r.t. the latent variable and the network parameters. Comprehensive experiments demonstrate that the proposed method can evidently surpass the current state of the art (SotA) method (about 1dB PSNR) not only with a slighter model (0.34M vs. 2.40M) but also faster speed.
Due to the significant information loss in low-resolution (LR) images, it has become extremely challenging to further advance the state-of-the-art of single image super-resolution (SISR). Reference-based super-resolution (RefSR), on the other hand, has proven to be promising in recovering high-resolution (HR) details when a reference (Ref) image with similar content as that of the LR input is given. However, the quality of RefSR can degrade severely when Ref is less similar. This paper aims to unleash the potential of RefSR by leveraging more texture details from Ref images with stronger robustness even when irrelevant Ref images are provided. Inspired by the recent work on image stylization, we formulate the RefSR problem as neural texture transfer. We design an end-to-end deep model which enriches HR details by adaptively transferring the texture from Ref images according to their textural similarity. Instead of matching content in the raw pixel space as done by previous methods, our key contribution is a multi-level matching conducted in the neural space. This matching scheme facilitates multi-scale neural transfer that allows the model to benefit more from those semantically related Ref patches, and gracefully degrade to SISR performance on the least relevant Ref inputs. We build a benchmark dataset for the general research of RefSR, which contains Ref images paired with LR inputs with varying levels of similarity. Both quantitative and qualitative evaluations demonstrate the superiority of our method over state-of-the-art.