
SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models

Added by Haoying Li
Publication date: 2021
Language: English





Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional ones, while suffering from over-smoothing, mode collapse or large model footprint issues for PSNR-oriented, GAN-driven and flow-based methods respectively. To solve these problems, we propose a novel single image super-resolution diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood and can provide diverse and realistic SR predictions by gradually transforming the Gaussian noise into a super-resolution (SR) image conditioned on an LR input through a Markov chain. In addition, we introduce residual prediction to the whole framework to speed up convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that 1) SRDiff can generate diverse SR results in rich details with state-of-the-art performance, given only one LR input; 2) SRDiff is easy to train with a small footprint; and 3) SRDiff can perform flexible image manipulation including latent space interpolation and content fusion.
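The reverse process described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical PyTorch-style outline of conditional DDPM ancestral sampling with residual prediction, not the authors' released code; the names `denoiser`, `lr_encoder`, and the noise schedule handling are assumptions.

```python
import torch

@torch.no_grad()
def srdiff_sample(denoiser, lr_encoder, lr_img, lr_up, betas):
    """Hypothetical sketch of conditional diffusion sampling for SR.

    denoiser(x_t, t, cond) predicts the noise at step t; lr_up is the
    upsampled LR image, so the chain models the residual HR - lr_up
    (the residual prediction mentioned in the abstract).
    """
    alphas = 1.0 - betas
    alphas_cum = torch.cumprod(alphas, dim=0)
    cond = lr_encoder(lr_img)                     # LR conditioning features
    x = torch.randn_like(lr_up)                   # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((x.size(0),), t, device=x.device, dtype=torch.long)
        eps = denoiser(x, t_batch, cond)          # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alphas_cum[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise   # ancestral sampling step
    return lr_up + x                              # add back the upsampled LR (residual prediction)
```

Because sampling starts from fresh Gaussian noise each time, running this loop repeatedly on the same LR input yields the diverse SR predictions the abstract refers to.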



Related Research

The single image super-resolution task has witnessed great strides with the development of deep learning. However, most existing studies focus on building more complex neural networks with a massive number of layers, bringing heavy computational cost and memory storage. Recently, as Transformers have yielded brilliant results in NLP tasks, more and more researchers have started to explore the application of Transformers to computer vision tasks. However, owing to the heavy computational cost and high GPU memory occupation of the vision Transformer, the network cannot be designed too deep. To address this problem, we propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution. ESRT is a hybrid Transformer in which a CNN-based SR network is first placed at the front to extract deep features. Specifically, ESRT is formed from two backbones: a lightweight CNN backbone (LCB) and a lightweight Transformer backbone (LTB). LCB is a lightweight SR network that extracts deep SR features at a low computational cost by dynamically adjusting the size of the feature map. LTB is made up of an efficient Transformer (ET) with a small GPU memory occupation, which benefits from the novel efficient multi-head attention (EMHA). In EMHA, a feature split module (FSM) splits the long token sequence into sub-segments, and the attention operation is then applied within each sub-segment. This module significantly decreases GPU memory occupation. Extensive experiments show that ESRT achieves competitive results: compared with the original Transformer, which occupies 16057M of GPU memory, the proposed ET occupies only 4191M with better performance.
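The memory saving from the feature split idea can be illustrated with a small sketch: attention computed inside fixed-size sub-segments shrinks the attention matrix from N x N to several blocks of (N/k) x (N/k). This is an illustrative toy module, not the paper's EMHA implementation; the class name, segment size, and use of `nn.MultiheadAttention` are assumptions.

```python
import torch
import torch.nn as nn

class SplitSelfAttention(nn.Module):
    """Toy version of the feature-split idea: cut the token sequence into
    sub-segments and run self-attention inside each segment only."""

    def __init__(self, dim, heads=4, segment=64):
        super().__init__()
        self.segment = segment
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                # x: (B, N, C); assumes N divisible by segment
        b, n, c = x.shape
        x = x.view(b * n // self.segment, self.segment, c)   # split into sub-segments
        out, _ = self.attn(x, x, x)                          # attention within each segment
        return out.reshape(b, n, c)
```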
We propose a novel single-image super-resolution approach based on the geostatistical method of kriging. Kriging is a zero-bias, minimum-variance estimator that performs spatial interpolation based on a weighted average of known observations. Rather than solving for the kriging weights via the traditional method of inverting covariance matrices, we propose a supervised form in which a deep network is learned to generate those weights. We combine the kriging weight generation and the kriging process into a joint network that can be learned end-to-end. Our network achieves super-resolution results competitive with other state-of-the-art methods. In addition, since the super-resolution process follows a known statistical framework, we are able to estimate bias and variance, something that is rarely possible for other deep networks.
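A minimal sketch of the learned-kriging idea, assuming a CNN predicts per-pixel weights over a k x k neighborhood of the upsampled LR image and the SR pixel is the weighted average of those neighbors. The architecture, kernel size, and softmax normalization here are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedKrigingSR(nn.Module):
    """Hypothetical sketch: a small CNN generates kriging-style weights and
    the output is a weighted average of the k*k LR neighborhood."""

    def __init__(self, k=5):
        super().__init__()
        self.k = k
        self.weight_net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, k * k, 3, padding=1),
        )

    def forward(self, lr_up):                                 # lr_up: (B, 1, H, W) upsampled LR
        w = torch.softmax(self.weight_net(lr_up), dim=1)      # weights sum to 1 (zero-bias constraint)
        patches = F.unfold(lr_up, self.k, padding=self.k // 2)  # (B, k*k, H*W) neighborhoods
        b, _, h, wd = lr_up.shape
        patches = patches.view(b, self.k * self.k, h, wd)
        return (w * patches).sum(dim=1, keepdim=True)         # weighted average = kriging estimate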
Image super-resolution is a process to enhance image resolution. It is widely used in medical imaging, satellite imaging, target recognition, etc. In this paper, we conduct continuous modeling and assume that the unknown image intensity function is defined on a continuous domain and belongs to a space with a redundant basis. We propose a new iterative model for single image super-resolution based on the observation that an image consists of smooth components and non-smooth components, and we use two classes of approximated Heaviside functions (AHFs) to represent them respectively. Due to the sparsity of the non-smooth components, an $L_{1}$ model is employed. In addition, we apply the proposed iterative model to image patches to reduce computation and storage. Comparisons with some existing competitive methods show the effectiveness of the proposed method.
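To make the AHF-plus-$L_1$ idea concrete, here is a tiny 1-D sketch: a signal is represented as a sparse combination of shifted approximated Heaviside functions, and the coefficients are recovered with an $L_1$-regularized solver (ISTA). The arctan-based AHF, the dictionary construction, and the use of ISTA are illustrative assumptions; the paper's exact AHF pair and iterative model may differ.

```python
import numpy as np

def approx_heaviside(x, eps=0.01):
    # One common approximated Heaviside function: a smooth step 1/2 + arctan(x/eps)/pi.
    return 0.5 + np.arctan(x / eps) / np.pi

def soft_threshold(z, lam):
    # Proximal operator of the L1 norm, the building block of an L1 model.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Redundant basis of shifted AHFs and a toy piecewise-smooth signal.
t = np.linspace(0, 1, 200)
dictionary = np.stack([approx_heaviside(t - c) for c in np.linspace(0, 1, 50)], axis=1)
signal = approx_heaviside(t - 0.3) - 0.5 * approx_heaviside(t - 0.7)

coef = np.zeros(dictionary.shape[1])
step = 1.0 / np.linalg.norm(dictionary, 2) ** 2          # ISTA step size = 1 / Lipschitz constant
for _ in range(500):                                      # iterative shrinkage-thresholding
    grad = dictionary.T @ (dictionary @ coef - signal)
    coef = soft_threshold(coef - step * grad, lam=1e-3 * step)
```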
In recent years, impressive advances have been made in single image super-resolution, and deep learning is behind a large part of this success. Deep(er) architecture design and external prior modeling are the key ingredients. The internal content of the low-resolution input image is neglected by deep modeling, despite earlier works showing the power of such internal priors. In this paper, we propose a novel deep convolutional neural network carefully designed for robustness and efficiency at both learning and testing time. Moreover, we propose a couple of model adaptation strategies that exploit the internal content of the low-resolution input image, and we analyze their strong points and weaknesses. By trading runtime and using internal priors, we achieve 0.1 up to 0.3 dB PSNR improvements over the best reported results on standard datasets. Our adaptation especially favors images with repetitive structures or large resolutions. Moreover, it can be combined with other simple techniques, such as back-projection or enhanced prediction, for further improvements.
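The back-projection refinement mentioned at the end of this abstract is a classic technique and can be sketched in a few lines. This is a minimal sketch, assuming bicubic resampling and 4-D (B, C, H, W) tensors; it is not the paper's specific implementation.

```python
import torch
import torch.nn.functional as F

def iterative_back_projection(sr, lr, steps=10, rate=1.0):
    """Classic back-projection: repeatedly downsample the current SR estimate,
    compare it with the observed LR image, and add the upsampled residual back
    so the SR result stays consistent with the LR input."""
    sr = sr.clone()
    for _ in range(steps):
        down = F.interpolate(sr, size=lr.shape[-2:], mode='bicubic', align_corners=False)
        residual = lr - down                                              # consistency error in LR space
        up = F.interpolate(residual, size=sr.shape[-2:], mode='bicubic', align_corners=False)
        sr = sr + rate * up                                               # push SR toward LR consistency
    return sr
```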
While research on single image super-resolution (SISR), especially with deep neural networks (DNNs), has achieved tremendous success recently, it still suffers from two major limitations. First, the real image degradation is usually unknown and varies greatly from one image to another, making it extremely hard to train a single model to handle the general SISR task. Second, most current methods focus mainly on the downsampling part of the degradation and ignore or underestimate the inevitable noise contamination. For example, the commonly used independent and identically distributed (i.i.d.) Gaussian noise assumption often deviates largely from real image noise (e.g., camera sensor noise), which limits performance in real scenarios. To address these issues, this paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations. Instead of the traditional i.i.d. Gaussian noise assumption, a novel patch-based non-i.i.d. noise modeling method is proposed to fit complex real noise. Besides, a deep generator parameterized by a DNN is used to map the latent variable to the high-resolution image, and the conventional hyper-Laplacian prior is elaborately embedded into this generator to further constrain the image gradients. Finally, a Monte Carlo EM algorithm is designed to solve our model, providing a general inference framework that updates the image generator with respect to both the latent variable and the network parameters. Comprehensive experiments demonstrate that the proposed method evidently surpasses the current state-of-the-art (SotA) method (by about 1 dB PSNR), not only with a lighter model (0.34M vs. 2.40M parameters) but also at a faster speed.
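The hyper-Laplacian prior on image gradients mentioned above corresponds to a penalty of the form sum |grad x|^alpha with alpha < 1, which favors sparse gradients. Below is an illustrative penalty term only; the exact alpha, weighting, and how the paper embeds the prior into the generator are not specified here.

```python
import torch

def hyper_laplacian_penalty(img, alpha=0.5, eps=1e-6):
    """Illustrative negative log hyper-Laplacian prior on image gradients:
    sum of |finite differences|^alpha with alpha < 1 (sparse-gradient prior)."""
    dx = img[..., :, 1:] - img[..., :, :-1]          # horizontal gradients
    dy = img[..., 1:, :] - img[..., :-1, :]          # vertical gradients
    return (dx.abs() + eps).pow(alpha).sum() + (dy.abs() + eps).pow(alpha).sum()
```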
