Do you want to publish a course? Click here

SROBB: Targeted Perceptual Loss for Single Image Super-Resolution

117   0   0.0 ( 0 )
 Added by Mohammad Saeed Rad
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

By benefiting from perceptual losses, recent studies have improved significantly the performance of the super-resolution task, where a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the proposed method leverages our proposed OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries, while considering texture similarity for backgrounds. We show that our proposed approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and results of extensive user studies.



rate research

Read More

This paper proposes an explicit way to optimize the super-resolution network for generating visually pleasing images. The previous approaches use several loss functions which is hard to interpret and has the implicit relationships to improve the perceptual score. We show how to exploit the machine learning based model which is directly trained to provide the perceptual score on generated images. It is believed that these models can be used to optimizes the super-resolution network which is easier to interpret. We further analyze the characteristic of the existing loss and our proposed explicit perceptual loss for better interpretation. The experimental results show the explicit approach has a higher perceptual score than other approaches. Finally, we demonstrate the relation of explicit perceptual loss and visually pleasing images using subjective evaluation.
Image quality measurement is a critical problem for image super-resolution (SR) algorithms. Usually, they are evaluated by some well-known objective metrics, e.g., PSNR and SSIM, but these indices cannot provide suitable results in accordance with the perception of human being. Recently, a more reasonable perception measurement has been proposed in [1], which is also adopted by the PIRM-SR 2018 challenge. In this paper, motivated by [1], we aim to generate a high-quality SR result which balances between the two indices, i.e., the perception index and root-mean-square error (RMSE). To do so, we design a new deep SR framework, dubbed Bi-GANs-ST, by integrating two complementary generative adversarial networks (GAN) branches. One is memory residual SRGAN (MR-SRGAN), which emphasizes on improving the objective performance, such as reducing the RMSE. The other is weight perception SRGAN (WP-SRGAN), which obtains the result that favors better subjective perception via a two-stage adversarial training mechanism. Then, to produce final result with excellent perception scores and RMSE, we use soft-thresholding method to merge the results generated by the two GANs. Our method performs well on the perceptual image super-resolution task of the PIRM 2018 challenge. Experimental results on five benchmarks show that our proposal achieves highly competent performance compared with other state-of-the-art methods.
Single image super-resolution task has witnessed great strides with the development of deep learning. However, most existing studies focus on building a more complex neural network with a massive number of layers, bringing heavy computational cost and memory storage. Recently, as Transformer yields brilliant results in NLP tasks, more and more researchers start to explore the application of Transformer in computer vision tasks. But with the heavy computational cost and high GPU memory occupation of the vision Transformer, the network can not be designed too deep. To address this problem, we propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution. ESRT is a hybrid Transformer where a CNN-based SR network is first designed in the front to extract deep features. Specifically, there are two backbones for formatting the ESRT: lightweight CNN backbone (LCB) and lightweight Transformer backbone (LTB). Among them, LCB is a lightweight SR network to extract deep SR features at a low computational cost by dynamically adjusting the size of the feature map. LTB is made up of an efficient Transformer (ET) with a small GPU memory occupation, which benefited from the novel efficient multi-head attention (EMHA). In EMHA, a feature split module (FSM) is proposed to split the long sequence into sub-segments and then these sub-segments are applied by attention operation. This module can significantly decrease the GPU memory occupation. Extensive experiments show that our ESRT achieves competitive results. Compared with the original Transformer which occupies 16057M GPU memory, the proposed ET only occupies 4191M GPU memory with better performance.
RGBD images, combining high-resolution color and lower-resolution depth from various types of depth sensors, are increasingly common. One can significantly improve the resolution of depth maps by taking advantage of color information; deep learning methods make combining color and depth information particularly easy. However, fusing these two sources of data may lead to a variety of artifacts. If depth maps are used to reconstruct 3D shapes, e.g., for virtual reality applications, the visual quality of upsampled images is particularly important. The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. We demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics. We compare this approach with a number of existing optimization and learning-based techniques.
We propose a novel single-image super-resolution approach based on the geostatistical method of kriging. Kriging is a zero-bias minimum-variance estimator that performs spatial interpolation based on a weighted average of known observations. Rather than solving for the kriging weights via the traditional method of inverting covariance matrices, we propose a supervised form in which we learn a deep network to generate said weights. We combine the kriging weight generation and kriging process into a joint network that can be learned end-to-end. Our network achieves competitive super-resolution results as other state-of-the-art methods. In addition, since the super-resolution process follows a known statistical framework, we are able to estimate bias and variance, something which is rarely possible for other deep networks.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا