Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers

125 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shady Abu Hussein

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shady Abu Hussein - Tom Tirer -

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The single image super-resolution task is one of the most examined inverse problems in the past decade. In the recent years, Deep Neural Networks (DNNs) have shown superior performance over alternative methods when the acquisition process uses a fixed known downsampling kernel-typically a bicubic kernel. However, several recent works have shown that in practical scenarios, where the test data mismatch the training data (e.g. when the downsampling kernel is not the bicubic kernel or is not available at training), the leading DNN methods suffer from a huge performance drop. Inspired by the literature on generalized sampling, in this work we propose a method for improving the performance of DNNs that have been trained with a fixed kernel on observations acquired by other kernels. For a known kernel, we design a closed-form correction filter that modifies the low-resolution image to match one which is obtained by another kernel (e.g. bicubic), and thus improves the results of existing pre-trained DNNs. For an unknown kernel, we extend this idea and propose an algorithm for blind estimation of the required correction filter. We show that our approach outperforms other super-resolution methods, which are designed for general downsampling kernels.

قيم البحث

579 - Yan Wu , Zhiwu Huang , Suryansh Kumar 2021

Modern solutions to the single image super-resolution (SISR) problem using deep neural networks aim not only at better performance accuracy but also at a lighter and computationally efficient model. To that end, recently, neural architecture search ( NAS) approaches have shown some tremendous potential. Following the same underlying, in this paper, we suggest a novel trilevel NAS method that provides a better balance between different efficiency metrics and performance to solve SISR. Unlike available NAS, our search is more complete, and therefore it leads to an efficient, optimized, and compressed architecture. We innovatively introduce a trilevel search space modeling, i.e., hierarchical modeling on network-, cell-, and kernel-level structures. To make the search on trilevel spaces differentiable and efficient, we exploit a new sparsestmax technique that is excellent at generating sparse distributions of individual neural architecture candidates so that they can be better disentangled for the final selection from the enlarged search space. We further introduce the sorting technique to the sparsestmax relaxation for better network-level compression. The proposed NAS optimization additionally facilitates simultaneous search and training in a single phase, reducing search time and train time. Comprehensive evaluations on the benchmark datasets show our methods clear superiority over the state-of-the-art NAS in terms of a good trade-off between model size, performance, and efficiency.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

EndoL2H: Deep Super-Resolution for Capsule Endoscopy

71 - Yasin Almalioglu , Kutsev Bengisu Ozyoruk , Abdulkadir Gokce 2020

Although wireless capsule endoscopy is the preferred modality for diagnosis and assessment of small bowel diseases, the poor camera resolution is a substantial limitation for both subjective and automated diagnostics. Enhanced-resolution endoscopy ha s shown to improve adenoma detection rate for conventional endoscopy and is likely to do the same for capsule endoscopy. In this work, we propose and quantitatively validate a novel framework to learn a mapping from low-to-high resolution endoscopic images. We combine conditional adversarial networks with a spatial attention block to improve the resolution by up to factors of 8x, 10x, 12x, respectively. Quantitative and qualitative studies performed demonstrate the superiority of EndoL2H over state-of-the-art deep super-resolution methods DBPN, RCAN and SRGAN. MOS tests performed by 30 gastroenterologists qualitatively assess and confirm the clinical relevance of the approach. EndoL2H is generally applicable to any endoscopic capsule system and has the potential to improve diagnosis and better harness computational approaches for polyp detection and characterization. Our code and trained models are available at https://github.com/CapsuleEndoscope/EndoL2H.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Supervised Deep Kriging for Single-Image Super-Resolution

180 - Gianni Franchi , Angela Yao , Andreas Kolb 2018

We propose a novel single-image super-resolution approach based on the geostatistical method of kriging. Kriging is a zero-bias minimum-variance estimator that performs spatial interpolation based on a weighted average of known observations. Rather t han solving for the kriging weights via the traditional method of inverting covariance matrices, we propose a supervised form in which we learn a deep network to generate said weights. We combine the kriging weight generation and kriging process into a joint network that can be learned end-to-end. Our network achieves competitive super-resolution results as other state-of-the-art methods. In addition, since the super-resolution process follows a known statistical framework, we are able to estimate bias and variance, something which is rarely possible for other deep networks.

الرؤية الحاسوبية وتمييز الأنماط

Deep Back-Projection Networks for Single Image Super-resolution

92 - Muhammad Haris , Greg Shakhnarovich , 2019

Previous feed-forward architectures of recently proposed deep super-resolution networks learn the features of low-resolution inputs and the non-linear mapping from those to a high-resolution output. However, this approach does not fully address the m utual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), the winner of two image super-resolution challenges (NTIRE2018 and PIRM2018), that exploit iterative up- and down-sampling layers. These layers are formed as a unit providing an error feedback mechanism for projection errors. We construct mutually-connected up- and down-sampling units each of which represents different types of low- and high-resolution components. We also show that extending this idea to demonstrate a new insight towards more efficient network design substantially, such as parameter sharing on the projection module and transition layer on projection step. The experimental results yield superior results and in particular establishing new state-of-the-art results across multiple data sets, especially for large scaling factors such as 8x.

الرؤية الحاسوبية وتمييز الأنماط

Efficient Transformer for Single Image Super-Resolution

107 - Zhisheng Lu , Hong Liu , Juncheng Li 2021

Single image super-resolution task has witnessed great strides with the development of deep learning. However, most existing studies focus on building a more complex neural network with a massive number of layers, bringing heavy computational cost an d memory storage. Recently, as Transformer yields brilliant results in NLP tasks, more and more researchers start to explore the application of Transformer in computer vision tasks. But with the heavy computational cost and high GPU memory occupation of the vision Transformer, the network can not be designed too deep. To address this problem, we propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution. ESRT is a hybrid Transformer where a CNN-based SR network is first designed in the front to extract deep features. Specifically, there are two backbones for formatting the ESRT: lightweight CNN backbone (LCB) and lightweight Transformer backbone (LTB). Among them, LCB is a lightweight SR network to extract deep SR features at a low computational cost by dynamically adjusting the size of the feature map. LTB is made up of an efficient Transformer (ET) with a small GPU memory occupation, which benefited from the novel efficient multi-head attention (EMHA). In EMHA, a feature split module (FSM) is proposed to split the long sequence into sub-segments and then these sub-segments are applied by attention operation. This module can significantly decrease the GPU memory occupation. Extensive experiments show that our ESRT achieves competitive results. Compared with the original Transformer which occupies 16057M GPU memory, the proposed ET only occupies 4191M GPU memory with better performance.

الرؤية الحاسوبية وتمييز الأنماط