
High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling

Posted by: Yu Zeng
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and a web app are available at https://zengxianyu.github.io/iic.
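To make the feedback loop concrete, below is a minimal sketch of the iterative filling procedure described in the abstract, assuming a hypothetical `inpaint_net` that returns both an inpainted image and a per-pixel confidence map; the function name, threshold, and step count are illustrative, not the authors' released API.

```python
import torch

def iterative_inpaint(inpaint_net, image, mask, steps=4, threshold=0.5):
    """Progressively fill the hole, trusting only high-confidence pixels.

    image: (B, 3, H, W) tensor with the hole region zeroed out
    mask:  (B, 1, H, W) tensor, 1 inside the hole, 0 for known pixels
    """
    for _ in range(steps):
        output, confidence = inpaint_net(image, mask)      # hypothetical model API
        trusted = (confidence > threshold).float() * mask  # confident hole pixels
        image = image * (1 - trusted) + output * trusted   # accept them as known
        mask = mask * (1 - trusted)                        # shrink the hole
        if mask.sum() == 0:                                # hole fully filled
            break
    return image
```

Each iteration treats previously accepted pixels as known context, which is how partial predictions from earlier iterations are reused to refine the remaining, harder pixels.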


Read also

Image inpainting requires filling the corrupted image with content coherent with its context. This research field has achieved promising progress through neural image inpainting methods. Nevertheless, guessing the missing content from only the context pixels remains a critical challenge. The goal of this paper is to fill in the semantic information of corrupted images according to a provided descriptive text. Unlike existing text-guided image generation works, the inpainting model is required to compare the semantic content of the given text with the remaining part of the image, and then determine the semantic content that should be filled into the missing part. To fulfill this task, we propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet). First, a dual multimodal attention mechanism is designed to extract explicit semantic information about the corrupted regions by comparing the descriptive text and the complementary image areas through reciprocal attention. Second, an image-text matching loss is applied to maximize the semantic similarity between the generated image and the text. Experiments are conducted on two open datasets. Results show that the proposed TDANet model reaches a new state-of-the-art on both quantitative and qualitative measures. Result analysis suggests that the generated images are consistent with the guidance text, enabling the generation of varied results by providing different descriptions. Code is available at https://github.com/idealwhite/TDANet
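As a rough illustration of the image-text matching term described above, the sketch below maximizes the cosine similarity between embeddings of the generated image and the guidance text; `img_encoder` and `txt_encoder` are hypothetical stand-ins rather than TDANet's released modules.

```python
import torch
import torch.nn.functional as F

def image_text_matching_loss(img_encoder, txt_encoder, generated, text_tokens):
    # Embed both modalities into a shared space and L2-normalize.
    img_emb = F.normalize(img_encoder(generated), dim=-1)    # (B, D)
    txt_emb = F.normalize(txt_encoder(text_tokens), dim=-1)  # (B, D)
    similarity = (img_emb * txt_emb).sum(dim=-1)             # cosine similarity
    return (1.0 - similarity).mean()                         # minimizing aligns image and text
```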
Yu Zeng, Zhe Lin, Huchuan Lu (2020)
Recent deep generative inpainting methods use attention layers to allow the generator to explicitly borrow feature patches from the known region to complete a missing region. Due to the lack of supervision signals for the correspondence between missing regions and known regions, they may fail to find proper reference features, which often leads to artifacts in the results. They also compute pair-wise similarity across the entire feature map during inference, bringing significant computational overhead. To address these issues, we propose to teach this patch-borrowing behavior to an attention-free generator by jointly training an auxiliary contextual reconstruction task, which encourages the generated output to be plausible even when reconstructed from surrounding regions. The auxiliary branch can be seen as a learnable loss function, named the contextual reconstruction (CR) loss, in which the query-reference feature similarity and a reference-based reconstructor are jointly optimized with the inpainting generator. The auxiliary branch (i.e., the CR loss) is required only during training, and only the inpainting generator is needed during inference. Experimental results demonstrate that the proposed inpainting model compares favourably against the state of the art in terms of quantitative and visual performance.
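The CR-loss idea can be sketched roughly as follows: hole features are softly reconstructed from known-region features through a similarity-weighted average, then decoded and penalized against the ground truth. The tensor shapes and the `reconstructor` head are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def cr_loss(features, mask, reconstructor, target):
    """features: (B, C, H, W) generator features; mask: (B, 1, H, W), 1 inside the hole."""
    B, C, H, W = features.shape
    feat = features.flatten(2)                      # (B, C, HW)
    m = mask.flatten(2)                             # (B, 1, HW)
    sim = torch.bmm(feat.transpose(1, 2), feat)     # (B, HW, HW) query-reference similarity
    sim = sim.masked_fill(m.bool(), float('-inf'))  # references must be known pixels
    attn = F.softmax(sim, dim=-1)                   # per-query weights over references
    recon = torch.bmm(feat, attn.transpose(1, 2))   # (B, C, HW) rebuilt from known features
    mixed = feat * (1 - m) + recon * m              # replace hole features only
    image = reconstructor(mixed.view(B, C, H, W))   # reference-based decoder (assumed)
    return F.l1_loss(image, target)
```

Because both the similarity and the reconstructor are differentiable, gradients flow back into the generator, which is what lets the branch act as a learnable loss.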
State-of-the-art facial image inpainting methods have achieved promising results, but preserving facial realism remains a challenge due to limitations such as failures to preserve edges and blurry artefacts. To overcome these limitations, we propose a Symmetric Skip Connection Wasserstein Generative Adversarial Network (S-WGAN) for high-resolution facial image inpainting. The architecture is an encoder-decoder with convolutional blocks linked by skip connections. The encoder is a feature extractor that captures data abstractions of an input image to learn an end-to-end mapping from an input (binary masked image) to the ground truth. The decoder uses the learned abstractions to reconstruct the image. With skip connections, S-WGAN transfers image details to the decoder. Additionally, we propose a Wasserstein-Perceptual loss function to preserve colour and maintain realism in the reconstructed image. We evaluate our method and the state-of-the-art methods on the CelebA-HQ dataset. Our results show that S-WGAN produces sharper and more realistic images when visually compared with other methods. The quantitative measures show our proposed S-WGAN achieves the best Structural Similarity Index Measure (SSIM) of 0.94.
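A minimal sketch of a combined Wasserstein-perceptual generator objective in the spirit of the description above; the critic network, the choice of VGG layers, and the 0.1 weighting are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG16 feature extractor (up to relu3_3) for the perceptual term.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def wasserstein_perceptual_loss(critic, fake, real, perceptual_weight=0.1):
    # Generator's Wasserstein term: raise the critic's score on fakes.
    adv = -critic(fake).mean()
    # Perceptual term: match deep VGG features of fake and real images
    # (inputs assumed already normalized for VGG).
    perceptual = F.l1_loss(vgg_features(fake), vgg_features(real))
    return adv + perceptual_weight * perceptual
```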
Bo Fu, Liyan Wang, Yuechu Wu (2020)
Single image super-resolution (SISR) is an image processing task that obtains a high-resolution (HR) image from a low-resolution (LR) image. Recently, owing to their capability in feature extraction, a series of deep learning methods have brought important improvements to SISR. However, we observe that no matter how deep the networks are designed, they usually do not generalize well, so almost all existing SR methods perform poorly at restoring weak texture details. To solve this problem, we propose a weak-texture-information-map-guided image super-resolution method with deep residual networks. It contains three sub-networks: a main network that extracts the main features and fuses weak texture details, and two auxiliary networks that extract the weak texture details missed by the main network. The two parts work cooperatively: the auxiliary networks predict weak texture information and integrate it into the main network, helping the main network learn more inconspicuous details. Experimental results demonstrate that our method achieves state-of-the-art quantitative performance. In particular, our super-resolution results contain more weak texture details.
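The main/auxiliary cooperation might look roughly like the sketch below, where two auxiliary branches predict weak-texture features that are fused back into the main branch; all module names, depths, and widths are hypothetical, not the paper's design.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class TextureGuidedSR(nn.Module):
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.main = nn.Sequential(*[ResBlock(channels) for _ in range(8)])
        # Two auxiliary branches for weak-texture details, as in the text above.
        self.aux = nn.ModuleList(
            nn.Sequential(*[ResBlock(channels) for _ in range(4)]) for _ in range(2))
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.up = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        x = self.head(lr)
        textures = [branch(x) for branch in self.aux]  # predicted weak-texture details
        fused = self.fuse(torch.cat([self.main(x), *textures], dim=1))
        return self.up(fused)
```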
In this paper, an efficient super-resolution (SR) method based on a deep convolutional neural network (CNN) is proposed, namely the Gradual Upsampling Network (GUN). Recent CNN-based SR methods often first magnify the low-resolution (LR) input to high resolution (HR) and then reconstruct the HR input, or directly reconstruct the LR input and recover the HR result at the last layer. The proposed GUN uses a gradual process instead of these two commonly used frameworks. The GUN consists of an input layer, multiple upsampling and convolutional layers, and an output layer. By means of the gradual process, the proposed network simplifies the direct SR problem into multiple easier upsampling steps with a very small magnification factor in each step. Furthermore, a gradual training strategy is presented for the GUN: an initial network is first trained with edge-like samples, and the weights are then gradually tuned with more complex samples. The GUN recovers fine and vivid results and is easy to train. Experimental results on several image sets demonstrate the effectiveness of the proposed network.
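The gradual process can be sketched as a chain of small-factor stages, for example decomposing x4 SR into two x2 steps; the stage count, interpolation mode, and layer widths below are illustrative rather than GUN's exact configuration.

```python
import torch
import torch.nn as nn

class GradualUpsampler(nn.Module):
    def __init__(self, steps=2, channels=32):
        super().__init__()
        def stage():
            # One small x2 upsampling step followed by refinement convolutions.
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, 3, 3, padding=1))
        self.stages = nn.ModuleList(stage() for _ in range(steps))

    def forward(self, lr):
        x = lr
        for stage in self.stages:  # each stage solves an easier, small-factor task
            x = stage(x)
        return x
```

A gradual training schedule could then train the first stage on simple, edge-like patches before tuning the later stages on more complex samples.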

