No Arabic abstract
Patch-based methods and deep networks have been employed to tackle image inpainting problem, with their own strengths and weaknesses. Patch-based methods are capable of restoring a missing region with high-quality texture through searching nearest neighbor patches from the unmasked regions. However, these methods bring problematic contents when recovering large missing regions. Deep networks, on the other hand, show promising results in completing large regions. Nonetheless, the results often lack faithful and sharp details that resemble the surrounding area. By bringing together the best of both paradigms, we propose a new deep inpainting framework where texture generation is guided by a texture memory of patch samples extracted from unmasked regions. The framework has a novel design that allows texture memory retrieval to be trained end-to-end with the deep inpainting network. In addition, we introduce a patch distribution loss to encourage high-quality patch synthesis. The proposed method shows superior performance both qualitatively and quantitatively on three challenging image benchmarks, i.e., Places, CelebA-HQ, and Paris Street-View datasets.
We present a new deep learning approach to pose-guided resynthesis of human photographs. At the heart of the new approach is the estimation of the complete body surface texture based on a single photograph. Since the input photograph always observes only a part of the surface, we suggest a new inpainting method that completes the texture of the human body. Rather than working directly with colors of texture elements, the inpainting network estimates an appropriate source location in the input image for each element of the body surface. This correspondence field between the input image and the texture is then further warped into the target image coordinate frame based on the desired pose, effectively establishing the correspondence between the source and the target view even when the pose change is drastic. The final convolutional network then uses the established correspondence and all other available information to synthesize the output image. A fully-convolutional architecture with deformable skip connections guided by the estimated correspondence field is used. We show state-of-the-art result for pose-guided image synthesis. Additionally, we demonstrate the performance of our system for garment transfer and pose-guided face resynthesis.
Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the generated image resolution of the missing portion without consideration of the special particularities of the human face explicitly and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting of face images with large cropped regions. We firstly represent only face regions using the latent variable as the domain knowledge and combine it with the non-face parts textures to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution or not. It can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions with consistency on structures and appearance. Experiments on both CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieved state-of-the-art performance and generates higher quality inpainting results than existing ones.
Remote sensing image (RSI) inpainting plays an important role in real applications. Recently, fully-connected tensor network (FCTN) decomposition has been shown the remarkable ability to fully characterize the global correlation. Considering the global correlation and the nonlocal self-similarity (NSS) of RSIs, this paper introduces the FCTN decomposition to the whole RSI and its NSS groups, and proposes a novel nonlocal patch-based FCTN (NL-FCTN) decomposition for RSI inpainting. Different from other nonlocal patch-based methods, the NL-FCTN decomposition-based method, which increases tensor order by stacking similar small-sized patches to NSS groups, cleverly leverages the remarkable ability of FCTN decomposition to deal with higher-order tensors. Besides, we propose an efficient proximal alternating minimization-based algorithm to solve the proposed NL-FCTN decomposition-based model with a theoretical convergence guarantee. Extensive experiments on RSIs demonstrate that the proposed method achieves the state-of-the-art inpainting performance in all compared methods.
The degree of difficulty in image inpainting depends on the types and sizes of the missing parts. Existing image inpainting approaches usually encounter difficulties in completing the missing parts in the wild with pleasing visual and contextual results as they are trained for either dealing with one specific type of missing patterns (mask) or unilaterally assuming the shapes and/or sizes of the masked areas. We propose a deep generative inpainting network, named DeepGIN, to handle various types of masked images. We design a Spatial Pyramid Dilation (SPD) ResNet block to enable the use of distant features for reconstruction. We also employ Multi-Scale Self-Attention (MSSA) mechanism and Back Projection (BP) technique to enhance our inpainting results. Our DeepGIN outperforms the state-of-the-art approaches generally, including two publicly available datasets (FFHQ and Oxford Buildings), both quantitatively and qualitatively. We also demonstrate that our model is capable of completing masked images in the wild.
Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural-networkbased methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in fewshot semantic segmentation tackles the issue by only a few pixel-level annotated examples. However, these few-shot approaches cannot easily be applied to multi-way or weak annotation settings. In this paper, we advance the few-shot segmentation paradigm towards a scenario where image-level annotations are available to help the training process of a few pixel-level annotations. Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data. Specifically, we propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations. Furthermore, by considering the uncertainty of pseudo-masks, a distilled soft masked average pooling strategy is designed to handle distractions in image-level annotations. Extensive empirical results on two datasets show superior performance of PAIA.