Generative Adversarial Networks for Extreme Learned Image Compression

177 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Michael Tschannen

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Eirikur Agustsson - Michael Tschannen - Fabian Mentzer

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a learned image compression system based on GANs, operating at extremely low bitrates. Our proposed framework combines an encoder, decoder/generator and a multi-scale discriminator, which we train jointly for a generative learned compression objective. The model synthesizes details it cannot afford to store, obtaining visually pleasing results at bitrates where previous methods fail and show strong artifacts. Furthermore, if a semantic label map of the original image is available, our method can fully synthesize unimportant regions in the decoded image such as streets and trees from the label map, proportionally reducing the storage cost. A user study confirms that for low bitrates, our approach is preferred to state-of-the-art methods, even when they use more than double the bits.

قيم البحث

132 - Chen-Hsuan Lin , Ersin Yumer , Oliver Wang 2018

We address the problem of finding realistic geometric corrections to a foreground object such that it appears natural when composited into a background image. To achieve this, we propose a novel Generative Adversarial Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek image realism by operating in the geometric warp parameter space. In particular, we exploit an iterative STN warping scheme and propose a sequential training strategy that achieves better results compared to naive training of a single generator. One of the key advantages of ST-GAN is its applicability to high-resolution images indirectly since the predicted warp parameters are transferable between reference frames. We demonstrate our approach in two applications: (1) visualizing how indoor furniture (e.g. from product images) might be perceived in a room, (2) hallucinating how accessories like glasses would look when matched with real portraits.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Boundless: Generative Adversarial Networks for Image Extension

140 - Piotr Teterwak , Aaron Sarna , Dilip Krishnan 2019

Image extension models have broad applications in image editing, computational photography and computer graphics. While image inpainting has been extensively studied in the literature, it is challenging to directly apply the state-of-the-art inpainti ng methods to image extension as they tend to generate blurry or repetitive pixels with inconsistent semantics. We introduce semantic conditioning to the discriminator of a generative adversarial network (GAN), and achieve strong results on image extension with coherent semantics and visually pleasing colors and textures. We also show promising results in extreme extensions, such as panorama generation.

الرؤية الحاسوبية وتمييز الأنماط

Domain Embedded Multi-model Generative Adversarial Networks for Image-based Face Inpainting

137 - Xian Zhang , Xin Wang , Bin Kong 2020

Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the generated image resolution of the missing portion without consideration of the special particular ities of the human face explicitly and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting of face images with large cropped regions. We firstly represent only face regions using the latent variable as the domain knowledge and combine it with the non-face parts textures to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution or not. It can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions with consistency on structures and appearance. Experiments on both CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieved state-of-the-art performance and generates higher quality inpainting results than existing ones.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Interpreting Generative Adversarial Networks for Interactive Image Generation

117 - Bolei Zhou 2021

Great progress has been made by the advances in Generative Adversarial Networks (GANs) for image generation. However, there lacks enough understanding on how a realistic image can be generated by the deep representations of GANs from a random vector. This chapter will give a summary of recent works on interpreting deep generative models. We will see how the human-understandable concepts that emerge in the learned representation can be identified and used for interactive image generation and editing.

الرؤية الحاسوبية وتمييز الأنماط

Non-Parametric Priors For Generative Adversarial Networks

107 - Rajhans Singh 2019

The advent of generative adversarial networks (GAN) has enabled new capabilities in synthesis, interpolation, and data augmentation heretofore considered very challenging. However, one of the common assumptions in most GAN architectures is the assump tion of simple parametric latent-space distributions. While easy to implement, a simple latent-space distribution can be problematic for uses such as interpolation. This is due to distributional mismatches when samples are interpolated in the latent space. We present a straightforward formalization of this problem; using basic results from probability theory and off-the-shelf-optimization tools, we develop ways to arrive at appropriate non-parametric priors. The obtained prior exhibits unusual qualitative properties in terms of its shape, and quantitative benefits in terms of lower divergence with its mid-point distribution. We demonstrate that our designed prior helps improve image generation along any Euclidean straight line during interpolation, both qualitatively and quantitatively, without any additional training or architectural modifications. The proposed formulation is quite flexible, paving the way to impose newer constraints on the latent-space statistics.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي