Towards Disentangling Latent Space for Unsupervised Semantic Face Editing

269 0 0.0 ( 0 )

Download Cite

Added by Kanglin Liu

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Kanglin Liu - Gaofeng Cao - Fei Zhou

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Facial attributes in StyleGAN generated images are entangled in the latent space which makes it very difficult to independently control a specific attribute without affecting the others. Supervised attribute editing requires annotated training data which is difficult to obtain and limits the editable attributes to those with labels. Therefore, unsupervised attribute editing in an disentangled latent space is key to performing neat and versatile semantic face editing. In this paper, we present a new technique termed Structure-Texture Independent Architecture with Weight Decomposition and Orthogonal Regularization (STIA-WO) to disentangle the latent space for unsupervised semantic face editing. By applying STIA-WO to GAN, we have developed a StyleGAN termed STGAN-WO which performs weight decomposition through utilizing the style vector to construct a fully controllable weight matrix to regulate image synthesis, and employs orthogonal regularization to ensure each entry of the style vector only controls one independent feature matrix. To further disentangle the facial attributes, STGAN-WO introduces a structure-texture independent architecture which utilizes two independently and identically distributed (i.i.d.) latent vectors to control the synthesis of the texture and structure components in a disentangled way. Unsupervised semantic editing is achieved by moving the latent code in the coarse layers along its orthogonal directions to change texture related attributes or changing the latent code in the fine layers to manipulate structure related ones. We present experimental results which show that our new STGAN-WO can achieve better attribute editing than state of the art methods.

rate research

Transforming the Latent Space of StyleGAN for Real Face Editing

89 - Heyi Li , Jinlong Liu , Yunzhi Bai 2021

Despite recent advances in semantic manipulation using StyleGAN, semantic editing of real faces remains challenging. The gap between the $W$ space and the $W$+ space demands an undesirable trade-off between reconstruction quality and editing quality. To solve this problem, we propose to expand the latent space by replacing fully-connected layers in the StyleGANs mapping network with attention-based transformers. This simple and effective technique integrates the aforementioned two spaces and transforms them into one new latent space called $W$++. Our modified StyleGAN maintains the state-of-the-art generation quality of the original StyleGAN with moderately better diversity. But more importantly, the proposed $W$++ space achieves superior performance in both reconstruction quality and editing quality. Despite these significant advantages, our $W$++ space supports existing inversion algorithms and editing methods with only negligible modifications thanks to its structural similarity with the $W/W$+ space. Extensive experiments on the FFHQ dataset prove that our proposed $W$++ space is evidently more preferable than the previous $W/W$+ space for real face editing. The code is publicly available for research purposes at https://github.com/AnonSubm2021/TransStyleGAN.

Computer Vision and Pattern Recognition

Disentangled Face Attribute Editing via Instance-Aware Latent Space Search

143 - Yuxuan Han , Jiaolong Yang , 2021

Recent works have shown that a rich set of semantic directions exist in the latent space of Generative Adversarial Networks (GANs), which enables various facial attribute editing applications. However, existing methods may suffer poor attribute variation disentanglement, leading to unwanted change of other attributes when altering the desired one. The semantic directions used by existing methods are at attribute level, which are difficult to model complex attribute correlations, especially in the presence of attribute distribution bias in GANs training set. In this paper, we propose a novel framework (IALS) that performs Instance-Aware Latent-Space Search to find semantic directions for disentangled attribute editing. The instance information is injected by leveraging the supervision from a set of attribute classifiers evaluated on the input images. We further propose a Disentanglement-Transformation (DT) metric to quantify the attribute transformation and disentanglement efficacy and find the optimal control factor between attribute-level and instance-specific directions based on it. Experimental results on both GAN-generated and real-world images collectively show that our method outperforms state-of-the-art methods proposed recently by a wide margin. Code is available at https://github.com/yxuhan/IALS.

Computer Vision and Pattern Recognition

Neural Face Editing with Intrinsic Image Disentangling

145 - Zhixin Shu , Ersin Yumer , Sunil Hadap 2017

Traditional face editing methods often require a number of sophisticated and task specific algorithms to be applied one after the other --- a process that is tedious, fragile, and computationally intensive. In this paper, we propose an end-to-end generative adversarial network that infers a face-specific disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte. We show that this network can be trained on in-the-wild images by incorporating an in-network physically-based image formation module and appropriate loss functions. Our disentangling latent representation allows for semantically relevant edits, where one aspect of facial appearance can be manipulated while keeping orthogonal properties fixed, and we demonstrate its use for a number of facial editing applications.

Computer Vision and Pattern Recognition

GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing

80 - Xianxu Hou , Xiaokang Zhang , Linlin Shen 2020

Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there still lacks of control over the generation process in order to achieve semantic face editing. In addition, it remains very challenging to maintain other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can be also applied to different types of real and artistic face editing, demonstrating strong generalization ability.

Computer Vision and Pattern Recognition

Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions

222 - Zhilin Zheng , Li Sun 2018

VAE requires the standard Gaussian distribution as a prior in the latent space. Since all codes tend to follow the same prior, it often suffers the so-called posterior collapse. To avoid this, this paper introduces the class specific distribution for the latent code. But different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, $bm{mathrm{z}}_s$ and $bm{mathrm{z}}_u$, for a single input. We apply two separated encoders to map the input into $bm{mathrm{z}}_s$ and $bm{mathrm{z}}_u$ respectively, and then give the concatenated code to the decoder to reconstruct the input. The label irrelevant code $bm{mathrm{z}}_u$ represent the common characteristics of all inputs, hence they are constrained by the standard Gaussian, and their encoder is trained in amortized variational inference way, like VAE. While $bm{mathrm{z}}_s$ is assumed to follow the Gaussian mixture distribution in which each component corresponds to a particular class. The parameters for the Gaussian components in $bm{mathrm{z}}_s$ encoder are optimized by the label supervision in a global stochastic way. In theory, we show that our method is actually equivalent to adding a KL divergence term on the joint distribution of $bm{mathrm{z}}_s$ and the class label $c$, and it can directly increase the mutual information between $bm{mathrm{z}}_s$ and the label $c$. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high quality and diverse images.

Computer Vision and Pattern Recognition