Do you want to publish a course? Click here

Face Images as Jigsaw Puzzles: Compositional Perception of Human Faces for Machines Using Generative Adversarial Networks

77   0   0.0 ( 0 )
 Added by Mahla Abdolahnejad
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

An important goal in human-robot-interaction (HRI) is for machines to achieve a close to human level of face perception. One of the important differences between machine learning and human intelligence is the lack of compositionality. This paper introduces a new scheme to enable generative adversarial networks to learn the distribution of face images composed of smaller parts. This results in a more flexible machine face perception and easier generalization to outside training examples. We demonstrate that this model is able to produce realistic high-quality face images by generating and piecing together the parts. Additionally, we demonstrate that this model learns the relations between the facial parts and their distributions. Therefore, the specific facial parts are interchangeable between generated face images.

rate research

Read More

The paper proposes a solution based on Generative Adversarial Network (GAN) for solving jigsaw puzzles. The problem assumes that an image is cut into equal square pieces, and asks to recover the image according to pieces information. Conventional jigsaw solvers often determine piece relationships based on the piece boundaries, which ignore the important semantic information. In this paper, we propose JigsawGAN, a GAN-based self-supervised method for solving jigsaw puzzles with unpaired images (with no prior knowledge of the initial images). We design a multi-task pipeline that includes, (1) a classification branch to classify jigsaw permutations, and (2) a GAN branch to recover features to images with correct orders. The classification branch is constrained by the pseudo-labels generated according to the shuffled pieces. The GAN branch concentrates on the image semantic information, among which the generator produces the natural images to fool the discriminator with reassembled pieces, while the discriminator distinguishes whether a given image belongs to the synthesized or the real target manifold. These two branches are connected by a flow-based warp that is applied to warp features to correct order according to the classification results. The proposed method can solve jigsaw puzzles more efficiently by utilizing both semantic information and edge information simultaneously. Qualitative and quantitative comparisons against several leading prior methods demonstrate the superiority of our method.
Face aging is the task aiming to translate the faces in input images to designated ages. To simplify the problem, previous methods have limited themselves only able to produce discrete age groups, each of which consists of ten years. Consequently, the exact ages of the translated results are unknown and it is unable to obtain the faces of different ages within groups. To this end, we propose the continuous face aging generative adversarial networks (CFA-GAN). Specifically, to make the continuous aging feasible, we propose to decompose image features into two orthogonal features: the identity and the age basis features. Moreover, we introduce the novel loss function for identity preservation which maximizes the cosine similarity between the original and the generated identity basis features. With the qualitative and quantitative evaluations on MORPH, we demonstrate the realistic and continuous aging ability of our model, validating its superiority against existing models. To the best of our knowledge, this work is the first attempt to handle continuous target ages.
137 - Xian Zhang , Xin Wang , Bin Kong 2020
Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the generated image resolution of the missing portion without consideration of the special particularities of the human face explicitly and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting of face images with large cropped regions. We firstly represent only face regions using the latent variable as the domain knowledge and combine it with the non-face parts textures to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution or not. It can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions with consistency on structures and appearance. Experiments on both CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieved state-of-the-art performance and generates higher quality inpainting results than existing ones.
62 - Shiqing Fan , Ye Luo 2021
Low-quality face image restoration is a popular research direction in todays computer vision field. It can be used as a pre-work for tasks such as face detection and face recognition. At present, there is a lot of work to solve the problem of low-quality faces under various environmental conditions. This paper mainly focuses on the restoration of motion-blurred faces. In increasingly abundant mobile scenes, the fast recovery of motion-blurred faces can bring highly effective speed improvements in tasks such as face matching. In order to achieve this goal, a deblurring method for motion-blurred facial image signals based on generative adversarial networks(GANs) is proposed. It uses an end-to-end method to train a sharp image generator, i.e., a processor for motion-blurred facial images. This paper introduce the processing progress of motion-blurred images, the development and changes of GANs and some basic concepts. After that, it give the details of network structure and training optimization design of the image processor. Then we conducted a motion blur image generation experiment on some general facial data set, and used the pairs of blurred and sharp face image data to perform the training and testing experiments of the processor GAN, and gave some visual displays. Finally, MTCNN is used to detect the faces of the image generated by the deblurring processor, and compare it with the result of the blurred image. From the results, the processing effect of the deblurring processor on the motion-blurred picture has a significant improvement both in terms of intuition and evaluation indicators of face detection.
Subsampling unconditional generative adversarial networks (GANs) to improve the overall image quality has been studied recently. However, these methods often require high training costs (e.g., storage space, parameter tuning) and may be inefficient or even inapplicable for subsampling conditional GANs, such as class-conditional GANs and continuous conditional GANs (CcGANs), when the condition has many distinct values. In this paper, we propose an efficient method called conditional density ratio estimation in feature space with conditional Softplus loss (cDRE-F-cSP). With cDRE-F-cSP, we estimate an images conditional density ratio based on a novel conditional Softplus (cSP) loss in the feature space learned by a specially designed ResNet-34 or sparse autoencoder. We then derive the error bound of a conditional density ratio model trained with the proposed cSP loss. Finally, we propose a rejection sampling scheme, termed cDRE-F-cSP+RS, which can subsample both class-conditional GANs and CcGANs efficiently. An extra filtering scheme is also developed for CcGANs to increase the label consistency. Experiments on CIFAR-10 and Tiny-ImageNet datasets show that cDRE-F-cSP+RS can substantially improve the Intra-FID and FID scores of BigGAN. Experiments on RC-49 and UTKFace datasets demonstrate that cDRE-F-cSP+RS also improves Intra-FID, Diversity, and Label Score of CcGANs. Moreover, to show the high efficiency of cDRE-F-cSP+RS, we compare it with the state-of-the-art unconditional subsampling method (i.e., DRE-F-SP+RS). With comparable or even better performance, cDRE-F-cSP+RS only requires about textbf{10}% and textbf{1.7}% of the training costs spent respectively on CIFAR-10 and UTKFace by DRE-F-SP+RS.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا