ﻻ يوجد ملخص باللغة العربية
A caricature is an artistic form of a persons picture in which certain striking characteristics are abstracted or exaggerated in order to create a humor or sarcasm effect. For numerous caricature related applications such as attribute recognition and caricature editing, face parsing is an essential pre-processing step that provides a complete facial structure understanding. However, current state-of-the-art face parsing methods require large amounts of labeled data on the pixel-level and such process for caricature is tedious and labor-intensive. For real photos, there are numerous labeled datasets for face parsing. Thus, we formulate caricature face parsing as a domain adaptation problem, where real photos play the role of the source domain, adapting to the target caricatures. Specifically, we first leverage a spatial transformer based network to enable shape domain shifts. A feed-forward style transfer network is then utilized to capture texture-level domain gaps. With these two steps, we synthesize face caricatures from real photos, and thus we can use parsing ground truths of the original photos to learn the parsing model. Experimental results on the synthetic and real caricatures demonstrate the effectiveness of the proposed domain adaptation algorithm. Code is available at: https://github.com/ZJULearning/CariFaceParsing .
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval. Nevertheless, existing SGG
Caricature generation is an interesting yet challenging task. The primary goal is to generate plausible caricatures with reasonable exaggerations given face images. Conventional caricature generation approaches mainly use low-level geometric transfor
Automatic pain assessment has an important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominating nonverbal channels for eliciting pain expression events, facial expressions has
Fully convolutional networks (FCN) have achieved great success in human parsing in recent years. In conventional human parsing tasks, pixel-level labeling is required for guiding the training, which usually involves enormous human labeling efforts. T
3D face reconstruction plays a very important role in many real-world multimedia applications, including digital entertainment, social media, affection analysis, and person identification. The de-facto pipeline for estimating the parametric face mode