ترغب بنشر مسار تعليمي؟ اضغط هنا

SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild

70   0   0.0 ( 0 )
 نشر من قبل Soumyadip Sengupta
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real world images. This allows the network to capture low frequency variations from synthetic and high frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.



قيم البحث

اقرأ أيضاً

There are demographic biases in current models used for facial recognition (FR). Our Balanced Faces In the Wild (BFW) dataset serves as a proxy to measure bias across ethnicity and gender subgroups, allowing one to characterize FR performances per su bgroup. We show performances are non-optimal when a single score threshold is used to determine whether sample pairs are genuine or imposter. Across subgroups, performance ratings vary from the reported across the entire dataset. Thus, claims of specific error rates only hold true for populations matching that of the validation data. We mitigate the imbalanced performances using a novel domain adaptation learning scheme on the facial features extracted using state-of-the-art. Not only does this technique balance performance, but it also boosts the overall performance. A benefit of the proposed is to preserve identity information in facial features while removing demographic knowledge in the lower dimensional features. The removal of demographic knowledge prevents future potential biases from being injected into decision-making. This removal satisfies privacy concerns. We explore why this works qualitatively; we also show quantitatively that subgroup classifiers can no longer learn from the features mapped by the proposed.
Recently, image-to-image translation has been made much progress owing to the success of conditional Generative Adversarial Networks (cGANs). And some unpaired methods based on cycle consistency loss such as DualGAN, CycleGAN and DiscoGAN are really popular. However, its still very challenging for translation tasks with the requirement of high-level visual information conversion, such as photo-to-caricature translation that requires satire, exaggeration, lifelikeness and artistry. We present an approach for learning to translate faces in the wild from the source photo domain to the target caricature domain with different styles, which can also be used for other high-level image-to-image translation tasks. In order to capture global structure with local statistics while translation, we design a dual pathway model with one coarse discriminator and one fine discriminator. For generator, we provide one extra perceptual loss in association with adversarial loss and cycle consistency loss to achieve representation learning for two different domains. Also the style can be learned by the auxiliary noise input. Experiments on photo-to-caricature translation of faces in the wild show considerable performance gain of our proposed method over state-of-the-art translation methods as well as its potential real applications.
While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation. Prior work has mostly been in control led settings, where the labeled and unlabeled data sets have no overlapping identities by construction. This is not realistic in large-scale face recognition, where one must contend with such overlaps, the frequency of which increases with the volume of data. Ignoring identity overlap leads to significant labeling noise, as data from the same identity is split into multiple clusters. To address this, we propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty. Extensive experiments on both controlled and real settings demonstrate our methods consistent improvements over supervised baselines, e.g., 11.6% improvement on IJB-A verification.
Face detection methods have relied on face datasets for training. However, existing face datasets tend to be in small scales for face learning in both constrained and unconstrained environments. In this paper, we first introduce our large-scale image datasets, Large-scale Labeled Face (LSLF) and noisy Large-scale Labeled Non-face (LSLNF). Our LSLF dataset consists of a large number of unconstrained multi-view and partially occluded faces. The faces have many variations in color and grayscale, image quality, image resolution, image illumination, image background, image illusion, human face, cartoon face, facial expression, light and severe partial facial occlusion, make up, gender, age, and race. Many of these faces are partially occluded with accessories such as tattoos, hats, glasses, sunglasses, hands, hair, beards, scarves, microphones, or other objects or persons. The LSLF dataset is currently the largest labeled face image dataset in the literature in terms of the number of labeled images and the number of individuals compared to other existing labeled face image datasets. Second, we introduce our CrowedFaces and CrowedNonFaces image datasets. The crowedFaces and CrowedNonFaces datasets include faces and non-faces images from crowed scenes. These datasets essentially aim for researchers to provide a large number of training examples with many variations for large scale face learning and face recognition tasks.
3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned. Different from previous methods, we address the problem of learning 3D complete shape f rom unaligned and real-world partial point clouds. To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance. The network jointly optimizes canonical shapes and poses with multi-view geometry constraints during training, and can infer the complete shape given a single partial point cloud. Moreover, learned pose estimation can facilitate partial point cloud registration. Experiments on both synthetic and real data show that it is feasible and promising to learn 3D shape completion through large-scale data without shape and pose supervision.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا