No Arabic abstract
The problem of face alignment has been intensively studied in the past years. A large number of novel methods have been proposed and reported very good performance on benchmark dataset such as 300W. However, the differences in the experimental setting and evaluation metric, missing details in the description of the methods make it hard to reproduce the results reported and evaluate the relative merits. For instance, most recent face alignment methods are built on top of face detection but from different face detectors. In this paper, we carry out a rigorous evaluation of these methods by making the following contributions: 1) we proposes a new evaluation metric for face alignment on a set of images, i.e., area under error distribution curve within a threshold, AUC$_alpha$, given the fact that the traditional evaluation measure (mean error) is very sensitive to big alignment error. 2) we extend the 300W database with more practical face detections to make fair comparison possible. 3) we carry out face alignment sensitivity analysis w.r.t. face detection, on both synthetic and real data, using both off-the-shelf and re-retrained models. 4) we study factors that are particularly important to achieve good performance and provide suggestions for practical applications. Most of the conclusions drawn from our comparative analysis cannot be inferred from the original publications.
In recent years, benefiting from the expressive power of Graph Convolutional Networks (GCNs), significant breakthroughs have been made in face clustering. However, rare attention has been paid to GCN-based clustering on imbalanced data. Although imbalance problem has been extensively studied, the impact of imbalanced data on GCN-based linkage prediction task is quite different, which would cause problems in two aspects: imbalanced linkage labels and biased graph representations. The problem of imbalanced linkage labels is similar to that in image classification task, but the latter is a particular problem in GCN-based clustering via linkage prediction. Significantly biased graph representations in training can cause catastrophic overfitting of a GCN model. To tackle these problems, we evaluate the feasibility of those existing methods for imbalanced image classification problem on graphs with extensive experiments, and present a new method to alleviate the imbalanced labels and also augment graph representations using a Reverse-Imbalance Weighted Sampling (RIWS) strategy, followed with insightful analyses and discussions. The code and a series of imbalanced benchmark datasets synthesized from MS-Celeb-1M and DeepFashion are available on https://github.com/espectre/GCNs_on_imbalanced_datasets.
In this paper, we present a deep learning based image feature extraction method designed specifically for face images. To train the feature extraction model, we construct a large scale photo-realistic face image dataset with ground-truth correspondence between multi-view face images, which are synthesized from real photographs via an inverse rendering procedure. The deep face feature (DFF) is trained using correspondence between face images rendered from different views. Using the trained DFF model, we can extract a feature vector for each pixel of a face image, which distinguishes different facial regions and is shown to be more effective than general-purpose feature descriptors for face-related tasks such as matching and alignment. Based on the DFF, we develop a robust face alignment method, which iteratively updates landmarks, pose and 3D shape. Extensive experiments demonstrate that our method can achieve state-of-the-art results for face alignment under highly unconstrained face images.
Expressiveness and generalization of deep models was recently addressed via the connection between neural networks (NNs) and kernel learning, where first-order dynamics of NN during a gradient-descent (GD) optimization were related to gradient similarity kernel, also known as Neural Tangent Kernel (NTK). In the majority of works this kernel is considered to be time-invariant, with its properties being defined entirely by NN architecture and independent of the learning task at hand. In contrast, in this paper we empirically explore these properties along the optimization and show that in practical applications the NTK changes in a very dramatic and meaningful way, with its top eigenfunctions aligning toward the target function learned by NN. Moreover, these top eigenfunctions serve as basis functions for NN output - a function represented by NN is spanned almost completely by them for the entire optimization process. Further, since the learning along top eigenfunctions is typically fast, their alignment with the target function improves the overall optimization performance. In addition, we study how the neural spectrum is affected by learning rate decay, typically done by practitioners, showing various trends in the kernel behavior. We argue that the presented phenomena may lead to a more complete theoretical understanding behind NN learning.
While convolutional neural networks (CNNs) have significantly boosted the performance of face related algorithms, maintaining accuracy and efficiency simultaneously in practical use remains challenging. Recent study shows that using a cascade of hourglass modules which consist of a number of bottom-up and top-down convolutional layers can extract facial structural information for face alignment to improve accuracy. However, previous studies have shown that features produced by shallow convolutional layers are highly correspond to edges. These features could be directly used to provide the structural information without addition cost. Motivated by this intuition, we propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN). Specifically, we introduce a shortcut connection between shallow-layer features and deep-layer features to provide the structural information for face alignment and apply the CoordConv to the last few layers to provide coordinate information. The predicted facial landmarks enable us to generate a cheap heatmap which contains both geometric and appearance information for head pose estimation and it also provides attention clues for face tracking. Moreover, the face tracking task saves us the face detection procedure for each frame, which is significant to boost performance for video-based tasks. The proposed framework is evaluated on four benchmark datasets, WFLW, 300VW, WIDER Face and 300W-LP. The experimental results show that the ATPN achieves improved performance compared to previous state-of-the-art methods while having less number of parameters and FLOPS.
Despite apparent human-level performances of deep neural networks (DNN), they behave fundamentally differently from humans. They easily change predictions when small corruptions such as blur and noise are applied on the input (lack of robustness), and they often produce confident predictions on out-of-distribution samples (improper uncertainty measure). While a number of researches have aimed to address those issues, proposed solutions are typically expensive and complicated (e.g. Bayesian inference and adversarial training). Meanwhile, many simple and cheap regularization methods have been developed to enhance the generalization of classifiers. Such regularization methods have largely been overlooked as baselines for addressing the robustness and uncertainty issues, as they are not specifically designed for that. In this paper, we provide extensive empirical evaluations on the robustness and uncertainty estimates of image classifiers (CIFAR-100 and ImageNet) trained with state-of-the-art regularization methods. Furthermore, experimental results show that certain regularization methods can serve as strong baseline methods for robustness and uncertainty estimation of DNNs.