An Efficient Multitask Neural Network for Face Alignment, Head Pose Estimation and Face Tracking


Abstract in English

While convolutional neural networks (CNNs) have significantly boosted the performance of face related algorithms, maintaining accuracy and efficiency simultaneously in practical use remains challenging. Recent study shows that using a cascade of hourglass modules which consist of a number of bottom-up and top-down convolutional layers can extract facial structural information for face alignment to improve accuracy. However, previous studies have shown that features produced by shallow convolutional layers are highly correspond to edges. These features could be directly used to provide the structural information without addition cost. Motivated by this intuition, we propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN). Specifically, we introduce a shortcut connection between shallow-layer features and deep-layer features to provide the structural information for face alignment and apply the CoordConv to the last few layers to provide coordinate information. The predicted facial landmarks enable us to generate a cheap heatmap which contains both geometric and appearance information for head pose estimation and it also provides attention clues for face tracking. Moreover, the face tracking task saves us the face detection procedure for each frame, which is significant to boost performance for video-based tasks. The proposed framework is evaluated on four benchmark datasets, WFLW, 300VW, WIDER Face and 300W-LP. The experimental results show that the ATPN achieves improved performance compared to previous state-of-the-art methods while having less number of parameters and FLOPS.

Download