Computational efficient deep neural network with difference attention maps for facial action unit detection

73 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Kejun Wang

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jing Chen - Chenhui Wang - Kejun Wang

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we propose a computational efficient end-to-end training deep neural network (CEDNN) model and spatial attention maps based on difference images. Firstly, the difference image is generated by image processing. Then five binary images of difference images are obtained using different thresholds, which are used as spatial attention maps. We use group convolution to reduce model complexity. Skip connection and $text{1}times text{1}$ convolution are used to ensure good performance even if the network model is not deep. As an input, spatial attention map can be selectively fed into the input of each block. The feature maps tend to focus on the parts that are related to the target task better. In addition, we only need to adjust the parameters of classifier to train different numbers of AU. It can be easily extended to varying datasets without increasing too much computation. A large number of experimental results show that the proposed CEDNN is obviously better than the traditional deep learning method on DISFA+ and CK+ datasets. After adding spatial attention maps, the result is better than the most advanced AU detection method. At the same time, the scale of the network is small, the running speed is fast, and the requirement for experimental equipment is low.

قيم البحث

86 - Zhiwen Shao , Zhilei Liu , Jianfei Cai 2018

Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. Most existing AU detection works often treat face alignment as a preprocessing and handle the two tasks independently. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned firstly, and high-level features of face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Experiments on BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms the state-of-the-art methods for AU detection.

الرؤية الحاسوبية وتمييز الأنماط

Deep Structure Inference Network for Facial Action Unit Recognition

89 - Ciprian A. Corneanu , Meysam Madadi , Sergio Escalera 2018

Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for developing general facial expression analysis. In recent years, most efforts in automatic AU recognition have been dedicated to learning comb inations of local features and to exploiting correlations between Action Units. In this paper, we propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. We show that by training the model end-to-end with increased supervision we improve state-of-the-art by 5.3% and 8.2% performance on BP4D and DISFA datasets, respectively.

الرؤية الحاسوبية وتمييز الأنماط

Facial Action Unit Detection Using Attention and Relation Learning

124 - Zhiwen Shao , Zhilei Liu , Jianfei Cai 2018

Attention mechanism has recently attracted increasing attentions in the field of facial action unit (AU) detection. By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured. Most of the exis ting attention based AU detection works use prior knowledge to predefine fixed attentions or refine the predefined attentions within a small range, which limits their capacity to model various AUs. In this paper, we propose an end-to-end deep learning based attention and relation learning framework for AU detection with only AU labels, which has not been explored before. In particular, multi-scale features shared by each AU are learned firstly, and then both channel-wise and spatial attentions are adaptively learned to select and extract AU-related local features. Moreover, pixel-level relations for AUs are further captured to refine spatial attentions so as to extract more relevant local features. Without changing the network architecture, our framework can be easily extended for AU intensity estimation. Extensive experiments show that our framework (i) soundly outperforms the state-of-the-art methods for both AU detection and AU intensity estimation on the challenging BP4D, DISFA, FERA 2015 and BP4D+ benchmarks, (ii) can adaptively capture the correlated regions of each AU, and (iii) also works well under severe occlusions and large poses.

الرؤية الحاسوبية وتمييز الأنماط

Spatio-Temporal Relation and Attention Learning for Facial Action Unit Detection

93 - Zhiwen Shao , Lixin Zou , Jianfei Cai 2020

Spatio-temporal relations among facial action units (AUs) convey significant information for AU detection yet have not been thoroughly exploited. The main reasons are the limited capability of current AU detection works in simultaneously learning spa tial and temporal relations, and the lack of precise localization information for AU feature learning. To tackle these limitations, we propose a novel spatio-temporal relation and attention learning framework for AU detection. Specifically, we introduce a spatio-temporal graph convolutional network to capture both spatial and temporal relations from dynamic AUs, in which the AU relations are formulated as a spatio-temporal graph with adaptively learned instead of predefined edge weights. Moreover, the learning of spatio-temporal relations among AUs requires individual AU features. Considering the dynamism and shape irregularity of AUs, we propose an attention regularization method to adaptively learn regional attentions that capture highly relevant regions and suppress irrelevant regions so as to extract a complete feature for each AU. Extensive experiments show that our approach achieves substantial improvements over the state-of-the-art AU detection methods on BP4D and especially DISFA benchmarks.

الرؤية الحاسوبية وتمييز الأنماط

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

112 - Zhilei Liu , Jiahui Dong , Cuicui Zhang 2019

Most existing AU detection works considering AU relationships are relying on probabilistic graphical models with manually extracted features. This paper proposes an end-to-end deep learning framework for facial AU detection with graph convolutional n etwork (GCN) for AU relation modeling, which has not been explored before. In particular, AU related regions are extracted firstly, latent representations full of AU information are learned through an auto-encoder. Moreover, each latent representation vector is feed into GCN as a node, the connection mode of GCN is determined based on the relationships of AUs. Finally, the assembled features updated through GCN are concatenated for AU detection. Extensive experiments on BP4D and DISFA benchmarks demonstrate that our framework significantly outperforms the state-of-the-art methods for facial AU detection. The proposed framework is also validated through a series of ablation studies.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حماه

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Computational efficient deep neural network with difference attention maps for facial action unit detection

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً