ترغب بنشر مسار تعليمي؟ اضغط هنا

SMA-STN: Segmented Movement-Attending Spatiotemporal Network forMicro-Expression Recognition

171   0   0.0 ( 0 )
 نشر من قبل Jiateng Liu
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Correctly perceiving micro-expression is difficult since micro-expression is an involuntary, repressed, and subtle facial expression, and efficiently revealing the subtle movement changes and capturing the significant segments in a micro-expression sequence is the key to micro-expression recognition (MER). To handle the crucial issue, in this paper, we firstly propose a dynamic segmented sparse imaging module (DSSI) to compute dynamic images as local-global spatiotemporal descriptors under a unique sampling protocol, which reveals the subtle movement changes visually in an efficient way. Secondly, a segmented movement-attending spatiotemporal network (SMA-STN) is proposed to further unveil imperceptible small movement changes, which utilizes a spatiotemporal movement-attending module (STMA) to capture long-distance spatial relation for facial expression and weigh temporal segments. Besides, a deviation enhancement loss (DE-Loss) is embedded in the SMA-STN to enhance the robustness of SMA-STN to subtle movement changes in feature level. Extensive experiments on three widely used benchmarks, i.e., CASME II, SAMM, and SHIC, show that the proposed SMA-STN achieves better MER performance than other state-of-the-art methods, which proves that the proposed method is effective to handle the challenging MER problem.



قيم البحث

اقرأ أيضاً

Micro-expression can reflect peoples real emotions. Recognizing micro-expressions is difficult because they are small motions and have a short duration. As the research is deepening into micro-expression recognition, many effective features and metho ds have been proposed. To determine which direction of movement feature is easier for distinguishing micro-expressions, this paper selects 18 directions (including three types of horizontal, vertical and oblique movements) and proposes a new low-dimensional feature called the Histogram of Single Direction Gradient (HSDG) to study this topic. In this paper, HSDG in every direction is concatenated with LBP-TOP to obtain the LBP with Single Direction Gradient (LBP-SDG) and analyze which direction of movement feature is more discriminative for micro-expression recognition. As with some existing work, Euler Video Magnification (EVM) is employed as a preprocessing step. The experiments on the CASME II and SMIC-HS databases summarize the effective and optimal directions and demonstrate that HSDG in an optimal direction is discriminative, and the corresponding LBP-SDG achieves state-of-the-art performance using EVM.
Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities. The core drawback of the existing approaches is the lack of ability to discriminate the changes in appearance cau sed by emotions and identities. In this paper, we present a novel identity-enhanced network (IDEnNet) to eliminate the negative impact of identity factor and focus on recognizing facial expressions. Spatial fusion combined with self-constrained multi-task learning are adopted to jointly learn the expression representations and identity-related information. We evaluate our approach on three popular datasets, namely Oulu-CASIA, CK+ and MMI. IDEnNet improves the baseline consistently, and achieves the best or comparable state-of-the-art on all three datasets.
Micro-expression recognition (textbf{MER}) has attracted lots of researchers attention in a decade. However, occlusion will occur for MER in real-world scenarios. This paper deeply investigates an interesting but unexplored challenging issue in MER, ie, occlusion MER. First, to research MER under real-world occlusion, synthetic occluded micro-expression databases are created by using various mask for the community. Second, to suppress the influence of occlusion, a underline{R}egion-inspired underline{R}elation underline{R}easoning underline{N}etwork (textbf{RRRN}) is proposed to model relations between various facial regions. RRRN consists of a backbone network, the Region-Inspired (textbf{RI}) module and Relation Reasoning (textbf{RR}) module. More specifically, the backbone network aims at extracting feature representations from different facial regions, RI module computing an adaptive weight from the region itself based on attention mechanism with respect to the unobstructedness and importance for suppressing the influence of occlusion, and RR module exploiting the progressive interactions among these regions by performing graph convolutions. Experiments are conducted on handout-database evaluation and composite database evaluation tasks of MEGC 2018 protocol. Experimental results show that RRRN can significantly explore the importance of facial regions and capture the cooperative complementary relationship of facial regions for MER. The results also demonstrate RRRN outperforms the state-of-the-art approaches, especially on occlusion, and RRRN acts more robust to occlusion.
Despite the recent advances in optical character recognition (OCR), mathematical expressions still face a great challenge to recognize due to their two-dimensional graphical layout. In this paper, we propose a convolutional sequence modeling network, ConvMath, which converts the mathematical expression description in an image into a LaTeX sequence in an end-to-end way. The network combines an image encoder for feature extraction and a convolutional decoder for sequence generation. Compared with other Long Short Term Memory(LSTM) based encoder-decoder models, ConvMath is entirely based on convolution, thus it is easy to perform parallel computation. Besides, the network adopts multi-layer attention mechanism in the decoder, which allows the model to align output symbols with source feature vectors automatically, and alleviates the problem of lacking coverage while training the model. The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, including 103556 samples. The experimental results demonstrate that the proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
This paper presents a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. It is designed to combine the benefits of the multilayer architecture of ConvNets and a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that unlike most contemporary convolutional networks no learning is involved; rather, all design decisions are specified analytically with theoretical motivations. This approach makes it possible to understand what information is being extracted at each stage and layer of processing as well as to minimize heuristic choices in design. Another key aspect of the network is its recurrent nature, whereby the output of each layer of processing feeds back to the input. To keep the network size manageable across layers, a novel cross-channel feature pooling is proposed. The multilayer architecture that results systematically reveals hierarchical image structure in terms of multiscale, multiorientation properties of visual spacetime. To illustrate its utility, the network has been applied to the task of dynamic texture recognition. Empirical evaluation on multiple standard datasets shows that it sets a new state-of-the-art.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا