Learning Unit State Recognition Based on Multi-channel Data Fusion

65 0 0.0 ( 0 )

Download Cite

Added by Feng Tian Ph.D Eng.

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Feng Tian - Jia Yue - Xing Wan

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Despite recent advances in MOOC, the current e-learning systems have advantages of alleviating barriers by time differences, and geographically spatial separation between teachers and students. However, there has been a lack of supervision problem that e-learners learning unit state(LUS) cant be supervised automatically. In this paper, we present a fusion framework considering three channel data sources: 1) videos/images from a camera, 2) eye movement information tracked by a low solution eye tracker and 3) mouse movement. Based on these data modalities, we propose a novel approach of multi-channel data fusion to explore the learning unit state recognition. We also propose a method to build a learning state recognition model to avoid manually labeling image data. The experiments were carried on our designed online learning prototype system, and we choose CART, Random Forest and GBDT regression model to predict e-learners learning state. The results show that multi-channel data fusion model have a better recognition performance in comparison with single channel model. In addition, a best recognition performance can be reached when image, eye movement and mouse movement features are fused.

rate research

HiCOMEX: Facial Action Unit Recognition Based on Hierarchy Intensity Distribution and COMEX Relation Learning

94 - Ziqiang Shi , Liu Liu , Zhongling Liu 2020

The detection of facial action units (AUs) has been studied as it has the competition due to the wide-ranging applications thereof. In this paper, we propose a novel framework for the AU detection from a single input image by grasping the textbf{c}o-textbf{o}ccurrence and textbf{m}utual textbf{ex}clusion (COMEX) as well as the intensity distribution among AUs. Our algorithm uses facial landmarks to detect the features of local AUs. The features are input to a bidirectional long short-term memory (BiLSTM) layer for learning the intensity distribution. Afterwards, the new AU feature continuously passed through a self-attention encoding layer and a continuous-state modern Hopfield layer for learning the COMEX relationships. Our experiments on the challenging BP4D and DISFA benchmarks without any external data or pre-trained models yield F1-scores of 63.7% and 61.8% respectively, which shows our proposed networks can lead to performance improvement in the AU detection task.

Computer Vision and Pattern Recognition

A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

83 - Yue Jin , Tianqing Zheng , Chao Gao 2021

Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set. These results demonstrate the effectiveness of our approach in improving model performance.

Computer Vision and Pattern Recognition

Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

60 - Jaekyum Kim , Junho Koh , Yecheol Kim 2018

The goal of multi-modal learning is to use complimentary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led significant improvement in multi-modal learning by allowing for the information fusion in the intermediate feature levels. This paper addresses a problem of designing robust deep multi-modal learning architecture in the presence of imperfect modalities. We introduce deep fusion architecture for object detection which processes each modality using the separate convolutional neural network (CNN) and constructs the joint feature map by combining the intermediate features from the CNNs. In order to facilitate the robustness to the degraded modalities, we employ the gated information fusion (GIF) network which weights the contribution from each modality according to the input feature maps to be fused. The weights are determined through the convolutional layers followed by a sigmoid function and trained along with the information fusion network in an end-to-end fashion. Our experiments show that the proposed GIF network offers the additional architectural flexibility to achieve robust performance in handling some degraded modalities, and show a significant performance improvement based on Single Shot Detector (SSD) for KITTI dataset using the proposed fusion network and data augmentation schemes.

Computer Vision and Pattern Recognition

Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition

60 - Jiali Duan , Shuai Zhou , Jun Wan 2016

Recently, the popularity of depth-sensors such as Kinect has made depth videos easily available while its advantages have not been fully exploited. This paper investigates, for gesture recognition, to explore the spatial and temporal information complementarily embedded in RGB and depth sequences. We propose a convolutional twostream consensus voting network (2SCVN) which explicitly models both the short-term and long-term structure of the RGB sequences. To alleviate distractions from background, a 3d depth-saliency ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion characteristics. These two components in an unified framework significantly improve the recognition accuracy. On the challenging Chalearn IsoGD benchmark, our proposed method outperforms the first place on the leader-board by a large margin (10.29%) while also achieving the best result on RGBD-HuDaAct dataset (96.74%). Both quantitative experiments and qualitative analysis shows the effectiveness of our proposed framework and codes will be released to facilitate future research.

Computer Vision and Pattern Recognition

Multi-Level Adaptive Region of Interest and Graph Learning for Facial Action Unit Recognition

97 - Jingwei Yan , Boyuan Jiang , Jingjing Wang 2021

In facial action unit (AU) recognition tasks, regional feature learning and AU relation modeling are two effective aspects which are worth exploring. However, the limited representation capacity of regional features makes it difficult for relation models to embed AU relationship knowledge. In this paper, we propose a novel multi-level adaptive ROI and graph learning (MARGL) framework to tackle this problem. Specifically, an adaptive ROI learning module is designed to automatically adjust the location and size of the predefined AU regions. Meanwhile, besides relationship between AUs, there exists strong relevance between regional features across multiple levels of the backbone network as level-wise features focus on different aspects of representation. In order to incorporate the intra-level AU relation and inter-level AU regional relevance simultaneously, a multi-level AU relation graph is constructed and graph convolution is performed to further enhance AU regional features of each level. Experiments on BP4D and DISFA demonstrate the proposed MARGL significantly outperforms the previous state-of-the-art methods.

Computer Vision and Pattern Recognition