LOMo: Latent Ordinal Model for Facial Analysis in Videos

65 0 0.0 ( 0 )

Download Cite

Added by Karan Sikka

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Karan Sikka - Gaurav Sharma - Marian Bartlett

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain etc.) as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for smile, brow lower and cheek raise for pain). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF- it extends such frameworks to model the ordinal or temporal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations. In combination with complimentary features, we report state-of-the-art results on these datasets.

rate research

Discriminatively Trained Latent Ordinal Model for Video Classification

97 - Karan Sikka , Gaurav Sharma 2016

We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for smile, running and jumping for highjump). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.

Computer Vision and Pattern Recognition

Deep DA for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos

120 - Gnana Praveen R , Eric Granger , Patrick Cardinal 2020

Automatic estimation of pain intensity from facial expressions in videos has an immense potential in health care applications. However, domain adaptation (DA) is needed to alleviate the problem of domain shifts that typically occurs between video data captured in source and target do-mains. Given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL)is gaining attention in such applications. Yet, most state-of-the-art WSL models are typically formulated as regression problems, and do not leverage the ordinal relation between intensity levels, nor the temporal coherence of multiple consecutive frames. This paper introduces a new deep learn-ing model for weakly-supervised DA with ordinal regression(WSDA-OR), where videos in target domain have coarse la-bels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels as-signed to the target sequences, and associates multiple relevant frames to sequence-level labels (instead of a single frame). In particular, it learns discriminant and domain-invariant feature representations by integrating multiple in-stance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated on the RECOLA video dataset as fully-labeled source domain, and UNBC-McMaster video data as weakly-labeled target domain. We have also validated WSDA-OR on BIOVID and Fatigue (private) datasets for sequence level estimation. Experimental results indicate that our approach can provide a significant improvement over the state-of-the-art models, allowing to achieve a greater localization accuracy.

Computer Vision and Pattern Recognition

Deception Detection in Videos using the Facial Action Coding System

62 - Hammad Ud Din Ahmed , Usama Ijaz Bajwa , Fan Zhang 2021

Facts are important in decision making in every situation, which is why it is important to catch deceptive information before they are accepted as facts. Deception detection in videos has gained traction in recent times for its various real-life application. In our approach, we extract facial action units using the facial action coding system which we use as parameters for training a deep learning model. We specifically use long short-term memory (LSTM) which we trained using the real-life trial dataset and it provided one of the best facial only approaches to deception detection. We also tested cross-dataset validation using the Real-life trial dataset, the Silesian Deception Dataset, and the Bag-of-lies Deception Dataset which has not yet been attempted by anyone else for a deception detection system. We tested and compared all datasets amongst each other individually and collectively using the same deep learning training model. The results show that adding different datasets for training worsen the accuracy of the model. One of the primary reasons is that the nature of these datasets vastly differs from one another.

Computer Vision and Pattern Recognition

Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos

87 - Gnana Praveen R , Eric Granger , Patrick Cardinal 2020

Estimation of pain intensity from facial expressions captured in videos has an immense potential for health care applications. Given the challenges related to subjective variations of facial expressions, and operational capture conditions, the accuracy of state-of-the-art DL models for recognizing facial expressions may decline. Domain adaptation has been widely explored to alleviate the problem of domain shifts that typically occur between video data captured across various source and target domains. Moreover, given the laborious task of collecting and annotating videos, and subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning is gaining attention in such applications. State-of-the-art WSL models are typically formulated as regression problems, and do not leverage the ordinal relationship among pain intensity levels, nor temporal coherence of multiple consecutive frames. This paper introduces a new DL model for weakly-supervised DA with ordinal regression that can be adapted using target domain videos with coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among intensity levels assigned to target sequences, and associates multiple relevant frames to sequence-level labels. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from target domain. The proposed approach was validated using RECOLA video dataset as fully-labeled source domain data, and UNBC-McMaster shoulder pain video dataset as weakly-labeled target domain data. We have also validated WSDA-OR on BIOVID and Fatigue datasets for sequence level estimation.

Computer Vision and Pattern Recognition

Facial Aging and Rejuvenation by Conditional Multi-Adversarial Autoencoder with Ordinal Regression

65 - Haiping Zhu , Qi Zhou , Junping Zhang 2018

Facial aging and facial rejuvenation analyze a given face photograph to predict a future look or estimate a past look of the person. To achieve this, it is critical to preserve human identity and the corresponding aging progression and regression with high accuracy. However, existing methods cannot simultaneously handle these two objectives well. We propose a novel generative adversarial network based approach, named the Conditional Multi-Adversarial AutoEncoder with Ordinal Regression (CMAAE-OR). It utilizes an age estimation technique to control the aging accuracy and takes a high-level feature representation to preserve personalized identity. Specifically, the face is first mapped to a latent vector through a convolutional encoder. The latent vector is then projected onto the face manifold conditional on the age through a deconvolutional generator. The latent vector preserves personalized face features and the age controls facial aging and rejuvenation. A discriminator and an ordinal regression are imposed on the encoder and the generator in tandem, making the generated face images to be more photorealistic while simultaneously exhibiting desirable aging effects. Besides, a high-level feature representation is utilized to preserve personalized identity of the generated face. Experiments on two benchmark datasets demonstrate appealing performance of the proposed method over the state-of-the-art.

Computer Vision and Pattern Recognition