ODN: Opening the Deep Network for Open-set Action Recognition

Added by Yu Shu

Publication date 2019

fields Informatics Engineering

Research language English

Authors Yu Shu - Yemin Shi - Yaowei Wang

Computer Vision and Pattern Recognition

In recent years, the performance of action recognition has been significantly improved with the help of deep neural networks. Most existing action recognition works hold the closed-set assumption that all action categories are known beforehand, so that deep networks can be well trained for these categories. However, action recognition in the real world is essentially an open-set problem: it is impossible to know all action categories beforehand and consequently infeasible to prepare sufficient training samples for emerging categories. In this case, applying closed-set recognition methods will inevitably lead to unseen-category errors. To address this challenge, we propose the Open Deep Network (ODN) for the open-set action recognition task. Technically, ODN detects new categories by applying a multi-class triplet thresholding method, and then dynamically reconstructs the classification layer and opens the deep network by continually adding predictors for new categories. In order to transfer the learned knowledge to the new category, two novel methods, Emphasis Initialization and Allometry Training, are adopted to initialize and incrementally train the new predictor so that only a few samples are needed to fine-tune the model. Extensive experiments show that ODN can effectively detect and recognize new categories with little human intervention, making it applicable to open-set action recognition tasks in the real world. Moreover, ODN can even achieve performance comparable to some closed-set methods.
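The two steps named in the abstract, detecting a new category by thresholding and then opening the classification layer with an extra predictor, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names `detect_unknown` and `open_layer`, the three thresholds `accept`, `reject` and `gap`, the rule that combines them, and the plain mean used to initialize the new predictor (a stand-in for, not an implementation of, Emphasis Initialization).

```python
from typing import List, Optional


def detect_unknown(scores: List[float], accept: float, reject: float,
                   gap: float) -> Optional[int]:
    """Return the index of the predicted known class, or None for "new category".

    Assumed rule (illustrative only): accept when the top score is high,
    reject when it is low, and in between accept only if the top score is
    clearly separated from the runner-up. Needs at least two scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, second = scores[order[0]], scores[order[1]]
    if top >= accept:
        return order[0]
    if top < reject:
        return None
    return order[0] if top - second >= gap else None


def open_layer(weights: List[List[float]]) -> List[List[float]]:
    """Return a copy of the classifier weights with one extra predictor row.

    The new row is the mean of the existing rows, a simple placeholder
    for transferring knowledge from known categories to the new one.
    """
    n, dim = len(weights), len(weights[0])
    new_row = [sum(row[j] for row in weights) / n for j in range(dim)]
    return [list(row) for row in weights] + [new_row]


if __name__ == "__main__":
    scores = [0.5, 0.45, 0.05]  # ambiguous between the first two classes
    if detect_unknown(scores, accept=0.8, reject=0.3, gap=0.2) is None:
        layer = open_layer([[1.0, 2.0], [3.0, 4.0]])
        print(len(layer))  # the layer now has one more predictor
```

In a real system the new predictor would then be fine-tuned on a few labeled samples of the new category; that training step is omitted here.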

Deep manifold-to-manifold transforming network for action recognition

114 - Tong Zhang 2017

Symmetric positive definite (SPD) matrices (e.g., covariances, graph Laplacians, etc.) are widely used to model relationships in the spatial or temporal domain. Nevertheless, SPD matrices are theoretically embedded on Riemannian manifolds. In this paper, we propose an end-to-end deep manifold-to-manifold transforming network (DMT-Net) which can make SPD matrices flow from one Riemannian manifold to another, more discriminative one. To learn discriminative SPD features characterizing both spatial and temporal dependencies, we specifically develop three novel layers on manifolds: (i) the local SPD convolutional layer, (ii) the non-linear SPD activation layer, and (iii) the Riemannian-preserved recursive layer. The SPD property is preserved through all layers without any requirement of singular value decomposition (SVD), which is often used in existing methods at expensive computational cost. Furthermore, a diagonalizing SPD layer is designed to efficiently calculate the final metric for the classification task. To evaluate our proposed method, we conduct extensive experiments on the task of action recognition, where input signals are commonly modeled as SPD matrices. The experimental results demonstrate that our DMT-Net is considerably more competitive than state-of-the-art methods.

Computer Vision and Pattern Recognition

Deep Structure Inference Network for Facial Action Unit Recognition

89 - Ciprian A. Corneanu , Meysam Madadi , Sergio Escalera 2018

Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for developing general facial expression analysis. In recent years, most efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between Action Units. In this paper, we propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes, similar to a graphical model inference approach, in later stages. We show that by training the model end-to-end with increased supervision we improve on the state of the art by 5.3% and 8.2% on the BP4D and DISFA datasets, respectively.

Computer Vision and Pattern Recognition

Adversarial Reciprocal Points Learning for Open Set Recognition

97 - Guangyao Chen , Peixi Peng , Xiangqian Wang 2021

Open set recognition (OSR), aiming to simultaneously classify the seen classes and identify the unseen classes as unknown, is essential for reliable machine learning. The key challenge of OSR is how to reduce the empirical classification risk on the labeled known data and the open space risk on the potential unknown data simultaneously. To handle the challenge, we formulate the open space risk problem from the perspective of multi-class integration, and model the unexploited extra-class space with a novel concept, the Reciprocal Point. Following this, a novel learning framework, termed Adversarial Reciprocal Point Learning (ARPL), is proposed to minimize the overlap of the known distribution and unknown distributions without loss of known classification accuracy. Specifically, each reciprocal point is learned by the extra-class space with the corresponding known category, and the confrontation among multiple known categories is employed to reduce the empirical classification risk. Then, an adversarial margin constraint is proposed to reduce the open space risk by limiting the latent open space constructed by reciprocal points. To further estimate the unknown distribution from open space, an instantiated adversarial enhancement method is designed to generate diverse and confusing training samples, based on the adversarial mechanism between the reciprocal points and known classes. This can effectively enhance the model's ability to distinguish the unknown classes. Extensive experimental results on various benchmark datasets indicate that the proposed method is significantly superior to other existing approaches and achieves state-of-the-art performance.

Computer Vision and Pattern Recognition

Adversarial Motorial Prototype Framework for Open Set Recognition

119 - Ziheng Xia , Penghui Wang , Ganggang Dong 2021

Open set recognition is designed to identify known classes and to reject unknown classes simultaneously. Specifically, identifying known classes and rejecting unknown classes correspond to reducing the empirical risk and the open space risk, respectively. First, the motorial prototype framework (MPF) is proposed, which classifies known classes according to the prototype classification idea. Moreover, a motorial margin constraint term is added to the loss function of the MPF, which can further improve the clustering compactness of known classes in the feature space to reduce both risks. Second, this paper proposes the adversarial motorial prototype framework (AMPF) based on the MPF. On the one hand, this model can generate adversarial samples and add these samples to the training phase; on the other hand, it can further improve the differential mapping ability of the model to known and unknown classes with the adversarial motion of the margin constraint radius. Finally, this paper proposes an upgraded version of the AMPF, AMPF++, which adds many more generated unknown samples to the training phase. A large number of experiments show that the performance of the proposed models is superior to that of other current works.
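The general idea of prototype classification with rejection can be sketched as follows. This is a generic nearest-prototype classifier with a fixed rejection radius, not the MPF or AMPF training procedure; the function name `nearest_prototype`, the Euclidean distance, and the single shared `radius` are assumptions chosen for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def nearest_prototype(x: Sequence[float],
                      prototypes: Dict[str, Sequence[float]],
                      radius: float) -> Optional[str]:
    """Return the label of the closest prototype, or None if x is rejected.

    A sample is rejected as unknown when even its closest prototype lies
    farther away than `radius` (an assumed, fixed margin).
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(x, proto)  # Euclidean distance in feature space
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= radius else None


if __name__ == "__main__":
    protos = {"run": (0.0, 0.0), "jump": (4.0, 0.0)}
    print(nearest_prototype((0.5, 0.0), protos, radius=1.0))
    print(nearest_prototype((2.0, 5.0), protos, radius=1.0))
```

Tighter clusters around each prototype allow a smaller radius, which is the intuition behind improving clustering compactness to reduce open space risk.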

Computer Vision and Pattern Recognition Artificial Intelligence

Joint Network based Attention for Action Recognition

86 - Yemin Shi , Yonghong Tian , Yaowei Wang 2016

By extracting spatial and temporal characteristics in one network, the two-stream ConvNets can achieve state-of-the-art performance in action recognition. However, such a framework typically suffers from the separate processing of spatial and temporal information in the two standalone streams and has difficulty capturing the long-term temporal dependence of an action. More importantly, it is incapable of finding the salient portions of an action, that is, the frames that are the most discriminative for identifying the action. To address these problems, a joint network based attention (JNA) is proposed in this study. We find that the fully-connected fusion, branch selection and spatial attention mechanism are totally infeasible for action recognition. Thus in our joint network, the spatial and temporal branches share some information during the training stage. We also introduce an attention mechanism on the temporal domain to capture the long-term dependence while finding the salient portions. Extensive experiments are conducted on two benchmark datasets, UCF101 and HMDB51. Experimental results show that our method can improve action recognition performance significantly and achieves state-of-the-art results on both datasets.

Computer Vision and Pattern Recognition