ﻻ يوجد ملخص باللغة العربية
Despite the growing discriminative capabilities of modern deep learning methods for recognition tasks, the inner workings of the state-of-art models still remain mostly black-boxes. In this paper, we propose a systematic interpretation of model parameters and hidden representations of Residual Temporal Convolutional Networks (Res-TCN) for action recognition in time-series data. We also propose a Feature Map Decoder as part of the interpretation analysis, which outputs a representation of models hidden variables in the same domain as the input. Such analysis empowers us to expose models characteristic learning patterns in an interpretable way. For example, through the diagnosis analysis, we discovered that our model has learned to achieve view-point invariance by implicitly learning to perform rotational normalization of the input to a more discriminative view. Based on the findings from the model interpretation analysis, we propose a targeted refinement technique, which can generalize to various other recognition models. The proposed work introduces a three-stage paradigm for model learning: training, interpretable diagnosis and targeted refinement. We validate our approach on skeleton based 3D human action recognition benchmark of NTU RGB+D. We show that the proposed workflow is an effective model learning strategy and the resulting Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the state-of-the-art performance on NTU RGB+D.
How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition. It requires learning deep and rich features with superior distinctiveness for the subtle and abstract motions. Most existing methods
Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when d
Learning from the web can ease the extreme dependence of deep learning on large-scale manually labeled datasets. Especially for fine-grained recognition, which targets at distinguishing subordinate categories, it will significantly reduce the labelin
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited. Therefore, we propose the few
In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision researc