Gait recognition plays a vital role in human identification because gait is a unique biometric feature that can be perceived at a distance. Although existing gait recognition methods learn gait features from gait sequences in different ways, their performance suffers from insufficient labeled data, especially in practical scenarios involving short gait sequences or varied clothing styles, and labeling large volumes of gait data is impractical. In this work, we propose a self-supervised gait recognition method, termed SelfGait, which uses massive, diverse, unlabeled gait data in a pre-training stage to improve the representation ability of spatiotemporal backbones. Specifically, we employ horizontal pyramid mapping (HPM) and a micro-motion template builder (MTB) as our spatiotemporal backbones to capture multi-scale spatiotemporal representations. Experiments on the CASIA-B and OU-MVLP benchmark gait datasets demonstrate the effectiveness of SelfGait compared with four state-of-the-art gait recognition methods. The source code has been released at https://github.com/EchoItLiu/SelfGait.
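To make the multi-scale idea behind horizontal pyramid mapping more concrete, the following is a minimal sketch of horizontal pyramid pooling over a single feature map. It is not taken from the SelfGait code: the function name `horizontal_pyramid_pool`, the scales `(1, 2, 4)`, and the choice of max plus mean pooling per strip are assumptions for illustration, and the learned per-strip mapping layers that would normally follow are omitted.

```python
from typing import List, Sequence

# A feature map indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def horizontal_pyramid_pool(
    fmap: FeatureMap, scales: Sequence[int] = (1, 2, 4)
) -> List[List[float]]:
    """Pool a (C, H, W) feature map into one vector per horizontal strip.

    For each scale s, the height axis is cut into s equal strips. Each strip
    is reduced per channel to max + mean (an assumed pooling choice).
    """
    height = len(fmap[0])
    strip_vectors: List[List[float]] = []
    for s in scales:
        if height % s != 0:
            raise ValueError(f"height {height} is not divisible by scale {s}")
        strip_h = height // s
        for i in range(s):
            vector = []
            for channel in fmap:
                rows = channel[i * strip_h:(i + 1) * strip_h]
                values = [v for row in rows for v in row]
                vector.append(max(values) + sum(values) / len(values))
            strip_vectors.append(vector)
    return strip_vectors


# Toy input (hypothetical values): 2 channels, 4 rows, 2 columns.
toy = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
]
strips = horizontal_pyramid_pool(toy, scales=(1, 2, 4))
print(len(strips), len(strips[0]))  # 7 strips (1 + 2 + 4), 2 channels each
```

Coarse scales summarize the whole body while finer scales isolate bands such as the head or legs, which is one way a backbone can expose multi-scale spatial cues to a downstream recognition head.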
Gait, the walking pattern of individuals, is one of the most important biometric modalities. Most existing gait recognition methods take silhouettes or articulated body models as the gait features. These methods suffer from degraded recogniti
Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to col
Self-supervised representation learning is able to learn semantically meaningful features; however, much of its recent success relies on multiple crops of an image with very few objects. Instead of learning view-invariant representation from simple i
In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn spatiotemporal visual representations from unlabeled videos. Our representations are learned using a contrastive loss, where two augmented clips from the sa