With the advancement of IoT and artificial intelligence technologies, and the rapid growth of applications in fields such as security entrance control and financial trading, facial information processing has become an important means of identity authentication and information security. In this paper, we propose a multi-feature fusion tracking algorithm based on integral histograms and a particle filtering module with real-time template updates. First, edge and colour features are extracted, the colour histogram and edge features are weighted to describe the face, and the fusion of colour and edge cues is made adaptive through fusion coefficients to improve face tracking reliability. Then, the integral histogram is integrated into the particle filtering algorithm to simplify the per-particle computation. Finally, the tracking window size is adjusted in real time according to the change in the average distance from the particle centre to the edge between the current model and the initial model, reducing drift and achieving stable tracking under significant changes in target scale. The results show that the algorithm improves video tracking accuracy, reduces the computational complexity of the particle operations, increases speed, and exhibits good anti-interference ability and robustness.
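The adaptive fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: particle likelihoods are built from Bhattacharyya similarities between each particle's colour and edge histograms and the reference model, blended by a fusion coefficient `alpha` (an assumed name for the paper's fusion coefficient).

```python
import numpy as np

def bhattacharyya(p, q):
    """Similarity between two normalized histograms (1 = identical)."""
    return np.sum(np.sqrt(p * q))

def fused_particle_weights(color_hists, edge_hists,
                           ref_color, ref_edge, alpha, sigma=0.1):
    """Weight each particle by an adaptively fused colour/edge likelihood.

    alpha is the fusion coefficient in [0, 1]: higher values trust the
    colour cue more, lower values trust the edge cue more.
    """
    w = np.empty(len(color_hists))
    for i, (ch, eh) in enumerate(zip(color_hists, edge_hists)):
        rho = alpha * bhattacharyya(ch, ref_color) \
            + (1.0 - alpha) * bhattacharyya(eh, ref_edge)
        # Gaussian observation model on the Bhattacharyya distance
        w[i] = np.exp(-(1.0 - rho) / (2.0 * sigma ** 2))
    return w / w.sum()  # normalized importance weights
```

A particle whose fused histograms match the reference model gets the largest weight; resampling then concentrates particles around the face.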
Nowadays, the growing number of mobile and computing devices has led to demand for safer user authentication systems. Face anti-spoofing is a measure in this direction for biometric user authentication, and in particular face recognition, that tries to prevent spoof attacks. State-of-the-art anti-spoofing techniques leverage the ability of deep neural networks to learn discriminative features, based on cues from the training-set images or video samples, in an effort to detect spoof attacks. However, due to the particular nature of the problem, i.e. large variability caused by factors such as different backgrounds, lighting conditions, camera resolutions, and spoof materials, these techniques typically fail to generalize to new samples. In this paper, we explicitly tackle this problem and propose a class-conditional domain discriminator module that, coupled with a gradient reversal layer, tries to generate live and spoof features that are discriminative but at the same time robust against the aforementioned variability factors. Extensive experimental analysis shows the effectiveness of the proposed method over existing image- and video-based anti-spoofing techniques, both in terms of numerical improvement and when visualizing the learned features.
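The gradient reversal layer mentioned above can be sketched in PyTorch as below. This is a generic illustration of the standard mechanism, not the paper's code; the names `GradReverse` and `grad_reverse` are illustrative.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in
    the backward pass, so the feature extractor upstream is trained to
    *fool* the domain discriminator that follows this layer."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing to the features;
        # no gradient is needed for the scalar lam.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```

Placed between the feature extractor and the domain discriminator, this makes the discriminator minimize its domain loss while the features are pushed to maximize it, yielding domain-robust features.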
Applications such as face recognition (FR) that deal with high-dimensional data need a mapping technique that produces low-dimensional feature representations with enhanced discriminatory power, together with a classifier able to handle those complex features. Most traditional Linear Discriminant Analysis (LDA) methods suffer from the disadvantage that their optimality criteria are not directly related to the classification ability of the resulting feature representation. Moreover, their classification accuracy is affected by the small-sample-size problem often encountered in FR tasks. In this short paper, we combine a nonlinear kernel-based mapping of the data, Kernel Direct Discriminant Analysis (KDDA), with a Support Vector Machine (SVM) classifier to address both shortcomings in an efficient and cost-effective manner. The proposed method is compared, in terms of classification accuracy, to other commonly used FR methods on the UMIST face database. Results indicate that its performance is overall superior to that of traditional FR approaches, such as the Eigenfaces, Fisherfaces, and D-LDA methods, and of traditional linear classifiers.
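The overall pipeline (nonlinear kernel mapping → discriminant projection → SVM) can be sketched with scikit-learn. KDDA itself is not available in scikit-learn, so an RBF kernel map followed by LDA is used here as a rough stand-in for the nonlinear discriminant step, and the digits dataset stands in for a face database; this is an illustration of the pipeline shape, not the paper's method.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.datasets import load_digits  # stand-in for a face dataset
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Nonlinear kernel map -> discriminant projection -> SVM classifier
clf = make_pipeline(
    KernelPCA(n_components=40, kernel="rbf", gamma=1e-3),
    LinearDiscriminantAnalysis(),
    SVC(kernel="linear", C=1.0),
)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```

The discriminant step reduces the kernel features to at most (classes − 1) dimensions, and the SVM then supplies a margin-based decision boundary in that compact space.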
A significant amount of redundancy exists between consecutive frames of a video. Object detectors typically produce detections for one image at a time, without any capabilities for taking advantage of this redundancy. Meanwhile, many applications for object detection work with videos, including intelligent transportation systems, advanced driver assistance systems and video surveillance. Our work aims at taking advantage of the similarity between video frames to produce better detections. We propose FFAVOD, standing for feature fusion architecture for video object detection. We first introduce a novel video object detection architecture that allows a network to share feature maps between nearby frames. Second, we propose a feature fusion module that learns to merge feature maps to enhance them. We show that using the proposed architecture and the fusion module can improve the performance of three base object detectors on two object detection benchmarks containing sequences of moving road users. Additionally, to further increase performance, we propose an improvement to the SpotNet attention module. Using our architecture on the improved SpotNet detector, we obtain the state-of-the-art performance on the UA-DETRAC public benchmark as well as on the UAVDT dataset. Code is available at https://github.com/hu64/FFAVOD.
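A fusion module of the kind described can be sketched in PyTorch as below. This is an illustrative sketch, not the FFAVOD reference code (see the linked repository for that): feature maps from several nearby frames are concatenated along the channel axis and learned convolutions merge them back into one enhanced map.

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Illustrative frame-level feature fusion: maps from n_frames
    neighbouring frames are concatenated on channels and merged back
    to a single enhanced feature map for the detection head."""

    def __init__(self, channels, n_frames):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(channels * n_frames, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, frame_feats):
        # frame_feats: list of (B, C, H, W) maps from neighbouring frames
        return self.merge(torch.cat(frame_feats, dim=1))
```

Because the base detector's backbone is shared across frames, only this small merge network is added per detection, which keeps the overhead of exploiting temporal redundancy low.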
This is a short technical report introducing the solution of Team TCParser for the Short-video Face Parsing Track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021. In this paper, we introduce a strong backbone, a cross-window-based Shuffle Transformer, for extracting accurate face parsing representations. To obtain finer segmentation results, especially on edges, we introduce a Feature Alignment Aggregation (FAA) module, which effectively relieves the feature misalignment caused by multi-resolution feature aggregation. Benefiting from the stronger backbone and better feature aggregation, the proposed method achieves a score of 86.9519% in the Short-video Face Parsing track, ranking first place.
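The misalignment problem the FAA module targets can be illustrated with a generic alignment-aware aggregation sketch. The report does not give FAA's internals, so the code below shows one common pattern only (predicting an offset field and warping the upsampled coarse map before fusion); all names and design choices here are assumptions, not the authors' module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedAggregation(nn.Module):
    """Generic alignment-aware aggregation: upsample the coarse map,
    predict a 2D offset field from both maps, warp the upsampled map
    with the offsets, then fuse it with the fine map."""

    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, high, low):
        # high: (B, C, H, W) fine map; low: (B, C, h, w) coarse map
        B, C, H, W = high.shape
        up = F.interpolate(low, size=(H, W), mode="bilinear",
                           align_corners=False)
        delta = self.offset(torch.cat([high, up], dim=1))  # (B, 2, H, W)
        # Base sampling grid with x, y coordinates in [-1, 1]
        gy, gx = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        grid = torch.stack([gx, gy], dim=-1).expand(B, H, W, 2)
        warped = F.grid_sample(up, grid + delta.permute(0, 2, 3, 1),
                               align_corners=False)
        return high + warped
```

Naive bilinear upsampling samples the coarse map at fixed positions; letting the network shift those positions is what recovers sharper boundaries in the aggregated features.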
Analyzing the story behind TV series and movies often requires understanding who the characters are and what they are doing. With improving deep face models, this may seem like a solved problem. However, as face detectors get better, clustering and identification need to be revisited to address the increasing diversity in facial appearance. In this paper, we address video face clustering using unsupervised methods. Our emphasis is on distilling the essential information, identity, from the representations obtained using deep pre-trained face networks. We propose a self-supervised Siamese network that can be trained without video- or track-based supervision, and thus can also be applied to image collections. We evaluate our proposed method on three video face clustering datasets. The experiments show that our methods outperform current state-of-the-art methods on all datasets. Video face clustering lacks a common benchmark, as current works are often evaluated with different metrics and/or different sets of face tracks.
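A Siamese refinement of pre-trained face features, as described above, can be sketched in PyTorch. This is an illustrative sketch, not the authors' code: a small MLP head maps pre-trained features to identity-focused embeddings, trained with a contrastive loss whose pair labels would, in the self-supervised setting, come from feature distances (e.g. nearest neighbours as positives, farthest samples as negatives) rather than from track or identity annotation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEmbedder(nn.Module):
    """Small MLP head that refines pre-trained face features into
    L2-normalized identity embeddings (illustrative dimensions)."""

    def __init__(self, in_dim=256, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull together pairs labelled same=1, push same=0 pairs apart
    until they are at least `margin` away."""
    d = (z1 - z2).norm(dim=-1)
    return (same * d.pow(2) +
            (1 - same) * F.relu(margin - d).pow(2)).mean()
```

After training, clustering (e.g. hierarchical agglomerative clustering) runs on the refined embeddings instead of the raw network features.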