بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification

139 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhizheng Zhang

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhizheng Zhang - Cuiling Lan - Wenjun Zeng

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to the existence of redundancy among frames, newly revealed appearance, occlusion, and motion blurs. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. In order to determine the contribution/importance of a spatial-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves the state-of-the-art performance on three benchmark datasets.

قيم البحث

67 - Guoqing Zhang , Yuhao Chen , Yang Dai 2021

Recently, video-based person re-identification (re-ID) has drawn increasing attention in compute vision community because of its practical application prospects. Due to the inaccurate person detections and pose changes, pedestrian misalignment signif icantly increases the difficulty of feature extraction and matching. To address this problem, in this paper, we propose a textbf{R}eference-textbf{A}ided textbf{P}art-textbf{A}ligned (textbf{RAPA}) framework to disentangle robust features of different parts. Firstly, in order to obtain better references between different videos, a pose-based reference feature learning module is introduced. Secondly, an effective relation-based part feature disentangling module is explored to align frames within each video. By means of using both modules, the informative parts of pedestrian in videos are well aligned and more discriminative feature representation is generated. Comprehensive experiments on three widely-used benchmarks, i.e. iLIDS-VID, PRID-2011 and MARS datasets verify the effectiveness of the proposed framework. Our code will be made publicly available.

الرؤية الحاسوبية وتمييز الأنماط

Intra-clip Aggregation for Video Person Re-identification

159 - Takashi Isobe , Jian Han , Fang Zhu 2019

Video-based person re-identification has drawn massive attention in recent years due to its extensive applications in video surveillance. While deep learning-based methods have led to significant progress, these methods are limited by ineffectively u sing complementary information, which is blamed on necessary data augmentation in the training process. Data augmentation has been widely used to mitigate the over-fitting trap and improve the ability of network representation. However, the previous methods adopt image-based data augmentation scheme to individually process the input frames, which corrupts the complementary information between consecutive frames and causes performance degradation. Extensive experiments on three benchmark datasets demonstrate that our framework outperforms the most recent state-of-the-art methods. We also perform cross-dataset validation to prove the generality of our method.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Person Re-Identification via Recurrent Feature Aggregation

90 - Yichao Yan , Bingbing Ni , Zhichao Song 2017

We address the person re-identification problem by effectively exploiting a globally discriminative feature representation from a sequence of tracked human regions/patches. This is in contrast to previous person re-id works, which rely on either sing le frame based person to person patch matching, or graph based sequence to sequence matching. We show that a progressive/sequential fusion framework based on long short term memory (LSTM) network aggregates the frame-wise human region representation at each time stamp and yields a sequence level human feature representation. Since LSTM nodes can remember and propagate previously accumulated good features and forget newly input inferior ones, even with simple hand-crafted features, the proposed recurrent feature aggregation network (RFA-Net) is effective in generating highly discriminative sequence level human representations. Extensive experimental results on two person re-identification benchmarks demonstrate that the proposed method performs favorably against state-of-the-art person re-identification methods.

الرؤية الحاسوبية وتمييز الأنماط

Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

121 - Mang Ye , Jianbing Shen , David J. Crandall 2020

Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Due to the large intra-class variations and cross-modality discrepancy with large amount of sample noise, it is difficult to learn discr iminative part features. Existing VI-ReID methods instead tend to learn global representations, which have limited discriminability and weak robustness to noisy images. In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. We propose an intra-modality weighted-part attention module to extract discriminative part-aggregated features, by imposing the domain knowledge on the part relationship mining. To enhance robustness against noisy samples, we introduce cross-modality graph structured attention to reinforce the representation with the contextual relations across the two modalities. We also develop a parameter-free dynamic dual aggregation learning strategy to adaptively integrate the two components in a progressive joint training manner. Extensive experiments demonstrate that DDAG outperforms the state-of-the-art methods under various settings.

الرؤية الحاسوبية وتمييز الأنماط

Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification

165 - Yichao Yan , Jie Qin1 , Jiaxin Chen 2021

Video-based person re-identification (re-ID) is an important research topic in computer vision. The key to tackling the challenging task is to exploit both spatial and temporal clues in video sequences. In this work, we propose a novel graph-based fr amework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities by modeling spatiotemporal dependencies in terms of multiple granularities. Specifically, hypergraphs with different spatial granularities are constructed using various levels of part-based features across the video sequence. In each hypergraph, different temporal granularities are captured by hyperedges that connect a set of graph nodes (i.e., part-based features) across different temporal ranges. Two critical issues (misalignment and occlusion) are explicitly addressed by the proposed hypergraph propagation and feature aggregation schemes. Finally, we further enhance the overall video representation by learning more diversified graph-level representations of multiple granularities based on mutual information minimization. Extensive experiments on three widely adopted benchmarks clearly demonstrate the effectiveness of the proposed framework. Notably, 90.0% top-1 accuracy on MARS is achieved using MGH, outperforming the state-of-the-arts. Code is available at https://github.com/daodaofr/hypergraph_reid.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الأميركية في بيروت

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً