Deep Flow Collaborative Network for Online Visual Tracking

202 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Peidong Liu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Peidong Liu - Xiyu Yan - Yong Jiang

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The deep learning-based visual tracking algorithms such as MDNet achieve high performance leveraging to the feature extraction ability of a deep neural network. However, the tracking efficiency of these trackers is not very high due to the slow feature extraction for each frame in a video. In this paper, we propose an effective tracking algorithm to alleviate the time-consuming problem. Specifically, we design a deep flow collaborative network, which executes the expensive feature network only on sparse keyframes and transfers the feature maps to other frames via optical flow. Moreover, we raise an effective adaptive keyframe scheduling mechanism to select the most appropriate keyframe. We evaluate the proposed approach on large-scale datasets: OTB2013 and OTB2015. The experiment results show that our algorithm achieves considerable speedup and high precision as well.

قيم البحث

108 - Jiaxin Zhang , Wei Sui , Xinggang Wang 2021

In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages: First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in self-super vised manners. Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inference phases. The benefits of our proposed method are twofold: 1) Different from online-learning methods, DOC does not need to calculate gradient propagation for parameters of CNNs. Thus, it saves more computation resources during inference phases. 2) Unlike hybrid methods that combine CNNs with traditional methods, DOC fully relies on deep learning (DL) frameworks. Though without complex back-end optimization modules, our method achieves outstanding performance with relative transform error (RTE) = 2.0% on KITTI Odometry benchmark for Seq. 09, which outperforms traditional monocular VO frameworks and is comparable to hybrid methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking

98 - Jiqing Zhang , Kai Zhao , Bo Dong 2021

Jointly exploiting multiple different yet complementary domain information has been proven to be an effective way to perform robust object tracking. This paper focuses on effectively representing and utilizing complementary features from the frame do main and event domain for boosting object tracking performance in challenge scenarios. Specifically, we propose Common Features Extractor (CFE) to learn potential common representations from the RGB domain and event domain. For learning the unique features of the two domains, we utilize a Unique Extractor for Event (UEE) based on Spiking Neural Networks to extract edge cues in the event domain which may be missed in RGB in some challenging conditions, and a Unique Extractor for RGB (UER) based on Deep Convolutional Neural Networks to extract texture and semantic information in RGB domain. Extensive experiments on standard RGB benchmark and real event tracking dataset demonstrate the effectiveness of the proposed approach. We show our approach outperforms all compared state-of-the-art tracking algorithms and verify event-based data is a powerful cue for tracking in challenging scenes.

الرؤية الحاسوبية وتمييز الأنماط

Deep Tracking: Visual Tracking Using Deep Convolutional Networks

180 - Meera Hahn , Si Chen , Afshin Dehghan 2015

In this paper, we study a discriminatively trained deep convolutional network for the task of visual tracking. Our tracker utilizes both motion and appearance features that are extracted from a pre-trained dual stream deep convolution network. We sho w that the features extracted from our dual-stream network can provide rich information about the target and this leads to competitive performance against state of the art tracking methods on a visual tracking benchmark.

الرؤية الحاسوبية وتمييز الأنماط

Siamese Box Adaptive Network for Visual Tracking

370 - Zedu Chen , Bineng Zhong , Guorong Li 2020

Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations . To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at https://github.com/hqucv/siamban.

الرؤية الحاسوبية وتمييز الأنماط

Pose Flow: Efficient Online Pose Tracking

177 - Yuliang Xiu , Jiefeng Li , Haoyu Wang 2018

Multi-person articulated pose tracking in unconstrained videos is an important while challenging problem. In this paper, going along the road of top-down approaches, we propose a decent and efficient pose tracker based on pose flows. First, we design an online optimization framework to build the association of cross-frame poses and form pose flows (PF-Builder). Second, a novel pose flow non-maximum suppression (PF-NMS) is designed to robustly reduce redundant pose flows and re-link temporal disjoint ones. Extensive experiments show that our method significantly outperforms best-reported results on two standard Pose Tracking datasets by 13 mAP 25 MOTA and 6 mAP 3 MOTA respectively. Moreover, in the case of working on detected poses in individual frames, the extra computation of pose tracker is very minor, guaranteeing online 10FPS tracking. Our source codes are made publicly available(https://github.com/YuliangXiu/PoseFlow).

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي