Jointly exploiting multiple different yet complementary sources of domain information has proven to be an effective way to perform robust object tracking. This paper focuses on effectively representing and utilizing complementary features from the frame domain and the event domain to boost object tracking performance in challenging scenarios. Specifically, we propose a Common Features Extractor (CFE) to learn potential common representations from the RGB domain and the event domain. For learning the unique features of the two domains, we utilize a Unique Extractor for Event (UEE) based on Spiking Neural Networks to extract edge cues in the event domain that may be missed in the RGB domain under challenging conditions, and a Unique Extractor for RGB (UER) based on Deep Convolutional Neural Networks to extract texture and semantic information in the RGB domain. Extensive experiments on a standard RGB benchmark and a real event tracking dataset demonstrate the effectiveness of the proposed approach. Our approach outperforms all compared state-of-the-art tracking algorithms and verifies that event-based data is a powerful cue for tracking in challenging scenes.
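Below is a minimal sketch (not the authors' released code) of the three-branch design the abstract describes: a shared CFE applied to both modalities, a spiking UEE for event edge cues, and a convolutional UER for RGB texture and semantics. The layer widths, the simplified leaky integrate-and-fire dynamics, concatenation fusion, and the assumption that events arrive as T three-channel frame slices are all illustrative choices, not from the paper.

```python
import torch
import torch.nn as nn

class LIFSpike(nn.Module):
    """Simplified leaky integrate-and-fire activation over T time steps
    (illustrative stand-in for the paper's SNN; real SNNs are trained
    with surrogate gradients for the non-differentiable threshold)."""
    def __init__(self, tau=0.5, threshold=1.0):
        super().__init__()
        self.tau, self.threshold = tau, threshold

    def forward(self, x):                     # x: (T, B, C, H, W)
        mem = torch.zeros_like(x[0])
        spikes = []
        for t in range(x.size(0)):
            mem = self.tau * mem + x[t]       # leaky integration
            spike = (mem >= self.threshold).float()
            mem = mem * (1.0 - spike)         # hard reset after firing
            spikes.append(spike)
        return torch.stack(spikes).mean(0)    # rate-coded feature map

class DualDomainExtractor(nn.Module):
    def __init__(self, c=32):
        super().__init__()
        # CFE: one set of weights shared by the RGB and event inputs.
        self.cfe = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU())
        # UER: conventional CNN branch for RGB texture and semantics.
        self.uer = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU())
        # UEE: spiking branch for edge cues in the event stream.
        self.uee_conv = nn.Conv2d(3, c, 3, padding=1)
        self.uee_spike = LIFSpike()

    def forward(self, rgb, events):
        # rgb: (B, 3, H, W); events: (T, B, 3, H, W) event-frame slices.
        common = self.cfe(rgb) + self.cfe(events.mean(0))
        unique_rgb = self.uer(rgb)
        evt = torch.stack([self.uee_conv(e) for e in events])
        unique_evt = self.uee_spike(evt)
        return torch.cat([common, unique_rgb, unique_evt], dim=1)

# Dummy usage: fuse a batch of RGB frames with 4 event slices each.
feats = DualDomainExtractor()(torch.randn(2, 3, 64, 64),
                              torch.randn(4, 2, 3, 64, 64))
```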
We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions,
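Although the abstract above is truncated, the channel-and-spatial group selection it names can be illustrated with a group-lasso proximal step over a correlation filter; the shrinkage form and regularization weights below are illustrative assumptions, not the actual GFS-DCF objective.

```python
import torch

def group_shrink(w, lam, dims):
    """Proximal step for a group-lasso penalty: shrink each group of w
    (the slice over `dims`) toward zero by its l2 norm, zeroing weak
    groups entirely."""
    norm = w.norm(dim=dims, keepdim=True).clamp_min(1e-12)
    return w * (1.0 - lam / norm).clamp_min(0.0)

# Filter of shape (C, H, W): a channel group is one HxW slice; a spatial
# group is the C-vector at one pixel location.
w = torch.randn(64, 31, 31)
w = group_shrink(w, lam=0.1, dims=(1, 2))  # channel-wise selection
w = group_shrink(w, lam=0.1, dims=(0,))    # spatial selection
```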
Multi-sensor perception is crucial to ensuring reliability and accuracy in autonomous driving systems, while multi-object tracking (MOT) improves them by tracing the sequential movement of dynamic objects. Most current approaches for multi-sensor multi-
Online updating of the object model via samples from historical frames is of great importance for accurate visual object tracking. Recent works mainly focus on constructing effective and efficient updating methods while neglecting the training sample
Deep learning-based visual tracking algorithms such as MDNet achieve high performance by leveraging the feature extraction ability of a deep neural network. However, the tracking efficiency of these trackers is not very high due to the slow featu
Physical processes, camera movement, and unpredictable environmental conditions like the presence of dust can induce noise and artifacts in video feeds. We observe that popular unsupervised MOT methods are dependent on noise-free inputs. We show that