No Arabic abstract
Tracking-by-detection has become an attractive tracking technique, which treats tracking as a category detection problem. However, the task in tracking is to search for a specific object, rather than an object category as in detection. In this paper, we propose a novel tracking framework based on exemplar detector rather than category detector. The proposed tracker is an ensemble of exemplar-based linear discriminant analysis (ELDA) detectors. Each detector is quite specific and discriminative, because it is trained by a single object instance and massive negatives. To improve its adaptivity, we update both object and background models. Experimental results on several challenging video sequences demonstrate the effectiveness and robustness of our tracking algorithm.
Discriminative correlation filters show excellent performance in object tracking. However, in complex scenes, the apparent characteristics of the tracked target are variable, which makes it easy to pollute the model and cause the model drift. In this paper, considering that the secondary peak has a greater impact on the model update, we propose a method for detecting the primary and secondary peaks of the response map. Secondly, a novel confidence function which uses the adaptive update discriminant mechanism is proposed, which yield good robustness. Thirdly, we propose a robust tracker with correlation filters, which uses hand-crafted features and can improve model drift in complex scenes. Finally, in order to cope with the current trackers multi-feature response merge, we propose a simple exponential adaptive merge approach. Extensive experiments are performed on OTB2013, OTB100 and TC128 datasets. Our approach performs superiorly against several state-of-the-art trackers while runs at speed in real time.
Fast appearance variations and the distractions of similar objects are two of the most challenging problems in visual object tracking. Unlike many existing trackers that focus on modeling only the target, in this work, we consider the emph{transient variations of the whole scene}. The key insight is that the object correspondence and spatial layout of the whole scene are consistent (i.e., global structure consistency) in consecutive frames which helps to disambiguate the target from distractors. Moreover, modeling transient variations enables to localize the target under fast variations. Specifically, we propose an effective and efficient short-term model that learns to exploit the global structure consistency in a short time and thus can handle fast variations and distractors. Since short-term modeling falls short of handling occlusion and out of the views, we adopt the long-short term paradigm and use a long-term model that corrects the short-term model when it drifts away from the target or the target is not present. These two components are carefully combined to achieve the balance of stability and plasticity during tracking. We empirically verify that the proposed tracker can tackle the two challenging scenarios and validate it on large scale benchmarks. Remarkably, our tracker improves state-of-the-art-performance on VOT2018 from 0.440 to 0.460, GOT-10k from 0.611 to 0.640, and NFS from 0.619 to 0.629.
Multi-sensor perception is crucial to ensure the reliability and accuracy in autonomous driving system, while multi-object tracking (MOT) improves that by tracing sequential movement of dynamic objects. Most current approaches for multi-sensor multi-object tracking are either lack of reliability by tightly relying on a single input source (e.g., center camera), or not accurate enough by fusing the results from multiple sensors in post processing without fully exploiting the inherent information. In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., sensors) is capable of performing its role independently to preserve reliability, and further improving its accuracy through a novel multi-modality fusion module. Our mmMOT can be trained in an end-to-end manner, enables joint optimization for the base feature extractor of each modality and an adjacency estimator for cross modality. Our mmMOT also makes the first attempt to encode deep representation of point cloud in data association process in MOT. We conduct extensive experiments to evaluate the effectiveness of the proposed framework on the challenging KITTI benchmark and report state-of-the-art performance. Code and models are available at https://github.com/ZwwWayne/mmMOT.
Visual tracking plays an important role in perception system, which is a crucial part of intelligent transportation. Recently, Siamese network is a hot topic for visual tracking to estimate moving targets trajectory, due to its superior accuracy and simple framework. In general, Siamese tracking algorithms, supervised by logistic loss and triplet loss, increase the value of inner product between exemplar template and positive sample while reduce the value of inner product with background sample. However, the distractors from different exemplars are not considered by mentioned loss functions, which limit the feature models discrimination. In this paper, a new exemplar loss integrated with logistic loss is proposed to enhance the feature models discrimination by reducing inner products among exemplars. Without the bells and whistles, the proposed algorithm outperforms the methods supervised by logistic loss or triplet loss. Numerical results suggest that the newly developed algorithm achieves comparable performance in public benchmarks.
Online updating of the object model via samples from historical frames is of great importance for accurate visual object tracking. Recent works mainly focus on constructing effective and efficient updating methods while neglecting the training samples for learning discriminative object models, which is also a key part of a learning problem. In this paper, we propose the DeepMix that takes historical samples embeddings as input and generates augmented embeddings online, enhancing the state-of-the-art online learning methods for visual object tracking. More specifically, we first propose the online data augmentation for tracking that online augments the historical samples through object-aware filtering. Then, we propose MixNet which is an offline trained network for performing online data augmentation within one-step, enhancing the tracking accuracy while preserving high speeds of the state-of-the-art online learning methods. The extensive experiments on three different tracking frameworks, i.e., DiMP, DSiam, and SiamRPN++, and three large-scale and challenging datasets, ie, OTB-2015, LaSOT, and VOT, demonstrate the effectiveness and advantages of the proposed method.