ﻻ يوجد ملخص باللغة العربية
Most existing Siamese-based tracking methods execute the classification and regression of the target object based on the similarity maps. However, they either employ a single map from the last convolutional layer which degrades the localization accuracy in complex scenarios or separately use multiple maps for decision making, introducing intractable computations for aerial mobile platforms. Thus, in this work, we propose an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial (shallow layers) and semantics cues (deep layers). Consequently, not only the global contextual information can be raised, facilitating the target search, but also our end-to-end architecture with the transformer can efficiently learn the interdependencies among multi-level features, thereby discovering a tracking-tailored feature space with strong discriminability. Comprehensive evaluations on four aerial benchmarks have proven the effectiveness of HiFT. Real-world tests on the aerial platform have strongly validated its practicability with a real-time speed. Our code is available at https://github.com/vision4robotics/HiFT.
A saliency guided hierarchical visual tracking (SHT) algorithm containing global and local search phases is proposed in this paper. In global search, a top-down saliency model is novelly developed to handle abrupt motion and appearance variation prob
Correlation acts as a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion manner to consider the similarity between the template and the search region. However, the co
Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention & shifted-window operation is able to achieve state-of-the-art performance in various visual recognition
Visual object tracking, which is representing a major interest in image processing field, has facilitated numerous real world applications. Among them, equipping unmanned aerial vehicle (UAV) with real time robust visual trackers for all day aerial m
Computer vision researchers have been expecting that neural networks have spatial transformation ability to eliminate the interference caused by geometric distortion for a long time. Emergence of spatial transformer network makes dream come true. Spa