ﻻ يوجد ملخص باللغة العربية
Offline Siamese networks have achieved very promising tracking performance, especially in accuracy and efficiency. However, they often fail to track an object in complex scenes due to the incapacity in online update. Traditional updaters are difficult to process the irregular variations and sampling noises of objects, so it is quite risky to adopt them to update Siamese networks. In this paper, we first present a two-stage one-shot learner, which can predict the local parameters of primary classifier with object samples from diverse stages. Then, an updatable Siamese network is proposed based on the learner (SiamTOL), which is able to complement online update by itself. Concretely, we introduce an extra inputting branch to sequentially capture the latest object features, and design a residual module to update the initial exemplar using these features. Besides, an effective multi-aspect training loss is designed for our network to avoid overfit. Extensive experimental results on several popular benchmarks including OTB100, VOT2018, VOT2019, LaSOT, UAV123 and GOT10k manifest that the proposed tracker achieves the leading performance and outperforms other state-of-the-art methods
We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) descriptions task. The proposed SNLT is applicable to a wide range of Siamese trackers, providing a
Siamese deep-network trackers have received significant attention in recent years due to their real-time speed and state-of-the-art performance. However, Siamese trackers suffer from similar looking confusers, that are prevalent in aerial imagery and
In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network to encode
Convolutional neural networks (CNNs) are a promising technique for automated glaucoma diagnosis from images of the fundus, and these images are routinely acquired as part of an ophthalmic exam. Nevertheless, CNNs typically require a large amount of w
Scale variance among different sizes of body parts and objects is a challenging problem for visual recognition tasks. Existing works usually design dedicated backbone or apply Neural architecture Search(NAS) for each task to tackle this challenge. Ho