ﻻ يوجد ملخص باللغة العربية
Visual object tracking was generally tackled by reasoning independently on fast processing algorithms, accurate online adaptation methods, and fusion of trackers. In this paper, we unify such goals by proposing a novel tracking methodology that takes advantage of other visual trackers, offline and online. A compact student model is trained via the marriage of knowledge distillation and reinforcement learning. The first allows to transfer and compress tracking knowledge of other trackers. The second enables the learning of evaluation measures which are then exploited online. After learning, the student can be ultimately used to build (i) a very fast single-shot tracker, (ii) a tracker with a simple and effective online adaptation mechanism, (iii) a tracker that performs fusion of other trackers. Extensive validation shows that the proposed algorithms compete with real-time state-of-the-art trackers.
We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) descriptions task. The proposed SNLT is applicable to a wide range of Siamese trackers, providing a
Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper we
Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that c
Hashing has been widely used in approximate nearest neighbor search for its storage and computational efficiency. Deep supervised hashing methods are not widely used because of the lack of labeled data, especially when the domain is transferred. Mean
Ultrasound (US) is the most widely used fetal imaging technique. However, US images have limited capture range, and suffer from view dependent artefacts such as acoustic shadows. Compounding of overlapping 3D US acquisitions into a high-resolution vo