Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in recent years for a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill the gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects and considers a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research effort should be devoted to this problem so that tracking can benefit FPV tasks.
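
The new performance measure mentioned above is not detailed in this abstract. As background, the following is a minimal sketch of the standard one-pass-evaluation (OPE) success metric that tracking benchmarks of this kind typically build on; the [x, y, w, h] box format and the threshold grid are assumptions, not specifics from the paper.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Area under the success curve: fraction of frames whose IoU
    exceeds each threshold, averaged over the thresholds."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

# Toy usage on a single frame.
print(success_auc([[10, 10, 50, 50]], [[12, 12, 50, 50]]))
```
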
Deep regression trackers are among the fastest tracking algorithms available, and are therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper we overcome such limitations by presenting the first methodology for domain adaptation of this class of trackers. To reduce the labeling effort, we propose a weakly-supervised adaptation strategy in which reinforcement learning is used to express weak supervision as a scalar, application-dependent, and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different robotic vision domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy improves significantly.
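
As an illustration of how the two training signals described above could be combined, here is a minimal sketch, not the paper's implementation: the stand-in student regression head, the teacher boxes, the Gaussian action distribution, the delayed reward value, and the loss weighting are all assumptions.

```python
import torch

student = torch.nn.Linear(128, 4)            # stand-in student regression head
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

feats = torch.randn(16, 128)                 # dummy per-frame features
teacher_boxes = torch.rand(16, 4)            # boxes from a slower teacher tracker

pred = student(feats)

# Distillation term: pull the fast student toward the teacher's boxes.
distill_loss = torch.nn.functional.l1_loss(pred, teacher_boxes)

# Weak supervision: one temporally-delayed scalar per sequence, used as an
# advantage on the student's action log-probability (REINFORCE-style), so
# no per-frame ground-truth labels are needed.
dist = torch.distributions.Normal(pred, torch.ones_like(pred) * 0.05)
action = dist.sample()
delayed_reward = 0.7                          # assumed application feedback
rl_loss = -(delayed_reward * dist.log_prob(action).sum(dim=1)).mean()

loss = distill_loss + 0.1 * rl_loss           # weighting is an assumption
opt.zero_grad()
loss.backward()
opt.step()
```
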
Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a systematic analysis of the performance of state-of-the-art trackers in this domain is still missing. On the other hand, the visual tracking solutions available in the computer vision literature have significantly improved their performance in recent years for a large variety of target objects and tracking scenarios. To fill the gap, in this paper we present TREK-100, the first benchmark dataset for visual object tracking in FPV. The dataset is composed of 100 video sequences densely annotated with 60K bounding boxes, 17 sequence attributes, 13 action verb attributes, and 29 target object attributes. Along with the dataset, we present an extensive analysis of the performance of 30 of the best and most recent visual trackers. Our results show that object tracking in FPV is challenging, which suggests that more research effort should be devoted to this problem.
Visual object tracking is the problem of predicting a target object's state in a video. Bounding boxes have generally been used to represent states, and the community has devoted considerable effort to producing efficient causal algorithms capable of locating targets with such representations. As the field moves towards binary segmentation masks to define objects more precisely, in this paper we propose to extensively explore target-conditioned segmentation methods available in the computer vision community, in order to transform any bounding-box tracker into a segmentation tracker. Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers, while running at quasi real-time speed.
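
The wrapper pattern this abstract describes, turning a bounding-box tracker into a segmentation tracker via a target-conditioned segmentation model, could be sketched as follows; `box_tracker` and `box_to_mask` are hypothetical interfaces, not specific models from the paper.

```python
class SegmentationTracker:
    """Wrap any bounding-box tracker with a target-conditioned segmenter."""

    def __init__(self, box_tracker, box_to_mask):
        self.box_tracker = box_tracker    # any single-object box tracker
        self.box_to_mask = box_to_mask    # maps (frame, box) -> binary mask

    def init(self, frame, init_box):
        """Initialize on the first frame and return the initial mask."""
        self.box_tracker.init(frame, init_box)
        return self.box_to_mask(frame, init_box)

    def track(self, frame):
        """Predict a box as usual, then refine it into a mask."""
        box = self.box_tracker.track(frame)
        mask = self.box_to_mask(frame, box)
        return box, mask
```

The design choice here is that the box tracker remains untouched; the segmenter only post-processes its output, which is why any existing tracker can be upgraded this way.
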
Visual object tracking has generally been tackled by reasoning independently about fast processing algorithms, accurate online adaptation methods, and fusion of trackers. In this paper, we unify such goals by proposing a novel tracking methodology that takes advantage of other visual trackers, both offline and online. A compact student model is trained via the marriage of knowledge distillation and reinforcement learning. The former allows the tracking knowledge of other trackers to be transferred and compressed; the latter enables the learning of evaluation measures that are then exploited online. After learning, the student can ultimately be used to build (i) a very fast single-shot tracker, (ii) a tracker with a simple and effective online adaptation mechanism, and (iii) a tracker that performs fusion of other trackers. Extensive validation shows that the proposed algorithms compete with real-time state-of-the-art trackers.
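
For the fusion use case (iii), the following is a minimal sketch of how a learned evaluation measure might arbitrate among trackers online; `quality_fn` and the tracker interface are hypothetical stand-ins for the student's learned components.

```python
def fuse(frame, expert_trackers, quality_fn):
    """Keep the candidate box that the learned measure rates highest."""
    candidates = [t.track(frame) for t in expert_trackers]
    scores = [quality_fn(frame, box) for box in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]
```
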
In the last decade, many different algorithms have been proposed to track a generic object in videos. Their execution on recent large-scale video datasets can produce a wealth of diverse tracking behaviours. Recent trends in Reinforcement Learning have shown that demonstrations of an expert agent can be used efficiently to speed up the process of policy learning. Taking inspiration from such works and from the recent applications of Reinforcement Learning to visual tracking, we propose two novel trackers: A3CT, which exploits demonstrations of a state-of-the-art tracker to learn an effective tracking policy, and A3CTD, which takes advantage of the same expert tracker to correct its behaviour during tracking. Through an extensive experimental validation on the GOT-10k, OTB-100, LaSOT, UAV123 and VOT benchmarks, we show that the proposed trackers achieve state-of-the-art performance while running in real-time.
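
A minimal sketch of the expert-correction idea behind A3CTD, under assumed interfaces: the learned actor-critic policy returns a box together with a value estimate, and the expert tracker's output replaces the policy's box when that estimate falls below a threshold. The threshold and both interfaces are assumptions, not details from the paper.

```python
def track_with_expert(frame, policy, expert, threshold=0.5):
    """Fall back to the expert tracker when the policy is unconfident."""
    box, value = policy(frame)        # actor-critic style output
    if value < threshold:             # low value estimate: trust the expert
        box = expert.track(frame)
    return box
```
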