Object detectors are typically trained on fully annotated data with a fixed, pre-defined set of categories. However, not all categories of interest can be known beforehand, and many realistic applications require the set of classes to be extended progressively. In such a scenario, only the original training set annotated with the old classes and some new training data labeled with the new classes are available. A unified detector that handles all categories must therefore be built from these limited datasets without extra manual labeling. Plain joint training leads to heavy biases and poor performance because the annotations are incomplete: each dataset is labeled with only its own subset of classes. To avoid this, we propose a practical framework. A conflict-free loss is designed to avoid label ambiguity, yielding an acceptable detector in a single training round. To further improve performance, we propose a retraining phase in which Monte Carlo Dropout is used to estimate localization confidence; this is combined with classification confidence to mine more accurate bounding boxes. An overlap-weighted method then makes better use of the pseudo annotations during retraining, producing more powerful detectors. Extensive experiments on multiple datasets demonstrate the effectiveness of our framework for category-extended object detectors.
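The abstract does not give the exact formulas, so the following is only a minimal sketch of how Monte Carlo Dropout could yield a localization confidence for pseudo-box mining. It rests on several assumptions not stated in the source: localization confidence is taken as the mean IoU between each stochastic prediction and the coordinate-wise mean box, the two confidences are combined by a simple product, and a fixed threshold selects pseudo boxes. The dropout forward passes are simulated with random jitter (`simulate_dropout_passes` is a hypothetical stand-in for a real detector run with dropout enabled at inference time).

```python
import random
from statistics import mean


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_box(samples):
    """Coordinate-wise mean of the boxes from all stochastic passes."""
    return tuple(mean(s[i] for s in samples) for i in range(4))


def localization_confidence(samples):
    """Assumed measure: mean IoU between each sampled box and the mean box.
    Boxes that barely move across dropout passes score close to 1."""
    center = mean_box(samples)
    return mean(iou(s, center) for s in samples)


def mine_pseudo_boxes(candidates, threshold=0.75):
    """Keep candidates whose combined confidence passes the threshold.
    Each candidate is (classification_score, boxes_from_dropout_passes).
    The product rule and the threshold value are illustrative assumptions."""
    kept = []
    for cls_score, samples in candidates:
        combined = cls_score * localization_confidence(samples)
        if combined >= threshold:
            kept.append((mean_box(samples), combined))
    return kept


def simulate_dropout_passes(box, jitter, passes=20, seed=0):
    """Hypothetical stand-in for T forward passes with dropout active:
    each pass perturbs every coordinate by up to +/- jitter pixels."""
    rng = random.Random(seed)
    return [tuple(c + rng.uniform(-jitter, jitter) for c in box)
            for _ in range(passes)]


if __name__ == "__main__":
    base = (10.0, 10.0, 110.0, 110.0)
    stable = simulate_dropout_passes(base, jitter=1.0)
    unstable = simulate_dropout_passes(base, jitter=30.0)
    candidates = [(0.9, stable), (0.9, unstable), (0.3, stable)]
    for box, score in mine_pseudo_boxes(candidates):
        print([round(c, 1) for c in box], round(score, 3))
```

In this sketch a candidate survives only when it is both confidently classified and consistently localized across passes, which is the intuition behind combining the two confidences; the paper's actual scoring and its overlap-weighted retraining loss may differ.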
Tracking the 6D pose of objects in video sequences is important for robot manipulation. Most prior efforts, however, assume that the target object's CAD model, at least at the category level, is available for offline training or during online temp
Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces
Being effective and efficient is essential for an object detector in practical use. To address both concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while keeping the inference time almost un
A large gap exists between fully-supervised object detection and weakly-supervised object detection. To narrow this gap, some methods consider knowledge transfer from an additional fully-supervised dataset. But these methods do not fully exploit discrim
In this paper, we study a new representation-learning task, which we term disassembling object representations. Given an image featuring multiple objects, the goal of disassembling is to acquire a latent representation, of which each part corres