Towards A Category-extended Object Detector without Relabeling or Conflicts

96 0 0.0 ( 0 )

Download Cite

Added by Bowen Zhao

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Bowen Zhao - Chen Chen - Wanpeng Xiao

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Object detectors are typically learned based on fully-annotated training data with fixed pre-defined categories. However, not all possible categories of interest can be known beforehand, classes are often required to be increased progressively in many realistic applications. In such scenario, only the original training set annotated with the old classes and some new training data labeled with the new classes are available. Based on the limited datasets without extra manual labor, a unified detector that can handle all categories is strongly needed. Plain joint training leads to heavy biases and poor performance due to the incomplete annotations. To avoid such situation, we propose a practical framework in this paper. A conflict-free loss is designed to avoid label ambiguity, leading to an acceptable detector in one training round. To further improve performance, we propose a retraining phase in which Monte Carlo Dropout is employed to calculate the localization confidence, combined with the classification confidence, to mine more accurate bounding boxes, and an overlap-weighted method is employed for making better use of pseudo annotations during retraining to achieve more powerful detectors. Extensive experiments conducted on multiple datasets demonstrate the effectiveness of our framework for category-extended object detectors.

rate research

BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models

275 - Bowen Wen , Kostas Bekris 2021

Tracking the 6D pose of objects in video sequences is important for robot manipulation. Most prior efforts, however, often assume that the target objects CAD model, at least at a category-level, is available for offline training or during online template matching. This work proposes BundleTrack, a general framework for 6D pose tracking of novel objects, which does not depend upon 3D models, either at the instance or category-level. It leverages the complementary attributes of recent advances in deep learning for segmentation and robust feature extraction, as well as memory-augmented pose graph optimization for spatiotemporal consistency. This enables long-term, low-drift tracking under various challenging scenarios, including significant occlusions and object motions. Comprehensive experiments given two public benchmarks demonstrate that the proposed approach significantly outperforms state-of-art, category-level 6D tracking or dynamic SLAM methods. When compared against state-of-art methods that rely on an object instance CAD model, comparable performance is achieved, despite the proposed methods reduced information requirements. An efficient implementation in CUDA provides a real-time performance of 10Hz for the entire framework. Code is available at: https://github.com/wenbowen123/BundleTrack

Computer Vision and Pattern Recognition Artificial Intelligence Graphics

Category-Specific Object Reconstruction from a Single Image

163 - Abhishek Kar , Shubham Tulsiani , Jo~ao Carreira 2014

Object reconstruction from a single image -- in the wild -- is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.

Computer Vision and Pattern Recognition

PP-YOLOv2: A Practical Object Detector

427 - Xin Huang , Xinxin Wang , Wenyu Lv 2021

Being effective and efficient is essential to an object detector for practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while almost keep the infer time unchanged. This paper will analyze a collection of refinements and empirically evaluate their impact on the final model performance through incremental ablation study. Things we tried that didnt work will also be discussed. By combining multiple effective refinements, we boost PP-YOLOs performance from 45.9% mAP to 49.5% mAP on COCO2017 test-dev. Since a significant margin of performance has been made, we present PP-YOLOv2. In terms of speed, PP-YOLOv2 runs in 68.9FPS at 640x640 input size. Paddle inference engine with TensorRT, FP16-precision, and batch size = 1 further improves PP-YOLOv2s infer speed, which achieves 106.5 FPS. Such a performance surpasses existing object detectors with roughly the same amount of parameters (i.e., YOLOv4-CSP, YOLOv5l). Besides, PP-YOLOv2 with ResNet101 achieves 50.3% mAP on COCO2017 test-dev. Source code is at https://github.com/PaddlePaddle/PaddleDetection.

Computer Vision and Pattern Recognition

CaT: Weakly Supervised Object Detection with Category Transfer

142 - Tianyue Cao , Lianyu Du , Xiaoyun Zhang 2021

A large gap exists between fully-supervised object detection and weakly-supervised object detection. To narrow this gap, some methods consider knowledge transfer from additional fully-supervised dataset. But these methods do not fully exploit discriminative category information in the fully-supervised dataset, thus causing low mAP. To solve this issue, we propose a novel category transfer framework for weakly supervised object detection. The intuition is to fully leverage both visually-discriminative and semantically-correlated category information in the fully-supervised dataset to enhance the object-classification ability of a weakly-supervised detector. To handle overlapping category transfer, we propose a double-supervision mean teacher to gather common category information and bridge the domain gap between two datasets. To handle non-overlapping category transfer, we propose a semantic graph convolutional network to promote the aggregation of semantic features between correlated categories. Experiments are conducted with Pascal VOC 2007 as the target weakly-supervised dataset and COCO as the source fully-supervised dataset. Our category transfer framework achieves 63.5% mAP and 80.3% CorLoc with 5 overlapping categories between two datasets, which outperforms the state-of-the-art methods. Codes are avaliable at https://github.com/MediaBrain-SJTU/CaT.

Computer Vision and Pattern Recognition

Disassembling Object Representations without Labels

101 - Zunlei Feng , Xinchao Wang , Yongming He 2020

In this paper, we study a new representation-learning task, which we termed as disassembling object representations. Given an image featuring multiple objects, the goal of disassembling is to acquire a latent representation, of which each part corresponds to one category of objects. Disassembling thus finds its application in a wide domain such as image editing and few- or zero-shot learning, as it enables category-specific modularity in the learned representations. To this end, we propose an unsupervised approach to achieving disassembling, named Unsupervised Disassembling Object Representation (UDOR). UDOR follows a double auto-encoder architecture, in which a fuzzy classification and an object-removing operation are imposed. The fuzzy classification constrains each part of the latent representation to encode features of up to one object category, while the object-removing, combined with a generative adversarial network, enforces the modularity of the representations and integrity of the reconstructed image. Furthermore, we devise two metrics to respectively measure the modularity of disassembled representations and the visual integrity of reconstructed images. Experimental results demonstrate that the proposed UDOR, despited unsupervised, achieves truly encouraging results on par with those of supervised methods.

Computer Vision and Pattern Recognition