Mixup Regularization for Region Proposal based Object Detectors

121 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shahine Bouabid

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shahine Bouabid - Vincent Delaitre

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Mixup - a neural network regularization technique based on linear interpolation of labeled sample pairs - has stood out by its capacity to improve models robustness and generalizability through a surprisingly simple formalism. However, its extension to the field of object detection remains unclear as the interpolation of bounding boxes cannot be naively defined. In this paper, we propose to leverage the inherent region mapping structure of anchors to introduce a mixup-driven training regularization for region proposal based object detectors. The proposed method is benchmarked on standard datasets with challenging detection settings. Our experiments show an enhanced robustness to image alterations along with an ability to decontextualize detections, resulting in an improved generalization power.

قيم البحث

82 - Garrick Brazil , Xiaoming Liu 2019

Understanding the world in 3D is a critical component of urban autonomous driving. Generally, the combination of expensive LiDAR sensors and stereo RGB imaging has been paramount for successful 3D object detection algorithms, whereas monocular image- only methods experience drastically reduced performance. We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. We leverage the geometric relationship of 2D and 3D perspectives, allowing 3D boxes to utilize well-known and powerful convolutional features generated in the image-space. To help address the strenuous 3D parameter estimations, we further design depth-aware convolutional layers which enable location specific feature development and in consequence improved 3D scene understanding. Compared to prior work in monocular 3D detection, our method consists of only the proposed 3D region proposal network rather than relying on external networks, data, or multiple stages. M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Birds Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

الرؤية الحاسوبية وتمييز الأنماط

Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection

71 - Ganlong Zhao , Guanbin Li , Ruijia Xu 2020

Object detectors are usually trained with large amount of labeled data, which is expensive and labor-intensive. Pre-trained detectors applied to unlabeled dataset always suffer from the difference of dataset distribution, also called domain shift. Do main adaptation for object detection tries to adapt the detector from labeled datasets to unlabeled ones for better performance. In this paper, we are the first to reveal that the region proposal network (RPN) and region proposal classifier~(RPC) in the endemic two-stage detectors (e.g., Faster RCNN) demonstrate significantly different transferability when facing large domain gap. The region classifier shows preferable performance but is limited without RPNs high-quality proposals while simple alignment in the backbone network is not effective enough for RPN adaptation. We delve into the consistency and the difference of RPN and RPC, treat them individually and leverage high-confidence output of one as mutual guidance to train the other. Moreover, the samples with low-confidence are used for discrepancy calculation between RPN and RPC and minimax optimization. Extensive experimental results on various scenarios have demonstrated the effectiveness of our proposed method in both domain-adaptive region proposal generation and object detection. Code is available at https://github.com/GanlongZhao/CST_DA_detection.

الرؤية الحاسوبية وتمييز الأنماط

Ensembling object detectors for image and video data analysis

175 - Kateryna Chumachenko , Jenni Raitoharju , Alexandros Iosifidis 2021

In this paper, we propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data. We further extend it to video data by proposing a two-stage tracking-based s cheme for detection refinement. The proposed method can be used as a standalone approach for improving object detection performance, or as a part of a framework for faster bounding box annotation in unseen datasets, assuming that the objects of interest are those present in some common public datasets.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Learning Deep Object Detectors from 3D Models

598 - Xingchao Peng , Baochen Sun , Karim Ali 2014

Crowdsourced 3D CAD models are becoming easily accessible online, and can potentially generate an infinite number of training images for almost any object category.We show that augmenting the training data of contemporary Deep Convolutional Neural Ne t (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain. Most freely available CAD models capture 3D shape but are often missing other low level cues, such as realistic object texture, pose, or background. In a detailed analysis, we use synthetic CAD-rendered images to probe the ability of DCNN to learn without these cues, with surprising findings. In particular, we show that when the DCNN is fine-tuned on the target detection task, it exhibits a large degree of invariance to missing low-level cues, but, when pretrained on generic ImageNet classification, it learns better when the low-level cues are simulated. We show that our synthetic DCNN training approach significantly outperforms previous methods on the PASCAL VOC2007 dataset when learning in the few-shot scenario and improves performance in a domain shift scenario on the Office benchmark.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

Remix: Rebalanced Mixup

75 - Hsin-Ping Chou , Shih-Chieh Chang , Jia-Yu Pan 2020

Deep image classifiers often perform poorly when training data are heavily class-imbalanced. In this work, we propose a new regularization technique, Remix, that relaxes Mixups formulation and enables the mixing factors of features and labels to be d isentangled. Specifically, when mixing two samples, while features are mixed in the same fashion as Mixup, Remix assigns the label in favor of the minority class by providing a disproportionately higher weight to the minority class. By doing so, the classifier learns to push the decision boundaries towards the majority classes and balance the generalization error between majority and minority classes. We have studied the state-of-the art regularization techniques such as Mixup, Manifold Mixup and CutMix under class-imbalanced regime, and shown that the proposed Remix significantly outperforms these state-of-the-arts and several re-weighting and re-sampling techniques, on the imbalanced datasets constructed by CIFAR-10, CIFAR-100, and CINIC-10. We have also evaluated Remix on a real-world large-scale imbalanced dataset, iNaturalist 2018. The experimental results confirmed that Remix provides consistent and significant improvements over the previous methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي