Learning to decompose for object detection and instance segmentation

155 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Eunbyung Park

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Eunbyung Park - Alexander C. Berg

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Although deep convolutional neural networks(CNNs) have achieved remarkable results on object detection and segmentation, pre- and post-processing steps such as region proposals and non-maximum suppression(NMS), have been required. These steps result in high computational complexity and sensitivity to hyperparameters, e.g. thresholds for NMS. In this work, we propose a novel end-to-end trainable deep neural network architecture, which consists of convolutional and recurrent layers, that generates the correct number of object instances and their bounding boxes (or segmentation masks) given an image, using only a single network evaluation without any pre- or post-processing steps. We have tested on detecting digits in multi-digit images synthesized using MNIST, automatically segmenting digits in these images, and detecting cars in the KITTI benchmark dataset. The proposed approach outperforms a strong CNN baseline on the synthesized digits datasets and shows promising results on KITTI car detection.

قيم البحث

266 - Tianning Yuan 2021

Despite the substantial progress of active learning for image recognition, there still lacks an instance-level active learning method specified for object detection. In this paper, we propose Multiple Instance Active Object Detection (MI-AOD), to sel ect the most informative images for detector training by observing instance-level uncertainty. MI-AOD defines an instance uncertainty learning module, which leverages the discrepancy of two adversarial instance classifiers trained on the labeled set to predict instance uncertainty of the unlabeled set. MI-AOD treats unlabeled images as instance bags and feature anchors in images as instances, and estimates the image uncertainty by re-weighting instances in a multiple instance learning (MIL) fashion. Iterative instance uncertainty learning and re-weighting facilitate suppressing noisy instances, toward bridging the gap between instance uncertainty and image-level uncertainty. Experiments validate that MI-AOD sets a solid baseline for instance-level active learning. On commonly used object detection datasets, MI-AOD outperforms state-of-the-art methods with significant margins, particularly when the labeled sets are small. Code is available at https://github.com/yuantn/MI-AOD.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

211 - Hamed R. Tavakoli , Jorma Laaksonen 2017

This manuscript introduces the problem of prominent object detection and recognition inspired by the fact that human seems to priorities perception of scene elements. The problem deals with finding the most important region of interest, segmenting th e relevant item/object in that area, and assigning it an object class label. In other words, we are solving the three problems of saliency modeling, saliency detection, and object recognition under one umbrella. The motivation behind such a problem formulation is (1) the benefits to the knowledge representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together. We are foreseeing extending this problem formulation to fully semantically segmented scenes with instance object priority for high-level inferences in various applications including assistive vision. Along with a new problem definition, we also propose a method to achieve such a task. The proposed model predicts the most important area in the image, segments the associated objects, and labels them. The proposed problem and method are evaluated against human fixations, annotated segmentation masks, and object class categories. We define a chance level for each of the evaluation criterion to compare the proposed algorithm with. Despite the good performance of the proposed baseline, the overall evaluations indicate that the problem of prominent object detection and recognition is a challenging task that is still worth investigating further.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

89 - Tai-Yu Pan , Cheng Zhang , Yandong Li 2021

Vanilla models for object detection and instance segmentation suffer from the heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. I n this paper, we investigate a largely overlooked approach -- post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NorCal can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

222 - Fangyun Wei , Xiao Sun , Hongyang Li 2020

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image feature s extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation matches the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks. Code is available at url{https://github.com/FangyunWei/PointSetAnchor}

الرؤية الحاسوبية وتمييز الأنماط

Multi-lane Detection Using Instance Segmentation and Attentive Voting

486 - Donghoon Chang 2020

Autonomous driving is becoming one of the leading industrial research areas. Therefore many automobile companies are coming up with semi to fully autonomous driving solutions. Among these solutions, lane detection is one of the vital driver-assist fe atures that play a crucial role in the decision-making process of the autonomous vehicle. A variety of solutions have been proposed to detect lanes on the road, which ranges from using hand-crafted features to the state-of-the-art end-to-end trainable deep learning architectures. Most of these architectures are trained in a traffic constrained environment. In this paper, we propose a novel solution to multi-lane detection, which outperforms state of the art methods in terms of both accuracy and speed. To achieve this, we also offer a dataset with a more intuitive labeling scheme as compared to other benchmark datasets. Using our approach, we are able to obtain a lane segmentation accuracy of 99.87% running at 54.53 fps (average).

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو