Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection

75 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Chen Chen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Weiping Yu - Taojiannan Yang - Chen Chen

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Existing methods for object detection in UAV images ignored an important challenge - imbalanced class distribution in UAV images - which leads to poor performance on tail classes. We systematically investigate existing solutions to long-tail problems and unveil that re-balancing methods that are effective on natural image datasets cannot be trivially applied to UAV datasets. To this end, we rethink long-tailed object detection in UAV images and propose the Dual Sampler and Head detection Network (DSHNet), which is the first work that aims to resolve long-tail distribution in UAV images. The key components in DSHNet include Class-Biased Samplers (CBS) and Bilateral Box Heads (BBH), which are developed to cope with tail classes and head classes in a dual-path manner. Without bells and whistles, DSHNet significantly boosts the performance of tail classes on different detection frameworks. Moreover, DSHNet significantly outperforms base detectors and generic approaches for long-tail problems on VisDrone and UAVDT datasets. It achieves new state-of-the-art performance when combining with image cropping methods. Code is available at https://github.com/we1pingyu/DSHNet

قيم البحث

189 - Xiaowei Xu , Xinyi Zhang , Bei Yu 2018

The 55th Design Automation Conference (DAC) held its first System Design Contest (SDC) in 2018. SDC18 features a lower power object detection challenge (LPODC) on designing and implementing novel algorithms based object detection in images taken from unmanned aerial vehicles (UAV). The dataset includes 95 categories and 150k images, and the hardware platforms include Nvidias TX2 and Xilinxs PYNQ Z1. DAC-SDC18 attracted more than 110 entries from 12 countries. This paper presents in detail the dataset and evaluation procedure. It further discusses the methods developed by some of the entries as well as representative results. The paper concludes with directions for future improvements.

الرؤية الحاسوبية وتمييز الأنماط

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

98 - Songyang Zhang , Zeming Li , Shipeng Yan 2021

Despite the recent success of deep neural networks, it remains challenging to effectively model the long-tail class distribution in visual recognition tasks. To address this problem, we first investigate the performance bottleneck of the two-stage le arning framework via ablative study. Motivated by our discovery, we propose a unified distribution alignment strategy for long-tail visual recognition. Specifically, we develop an adaptive calibration function that enables us to adjust the classification scores for each data point. We then introduce a generalized re-weight method in the two-stage learning to balance the class prior, which provides a flexible and unified solution to diverse scenarios in visual recognition tasks. We validate our method by extensive experiments on four tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework. The code and models will be made publicly available at: https://github.com/Megvii-BaseDetection/DisAlign

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Dense Multiscale Feature Fusion Pyramid Networks for Object Detection in UAV-Captured Images

151 - Yingjie Liu 2020

Although much significant progress has been made in the research field of object detection with deep learning, there still exists a challenging task for the objects with small size, which is notably pronounced in UAV-captured images. Addressing these issues, it is a critical need to explore the feature extraction methods that can extract more sufficient feature information of small objects. In this paper, we propose a novel method called Dense Multiscale Feature Fusion Pyramid Networks(DMFFPN), which is aimed at obtaining rich features as much as possible, improving the information propagation and reuse. Specifically, the dense connection is designed to fully utilize the representation from the different convolutional layers. Furthermore, cascade architecture is applied in the second stage to enhance the localization capability. Experiments on the drone-based datasets named VisDrone-DET suggest a competitive performance of our method.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

A Competitive Method to VIPriors Object Detection Challenge

328 - Fei Shen , Xin He , Mengwan Wei 2021

In this report, we introduce the technical details of our submission to the VIPriors object detection challenge. Our solution is based on mmdetction of a strong baseline open-source detection toolbox. Firstly, we introduce an effective data augmentat ion method to address the lack of data problem, which contains bbox-jitter, grid-mask, and mix-up. Secondly, we present a robust region of interest (ROI) extraction method to learn more significant ROI features via embedding global context features. Thirdly, we propose a multi-model integration strategy to refinement the prediction box, which weighted boxes fusion (WBF). Experimental results demonstrate that our approach can significantly improve the average precision (AP) of object detection on the subset of the COCO2017 dataset.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation

76 - Tao Wang , Yu Li , Bingyi Kang 2020

Most existing object instance detection and segmentation models only work well on fairly balanced benchmarks where per-category training sample numbers are comparable, such as COCO. They tend to suffer performance drop on realistic datasets that are usually long-tailed. This work aims to study and address such open challenges. Specifically, we systematically investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals. Based on such an observation, we first consider various techniques for improving long-tail classification performance which indeed enhance instance segmentation results. We then propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach. Without bells and whistles, it significantly boosts the performance of instance segmentation for tail classes on the recent LVIS dataset and our sampled COCO-LT dataset. Our analysis provides useful insights for solving long-tail instance detection and segmentation problems, and the straightforward emph{SimCal} method can serve as a simple but strong baseline. With the method we have won the 2019 LVIS challenge. Codes and models are available at https://github.com/twangnh/SimCal.

الرؤية الحاسوبية وتمييز الأنماط