Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge


Posted by Songyang Zhang

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Songyang Zhang, Lin Song, Songtao Liu

Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence


Abstract

In this report, we introduce our real-time 2D object detection system for realistic autonomous driving scenarios. Our detector is built on a newly designed YOLO model called YOLOX. On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing the second-place entry by 7.8 and 6.1 points on the detection-only track and the full-stack track, respectively. Moreover, when accelerated with TensorRT, our model reaches an inference speed of 30 FPS with a high-resolution input size (e.g., 1440-2304). Code and models will be available at https://github.com/Megvii-BaseDetection/YOLOX
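Streaming AP, as defined for the streaming perception benchmark, scores a detector against the world state at the moment each ground-truth frame is queried, so a model slower than the frame rate is evaluated on stale outputs. The following is a minimal sketch of that pairing step only, assuming each prediction carries the time at which its computation finished; the function name `match_streaming`, the 40 ms latency, and the data layout are hypothetical, and the ordinary AP computation that follows the pairing is omitted.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def match_streaming(
    gt_times: Sequence[float],
    pred_finish_times: Sequence[float],
) -> List[Optional[int]]:
    """For each ground-truth timestamp, return the index of the latest
    prediction that had finished by that time, or None if none had.

    pred_finish_times must be sorted in ascending order.
    """
    matches: List[Optional[int]] = []
    for t in gt_times:
        # k = number of predictions finished at or before time t.
        k = bisect_right(pred_finish_times, t)
        matches.append(k - 1 if k > 0 else None)
    return matches


# Times in milliseconds. At 30 FPS a new frame arrives every 1000 / 30 ≈ 33.3 ms,
# so a detector slower than that falls behind and is scored on stale output.
gt_times = [i * 1000 / 30 for i in range(6)]
# Hypothetical detector with 40 ms latency, finishing one result every 40 ms.
pred_finish = [40 * (j + 1) for j in range(6)]
print(match_streaming(gt_times, pred_finish))  # [None, None, 0, 1, 2, 3]
```

In this toy run, the frame queried at about 166.7 ms is scored against prediction 3, which was computed from an older frame. This lag is why a 30 FPS inference budget matters for the streaming metric.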


Saravanabalagi Ramachandran, Ganesh Sistu, John McDonald (2021)

We present the WoodScape fisheye semantic segmentation challenge for autonomous driving, which was held as part of the CVPR 2021 Workshop on Omnidirectional Computer Vision (OmniCV). This challenge is one of the first opportunities for the research community to evaluate semantic segmentation techniques targeted at fisheye camera perception. Due to strong radial distortion, standard models do not generalize well to fisheye images; hence, the deformations in the visual appearance of objects and entities need to be encoded implicitly or as explicit knowledge. This challenge served as a medium to investigate the challenges of, and new methodologies for, handling the complexities of perception on fisheye images. The challenge was hosted on CodaLab and used the recently released WoodScape dataset, comprising 10k samples. In this paper, we provide a summary of the competition, which attracted 71 teams worldwide and a total of 395 submissions. The top teams recorded significantly improved mean IoU and accuracy scores over the baseline PSPNet with a ResNet-50 backbone. We summarize the methods of the winning algorithms and analyze the failure cases. We conclude by providing future directions for the research.
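The entries above are compared by mean IoU and accuracy. Below is a minimal sketch of how both metrics are commonly computed from a per-class confusion matrix. It assumes integer label maps whose values all lie in `[0, num_classes)`, and it skips classes that occur in neither the ground truth nor the prediction. The function names are hypothetical, and the official challenge scorer may differ, for example in how it treats void pixels.

```python
from typing import Tuple

import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Pixel counts with rows = ground-truth class, columns = predicted class.

    Assumes every label in gt and pred lies in [0, num_classes).
    """
    idx = gt.astype(np.int64).ravel() * num_classes + pred.astype(np.int64).ravel()
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou_and_accuracy(cm: np.ndarray) -> Tuple[float, float]:
    tp = np.diag(cm).astype(np.float64)
    # Union per class = TP + FP + FN = column sum + row sum - TP.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    # Skip classes that appear in neither the ground truth nor the prediction.
    valid = union > 0
    miou = float(np.mean(tp[valid] / union[valid]))
    pixel_acc = float(tp.sum() / cm.sum())
    return miou, pixel_acc


gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, acc = mean_iou_and_accuracy(confusion_matrix(gt, pred, num_classes=3))
print(round(miou, 4), round(acc, 4))  # 0.5 0.6667
```

Accumulating a single confusion matrix over the whole test set before computing IoU, rather than averaging per-image IoU, is one common convention. It is an assumption here, not a detail stated in the abstract.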

Subjects: Computer Vision and Pattern Recognition; Robotics

Generic Event Boundary Detection Challenge at CVPR 2021 Technical Report: Cascaded Temporal Attention Network (CASTANET)

Dexiang Hong, Congcong Li, Longyin Wen (2021)

This report presents the approach used in our submission to the Generic Event Boundary Detection (GEBD) Challenge at CVPR 2021. In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which consists of three parts: the backbone network, the temporal attention module, and the classification module. Specifically, the Channel-Separated Convolutional Network (CSN) is used as the backbone network to extract features, and the temporal attention module is designed to force the network to focus on discriminative features. The classification module then uses a cascaded architecture to generate more accurate boundaries. In addition, an ensemble strategy is used to further improve the performance of the proposed method. The proposed method achieves an 83.30% F1 score on the Kinetics-GEBD test set, a 20.5% F1 improvement over the baseline method. Code is available at https://github.com/DexiangHong/Cascade-PC.
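The 83.30% figure is an F1 score over predicted event boundaries. Below is a simplified sketch of boundary-level F1. It assumes that a prediction counts as correct when it lies within an absolute tolerance `tol` of a ground-truth boundary that has not yet been matched, using greedy one-to-one matching. The function name, the tolerance, and the matching rule are assumptions for illustration. The official Kinetics-GEBD evaluation protocol may define the tolerance and the matching differently.

```python
from typing import Sequence


def boundary_f1(pred: Sequence[float], gt: Sequence[float], tol: float) -> float:
    """Greedy one-to-one matching of predicted to ground-truth boundary
    timestamps; a match requires |pred - gt| <= tol."""
    unmatched = sorted(gt)
    tp = 0
    for p in sorted(pred):
        # Closest ground-truth boundary that is still unmatched, if any.
        best = min(unmatched, key=lambda g: abs(g - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            tp += 1
    if tp == 0:
        return 0.0  # also avoids division by zero for empty inputs
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)


# Two of three predictions fall within 0.2 s of a ground-truth boundary.
print(round(boundary_f1([1.0, 2.9, 5.0], [1.1, 3.0, 7.0], tol=0.2), 4))  # 0.6667
```

One-to-one matching keeps a burst of near-duplicate predictions around a single true boundary from inflating precision. Only the first of them can claim that boundary.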

Subjects: Computer Vision and Pattern Recognition

Method Towards CVPR 2021 Image Matching Challenge

Xiaopeng Bi, Yu Chen, Xinyang Liu (2021)

This report describes the Megvii-3D team's approach to the CVPR 2021 Image Matching Workshop challenge.

Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report

Lijin Yang, Yifei Huang, Yusuke Sugano (2021)

In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition. Leveraging multiple modalities has been shown to benefit the Unsupervised Domain Adaptation (UDA) task. In this work, we present the Multi-Modal Mutual Enhancement Module (M3EM), a deep module that jointly considers information from multiple modalities to find the most transferable representations across domains. We achieve this by implementing two sub-modules that enhance each modality using the context of the other modalities. The first sub-module exchanges information across modalities through the semantic space, while the second sub-module finds the most transferable spatial region based on the consensus of all modalities.

Subjects: Computer Vision and Pattern Recognition

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

Ludan Ruan (2021)

Entities Object Localization (EOL) aims to evaluate how grounded or faithful a description is, and consists of caption generation and object grounding. Previous works tackle this problem by jointly training the two modules in a single framework, which limits the complexity of each module. Therefore, in this work, we propose to divide these two modules into two stages and improve each of them separately to boost the performance of the whole system. For caption generation, we propose a Unified Multi-modal Pre-training Model (UMPM) to generate event descriptions with rich objects for better localization. For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful. Our overall system achieves state-of-the-art performance on both sub-tasks of the Entities Object Localization challenge at ActivityNet 2021, with 72.57 localization accuracy on the test set of sub-task I and 0.2477 F1_all_per_sent on the hidden test set of sub-task II.

Subjects: Computer Vision and Pattern Recognition