Due to high complexity and heavy occlusion, insufficient perception at crowded urban intersections poses a serious safety risk for both human drivers and autonomous driving algorithms, and the Cooperative Vehicle Infrastructure System (CVIS) has been proposed as a solution for full-participant perception in this scenario. However, research on roadside multimodal perception is still in its infancy, and no open-source dataset exists for this scenario. This paper fills that gap. Using an IPS (Intersection Perception System) installed at diagonal corners of the intersection, this paper presents a high-quality multimodal dataset for the intersection perception task. The center of the experimental intersection covers an area of 3,000 m², and the extended perception range reaches 300 m, which is typical for CVIS. The first batch of open-source data comprises 14,198 frames, each with an average of 319.84 labels, 9.6 times more than the most crowded dataset to date (H3D, 2019). To facilitate further study, the label format is kept consistent with the KITTI dataset, and a standardized benchmark is provided for algorithm evaluation. Our dataset is available at: http://www.openmpd.com/column/other_datasets.
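Since the labels are kept consistent with the KITTI format, a minimal sketch of loading a per-frame label file is shown below. It assumes the standard 15-field KITTI label layout; the file path in the usage comment is hypothetical, as the abstract does not specify the dataset's directory structure.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class KittiLabel:
    """One object annotation in standard KITTI label format."""
    type: str          # object class, e.g. 'Car', 'Pedestrian'
    truncated: float   # 0.0 (fully visible) .. 1.0 (fully truncated)
    occluded: int      # 0 = visible, 1 = partly, 2 = largely occluded, 3 = unknown
    alpha: float       # observation angle in [-pi, pi]
    bbox: tuple        # 2D image box: (left, top, right, bottom) in pixels
    dimensions: tuple  # 3D size: (height, width, length) in meters
    location: tuple    # 3D center (x, y, z) in camera coordinates, meters
    rotation_y: float  # yaw around the camera Y axis in [-pi, pi]

def load_kitti_labels(path: Path) -> list:
    """Parse one per-frame label file, one object per line."""
    labels = []
    for line in path.read_text().splitlines():
        f = line.split()
        if len(f) < 15:
            continue  # skip empty or malformed lines
        labels.append(KittiLabel(
            type=f[0],
            truncated=float(f[1]),
            occluded=int(float(f[2])),
            alpha=float(f[3]),
            bbox=tuple(map(float, f[4:8])),
            dimensions=tuple(map(float, f[8:11])),
            location=tuple(map(float, f[11:14])),
            rotation_y=float(f[14]),
        ))
    return labels

# Hypothetical usage: count objects in one frame to check the reported
# label density (the abstract cites an average of 319.84 labels per frame).
# labels = load_kitti_labels(Path("label_2/000000.txt"))
# print(len(labels))
```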
With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection. Existing datasets either represent sim…
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real-world data. Existing self-driving datasets are limited in the scale and variation of the environments they…
While significant advancements have been made in the generation of deepfakes using deep learning technologies, their misuse is now a well-known issue. Deepfakes can cause severe security and privacy issues, as they can be used to impersonate a person's i…
Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. Despite a few pr…
This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action l…