ﻻ يوجد ملخص باللغة العربية
Video Visual Relation Detection (VidVRD), has received significant attention of our community over recent years. In this paper, we apply the state-of-the-art video object tracklet detection pipeline MEGA and deepSORT to generate tracklet proposals. Then we perform VidVRD in a tracklet-based manner without any pre-cutting operations. Specifically, we design a tracklet-based visual Transformer. It contains a temporal-aware decoder which performs feature interactions between the tracklets and learnable predicate query embeddings, and finally predicts the relations. Experimental results strongly demonstrate the superiority of our method, which outperforms other methods by a large margin on the Video Relation Understanding (VRU) Grand Challenge in ACM Multimedia 2021. Codes are released at https://github.com/Dawn-LX/VidVRD-tracklets.
Existing object detection frameworks are usually built on a single format of object/part representation, i.e., anchor/proposal rectangle boxes in RetinaNet and Faster R-CNN, center points in FCOS and RepPoints, and corner points in CornerNet. While t
Video editing tools are widely used nowadays for digital design. Although the demand for these tools is high, the prior knowledge required makes it difficult for novices to get started. Systems that could follow natural language instructions to perfo
Crowd estimation is a very challenging problem. The most recent study tries to exploit auditory information to aid the visual models, however, the performance is limited due to the lack of an effective approach for feature extraction and integration.
Detecting abnormal activities in real-world surveillance videos is an important yet challenging task as the prior knowledge about video anomalies is usually limited or unavailable. Despite that many approaches have been developed to resolve this prob
Autonomous highlight detection is crucial for enhancing the efficiency of video browsing on social media platforms. To attain this goal in a data-driven way, one may often face the situation where highlight annotations are not available on the target