ﻻ يوجد ملخص باللغة العربية
Current developments in temporal event or action localization usually target actions captured by a single camera. However, extensive events or actions in the wild may be captured as a sequence of shots by multiple cameras at different positions. In this paper, we propose a new and challenging task called multi-shot temporal event localization, and accordingly, collect a large scale dataset called MUlti-Shot EventS (MUSES). MUSES has 31,477 event instances for a total of 716 video hours. The core nature of MUSES is the frequent shot cuts, for an average of 19 shots per instance and 176 shots per video, which induces large intrainstance variations. Our comprehensive evaluations show that the state-of-the-art method in temporal action localization only achieves an mAP of 13.1% at IoU=0.5. As a minor contribution, we present a simple baseline approach for handling the intra-instance variations, which reports an mAP of 18.9% on MUSES and 56.9% on THUMOS14 at IoU=0.5. To facilitate research in this direction, we release the dataset and the project code at https://songbai.site/muses/ .
Video temporal action detection aims to temporally localize and recognize the action in untrimmed videos. Existing one-stage approaches mostly focus on unifying two subtasks, i.e., localization of action proposals and classification of each proposal
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks. Conventional work in temporal video segmentation and action detection focuses on localizing p
We study the problem of localizing audio-visual events that are both audible and visible in a video. Existing works focus on encoding and aligning audio and visual features at the segment level while neglecting informative correlation between segment
Temporal Color Constancy (CC) is a recently proposed approach that challenges the conventional single-frame color constancy. The conventional approach is to use a single frame - shot frame - to estimate the scene illumination color. In temporal CC, m
In ophthalmology, early fundus screening is an economic and effective way to prevent blindness caused by ophthalmic diseases. Clinically, due to the lack of medical resources, manual diagnosis is time-consuming and may delay the condition. With the d