Compact keyframe-based video summaries are a popular way of generating viewership on video sharing platforms. Yet creating relevant and compelling summaries for arbitrarily long videos with a small number of keyframes is a challenging task. We propose a comprehensive keyframe-based summarization framework combining deep convolutional neural networks and restricted Boltzmann machines. An original co-regularization scheme is used to discover meaningful subject-scene associations. The resulting multimodal representations are then used to select highly relevant keyframes. A comprehensive user study is conducted comparing our proposed method to a variety of schemes, including the summarization currently in use by one of the most popular video sharing websites. The results show that our method consistently outperforms the baseline schemes for any given number of keyframes in terms of both attractiveness and informativeness. The lead is even more significant for smaller summaries.
Video is an essential imaging modality for diagnostics, e.g. in ultrasound imaging, endoscopy, or movement assessment. However, video has not received much attention in the medical image analysis community. In clinical practice, it is chall
Audio and vision are the two main modalities in video data. Multimodal learning, especially audiovisual learning, has drawn considerable attention recently and can boost the performance of various computer vision tasks. However, in video summariza
Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization. Current approaches mainly focus on modeling the video as a frame sequence with recurrent neural networks. However, one potential limitation of t
A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes. Yet the importance of scenes in a video is often subjective, and users should have the option of customizing the summary by
We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specific