ﻻ يوجد ملخص باللغة العربية
In this paper, we describe the system for generating textual descriptions of short video clips using recurrent neural networks (RNN), which we used while participating in the Large Scale Movie Description Challenge 2015 in ICCV 2015. Our work builds on static image captioning systems with RNN based language models and extends this framework to videos utilizing both static image features and video-specific features. In addition, we study the usefulness of visual content classifiers as a source of additional information for caption generation. With experimental results we show that utilizing keyframe based features, dense trajectory video features and content classifier outputs together gives better performance than any one of them individually.
This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset. We employ a large set
Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independent
This report describes our solution for the VATEX Captioning Challenge 2020, which requires generating descriptions for the videos in both English and Chinese languages. We identified three crucial factors that improve the performance, namely: multi-v
We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset. Our model is based on the encoder--decoder pipeline, popular in image and video captioning systems. We pro
The Recurrent Neural Networks and their variants have shown promising performances in sequence modeling tasks such as Natural Language Processing. These models, however, turn out to be impractical and difficult to train when exposed to very high-dime