ﻻ يوجد ملخص باللغة العربية
We report on CMU Informedia Labs system used in Googles YouTube 8 Million Video Understanding Challenge. In this multi-label video classification task, our pipeline achieved 84.675% and 84.662% GAP on our evaluation split and the official test set. We attribute the good performance to three components: 1) Refined video representation learning with residual links and hypercolumns 2) Latent concept mining which captures interactions among concepts. 3) Learning with temporal segments and weighted multi-model ensemble. We conduct experiments to validate and analyze the contribution of our models. We also share some unsuccessful trials leveraging conventional approaches such as recurrent neural networks for video representation learning for this large-scale video dataset. All the codes to reproduce our results are publicly available at https://github.com/Martini09/informedia-yt8m-release.
We propose to leverage a generic object tracker in order to perform object mining in large-scale unlabeled videos, captured in a realistic automotive setting. We present a dataset of more than 360000 automatically mined object tracks from 10+ hours o
Videos are a rich source of high-dimensional structured data, with a wide range of interacting components at varying levels of granularity. In order to improve understanding of unconstrained internet videos, it is important to consider the role of la
This paper addresses the problem of object discovery from unlabeled driving videos captured in a realistic automotive setting. Identifying recurring object categories in such raw video streams is a very challenging problem. Not only do object candida
This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset. We employ a large set
The natural association between visual observations and their corresponding sound provides powerful self-supervisory signals for learning video representations, which makes the ever-growing amount of online videos an attractive source of training dat