ﻻ يوجد ملخص باللغة العربية
Event-specific concepts are the semantic concepts designed for the events of interest, which can be used as a mid-level representation of complex events in videos. Existing methods only focus on defining event-specific concepts for a small number of predefined events, but cannot handle novel unseen events. This motivates us to build a large scale event-specific concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing a large number of how-to articles on human daily life events. We perform a coarse-to-fine event discovery process and discover 500 events from WikiHow articles. Then we use each event name as query to search YouTube and discover event-specific concepts from the tags of returned videos. After an automatic filter process, we end up with 95,321 videos and 4,490 concepts. We train a Convolutional Neural Network (CNN) model on the 95,321 videos over the 500 events, and use the model to extract deep learning feature from video content. With the learned deep learning feature, we train 4,490 binary SVM classifiers as the event-specific concept library. The concepts and events are further organized in a hierarchical structure defined by WikiHow, and the resultant concept library is called EventNet. Finally, the EventNet concept library is used to generate concept based representation of event videos. To the best of our knowledge, EventNet represents the first video event ontology that organizes events and their concepts into a semantic structure. It offers great potential for event retrieval and browsing. Extensive experiments over the zero-shot event retrieval task when no training samples are available show that the EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin up to 207%.
We report on CMU Informedia Labs system used in Googles YouTube 8 Million Video Understanding Challenge. In this multi-label video classification task, our pipeline achieved 84.675% and 84.662% GAP on our evaluation split and the official test set. W
Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes. A complex event relates to dense crowds, anomalous, or collective behaviors. H
In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional Neural Netw
We propose to leverage a generic object tracker in order to perform object mining in large-scale unlabeled videos, captured in a realistic automotive setting. We present a dataset of more than 360000 automatically mined object tracks from 10+ hours o
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring