ﻻ يوجد ملخص باللغة العربية
Detecting complex events in a large video collection crawled from video websites is a challenging task. When applying directly good image-based feature representation, e.g., HOG, SIFT, to videos, we have to face the problem of how to pool multiple frame feature representations into one feature representation. In this paper, we propose a novel learning-based frame pooling method. We formulate the pooling weight learning as an optimization problem and thus our method can automatically learn the best pooling weight configuration for each specific event category. Experimental results conducted on TRECVID MED 2011 reveal that our method outperforms the commonly used average pooling and max pooling strategies on both high-level and low-level 2D image features.
Early wildfire detection is of paramount importance to avoid as much damage as possible to the environment, properties, and lives. Deep Learning (DL) models that can leverage both visible and infrared information have the potential to display state-o
In recent years, the involvement of synthetic strongly labeled data,weakly labeled data and unlabeled data has drawn much research attentionin semi-supervised sound event detection (SSED). Self-training models carry out predictions without strong ann
Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Much research turns to address the problem of how to detect both the types and the timestamps of sound events with weak labels that on
Surface defect detection plays an increasingly important role in manufacturing industry to guarantee the product quality. Many deep learning methods have been widely used in surface defect detection tasks, and have been proven to perform well in defe
Event cameras are activity-driven bio-inspired vision sensors, thereby resulting in advantages such as sparsity,high temporal resolution, low latency, and power consumption. Given the different sensing modality of event camera and high quality of con