ﻻ يوجد ملخص باللغة العربية
Recognizing Video events in long, complex videos with multiple sub-activities has received persistent attention recently. This task is more challenging than traditional action recognition with short, relatively homogeneous video clips. In this paper, we investigate the problem of recognizing long and complex events with varying action rhythms, which has not been considered in the literature but is a practical challenge. Our work is inspired in part by how humans identify events with varying rhythms: quickly catching frames contributing most to a specific event. We propose a two-stage emph{end-to-end} framework, in which the first stage selects the most significant frames while the second stage recognizes the event using the selected frames. Our model needs only emph{event-level labels} in the training stage, and thus is more practical when the sub-activity labels are missing or difficult to obtain. The results of extensive experiments show that our model can achieve significant improvement in event recognition from long videos while maintaining high accuracy even if the test videos suffer from severe rhythm changes. This demonstrates the potential of our method for real-world video-based applications, where test and training videos can differ drastically in rhythms of sub-activities.
As a spontaneous expression of emotion on face, micro-expression reveals the underlying emotion that cannot be controlled by human. In micro-expression, facial movement is transient and sparsely localized through time. However, the existing represent
This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly changing across both space and time. Prior methods mainly capture the temporal state transitions but overlook the complex spatiotemp
As the GAN-based face image and video generation techniques, widely known as DeepFakes, have become more and more matured and realistic, there comes a pressing and urgent demand for effective DeepFakes detectors. Motivated by the fact that remote vis
Recognizing attributes of objects and their parts is important to many computer vision applications. Although great progress has been made to apply object-level recognition, recognizing the attributes of parts remains less applicable since the traini
As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural network (DNN). However, existing methods usually follow a reconstruction or frame prediction routine. They suffer from two gaps