A Survey on Deep Learning Techniques for Video Anomaly Detection

115 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jessie James Suarez

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jessie James P. Suarez - Prospero C. Naval Jr

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Anomaly detection in videos is a problem that has been studied for more than a decade. This area has piqued the interest of researchers due to its wide applicability. Because of this, there has been a wide array of approaches that have been proposed throughout the years and these approaches range from statistical-based approaches to machine learning-based approaches. Numerous surveys have already been conducted on this area but this paper focuses on providing an overview on the recent advances in the field of anomaly detection using Deep Learning. Deep Learning has been applied successfully in many fields of artificial intelligence such as computer vision, natural language processing and more. This survey, however, focuses on how Deep Learning has improved and provided more insights to the area of video anomaly detection. This paper provides a categorization of the different Deep Learning approaches with respect to their objectives. Additionally, it also discusses the commonly used datasets along with the common evaluation metrics. Afterwards, a discussion synthesizing all of the recent approaches is made to provide direction and possible areas for future research.

قيم البحث

113 - Sergiu Oprea , Pablo Martinez-Gonzalez , Alberto Garcia-Garcia 2020

The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising re search direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review on the deep learning methods for prediction in video sequences. We firstly define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied with experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper is summarized by drawing some general conclusions, identifying open research challenges and by pointing out future research directions.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events

180 - Guang Yu , Siqi Wang , Zhiping Cai 2020

As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural network (DNN). However, existing methods usually follow a reconstruction or frame prediction routine. They suffer from two gaps : (1) They cannot localize video activities in a both precise and comprehensive manner. (2) They lack sufficient abilities to utilize high-level semantics and temporal context information. Inspired by frequently-used cloze test in language study, we propose a brand-new VAD solution named Video Event Completion (VEC) to bridge gaps above: First, we propose a novel pipeline to achieve both precise and comprehensive enclosure of video activities. Appearance and motion are exploited as mutually complimentary cues to localize regions of interest (RoIs). A normalized spatio-temporal cube (STC) is built from each RoI as a video event, which lays the foundation of VEC and serves as a basic processing unit. Second, we encourage DNN to capture high-level semantics by solving a visual cloze test. To build such a visual cloze test, a certain patch of STC is erased to yield an incomplete event (IE). The DNN learns to restore the original video event from the IE by inferring the missing patch. Third, to incorporate richer motion dynamics, another DNN is trained to infer erased patches optical flow. Finally, two ensemble strategies using different types of IE and modalities are proposed to boost VAD performance, so as to fully exploit the temporal context and modality information for VAD. VEC can consistently outperform state-of-the-art methods by a notable margin (typically 1.5%-5% AUROC) on commonly-used VAD benchmarks. Our codes and results can be verified at github.com/yuguangnudt/VEC_VAD.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

89 - Mariana-Iuliana Georgescu , Antonio Barbalau , Radu Tudor Ionescu 2020

Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

A Survey on Deep Learning Technique for Video Segmentation

103 - Wenguan Wang , Tianfei Zhou , Fatih Porikli 2021

Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, e.g., visual effect assistance in movie, scene understanding in autonomous driving, and virtual b ackground creation in video conferencing, to name a few. Recently, due to the renaissance of connectionism in computer vision, there has been an influx of numerous deep learning based approaches that have been dedicated to video segmentation and delivered compelling performance. In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. At last, we point out a set of unsolved open issues in this field, and suggest possible opportunities for further research.

الرؤية الحاسوبية وتمييز الأنماط

A Survey of Single-Scene Video Anomaly Detection

275 - Bharathkumar Ramachandra , Michael J. Jones , Ranga Raju Vatsavai 2020

This survey article summarizes research trends on the topic of anomaly detection in video feeds of a single scene. We discuss the various problem formulations, publicly available datasets and evaluation criteria. We categorize and situate past resear ch into an intuitive taxonomy and provide a comprehensive comparison of the accuracy of many algorithms on standard test sets. Finally, we also provide best practices and suggest some possible directions for future research.

الرؤية الحاسوبية وتمييز الأنماط