MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

Published by: Siddharth Bhatia

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Siddharth Bhatia, Arjit Jain, Shivin Srivastava

Machine Learning, Artificial Intelligence

Given a stream of entries over time in a multi-aspect data setting where concept drift is present, how can we detect anomalous activities? Most of the existing unsupervised anomaly detection approaches seek to detect anomalous events in an offline fashion and require a large amount of data for training. This is not practical in real-life scenarios where we receive the data in a streaming manner and do not know the size of the stream beforehand. Thus, we need a data-efficient method that can detect and adapt to changing data trends, or concept drift, in an online manner. In this work, we propose MemStream, a streaming multi-aspect anomaly detection framework, allowing us to detect unusual events as they occur while being resilient to concept drift. We leverage the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove the optimum memory size required for effective drift handling. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning. Experimental results show the effectiveness of our approach compared to state-of-the-art streaming baselines using 2 synthetic datasets and 11 real-world datasets.
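The memory idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the scoring rule or the update policy, so everything here is an assumption made for illustration: records are scored by L1 distance to the nearest memory entry, the memory is a fixed-size first-in-first-out buffer, and a record is stored only when its score is at or below a hypothetical threshold `beta`. The denoising autoencoder is omitted, so raw records stand in for learned representations. This is not the authors' implementation.

```python
from collections import deque


class MemoryScorer:
    """Toy memory-based anomaly scorer (illustrative, not the authors' code)."""

    def __init__(self, warmup, memory_size, beta):
        # Assumption: the memory starts from a small batch of presumed-normal records.
        self.memory = deque((tuple(r) for r in warmup), maxlen=memory_size)
        self.beta = beta  # hypothetical update threshold

    def score(self, record):
        # Assumption: anomaly score = L1 distance to the nearest memory entry.
        return min(
            sum(abs(a - b) for a, b in zip(record, stored))
            for stored in self.memory
        )

    def process(self, record):
        s = self.score(record)
        # Only low-scoring records enter the memory (the oldest entry is evicted),
        # so the memory can follow gradual drift without absorbing outliers.
        if s <= self.beta:
            self.memory.append(tuple(record))
        return s


scorer = MemoryScorer(
    warmup=[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)], memory_size=3, beta=0.5
)
normal_score = scorer.process((0.1, 0.1))   # close to memory, gets stored
outlier_score = scorer.process((5.0, 5.0))  # far from memory, is not stored
```

Gating the update on the score is one simple way to limit memory poisoning: a record that already looks anomalous never becomes part of the reference set used to judge later records.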

Siddharth Bhatia, Arjit Jain, Pan Li (2020)

Given a stream of entries in a multi-aspect data setting, i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner? For example, in the intrusion detection setting, existing work seeks to detect anomalous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MSTREAM, which can detect unusual group anomalies as they occur, in a dynamic manner. MSTREAM has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, thus processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MSTREAM is evaluated over the KDDCUP99, CICIDS-DoS, UNSW-NB 15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines.

Machine Learning, Artificial Intelligence

PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams

Li Yang, Dimitrios Michael Manias, Abdallah Shami (2021)

As the number of Internet of Things (IoT) devices and systems has surged, IoT data analytics techniques have been developed to detect malicious cyber-attacks and secure IoT systems. However, concept drift often occurs in IoT data analytics, because IoT data typically arrives as dynamic data streams that change over time, causing model degradation and attack detection failure. Traditional data analytics models are static and cannot adapt to changes in the data distribution. In this paper, we propose a Performance Weighted Probability Averaging Ensemble (PWPAE) framework for drift-adaptive IoT anomaly detection through IoT data stream analytics. Experiments on two public datasets show the effectiveness of our proposed PWPAE method compared against state-of-the-art methods.
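The name "performance weighted probability averaging" suggests combining the class probabilities of several base learners, with better-performing learners counting more. The abstract gives no formula, so the sketch below rests on an assumed weighting: each learner's weight is the inverse of its recent error rate plus a small constant `eps`. The function name and parameters are hypothetical, and the real framework may weight its learners differently.

```python
def weighted_probability_average(probas, error_rates, eps=1e-3):
    """Combine per-learner class probabilities into one distribution.

    probas: one probability vector per base learner.
    error_rates: recent error rate of each learner (assumed to be tracked elsewhere).
    """
    # Assumption: lower error means higher weight; eps avoids division by zero.
    weights = [1.0 / (err + eps) for err in error_rates]
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]


# Learner 0 has been more accurate recently, so its vote dominates.
combined = weighted_probability_average(
    probas=[[0.9, 0.1], [0.2, 0.8]], error_rates=[0.1, 0.4]
)
```

Because the error rates can be re-estimated as the stream evolves, the weights shift toward whichever learners cope best with the current data distribution.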

Machine Learning, Artificial Intelligence, Cryptography and Security

Concept Drift Detection in Federated Networked Systems

Dimitrios Michael Manias, Ibrahim Shaer, Li Yang (2021)

As next-generation networks materialize, increasing levels of intelligence are required. Federated Learning has been identified as a key enabling technology of intelligent and distributed networks; however, it is prone to concept drift, as with any machine learning application. Concept drift directly affects the model's performance and can result in severe consequences considering the critical and emergency services provided by modern networks. To mitigate the adverse effects of drift, this paper proposes a concept drift detection system leveraging the federated learning updates provided at each iteration of the federated training process. Using dimensionality reduction and clustering techniques, a framework that isolates the system's drifted nodes is presented through experiments using an Intelligent Transportation System as a use case. The presented work demonstrates that the proposed framework is able to detect drifted nodes in a variety of non-iid scenarios at different stages of drift and different levels of system exposure.

Machine Learning, Networking and Internet Architecture

Real-Time Anomaly Detection in Edge Streams

Siddharth Bhatia, Rui Liu, Bryan Hooi (2020)

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose MIDAS-F, to solve the problem by which anomalies are incorporated into the algorithm's internal states, creating a 'poisoning' effect that can allow future anomalies to slip through undetected. MIDAS-F introduces two modifications: 1) we modify the anomaly scoring function, aiming to reduce the 'poisoning' effect of newly arriving edges; 2) we introduce a conditional merge step, which updates the algorithm's data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the 'poisoning' effect. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data orders of magnitude faster than state-of-the-art approaches; (c) it provides up to 62% higher ROC-AUC than state-of-the-art approaches.
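The conditional merge step described above can be sketched in a simplified form. The assumptions are substantial: exact dictionary counts replace the constant-memory data structures, the score is a chi-squared style comparison of an edge's count in the current tick (`a`) against its accumulated count (`s`) over `t` ticks, and the class name and `threshold` parameter are hypothetical. The sketch shows only the merge idea and does not reproduce MIDAS-F.

```python
from collections import defaultdict


class EdgeBurstScorer:
    """Simplified burst scorer with a conditional merge (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold       # hypothetical merge threshold
        self.total = defaultdict(int)    # counts merged from earlier ticks
        self.current = defaultdict(int)  # counts within the current tick
        self.last_score = {}             # latest score per edge in this tick
        self.tick = 1

    def _merge(self):
        # Conditional merge: keep a tick's counts only if the edge looked normal.
        for edge, count in self.current.items():
            if self.last_score[edge] < self.threshold:
                self.total[edge] += count
        self.current.clear()
        self.last_score.clear()

    def process(self, src, dst, tick):
        if tick > self.tick:  # a new time tick has started
            self._merge()
            self.tick = tick
        edge = (src, dst)
        self.current[edge] += 1
        a = self.current[edge]
        s = self.total[edge] + a
        t = self.tick
        # Chi-squared style deviation of the current count from the mean rate.
        score = 0.0 if t == 1 else (a - s / t) ** 2 * t ** 2 / (s * (t - 1))
        self.last_score[edge] = score
        return score


scorer = EdgeBurstScorer(threshold=10.0)
for tick in (1, 2, 3):
    scorer.process("a", "b", tick)          # steady background traffic
burst = [scorer.process("a", "b", 4) for _ in range(50)]  # sudden burst
after = scorer.process("a", "b", 5)         # traffic returns to normal
```

In this run the burst at tick 4 scores above the threshold, so its 50 edges are never added to `total`. The accumulated history therefore still reflects the steady rate, and a later burst of the same kind would again stand out.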

Machine Learning, Social and Information Networks

CURIE: A Cellular Automaton for Concept Drift Detection

Jesus L. Lobo, Javier Del Ser, Eneko Osaba (2020)

Data stream mining extracts information from large quantities of data flowing fast and continuously (data streams). Such streams are usually affected by changes in the data distribution, giving rise to a phenomenon referred to as concept drift. Thus, learning models must detect and adapt to such changes, so as to exhibit good predictive performance after a drift has occurred. In this regard, the development of effective drift detection algorithms becomes a key factor in data stream mining. In this work we propose CURIE, a drift detector relying on cellular automata. Specifically, in CURIE the distribution of the data stream is represented in the grid of a cellular automaton, whose neighborhood rule can then be utilized to detect possible distribution changes over the stream. Computer simulations are presented and discussed to show that CURIE, when hybridized with other base learners, exhibits competitive behavior in terms of detection metrics and classification accuracy. CURIE is compared with well-established drift detectors over synthetic datasets with varying drift characteristics.

Machine Learning