Subscribe to the gold package and get unlimited access to Shamra Academy

MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

129 0 0.0 ( 0 )

Download Cite

Added by Siddharth Bhatia

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Siddharth Bhatia - Bryan Hooi - Minji Yoon

Machine Learning Artificial Intelligence

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 162-644 times faster than state-of-the-art approaches; (c) it provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches.

rate research

Computing Graph Descriptors on Edge Streams

304 - Zohair Raza Hassan , Imdadullah Khan , Mudassir Shabbir 2021

Graph feature extraction is a fundamental task in graphs analytics. Using feature vectors (graph descriptors) in tandem with data mining algorithms that operate on Euclidean data, one can solve problems such as classification, clustering, and anomaly detection on graph-structured data. This idea has proved fruitful in the past, with spectral-based graph descriptors providing state-of-the-art classification accuracy on benchmark datasets. However, these algorithms do not scale to large graphs since: 1) they require storing the entire graph in memory, and 2) the end-user has no control over the algorithms runtime. In this paper, we present single-pass streaming algorithms to approximate structural features of graphs (counts of subgraphs of order $k geq 4$). Operating on edge streams allows us to avoid keeping the entire graph in memory, and controlling the sample size enables us to control the time taken by the algorithm. We demonstrate the efficacy of our descriptors by analyzing the approximation error, classification accuracy, and scalability to massive graphs. Our experiments showcase the effect of the sample size on approximation error and predictive accuracy. The proposed descriptors are applicable on graphs with millions of edges within minutes and outperform the state-of-the-art descriptors in classification accuracy.

Machine Learning Artificial Intelligence Data Structures and Algorithms

Real-Time Anomaly Detection in Edge Streams

313 - Siddharth Bhatia , Rui Liu , Bryan Hooi 2020

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose MIDAS-F, to solve the problem by which anomalies are incorporated into the algorithms internal states, creating a `poisoning effect that can allow future anomalies to slip through undetected. MIDAS-F introduces two modifications: 1) We modify the anomaly scoring function, aiming to reduce the `poisoning effect of newly arriving edges; 2) We introduce a conditional merge step, which updates the algorithms data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the `poisoning effect. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data orders-of-magnitude faster than state-of-the-art approaches; (c) it provides up to 62% higher ROC-AUC than state-of-the-art approaches.

Machine Learning Social and Information Networks Machine Learning

MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

233 - Siddharth Bhatia , Arjit Jain , Shivin Srivastava 2021

Given a stream of entries over time in a multi-aspect data setting where concept drift is present, how can we detect anomalous activities? Most of the existing unsupervised anomaly detection approaches seek to detect anomalous events in an offline fashion and require a large amount of data for training. This is not practical in real-life scenarios where we receive the data in a streaming manner and do not know the size of the stream beforehand. Thus, we need a data-efficient method that can detect and adapt to changing data trends, or concept drift, in an online manner. In this work, we propose MemStream, a streaming multi-aspect anomaly detection framework, allowing us to detect unusual events as they occur while being resilient to concept drift. We leverage the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove the optimum memory size required for effective drift handling. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning. Experimental results show the effectiveness of our approach compared to state-of-the-art streaming baselines using 2 synthetic datasets and 11 real-world datasets.

Machine Learning Artificial Intelligence

Edge Intelligence for Empowering IoT-based Healthcare Systems

65 - Vahideh Hayyolalam , Moayad Aloqaily , Oznur Ozkasap 2021

The demand for real-time, affordable, and efficient smart healthcare services is increasing exponentially due to the technological revolution and burst of population. To meet the increasing demands on this critical infrastructure, there is a need for intelligent methods to cope with the existing obstacles in this area. In this regard, edge computing technology can reduce latency and energy consumption by moving processes closer to the data sources in comparison to the traditional centralized cloud and IoT-based healthcare systems. In addition, by bringing automated insights into the smart healthcare systems, artificial intelligence (AI) provides the possibility of detecting and predicting high-risk diseases in advance, decreasing medical costs for patients, and offering efficient treatments. The objective of this article is to highlight the benefits of the adoption of edge intelligent technology, along with AI in smart healthcare systems. Moreover, a novel smart healthcare model is proposed to boost the utilization of AI and edge technology in smart healthcare systems. Additionally, the paper discusses issues and research directions arising when integrating these different technologies together.

Machine Learning Artificial Intelligence Distributed Parallel and Cluster Computing

Imputing Missing Events in Continuous-Time Event Streams

97 - Hongyuan Mei , Guanghui Qin , Jason Eisner 2019

Events in the world may be caused by other, unobserved events. We consider sequences of events in continuous time. Given a probability model of complete sequences, we propose particle smoothing---a form of sequential importance sampling---to impute the missing events in an incomplete sequence. We develop a trainable family of proposal distributions based on a type of bidirectional continuous-time LSTM: Bidirectionality lets the proposals condition on future observations, not just on the past as in particle filtering. Our method can sample an ensemble of possible complete sequences (particles), from which we form a single consensus prediction that has low Bayes risk under our chosen loss metric. We experiment in multiple synthetic and real domains, using different missingness mechanisms, and modeling the complete sequences in each domain with a neural Hawkes process (Mei & Eisner 2017). On held-out incomplete sequences, our method is effective at inferring the ground-truth unobserved events, with particle smoothing consistently improving upon particle filtering.

Machine Learning Artificial Intelligence Machine Learning

MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions