Incorporating Feedback into Tree-based Anomaly Detection

68 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shubhomoy Das

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shubhomoy Das - Weng-Keen Wong - Alan Fern

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, in realworld applications, this process can be exceedingly difficult for the analyst since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analysts job easier by allowing for analyst feedback during the investigation process. Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that we can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.

قيم البحث

325 - Vatsal Sharan , Parikshit Gopalan , Udi Wieder 2018

We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimens ionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results showing that emph{any} sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Dependency-based Anomaly Detection: Framework, Methods and Benchmark

271 - Sha Lu , Lin Liu , Jiuyong Li 2020

Anomaly detection is an important research problem because anomalies often contain critical insights for understanding the unusual behavior in data. One type of anomaly detection approach is dependency-based, which identifies anomalies by examining t he violations of the normal dependency among variables. These methods can discover subtle and meaningful anomalies with better interpretation. Existing dependency-based methods adopt different implementations and show different strengths and weaknesses. However, the theoretical fundamentals and the general process behind them have not been well studied. This paper proposes a general framework, DepAD, to provide a unified process for dependency-based anomaly detection. DepAD decomposes unsupervised anomaly detection tasks into feature selection and prediction problems. Utilizing off-the-shelf techniques, the DepAD framework can have various instantiations to suit different application domains. Comprehensive experiments have been conducted over one hundred instantiated DepAD methods with 32 real-world datasets to evaluate the performance of representative techniques in DepAD. To show the effectiveness of DepAD, we compare two DepAD methods with nine state-of-the-art anomaly detection methods, and the results show that DepAD methods outperform comparison methods in most cases. Through the DepAD framework, this paper gives guidance and inspiration for future research of dependency-based anomaly detection and provides a benchmark for its evaluation.

التعلم الآلي الذكاء الاصطناعي

MSTREAM: Fast Anomaly Detection in Multi-Aspect Streams

111 - Siddharth Bhatia , Arjit Jain , Pan Li 2020

Given a stream of entries in a multi-aspect data setting i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner? For example, in the intrusion detection setting, existing work seeks to detect anomal ous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MSTREAM which can detect unusual group anomalies as they occur, in a dynamic manner. MSTREAM has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, thus processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MSTREAM is evaluated over the KDDCUP99, CICIDS-DoS, UNSW-NB 15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Incorporating Reachability Knowledge into a Multi-Spatial Graph Convolution Based Seq2Seq Model for Traffic Forecasting

98 - Jiexia Ye , Furong Zheng , Juanjuan Zhao 2021

Accurate traffic state prediction is the foundation of transportation control and guidance. It is very challenging due to the complex spatiotemporal dependencies in traffic data. Existing works cannot perform well for multi-step traffic prediction th at involves long future time period. The spatiotemporal information dilution becomes serve when the time gap between input step and predicted step is large, especially when traffic data is not sufficient or noisy. To address this issue, we propose a multi-spatial graph convolution based Seq2Seq model. Our main novelties are three aspects: (1) We enrich the spatiotemporal information of model inputs by fusing multi-view features (time, location and traffic states) (2) We build multiple kinds of spatial correlations based on both prior knowledge and data-driven knowledge to improve model performance especially in insufficient or noisy data cases. (3) A spatiotemporal attention mechanism based on reachability knowledge is novelly designed to produce high-level features fed into decoder of Seq2Seq directly to ease information dilution. Our model is evaluated on two real world traffic datasets and achieves better performance than other competitors.

التعلم الآلي الذكاء الاصطناعي

Flow-based Anomaly Detection

183 - {L}ukasz Maziarka , Marek Smieja , Marcin Sendera 2020

We propose OneFlow - a flow-based one-class classifier for anomaly (outliers) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is caused by the fact that during training the gradient of the cost function is propagated only over the points located near to the decision boundary (behavior similar to the support vectors in SVM). The combination of flow models and Bernstein quantile estimator allows OneFlow to find a parametric form of bounding region, which can be useful in various applications including describing shapes from 3D point clouds. Experiments show that the proposed model outperforms related methods on real-world anomaly detection problems.

التعلم الآلي التعلم الالي