In the class of streaming anomaly detection algorithms for univariate time series, the size of the sliding window over which various statistics are calculated is an important parameter. To address anomalous variation in the scale of the pseudo-periodicity of a time series, we define a streaming multi-scale anomaly score by applying a streaming PCA to a multi-scale lag matrix. We define three methods for aggregating the multi-scale anomaly scores and evaluate their performance on the Yahoo! and Numenta benchmarks for unsupervised anomaly detection. To the best of the authors' knowledge, this is the first time multi-scale streaming anomaly detection has been proposed and systematically studied.
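As a hedged illustration of the ingredients involved (not the paper's exact score), the sketch below builds lag vectors at several window sizes, updates one incremental PCA model per scale, and scores each point by its reconstruction error. The window sizes, mini-batch size, use of scikit-learn's IncrementalPCA, and max-aggregation are all our assumptions.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def multiscale_scores(x, windows=(8, 16, 32), n_components=2, batch=64):
    """Per-scale anomaly score: PCA reconstruction error of the lag vector
    at each window size, with the PCA fitted incrementally (a sketch, not
    the paper's exact score)."""
    pcas = {w: IncrementalPCA(n_components=n_components) for w in windows}
    buffers = {w: [] for w in windows}
    scores = np.zeros((len(x), len(windows)))
    for t in range(max(windows) - 1, len(x)):
        for j, w in enumerate(windows):
            v = x[t - w + 1 : t + 1]              # lag vector at scale w
            buffers[w].append(v)
            if len(buffers[w]) == batch:          # update the streaming PCA
                pcas[w].partial_fit(np.array(buffers[w]))
                buffers[w] = []
            if hasattr(pcas[w], "components_"):   # score once a model exists
                r = pcas[w].inverse_transform(pcas[w].transform(v[None, :]))
                scores[t, j] = np.linalg.norm(v - r)
    return scores

# aggregate across scales by max -- one of several possible aggregations
x = np.sin(np.linspace(0, 60, 2000)) + 0.1 * np.random.randn(2000)
agg = multiscale_scores(x).max(axis=1)
```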
Nowadays, multi-sensor technologies are applied in many fields, e.g., Health Care (HC), Human Activity Recognition (HAR), and Industrial Control Systems (ICS). These sensors can generate a substantial amount of multivariate time-series data. Unsupervised anomaly detection on multi-sensor time-series data has proven critical in machine learning research. The key challenge is to discover generalized normal patterns by capturing spatial-temporal correlations in multi-sensor data. Beyond this challenge, noisy data are often intertwined with the training data, which is likely to mislead the model by making it hard to distinguish among normal, abnormal, and noisy data. Few previous studies jointly address these two challenges. In this paper, we propose a novel deep learning-based anomaly detection algorithm called the Deep Convolutional Autoencoding Memory network (CAE-M). We first build a deep convolutional autoencoder to characterize the spatial dependence of multi-sensor data, with a Maximum Mean Discrepancy (MMD) penalty to better distinguish among noisy, normal, and abnormal data. Then, we construct a memory network consisting of a linear prediction component (an autoregressive model) and a non-linear one (a bidirectional LSTM with attention) to capture temporal dependence in the time-series data. Finally, CAE-M jointly optimizes these two subnetworks. We empirically compare the proposed approach with several state-of-the-art anomaly detection methods on HAR and HC datasets. Experimental results demonstrate that our proposed model outperforms the existing methods.
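To make one ingredient concrete, the sketch below shows a standard MMD penalty that could pull an autoencoder's latent codes toward a reference distribution. The Gaussian kernel, its bandwidth, and the standard-normal reference are our assumptions; CAE-M's exact formulation may differ.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel matrix between the rows of a and b."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_loss(latent, target):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k_ll = gaussian_kernel(latent, latent).mean()
    k_tt = gaussian_kernel(target, target).mean()
    k_lt = gaussian_kernel(latent, target).mean()
    return k_ll + k_tt - 2 * k_lt

# usage: pull autoencoder codes toward a standard-normal reference sample
z = torch.randn(128, 16, requires_grad=True)   # stand-in for encoder output
ref = torch.randn(128, 16)                     # reference sample
loss = mmd_loss(z, ref)
loss.backward()
```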
The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.
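As a hedged sketch of the underlying idea (HYPA's actual hypergeometric ensemble over higher-order networks is more involved), one can count transitions between k-grams of nodes and score each transition by a hypergeometric tail probability derived from its endpoint weights:

```python
from collections import Counter
from scipy.stats import hypergeom

def hypa_like_scores(paths, k=2):
    """Score k-th order transitions by how unexpected their frequency is
    under a hypergeometric null model (a sketch of the idea only)."""
    edges = Counter()
    for p in paths:                       # count transitions between k-grams
        for i in range(len(p) - k):
            edges[(tuple(p[i:i + k]), tuple(p[i + 1:i + k + 1]))] += 1
    m = sum(edges.values())               # total number of transitions
    out_w, in_w = Counter(), Counter()
    for (u, v), f in edges.items():
        out_w[u] += f
        in_w[v] += f
    scores = {}
    for (u, v), f in edges.items():
        # P(count <= f) when out_w[u] draws hit in_w[v] successes out of m
        scores[(u, v)] = hypergeom.cdf(f, m, in_w[v], out_w[u])
    return scores

paths = [list("abcd"), list("abce"), list("abcd"), list("xbcd")]
for edge, s in hypa_like_scores(paths).items():
    print(edge, round(s, 3))
```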
On-line detection of anomalies in time series is a key technique used in various event-sensitive scenarios such as robotic system monitoring, smart sensor networks, and data center security. However, the increasing diversity of data sources and the variety of demands make this task more challenging than ever. Firstly, the rapid increase in unlabeled data means supervised learning is becoming less suitable in many cases. Secondly, a large portion of time series data exhibits complex seasonality. Thirdly, on-line anomaly detection needs to be fast and reliable. In light of this, we have developed a prediction-driven, unsupervised anomaly detection scheme, which adopts a backbone model combining decomposition and inference of time-series data. Further, we propose a novel metric, Local Trend Inconsistency (LTI), and an efficient detection algorithm that computes LTI in real time and robustly scores each data point by its probability of being anomalous. We have conducted extensive experiments to evaluate our algorithm on several datasets from both public repositories and production environments. The experimental results show that our scheme outperforms representative existing anomaly detection algorithms in terms of the commonly used metric, Area Under the Curve (AUC), while achieving the desired efficiency.
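The paper's precise definition of LTI is not reproduced here; as a hedged sketch of the general idea, one can seasonally adjust the series, extrapolate a local linear trend, and score each point by its disagreement with that extrapolation. The STL decomposition, window length, and linear extrapolation are our assumptions.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def lti_like_scores(x, period, w=10):
    """Score each point by its deviation from the locally extrapolated trend
    after seasonal adjustment (inspired by LTI, not its exact definition)."""
    res = STL(x, period=period).fit()
    deseason = x - res.seasonal
    scores = np.zeros(len(x))
    for t in range(w, len(x)):
        window = deseason[t - w:t]
        slope = np.polyfit(np.arange(w), window, 1)[0]  # local linear trend
        pred = window[-1] + slope                        # one-step extrapolation
        scores[t] = abs(deseason[t] - pred)
    return scores

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(500) / 50) + 0.1 * rng.standard_normal(500)
x[300] += 3.0                                            # injected anomaly
print(int(np.argmax(lti_like_scores(x, period=50))))     # expect ~300
```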
We propose a Bayesian nonparametric approach to modelling and predicting a class of functional time series, with an application to energy markets, based on fully observed, noise-free functional data. Traders in such contexts can conceive profitable strategies if they can anticipate the impact of their bidding actions on the aggregate demand and supply curves, which in turn need to be predicted reliably. Here we propose a simple Bayesian nonparametric method for predicting such curves, which take the form of monotonic bounded step functions. We borrow ideas from population genetics by defining a class of interacting particle systems to model the functional trajectory, and we develop an implementation strategy that uses ideas from Markov chain Monte Carlo and approximate Bayesian computation and allows us to circumvent the intractability of the likelihood. Our approach adapts well to the degree of smoothness of the curves and the volatility of the functional series, proves robust to an increase in the forecast horizon, and yields uncertainty quantification for the functional forecasts. We illustrate the model and discuss its performance on simulated datasets and on real data from the Italian natural gas market.
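As a heavily simplified, hedged sketch of the ABC idea only: the toy jump-rate simulator, uniform jump heights, RMS distance, and tolerance below are all our inventions, and the paper's interacting-particle-system model and MCMC/ABC scheme are considerably richer.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 100)

def simulate_curve(theta):
    """Toy simulator: a monotone bounded step function on [0, 1] whose number
    of jumps is Poisson(theta) -- a stand-in for the particle-system model."""
    n_jumps = rng.poisson(theta)
    jumps = np.sort(rng.uniform(0, 1, n_jumps))
    heights = np.cumsum(rng.uniform(0, 1, n_jumps))
    curve = np.zeros_like(grid)
    for j, h in zip(jumps, heights):
        curve[grid >= j] = h
    return curve / (heights[-1] if n_jumps else 1.0)     # bounded in [0, 1]

observed = simulate_curve(8.0)

# ABC rejection: keep prior draws whose simulated curve is close to the data
accepted = []
for _ in range(5000):
    theta = rng.gamma(2.0, 4.0)                          # prior on the jump rate
    dist = np.sqrt(np.mean((simulate_curve(theta) - observed) ** 2))
    if dist < 0.2:                                       # tolerance (our choice)
        accepted.append(theta)
print(len(accepted), np.mean(accepted) if accepted else "none accepted")
```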
In this paper, we study the problems of principal generalized eigenvector computation and canonical correlation analysis in the stochastic setting. We propose a simple and efficient algorithm, Gen-Oja, for these problems. We prove the global convergence of our algorithm, borrowing ideas from the theory of fast-mixing Markov chains and two-time-scale stochastic approximation, and show that it achieves the optimal rate of convergence. In the process, we develop tools for understanding stochastic processes with Markovian noise, which may be of independent interest.
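A minimal numpy sketch of the two-time-scale structure described above, under our own choices of step sizes, noise model, and synthetic data (consult the paper for the precise schedule and guarantees): a fast iterate tracks the solution of B w = A v while a slow Oja-style step updates and renormalizes v.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d))
B = M @ M.T + d * np.eye(d)            # population B: symmetric positive definite
N = rng.standard_normal((d, d))
A = (N + N.T) / 2                      # population A: symmetric

def noisy(mat, scale=0.1):
    """Unbiased stochastic estimate of mat (our stand-in for sample matrices)."""
    E = rng.standard_normal(mat.shape) * scale
    return mat + (E + E.T) / 2

w = rng.standard_normal(d)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for t in range(1, 20001):
    alpha, beta = 0.02, 1.0 / t        # fast and slow step sizes (our choice)
    At, Bt = noisy(A), noisy(B)
    w = w - alpha * (Bt @ w - At @ v)  # fast iterate: tracks B^{-1} A v
    v = v + beta * w                   # slow Oja-style update
    v /= np.linalg.norm(v)

# sanity check against a direct eigensolver for the pencil (A, B)
vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
top = np.real(vecs[:, np.argmax(vals.real)])
print(abs(top @ v) / np.linalg.norm(top))  # close to 1 if aligned
```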