Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations, and not only in the mean or variance. Some of these approaches enjoy good statistical properties (oracle inequality, \ldots). Nonetheless, they have a high computational cost in terms of both time and memory, which makes their application difficult even for small and medium sample sizes ($n < 10^4$). This computational issue is addressed by first describing a new efficient and exact algorithm for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It allows dealing with medium-sized signals (up to $n \approx 10^5$). Second, a faster approximation algorithm is described. It is based on a low-rank approximation to the Gram matrix and is linear in both time and space. This approximation algorithm can be applied to large-scale signals ($n \geq 10^6$). These exact and approximation algorithms have been implemented in \texttt{R} and \texttt{C} for various kernels. The computational and statistical performance of these new algorithms has been assessed through empirical experiments. The runtime of the new algorithms is observed to be faster than that of other considered procedures. Finally, simulations confirm the higher statistical accuracy of kernel-based approaches for detecting changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches for analyzing complex biological profiles made of DNA copy number and allele B frequencies. An \texttt{R} package implementing the approach will be made available on GitHub.
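As an illustration of the exact dynamic-programming approach, the following minimal Python sketch minimizes the standard kernel least-squares segmentation cost over the Gram matrix. The Gaussian kernel, the prefix-sum trick for constant-time segment costs, and all names are illustrative choices, not the paper's implementation; in particular, this version stores the full Gram matrix and is therefore quadratic in space, unlike the linear-space algorithm described above.
\begin{verbatim}
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on a 1-D signal."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * bandwidth ** 2))

def kernel_changepoints(x, n_bkps):
    """Exact kernel change-point detection by dynamic programming
    (didactic version: quadratic in both time and space)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    K = gaussian_gram(x)
    csum = np.cumsum(np.cumsum(K, axis=0), axis=1)   # 2-D prefix sums
    diag = np.concatenate(([0.0], np.cumsum(np.diag(K))))

    def cost(s, e):
        """Kernel within-segment cost of x[s:e] (e exclusive)."""
        block = csum[e - 1, e - 1]
        if s > 0:
            block += csum[s - 1, s - 1] - csum[s - 1, e - 1] - csum[e - 1, s - 1]
        return (diag[e] - diag[s]) - block / (e - s)

    # C[k, t]: optimal cost of x[0:t] with k change points
    C = np.full((n_bkps + 1, n + 1), np.inf)
    back = np.zeros((n_bkps + 1, n + 1), dtype=int)
    for t in range(1, n + 1):
        C[0, t] = cost(0, t)
    for k in range(1, n_bkps + 1):
        for t in range(k + 1, n + 1):
            vals = [C[k - 1, s] + cost(s, t) for s in range(k, t)]
            s_best = int(np.argmin(vals)) + k
            C[k, t], back[k, t] = vals[s_best - k], s_best
    # Backtrack the estimated change-point locations
    bkps, t = [], n
    for k in range(n_bkps, 0, -1):
        t = back[k, t]
        bkps.append(t)
    return sorted(bkps)
\end{verbatim}
Calling \texttt{kernel\_changepoints(x, n\_bkps=2)} returns the two estimated change-point locations; the paper's exact algorithm produces the same segmentation with a linear memory footprint.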
This manuscript makes two contributions to the field of change-point detection. In a general change-point setting, we provide a generic algorithm for aggregating local homogeneity tests into an estimator of change-points in a time series. Interestingly, we establish that the error rates of the collection of tests directly translate into detection properties of the change-point estimator. This generic scheme is then applied to the problem of possibly sparse multivariate mean change-point detection. When the noise is Gaussian, we derive minimax optimal rates that are adaptive to the unknown sparsity and to the distance between change-points. For sub-Gaussian noise, we introduce a variant that is optimal in almost all sparsity regimes.
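The aggregation scheme can be caricatured in one dimension as follows. In this hedged Python sketch, the window half-width \texttt{h}, the Gaussian-mean CUSUM statistic, and the rule merging overlapping rejections are our simplifications of the generic procedure: a change is flagged wherever a local homogeneity test rejects, which makes concrete how test error rates carry over to the detection properties of the estimator.
\begin{verbatim}
import numpy as np

def local_cusum_stat(x, t, h):
    """Two-sample mean statistic comparing x[t-h:t] with x[t:t+h];
    scaled so that it is ~N(0, sigma^2) under homogeneity."""
    left, right = x[t - h:t], x[t:t + h]
    return np.sqrt(h / 2.0) * abs(left.mean() - right.mean())

def aggregate_tests(x, h, thresh):
    """Aggregate local homogeneity tests into change-point estimates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    stats = np.array([local_cusum_stat(x, t, h)
                      for t in range(h, n - h + 1)])
    above = stats > thresh
    cps, t = [], 0
    while t < len(stats):
        if above[t]:
            # merge a run of rejections into one estimated change point
            start = t
            while t < len(stats) and above[t]:
                t += 1
            cps.append(h + start + int(np.argmax(stats[start:t])))
        else:
            t += 1
    return cps
\end{verbatim}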
Vector Auto-Regressive (VAR) models capture lead-lag temporal dynamics of multivariate time series data. They have been widely used in macroeconomics, financial econometrics, neuroscience and functional genomics. In many applications, the data exhibit structural changes in their autoregressive dynamics, which correspond to changes in the transition matrices of the VAR model that specify such dynamics. We present the R package VARDetect that implements two classes of algorithms to detect multiple change points in piecewise stationary VAR models. The first exhibits sublinear computational complexity in the number of time points and is best suited for structured sparse models, while the second exhibits linear time complexity and is designed for models whose transition matrices are assumed to have a low rank plus sparse decomposition. The package also provides functions to generate data from the various VAR model variants discussed, which is useful in simulation studies, as well as to visualize the results through network layouts.
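To fix ideas about the model class the package targets, the following Python sketch (illustrative only; VARDetect provides its own data-generation and plotting functions in R) simulates a piecewise stationary VAR(1) process whose sparse transition matrix changes once.
\begin{verbatim}
import numpy as np

def simulate_piecewise_var1(transition_mats, seg_lengths, sigma=0.1, seed=0):
    """Simulate a piecewise stationary VAR(1) process.

    transition_mats : list of (p, p) arrays, one per stationary segment
    seg_lengths     : list of segment lengths (same number of entries)
    """
    rng = np.random.default_rng(seed)
    p = transition_mats[0].shape[0]
    x, out = np.zeros(p), []
    for A, T in zip(transition_mats, seg_lengths):
        for _ in range(T):
            x = A @ x + sigma * rng.standard_normal(p)
            out.append(x.copy())
    return np.array(out)

# Two sparse regimes: the sign of one lead-lag link flips at t = 300
A1 = np.diag([0.6, 0.6, 0.6]); A1[0, 1] = 0.3
A2 = np.diag([0.6, 0.6, 0.6]); A2[0, 1] = -0.3
X = simulate_piecewise_var1([A1, A2], [300, 300])
\end{verbatim}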
Structural changes occur quite frequently in dynamic networks, and their detection is an important question in many situations, such as fraud detection or cybersecurity. Real-life networks are often incompletely observed, due to individual non-response or network size. In the present paper we consider the problem of change-point detection in a temporal sequence of partially observed networks. The goal is to test whether there is a change in the network parameters. Our approach is based on the Matrix CUSUM test statistic and allows the size of the networks to grow. We show that the proposed test is minimax optimal and robust to missing links. We also demonstrate the good behavior of our approach in practice through a simulation study and a real-data application.
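A caricature of a CUSUM-type matrix statistic on partially observed networks is sketched below in Python, with missing links coded as NaN; the exact normalization and the treatment of the observation probability in the paper's Matrix CUSUM statistic differ in details.
\begin{verbatim}
import numpy as np

def matrix_cusum(adj_seq, t):
    """CUSUM matrix at split t for a sequence of partially observed
    adjacency matrices (shape (T, n, n), missing links = np.nan)."""
    A = np.asarray(adj_seq, dtype=float)
    T = A.shape[0]
    before = np.nanmean(A[:t], axis=0)   # entrywise mean of observed links
    after = np.nanmean(A[t:], axis=0)
    scale = np.sqrt(t * (T - t) / T)     # standard CUSUM weighting
    return scale * (after - before)

def change_stat(adj_seq):
    """Max over splits of the spectral norm of the CUSUM matrix;
    a change is declared when this exceeds a calibrated threshold."""
    T = len(adj_seq)
    return max(np.linalg.norm(matrix_cusum(adj_seq, t), ord=2)
               for t in range(1, T))
\end{verbatim}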
In many modern applications, large-scale sensor networks are used to perform statistical inference tasks. In this paper, we propose Bayesian methods for multiple change-point detection using a sensor network in which a fusion center (FC) can receive a data stream from each sensor. Due to communication limitations, the FC monitors only a subset of the sensors at each time slot. Since the number of change points can be high, we adopt the false discovery rate (FDR) criterion for controlling the rate of false alarms, while minimizing the average detection delay (ADD). We propose two Bayesian detection procedures that handle the communication limitations by monitoring the subset of the sensors with the highest posterior probabilities of change points having occurred. This monitoring policy aims to minimize the delay between the occurrence of each change point and its declaration using the corresponding posterior probabilities. One of the proposed procedures is more conservative than the other: it attains a lower FDR at the expense of a higher ADD. It is analytically shown that both procedures control the FDR under a specified tolerated level and are also scalable in the sense that they attain an ADD that does not increase asymptotically with the number of sensors. In addition, it is demonstrated that the proposed detection procedures are useful for trading off between reduced ADD and reduced average number of observations drawn until discovery. Numerical simulations are conducted for validating the analytical results and for demonstrating the properties of the proposed procedures.
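The monitoring policy can be illustrated with a hedged Python sketch. The Gaussian mean-shift observation model, the Shiryaev-style posterior recursion, and all names below are our assumptions for illustration; the actual procedures additionally calibrate declaration thresholds to control the FDR while minimizing the ADD.
\begin{verbatim}
import numpy as np
from scipy.stats import norm

def shiryaev_update(p, x, rho, mu0=0.0, mu1=1.0, sigma=1.0):
    """Posterior probability that a change has occurred at one sensor,
    under a geometric prior (parameter rho) on the change time and a
    Gaussian mean shift mu0 -> mu1 in its observations."""
    p_pred = p + (1.0 - p) * rho            # propagate the prior one step
    num = p_pred * norm.pdf(x, mu1, sigma)
    den = num + (1.0 - p_pred) * norm.pdf(x, mu0, sigma)
    return num / den

def fusion_center_step(posts, monitored, new_obs, rho, k):
    """One time slot at the FC: update the posteriors of the monitored
    sensors with their new observations, propagate the prior for the
    unobserved ones, and re-select the k sensors with the highest
    posterior probabilities for the next slot."""
    posts = np.asarray(posts, dtype=float)
    observed = set(monitored)
    for i, x in zip(monitored, new_obs):
        posts[i] = shiryaev_update(posts[i], x, rho)
    for i in range(len(posts)):
        if i not in observed:
            posts[i] = posts[i] + (1.0 - posts[i]) * rho
    next_monitored = np.argsort(posts)[-k:]
    return posts, next_monitored
\end{verbatim}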
From a sequence of similarity networks, with edges representing certain similarity measures between nodes, we are interested in detecting a change-point that alters the statistical properties of the networks. After the change, a subset of anomalous nodes emerges that compares dissimilarly with the normal nodes. We study a simple sequential change detection procedure based on node-wise average similarity measures and analyze its theoretical properties. Simulation and real-data examples demonstrate that such a simple stopping procedure has reasonably good performance. We further discuss faulty sensor isolation (estimating the anomalous nodes) using community detection.
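A minimal Python sketch of such a stopping rule is given below. The baseline and threshold are assumed given (e.g., estimated from pre-change data), and the instantaneous deviation rule is a caricature of the sequential procedure studied in the paper.
\begin{verbatim}
import numpy as np

def nodewise_avg_similarity(S):
    """Average similarity of each node to all others, given a
    similarity matrix S (diagonal ignored)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    return (S.sum(axis=1) - np.diag(S)) / (n - 1)

def run_detection(sim_seq, baseline, threshold):
    """Stop at the first time any node's average similarity deviates
    from its pre-change baseline by more than `threshold`; the nodes
    exceeding the threshold are flagged as anomalous."""
    for t, S in enumerate(sim_seq):
        dev = np.abs(nodewise_avg_similarity(S) - baseline)
        if dev.max() > threshold:
            anomalous = np.where(dev > threshold)[0]
            return t, anomalous
    return None, np.array([], dtype=int)
\end{verbatim}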