In many real-world problems involving real-time monitoring of high-dimensional streaming data, one wants to detect an undesired event or change quickly once it occurs, but under a sampling control constraint: in resource-constrained environments, one may be able to observe or use only selected components of the data for decision-making at each time step. In this paper, we propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm. Our proposed algorithm, termed Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per time step: the adaptive sampling policy applies the Thompson Sampling algorithm to balance exploration for acquiring long-term knowledge against exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics by sum shrinkage techniques to determine whether to raise a global alarm. Extensive numerical simulations and case studies demonstrate the statistical and computational efficiency of our proposed TSSRP algorithm.
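To make the local statistic concrete, the following is a minimal sketch of the classical Shiryaev-Roberts recursion R_n = (1 + R_{n-1}) * LR_n for a single data stream. It is not the TSSRP algorithm itself: it assumes Gaussian observations with known pre-change mean `mu0`, post-change mean `mu1`, and common standard deviation `sigma`, starts from R_0 = 0 rather than the randomized start of the Pollak variant, and omits the Thompson Sampling and sum shrinkage steps. The function name and all parameters are hypothetical.

```python
import math


def shiryaev_roberts(xs, mu0=0.0, mu1=1.0, sigma=1.0, threshold=100.0):
    """Run the Shiryaev-Roberts recursion on one stream (illustrative sketch).

    Returns (alarm, stats): alarm is the first 0-based index at which the
    statistic reaches `threshold` (None if it never does), and stats is the
    list of statistic values, one per observation.
    """
    r = 0.0  # plain SR start (assumption); the SRP variant randomizes R_0
    stats = []
    alarm = None
    for n, x in enumerate(xs):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        # SR update: R_n = (1 + R_{n-1}) * LR_n.
        r = (1.0 + r) * math.exp(llr)
        stats.append(r)
        if alarm is None and r >= threshold:
            alarm = n
    return alarm, stats


# Usage: five pre-change observations followed by a shifted segment.
alarm, stats = shiryaev_roberts([0.0] * 5 + [2.0] * 10)
```

The statistic stays small while observations look like the pre-change model and grows geometrically once they favor the post-change model, which is why a threshold crossing serves as the alarm rule.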
Robust real-time monitoring of high-dimensional data streams has many important real-world applications, such as industrial quality control, signal detection, and biosurveillance, but it is highly non-trivial to develop efficient schemes because of two challenges: (1) the unknown sparse number or subset of affected data streams and (2) the uncertainty of model specification for high-dimensional data. In this article, motivated by the detection of smaller persistent changes in the presence of larger transient outliers, we develop a family of efficient real-time robust detection schemes for high-dimensional data streams by monitoring feature spaces, such as PCA or wavelet coefficients, when the feature coefficients follow Tukey-Huber gross error models with outliers. We propose to construct a new local detection statistic for each feature, called the $L_{\alpha}$-CUSUM statistic, which reduces the effect of outliers by using the Box-Cox transformation of the likelihood function, and then to raise a global alarm based on the sum of the soft-thresholding transformations of these local $L_{\alpha}$-CUSUM statistics so as to filter out unaffected features. In addition, we propose a new concept called the false alarm breakdown point to measure the robustness of online monitoring schemes, and we characterize the breakdown point of our proposed schemes. Asymptotic analysis, extensive numerical simulations, and a case study of nonlinear profile monitoring are conducted to illustrate the robustness and usefulness of our proposed schemes.
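The global alarm rule described above, a sum of soft-thresholded local statistics, can be sketched as follows. This is an illustration under stated assumptions, not the proposed scheme: the local statistic here is the standard CUSUM recursion W_n = max(0, W_{n-1} + increment) rather than the $L_{\alpha}$-CUSUM, the per-feature log-likelihood-ratio increments are taken as given inputs, and the names `cusum_update`, `soft_threshold_sum`, `monitor`, `b`, and `alarm_level` are hypothetical.

```python
def cusum_update(w, increment):
    """One step of the standard CUSUM recursion for a single feature."""
    return max(0.0, w + increment)


def soft_threshold_sum(local_stats, b):
    """Sum of max(W_k - b, 0): features with W_k <= b contribute nothing."""
    return sum(max(w - b, 0.0) for w in local_stats)


def monitor(increments_over_time, b, alarm_level):
    """Return (t, g) at the first time the global statistic reaches alarm_level.

    increments_over_time is a sequence of lists, one list per time step,
    holding one log-likelihood-ratio increment per feature (assumed given).
    Returns (None, last_g) if no alarm is raised.
    """
    w = None
    g = 0.0
    for t, increments in enumerate(increments_over_time):
        if w is None:
            w = [0.0] * len(increments)
        # Update every local statistic, then fuse them into one global value.
        w = [cusum_update(wk, inc) for wk, inc in zip(w, increments)]
        g = soft_threshold_sum(w, b)
        if g >= alarm_level:
            return t, g
    return None, g


# Usage: only the first of three features drifts upward.
t_alarm, g_alarm = monitor([[0.5, -0.2, -0.1]] * 10, b=1.0, alarm_level=2.0)
```

The soft threshold `b` is what filters out unaffected features: their local statistics hover near zero and are truncated away, so the global statistic is driven only by the features that have accumulated evidence of a change.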
Structural breaks are commonly seen in applications. For the detection of change points in time specifically, a research gap remains in the ultra-high-dimensional setting, where the covariates may bear spurious correlations. In this paper, we propose a two-stage approach to detect change points in ultra-high dimension: we first propose the dynamic tilted current correlation screening method to reduce the input dimension, and then detect possible change points in the framework of group variable selection. Not only is the spurious correlation between ultra-high-dimensional covariates taken into consideration in variable screening, but non-convex penalties are also studied for change-point detection in ultra-high dimension. Asymptotic properties are derived to guarantee the asymptotic consistency of the selection procedure, and numerical investigations show the promising performance of the proposed approach.
Topological Data Analysis (TDA) is a rapidly growing field, which studies methods for learning underlying topological structures present in complex data representations. TDA methods have found recent success in extracting useful geometric structures for a wide range of applications, including protein classification, neuroscience, and time-series analysis. However, in many such applications, one is also interested in sequentially detecting changes in this topological structure. We propose a new method called Persistence Diagram based Change-Point (PD-CP), which tackles this problem by integrating the widely-used persistence diagrams in TDA with recent developments in nonparametric change-point detection. The key novelty in PD-CP is that it leverages the distribution of points on persistence diagrams for online detection of topological changes. We demonstrate the effectiveness of PD-CP in an application to solar flare monitoring.
Because of the curse of dimensionality, high-dimensional processes present challenges to traditional multivariate statistical process monitoring (SPM) techniques. In addition, the unknown underlying distribution and complicated dependency among variables, such as heteroscedasticity, increase the uncertainty of estimated parameters and decrease the effectiveness of control charts. Furthermore, the requirement of sufficient reference samples limits the application of traditional charts in high-dimension, low-sample-size scenarios (small n, large p). More difficulties appear in detecting and diagnosing abnormal behaviors that are caused by a small set of variables, i.e., sparse changes. In this article, we propose a changepoint-based control chart to detect sparse shifts in the mean vector of high-dimensional heteroscedastic processes. Our proposed method can start monitoring when the number of observations is much smaller than the dimensionality. The simulation results show its robustness to nonnormality and heteroscedasticity. A real data example is used to illustrate the effectiveness of the proposed control chart in high-dimensional applications. Supplementary material and code are provided online.
This manuscript makes two contributions to the field of change-point detection. In a general change-point setting, we provide a generic algorithm for aggregating local homogeneity tests into an estimator of change-points in a time series. Interestingly, we establish that the error rates of the collection of tests directly translate into detection properties of the change-point estimator. This generic scheme is then applied to the problem of possibly sparse multivariate mean change-point detection. When the noise is Gaussian, we derive minimax optimal rates that are adaptive to the unknown sparsity and to the distance between change-points. For sub-Gaussian noise, we introduce a variant that is optimal in almost all sparsity regimes.