Structural changes occur frequently in dynamic networks, and their detection is an important question in many settings, such as fraud detection or cybersecurity. Moreover, real-life networks are often incompletely observed, with links missing due to individual non-response or the sheer size of the network. In the present paper we consider the problem of change-point detection in a temporal sequence of partially observed networks. The goal is to test whether there is a change in the network parameters. Our approach is based on the Matrix CUSUM test statistic and allows the size of the networks to grow. We show that the proposed test is minimax optimal and robust to missing links. We also demonstrate the good behavior of our approach in practice through a simulation study and a real-data application.
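A minimal Python sketch of a Matrix CUSUM scan in the spirit of this abstract is given below. The Frobenius-norm aggregation of the CUSUM matrices and the imputation of missing entries by their entrywise observed averages are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def matrix_cusum_scan(A, mask=None):
    """Scan a sequence of adjacency matrices for a single change-point.

    A    : array of shape (T, n, n), observed adjacency matrices.
    mask : optional array of the same shape, 1 where an entry is observed.

    Returns the Frobenius norms of the CUSUM matrices
    Z_t = sqrt(t (T - t) / T) (mean(A[:t]) - mean(A[t:])), t = 1, ..., T - 1.
    Missing entries are imputed by their entrywise observed averages
    (one simple convention chosen for this sketch).
    """
    T = A.shape[0]
    if mask is not None:
        counts = mask.sum(axis=0)
        obs_mean = (A * mask).sum(axis=0) / np.maximum(counts, 1)
        A = np.where(mask > 0, A, obs_mean)
    stats = np.empty(T - 1)
    for t in range(1, T):
        Z = np.sqrt(t * (T - t) / T) * (A[:t].mean(axis=0) - A[t:].mean(axis=0))
        stats[t - 1] = np.linalg.norm(Z, "fro")
    return stats

# Toy usage: edge probability jumps from 0.1 to 0.3 at time 10.
rng = np.random.default_rng(0)
T, n = 20, 50
A = np.stack([rng.binomial(1, 0.1 if t < 10 else 0.3, (n, n)) for t in range(T)])
print("estimated change-point:", 1 + int(np.argmax(matrix_cusum_scan(A))))
```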
From a sequence of similarity networks, with edges representing certain similarity measures between nodes, we are interested in detecting a change-point at which the statistical properties of the networks change. After the change, a subset of anomalous nodes emerges that compares dissimilarly with the normal nodes. We study a simple sequential change detection procedure based on node-wise average similarity measures and establish its theoretical properties. Simulation and real-data examples demonstrate that such a simple stopping procedure has reasonably good performance. We further discuss faulty sensor isolation (estimating the anomalous nodes) using community detection.
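As a rough illustration, the following Python sketch implements one plausible stopping rule built on node-wise average similarities: it raises an alarm the first time some node's running average similarity drifts too far from its pre-change baseline. The running-average statistic and the fixed threshold are our simplifying assumptions, not necessarily the procedure analyzed in the paper.

```python
import numpy as np

def sequential_alarm(stream, baseline, threshold):
    """Raise an alarm the first time some node's running average
    similarity deviates from its pre-change baseline by more than
    `threshold` (a deliberately simple rule for illustration)."""
    running = None
    for t, S in enumerate(stream, start=1):
        n = S.shape[0]
        node_avg = (S.sum(axis=1) - np.diag(S)) / (n - 1)  # average similarity to the other nodes
        running = node_avg if running is None else running + node_avg
        if np.abs(running / t - baseline).max() > threshold:
            return t   # stopping time
    return None        # no alarm raised

# Toy usage: node 0 becomes dissimilar to the others after time 30.
rng = np.random.default_rng(1)
n = 10

def snapshots(T=100):
    for t in range(T):
        S = rng.normal(1.0, 0.1, (n, n))
        if t >= 30:
            S[0, :] -= 0.5
            S[:, 0] -= 0.5
        yield (S + S.T) / 2   # symmetric similarity matrix

print("alarm raised at time:", sequential_alarm(snapshots(), np.ones(n), 0.2))
```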
This manuscript makes two contributions to the field of change-point detection. In a general change-point setting, we provide a generic algorithm for aggregating local homogeneity tests into an estimator of change-points in a time series. Interestingly, we establish that the error rates of the collection of tests directly translate into detection properties of the change-point estimator. This generic scheme is then applied to the problem of possibly sparse multivariate mean change-point detection. When the noise is Gaussian, we derive minimax optimal rates that are adaptive to the unknown sparsity and to the distance between change-points. For sub-Gaussian noise, we introduce a variant that is optimal in almost all sparsity regimes.
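A toy Python sketch of the aggregation idea follows. The local test here is a stand-in (a Bonferroni-corrected coordinate-wise two-sample z-test under unit-variance Gaussian noise), and the aggregation rule (midpoints of runs of rejecting windows) is a deliberately simple choice for illustration, not the paper's generic scheme in full.

```python
import numpy as np
from scipy.stats import norm

def local_test(left, right, alpha=0.001):
    """Stand-in local homogeneity test: a coordinate-wise two-sample
    z-test assuming unit-variance Gaussian noise, Bonferroni-corrected
    over the d coordinates."""
    nl, nr = len(left), len(right)
    z = (left.mean(axis=0) - right.mean(axis=0)) / np.sqrt(1 / nl + 1 / nr)
    return np.any(np.abs(z) > norm.ppf(1 - alpha / (2 * left.shape[1])))

def aggregate_tests(x, window):
    """Aggregate local tests into change-point estimates: flag time t
    when the windows [t - window, t) and [t, t + window) are declared
    inhomogeneous, then return the midpoint of each run of flagged t."""
    flags = [t for t in range(window, len(x) - window)
             if local_test(x[t - window:t], x[t:t + window])]
    estimates, run = [], []
    for t in flags:
        if run and t > run[-1] + 1:      # a gap closes the current run
            estimates.append(run[len(run) // 2])
            run = []
        run.append(t)
    if run:
        estimates.append(run[len(run) // 2])
    return estimates

# Toy usage: a sparse mean shift (3 of 100 coordinates) at t = 150.
rng = np.random.default_rng(2)
x = rng.normal(size=(300, 100))
x[150:, :3] += 1.5
print("estimated change-points:", aggregate_tests(x, window=30))
```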
Outliers arise in networks for different reasons, such as fraudulent behavior of malicious users or faults in measurement instruments, and can significantly impair network analyses. In addition, real-life networks are likely to be incompletely observed, with missing links due to individual non-response or machine failures. Identifying outliers in the presence of missing links is therefore a crucial problem in network analysis. In this work, we introduce a new algorithm to detect outliers in a network that simultaneously predicts the missing links. The proposed method is statistically sound: we prove that, under fairly general assumptions, our algorithm exactly detects the outliers and achieves the best known error for the prediction of missing links at polynomial computational cost. It is also computationally efficient: we prove sub-linear convergence of our algorithm. We provide a simulation study which demonstrates the good behavior of the algorithm in terms of outlier detection and prediction of the missing links. We also illustrate the method with an application in epidemiology and with the analysis of a political Twitter network. The method is freely available as an R package on the Comprehensive R Archive Network.
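The abstract does not spell out the algorithm, so the sketch below is a generic low-rank stand-in, plainly not the paper's method: it corrects the partially observed adjacency matrix for the observation rate, predicts missing links with a truncated SVD, and flags as candidate outliers the nodes whose observed rows fit the estimate worst.

```python
import numpy as np

def lowrank_outlier_score(A, mask, rank=3, top_k=2):
    """Generic illustration only (not the paper's algorithm): low-rank
    link prediction plus a residual-based outlier score."""
    rho = mask.mean()                                   # overall observation rate
    U, s, Vt = np.linalg.svd(A * mask / rho, full_matrices=False)
    P_hat = np.clip((U[:, :rank] * s[:rank]) @ Vt[:rank], 0, 1)  # link predictions
    resid = np.linalg.norm((A - P_hat) * mask, axis=1)  # per-node misfit on observed entries
    return P_hat, np.argsort(resid)[-top_k:]            # flagged candidate outliers
```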
Estimating the matrix of connection probabilities is one of the key questions when studying sparse networks. In this work, we consider networks generated under the sparse graphon model and the inhomogeneous random graph model with missing observations. Using the Stochastic Block Model as a parametric proxy, we bound the risk of the maximum likelihood estimator of the network connection probabilities and show that it is minimax optimal. When the risk is measured in the Frobenius norm, no estimator running in polynomial time has been shown to attain the minimax optimal rate of convergence for this problem. Thus, maximum likelihood estimation is of particular interest, as computationally efficient approximations to it have been proposed in the literature and are often used in practice.
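Since the estimator uses the Stochastic Block Model as a proxy, the following Python sketch computes the Bernoulli profile log-likelihood for a fixed block assignment under missing observations; the assignment itself would in practice be optimized approximately (e.g., by variational methods), a step this sketch deliberately omits. Self-loops and the double count of symmetric pairs are ignored for simplicity.

```python
import numpy as np

def sbm_profile_loglik(A, z, K, mask):
    """Bernoulli profile log-likelihood of an SBM with missing
    observations, for a fixed block assignment z: given z, the MLE of
    each block-pair connection probability is the observed edge
    frequency, and entries with mask == 0 are simply left out."""
    Z = np.eye(K)[z]                      # n x K one-hot memberships
    pairs = Z.T @ mask @ Z                # observed pairs per block pair
    edges = Z.T @ (A * mask) @ Z          # observed edges per block pair
    theta = np.divide(edges, pairs, out=np.zeros((K, K)), where=pairs > 0)
    eps = 1e-12                           # guards log(0)
    loglik = edges * np.log(theta + eps) + (pairs - edges) * np.log(1 - theta + eps)
    return loglik.sum(), theta

# Toy usage: block-pair probabilities recovered from 70% of the entries.
rng = np.random.default_rng(4)
n, K = 60, 2
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.5, 0.1)
A = rng.binomial(1, P)
mask = rng.binomial(1, 0.7, (n, n))
ll, theta = sbm_profile_loglik(A, z, K, mask)
print(np.round(theta, 2))   # approximately [[0.5, 0.1], [0.1, 0.5]]
```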
We consider a high-dimensional regression model with a possible change-point due to a covariate threshold and develop the Lasso estimator of the regression coefficients as well as the threshold parameter. Our Lasso estimator not only selects covariates but also selects between the linear and the threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the $\ell_1$ estimation loss for the regression coefficients. Since the Lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a factor of nearly $n^{-1}$, even when the number of regressors is much larger than the sample size $n$. We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
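A compact Python sketch of the estimation strategy this abstract suggests: profile out the threshold by a grid search, fitting a Lasso on the design augmented with the thresholded covariates at each candidate value. The grid search, the use of sklearn's Lasso, and the fixed penalty level alpha are our assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def threshold_lasso(X, y, q, tau_grid, alpha=0.1):
    """Profile out the threshold by grid search: for each candidate tau,
    fit a Lasso on the design augmented with the thresholded covariates
    X * 1{q > tau}, and keep the tau minimizing the penalized objective."""
    best = None
    for tau in tau_grid:
        Xa = np.hstack([X, X * (q > tau)[:, None]])
        fit = Lasso(alpha=alpha).fit(Xa, y)
        resid = y - fit.predict(Xa)
        obj = 0.5 * np.mean(resid ** 2) + alpha * np.abs(fit.coef_).sum()
        if best is None or obj < best[0]:
            best = (obj, tau, fit.coef_)
    _, tau_hat, coef = best
    p = X.shape[1]
    return tau_hat, coef[:p], coef[p:]    # threshold, beta, delta

# Toy usage: true threshold at q = 0.5 with a sparse coefficient shift.
rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.normal(size=(n, p))
q = rng.uniform(size=n)
beta = np.zeros(p); beta[:3] = 1.0
delta = np.zeros(p); delta[:2] = 2.0
y = X @ beta + (X * (q > 0.5)[:, None]) @ delta + 0.5 * rng.normal(size=n)
tau_hat, b_hat, d_hat = threshold_lasso(X, y, q, np.linspace(0.1, 0.9, 17))
print("estimated threshold:", round(tau_hat, 2))
```

Note that if the true model is linear (no threshold effect), the Lasso tends to zero out the augmented coefficient block at every candidate tau, which is one way to read the abstract's remark that the estimator selects between the linear and the threshold models.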