ترغب بنشر مسار تعليمي؟ اضغط هنا

High-dimensional, multiscale online changepoint detection

137   0   0.0 ( 0 )
 نشر من قبل Richard Samworth
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package ocd, and we also demonstrate its utility on a seismology data set.



قيم البحث

اقرأ أيضاً

We propose a new method for changepoint estimation in partially-observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a MissCUSUM tra nsformation (a generalisation of the popular Cumulative Sum statistics), that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data, and on an oceanographic data set covering the Neogene period.
The analysis of record-breaking events is of interest in fields such as climatology, hydrology, economy or sports. In connection with the record occurrence, we propose three distribution-free statistics for the changepoint detection problem. They are CUSUM-type statistics based on the upper and/or lower record indicators which occur in a series. Using a version of the functional central limit theorem, we show that the CUSUM-type statistics are asymptotically Kolmogorov distributed. The main results under the null hypothesis are based on series of independent and identically distributed random variables, but a statistic to deal with series with seasonal component and serial correlation is also proposed. A Monte Carlo study of size, power and changepoint estimate has been performed. Finally, the methods are illustrated by analyzing the time series of temperatures at Madrid, Spain. The $textsf{R}$ package $texttt{RecordTest}$ publicly available on CRAN implements the proposed methods.
Structural breaks have been commonly seen in applications. Specifically for detection of change points in time, research gap still remains on the setting in ultra high dimension, where the covariates may bear spurious correlations. In this paper, we propose a two-stage approach to detect change points in ultra high dimension, by firstly proposing the dynamic titled current correlation screening method to reduce the input dimension, and then detecting possible change points in the framework of group variable selection. Not only the spurious correlation between ultra-high dimensional covariates is taken into consideration in variable screening, but non-convex penalties are studied in change point detection in the ultra high dimension. Asymptotic properties are derived to guarantee the asymptotic consistency of the selection procedure, and the numerical investigations show the promising performance of the proposed approach.
228 - Luc Pronzato , HaiYing Wang 2020
We consider a design problem where experimental conditions (design points $X_i$) are presented in the form of a sequence of i.i.d. random variables, generated with an unknown probability measure $mu$, and only a given proportion $alphain(0,1)$ can be selected. The objective is to select good candidates $X_i$ on the fly and maximize a concave function $Phi$ of the corresponding information matrix. The optimal solution corresponds to the construction of an optimal bounded design measure $xi_alpha^*leq mu/alpha$, with the difficulty that $mu$ is unknown and $xi_alpha^*$ must be constructed online. The construction proposed relies on the definition of a threshold $tau$ on the directional derivative of $Phi$ at the current information matrix, the value of $tau$ being fixed by a certain quantile of the distribution of this directional derivative. Combination with recursive quantile estimation yields a nonlinear two-time-scale stochastic approximation method. It can be applied to very long design sequences since only the current information matrix and estimated quantile need to be stored. Convergence to an optimum design is proved. Various illustrative examples are presented.
While there have been a lot of recent developments in the context of Bayesian model selection and variable selection for high dimensional linear models, there is not much work in the presence of change point in literature, unlike the frequentist coun terpart. We consider a hierarchical Bayesian linear model where the active set of covariates that affects the observations through a mean model can vary between different time segments. Such structure may arise in social sciences/ economic sciences, such as sudden change of house price based on external economic factor, crime rate changes based on social and built-environment factors, and others. Using an appropriate adaptive prior, we outline the development of a hierarchical Bayesian methodology that can select the true change point as well as the true covariates, with high probability. We provide the first detailed theoretical analysis for posterior consistency with or without covariates, under suitable conditions. Gibbs sampling techniques provide an efficient computational strategy. We also consider small sample simulation study as well as application to crime forecasting applications.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا