The $DD\alpha$-classifier, a fast and very robust nonparametric procedure, is described and applied to fifty classification problems covering a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective-invariant procedure called the $\alpha$-procedure. The transformation assigns to each data point its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure and compared with regard to their average error rates. With the Tukey depth, which fits the distribution's shape best and is most robust, 'outsiders', that is, data points having zero depth in all classes, need additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The $DD\alpha$-procedure is available as an R package.
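The depth transformation described above can be illustrated with a minimal sketch, assuming two-dimensional data and the Mahalanobis depth, 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance to the class mean. The function names and the toy classes are hypothetical, and the sketch covers only the mapping into the depth space; the $\alpha$-procedure that separates the classes there is not reproduced.

```python
from statistics import mean

def mahalanobis_depth_2d(x, sample):
    """Mahalanobis depth of a 2-D point x w.r.t. a sample: 1 / (1 + d^2)."""
    n = len(sample)
    mx = mean(p[0] for p in sample)
    my = mean(p[1] for p in sample)
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Squared Mahalanobis distance via the explicit 2x2 inverse.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)

def to_depth_space(x, classes):
    """Map x to one depth value per class, i.e. a point in the unit cube."""
    return tuple(mahalanobis_depth_2d(x, c) for c in classes)

# Hypothetical toy classes, used only for illustration.
class_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
class_b = [(5, 5), (6, 5), (5, 6), (6, 6)]
depth_point = to_depth_space((0.5, 0.5), [class_a, class_b])
```

Here `depth_point` has one coordinate per class, so a problem with q classes is represented in the q-dimensional unit cube regardless of the original dimension.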
Early detection of changes in the frequency of events is an important task in, for example, disease surveillance, monitoring of high-quality processes, reliability monitoring and public health. In this article, we focus on detecting changes in multivariate event data by monitoring the time-between-events (TBE). Existing multivariate TBE charts are limited in the sense that they only signal after an event has occurred for each of the individual processes. This results in delays (i.e., a long time to signal), especially if it is of interest to detect a change in only one or a few of the processes. We propose a bivariate TBE (BTBE) chart which is able to signal in real time. We derive analytical expressions for the control limits and average time-to-signal performance, conduct a performance evaluation and compare our chart to an existing method. The findings showed that our method is a realistic approach to monitoring bivariate time-between-events data, and has better detection ability than existing methods. A large benefit of our method is that it signals in real time and that, owing to the analytical expressions, no simulation is needed. The proposed method is applied to a real-life dataset related to AIDS.
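As a simplified illustration of analytical control limits for TBE data, the sketch below covers the univariate case only, assuming exponentially distributed times between events with a known in-control rate. It is not the bivariate BTBE chart itself, and the function names and parameter values are hypothetical. The second function shows the idea behind real-time signalling: an alarm can be raised as soon as the waiting time since the last event exceeds the upper limit, without waiting for the next event.

```python
import math

def exponential_tbe_limits(rate, alpha):
    """Two-sided probability limits for an exponential time between events.

    Each tail has probability alpha / 2 under the in-control rate.
    """
    lcl = -math.log(1 - alpha / 2) / rate
    ucl = -math.log(alpha / 2) / rate
    return lcl, ucl

def signals_in_real_time(elapsed, ucl):
    """True once the time elapsed since the last event exceeds the UCL."""
    return elapsed > ucl

# Hypothetical in-control rate (events per day) and false-alarm probability.
lcl, ucl = exponential_tbe_limits(rate=0.5, alpha=0.0027)
```

A time between events below `lcl` suggests that events have become more frequent, while an elapsed time above `ucl` suggests that they have become rarer.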
We develop a distribution-free, unsupervised anomaly detection method called ECAD, which wraps around any regression algorithm and sequentially detects anomalies. Rooted in conformal prediction, ECAD does not require data exchangeability but approximately controls the Type-I error when data are normal. Computationally, it involves no data-splitting and efficiently trains ensemble predictors to increase statistical power. We demonstrate the superior performance of ECAD on detecting anomalous spatio-temporal traffic flow.
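For readers unfamiliar with conformal prediction, the following is a minimal sketch of a generic conformal p-value computed from nonconformity scores such as absolute prediction residuals. It is not the ECAD procedure: this textbook form is valid under exchangeability, which ECAD is stated not to require, and the ensemble training is not reproduced. The function names and the threshold `alpha` are hypothetical.

```python
def conformal_p_value(past_scores, new_score):
    """Share of scores (past ones plus the new one) at least as large as new_score."""
    n_ge = sum(1 for s in past_scores if s >= new_score)
    return (n_ge + 1) / (len(past_scores) + 1)

def is_anomaly(past_scores, new_score, alpha=0.1):
    """Flag the new observation when its conformal p-value is at most alpha."""
    return conformal_p_value(past_scores, new_score) <= alpha

# Hypothetical residuals from earlier, presumably non-anomalous observations.
past_residuals = list(range(1, 20))
flagged = is_anomaly(past_residuals, new_score=100)
```

A small p-value means that the new residual is unusually large relative to past residuals; the threshold `alpha` plays the role of the target Type-I error.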
Background: All-in-one station-based health monitoring devices are implemented in elder homes in Hong Kong to support the monitoring of vital signs of the elderly. During a pilot study, it was discovered that the systolic blood pressure had been measured incorrectly for multiple weeks. A real-time solution was needed to identify future data quality issues as soon as possible. Methods: Control charts are an effective tool for real-time monitoring and for signaling issues (changes) in data. In this study, as in other healthcare applications, many observations are missing, and few methods are available for monitoring data with missing observations. A data quality monitoring method is developed to quickly signal issues with the accuracy of the collected data. This method is able to deal with missing observations. A Hotelling's T-squared control chart is selected as the basis for our proposed method. Findings: The proposed method is retrospectively validated on a case study with a known measurement error in the systolic blood pressure measurements. The method is able to adequately detect this data quality problem. The proposed method was integrated into a personalized telehealth monitoring system and prospectively implemented in a second case study. It was found that the proposed scheme supports the control of data quality. Conclusions: Data quality is an important issue, and control charts are useful for real-time monitoring of data quality. However, these charts must be adjusted to account for the missing data that often occur in healthcare contexts.
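One simple way to compute a Hotelling's T-squared statistic when some measurements are missing is to evaluate it on the observed components only, using the matching sub-vector of the in-control mean and sub-matrix of the in-control covariance. The sketch below assumes this approach and known in-control parameters; it is not necessarily the adjustment proposed in the study, and all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def t2_observed(x, mean, cov):
    """T-squared on the observed components of x; None marks a missing value.

    Returns (statistic, number of observed components).
    """
    idx = [i for i, v in enumerate(x) if v is not None]
    if not idx:
        return None, 0
    d = [x[i] - mean[i] for i in idx]
    sub = [[cov[i][j] for j in idx] for i in idx]
    z = solve(sub, d)  # z = sub^{-1} d
    return sum(di * zi for di, zi in zip(d, z)), len(idx)

# Hypothetical in-control parameters for (systolic, diastolic) blood pressure.
mu = [120.0, 80.0]
sigma = [[100.0, 30.0], [30.0, 64.0]]
stat, k = t2_observed([140.0, None], mu, sigma)
```

Because the number of observed components `k` varies between observations, a control limit for such a statistic would have to depend on `k`; with known parameters and multivariate normal data, the statistic follows a chi-square distribution with `k` degrees of freedom.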
We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by a football application, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which simultaneously estimates all the parameters of the joint models. We investigate the proposal's empirical performance in two simulation studies, the first designed for arbitrary count data, the other reflecting football-specific settings. Finally, the method is applied to FIFA World Cup data, showing that it is competitive with the standard approach with regard to predictive performance.
One of the classic concerns in statistics is determining if two samples come from the same population, i.e. homogeneity testing. In this paper, we propose a homogeneity test in the context of Functional Data Analysis, adopting an idea from multivariate data analysis: the data depth plot (DD-plot). This DD-plot is a generalization of the univariate Q-Q plot (quantile-quantile plot). We propose some statistics based on these DD-plots, and we use bootstrapping techniques to estimate their distributions. We estimate the finite-sample size and power of our test via simulation, obtaining better results than other homogeneity tests proposed in the literature. Finally, we illustrate the procedure in samples of real heterogeneous data and get consistent results.