We give the distribution of $M_n$, the maximum of a sequence of $n$ observations from a moving average of order 1. Solutions are first given in terms of repeated integrals and then for the case where the underlying independent random variables are discrete. When the correlation is positive, $$ P(M_n \leq x) = \sum_{j=1}^{I} \beta_{jx} \nu_{jx}^{n} \approx B_{x} r_{1x}^{n}, $$ where $M_n = \max_{i=1}^{n} X_i$, $\{\nu_{jx}\}$ are the eigenvalues of a certain matrix, $r_{1x}$ is the maximum magnitude of the eigenvalues, and $I$ depends on the number of possible values of the underlying random variables. The eigenvalues do not depend on $x$, only on its range.
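The matrix form of $P(M_n \leq x)$ in the discrete case can be illustrated with a small transfer-matrix sketch. It assumes the moving average is written as $X_i = e_i + \rho e_{i-1}$ with i.i.d. discrete $e_i$; the function names, the parameterization, and the matrix `A` are illustrative assumptions and need not coincide with the matrix used in the abstract. The probability is obtained from $n$ matrix-vector products, so its decay in $n$ is governed by the eigenvalues of `A`.

```python
from fractions import Fraction
from itertools import product


def max_cdf_ma1(values, probs, rho, x, n):
    """P(max(X_1..X_n) <= x) for X_i = e_i + rho * e_{i-1}, e_i i.i.d. discrete.

    Hypothetical helper: values/probs describe the law of e_i.
    """
    k = len(values)
    # A[a][b] = P(e_i = values[b]) if the step (e_{i-1}=values[a], e_i=values[b])
    # keeps X_i <= x, and 0 otherwise.
    A = [[probs[b] if values[b] + rho * values[a] <= x else Fraction(0)
          for b in range(k)] for a in range(k)]
    # v[a] = P(all remaining X's <= x | previous e = values[a]).
    v = [Fraction(1)] * k
    for _ in range(n):
        v = [sum(A[a][b] * v[b] for b in range(k)) for a in range(k)]
    # Average over the distribution of e_0.
    return sum(probs[a] * v[a] for a in range(k))


def max_cdf_brute(values, probs, rho, x, n):
    """Same probability by enumerating all (e_0, ..., e_n); for checking only."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n + 1):
        e = [values[i] for i in idx]
        if all(e[i] + rho * e[i - 1] <= x for i in range(1, n + 1)):
            p = Fraction(1)
            for i in idx:
                p *= probs[i]
            total += p
    return total


half = Fraction(1, 2)
print(max_cdf_ma1([0, 1], [half, half], half, 1, 1))  # 3/4
```

Exact rational arithmetic is used here only to make the comparison with brute-force enumeration exact; floating point would serve equally well for larger state spaces.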
We give the distribution of $M_n$, the maximum of a sequence of $n$ observations from a moving average of order 1. Solutions are first given in terms of repeated integrals and then for the case where the underlying independent random variables have an absolutely continuous density. When the correlation is positive, $$ P(M_n \leq x) = \sum_{j=1}^\infty \beta_{jx} \nu_{jx}^{n} \approx B_{x} \nu_{1x}^{n}, $$ where $\{\nu_{jx}\}$ are the eigenvalues (singular values) of a Fredholm kernel and $\nu_{1x}$ is the eigenvalue of maximum magnitude. A similar result is given when the correlation is negative. The result is analogous to large deviations expansions for estimates, since the maximum need not be standardized to have a limit. For the continuous case the integral equations for the left and right eigenfunctions are converted to first order linear differential equations. The eigenvalues satisfy an equation of the form $$ \sum_{i=1}^\infty w_i (\lambda - \theta_i)^{-1} = \lambda - \theta_0 $$ for certain known weights $\{w_i\}$ and eigenvalues $\{\theta_i\}$ of a given matrix. This can be solved by truncating the sum to an increasing number of terms.
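The truncation idea for the eigenvalue equation can be sketched numerically. The example below assumes all weights $w_i$ are positive, in which case the truncated function $f_N(\lambda) = \lambda - \theta_0 - \sum_{i \le N} w_i (\lambda - \theta_i)^{-1}$ is increasing to the right of the largest $\theta_i$ and has a single root there, which bisection can locate. The weights and $\theta_i$ used at the end are hypothetical placeholders, not values from the abstract, and the function names are illustrative.

```python
def truncated_secular(lam, theta0, thetas, weights, N):
    """f_N(lam) = lam - theta0 - sum_{i<=N} w_i / (lam - theta_i)."""
    return lam - theta0 - sum(
        w / (lam - t) for w, t in zip(weights[:N], thetas[:N])
    )


def largest_root(theta0, thetas, weights, N, tol=1e-12):
    """Root of f_N to the right of max(theta_1..theta_N), by bisection.

    Assumes positive weights, so f_N rises from -inf to +inf on that interval.
    """
    def f(lam):
        return truncated_secular(lam, theta0, thetas, weights, N)

    pole = max(thetas[:N])
    # Move the lower end toward the pole until f is negative there.
    delta = 1.0
    while f(pole + delta) >= 0:
        delta /= 2.0
    lo = pole + delta
    # Move the upper end outward until f is positive there.
    hi = pole + 1.0
    while f(hi) <= 0:
        hi = pole + 2.0 * (hi - pole)
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: theta_i = 1/i, w_i = 1/i^2, theta_0 = 0.
thetas = [1.0 / i for i in range(1, 201)]
weights = [1.0 / i**2 for i in range(1, 201)]
for N in (1, 5, 20, 100, 200):
    print(N, largest_root(0.0, thetas, weights, N))
```

With positive weights each added term can only move this root to the right, so the printed values increase with $N$ and settle as the tail of the sum becomes negligible.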
We argue against the use of generally weighted moving average (GWMA) control charts. Our primary reasons are the following: 1) There is no recursive formula for the GWMA control chart statistic, so all previous data must be stored and used in the calculation of each chart statistic. 2) The Markovian property does not apply to the GWMA statistics, so computer simulation must be used to determine control limits and the statistical performance. 3) An appropriately designed, and much simpler, exponentially weighted moving average (EWMA) chart provides as good or better statistical performance. 4) In some cases the GWMA chart gives more weight to past data values than to current values.
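The contrast between the recursive EWMA statistic and the non-recursive GWMA statistic can be made concrete with a short sketch. The GWMA weight form used here, $q^{(j-1)^{\alpha}} - q^{j^{\alpha}}$ on the observation $j-1$ steps back plus $q^{t^{\alpha}}\mu_0$ on the starting value, is the form commonly given in the GWMA literature and is an assumption, since the abstract does not state it. Function names and parameters are illustrative.

```python
def ewma(xs, lam, mu0=0.0):
    """EWMA via the recursion Z_t = lam * x_t + (1 - lam) * Z_{t-1}, Z_0 = mu0.

    Only the previous statistic is needed at each step.
    """
    z, out = mu0, []
    for x in xs:
        z = lam * x + (1.0 - lam) * z
        out.append(z)
    return out


def gwma_weights(q, alpha, t):
    """Assumed GWMA weights for lags j = 1..t (j = 1 is the current value)."""
    return [q ** ((j - 1) ** alpha) - q ** (j ** alpha) for j in range(1, t + 1)]


def gwma(xs, q, alpha, mu0=0.0):
    """GWMA statistics; every statistic is rebuilt from the full data history."""
    out = []
    for t in range(1, len(xs) + 1):
        w = gwma_weights(q, alpha, t)
        y = q ** (t ** alpha) * mu0
        for j in range(1, t + 1):
            y += w[j - 1] * xs[t - j]  # xs[t - j] is the value j - 1 steps back
        out.append(y)
    return out


data = [0.3, -1.2, 0.8, 2.1, 0.4, -0.5]
# With alpha = 1 and q = 1 - lam the assumed GWMA form coincides with the EWMA.
print(ewma(data, 0.2))
print(gwma(data, 0.8, 1.0))
# With alpha > 1 the weight on the previous value can exceed that on the current one.
print(gwma_weights(0.9, 2.0, 3))
```

The inner loop in `gwma` makes the storage and recomputation cost of point 1 visible, and the last line illustrates point 4 for one hypothetical parameter choice.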
The analysis of record-breaking events is of interest in fields such as climatology, hydrology, economics and sports. In connection with record occurrence, we propose three distribution-free statistics for the changepoint detection problem. They are CUSUM-type statistics based on the upper and/or lower record indicators which occur in a series. Using a version of the functional central limit theorem, we show that the CUSUM-type statistics are asymptotically Kolmogorov distributed. The main results under the null hypothesis are based on series of independent and identically distributed random variables, but a statistic to deal with series with a seasonal component and serial correlation is also proposed. A Monte Carlo study of size, power and changepoint estimate has been performed. Finally, the methods are illustrated by analyzing the time series of temperatures at Madrid, Spain. The $\textsf{R}$ package $\texttt{RecordTest}$, publicly available on CRAN, implements the proposed methods.
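The idea of a CUSUM-type path built from record indicators can be sketched as follows. This is not the statistic proposed in the abstract nor the interface of $\texttt{RecordTest}$; it is a minimal illustration that relies only on the classical fact that, for an i.i.d. continuous series, the upper record indicator at time $t$ has mean $1/t$ and the indicators are independent. The function names and the scaling are illustrative assumptions.

```python
import math


def upper_record_indicators(xs):
    """I_t = 1 if xs[t] strictly exceeds all earlier values, else 0 (I_1 = 1)."""
    out, current_max = [], None
    for x in xs:
        is_record = current_max is None or x > current_max
        out.append(1 if is_record else 0)
        if is_record:
            current_max = x
    return out


def record_cusum_path(xs):
    """Cumulative sums of I_t - 1/t, scaled by their null standard deviation.

    Requires len(xs) >= 2. Hypothetical scaling: the total null variance
    sum_t (1/t)(1 - 1/t) of the centred indicators.
    """
    ind = upper_record_indicators(xs)
    n = len(xs)
    total_var = sum((1.0 / t) * (1.0 - 1.0 / t) for t in range(2, n + 1))
    s, path = 0.0, []
    for t, i_t in enumerate(ind, start=1):
        s += i_t - 1.0 / t
        path.append(s / math.sqrt(total_var))
    return path


series = [3, 1, 4, 1, 5, 9, 2, 6]
print(upper_record_indicators(series))  # [1, 0, 1, 0, 1, 1, 0, 0]
path = record_cusum_path(series)
print(max(abs(p) for p in path))  # a crude sup-type summary of the path
```

A series with an upward shift in location tends to produce more upper records than $1/t$ predicts after the change, which pushes such a path away from zero.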
We consider the problem of computing the joint distribution of order statistics of stochastically independent random variables in one- and two-group models. While recursive formulas for evaluating the joint cumulative distribution function of such order statistics have long existed in the literature, their numerical implementation remains a challenging task. We tackle this task by presenting novel generalizations of known recursions which we utilize to obtain exact results (calculated in rational arithmetic) as well as faithfully rounded results. Finally, some applications in stepwise multiple hypothesis testing are discussed.
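To illustrate what an exact recursive evaluation in rational arithmetic looks like, the sketch below implements a classical recursion (usually attributed to Bolshev) for the simplest one-group case: $n$ i.i.d. uniform variables on $(0,1)$ and nondecreasing thresholds $b_1 \le \dots \le b_n$. It is not the generalized recursion of the abstract, and the function name is illustrative. The recursion is $P_m = 1 - \sum_{j=0}^{m-1} \binom{m}{j} P_j (1 - b_{j+1})^{m-j}$ with $P_0 = 1$, where $P_m = P(U_{(1)} \le b_1, \dots, U_{(m)} \le b_m)$ for a sample of size $m$.

```python
from fractions import Fraction
from math import comb


def joint_cdf_uniform_order_stats(b):
    """P(U_(1) <= b[0], ..., U_(n) <= b[n-1]) for n i.i.d. U(0,1) variables.

    b must be nondecreasing with entries in [0, 1]; pass Fractions for exact results.
    """
    n = len(b)
    P = [Fraction(1)]  # P[0] = 1: no constraints for an empty sample
    for m in range(1, n + 1):
        # Sum over j = number of order statistics satisfied before the first violation.
        s = sum(comb(m, j) * P[j] * (1 - b[j]) ** (m - j) for j in range(m))
        P.append(1 - s)
    return P[n]


print(joint_cdf_uniform_order_stats([Fraction(1, 4), Fraction(1, 2)]))  # 3/16
```

For independent variables with a common continuous distribution function $F$, the same routine applies with $b_i = F(x_i)$. The alternating signs in such recursions are one reason floating-point evaluation can lose accuracy for large $n$, which is where exact rational arithmetic helps.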
Labeling patients in electronic health records with respect to their statuses of having a disease or condition, i.e. case or control statuses, has increasingly relied on prediction models using high-dimensional variables derived from structured and unstructured electronic health record data. A major hurdle currently is a lack of valid statistical inference methods for the case probability. In this paper, considering high-dimensional sparse logistic regression models for prediction, we propose a novel bias-corrected estimator for the case probability through the development of linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labelling. We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.