We prove several results concerning classifications, based on successive observations $(X_1, \dots, X_n)$ of an unknown stationary and ergodic process, for membership in a given class of processes, such as the class of all finite order Markov chains.
Finitarily Markovian processes are those processes $\{X_n\}_{n=-\infty}^{\infty}$ for which there is a finite $K$ ($K = K(\{X_n\}_{n=-\infty}^{0})$) such that the conditional distribution of $X_1$ given the entire past is equal to the conditional distribution of $X_1$ given only $\{X_n\}_{n=1-K}^{0}$. The least such value of $K$ is called the memory length. We give a rather complete analysis of the problems of universally estimating the least such value of $K$, both in the backward sense that we have just described and in the forward sense, where one observes successive values of $\{X_n\}$ for $n \geq 0$ and asks for the least value $K$ such that the conditional distribution of $X_{n+1}$ given $\{X_i\}_{i=n-K+1}^{n}$ is the same as the conditional distribution of $X_{n+1}$ given $\{X_i\}_{i=-\infty}^{n}$. We allow for finite or countably infinite alphabet size.
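To make the forward notion concrete, here is a minimal Python sketch, not the estimator analyzed in the paper: the plug-in comparison, the tolerance `tol`, and the cap `k_max` are illustrative assumptions. It guesses the memory length of a binary sequence by checking when lengthening the context stops changing the empirical conditional distribution of the next symbol.

```python
from collections import Counter, defaultdict
import random

def empirical_cond_dist(seq, k):
    """Empirical distribution of the next symbol given the previous k symbols."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return {ctx: {s: c / sum(ctr.values()) for s, c in ctr.items()}
            for ctx, ctr in counts.items()}

def estimate_memory_length(seq, k_max=5, tol=0.05):
    """Smallest k at which length-(k+1) contexts predict no better than length-k ones."""
    for k in range(k_max):
        short = empirical_cond_dist(seq, k)
        long_ = empirical_cond_dist(seq, k + 1)
        # A length-(k+1) context refines the length-k context that is its
        # suffix; if no refinement moves the predicted distribution by more
        # than tol, accept k as the memory length.
        if all(abs(p - short[ctx[1:]].get(s, 0.0)) <= tol
               for ctx, dist in long_.items() for s, p in dist.items()):
            return k
    return k_max

# Simulate an order-1 binary Markov chain: P(1|0) = 0.2, P(1|1) = 0.8.
random.seed(0)
x, seq = 0, []
for _ in range(200_000):
    x = int(random.random() < (0.8 if x else 0.2))
    seq.append(x)
print(estimate_memory_length(seq))  # prints 1 for this order-1 chain
```

For the simulated order-1 chain the refinement test first passes at $k=1$, which is exactly the memory length in the forward sense described above.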
This paper introduces the concept of random context representations for the transition probabilities of a finite-alphabet stochastic process. Processes with these representations generalize context tree processes (a.k.a. variable length Markov chains), and are proven to coincide with processes whose transition probabilities are almost surely continuous functions of the (infinite) past. This parallels a classical result of Kalikow on continuous transition probabilities. Existence and uniqueness of a minimal random context representation are proven, and an estimator of the transition probabilities based on this representation is shown to have very good pastwise adaptivity properties. In particular, it achieves minimax performance, up to logarithmic factors, for binary renewal processes with bounded $2+\gamma$ moments.
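For intuition about the objects being generalized, the following toy sketch spells out how an ordinary context tree process (a variable length Markov chain, not the random context representation of the paper) reads its transition probability off the shortest suffix of the past that is a context; the context set and the probabilities here are made-up illustrations.

```python
# Transition probabilities of a toy variable length Markov chain on {0, 1}.
# Each key is a context, written oldest-to-newest; the value is
# P(next symbol = 1 | that context). The contexts form a complete suffix
# set: every sufficiently long past matches exactly one of them.
CONTEXTS = {
    (1,): 0.9,      # past ends in 1
    (0, 0): 0.1,    # past ends in 00
    (1, 0): 0.5,    # past ends in 10
}

def prob_next_is_one(past):
    """Match the shortest suffix of `past` that is a context in the tree."""
    for k in range(1, len(past) + 1):
        suffix = tuple(past[-k:])
        if suffix in CONTEXTS:
            return CONTEXTS[suffix]
    raise ValueError("past too short to determine a context")

print(prob_next_is_one([0, 1, 1, 0]))  # suffix (1, 0) -> 0.5
print(prob_next_is_one([0, 0, 0, 1]))  # suffix (1,)   -> 0.9
```

In a random context representation the relevant portion of the past is itself random rather than given by a fixed tree, which is what allows the class to capture all processes with almost surely continuous transition probabilities.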
Bailey showed that the general pointwise forecasting problem for stationary and ergodic time series has a negative solution. However, it is known that for Markov chains the problem can be solved. Morvai showed that there is a stopping time sequence $\{\lambda_n\}$ along which $P(X_{\lambda_n+1}=1 \mid X_0, \dots, X_{\lambda_n})$ can be estimated from the samples $(X_0, \dots, X_{\lambda_n})$ in such a way that the difference between the conditional probability and the estimate vanishes along these stopping times for all stationary and ergodic binary time series. We will show that it is not possible to estimate the above conditional probability along a stopping time sequence for all stationary and ergodic binary time series in a pointwise sense in such a way that, if the time series turns out to be a Markov chain, the predictor eventually predicts for all $n$.
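A minimal sketch of the recurrence idea behind such stopping time schemes, under simplifying assumptions: the context length `k` is held fixed here, whereas the actual constructions let it grow with $n$, and the function name is our own. A time $t$ plays the role of a $\lambda_n$ once the current length-$k$ block has already occurred in the past, and the estimate is the empirical frequency of a 1 following those earlier occurrences.

```python
def estimates_along_stopping_times(seq, k=3):
    """At each time t whose last k symbols recur earlier in the sequence,
    estimate P(X_{t+1} = 1 | X_0, ..., X_t) by the frequency of a 1 right
    after the earlier occurrences of that block; such t stand in for the
    stopping times lambda_n."""
    out = []
    for t in range(k - 1, len(seq) - 1):
        block = tuple(seq[t - k + 1:t + 1])              # current length-k context
        follows = [seq[i + k] for i in range(t - k + 1)  # earlier occurrences only
                   if tuple(seq[i:i + k]) == block]
        if follows:                                      # the block has recurred
            out.append((t, sum(follows) / len(follows)))
    return out

xs = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(estimates_along_stopping_times(xs, k=2)[-1])  # (8, 0.0): a 0 always followed 11
```

The negative result of the paper concerns exactly this kind of scheme: no choice of stopping times can combine universal consistency over all stationary and ergodic binary processes with eventual prediction at every $n$ on Markov chains.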
The problem of extracting as much information as possible from a sequence of observations of a stationary stochastic process $X_0, X_1, \dots, X_n$ has been considered by many authors from different points of view. It has long been known, through the work of D. Bailey, that no universal estimator for $\mathbf{P}(X_{n+1} \mid X_0, X_1, \dots, X_n)$ can be found which converges to the true conditional probability almost surely. Despite this result, universal estimators can be found for restricted classes of processes, or along sequences of stopping times. We present here a survey of some of the recent work that has been done along these lines.
This study concerns problems of time-series forecasting under the weakest of assumptions. Related results are surveyed and serve as points of departure for the developments here, some of which are new while others are new derivations of previous findings. The contributions of this study are all negative, showing that various plausible prediction problems are unsolvable or, in other cases, not solvable by predictors that are known to be consistent when mixing conditions hold.