ترغب بنشر مسار تعليمي؟ اضغط هنا

Bayesian Context Trees: Modelling and exact inference for discrete time series

90   0   0.0 ( 0 )
 نشر من قبل Ioannis Kontoyiannis
 تاريخ النشر 2020
والبحث باللغة English




اسأل ChatGPT حول البحث

We develop a new Bayesian modelling framework for the class of higher-order, variable-memory Markov chains, and introduce an associated collection of methodological tools for exact inference with discrete time series. We show that a version of the context tree weighting algorithm can compute the prior predictive likelihood exactly (averaged over both models and parameters), and two related algorithms are introduced, which identify the a posteriori most likely models and compute their exact posterior probabilities. All three algorithms are deterministic and have linear-time complexity. A family of variable-dimension Markov chain Monte Carlo samplers is also provided, facilitating further exploration of the posterior. The performance of the proposed methods in model selection, Markov order estimation and prediction is illustrated through simulation experiments and real-world applications with data from finance, genetics, neuroscience, and animal communication. The associated algorithms are implemented in the R package BCT.



قيم البحث

اقرأ أيضاً

A general Bayesian framework is introduced for mixture modelling and inference with real-valued time series. At the top level, the state space is partitioned via the choice of a discrete context tree, so that the resulting partition depends on the va lues of some of the most recent samples. At the bottom level, a different model is associated with each region of the partition. This defines a very rich and flexible class of mixture models, for which we provide algorithms that allow for efficient, exact Bayesian inference. In particular, we show that the maximum a posteriori probability (MAP) model (including the relevant MAP context tree partition) can be precisely identified, along with its exact posterior probability. The utility of this general framework is illustrated in detail when a different autoregressive (AR) model is used in each state-space region, resulting in a mixture-of-AR model class. The performance of the associated algorithmic tools is demonstrated in the problems of model selection and forecasting on both simulated and real-world data, where they are found to provide results as good or better than state-of-the-art methods.
210 - G. Morvai , B. Weiss 2007
Let ${X_n}$ be a stationary and ergodic time series taking values from a finite or countably infinite set ${cal X}$. Assume that the distribution of the process is otherwise unknown. We propose a sequence of stopping times $lambda_n$ along which we w ill be able to estimate the conditional probability $P(X_{lambda_n+1}=x|X_0,...,X_{lambda_n})$ from data segment $(X_0,...,X_{lambda_n})$ in a pointwise consistent way for a restricted class of stationary and ergodic finite or countably infinite alphabet time series which includes among others all stationary and ergodic finitarily Markovian processes. If the stationary and ergodic process turns out to be finitarily Markovian (among others, all stationary and ergodic Markov chains are included in this class) then $ lim_{nto infty} {nover lambda_n}>0$ almost surely. If the stationary and ergodic process turns out to possess finite entropy rate then $lambda_n$ is upperbounded by a polynomial, eventually almost surely.
342 - G. Morvai , S. Yakowitz , 2007
The setting is a stationary, ergodic time series. The challenge is to construct a sequence of functions, each based on only finite segments of the past, which together provide a strongly consistent estimator for the conditional probability of the nex t observation, given the infinite past. Ornstein gave such a construction for the case that the values are from a finite set, and recently Algoet extended the scheme to time series with coordinates in a Polish space. The present study relates a different solution to the challenge. The algorithm is simple and its verification is fairly transparent. Some extensions to regression, pattern recognition, and on-line forecasting are mentioned.
151 - G. Morvai , B. Weiss 2008
The problem of extracting as much information as possible from a sequence of observations of a stationary stochastic process $X_0,X_1,...X_n$ has been considered by many authors from different points of view. It has long been known through the work o f D. Bailey that no universal estimator for $textbf{P}(X_{n+1}|X_0,X_1,...X_n)$ can be found which converges to the true estimator almost surely. Despite this result, for restricted classes of processes, or for sequences of estimators along stopping times, universal estimators can be found. We present here a survey of some of the recent work that has been done along these lines.
Inferring linear dependence between time series is central to our understanding of natural and artificial systems. Unfortunately, the hypothesis tests that are used to determine statistically significant directed or multivariate relationships from ti me-series data often yield spurious associations (Type I errors) or omit causal relationships (Type II errors). This is due to the autocorrelation present in the analysed time series -- a property that is ubiquitous across diverse applications, from brain dynamics to climate change. Here we show that, for limited data, this issue cannot be mediated by fitting a time-series model alone (e.g., in Granger causality or prewhitening approaches), and instead that the degrees of freedom in statistical tests should be altered to account for the effective sample size induced by cross-correlations in the observations. This insight enabled us to derive modified hypothesis tests for any multivariate correlation-based measures of linear dependence between covariance-stationary time series, including Granger causality and mutual information with Gaussian marginals. We use both numerical simulations (generated by autoregressive models and digital filtering) as well as recorded fMRI-neuroimaging data to show that our tests are unbiased for a variety of stationary time series. Our experiments demonstrate that the commonly used $F$- and $chi^2$-tests can induce significant false-positive rates of up to $100%$ for both measures, with and without prewhitening of the signals. These findings suggest that many dependencies reported in the scientific literature may have been, and may continue to be, spuriously reported or missed if modified hypothesis tests are not used when analysing time series.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا