Recent years have seen a huge development in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that modelling and prediction using infinite-dimensional process models is not possible with large data sets, and that both approximate models and, often, approximate-inference methods are needed. The problem of fitting simple global spatial models to large data sets has been solved through the likes of multi-resolution approximations and nearest-neighbour techniques. Here we tackle the next challenge, that of fitting complex, nonstationary, multi-scale models to large data sets. We propose doing this through the use of superpositions of spatial processes with increasing spatial scale and increasing degrees of nonstationarity. Computation is facilitated through the use of Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for genuine model and data scalability and yet is still able to borrow strength across large spatial scales. We illustrate a two-scale version on a data set of sea-surface temperature containing on the order of one million observations, and compare our approach to state-of-the-art spatial modelling and prediction methods.
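As a small illustration of the graph-colouring idea: in a Gaussian Markov random field, a node is conditionally independent of all non-neighbours given its neighbours, so nodes that share a colour in a proper colouring of the graph can be updated simultaneously within a Gibbs sweep. The Python sketch below is not the authors' implementation. It assumes a first-order (four-neighbour) lattice and a plain greedy colouring, and all function names are hypothetical.

```python
def lattice_neighbours(n_rows, n_cols):
    """Four-neighbour adjacency of an n_rows x n_cols lattice (an assumed toy graph)."""
    nbrs = {}
    for i in range(n_rows):
        for j in range(n_cols):
            candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            nbrs[(i, j)] = [
                (a, b) for a, b in candidates if 0 <= a < n_rows and 0 <= b < n_cols
            ]
    return nbrs


def greedy_colouring(nbrs):
    """Give each node the smallest colour not already used by a coloured neighbour."""
    colour = {}
    for node in nbrs:  # dicts keep insertion order, so this is row-major
        used = {colour[m] for m in nbrs[node] if m in colour}
        c = 0
        while c in used:
            c += 1
        colour[node] = c
    return colour


def colour_classes(colour):
    """Group nodes by colour; each group could be updated in parallel."""
    classes = {}
    for node, c in colour.items():
        classes.setdefault(c, []).append(node)
    return classes


nbrs = lattice_neighbours(4, 5)
colour = greedy_colouring(nbrs)
for c, nodes in sorted(colour_classes(colour).items()):
    print(f"colour {c}: {len(nodes)} nodes can be updated together")
```

On this lattice the greedy pass reproduces the familiar chequerboard pattern, so a full sweep needs only two parallel stages. Irregular graphs, or colourings of blocks of nodes rather than single nodes, generally need more colours.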
In recent years dynamical modelling has been provided with a range of breakthrough methods to perform exact Bayesian inference. However, it is often computationally infeasible to apply exact statistical methodologies in the context of large datasets and complex models. This paper considers a nonlinear stochastic differential equation model observed with correlated measurement errors and an application to protein folding modelling. An Approximate Bayesian Computation (ABC) MCMC algorithm is suggested to allow inference for model parameters within reasonable time constraints. The ABC algorithm uses simulations of subsamples from the assumed data-generating model as well as a so-called early rejection strategy to speed up computations in the ABC-MCMC sampler. Using a considerable number of subsamples does not seem to degrade the quality of the inferential results for the considered applications. A simulation study is conducted to compare our strategy with exact Bayesian inference, the latter being two orders of magnitude slower than ABC-MCMC for the considered setup. Finally, the ABC algorithm is applied to a large protein data set. The suggested methodology is fairly general and not limited to the exemplified model and data.
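The early rejection idea can be sketched on a toy problem. With a uniform ABC kernel, the acceptance probability is the prior-and-proposal ratio multiplied by an indicator that the simulated summary is close to the observed one, so the uniform draw can be compared with the ratio first and the costly simulation skipped whenever that comparison already fails. The Python sketch below is an assumption-laden illustration and not the paper's algorithm: it replaces the stochastic differential equation model with i.i.d. Gaussian data with unknown mean, uses the sample mean as summary, omits the subsampling step, and uses hypothetical names and tuning values throughout.

```python
import math
import random
import statistics


def simulate_data(theta, n, rng):
    """Toy stand-in for the data-generating model: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_mcmc_early_rejection(y_obs, n_iter=5000, eps=0.1, step=0.5,
                             prior_sd=2.0, seed=1):
    """ABC-MCMC with a uniform kernel, a N(0, prior_sd^2) prior and a
    symmetric random-walk proposal. Returns the chain and the number of
    model simulations actually carried out."""
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)

    def log_prior(theta):
        return -0.5 * (theta / prior_sd) ** 2

    theta = s_obs  # start where a match with the observed summary is likely
    chain, n_sim = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_u = math.log(1.0 - rng.random())  # log of a Uniform(0, 1] draw
        # Early rejection: the proposal is symmetric, so only the prior
        # ratio matters. If u exceeds it, reject without simulating.
        if log_u > log_prior(proposal) - log_prior(theta):
            chain.append(theta)
            continue
        n_sim += 1
        s_sim = statistics.fmean(simulate_data(proposal, n, rng))
        if abs(s_sim - s_obs) < eps:  # uniform kernel: accept only if close
            theta = proposal
        chain.append(theta)
    return chain, n_sim


data_rng = random.Random(0)
y_obs = simulate_data(1.0, 50, data_rng)
chain, n_sim = abc_mcmc_early_rejection(y_obs)
print(f"posterior mean estimate: {statistics.fmean(chain):.3f}")
print(f"simulations run: {n_sim} of {len(chain)} iterations")
```

The saving grows with the cost of one model simulation and with how often the prior ratio alone rules out a proposal. Under a nearly flat prior the ratio is close to one and few simulations are skipped.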
Bayesian additive regression trees (BART) is a nonparametric regression model that has gained widespread popularity in recent years due to its flexibility and high estimation accuracy. In spatio-temporal models, the spatial or temporal variables play an important role. BART selects splitting variables using a uniform prior distribution, which means that every variable is treated equally. Applying BART directly, without making proper use of this prior information, is therefore not appropriate. This paper proposes a modification of BART that fixes part of the tree structure. We call this model partially fixed BART. The new model improves the efficiency of estimation. Even when no prior information is available, the new model can still be used to obtain more accurate estimates and more structural information for future use. Data experiments and real data examples show an improvement over the original BART model.
Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, and hence conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, in spite of many recent developments in ABC methodology, in many applications the computational cost of ABC necessitates the choice of summary statistics and tolerances that can potentially severely bias the estimate of the posterior. We propose a new piecewise ABC approach suitable for discretely observed Markov models that involves writing the posterior density of the parameters as a product of factors, each a function of only a subset of the data, and then using ABC within each factor. The approach has the advantage of side-stepping the need to choose a summary statistic and it enables a stringent tolerance to be set, making the posterior less approximate. We investigate two methods for estimating the posterior density based on ABC samples for each of the factors: the first is to use a Gaussian approximation for each factor, and the second is to use a kernel density estimate. Both methods have their merits. The Gaussian approximation is simple, fast, and probably adequate for many applications. On the other hand, using a kernel density estimate instead has the benefit of consistently estimating the true ABC posterior as the number of ABC samples tends to infinity. We illustrate the piecewise ABC approach for three examples; in each case, the approach enables exact matching between simulations and data and offers fast and accurate inference.
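A minimal sketch of the piecewise idea with the Gaussian approximation, under stated assumptions: for a Markov chain the posterior can be written as $\pi(\theta \mid x) \propto \pi(\theta)^{1-n} \prod_{i=1}^{n} \varphi_i(\theta)$ with $\varphi_i(\theta) \propto \pi(\theta)\, p(x_i \mid x_{i-1}, \theta)$, and each $\varphi_i$ can be sampled by rejection ABC with exact matching on a single transition. The toy model (a binomial Markov chain), the Uniform(0, 1) prior, and all function names below are hypothetical choices for illustration, not the examples used in the paper.

```python
import random
import statistics


def simulate_step(x_prev, theta, rng, k=20):
    """One transition of a toy chain: X_t | X_{t-1} ~ Binomial(X_{t-1} + k, theta)."""
    return sum(rng.random() < theta for _ in range(x_prev + k))


def simulate_chain(theta, n_steps, rng, x0=20):
    """Simulate a discretely observed path of length n_steps + 1."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(simulate_step(xs[-1], theta, rng))
    return xs


def abc_factor_samples(x_prev, x_next, n_keep, rng):
    """Rejection ABC for one factor: keep theta only if the simulated
    transition matches the observed one exactly (tolerance zero)."""
    kept = []
    while len(kept) < n_keep:
        theta = rng.random()  # draw from the Uniform(0, 1) prior
        if simulate_step(x_prev, theta, rng) == x_next:
            kept.append(theta)
    return kept


def piecewise_abc_gaussian(xs, n_keep=100, seed=0):
    """Approximate each factor by a Gaussian and multiply the Gaussians.
    The prior is flat on (0, 1), so the pi(theta)^(1 - n) correction is
    constant there and drops out of the product."""
    rng = random.Random(seed)
    precision, weighted_mean = 0.0, 0.0
    for x_prev, x_next in zip(xs, xs[1:]):
        samples = abc_factor_samples(x_prev, x_next, n_keep, rng)
        mu = statistics.fmean(samples)
        var = statistics.variance(samples)
        precision += 1.0 / var
        weighted_mean += mu / var
    return weighted_mean / precision, 1.0 / precision


data_rng = random.Random(42)
xs = simulate_chain(0.5, 8, data_rng)
post_mean, post_var = piecewise_abc_gaussian(xs)
print(f"approximate posterior mean {post_mean:.3f}, sd {post_var ** 0.5:.3f}")
```

Exact matching is feasible here because each factor involves only one low-dimensional transition; matching the whole path at once would almost never succeed. With a non-flat prior, the $\pi(\theta)^{1-n}$ term would have to be included when combining the factors.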
Let $V$ be a finite set of indices, and let $B_i$, $i=1,\ldots,m$, be subsets of $V$ such that $V=\bigcup_{i=1}^{m}B_i$. Let $X_i$, $i\in V$, be independent random variables, and let $X_{B_i}=(X_j)_{j\in B_i}$. In this paper, we propose a recursive computation method to calculate the conditional expectation $E\bigl[\prod_{i=1}^m\chi_i(X_{B_i}) \,|\, N\bigr]$ with $N=\sum_{i\in V}X_i$ given, where $\chi_i$ is an arbitrary function. Our method is based on the recursive summation/integration technique using the Markov property in statistics. To extract the Markov property, we define an undirected graph whose cliques are $B_j$, and obtain its chordal extension, from which we present the expressions of the recursive formula. This methodology works for a class of distributions including the Poisson distribution (that is, the conditional distribution is the multinomial). This problem is motivated by the evaluation of the multiplicity-adjusted $p$-value of scan statistics in spatial epidemiology. As an illustration of the approach, we present real data analyses to detect temporal and spatial clustering.
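The remark that the Poisson case leads to a multinomial conditional distribution can be checked directly. The short derivation below assumes independent $X_i \sim \mathrm{Poisson}(\lambda_i)$; the rates $\lambda_i$ and their sum $\Lambda$ are notation introduced here for illustration and do not appear in the abstract.

```latex
% Assumption: X_i ~ Poisson(lambda_i), independent, for i in V.
% Then N = sum_{i in V} X_i ~ Poisson(Lambda), with Lambda = sum_{i in V} lambda_i.
\[
P\bigl(X_i = x_i,\ i \in V \,\big|\, N = n\bigr)
  = \frac{\prod_{i \in V} e^{-\lambda_i} \lambda_i^{x_i} / x_i!}
         {e^{-\Lambda} \Lambda^{n} / n!}
  = \frac{n!}{\prod_{i \in V} x_i!}
    \prod_{i \in V} \Bigl(\frac{\lambda_i}{\Lambda}\Bigr)^{x_i},
  \qquad \sum_{i \in V} x_i = n .
\]
```

The exponential terms cancel because $\sum_{i \in V} \lambda_i = \Lambda$, leaving a multinomial distribution with $n$ trials and cell probabilities $\lambda_i / \Lambda$.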
Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents' response processes in solving the items. Process data analysis aims at enhancing educational assessment accuracy and serving other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article is designed to provide tools for processing, describing, and analyzing process data. We define an S3 class proc for organizing process data and extend generic methods summary and print for class proc. Two feature extraction methods for process data are implemented in the package for compressing information in the irregular response processes into regular numeric vectors. ProcData also provides functions for fitting and making predictions from a neural-network-based sequence model. These functions call relevant functions in package keras for constructing and training neural networks. In addition, several response process generators and a real dataset of response processes of the climate control item in the 2012 Programme for International Student Assessment are included in the package.