
Rademacher complexity for Markov chains : Applications to kernel smoothing and Metropolis-Hasting

Publication date: 2018
Language: English





Following the seminal approach of Talagrand, the concept of Rademacher complexity for independent sequences of random variables is extended to Markov chains. The proposed notion of block Rademacher complexity (of a class of functions) follows from renewal theory and makes it possible to control both the expected values of suprema (over the class of functions) of empirical processes based on Harris Markov chains and the excess probability. For classes of Vapnik-Chervonenkis type, bounds on the block Rademacher complexity are established. These bounds depend essentially on the sample size and the probability tails of the regeneration times. The proposed approach is employed to obtain convergence rates for the kernel density estimator of the stationary measure and to derive concentration inequalities for the Metropolis-Hastings algorithm.
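The two applications named in the abstract can be pictured together with a minimal sketch: a random-walk Metropolis-Hastings chain whose stationary density is then estimated with a Gaussian kernel density estimator. Everything here is an illustrative assumption rather than the paper's construction: the function names, the Gaussian proposal and kernel, the step size, and the bandwidth `h` are hypothetical choices.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings on the real line (illustrative sketch).

    log_target: log of the (unnormalized) target density.
    step: standard deviation of the symmetric Gaussian proposal (assumed choice).
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        log_py = log_target(y)
        # Symmetric proposal: accept with probability min(1, pi(y) / pi(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_py - log_px:
            x, log_px = y, log_py
        chain.append(x)
    return chain


def gaussian_kde(chain, x, h):
    """Gaussian kernel density estimate at x with bandwidth h (assumed kernel)."""
    norm = 1.0 / (len(chain) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in chain)


if __name__ == "__main__":
    # Standard normal target, known up to a constant.
    chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
    print(gaussian_kde(chain, 0.0, 0.3))
```

The chain's samples are dependent, which is exactly why convergence rates for such an estimator need tools beyond the independent case.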





We extend Hoeffding's lemma to general-state-space and not necessarily reversible Markov chains. Let $\{X_i\}_{i \ge 1}$ be a stationary Markov chain with invariant measure $\pi$ and absolute spectral gap $1-\lambda$, where $\lambda$ is defined as the operator norm of the transition kernel acting on mean-zero and square-integrable functions with respect to $\pi$. Then, for any bounded functions $f_i: x \mapsto [a_i,b_i]$, the sum of $f_i(X_i)$ is sub-Gaussian with variance proxy $\frac{1+\lambda}{1-\lambda} \cdot \sum_i \frac{(b_i-a_i)^2}{4}$. This result differs from the classical Hoeffding's lemma by a multiplicative coefficient of $(1+\lambda)/(1-\lambda)$, and simplifies to the latter when $\lambda = 0$. The counterpart of Hoeffding's inequality for Markov chains immediately follows. Our results assume none of countable state space, reversibility, or time-homogeneity of Markov chains, and cover time-dependent functions with various ranges. We illustrate the utility of these results by applying them to six problems in statistics and machine learning.
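As a small numerical sketch of the bound described above: a sub-Gaussian sum with variance proxy $\sigma^2$ satisfies the one-sided tail bound $\exp(-t^2/(2\sigma^2))$, so plugging in the stated proxy gives a Hoeffding-type bound for the chain. The function name and argument layout are hypothetical, and the operator norm `lam` is assumed to be known.

```python
import math


def markov_hoeffding_bound(ranges, lam, t):
    """Upper bound on P(sum_i f_i(X_i) - E[sum_i f_i(X_i)] >= t).

    ranges: list of (a_i, b_i) pairs bounding each f_i.
    lam: operator norm lambda in [0, 1); lam = 0 gives the classical bound.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    # Variance proxy from the abstract: (1+lam)/(1-lam) * sum_i (b_i - a_i)^2 / 4.
    proxy = (1.0 + lam) / (1.0 - lam) * sum((b - a) ** 2 for a, b in ranges) / 4.0
    if proxy == 0.0:
        raise ValueError("all ranges are degenerate")
    # Standard sub-Gaussian tail bound: exp(-t^2 / (2 * proxy)).
    return math.exp(-t * t / (2.0 * proxy))


if __name__ == "__main__":
    ranges = [(0.0, 1.0)] * 100
    print(markov_hoeffding_bound(ranges, 0.0, 10.0))  # classical case, exp(-2)
    print(markov_hoeffding_bound(ranges, 0.5, 10.0))  # proxy inflated by a factor of 3
```

With `lam = 0.5` the variance proxy triples, so the same deviation `t` is bounded far less tightly than in the independent case.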
Markov chain Monte Carlo (MCMC) produces a correlated sample for estimating expectations with respect to a target distribution. A fundamental question is when sampling should stop so that we have good estimates of the desired quantities. The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem (CLT). The multivariate nature of this Monte Carlo error has largely been ignored in the MCMC literature. We present a multivariate framework for terminating simulation in MCMC. We define a multivariate effective sample size, the estimation of which requires strongly consistent estimators of the covariance matrix in the Markov chain CLT, a property we show for the multivariate batch means estimator. We then provide a lower bound on the minimum number of effective samples required for a desired level of precision. This lower bound depends on the problem only through the dimension of the expectation being estimated, and not on the underlying stochastic process. This result is obtained by drawing a connection between terminating simulation via effective sample size and terminating simulation using a relative standard deviation fixed-volume sequential stopping rule, which we demonstrate is an asymptotically valid procedure. The finite sample properties of the proposed method are demonstrated in a variety of examples.
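To make the batch means idea concrete, here is a minimal one-dimensional sketch; the abstract's framework is multivariate, so this is a deliberate simplification. It assumes the common univariate definition ESS = n s^2 / sigma^2, where s^2 is the sample variance and sigma^2 is the batch means estimate of the asymptotic variance, with a batch size near the square root of n. The function name and the batch-size rule are illustrative choices, not taken from the paper.

```python
import math
import random


def batch_means_ess(chain):
    """Univariate effective sample size via batch means (illustrative sketch)."""
    n = len(chain)
    if n < 4:
        raise ValueError("need at least 4 samples")
    b = int(math.sqrt(n))        # batch size (assumed rule: floor(sqrt(n)))
    a = n // b                   # number of batches
    used = chain[: a * b]        # drop the incomplete final batch
    mean = sum(used) / len(used)
    batch_means = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    # Batch means estimate of the asymptotic variance in the Markov chain CLT.
    sigma2 = b * sum((m - mean) ** 2 for m in batch_means) / (a - 1)
    # Ordinary sample variance of the chain.
    s2 = sum((x - mean) ** 2 for x in used) / (len(used) - 1)
    return len(used) * s2 / sigma2


if __name__ == "__main__":
    rng = random.Random(1)
    x, ar1 = 0.0, []
    for _ in range(10000):
        x = 0.9 * x + rng.gauss(0.0, 1.0)  # strongly autocorrelated AR(1) chain
        ar1.append(x)
    print(batch_means_ess(ar1))  # far below the nominal 10000 draws
```

A stopping rule in this spirit would keep sampling until the estimated effective sample size exceeds a precomputed threshold.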
Vivekananda Roy, Aixin Tan, 2015
The naive importance sampling estimator, based on samples from a single importance density, can be numerically unstable. Instead, we consider generalized importance sampling estimators where samples from more than one probability distribution are combined. We study this problem in the Markov chain Monte Carlo context, where independent samples are replaced with Markov chain samples. If the chains converge to their respective target distributions at a polynomial rate, then under two finite moment conditions, we show that a central limit theorem holds for the generalized estimators. Further, we develop an easy-to-implement method to calculate valid asymptotic standard errors based on batch means. We also provide a batch means estimator for calculating asymptotically valid standard errors of Geyer's (1994) reverse logistic estimator. We illustrate the method using a Bayesian variable selection procedure in linear regression. In particular, the generalized importance sampling estimator is used to perform empirical Bayes variable selection, and the batch means estimator is used to obtain standard errors in a high-dimensional setting where current methods are not applicable.
This paper proposes a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods.
Quan Zhou, Hyunwoong Chang, 2021
We consider MCMC methods for learning equivalence classes of sparse Gaussian DAG models when $p = e^{o(n)}$. The main contribution of this work is a rapid mixing result for a random walk Metropolis-Hastings algorithm, which we prove using a canonical path method. It reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in $n$ and $p$, under some common high-dimensional assumptions. Further, a series of high-dimensional consistency results is obtained by the path method, including the strong selection consistency of an empirical Bayes model for structure learning and the consistency of a greedy local search on the restricted search space. Rapid mixing and slow mixing results for other structure-learning MCMC methods are also derived. Our path method and mixing time results yield crucial insights into the computational aspects of high-dimensional structure learning, which may be used to develop more efficient MCMC algorithms.