We present a new Bayesian nonparametric approach to estimating the spectral density of a stationary time series. A nonparametric prior based on a mixture of B-spline distributions is specified and can be regarded as a generalization of the Bernstein polynomial prior of Petrone (1999a,b) and Choudhuri et al. (2004). Whittle's likelihood approximation is used to obtain the pseudo-posterior distribution. This method allows for a data-driven choice of the number of mixture components and the location of knots. Posterior samples are obtained using a Metropolis-within-Gibbs Markov chain Monte Carlo algorithm, and mixing is improved using parallel tempering. We conduct a simulation study to demonstrate that for complicated spectral densities, the B-spline prior provides more accurate Monte Carlo estimates in terms of $L_1$-error and uniform coverage probabilities than the Bernstein polynomial prior. We apply the algorithm to annual mean sunspot data to estimate the solar cycle. Finally, we demonstrate the algorithm's ability to estimate a spectral density with sharp features, using real gravitational wave detector data from LIGO's sixth science run, recoloured to match the Advanced LIGO target sensitivity.
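The Whittle approximation mentioned above can be illustrated with a minimal sketch. It assumes the convention in which the periodogram is scaled by $1/(2\pi n)$, so that white noise with variance $\sigma^2$ has spectral density $\sigma^2/(2\pi)$. The function names `periodogram` and `whittle_loglik` are hypothetical and are not taken from the paper. The B-spline prior and the MCMC sampler are not shown.

```python
import numpy as np

def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 1..floor((n-1)/2).

    Assumed scaling: I(lambda) = |sum_t x_t exp(-i t lambda)|^2 / (2 pi n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()                     # remove the sample mean
    dft = np.fft.rfft(x)                 # one-sided discrete Fourier transform
    m = (n - 1) // 2                     # skip frequency 0 (and pi when n is even)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    pgram = np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    return freqs, pgram

def whittle_loglik(pgram, spec):
    """Whittle log-likelihood, up to an additive constant.

    `spec` holds a candidate spectral density evaluated at the same
    Fourier frequencies as `pgram`.
    """
    spec = np.asarray(spec, dtype=float)
    return -np.sum(np.log(spec) + pgram / spec)
```

In a sampler of the kind described, a function like `whittle_loglik` would be evaluated for each proposed spectral density and combined with the log prior to form the log pseudo-posterior.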
Mutual information is a widely used information-theoretic measure that quantifies the association between variables. It is used extensively in applications such as image registration, diagnosis of failures in electrical machines, pattern recognition, data mining and tests of independence. The main goal of this paper is to provide an efficient estimator of the mutual information based on the approach of Al Labadi et al. (2021). The estimator is explored through various examples and is compared to its frequentist counterpart due to Berrett et al. (2019). The results show that the procedure performs well, achieving a smaller mean squared error in these comparisons.
Mixture models are regularly used in density estimation applications, but the problem of estimating the mixing distribution remains a challenge. Nonparametric maximum likelihood produces estimates of the mixing distribution that are discrete, and these may be hard to interpret when the true mixing distribution is believed to have a smooth density. In this paper, we investigate an algorithm that produces a sequence of smooth estimates that has been conjectured to converge to the nonparametric maximum likelihood estimator. We give a rigorous proof of this conjecture and propose a new data-driven stopping rule that produces smooth near-maximum likelihood estimates of the mixing density. Simulations demonstrate the good empirical performance of this estimator.
We develop a fully Bayesian nonparametric regression model based on a Levy process prior, called the MLABS (Multivariate Levy Adaptive B-Spline regression) model, a multivariate version of the LARK (Levy Adaptive Regression Kernels) models, for estimating unknown functions with either varying degrees of smoothness or high interaction orders. Levy process priors have the advantages of encouraging sparsity in the expansions and providing automatic selection of the number of basis functions. The unknown regression function is expressed as a weighted sum of tensor products of B-spline basis functions, taken as the elements of an overcomplete system, which can handle multi-dimensional data. The B-spline basis can systematically express functions with varying degrees of smoothness. By changing the set of degrees of the tensor product basis functions, MLABS can adapt to the smoothness of the target function. The local support of the B-spline basis enables MLABS to make more delicate predictions than other existing methods on two-dimensional surface data. Experiments on various simulated and real-world datasets illustrate that the MLABS model has comparable performance on regression and classification problems. We also show that the MLABS model has more stable and accurate predictive abilities than state-of-the-art nonparametric regression models on relatively low-dimensional data.
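The tensor-product B-spline expansion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the knots and degrees are fixed by hand, whereas in the model described they would be governed by the Levy process prior. The function names `bspline_basis` and `tensor_basis` are hypothetical. The univariate basis uses the standard Cox-de Boor recursion.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of a given degree at points x.

    Returns an array of shape (len(x), len(knots) - degree - 1),
    built with the Cox-de Boor recursion.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot spans [t_i, t_{i+1}).
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    for d in range(1, degree + 1):
        nb = len(t) - 1 - d
        new = np.zeros((x.size, nb))
        for i in range(nb):
            left = t[i + d] - t[i]
            right = t[i + d + 1] - t[i + 1]
            if left > 0:                 # skip terms with zero-width support
                new[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = new
    return B

def tensor_basis(x1, x2, knots1, knots2, deg1, deg2):
    """Tensor-product basis for two inputs: all pairwise products B_i(x1) * B_j(x2)."""
    B1 = bspline_basis(x1, knots1, deg1)
    B2 = bspline_basis(x2, knots2, deg2)
    return np.einsum("ni,nj->nij", B1, B2).reshape(B1.shape[0], -1)
```

A regression surface is then a weighted sum of the columns returned by `tensor_basis`. Choosing different degrees per coordinate changes the smoothness of the fitted function along each axis.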
We describe an efficient implementation of Bayesian quantum phase estimation in the presence of noise and multiple eigenstates. The main contribution of this work is the dynamic switching between different representations of the phase distributions, namely truncated Fourier series and normal distributions. The Fourier-series representation has the advantage of being exact in many cases, but suffers from increasing complexity with each update of the prior. This necessitates truncation of the series, which eventually causes the distribution to become unstable. We derive bounds on the error in representing normal distributions with a truncated Fourier series, and use these to decide when to switch to the normal-distribution representation. This representation is much simpler, and was proposed in conjunction with rejection filtering for approximate Bayesian updates. We show that, in many cases, the update can be done exactly using analytic expressions, thereby greatly reducing the time complexity of the updates. Finally, when dealing with a superposition of several eigenstates, we need to estimate the relative weights. This can be formulated as a convex optimization problem, which we solve using a gradient-projection algorithm. By updating the weights at exponentially scaled iterations we greatly reduce the computational complexity without affecting the overall accuracy.
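A single Bayesian update in the Fourier-series representation can be sketched as follows. The sketch assumes a noise-free measurement model $P(m \mid \varphi) = \tfrac{1}{2}\,[1 + \cos(k\varphi + \beta - m\pi)]$ with outcome $m \in \{0, 1\}$, where $k$ is the number of applications of the unitary and $\beta$ is a phase offset. This model and the function names `fourier_update` and `density` are assumptions made for illustration, not details taken from the paper. Noise, truncation, and the switch to a normal representation are not shown.

```python
import numpy as np

def fourier_update(c, k, beta, m):
    """One exact Bayesian update of a phase density in Fourier form.

    `c` holds the coefficients c_{-J}, ..., c_J of
    p(phi) = sum_j c_j exp(i j phi). The result has 2k more coefficients.
    """
    c = np.asarray(c, dtype=complex)
    J = (c.size - 1) // 2
    shift = beta - m * np.pi
    new = np.zeros(c.size + 2 * k, dtype=complex)
    new[k:k + c.size] += 0.5 * c                          # constant term of the likelihood
    new[2 * k:] += 0.25 * np.exp(1j * shift) * c          # exp(+i k phi) term: j -> j + k
    new[:c.size] += 0.25 * np.exp(-1j * shift) * c        # exp(-i k phi) term: j -> j - k
    # Normalise so the density integrates to one over [0, 2*pi).
    return new / (2.0 * np.pi * new[J + k].real)

def density(c, phi):
    """Evaluate the density with Fourier coefficients c at the angles phi."""
    c = np.asarray(c, dtype=complex)
    J = (c.size - 1) // 2
    j = np.arange(-J, J + 1)
    return np.real(np.exp(1j * np.outer(phi, j)) @ c)
```

Each call widens the coefficient array by `2 * k` entries, which illustrates the growth in complexity that motivates truncating the series.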
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is embarrassingly parallel and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.