ترغب بنشر مسار تعليمي؟ اضغط هنا

Computationally Efficient Bayesian Estimation of High Dimensional Copulas with Discrete and Mixed Margins

52   0   0.0 ( 0 )
 نشر من قبل David Gunawan
 تاريخ النشر 2016
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Estimating copulas with discrete marginal distributions is challenging, especially in high dimensions, because computing the likelihood contribution of each observation requires evaluating $2^{J}$ terms, with $J$ the number of discrete variables. Currently, data augmentation methods are used to carry out inference for discrete copula and, in practice, the computation becomes infeasible when $J$ is large. Our article proposes two new fast Bayesian approaches for estimating high dimensional copulas with discrete margins, or a combination of discrete and continuous margins. Both methods are based on recent advances in Bayesian methodology that work with an unbiased estimate of the likelihood rather than the likelihood itself, and our key observation is that we can estimate the likelihood of a discrete copula unbiasedly with much less computation than evaluating the likelihood exactly or with current simulation methods that are based on augmenting the model with latent variables. The first approach builds on the pseudo marginal method that allows Markov chain Monte Carlo simulation from the posterior distribution using only an unbiased estimate of the likelihood. The second approach is based on a Variational Bayes approximation to the posterior and also uses an unbiased estimate of the likelihood. We show that Monte Carlo and randomised quasi Monte Carlo methods can be used with both approaches to reduce the variability of the estimate of the likelihood, and hence enable us to carry out Bayesian inference for high values of $J$ for some classes of copulas where the computation was previously too expensive. Our article also introduces {em a correlated quasi random number pseudo marginal} approach into the literature. The methodology is illustrated through several real and simulated data examples.



قيم البحث

اقرأ أيضاً

307 - Liang Ding , Xiaowei Zhang 2020
Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of a complex simulation model. However, its use is limited to cases where the design space is low-dimensional, because the number of design points required for stochastic kriging to produce accurate prediction, in general, grows exponentially in the dimension of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need of inverting large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows very mildly in the dimension, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
99 - Ewout van den Berg 2020
We describe an efficient implementation of Bayesian quantum phase estimation in the presence of noise and multiple eigenstates. The main contribution of this work is the dynamic switching between different representations of the phase distributions, namely truncated Fourier series and normal distributions. The Fourier-series representation has the advantage of being exact in many cases, but suffers from increasing complexity with each update of the prior. This necessitates truncation of the series, which eventually causes the distribution to become unstable. We derive bounds on the error in representing normal distributions with a truncated Fourier series, and use these to decide when to switch to the normal-distribution representation. This representation is much simpler, and was proposed in conjunction with rejection filtering for approximate Bayesian updates. We show that, in many cases, the update can be done exactly using analytic expressions, thereby greatly reducing the time complexity of the updates. Finally, when dealing with a superposition of several eigenstates, we need to estimate the relative weights. This can be formulated as a convex optimization problem, which we solve using a gradient-projection algorithm. By updating the weights at exponentially scaled iterations we greatly reduce the computational complexity without affecting the overall accuracy.
Clustering task of mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate variables. Indeed, considering a mixing of continuous, integer and ordinal variables (thus all having a cumulative distribution function), this copula mixture model defines intra-component dependencies similar to a Gaussian mixture, so with classical correlation meaning. Simultaneously, it preserves standard margins associated to continuous, integer and ordered features, namely the Gaussian, the Poisson and the ordered multinomial distributions. As an interesting by-product, the proposed mixture model generalizes many well-known ones and also provides tools of visualization based on the parameters. At a practical level, the Bayesian inference is retained and it is achieved with a Metropolis-within-Gibbs sampler. Experiments on simulated and real data sets finally illustrate the expected advantages of the proposed model for mixed data: flexible and meaningful parametrization combined with visualization features.
We propose a new method for changepoint estimation in partially-observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a MissCUSUM tra nsformation (a generalisation of the popular Cumulative Sum statistics), that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data, and on an oceanographic data set covering the Neogene period.
The mixed-logit model is a flexible tool in transportation choice analysis, which provides valuable insights into inter and intra-individual behavioural heterogeneity. However, applications of mixed-logit models are limited by the high computational and data requirements for model estimation. When estimating on small samples, the Bayesian estimation approach becomes vulnerable to over and under-fitting. This is problematic for investigating the behaviour of specific population sub-groups or market segments with low data availability. Similar challenges arise when transferring an existing model to a new location or time period, e.g., when estimating post-pandemic travel behaviour. We propose an Early Stopping Bayesian Data Assimilation (ESBDA) simulator for estimation of mixed-logit which combines a Bayesian statistical approach with Machine Learning methodologies. The aim is to improve the transferability of mixed-logit models and to enable the estimation of robust choice models with low data availability. This approach can provide new insights into choice behaviour where the traditional estimation of mixed-logit models was not possible due to low data availability, and open up new opportunities for investment and planning decisions support. The ESBDA estimator is benchmarked against the Direct Application approach, a basic Bayesian simulator with random starting parameter values and a Bayesian Data Assimilation (BDA) simulator without early stopping. The ESBDA approach is found to effectively overcome under and over-fitting and non-convergence issues in simulation. Its resulting models clearly outperform those of the reference simulators in predictive accuracy. Furthermore, models estimated with ESBDA tend to be more robust, with significant parameters with signs and values consistent with behavioural theory, even when estimated on small samples.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا