A high level of physical detail in a molecular model improves its ability to perform high-accuracy simulations, but can also significantly increase its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, the additional complexity is unnecessary and can make simulations computationally infeasible. In this work we demonstrate the use of Bayes factors for molecular model selection, using Monte Carlo sampling techniques to evaluate the evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three levels of nested model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only in some cases. We also explore the effect of the Bayesian prior distribution on the Bayes factors, as well as ways to propose meaningful prior distributions. This Bayesian Markov chain Monte Carlo (MCMC) process is enabled by analytical surrogate models that accurately approximate the physical properties of interest. This work paves the way for further atomistic model selection via Bayesian inference and surrogate modeling.
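As a minimal illustration of the kind of evidence comparison described above, the sketch below estimates marginal likelihoods for two nested models by simple Monte Carlo averaging of the likelihood over the prior, with a toy analytical function standing in for the surrogate of the physical property. The data values, surrogate form, and priors are illustrative assumptions, not the 2CLJQ study's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements of a fluid property at three state points,
# with known noise level (not the data used in the paper).
y_obs = np.array([0.44, 0.43, 0.40])
sigma = 0.02

def surrogate_property(eps, quad):
    """Stand-in analytical surrogate mapping model parameters to the property.
    A real surrogate would be fit to molecular-simulation output."""
    return 0.5 * eps - 0.1 * quad * np.array([1.0, 1.2, 1.5])

def log_likelihood(eps, quad):
    resid = y_obs - surrogate_property(eps, quad)
    return -0.5 * np.sum((resid / sigma) ** 2) - y_obs.size * np.log(sigma * np.sqrt(2 * np.pi))

def log_evidence(sample_prior, n=50_000):
    """Simple Monte Carlo estimate of the marginal likelihood:
    average the likelihood over draws from the prior (log-sum-exp for stability)."""
    logL = np.array([log_likelihood(*sample_prior()) for _ in range(n)])
    m = logL.max()
    return m + np.log(np.mean(np.exp(logL - m)))

# Nested models: M0 fixes the quadrupole at zero, M1 lets it vary under its prior.
logZ0 = log_evidence(lambda: (rng.normal(1.0, 0.2), 0.0))
logZ1 = log_evidence(lambda: (rng.normal(1.0, 0.2), rng.uniform(0.0, 1.0)))

print("log Bayes factor (M1 vs M0):", logZ1 - logZ0)
```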
The universal tendency in scanning probe microscopy (SPM) over the last two decades has been a transition from simple 2D imaging to complex detection and spectroscopic imaging modes. The emergence of complex SPM engines brings forth the challenge of reliable data interpretation, i.e., conversion from the detected signal to descriptors specific to tip-surface interactions and subsequently to materials properties. Here, we implemented a Bayesian inference approach for the analysis of the image formation mechanisms in band excitation (BE) SPM. Compared to the point estimates of classical functional-fit approaches, Bayesian inference allows extant knowledge of materials and probe behavior to be incorporated in the form of corresponding prior distributions and returns information on the material functionality in the form of readily interpretable posterior distributions. We note that in applying Bayesian methods, special care should be taken to properly pose the problem as model selection versus establishing practical parameter equivalence. We further explore the nonlinear mechanical behaviors at topological defects in a classical ferroelectric material, PbTiO3. We observe a non-trivial evolution of the Duffing resonance frequency and the nonlinearity of the sample surface, suggesting the presence of hidden elements of the domain structure. These observations suggest that the spectrum of anomalous behaviors at ferroelectric domain walls can be significantly broader than previously believed and can extend to non-conventional mechanical properties in addition to static and microwave conductance.
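For orientation, the sketch below illustrates the general idea of replacing a point-estimate functional fit with a posterior distribution, using a simple harmonic oscillator (SHO) amplitude response of the kind commonly fit to BE spectra and a coarse grid posterior with a prior on the resonance frequency. The frequencies, noise level, priors, and fixed amplitude are illustrative assumptions and do not reproduce the paper's Duffing analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic BE-like amplitude spectrum; all values are illustrative only.
f = np.linspace(340e3, 360e3, 64)            # drive frequencies (Hz)
f0_true, Q_true, A_true = 350e3, 150.0, 1.0

def sho_amplitude(f, f0, Q, A):
    """SHO amplitude response commonly used to fit BE spectra."""
    return A * f0**2 / np.sqrt((f**2 - f0**2) ** 2 + (f * f0 / Q) ** 2)

noise = 5.0
y = sho_amplitude(f, f0_true, Q_true, A_true) + noise * rng.standard_normal(f.size)

# Grid posterior over (f0, Q): a Gaussian prior on f0 encodes prior knowledge of
# the cantilever resonance; a broad log-uniform prior is placed on Q.
# The drive amplitude A is held fixed here to keep the example small.
f0_grid = np.linspace(348e3, 352e3, 200)
Q_grid = np.geomspace(50, 500, 200)
F0, Qg = np.meshgrid(f0_grid, Q_grid, indexing="ij")

log_prior = -0.5 * ((F0 - 350e3) / 1e3) ** 2 - np.log(Qg)
pred = sho_amplitude(f[None, None, :], F0[..., None], Qg[..., None], A_true)
log_like = -0.5 * np.sum((y - pred) ** 2, axis=-1) / noise**2
log_post = log_prior + log_like
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Marginal posteriors rather than a single point estimate:
p_f0 = post.sum(axis=1)
p_Q = post.sum(axis=0)
print("posterior mean f0 (kHz):", (f0_grid * p_f0).sum() / 1e3)
print("posterior mean Q:", (Q_grid * p_Q).sum())
```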
Variational Bayes (VB) has been used to facilitate the calculation of the posterior distribution in the context of Bayesian inference of the parameters of nonlinear models from data. Previously, an analytical formulation of VB was derived for nonlinear model inference on data with additive Gaussian noise as an alternative to nonlinear least squares. Here, a stochastic solution is derived that avoids some of the approximations required by the analytical formulation, offering a solution that can be more flexibly deployed for nonlinear model inference problems. The stochastic VB solution was used for inference on a biexponential toy case, and the algorithmic parameter space was explored, before being deployed on real data from a magnetic resonance imaging study of perfusion. The new method was found to achieve parameter recovery comparable to the analytic solution and to be competitive in terms of computational speed despite being reliant on sampling.
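A minimal sketch of the flavor of stochastic VB described above, applied to a biexponential toy model with additive Gaussian noise: a mean-field Gaussian posterior over log-parameters is fit by optimizing a sampled (reparameterized) ELBO with common random numbers. The priors, sample count, and use of a generic derivative-free optimizer are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Biexponential toy data: y = A1*exp(-r1*t) + A2*exp(-r2*t) + Gaussian noise.
t = np.linspace(0, 5, 60)
true = np.array([1.0, 1.5, 0.5, 0.2])        # A1, r1, A2, r2 (made up)
sigma = 0.02
y = true[0] * np.exp(-true[1] * t) + true[2] * np.exp(-true[3] * t)
y += sigma * rng.standard_normal(t.size)

def log_joint(theta_log):
    """Standard-normal prior on log-parameters plus Gaussian log-likelihood."""
    A1, r1, A2, r2 = np.exp(theta_log)
    pred = A1 * np.exp(-r1 * t) + A2 * np.exp(-r2 * t)
    log_lik = -0.5 * np.sum((y - pred) ** 2) / sigma**2
    log_prior = -0.5 * np.sum(theta_log**2)
    return log_lik + log_prior

# Mean-field Gaussian variational posterior q = N(mu, diag(exp(2*log_s))).
eps_fixed = rng.standard_normal((64, 4))      # common random numbers reduce variance

def neg_elbo(phi):
    mu, log_s = phi[:4], phi[4:]
    samples = mu + np.exp(log_s) * eps_fixed   # reparameterization trick
    expected_log_joint = np.mean([log_joint(s) for s in samples])
    entropy = np.sum(log_s)                    # Gaussian entropy up to a constant
    return -(expected_log_joint + entropy)

phi0 = np.concatenate([np.zeros(4), -2 * np.ones(4)])
res = minimize(neg_elbo, phi0, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print("approximate posterior medians (A1, r1, A2, r2):", np.exp(res.x[:4]))
```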
To build a flexible and interpretable model for document analysis, we develop the deep autoencoding topic model (DATM), which uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network. In order to provide scalable posterior inference for the parameters of the generative network, we develop a topic-layer-adaptive stochastic gradient Riemannian MCMC that jointly learns simplex-constrained global parameters across all layers and topics, with topic- and layer-specific learning rates. Given a posterior sample of the global parameters, in order to efficiently infer the local latent representations of a document under DATM across all stochastic layers, we propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a Weibull-distribution-based stochastic downward generative model. To jointly model documents and their associated labels, we further propose a supervised DATM that enhances the discriminative power of its latent representations. The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
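As a small illustration of the stochastic downward step, the sketch below draws Weibull samples through the inverse-CDF reparameterization that makes such layers amenable to gradient-based training. The toy "upward" network, its sizes, and the softplus links are made-up assumptions, not the DATM architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

def weibull_reparam(shape_k, scale_lam, u=None, rng=rng):
    """Reparameterized Weibull draw via the inverse CDF:
    theta = lam * (-log(1 - u)) ** (1 / k), with u ~ Uniform(0, 1).
    The draw is a deterministic transform of (k, lam) and noise u, which is
    what lets a Weibull-based encoder be trained end to end by gradients."""
    if u is None:
        u = rng.uniform(size=np.broadcast(shape_k, scale_lam).shape)
    return scale_lam * (-np.log1p(-u)) ** (1.0 / shape_k)

# Illustrative 'upward' pass: a toy deterministic network maps a bag-of-words
# vector to per-topic Weibull parameters (all sizes and weights are made up).
x = rng.poisson(1.0, size=50).astype(float)          # toy document word counts
W = rng.normal(scale=0.1, size=(16, 50))             # hypothetical encoder weights
h = np.maximum(W @ x, 0.0)                           # hidden features (ReLU)
k = 0.5 + np.log1p(np.exp(h))                        # softplus -> Weibull shapes
lam = np.log1p(np.exp(h))                            # softplus -> Weibull scales

# 'Downward' stochastic step: sample non-negative topic weights for this layer.
theta = weibull_reparam(k, lam)
print("sampled topic weights (first 5):", theta[:5])
```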
Models defined by stochastic differential equations (SDEs) allow for the representation of random variability in dynamical systems. The relevance of this class of models is growing in many applied research areas, and they are already a standard tool for modeling, e.g., financial, neuronal and population growth dynamics. However, inference for multidimensional SDE models is still very challenging, both computationally and theoretically. Approximate Bayesian computation (ABC) allows Bayesian inference to be performed for models that are sufficiently complex that the likelihood function is either analytically unavailable or computationally prohibitive to evaluate. A computationally efficient ABC-MCMC algorithm is proposed, halving the running time in our simulations. The focus is on the case where the SDE describes latent dynamics in state-space models; however, the methodology is not limited to the state-space framework. Simulation studies for a pharmacokinetic/pharmacodynamic model and for stochastic chemical reactions are considered, and a MATLAB package implementing our ABC-MCMC algorithm is provided.
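A minimal sketch of a plain ABC-MCMC loop for an SDE simulated with Euler-Maruyama, using an Ornstein-Uhlenbeck toy model, ad hoc summary statistics, a fixed tolerance, and uniform priors. All of these choices are illustrative assumptions, and the proposed algorithm's computational accelerations are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_ou(theta, x0=0.0, T=10.0, n=200, rng=rng):
    """Euler-Maruyama path of dX = -rate*(X - mean) dt + sigma dW
    (a toy stand-in for the multidimensional models in the paper)."""
    rate, mean, sigma = theta
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] - rate * (x[i] - mean) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def summaries(x):
    """Cheap summary statistics; real applications need more careful choices."""
    return np.array([x.mean(), x.std(), np.corrcoef(x[:-1], x[1:])[0, 1]])

# 'Observed' summaries from known parameters (for illustration only).
theta_true = np.array([1.0, 2.0, 0.5])
s_obs = summaries(simulate_ou(theta_true))

def log_prior(theta):
    return 0.0 if np.all((theta > 0) & (theta < 5)) else -np.inf

def abc_mcmc(n_iter=5000, eps=0.5, step=0.2):
    # Started near the true values and with a generous tolerance purely so the
    # toy chain mixes; with uniform priors and a symmetric Gaussian proposal,
    # acceptance reduces to the indicator that simulated summaries match s_obs.
    theta = np.array([1.5, 1.5, 0.8])
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(3)
        if np.isfinite(log_prior(prop)):
            dist = np.linalg.norm(summaries(simulate_ou(prop)) - s_obs)
            if dist < eps:
                theta = prop
        chain.append(theta.copy())
    return np.array(chain)

chain = abc_mcmc()
print("ABC posterior means (rate, mean, sigma):", chain[2000:].mean(axis=0))
```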
We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure -- adaptive Bayesian SLOPE -- which effectively combines the SLOPE method (sorted $l_1$ regularization) with the Spike-and-Slab LASSO method. We position our approach within a Bayesian framework that allows for simultaneous variable selection and parameter estimation, despite the missing values. As with the Spike-and-Slab LASSO, the coefficients are regarded as arising from a hierarchical model consisting of two groups: (1) the spike for the inactive and (2) the slab for the active. However, instead of assigning independent spike priors to each covariate, here we deploy a joint SLOPE spike prior that takes into account the ordering of coefficient magnitudes in order to control for false discoveries. Through extensive simulations, we demonstrate satisfactory performance in terms of power, FDR and estimation bias under a wide range of scenarios. Finally, we analyze a real dataset of patients from Paris hospitals who suffered severe trauma, where we show excellent performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped in the R package ABSLOPE for public use.
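For orientation, the sketch below computes the sorted-$l_1$ (SLOPE) penalty with a Benjamini-Hochberg-type decreasing lambda sequence, the ingredient that ties the ordering of coefficient magnitudes to false-discovery control. The penalty level and toy coefficients are illustrative, and the full ABSLOPE procedure (spike-and-slab mixture, missing-data handling, C++ implementation) is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def bh_lambda(p, q=0.1, sigma=1.0):
    """Benjamini-Hochberg-type decreasing penalty sequence often used with SLOPE:
    lambda_i = sigma * Phi^{-1}(1 - i*q / (2p)), i = 1, ..., p."""
    i = np.arange(1, p + 1)
    return sigma * norm.ppf(1 - i * q / (2 * p))

def slope_penalty(beta, lam):
    """Sorted-l1 penalty: the largest |beta| is paired with the largest lambda,
    the ordering that underlies SLOPE's control of false discoveries."""
    abs_sorted = np.sort(np.abs(beta))[::-1]     # |beta|_(1) >= |beta|_(2) >= ...
    return np.sum(lam * abs_sorted)

beta = np.array([3.0, 0.0, -1.2, 0.4, 0.0])      # toy coefficient vector
lam = bh_lambda(p=beta.size, q=0.1)
print("lambda sequence:", np.round(lam, 3))
print("SLOPE penalty:", round(slope_penalty(beta, lam), 3))
```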