Studying the neurological, genetic and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The data sets typically comprise structured sequences of syllables or `songs produced by animals from different genotypes under different social contexts. We develop a novel Bayesian semiparametric framework for inference in such data sets. Our approach is built on a novel class of mixed effects Markov transition models for the songs that accommodates exogenous influences of genotype and context as well as animal-specific heterogeneity. We design efficient Markov chain Monte Carlo algorithms for posterior computation. Crucial advantages of the proposed approach include its ability to provide insights into key scientific queries related to global and local influences of the exogenous predictors on the transition dynamics via automated tests of hypotheses. The methodology is illustrated using simulation experiments and the aforementioned motivating application in neuroscience.
This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semi-continuous responses. In this paper, we present methodology which allows for information to be shared nonparametrically across various model components using Bayesian sum-of-tree models. Our simulation results demonstrate that sharing of information across related model components is often very beneficial, particularly in sparse high-dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees - a heteroskedastic log-normal hurdle model with a shrink-towards-homoskedasticity prior, and a gamma hurdle model.
We consider the problem of flexible modeling of higher order Markov chains when an upper bound on the order of the chain is known but the true order and nature of the serial dependence are unknown. We propose Bayesian nonparametric methodology based on conditional tensor factorizations, which can characterize any transition probability with a specified maximal order. The methodology selects the important lags and captures higher order interactions among the lags, while also facilitating calculation of Bayes factors for a variety of hypotheses of interest. We design efficient Markov chain Monte Carlo algorithms for posterior computation, allowing for uncertainty in the set of important lags to be included and in the nature and order of the serial dependence. The methods are illustrated using simulation experiments and real world applications.
Understanding how adult humans learn non-native speech categories such as tone information has shed novel insights into the mechanisms underlying experience-dependent brain plasticity. Scientists have traditionally examined these questions using longitudinal learning experiments under a multi-category decision making paradigm. Drift-diffusion processes are popular in such contexts for their ability to mimic underlying neural mechanisms. Motivated by these problems, we develop a novel Bayesian semiparametric inverse Gaussian drift-diffusion mixed model for multi-alternative decision making in longitudinal settings. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the methods empirical performances through synthetic experiments. Applied to our motivating longitudinal tone learning study, the method provides novel insights into how the biologically interpretable model parameters evolve with learning, differ between input-response tone combinations, and differ between well and poorly performing adults.
We consider the problem of multivariate density deconvolution when the interest lies in estimating the distribution of a vector-valued random variable but precise measurements of the variable of interest are not available, observations being contaminated with additive measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density is not known but replicated proxies are available for each unobserved value of the random vector. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved value of the vector of interest through unknown relationships which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in many novel ways to meet the modeling and computational challenges. Theoretical results that show the flexibility of the proposed methods are provided. We illustrate the efficiency of the proposed methods in recovering the true density of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24 hour recalls.
We introduce a new nonparametric density estimator inspired by Markov Chains, and generalizing the well-known Kernel Density Estimator (KDE). Our estimator presents several benefits with respect to the usual ones and can be used straightforwardly as a foundation in all density-based algorithms. We prove the consistency of our estimator and we find it typically outperforms KDE in situations of large sample size and high dimensionality. We also employ our density estimator to build a local outlier detector, showing very promising results when applied to some realistic datasets.