Exponential family random graph models (ERGMs) can be understood in terms of a set of structural biases that act on an underlying reference distribution. This distribution determines many aspects of the behavior and interpretation of the ERGM families incorporating it. One important innovation in this area has been the development of an ERGM reference model that produces realistic behavior when generalized to sparse networks of varying size. Here, we show that this model can be derived from a latent dynamic process in which tie formation takes place within small local settings between which individuals move. This derivation provides one possible micro-process interpretation of the sparse ERGM reference model, and sheds light on the conditions under which constant mean degree scaling can emerge.
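For context, the general ERGM form makes the role of the reference measure explicit: sufficient statistics g(y) act as exponential-family biases on a baseline measure h(y),

\[
P_\theta(Y = y) \;=\; \frac{h(y)\,\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) \;=\; \sum_{y' \in \mathcal{Y}} h(y')\,\exp\{\theta^\top g(y')\}.
\]

Setting h constant recovers the Bernoulli baseline; the sparse reference instead down-weights each edge by the network size (schematically, h(y) \propto n^{-|y|}), which is what yields constant mean degree scaling as n grows.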
The Gaussian process regression (GPR) model is a popular nonparametric regression model. In GPR, features of the regression function, such as varying degrees of smoothness and periodicities, are modeled by combining covariance kernels, each intended to capture a particular effect. The covariance kernels have unknown parameters, which are estimated by the EM algorithm or Markov chain Monte Carlo. The estimated parameters are key to inference about features of the regression function, but the identifiability of these parameters has not been investigated. In this paper, we prove identifiability of the covariance kernel parameters in GPR with a mixture of two radial basis kernels and in GPR with a mixture of radial basis and periodic kernels. We also provide examples of non-identifiable cases in such mixed-kernel GPRs.
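As an illustration of the mixed-kernel setup (not the authors' code; the toy data and initial hyperparameters are placeholder assumptions), a radial basis plus periodic kernel can be fit with scikit-learn:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    # Toy data: a smooth trend plus a periodic component.
    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
    y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * X[:, 0] + 0.1 * rng.standard_normal(200)

    # Additive mixed kernel: RBF for smoothness, ExpSineSquared for periodicity.
    kernel = (RBF(length_scale=1.0)
              + ExpSineSquared(length_scale=1.0, periodicity=1.0)
              + WhiteKernel(noise_level=0.01))
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    # Identifiability asks whether these fitted kernel parameters are
    # uniquely determined by the covariance function they induce.
    print(gpr.kernel_)

(scikit-learn estimates the kernel parameters by maximizing the marginal likelihood rather than by EM or MCMC; the identifiability question is the same either way.)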
Many social and other networks exhibit stable size scaling relationships, in which properties such as mean degree or the reciprocation rate change slowly or remain approximately constant as the number of vertices grows. Statistical network models built on simple Bernoulli baseline (or reference) measures often behave unrealistically in this respect, which has led to the development of sparse reference models that preserve features such as mean degree scaling. In this paper, we generalize recent work on the micro-foundations of such reference models to the case of sparse directed graphs with non-vanishing reciprocity, providing a dynamic process interpretation of the emergence of stable macroscopic behavior.
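The abstract does not state the reference measure explicitly; one commonly used form (an assumption here, shown only to fix ideas) weights each directed tie by 1/n and offsets mutual dyads by a factor of n,

\[
h(y) \;\propto\; n^{-|y|}\, n^{\,m(y)},
\]

where |y| is the number of directed ties and m(y) the number of mutual dyads. Under such a reference, the expected mean degree and the reciprocation rate both remain of constant order as the number of vertices n grows, which is the macroscopic stability the paper seeks to ground in a micro-level dynamic process.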
The analysis of high dimensional survival data is challenging, primarily due to the problem of overfitting, which occurs when spurious relationships inferred from the training data fail to hold in test data. Here we propose a novel method of extracting a low dimensional representation of covariates in survival data by combining the popular Gaussian Process Latent Variable Model (GPLVM) with a Weibull Proportional Hazards Model (WPHM). The combined model offers a flexible non-linear probabilistic method of detecting and extracting any intrinsic low dimensional structure from high dimensional data. By reducing the covariate dimension we aim to diminish the risk of overfitting and increase the robustness and accuracy with which we infer relationships between covariates and survival outcomes. In addition, we can simultaneously combine information from multiple data sources by expressing multiple datasets in terms of the same low dimensional space. We present results from several simulation studies that illustrate a reduction in overfitting and an increase in predictive performance, as well as successful detection of intrinsic dimensionality. We provide evidence that it is advantageous to combine dimensionality reduction with survival outcomes rather than performing unsupervised dimensionality reduction on its own. Finally, we use our model to analyse experimental gene expression data and detect and extract a low dimensional representation that allows us to distinguish high and low risk groups with superior accuracy compared to regression on the original high dimensional data.
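To make the survival component concrete, here is a minimal sketch of the Weibull proportional hazards log-likelihood with right censoring, written over a generic low dimensional representation Z; the function name and the GPLVM mapping that would produce Z are illustrative assumptions:

    import numpy as np

    def wphm_log_likelihood(t, delta, Z, beta, lam, nu):
        # t     : event/censoring times, shape (n,)
        # delta : 1 if the event was observed, 0 if right-censored, shape (n,)
        # Z     : low dimensional covariates (e.g., GPLVM latents), shape (n, q)
        # beta  : log-hazard coefficients; lam, nu : Weibull scale and shape (> 0)
        eta = Z @ beta                                # linear predictor
        log_hazard = np.log(lam) + np.log(nu) + (nu - 1.0) * np.log(t) + eta
        cum_hazard = lam * t**nu * np.exp(eta)        # integrated hazard H(t)
        # Observed events contribute log h(t); every subject contributes -H(t).
        return np.sum(delta * log_hazard - cum_hazard)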
Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Indeed, many popular nonparametric priors, such as the Dirichlet and the Pitman-Yor process priors, select discrete probability distributions almost surely and, therefore, automatically induce exchangeable random partitions. Here we focus on the family of Gibbs-type priors, a recent and elegant generalization of the Dirichlet and the Pitman-Yor process priors. These random probability measures share properties that are appealing both from a theoretical and an applied point of view: (i) they admit an intuitive characterization in terms of their predictive structure, justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman-Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference. We will deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependent versions.
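For reference, Gibbs-type priors are characterized by an exchangeable partition probability function of product form: for a sample of size n partitioned into k groups of sizes n_1, ..., n_k,

\[
p(n_1, \ldots, n_k) \;=\; V_{n,k} \prod_{j=1}^{k} (1-\sigma)_{n_j - 1},
\qquad
V_{n,k} \;=\; (n - \sigma k)\, V_{n+1,k} + V_{n+1,k+1},
\]

where \sigma < 1, (a)_m denotes the rising factorial, and the predictive probability that the next observation starts a new group is V_{n+1,k+1}/V_{n,k}. The Dirichlet and Pitman-Yor processes arise from particular choices of \sigma and of the weights V_{n,k}.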
Statistical inference for sparse covariance matrices is crucial for revealing the dependence structure of large multivariate data sets, but scalable and theoretically supported Bayesian methods are lacking. In this paper, we propose a beta-mixture shrinkage prior for sparse covariance matrices, computationally more efficient than the spike-and-slab prior, and establish its minimax optimality in high-dimensional settings. The proposed prior consists of beta-mixture shrinkage and gamma priors for the off-diagonal and diagonal entries, respectively. To ensure positive definiteness of the resulting covariance matrix, we further restrict the support of the prior to a subspace of positive definite matrices. We obtain the convergence rate of the induced posterior under the Frobenius norm and establish a minimax lower bound for sparse covariance matrices. The class of sparse covariance matrices considered for the minimax lower bound is controlled by the number of nonzero off-diagonal elements and has more intuitive appeal than those that have appeared in the literature. The obtained posterior convergence rate coincides with the minimax lower bound unless the true covariance matrix is extremely sparse. In a simulation study, we show that the proposed method is computationally more efficient than its competitors while achieving comparable performance. Advantages of the shrinkage prior are demonstrated on two real data sets.
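To fix ideas, a purely illustrative draw from such a prior can be made with a normal scale mixture using Beta-distributed shrinkage weights on the off-diagonals, Gamma draws on the diagonal, and rejection to enforce positive definiteness; the abstract does not give the exact hierarchy, so every hyperparameter below is an assumption:

    import numpy as np

    def sample_prior(p, a=1.0, b=1.0, max_tries=1000, rng=None):
        rng = rng or np.random.default_rng()
        for _ in range(max_tries):
            S = np.diag(rng.gamma(shape=2.0, scale=1.0, size=p))   # diagonal: Gamma
            iu = np.triu_indices(p, k=1)
            rho = rng.beta(a, b, size=len(iu[0]))                  # Beta shrinkage weights
            off = rng.normal(0.0, np.sqrt(rho / (1.0 - rho)))      # normal scale mixture
            S[iu] = off
            S.T[iu] = off                                          # symmetrize
            if np.all(np.linalg.eigvalsh(S) > 0):                  # keep only PD draws
                return S
        raise RuntimeError("no positive definite draw within max_tries")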