No Arabic abstract
Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchisons additive logistic normal distributions. Generally, the distributions we consider have a density that features a difficult to compute normalizing constant. To circumvent this issue, we design effective estimation methods based on generaliz
A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. In contrast, the score matching method of Hyvarinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $mathbb{R}^m$. Hyvarinen (2007) extended the approach to distributions supported on the non-negative orthant $mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. We also generalize the regularized score matching method of Lin et al. (2016) for non-negative Gaussian graphical models, with improved theoretical guarantees.
Estimation of density functions supported on general domains arises when the data is naturally restricted to a proper subset of the real space. This problem is complicated by typically intractable normalizing constants. Score matching provides a powerful tool for estimating densities with such intractable normalizing constants, but as originally proposed is limited to densities on $mathbb{R}^m$ and $mathbb{R}_+^m$. In this paper, we offer a natural generalization of score matching that accommodates densities supported on a very general class of domains. We apply the framework to truncated graphical and pairwise interaction models, and provide theoretical guarantees for the resulting estimators. We also generalize a recently proposed method from bounded to unbounded domains, and empirically demonstrate the advantages of our method.
Recent advance in diffusion models incorporates the Stochastic Differential Equation (SDE), which brings the state-of-the art performance on image generation tasks. This paper improves such diffusion models by analyzing the model at the zero diffusion time. In real datasets, the score function diverges as the diffusion time ($t$) decreases to zero, and this observation leads an argument that the score estimation fails at $t=0$ with any neural network structure. Subsequently, we introduce Unbounded Diffusion Model (UDM) that resolves the score diverging problem with an easily applicable modification to any diffusion models. Additionally, we introduce a new SDE that overcomes the theoretic and practical limitations of Variance Exploding SDE. On top of that, the introduced Soft Truncation method improves the sample quality by mitigating the loss scale issue that happens at $t=0$. We further provide a theoretic result of the proposed method to uncover the behind mechanism of the diffusion models.
A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvarinen [2005] avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $mathbb{R}^m$. Hyvarinen [2007] extended the approach to distributions supported on the non-negative orthant, $mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. As an example, we consider a general class of pairwise interaction models. Addressing an overlooked inexistence problem, we generalize the regularized score matching method of Lin et al. [2016] and improve its theoretical guarantees for non-negative Gaussian graphical models.
Partial correlations quantify linear association between two variables adjusting for the influence of the remaining variables. They form the backbone for graphical models and are readily obtained from the inverse of the covariance matrix. For compositional data, the covariance structure is specified from log ratios of variables, so unless we try to open the data via a normalization, this implies changes in the definition and interpretation of partial correlations. In the present work, we elucidate how results derived by Aitchison (1986) lead to a natural definition of partial correlation that has a number of advantages over current measures of association. For this, we show that the residuals of log-ratios between a variable with a reference, when adjusting for all remaining variables including the reference, are reference-independent. Since the reference itself can be controlled for, correlations between residuals are defined for the variables directly without the necessity to recur to ratios except when specifying which variables are partialled out. Thus, perhaps surprisingly, partial correlations do not have the problems commonly found with measures of pairwise association on compositional data. They are well-defined between two variables, are properly scaled, and allow for negative association. By design, they are subcompositionally incoherent, but they share this property with conventional partial correlations (where results change when adjusting for the influence of fewer variables). We discuss the equivalence with normalization-based approaches whenever the normalizing variables are controlled for. We also discuss the partial variances and correlations we obtain from a previously studied data set of Roman glass cups.