When observations are organized into groups that share commonalities, dependent random measures can be an ideal modeling choice. One property of dependent random measures is that the atoms of the posterior distribution are shared among groups, so groups can borrow information from each other. When normalized dependent random measure priors with independent increments are applied, we can derive the appropriate exchangeable partition probability function (EPPF) and subsequently deduce an inference algorithm for any mixture model likelihood. We provide all necessary derivations and solutions for this framework. For demonstration, we use a mixture-of-Gaussians likelihood in combination with a dependent structure constructed from linear combinations of completely random measures (CRMs). Our experiments show superior performance when using this framework, where the inferred values, including the mixing weights and the number of clusters, both respond appropriately to the number of completely random measures used.
In this paper, we study unitary Gaussian processes with independent increments and prove their unitary equivalence to Hudson-Parthasarathy evolution systems. This generalizes the results of [16] and [17] to the case without the stationarity condition.
In this paper, we are concerned with obtaining distribution-free concentration inequalities for mixtures of independent Bernoulli variables that incorporate a notion of variance. The missing mass is the total probability mass of the outcomes that have not been seen in a given sample; it is an important quantity that connects density estimates obtained from a sample to the population for discrete distributions. We are therefore specifically motivated to apply our method to study the concentration of the missing mass, which can be expressed as a mixture of Bernoulli variables, in a novel way. We not only derive, for the first time, Bernstein-like large deviation bounds for the missing mass whose exponents behave almost linearly with respect to deviation size, but also sharpen the bounds of McAllester and Ortiz (2003) and Berend and Kontorovich (2013) for large sample sizes in the case of small deviations, which is the most interesting case in learning theory. In addition, our approach shows that the heterogeneity issue introduced in McAllester and Ortiz (2003) is resolvable in the case of the missing mass, in the sense that one can use standard inequalities, although they may not lead to strong results. Thus, we postulate that our results are general and can be applied to provide potentially sharp Bernstein-like bounds under some constraints.
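To make the definition of the missing mass concrete, the following minimal Python sketch computes it for a known discrete distribution and a given sample, along with its expectation under i.i.d. sampling. The function names `missing_mass` and `expected_missing_mass` are hypothetical and chosen only for illustration; the sketch shows the quantity itself and does not reproduce any of the concentration bounds discussed above.

```python
def missing_mass(probs, sample):
    """Total probability of outcomes that never appear in the sample.

    probs  : list of outcome probabilities, indexed by outcome id 0..k-1
    sample : iterable of observed outcome ids
    (Both names are illustrative assumptions, not taken from the paper.)
    """
    seen = set(sample)
    # Weighted sum of indicators: outcome i contributes probs[i] if unseen.
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs, n):
    """Expected missing mass after n i.i.d. draws from probs.

    Outcome i is unseen with probability (1 - p_i)^n, so each indicator is
    a Bernoulli variable with that parameter.
    """
    return sum(p * (1.0 - p) ** n for p in probs)
```

For example, with `probs = [0.5, 0.3, 0.2]` and the sample `[0, 0, 1]`, only outcome 2 is unseen, so the missing mass equals its probability.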
We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincar\'{e} inequalities for conditional and marginal densities.
This is a continuation of the earlier work \cite{SSS} to characterize stationary unitary increment Gaussian processes. The earlier assumption of uniform continuity is replaced by weak continuity and, with a technical assumption on the domain of the generator, unitary equivalence of the processes to the solution of the Hudson-Parthasarathy equation is proved.
A number of machine learning tasks entail a high degree of invariance: the data distribution does not change if we act on the data with a certain group of transformations. For instance, labels of images are invariant under translations of the images. Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties. With the objective of quantifying the gain achieved by invariant architectures, we introduce two classes of models: invariant random features and invariant kernel methods. The latter includes, as a special case, the neural tangent kernel for convolutional networks with global average pooling. We consider uniform covariate distributions on the sphere and hypercube and a general invariant target function. We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, for a class of groups that we call `degeneracy $\alpha$', with $\alpha \leq 1$. We show that exploiting invariance in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures. Finally, we show that output symmetrization of an unstructured kernel estimator does not give a significant statistical improvement; on the other hand, data augmentation with an unstructured kernel estimator is equivalent to an invariant kernel estimator and enjoys the same improvement in statistical efficiency.
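As a rough illustration of what an invariant kernel can look like, the sketch below averages an unstructured inner-product kernel over the group of cyclic shifts, one common way to build a shift-invariant kernel. The choice of group, the quadratic base kernel, and the names `roll`, `base_kernel`, and `invariant_kernel` are all assumptions made for this example and are not taken from the paper.

```python
def roll(x, s):
    """Cyclically shift the list x by s positions to the right."""
    s %= len(x)
    return list(x) if s == 0 else x[-s:] + x[:-s]


def base_kernel(x, y):
    """Unstructured inner-product kernel (an arbitrary illustrative choice)."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (1.0 + dot / d) ** 2


def invariant_kernel(x, y):
    """Average the base kernel over all cyclic shifts of y.

    Shifting y (or x) only permutes the terms of the average, so the result
    does not change under cyclic shifts of either argument.
    """
    d = len(y)
    return sum(base_kernel(x, roll(y, s)) for s in range(d)) / d
```

On hypercube inputs such as `x = [1.0, -1.0, 1.0, 1.0]`, evaluating `invariant_kernel(x, y)` and `invariant_kernel(roll(x, 1), y)` should give the same value up to floating-point rounding, whereas `base_kernel` generally changes when one argument is shifted.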