Recent work in unsupervised learning has focused on efficient inference and learning in latent variable models. Training these models by maximizing the evidence (marginal likelihood) is typically intractable, so a common approximation is to maximize the Evidence Lower BOund (ELBO) instead. Variational autoencoders (VAEs) are a powerful and widely used class of generative models that optimize the ELBO efficiently for large datasets. However, the VAE's default Gaussian choice for the prior imposes a strong constraint on its ability to represent the true posterior, thereby degrading overall performance. A Gaussian mixture model (GMM) would be a richer prior, but it cannot be handled efficiently within the VAE framework because of the intractability of the Kullback-Leibler divergence for GMMs. We deviate from the common VAE framework in favor of one with an analytical solution for a Gaussian mixture prior. To perform efficient inference for GMM priors, we introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. This new objective allows us to incorporate richer, multi-modal priors into the autoencoding framework. We provide empirical studies on a range of datasets and show that our objective improves upon variational autoencoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
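As background, the Cauchy-Schwarz divergence has the standard form below; the analytic tractability for GMMs follows from the Gaussian product integral, a known identity stated here for context rather than taken from this abstract:
$$
D_{\mathrm{CS}}(p\,\|\,q) = -\log \frac{\int p(x)\,q(x)\,dx}{\sqrt{\int p(x)^2\,dx \,\int q(x)^2\,dx}},
\qquad
\int \mathcal{N}(x;\mu_1,\Sigma_1)\,\mathcal{N}(x;\mu_2,\Sigma_2)\,dx = \mathcal{N}(\mu_1;\mu_2,\Sigma_1+\Sigma_2).
$$
Since $\int pq$, $\int p^2$, and $\int q^2$ for Gaussian mixtures expand into sums of such pairwise cross terms, every integral in $D_{\mathrm{CS}}$ is available in closed form.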
We introduce a notion of complexity for systems of linear forms called sequential Cauchy-Schwarz complexity, which is parametrized by two positive integers $k,\ell$ and refines the notion of Cauchy-Schwarz complexity introduced by Green and Tao. We prove that if a system of linear forms has sequential Cauchy-Schwarz complexity at most $(k,\ell)$ then any average of 1-bounded functions over this system is controlled by the $2^{1-\ell}$-th power of the Gowers $U^{k+1}$-norms of the functions. For $\ell=1$ this agrees with Cauchy-Schwarz complexity, but for $\ell>1$ there are families of systems that have sequential Cauchy-Schwarz complexity at most $(k,\ell)$ whereas their Cauchy-Schwarz complexity is greater than $k$. For instance, for $p$ prime and $k\in \mathbb{N}$, the system of forms $\big\{\phi_{z_1,z_2}(x,t_1,t_2)= x+z_1 t_1+z_2 t_2 \;|\; z_1,z_2\in [0,p-1],\ z_1+z_2<k\big\}$ can be viewed as a $2$-dimensional analogue of arithmetic progressions of length $k$. We prove that this system has sequential Cauchy-Schwarz complexity at most $(k-2,\ell)$ for some $\ell=O_{k,p}(1)$, even for $p<k$, whereas its Cauchy-Schwarz complexity can be strictly greater than $k-2$. In fact we prove this for the $M$-dimensional analogues of these systems for any $M\geq 2$, obtaining polynomial true-complexity bounds for these and other families of systems. In a separate paper, we use these results to give a new proof of the inverse theorem for Gowers norms on vector spaces $\mathbb{F}_p^n$, and applications concerning ergodic actions of $\mathbb{F}_p^{\omega}$.
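In display form, the control stated above reads as follows, where the averaging domain and the number of forms $m$ are notational assumptions on our part, since the abstract does not fix them:
$$
\Big|\,\mathbb{E}_{x}\ \prod_{i=1}^{m} f_i\big(\phi_i(x)\big)\Big| \;\le\; \min_{1\le i\le m}\ \|f_i\|_{U^{k+1}}^{\,2^{1-\ell}}
$$
for 1-bounded functions $f_1,\dots,f_m$, whenever the system $(\phi_1,\dots,\phi_m)$ has sequential Cauchy-Schwarz complexity at most $(k,\ell)$.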
Dimensionality reduction is a crucial first step for many unsupervised learning tasks, including anomaly detection and clustering. The autoencoder is a popular mechanism for accomplishing dimensionality reduction. For dimensionality reduction to be effective on high-dimensional data embedded in a nonlinear low-dimensional manifold, it is understood that some sort of geodesic distance metric should be used to discriminate among the data samples. Inspired by the success of geodesic distance approximators such as ISOMAP, we propose to use a minimum spanning tree (MST), a graph-based construction, to approximate the local neighborhood structure and generate structure-preserving distances among data points. We use this MST-based distance metric to replace the Euclidean distance metric in the embedding function of autoencoders and develop a new graph regularized autoencoder, which outperforms a wide range of alternative methods across 20 benchmark anomaly detection datasets. We further incorporate the MST regularizer into two generative adversarial networks and find that it substantially improves anomaly detection performance for both. We also test our MST regularized autoencoder on two datasets in a clustering application and observe its superior performance there as well.
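A minimal sketch of the MST-based distance computation described above; the function name is ours, and treating tree-path length as the structure-preserving distance is one natural reading of the abstract, not a detail it specifies:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from sklearn.metrics import pairwise_distances

def mst_distance_matrix(X):
    """Structure-preserving pairwise distances via a minimum spanning tree.

    Builds an MST over the complete Euclidean graph of the samples and
    returns, for every pair of points, the length of the unique path
    connecting them in the tree, as a geodesic-distance surrogate.
    """
    euclid = pairwise_distances(X)             # dense Euclidean metric
    mst = minimum_spanning_tree(euclid)        # sparse tree with n-1 edges
    return shortest_path(mst, directed=False)  # path lengths along the tree

X = np.random.rand(200, 10)      # toy data
D = mst_distance_matrix(X)       # drop-in replacement for the Euclidean metric
```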
An important component of autoencoders is the method by which the information capacity of the latent representation is minimized or limited. In this work, the rank of the covariance matrix of the codes is implicitly minimized by relying on the fact that gradient descent learning in multi-layer linear networks leads to minimum-rank solutions. By inserting a number of extra linear layers between the encoder and the decoder, the system spontaneously learns representations with a low effective dimension. The model, dubbed Implicit Rank-Minimizing Autoencoder (IRMAE), is simple, deterministic, and learns compact latent spaces. We demonstrate the validity of the method on several image generation and representation learning tasks.
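A minimal sketch of the architectural idea, assuming a PyTorch setting with hypothetical module names; the encoder and decoder are placeholders, and only the stack of pure linear layers between them is the point:

```python
import torch.nn as nn

class IRMAESketch(nn.Module):
    """Autoencoder with extra linear layers between encoder and decoder.

    Composing several nn.Linear maps (no nonlinearities) leaves the
    function class unchanged, but gradient descent on such a linear
    stack is biased toward minimum-rank solutions, so the learned codes
    end up with a low effective dimension.
    """
    def __init__(self, encoder, decoder, latent_dim=128, n_linear=4):
        super().__init__()
        self.encoder = encoder
        self.linear_stack = nn.Sequential(
            *[nn.Linear(latent_dim, latent_dim) for _ in range(n_linear)]
        )
        self.decoder = decoder

    def forward(self, x):
        z = self.linear_stack(self.encoder(x))  # low-effective-rank code
        return self.decoder(z)
```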
In this paper, we treat the image generation task using an autoencoder, a representative latent-variable model. Unlike many studies that regularize the latent distribution by assuming a manually specified prior, we approach image generation by directly estimating the latent distribution. To this end, we introduce a latent density estimator that captures the latent distribution explicitly, and we propose its structure. Through experiments, we show that our generative model generates images with improved visual quality compared to previous autoencoder-based generative models.
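A hedged sketch of the overall recipe, with a kernel density estimator standing in for the paper's latent density estimator (whose specific structure the abstract does not give) and random arrays standing in for a trained encoder's codes:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Stand-in latent codes; in practice these would come from a trained
# autoencoder, e.g. Z = encoder(X_train).
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 32))

# Fit an explicit density model directly on the latent codes instead
# of assuming a manually specified prior.
density = KernelDensity(bandwidth=0.5).fit(Z)

# Draw new codes from the estimated latent distribution; decoding them
# (x_gen = decoder(z_new)) would yield generated images.
z_new = density.sample(16)
```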
The Cauchy-Schwarz (CS) inequality -- one of the most widely used and important inequalities in mathematics -- can be formulated as an upper bound on the strength of correlations between classically fluctuating quantities. Quantum mechanical correlations can, however, exceed classical bounds. Here we realize four-wave mixing of atomic matter waves using colliding Bose-Einstein condensates, and demonstrate the violation of a multimode CS inequality for atom number correlations in opposite zones of the collision halo. The correlated atoms have large spatial separations and therefore open new opportunities for extending fundamental quantum-nonlocality tests to ensembles of massive particles.
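For context, the CS bound on intensity (atom number) correlations is commonly written as below; we state it as standard background, since the abstract does not spell out the exact multimode form used:
$$
G^{(2)}_{12} \;\le\; \sqrt{G^{(2)}_{11}\,G^{(2)}_{22}},
\qquad \text{i.e.}\quad
\mathcal{C} \equiv \frac{G^{(2)}_{12}}{\sqrt{G^{(2)}_{11}\,G^{(2)}_{22}}} \le 1 \ \text{classically},
$$
where $G^{(2)}_{ij}$ denotes the second-order correlation between atom numbers in zones $i$ and $j$; a measured $\mathcal{C}>1$ signals a violation of the classical bound.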