
Neural Expectation Maximization

Posted by Sjoerd van Steenkiste
Publication date: 2017
Research field: Informatics Engineering
Paper language: English





Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction.
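To make the procedure above concrete, here is a minimal toy sketch of one EM-style iteration for a spatial mixture whose components are produced by a decoder. Everything specific in it is an assumption on my part: the decoder is a fixed linear map `W` rather than the paper's learned recurrent network, the pixel likelihood is unit-variance Gaussian, and the generalized M-step is a plain gradient step on the component latents.

```python
# Toy sketch of one neural-EM style iteration (illustrative only, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
D, K, H = 64, 3, 8                        # pixels, mixture components, latent size
W = rng.normal(scale=0.1, size=(H, D))    # hypothetical fixed linear "decoder"
x = rng.normal(size=D)                    # observed image, flattened
theta = rng.normal(size=(K, H))           # per-component latent representations

def decode(theta):
    # Each component predicts every pixel: psi[k, i] = f(theta_k)_i
    return theta @ W                      # shape (K, D)

for _ in range(10):
    psi = decode(theta)                                    # (K, D)
    # E-step: responsibilities gamma[k, i] proportional to N(x_i | psi[k, i], 1)
    log_lik = -0.5 * (x[None, :] - psi) ** 2               # (K, D)
    log_lik -= log_lik.max(axis=0, keepdims=True)          # numerical stability
    gamma = np.exp(log_lik)
    gamma /= gamma.sum(axis=0, keepdims=True)              # (K, D)
    # Generalized M-step: one gradient ascent step on the expected log-likelihood
    # sum_{k,i} gamma[k,i] * log p(x_i | psi[k,i]) w.r.t. theta (chain rule through W).
    grad_psi = gamma * (x[None, :] - psi)
    grad_theta = grad_psi @ W.T                            # (K, H)
    theta += 0.1 * grad_theta

print(gamma.argmax(axis=0))   # per-pixel hard grouping recovered from the responsibilities
```

The per-pixel argmax of the responsibilities plays the role of the recovered grouping, while the latents `theta` are the entity representations being refined.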


Read also

Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions, this causal knowledge must be learned without access to supervised data. To address this problem, we present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely unsupervised fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge. We demonstrate its ability to handle occlusion and show that it can extrapolate learned knowledge to scenes with different numbers of objects.
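As a rough illustration of the pairwise factorization mentioned in this abstract, the sketch below sums the output of a single shared interaction function over all (focus, context) pairs of object representations. The function, shapes, and names are hypothetical stand-ins of my own, not the paper's architecture.

```python
# Toy sketch of pairwise-factored interactions between object representations.
import numpy as np

rng = np.random.default_rng(1)
K, H = 4, 8                         # number of objects, per-object state size
theta = rng.normal(size=(K, H))     # object representations

W = rng.normal(scale=0.1, size=(2 * H, H))

def pairwise_effect(focus, context, W):
    # Stand-in for a small learned interaction network applied to one (focus, context) pair.
    return np.tanh(np.concatenate([focus, context]) @ W)

# The total effect on object k is the sum of its interactions with every other
# object, so the same small interaction function is reused for all pairs.
effects = np.stack([
    sum(pairwise_effect(theta[k], theta[j], W) for j in range(K) if j != k)
    for k in range(K)
])
print(effects.shape)                # (K, H): one aggregated effect per object
```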
Fast Incremental Expectation Maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM-type algorithms in the Stochastic Approximation within EM framework. Then, we provide nonasymptotic bounds for the convergence in expectation as a function of the number of examples $n$ and of the maximal number of iterations $k_{\max}$. We propose two strategies for achieving an $\epsilon$-approximate stationary point, respectively with $k_{\max} = O(n^{2/3}/\epsilon)$ and $k_{\max} = O(\sqrt{n}/\epsilon^{3/2})$, both strategies relying on a random termination rule before $k_{\max}$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $k_{\max}$ to scale as $\sqrt{n}$, which is better than the $n^{2/3}$ rate that was the best obtained so far; this comes at the cost of a larger dependence on the tolerance $\epsilon$, which makes this control relevant for small to medium accuracy with respect to the number of examples $n$. Second, for the $n^{2/3}$ rate, the numerical illustrations show that, thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results yield a less conservative choice of the step size and provide a better control of the convergence in expectation.
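For orientation, the two iteration budgets quoted above can be compared directly. The back-of-the-envelope comparison below (notation as in the abstract, constants dropped) is my own, and it matches the abstract's remark that the $\sqrt{n}$ strategy pays off only for small to medium accuracy relative to $n$:

```latex
\[
  k_{\max}^{(1)} = O\!\left(\frac{n^{2/3}}{\epsilon}\right),
  \qquad
  k_{\max}^{(2)} = O\!\left(\frac{\sqrt{n}}{\epsilon^{3/2}}\right),
  \qquad
  \frac{k_{\max}^{(2)}}{k_{\max}^{(1)}} = \frac{1}{n^{1/6}\,\epsilon^{1/2}},
\]
\[
  \text{so the } \sqrt{n} \text{ strategy is the cheaper of the two whenever } \epsilon \gtrsim n^{-1/3}.
\]
```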
Any clustering algorithm must simultaneously learn to model the clusters and to allocate data to those clusters in the absence of labels. Mixture model-based methods model clusters with pre-defined statistical distributions and allocate data to those clusters based on the cluster likelihoods. They iteratively refine the distribution parameters and member assignments following the Expectation-Maximization (EM) algorithm. However, the cluster representability of such hand-designed distributions, which employ a limited number of parameters, is not adequate for most real-world clustering tasks. In this paper, we realize mixture model-based clustering with a neural network whose final-layer neurons, with the aid of an additional transformation, approximate the cluster distribution outputs. The network parameters serve as the parameters of those distributions. The result is an elegant, far more general representation of clusters than a restricted mixture of hand-designed distributions. We train the network end-to-end via batch-wise EM iterations in which the forward pass acts as the E-step and the backward pass acts as the M-step. In image clustering, the mixture-based EM objective can be used as the clustering objective alongside existing representation learning methods. In particular, we show that when mixture-EM optimization is fused with consistency optimization, it improves on the performance of consistency optimization alone. Our trained networks outperform single-stage deep clustering methods that still depend on k-means, with unsupervised classification accuracy of 63.8% on STL10, 58% on CIFAR10, 25.9% on CIFAR100, and 98.9% on MNIST.
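Here is a toy rendering of the batch-wise loop described above, where the forward pass computes soft responsibilities (E-step) and the backward pass takes one gradient step on the expected complete-data log-likelihood (M-step). As a simplifying assumption of mine, the "network" is reduced to a set of cluster centroids with a unit-variance Gaussian likelihood; the paper uses a full neural parametrization instead.

```python
# Toy batch-wise EM loop: forward pass = E-step, gradient step = M-step.
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 300, 2, 3
# Synthetic data: three groups centred around -3, 0 and 3.
X = np.concatenate([rng.normal(loc=c, size=(N // K, D)) for c in (-3, 0, 3)])
mu = rng.normal(size=(K, D))            # "final layer" parameters: one centroid per cluster

for _ in range(100):
    # Forward pass / E-step: responsibilities from per-cluster log-likelihoods.
    log_lik = -0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (N, K)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    gamma = np.exp(log_lik)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Backward pass / M-step: one gradient step on sum_{n,k} gamma[n,k] * log p(x_n | k),
    # with gamma treated as constant for the step.
    grad_mu = (gamma[:, :, None] * (X[:, None, :] - mu[None, :, :])).sum(0) / N
    mu += 1.0 * grad_mu

print(np.round(mu, 1))                  # centroids drift toward the three group means
print(gamma.argmax(axis=1)[:10])        # hard assignments for a few points
```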
The Expectation Maximization (EM) algorithm is of key importance for inference in latent variable models, including mixtures of regressors and experts and models with missing observations. This paper introduces a novel EM algorithm, called SPIDER-EM, for inference from a training set of size $n$, $n \gg 1$. At the core of our algorithm is an estimator of the full conditional expectation in the E-step, adapted from the stochastic path-integrated differential estimator (SPIDER) technique. We derive finite-time complexity bounds for smooth non-convex likelihoods: we show that for convergence to an $\epsilon$-approximate stationary point, the complexity scales as $K_{\operatorname{Opt}}(n,\epsilon) = \mathcal{O}(\epsilon^{-1})$ and $K_{\operatorname{CE}}(n,\epsilon) = n + \sqrt{n}\,\mathcal{O}(\epsilon^{-1})$, where $K_{\operatorname{Opt}}(n,\epsilon)$ and $K_{\operatorname{CE}}(n,\epsilon)$ are, respectively, the number of M-steps and the number of per-sample conditional expectation evaluations. This improves over the state-of-the-art algorithms. Numerical results support our findings.
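For intuition, a SPIDER-style control-variate recursion applied to the E-step statistics can be sketched as below. The notation is mine, not necessarily the paper's: a full pass periodically refreshes the estimate, and in between it is updated on mini-batches $\mathcal{B}_t$ of size $b$ using differences evaluated at consecutive parameter values.

```latex
\[
  \widehat{S}^{\,t} \;=\; \widehat{S}^{\,t-1}
  \;+\; \frac{1}{b} \sum_{i \in \mathcal{B}_t}
  \Big( \bar{s}_i(\theta^{\,t}) - \bar{s}_i(\theta^{\,t-1}) \Big),
  \qquad
  \bar{s}_i(\theta) := \mathbb{E}\!\left[ S(z_i, y_i) \mid y_i ; \theta \right],
\]
% where S denotes the complete-data sufficient statistics and z_i the latent
% variable attached to observation y_i; the differences keep the variance of
% the estimator small between full refreshes.
```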
Xinhan Di, Pengqian Yu, Rui Bu (2019)
A variety of graph neural network (GNN) frameworks for representation learning on graphs have been developed recently. These frameworks rely on an aggregation and iteration scheme to learn the representations of nodes. However, information between nodes is inevitably lost in this scheme during learning. In order to reduce the loss, we extend the GNN frameworks by exploring the aggregation and iteration scheme from the perspective of mutual information. We propose a new approach that enlarges the normal neighborhood in the aggregation of GNNs, with the aim of maximizing mutual information. Based on a series of experiments conducted on several benchmark datasets, we show that the proposed approach improves the state-of-the-art performance on four types of graph tasks, including supervised and semi-supervised graph classification, graph link prediction, and graph edge generation and classification.
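A minimal illustration of "enlarging the neighborhood" used in aggregation is sketched below: instead of averaging node features over 1-hop neighbours only, the aggregation also includes 2-hop neighbours. This is a simple stand-in of my own for the paper's mutual-information-driven scheme; the graph, features, and weighting are arbitrary.

```python
# Mean aggregation over the normal vs. an enlarged (<= 2-hop) neighbourhood.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency of a 4-node path graph
H = np.eye(4)                                # one-hot node features

def aggregate(adj, feats):
    # Average the features of each node's neighbourhood.
    deg = adj.sum(1, keepdims=True)
    return (adj @ feats) / np.maximum(deg, 1)

A2 = np.clip(A + A @ A, 0, 1)                # add 2-hop reachability
np.fill_diagonal(A2, 0)                      # keep it a neighbourhood, not self-loops

print(aggregate(A, H))    # 1-hop aggregation
print(aggregate(A2, H))   # aggregation over the enlarged neighbourhood
```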
