ترغب بنشر مسار تعليمي؟ اضغط هنا

Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

75   0   0.0 ( 0 )
 نشر من قبل Sjoerd van Steenkiste
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions this causal knowledge must be learned without access to supervised data. To address this problem we present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely emph{unsupervised} fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge. We demonstrate its ability to handle occlusion and show that it can extrapolate learned knowledge to scenes with different numbers of objects.

قيم البحث

اقرأ أيضاً

Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this p aper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction.
Any clustering algorithm must synchronously learn to model the clusters and allocate data to those clusters in the absence of labels. Mixture model-based methods model clusters with pre-defined statistical distributions and allocate data to those clu sters based on the cluster likelihoods. They iteratively refine those distribution parameters and member assignments following the Expectation-Maximization (EM) algorithm. However, the cluster representability of such hand-designed distributions that employ a limited amount of parameters is not adequate for most real-world clustering tasks. In this paper, we realize mixture model-based clustering with a neural network where the final layer neurons, with the aid of an additional transformation, approximate cluster distribution outputs. The network parameters pose as the parameters of those distributions. The result is an elegant, much-generalized representation of clusters than a restricted mixture of hand-designed distributions. We train the network end-to-end via batch-wise EM iterations where the forward pass acts as the E-step and the backward pass acts as the M-step. In image clustering, the mixture-based EM objective can be used as the clustering objective along with existing representation learning methods. In particular, we show that when mixture-EM optimization is fused with consistency optimization, it improves the sole consistency optimization performance in clustering. Our trained networks outperform single-stage deep clustering methods that still depend on k-means, with unsupervised classification accuracy of 63.8% in STL10, 58% in CIFAR10, 25.9% in CIFAR100, and 98.9% in MNIST.
The Expectation Maximization (EM) algorithm is of key importance for inference in latent variable models including mixture of regressors and experts, missing observations. This paper introduces a novel EM algorithm, called texttt{SPIDER-EM}, for infe rence from a training set of size $n$, $n gg 1$. At the core of our algorithm is an estimator of the full conditional expectation in the {sf E}-step, adapted from the stochastic path-integrated differential estimator ({tt SPIDER}) technique. We derive finite-time complexity bounds for smooth non-convex likelihood: we show that for convergence to an $epsilon$-approximate stationary point, the complexity scales as $K_{operatorname{Opt}} (n,epsilon )={cal O}(epsilon^{-1})$ and $K_{operatorname{CE}}( n,epsilon ) = n+ sqrt{n} {cal O}(epsilon^{-1} )$, where $K_{operatorname{Opt}}( n,epsilon )$ and $K_{operatorname{CE}}(n, epsilon )$ are respectively the number of {sf M}-steps and the number of per-sample conditional expectations evaluations. This improves over the state-of-the-art algorithms. Numerical results support our findings.
Fast Incremental Expectation Maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM type algorithms in the {em Stochastic Approximation within EM} framework. Then, we prov ide nonasymptotic bounds for the convergence in expectation as a function of the number of examples $n$ and of the maximal number of iterations $kmax$. We propose two strategies for achieving an $epsilon$-approximate stationary point, respectively with $kmax = O(n^{2/3}/epsilon)$ and $kmax = O(sqrt{n}/epsilon^{3/2})$, both strategies relying on a random termination rule before $kmax$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $kmax$ to scale as $sqrt{n}$ which is better than $n^{2/3}$ which was the best rate obtained so far; it is at the cost of a larger dependence upon the tolerance $epsilon$, thus making this control relevant for small to medium accuracy with respect to the number of examples $n$. Second, for the $n^{2/3}$-rate, the numerical illustrations show that thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results desig a less conservative choice of the step size and provide a better control of the convergence in expectation.
Learning structured representations of visual scenes is currently a major bottleneck to bridging perception with reasoning. While there has been exciting progress with slot-based models, which learn to segment scenes into sets of objects, learning co nfigurational properties of entire groups of objects is still under-explored. To address this problem, we introduce Constellation, a network that learns relational abstractions of static visual scenes, and generalises these abstractions over sensory particularities, thus offering a potential basis for abstract relational reasoning. We further show that this basis, along with language association, provides a means to imagine sensory content in new ways. This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا