
Generative Temporal Models with Spatial Memory for Partially Observed Environments

Published by Marco Fraccaro
Publication date: 2018
Language: English





In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
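As a rough illustration of the idea of a non-parametric spatial memory, the sketch below stores (position, code) pairs and reads them back by nearest-neighbour lookup after an action-conditioned position update. It is a toy under stated assumptions, not the architecture from the paper: the names `SpatialMemory`, `transition`, the 2D positions, the inverse-distance weighting, and the deterministic update are all hypothetical, and the codes stand in for latent representations that a Variational Auto-Encoder would produce.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SpatialMemory:
    """Toy non-parametric memory: a growing list of (position, code) pairs.

    Hypothetical illustration only; not the memory used in the paper.
    """
    positions: list = field(default_factory=list)
    codes: list = field(default_factory=list)

    def write(self, position, code):
        # Store the code observed at this position; nothing is overwritten.
        self.positions.append(tuple(position))
        self.codes.append(list(code))

    def read(self, query, k=3, eps=1e-6):
        # Inverse-distance-weighted mean of the k stored codes nearest to `query`.
        if not self.positions:
            raise ValueError("memory is empty")
        dists = [math.dist(query, p) for p in self.positions]
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
        weights = [1.0 / (dists[i] + eps) for i in nearest]
        total = sum(weights)
        dim = len(self.codes[0])
        return [
            sum(w * self.codes[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(dim)
        ]


def transition(position, action, step=1.0):
    # Assumed dynamics prior: the agent moves by `action` scaled by `step`.
    # A real state-space model would add learned, stochastic corrections.
    return (position[0] + step * action[0], position[1] + step * action[1])


memory = SpatialMemory()
memory.write((0.0, 0.0), [1.0, 0.0])
memory.write((10.0, 0.0), [0.0, 1.0])

position = transition((0.0, 0.0), (1.0, 0.0))  # one step to the right
predicted_code = memory.read(position, k=2)    # dominated by the closer entry
```

Because the memory only appends and looks up by position, its capacity grows with the number of visited locations rather than being fixed by a parameter count, which is the sense of "non-parametric" assumed here.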


Read also

Real-world networks such as social and communication networks are too large to be observed entirely. Such networks are often only partially observed, so that the size, topology, and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network and that additional nodes can be discovered by querying nodes in the currently observed network. The goal is to maximize the number of observed nodes within a given query budget: which set of nodes should be queried to maximize the size of the observed network? We formulate this as an exploration-exploitation problem and propose a novel nonparametric multi-arm bandit (MAB) algorithm for identifying which nodes to query. Our contributions include: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm that applies $k$-nearest neighbor UCB to the setting where the arms are presented in a vector space, (2) a theoretical guarantee that the $i$KNN-UCB algorithm has sublinear regret, and (3) an application of $i$KNN-UCB to synthetic networks and real-world networks from different domains, showing that our method discovers up to 40% more nodes than existing baselines.
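The general shape of a nearest-neighbour UCB rule can be sketched as follows. This is a simplified, generic index and not the $i$KNN-UCB algorithm itself: the function names, the Euclidean feature space, the constant `c`, and the specific bonus terms (a count-based term plus the distance to the farthest of the k neighbours) are assumptions made for illustration.

```python
import math


def knn_ucb_score(x, history, t, k=3, c=1.0):
    """Optimistic score for candidate node features `x`.

    `history` is a list of (features, reward) pairs for nodes already queried.
    Hypothetical simplified index; not the paper's exact formula.
    """
    if not history:
        return math.inf  # nothing observed yet: every candidate is worth trying
    neighbours = sorted(history, key=lambda h: math.dist(x, h[0]))[:k]
    mean_reward = sum(r for _, r in neighbours) / len(neighbours)
    # Uncertainty grows when the neighbours are far away ...
    radius = math.dist(x, neighbours[-1][0])
    # ... and shrinks as more neighbours support the estimate.
    bonus = c * math.sqrt(math.log(max(t, 2)) / len(neighbours))
    return mean_reward + bonus + radius


def choose_node(candidates, history, t, k=3, c=1.0):
    # Query the candidate with the highest optimistic score.
    return max(candidates, key=lambda x: knn_ucb_score(x, history, t, k, c))


history = [((0.0, 0.0), 1.0), ((1.0, 0.0), 0.0)]
candidates = [(0.1, 0.0), (0.9, 0.0)]
best = choose_node(candidates, history, t=10, k=1)
```

In this toy run both candidates are equally close to an observed node, so the rule prefers the one whose neighbour yielded the higher reward; a candidate far from everything observed would instead be favoured by the distance term.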
Andrew J. Black, 2018
We present an importance sampling algorithm that can produce realisations of Markovian epidemic models that exactly match observations, taken to be the number of a single event type over a period of time. The importance sampling can be used to construct an efficient particle filter that targets the states of a system and hence estimate the likelihood to perform Bayesian parameter inference. When used in a particle marginal Metropolis-Hastings scheme, the importance sampling provides a large speed-up in terms of the effective sample size per unit of computational time, compared to simple bootstrap sampling. The algorithm is general, with minimal restrictions, and we show how it can be applied to any discrete-state continuous-time Markov chain where we wish to exactly match the number of a single event type over a period of time.
Despite recent advances, the remaining bottlenecks in deep generative models are the need for extensive training and the difficulty of generalizing from a small number of training examples. We develop a new generative model called Generative Matching Network, which is inspired by the recently proposed matching networks for one-shot learning in discriminative tasks. By conditioning on the additional input dataset, our model can instantly learn new concepts that were not available in the training data but conform to a similar generative process. The proposed framework does not explicitly restrict diversity of the conditioning data and also does not require an extensive inference procedure for training or adaptation. Our experiments on the Omniglot dataset demonstrate that Generative Matching Networks significantly improve predictive performance on the fly as more additional data becomes available and outperform existing state-of-the-art conditional generative models.
Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source (the latent space) to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and an alternative to linear interpolation in latent space. In addition, it can be applied for robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth; on a simulated robot arm dataset; on human motion capture data; and on a generative model of handwritten digits.
Humans use spatial language to naturally describe object locations and their relations. Interpreting spatial language not only adds a perceptual modality for robots, but also reduces the barrier of interfacing with humans. Previous work primarily considers spatial language as goal specification for instruction following tasks in fully observable domains, often paired with reference paths for reward-based learning. However, spatial language is inherently subjective and potentially ambiguous or misleading. Hence, in this paper, we consider spatial language as a form of stochastic observation. We propose SLOOP (Spatial Language Object-Oriented POMDP), a new framework for partially observable decision making with a probabilistic observation model for spatial language. We apply SLOOP to object search in city-scale environments. To interpret ambiguous, context-dependent prepositions (e.g. front), we design a simple convolutional neural network that predicts the language provider's latent frame of reference (FoR) given the environment context. Search strategies are computed via an online POMDP planner based on Monte Carlo Tree Search. Evaluation based on crowdsourced language data, collected over areas of five cities in OpenStreetMap, shows that our approach achieves faster search and higher success rate compared to baselines, with a wider margin as the spatial language becomes more complex. Finally, we demonstrate the proposed method in AirSim, a realistic simulator where a drone is tasked to find cars in a neighborhood environment.
