Research papers and Master's and PhD theses published by Ivo Danihelka

Muesli: Combining Improvements in Policy Optimization

146 - Matteo Hessel , Ivo Danihelka , Fabio Viola 2021

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

Machine Learning, Artificial Intelligence
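Muesli's regularized policy update moves the policy towards a target formed by reweighting a prior policy with clipped advantages (the paper calls this clipped MPO, or CMPO). Below is a minimal sketch of such a target for a discrete action set. It assumes the advantage estimates are already available and normalized. The helper name `cmpo_target` and the clipping bound `c = 1.0` are illustrative choices, not the paper's code:

```python
import math
from typing import Sequence


def cmpo_target(prior: Sequence[float], advantages: Sequence[float], c: float = 1.0) -> list[float]:
    """Hypothetical helper: reweight a prior policy by exp(clipped advantage), then renormalize."""
    # Clip each advantage to [-c, c] before exponentiating.
    weights = [p * math.exp(max(-c, min(c, adv))) for p, adv in zip(prior, advantages)]
    z = sum(weights)  # normalizer, always within [e^-c, e^c]
    return [w / z for w in weights]


prior = [0.25, 0.25, 0.5]
target = cmpo_target(prior, [2.0, 0.0, -3.0])
```

Because each advantage is clipped to [-c, c], every ratio between target and prior probability stays within [e^(-2c), e^(2c)], which keeps the target close to the prior policy.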

Causally Correct Partial Models for Reinforcement Learning

175 - Danilo J. Rezende , Ivo Danihelka , George Papamakarios 2020

In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don't model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.

Machine Learning, Artificial Intelligence
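The confounding described in this abstract can be seen in a toy example with exact probabilities. The setup is entirely hypothetical:

- a hidden binary observation `s` that the partial model does not see,
- a behavior policy that picks action 1 with probability 0.9 when `s == 1` and 0.1 otherwise,
- a reward of 1 exactly when the action matches `s`.

A partial model that predicts reward from the action alone learns E[r | a]. This differs from the interventional E[r | do(a)] that planning needs. Adjusting for the policy's own action probability, a variable the agent computes itself, recovers the correct value. This is in the spirit of the paper's approach, not its implementation:

```python
P_S1 = 0.5  # hypothetical prior probability that the hidden observation s is 1


def p_s(s: int) -> float:
    return P_S1 if s == 1 else 1.0 - P_S1


def policy_prob(a: int, s: int) -> float:
    """Hypothetical behavior policy: prefers the action that matches the hidden s."""
    p1 = 0.9 if s == 1 else 0.1
    return p1 if a == 1 else 1.0 - p1


def reward(s: int, a: int) -> float:
    return 1.0 if a == s else 0.0


def expected_reward_given(a: int, states: list[int]) -> float:
    """E[r | a] over behavior data restricted to `states` (weights p(s) * pi(a | s))."""
    joint = {s: p_s(s) * policy_prob(a, s) for s in states}
    total = sum(joint.values())
    return sum(joint[s] / total * reward(s, a) for s in states)


def confounded(a: int) -> float:
    """What a partial model fit on behavior data learns: E[r | a]."""
    return expected_reward_given(a, [0, 1])


def interventional(a: int) -> float:
    """What a planner needs: E[r | do(a)], with s drawn from its prior."""
    return sum(p_s(s) * reward(s, a) for s in (0, 1))


def backdoor_adjusted(a: int) -> float:
    """Adjust over z = the policy's probability of action 1, which the agent can observe."""
    z_of = {s: policy_prob(1, s) for s in (0, 1)}
    result = 0.0
    for z in set(z_of.values()):
        states = [s for s in (0, 1) if z_of[s] == z]
        p_z = sum(p_s(s) for s in states)
        result += p_z * expected_reward_given(a, states)
    return result
```

Here `confounded(1)` is 0.9 because action 1 is mostly taken when it happens to be correct, while `interventional(1)` and `backdoor_adjusted(1)` are both 0.5.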

The Cramer Distance as a Solution to Biased Wasserstein Gradients

91 - Marc G. Bellemare , Ivo Danihelka , Will Dabney 2017

The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramer distance. We show that the Cramer distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramer distance in practice we design a new algorithm, the Cramer Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN.

Machine Learning
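For one-dimensional distributions, the squared Cramer distance compares cumulative distribution functions: l2^2(P, Q) = ∫ (F_P(x) − F_Q(x))^2 dx. Below is a minimal sketch for two empirical samples. Both empirical CDFs are piecewise constant between the pooled sample points, so the integral can be computed exactly. The function names are illustrative:

```python
from bisect import bisect_right
from typing import Sequence


def empirical_cdf(sorted_xs: Sequence[float], x: float) -> float:
    """Fraction of samples <= x (sorted_xs must be sorted)."""
    return bisect_right(sorted_xs, x) / len(sorted_xs)


def cramer_distance_sq(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Cramer (l2) distance between two empirical 1-D distributions."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    # Both CDFs are constant on [left, right); outside the pooled range they agree.
    for left, right in zip(points, points[1:]):
        diff = empirical_cdf(xs_sorted, left) - empirical_cdf(ys_sorted, left)
        total += diff * diff * (right - left)
    return total


print(cramer_distance_sq([0.0, 2.0], [1.0]))  # 0.5
```

A useful sanity check is that, in one dimension, the energy distance 2E|X − Y| − E|X − X'| − E|Y − Y'| equals twice the squared Cramer distance.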

Comparison of Maximum Likelihood and GAN-based training of Real NVPs

161 - Ivo Danihelka , Balaji Lakshminarayanan , Benigno Uria 2017

We train a generator by maximum likelihood and we also train the same generator architecture by Wasserstein GAN. We then compare the generated samples, exact log-probability densities and approximate Wasserstein distances. We show that an independent critic trained to approximate Wasserstein distance between the validation set and the generator distribution helps detect overfitting. Finally, we use ideas from the one-shot learning literature to develop a novel fast learning critic.

Machine Learning
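Maximum-likelihood training of a Real NVP is possible because each affine coupling layer is invertible and has a cheap log-determinant. Below is a minimal sketch of one coupling layer on a 2-D input, with a standard normal base distribution. The scale and shift functions are hypothetical hand-written stand-ins for the neural networks a real model would use:

```python
import math


def scale_fn(x1: float) -> float:
    # Hypothetical stand-in for a learned network s(x1).
    return 0.5 * math.tanh(x1)


def shift_fn(x1: float) -> float:
    # Hypothetical stand-in for a learned network t(x1).
    return 0.3 * x1 - 1.0


def coupling_forward(x):
    """Map data to latent: z1 = x1, z2 = x2 * exp(s(x1)) + t(x1). Returns (z, log|det J|)."""
    x1, x2 = x
    s = scale_fn(x1)
    z = (x1, x2 * math.exp(s) + shift_fn(x1))
    return z, s  # the Jacobian is triangular, so log|det J| = s(x1)


def coupling_inverse(z):
    """Exact inverse of coupling_forward, used for sampling."""
    z1, z2 = z
    return (z1, (z2 - shift_fn(z1)) * math.exp(-scale_fn(z1)))


def log_likelihood(x) -> float:
    """log p(x) = log N(z; 0, I) + log|det J| (change of variables)."""
    z, log_det = coupling_forward(x)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base + log_det
```

Stacking such layers, alternating which half passes through unchanged, gives an exactly invertible model whose log-likelihood can be maximized directly. The same generator can instead be trained adversarially, which is the comparison the paper makes.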

Associative Long Short-Term Memory

302 - Ivo Danihelka , Greg Wayne , Benigno Uria 2016

We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.

Neural and Evolutionary Computing
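The storage and retrieval idea behind this abstract can be sketched with complex-valued vectors. This minimal sketch omits the LSTM gating entirely:

- Keys have unit modulus, so binding is elementwise complex multiplication and unbinding multiplies by the conjugate key.
- Redundant copies bind with fixed random permutations of the key.
- Averaging the retrievals across copies reduces interference noise.

The dimension, item count, seeds, and class and function names are all assumptions for illustration:

```python
import cmath
import math
import random


def random_key(dim: int, rng: random.Random) -> list[complex]:
    """Unit-modulus complex key, so conj(k) * k == 1 elementwise."""
    return [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(dim)]


def random_value(dim: int, rng: random.Random) -> list[complex]:
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]


class RedundantAssociativeMemory:
    """Hypothetical memory: one complex trace per copy, each with its own key permutation."""

    def __init__(self, dim: int, copies: int, rng: random.Random):
        self.perms = [rng.sample(range(dim), dim) for _ in range(copies)]
        self.traces = [[0j] * dim for _ in range(copies)]

    def store(self, key: list[complex], value: list[complex]) -> None:
        # Add the permuted-key binding of `value` to every copy's trace.
        for perm, trace in zip(self.perms, self.traces):
            for i, p in enumerate(perm):
                trace[i] += key[p] * value[i]

    def retrieve(self, key: list[complex]) -> list[complex]:
        # Unbind each copy with the conjugate permuted key, then average the copies.
        out = [0j] * len(key)
        for perm, trace in zip(self.perms, self.traces):
            for i, p in enumerate(perm):
                out[i] += key[p].conjugate() * trace[i]
        return [o / len(self.traces) for o in out]


def average_retrieval_error(copies: int, dim: int = 64, n_items: int = 20) -> float:
    """Mean squared retrieval error per element after storing n_items pairs."""
    rng = random.Random(0)
    keys = [random_key(dim, rng) for _ in range(n_items)]
    values = [random_value(dim, rng) for _ in range(n_items)]
    memory = RedundantAssociativeMemory(dim, copies, random.Random(1))
    for k, v in zip(keys, values):
        memory.store(k, v)
    total = 0.0
    for k, v in zip(keys, values):
        got = memory.retrieve(k)
        total += sum(abs(a - b) ** 2 for a, b in zip(got, v)) / dim
    return total / n_items
```

With a single copy, each retrieval carries the interference of every other stored item. With independent permutations, the interference in different copies is roughly uncorrelated, so averaging 16 copies cuts the error substantially.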

DRAW: A Recurrent Neural Network For Image Generation

361 - Karol Gregor , Ivo Danihelka , Alex Graves 2015

This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

Computer Vision and Pattern Recognition, Machine Learning, Neural and Evolutionary Computing
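DRAW's spatial attention reads an N x N glimpse by applying a grid of Gaussian filters to the image, with one filterbank per image axis. The filterbank is parameterized by a grid centre `g`, a stride `delta`, and a variance `sigma2`, following the DRAW paper as we understand it. Below is a minimal sketch of the filterbank and the read operation gamma * F_Y · image · F_X^T. The image size and parameter values are arbitrary:

```python
import math


def filterbank(g: float, delta: float, sigma2: float, n: int, size: int) -> list[list[float]]:
    """n x size matrix of normalized 1-D Gaussian filters along one image axis."""
    rows = []
    for i in range(n):
        # Filter centres are spaced `delta` apart and symmetric about g.
        mu = g + (i - n / 2 + 0.5) * delta
        row = [math.exp(-((a - mu) ** 2) / (2 * sigma2)) for a in range(size)]
        z = sum(row) or 1.0  # normalize each filter; guard against underflow to zero
        rows.append([v / z for v in row])
    return rows


def read_glimpse(image, fy, fx, gamma: float = 1.0):
    """gamma * F_Y @ image @ F_X^T: an N x N glimpse from an H x W image."""
    h, w = len(image), len(image[0])
    # tmp = F_Y @ image, shape N x W.
    tmp = [[sum(fy_row[r] * image[r][c] for r in range(h)) for c in range(w)] for fy_row in fy]
    # glimpse[i][j] = gamma * sum_c tmp[i][c] * F_X[j][c].
    return [[gamma * sum(t_row[c] * fx_row[c] for c in range(w)) for fx_row in fx] for t_row in tmp]


fx = filterbank(g=5.0, delta=2.0, sigma2=1.0, n=3, size=11)  # centres at 3, 5, 7
fy = filterbank(g=4.0, delta=2.0, sigma2=1.0, n=3, size=9)   # centres at 2, 4, 6
glimpse = read_glimpse([[1.0] * 11 for _ in range(9)], fy, fx)
```

Because every filter is normalized, a constant image yields a constant glimpse. A small stride with a small variance zooms in on a region, while a large stride covers more of the image at lower resolution.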

Deep AutoRegressive Networks

81 - Karol Gregor , Ivo Danihelka , Andriy Mnih 2013

We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.

Machine Learning
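Autoregressive connections within a stochastic layer make exact sampling a simple left-to-right loop. Each binary unit is drawn from a Bernoulli distribution whose logit depends on the units already sampled. Below is a minimal sketch of one such layer, assuming logistic conditionals and hypothetical weights and biases. An optional `context` stands in for input from the layer above, so stacking layers top-down gives ancestral sampling through the hierarchy:

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conditional_p(j, h, weights, bias, context) -> float:
    """p(h_j = 1 | h_<j): logistic in the bias, the layer above, and earlier units."""
    logit = bias[j] + context[j] + sum(weights[j][k] * h[k] for k in range(j))
    return sigmoid(logit)


def sample_layer(weights, bias, rng: random.Random, context=None) -> list[int]:
    """Draw h_1, ..., h_n in order, each conditioned on the units before it."""
    context = context if context is not None else [0.0] * len(bias)
    h: list[int] = []
    for j in range(len(bias)):
        h.append(1 if rng.random() < conditional_p(j, h, weights, bias, context) else 0)
    return h


def log_prob(h, weights, bias, context=None) -> float:
    """Exact log p(h): the sum of Bernoulli log-probabilities along the ordering."""
    context = context if context is not None else [0.0] * len(bias)
    total = 0.0
    for j, bit in enumerate(h):
        p = conditional_p(j, h, weights, bias, context)
        total += math.log(p if bit else 1.0 - p)
    return total


# Hypothetical parameters: W is strictly lower-triangular (unit j only sees units < j).
W = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [-2.0, 1.0, 0.0]]
b = [0.2, -0.5, 0.3]
sample = sample_layer(W, b, random.Random(0))
```

Because the probability factorizes along the ordering, the same conditionals give both a one-pass sampler and an exact log-probability. The probabilities of all 2^n configurations sum to one.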

Optimistic Simulated Exploration as an Incentive for Real Exploration

76 - Ivo Danihelka 2009

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an optimistic model to discover promising paths for real exploration. This reduces the need for real exploration.

Machine Learning, Artificial Intelligence
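One way to picture planning in an optimistic model is as follows. Untried state-action pairs are assumed to be worth the maximum possible return, and value iteration in this simulated model picks which real action to try next. The sketch below is a hypothetical R-max-style illustration on a small deterministic chain, not the thesis's algorithm. The chain, rewards, discount, and the experience in `known` are all assumptions:

```python
# Hypothetical deterministic chain of states 0..N_STATES-1; actions move left or right.
N_STATES = 6
ACTIONS = (-1, +1)
R_MAX = 1.0
GAMMA = 0.9
OPTIMISTIC = R_MAX / (1.0 - GAMMA)  # best possible discounted return


def q_value(s, a, v, known):
    """Model-based action value; untried (s, a) pairs are assumed maximally rewarding."""
    if (s, a) not in known:
        return OPTIMISTIC
    next_s, r = known[(s, a)]
    return r + GAMMA * v[next_s]


def optimistic_values(known, iterations=100):
    """Value iteration entirely inside the simulated (optimistic) model."""
    v = [0.0] * N_STATES
    for _ in range(iterations):
        v = [max(q_value(s, a, v, known) for a in ACTIONS) for s in range(N_STATES)]
    return v


def greedy_action(s, known):
    """The real action suggested by simulated exploration from state s."""
    v = optimistic_values(known)
    return max(ACTIONS, key=lambda a: q_value(s, a, v, known))


# Hypothetical experience: both actions tried in states 0-2, all with reward 0.
known = {(s, a): (min(max(s + a, 0), N_STATES - 1), 0.0) for s in range(3) for a in ACTIONS}
action = greedy_action(0, known)  # +1: head right, towards the untried states
```

The planner never touches the real environment while computing values. Real steps are spent only on the path the optimistic model marks as promising.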
