ترغب بنشر مسار تعليمي؟ اضغط هنا

Hyperbolic Discounting and Learning over Multiple Horizons

191   0   0.0 ( 0 )
 نشر من قبل William Fedus
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporal-difference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple time-horizons is an effective auxiliary task which often improves over a strong value-based RL agent, Rainbow.



قيم البحث

اقرأ أيضاً

Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges. We propose a new approach, Continuous and Categorical Bayesian Optimisation (CoCaBO), which combines the strengths of multi-armed bandits and Bayesian optimisation to select values for both categorical and continuous inputs. We model this mixed-type space using a Gaussian Process kernel, designed to allow sharing of information across multiple categorical variables, each with multiple possible values; this allows CoCaBO to leverage all available data efficiently. We extend our method to the batch setting and propose an efficient selection procedure that dynamically balances exploration and exploitation whilst encouraging batch diversity. We demonstrate empirically that our method outperforms existing approaches on both synthetic and real-world optimisation tasks with continuous and categorical inputs.
The performance of modern machine learning methods highly depends on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new al gorithm. Those default values are usually chosen in an ad-hoc manner to work good enough on a wide variety of datasets. To address this problem, different automatic hyperparameter configuration algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance but adds additional algorithmic complexity and computational costs to the training procedure. As an alternative to this, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the approach we propose in comparison to random search and Bayesian Optimization.
While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using poin t process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
298 - Diane Oyen , Terran Lane 2013
Bayesian network structure learning algorithms with limited data are being used in domains such as systems biology and neuroscience to gain insight into the underlying processes that produce observed data. Learning reliable networks from limited data is difficult, therefore transfer learning can improve the robustness of learned networks by leveraging data from related tasks. Existing transfer learning algorithms for Bayesian network structure learning give a single maximum a posteriori estimate of network models. Yet, many other models may be equally likely, and so a more informative result is provided by Bayesian structure discovery. Bayesian structure discovery algorithms estimate posterior probabilities of structural features, such as edges. We present transfer learning for Bayesian structure discovery which allows us to explore the shared and unique structural features among related tasks. Efficient computation requires that our transfer learning objective factors into local calculations, which we prove is given by a broad class of transfer biases. Theoretically, we show the efficiency of our approach. Empirically, we show that compared to single task learning, transfer learning is better able to positively identify true edges. We apply the method to whole-brain neuroimaging data.
508 - Fei Lu , Mauro Maggioni , Sui Tang 2019
Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inverse problem of learning interaction laws given data from multiple trajectories, in a nonparametric fashion, when the interaction kernels depend on pairwise distances. We establish a condition for learnability of interaction kernels, and construct estimators that are guaranteed to converge in a suitable $L^2$ space, at the optimal min-max rate for 1-dimensional nonparametric regression. We propose an efficient learning algorithm based on least squares, which can be implemented in parallel for multiple trajectories and is therefore well-suited for the high dimensional, big data regime. Numerical simulations on a variety examples, including opinion dynamics, predator-swarm dynamics and heterogeneous particle dynamics, suggest that the learnability condition is satisfied in models used in practice, and the rate of convergence of our estimator is consistent with the theory. These simulations also suggest that our estimators are robust to noise in the observations, and produce accurate predictions of dynamics in relative large time intervals, even when they are learned from data collected in short time intervals.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا