
Renewal Monte Carlo: Renewal theory based reinforcement learning

Published by Jayakumar Subramanian
Publication date: 2018
Paper language: English





In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while circumventing their key drawbacks of high variance and delayed (end-of-episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic, so, by renewal theory, the performance of a policy equals the ratio of the expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluating performance gradients---a likelihood ratio based estimator and a simultaneous perturbation based estimator---and show that for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management.
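To make the regenerative-cycle idea concrete, here is a minimal sketch in Python of cycle-based performance estimation with a likelihood-ratio gradient step, on a small random tabular MDP with a softmax policy. The environment, step sizes, and function names are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch of the RMC idea: estimate the discounted reward R_c and
# discounted time T_c over regenerative cycles (trajectories that end when
# the designated start state recurs), track J ~ E[R_c]/E[T_c], and ascend a
# likelihood-ratio gradient estimate. The MDP and constants are toy choices.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a, s']
R = rng.random((S, A))                       # reward r(s, a)
theta = np.zeros((S, A))                     # softmax policy parameters
start = 0                                    # designated start state

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def run_cycle():
    """One regenerative cycle: begin at `start`, stop when it recurs.
    Returns discounted reward, discounted time, and the accumulated
    score (sum of grad log pi) for the likelihood-ratio estimator."""
    s, t = start, 0
    R_c, T_c, score = 0.0, 0.0, np.zeros_like(theta)
    while True:
        p = policy(s)
        a = rng.choice(A, p=p)
        g = -p
        g[a] += 1.0                          # d log pi(a|s) / d theta[s]
        score[s] += g
        R_c += gamma**t * R[s, a]
        T_c += gamma**t
        s = rng.choice(S, p=P[s, a])
        t += 1
        if s == start:
            return R_c, T_c, score

J, lr = 0.0, 0.05
for _ in range(2000):
    R_c, T_c, score = run_cycle()
    J += 0.1 * (R_c / T_c - J)               # running performance estimate
    theta += lr * (R_c - J * T_c) * score    # ascend grad R - J * grad T
print(f"estimated performance J = {J:.3f}")
```

The update direction follows from J = E[R_c] / E[T_c]: its gradient is (∇R − J∇T)/T, and the positive factor 1/T can be absorbed into the step size, so only cycle estimates of the discounted reward, discounted time, and their gradients are needed.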




Read also

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
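To illustrate the observation model that makes ARL harder than standard RL, here is a minimal sketch of a Bernoulli bandit in which the reward is revealed only when the agent pays a query cost. The class and parameter names are hypothetical, not from the paper.

```python
# A minimal sketch of the Active RL observation model on a Bernoulli bandit:
# rewards always accrue, but the learner observes them only by paying a
# query cost. Names and structure are illustrative assumptions.
import numpy as np

class ActiveBandit:
    def __init__(self, means, query_cost, seed=0):
        self.means = np.asarray(means)
        self.cost = query_cost
        self.rng = np.random.default_rng(seed)

    def step(self, arm, query):
        """Pull `arm`; the reward counts toward return either way, but is
        returned to the learner only when `query` is True."""
        r = float(self.rng.random() < self.means[arm])
        if query:
            return r - self.cost, r      # (net payoff, observed reward)
        return r, None                   # reward earned but hidden
```

Because unobserved rewards still count toward the return, the learner must explicitly budget queries for information, which is the sense in which optimism or Thompson sampling alone do not settle exploration here.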
Intermittency is a common and challenging problem in demand forecasting. We introduce a new, unified framework for building intermittent demand forecasting models, which incorporates existing methods and allows them to be generalized in several directions. Our framework is based on extensions of well-established model-based methods to discrete-time renewal processes, which can parsimoniously account for patterns such as aging, clustering and quasi-periodicity in demand arrivals. The connection to discrete-time renewal processes allows not only for a principled extension of Croston-type models, but also for a natural inclusion of neural network based models---by replacing exponential smoothing with a recurrent neural network. We also demonstrate that modeling continuous-time demand arrivals, i.e., with a temporal point process, is possible via a trivial extension of our framework. This leads to more flexible modeling in scenarios where data on individual purchase orders are directly available with granular timestamps. Complementing this theoretical advancement, we demonstrate the efficacy of our framework for forecasting practice via an extensive empirical study on standard intermittent demand data sets, in which we report predictive accuracy in a variety of scenarios that compares favorably to the state of the art.
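For orientation, the Croston-type baseline that this renewal framework generalizes can be sketched in a few lines: exponential smoothing of nonzero demand sizes and of inter-demand intervals, with the forecast given by their ratio. The smoothing constant and the toy series are illustrative.

```python
# A minimal sketch of classic Croston-style intermittent demand forecasting.
# The renewal-process framework described above replaces these fixed
# smoothers with richer models (e.g., a recurrent neural network).
def croston(demand, alpha=0.1):
    z = p = None              # smoothed demand size and inter-demand interval
    q = 1                     # periods since the last nonzero demand
    forecasts = []
    for y in demand:
        if y > 0:
            z = y if z is None else z + alpha * (y - z)
            p = q if p is None else p + alpha * (q - p)
            q = 1
        else:
            q += 1
        forecasts.append(z / p if z is not None else 0.0)
    return forecasts

print(croston([0, 0, 3, 0, 0, 0, 2, 0, 5]))
```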
Bassano Vacchini, 2019
We introduce a general construction of master equations with memory kernel whose solutions are given by completely positive trace preserving maps. These dynamics, going beyond the Lindblad paradigm, are obtained with reference to classical renewal processes, so that they are termed quantum renewal processes. They can be described by means of semigroup dynamics interrupted by jumps, separated by independently distributed time intervals following suitable waiting time distributions. In this framework, one can further introduce modified processes, in which the first few events follow different distributions. A crucial role, marking an important difference with respect to the classical case, is played by operator ordering. Indeed, for the same choice of basic quantum transformations, different quantum dynamics arise. In particular, for the case of modified processes, it is natural to consider the time-inverted operator ordering, in which the last few events are distributed differently.
Social Reinforcement Learning methods, which model agents in large networks, are useful for fake news mitigation, personalized teaching/healthcare, and viral marketing, but it is challenging to incorporate inter-agent dependencies into the models effectively due to network size and sparse interaction data. Previous social RL approaches either ignore agent dependencies or model them in a computationally intensive manner. In this work, we incorporate agent dependencies efficiently in a compact model by clustering users (based on their payoff and contribution to the goal) and combine this with a method to easily derive personalized agent-level policies from cluster-level policies. We also propose a dynamic clustering approach that captures changing user behavior. Experiments on real-world datasets illustrate that our proposed approach learns more accurate policy estimates and converges more quickly, compared to several baselines that do not use agent correlations or only use static clusters.
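A minimal sketch of the cluster-then-personalize idea follows, assuming k-means over simple user features and a tabular per-cluster value table; the features, update rule, and all names are illustrative assumptions, not the paper's method.

```python
# A minimal sketch: group users into clusters, keep one value table per
# cluster, and derive each agent's policy from its cluster's estimates.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_users, n_actions, n_clusters = 100, 4, 5
features = rng.random((n_users, 2))        # e.g., payoff and goal contribution
clusters = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(features)
Q = np.zeros((n_clusters, n_actions))      # cluster-level action values

def agent_action(user, eps=0.1):
    """Agent-level policy derived from the user's cluster."""
    c = clusters[user]
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[c]))

def update(user, action, reward, lr=0.1):
    c = clusters[user]
    Q[c, action] += lr * (reward - Q[c, action])   # bandit-style update
```

Re-running the clustering step periodically on refreshed features is one simple way to approximate the dynamic-clustering variant mentioned above.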
Ophir Flomenbom, 2010
Renewal-anomalous-heterogeneous files are solved. A simple file is made of Brownian hard spheres that diffuse stochastically in an effective 1D channel. Generally, Brownian files are heterogeneous: the spheres' diffusion coefficients are distributed and the initial sphere density is non-uniform. In renewal-anomalous files, the distribution of waiting times for individual jumps is not exponential as in Brownian files, but obeys ψ_α(t) ~ t^(−1−α), with 0 < α < 1. The file is renewal in that all the particles attempt to jump at the same time. It is shown that the mean square displacement (MSD) in a renewal-anomalous-heterogeneous file, ⟨r²⟩, obeys ⟨r²⟩ ~ [⟨r²⟩_nrml]^α, where ⟨r²⟩_nrml is the MSD in the corresponding Brownian file. This scaling is an outcome of an exact relation (derived here) connecting the probability density functions of Brownian files and renewal-anomalous files. It is also shown that non-renewal-anomalous files are slower than the corresponding renewal ones.
