Reward-Weighted Regression Converges to a Global Optimum


Published by: Dylan Ashley

Publication date: 2021

Research field: Mathematical statistics, Informatics engineering

Paper language: English

Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley


Abstract

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used.
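The iteration described in the abstract can be illustrated in the tabular case. The following is a minimal sketch, not the authors' code: it assumes strictly positive rewards and exact value computation instead of sampled trajectories, in which case maximizing the return-weighted log-likelihood without function approximation reduces to reweighting each action probability by Q(s, a) / V(s). The toy MDP and the names `q_values`, `run_rwr`, `P`, and `R` are hypothetical and chosen only for illustration.

```python
import numpy as np

def q_values(P, R, pi, gamma):
    """Exact Q^pi for a tabular MDP.

    P[s, a, t] is the probability of moving from s to t under action a,
    R[s, a] is the reward (assumed strictly positive), pi[s, a] the policy.
    """
    n_states = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)      # state transitions under pi
    r_pi = (pi * R).sum(axis=1)                # expected one-step reward
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return R + gamma * (P @ v)                 # shape (n_states, n_actions)

def run_rwr(P, R, gamma=0.9, iters=300):
    """Tabular RWR sketch: pi_next(a|s) = pi(a|s) * Q(s, a) / V(s)."""
    n_states, n_actions = R.shape
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    values = []
    for _ in range(iters):
        q = q_values(P, R, pi, gamma)
        v = (pi * q).sum(axis=1)               # V^pi(s) = sum_a pi(a|s) Q(s, a)
        values.append(v)
        pi = pi * q / v[:, None]               # rows still sum to 1
    return pi, np.array(values)

# Hypothetical 2-state MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
R = np.array([[1.0, 0.5],
              [2.0, 0.5]])

pi, values = run_rwr(P, R)
print("final policy:\n", pi)
print("state values after the last iteration:", values[-1])
```

In this toy problem the optimal behavior is to switch out of state 0 and then stay in state 1. The recorded state values never decrease from one iteration to the next, which matches the monotonic-improvement property mentioned in the abstract.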


Hunter Lang, David Sontag, Aravindan Vijayaraghavan (2020)

We prove that the $\alpha$-expansion algorithm for MAP inference always returns a globally optimal assignment for Markov Random Fields with Potts pairwise potentials, with a catch: the returned assignment is only guaranteed to be optimal for an instance within a small perturbation of the original problem instance. In other words, all local minima with respect to expansion moves are global minima to slightly perturb

Machine learning, Artificial intelligence, Data structures and algorithms

Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

Jianhong Wang, Yuan Zhang, Tae-Kyun Kim (2019)

Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games in which all agents aim to maximize the global reward. Credit assignment is an important problem studied in the global reward game. Most previous works adopted a non-cooperative game-theoretical framework with the shared reward approach, i.e., each agent is assigned the shared global reward directly. This, however, may give each agent an inaccurate reward for its contribution to the group, which could cause inefficient learning. To deal with this problem, we i) introduce a cooperative game-theoretical framework called the extended convex game (ECG), which is a superset of the global reward game, and ii) propose a local reward approach called Shapley Q-value. Shapley Q-value is able to distribute the global reward so that it reflects each agent's own contribution, in contrast to the shared reward approach. Moreover, we derive a MARL algorithm called Shapley Q-value deep deterministic policy gradient (SQDDPG), using Shapley Q-value as the critic for each agent. We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator, and Traffic Junction, and compare it with state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG, and Independent A2C. In the experiments, SQDDPG shows a significant improvement in the convergence rate. Finally, we plot Shapley Q-value and validate the property of fair credit assignment.
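To make the credit-assignment idea concrete, here is a minimal sketch of the classical Shapley value from cooperative game theory, on which the Shapley Q-value builds. It is not the SQDDPG algorithm itself. The three-agent game, the function `toy_value`, and the agent names are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}

def toy_value(coalition):
    """Hypothetical global reward: individual skill plus a bonus when a and b cooperate."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 3.0 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus

phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

Under the shared reward approach, every agent in this toy game would receive the full global reward of the grand coalition. The Shapley values instead split that same total among the agents, with the cooperation bonus divided equally between `a` and `b`.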

Machine learning, Artificial intelligence, Multi-agent systems

The global optimum of shallow neural network is attained by ridgelet transform

Sho Sonoda, Isao Ishikawa, Masahiro Ikeda (2018)

We prove that the global minimum of the backpropagation (BP) training problem of neural networks with an arbitrary nonlinear activation is given by the ridgelet transform. A series of computational experiments shows that there exists an interesting similarity between the scatter plot of hidden parameters in a shallow neural network after BP training and the spectrum of the ridgelet transform. By introducing a continuous model of neural networks, we reduce the training problem to a convex optimization in an infinite-dimensional Hilbert space, and obtain the explicit expression of the global optimizer via the ridgelet transform.

Machine learning

On the benefits of maximum likelihood estimation for Regression and Forecasting

Pranjal Awasthi, Abhimanyu Das, Rajat Sen (2021)

We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to demonstrate that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson Regression) can actually be much superior to ERM. We demonstrate empirically that our method, instantiated with a well-designed general-purpose mixture likelihood family, can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.

Machine learning, Artificial intelligence

Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

Erdem Bıyık, Malayandi Palan, Nicholas C. Landolfi (2019)

Robots can learn the right reward function by querying a human expert. Existing approaches attempt to choose questions where the robot is most uncertain about the human's response; however, they do not consider how easy it will be for the human to answer! In this paper we explore an information gain formulation for optimally selecting questions that naturally account for the human's ability to answer. Our approach identifies questions that optimize the trade-off between robot and human uncertainty, and determines when these questions become redundant or costly. Simulations and a user study show our method not only produces easy questions, but also ultimately results in faster reward learning.

Robotics, Artificial intelligence, Machine learning