ترغب بنشر مسار تعليمي؟ اضغط هنا

Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

300   0   0.0 ( 0 )
 نشر من قبل Taisuke Kobayashi
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Taisuke Kobayashi




اسأل ChatGPT حول البحث

This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization, and derives a new optimization method using forward KL divergence. Although RL originally aims to maximize return indirectly through optimization of policy, the recent work by Levine has proposed a different derivation process with explicit consideration of optimality as stochastic variable. This paper follows this concept and formulates the traditional learning laws for both value function and policy as the optimization problems with reverse KL divergence including optimality. Focusing on the asymmetry of KL divergence, the new optimization problems with forward KL divergence are derived. Remarkably, such new optimization problems can be regarded as optimistic RL. That optimism is intuitively specified by a hyperparameter converted from an uncertainty parameter. In addition, it can be enhanced when it is integrated with prioritized experience replay and eligibility traces, both of which accelerate learning. The effects of this expected optimism was investigated through learning tendencies on numerical simulations using Pybullet. As a result, moderate optimism accelerated learning and yielded higher rewards. In a realistic robotic simulation, the proposed method with the moderate optimism outperformed one of the state-of-the-art RL method.



قيم البحث

اقرأ أيضاً

We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors and proceeds using a simple assign-and-average app roach. The components of the dataset posteriors are assigned to the proposed global model components by solving a regularized variant of the assignment problem. The global components are then updated based on these assignments by their mean under a KL divergence. For exponential family variational distributions, our formulation leads to an efficient non-parametric algorithm for computing the fused model. Our algorithm is easy to describe and implement, efficient, and competitive with state-of-the-art on motion capture analysis, topic modeling, and federated learning of Bayesian neural networks.
Renyi divergence is related to Renyi entropy much like Kullback-Leibler divergence is related to Shannons entropy, and comes up in many settings. It was introduced by Renyi as a measure of information that satisfies almost the same axioms as Kullback -Leibler divergence, and depends on a parameter that is called its order. In particular, the Renyi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Renyi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $sigma$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference by constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and a fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.
We introduce hardness in relative entropy, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible entropy, two forms of com putational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, hardness in relative entropy unifies the latter two notions of computational entropy and sheds light on the apparent duality between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC 12).
We study the problem of spectrum estimation from transmission data of a known phantom. The goal is to reconstruct an x-ray spectrum that can accurately model the x-ray transmission curves and reflects a realistic shape of the typical energy spectra o f the CT system. To this end, spectrum estimation is posed as an optimization problem with x-ray spectrum as unknown variables, and a Kullback-Leibler (KL) divergence constraint is employed to incorporate prior knowledge of the spectrum and enhance numerical stability of the estimation process. The formulated constrained optimization problem is convex and can be solved efficiently by use of the exponentiated-gradient (EG) algorithm. We demonstrate the effectiveness of the proposed approach on the simulated and experimental data. The comparison to the expectation-maximization (EM) method is also discussed. In simulations, the proposed algorithm is seen to yield x-ray spectra that closely match the ground truth and represent the attenuation process of x-ray photons in materials, both included and not included in the estimation process. In experiments, the calculated transmission curve is in good agreement with the measured transmission curve, and the estimated spectra exhibits physically realistic looking shapes. The results further show the comparable performance between the proposed optimization-based approach and EM. In conclusion, our formulation of a constrained optimization provides an interpretable and flexible framework for spectrum estimation. Moreover, a KL-divergence constraint can include a prior spectrum and appears to capture important features of x-ray spectrum, allowing accurate and robust estimation of x-ray spectrum in CT imaging.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا