
Importance Sampling based Exploration in Q Learning

Published by: Vijay Kumar
Publication date: 2021
Language: English

Approximate Dynamic Programming (ADP) is a methodology for solving multi-stage stochastic optimization problems over multi-dimensional discrete or continuous spaces. ADP approximates the optimal value function by adaptively sampling both the action and state spaces. It provides a tractable approach to very large problems, but can suffer from the exploration-exploitation dilemma. To address this dilemma, we propose a novel approach for selecting actions in continuous decision spaces using importance sampling weighted by the value function approximation. An advantage of this approach is that, unlike exploration strategies such as epsilon-greedy, it balances exploration and exploitation without any tuning parameters, relying only on the approximate value function. We compare the proposed algorithm with other exploration strategies in continuous action spaces in the context of a multi-stage generation expansion planning problem under uncertainty.
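As a rough illustration of the idea (not the authors' exact construction), the sketch below draws candidate actions uniformly from a continuous interval and then selects one with self-normalized probabilities proportional to the approximate value function; the name `q_approx` and the uniform candidate draw are assumptions made for the example.

```python
import numpy as np

def sample_action_importance(q_approx, state, action_low, action_high,
                             n_candidates=100, rng=None):
    """Select an action by importance sampling weighted by the
    approximate value function (illustrative sketch only).

    q_approx: callable (state, action) -> estimated value.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw candidate actions uniformly from the continuous action space.
    candidates = rng.uniform(action_low, action_high, size=n_candidates)
    values = np.array([q_approx(state, a) for a in candidates])
    # Self-normalized weights proportional to the (shifted) value
    # estimates: high-value actions are chosen more often, yet every
    # candidate keeps nonzero probability, so exploration emerges
    # without an epsilon-style tuning parameter.
    weights = values - values.min() + 1e-8
    probs = weights / weights.sum()
    return rng.choice(candidates, p=probs)
```

Unlike epsilon-greedy, nothing here needs a schedule or tuning constant; the only input that shapes the draw is the current value function approximation.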


Read also

Federated learning encapsulates distributed learning strategies that are managed by a central unit. Since it relies on a selected subset of agents at each iteration, and since each agent, in turn, taps into its local data, it is natural to study optimal sampling policies for selecting agents and their data in federated learning implementations. Usually, only uniform sampling schemes are used. In this work, however, we examine the effect of importance sampling and devise schemes for sampling agents and data non-uniformly, guided by a performance measure. We find that in schemes involving sampling without replacement, the performance of the resulting architecture is controlled by two factors: the data variability at each agent and the model variability across agents. We illustrate the theoretical findings with experiments on simulated and real data and show the performance improvement that results from the proposed strategies.
Federated learning involves a mixture of centralized and decentralized processing tasks, where a server regularly selects a sample of the agents and these in turn sample their local data to compute stochastic gradients for their learning updates. This process runs continually. The sampling of both agents and data is generally uniform; in this work, however, we consider non-uniform sampling. We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm. We run experiments on regression and classification problems to illustrate the theoretical results.
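Both abstracts above replace uniform agent selection with a weighted draw. A minimal sketch of that step follows, assuming the per-agent importance measure is a recent gradient norm (an illustrative proxy; the papers derive their own optimal measures). The inverse-probability correction used below is exact for with-replacement sampling and only approximate without replacement.

```python
import numpy as np

def sample_agents_importance(importance, n_select, rng=None):
    """Select agents for one federated round with probabilities
    proportional to a per-agent importance measure, without
    replacement (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(importance, dtype=float)
    probs = probs / probs.sum()
    chosen = rng.choice(len(probs), size=n_select, replace=False, p=probs)
    # Reweight each selected agent's update so the aggregate stays
    # (approximately) unbiased for the full-participation average.
    weights = 1.0 / (len(probs) * probs[chosen])
    return chosen, weights

# Example round: 10 agents, 3 selected non-uniformly by gradient norm.
grad_norms = [0.5, 2.0, 1.0, 0.1, 3.0, 0.7, 0.2, 1.5, 0.9, 0.4]
agents, weights = sample_agents_importance(grad_norms, n_select=3)
```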
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
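For reference, the baseline such a framework generalizes is the ordinary trajectory-level importance sampling estimator, sketched below with assumed callables `pi_target` and `pi_behavior`; the paper's conditional-expectation machinery is not reproduced here.

```python
def ordinary_is_return(trajectory, pi_target, pi_behavior, gamma=0.99):
    """Ordinary (trajectory-level) importance sampling estimate of the
    target policy's return from one behavior-policy trajectory.

    trajectory: iterable of (state, action, reward) tuples.
    pi_target, pi_behavior: callables (state, action) -> probability.
    """
    rho, ret, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        # Per-step likelihood ratio between the two policies.
        rho *= pi_target(s, a) / pi_behavior(s, a)
        ret += discount * r
        discount *= gamma
    # Weight the observed return by the full-trajectory ratio.
    return rho * ret
```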
We develop a novel computational method for evaluating the extreme excursion probabilities arising from random initialization of nonlinear dynamical systems. The method uses excursion probability theory to formulate a sequence of Bayesian inverse problems that, when solved, yields the biasing distribution. Solving multiple Bayesian inverse problems can be expensive, more so in higher dimensions. To alleviate the computational cost, we build machine-learning-based surrogates to solve the Bayesian inverse problems that give rise to the biasing distribution. This biasing distribution can then be used in an importance sampling procedure to estimate the extreme excursion probabilities.
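The final importance sampling step can be illustrated on a toy problem: estimating a small Gaussian tail probability with a hand-picked shifted biasing distribution, standing in for the paper's surrogate-derived one.

```python
import numpy as np

def excursion_prob_is(threshold, bias_mean, n_samples=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance
    sampling from the biasing distribution N(bias_mean, 1)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(bias_mean, 1.0, n_samples)  # draws from the biasing density
    # Log likelihood ratio of N(0, 1) over N(bias_mean, 1) at each draw.
    log_w = -0.5 * y**2 + 0.5 * (y - bias_mean)**2
    return np.mean((y > threshold) * np.exp(log_w))

# P(X > 5) is about 2.9e-7; plain Monte Carlo would need on the order
# of 1e8 samples to see any hits, while biasing near the excursion
# region recovers the probability from 1e5 samples.
print(excursion_prob_is(threshold=5.0, bias_mean=5.0))
```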
Importance sampling (IS) is a Monte Carlo technique for the approximation of intractable distributions and integrals with respect to them. The origin of IS dates from the early 1950s. In recent decades, the rise of the Bayesian paradigm and the increase in available computational resources have propelled interest in this theoretically sound methodology. In this paper, we first describe the basic IS algorithm and then revisit recent advances in this methodology. We pay particular attention to two sophisticated lines of research. First, we focus on multiple IS (MIS), the case where more than one proposal is available. Second, we describe adaptive IS (AIS), the generic methodology for adapting one or more proposals.
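The basic IS algorithm the survey begins with fits in a few lines: draw from a proposal, weight each draw by the target-to-proposal density ratio, and average. A self-normalized sketch (which also tolerates an unnormalized target density) follows; all names are illustrative.

```python
import numpy as np

def self_normalized_is(f, log_target, sample_proposal, log_proposal,
                       n=10_000, seed=0):
    """Self-normalized IS estimate of E_p[f(X)] using a proposal q;
    normalizing constants cancel, so log_target may be unnormalized."""
    rng = np.random.default_rng(seed)
    x = sample_proposal(rng, n)                 # draws from the proposal q
    log_w = log_target(x) - log_proposal(x)     # log importance weights
    w = np.exp(log_w - log_w.max())             # stabilize before normalizing
    w /= w.sum()
    return np.sum(w * f(x))

# Example: mean of N(3, 1) estimated through a wide N(0, 3) proposal.
est = self_normalized_is(
    f=lambda x: x,
    log_target=lambda x: -0.5 * (x - 3.0) ** 2,
    sample_proposal=lambda rng, n: rng.normal(0.0, 3.0, n),
    log_proposal=lambda x: -0.5 * (x / 3.0) ** 2,
)
print(est)  # close to 3.0
```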