Approximate Dynamic Programming (ADP) is a methodology for solving multi-stage stochastic optimization problems over multi-dimensional discrete or continuous spaces. ADP approximates the optimal value function by adaptively sampling both the action and state spaces. It provides a tractable approach to very large problems, but can suffer from the exploration-exploitation dilemma. We propose a novel approach to this dilemma that selects actions in continuous decision spaces by importance sampling, with weights given by the value function approximation. An advantage of this approach is that, unlike exploration strategies such as epsilon-greedy, it balances exploration and exploitation without any tuning parameters, relying only on the approximate value function when sampling actions. We compare the proposed algorithm with other exploration strategies in a continuous action space in the context of a multi-stage generation expansion planning problem under uncertainty.
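The sketch below illustrates this kind of value-weighted action sampling in a one-dimensional continuous action space. It is a minimal illustration, not the paper's algorithm; the value approximation q_hat and all parameters are hypothetical. Candidate actions are drawn from a uniform proposal and resampled with probabilities proportional to their (shifted) approximate values, so high-value actions are favored while every candidate keeps some probability of being explored.

```python
import numpy as np

def sample_action(q_hat, state, low, high, n_candidates=64, rng=None):
    """Sample an action with probability weighted by the value estimate.

    q_hat(state, actions) -> array of approximate values (hypothetical API).
    Candidates are drawn from a uniform proposal over [low, high] and then
    resampled proportionally to their shifted values, so no exploration
    parameter such as epsilon is needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(low, high, size=n_candidates)  # proposal draws
    values = q_hat(state, candidates)                       # approximate values
    weights = values - values.min() + 1e-12                 # non-negative weights
    probs = weights / weights.sum()
    return rng.choice(candidates, p=probs)

# Toy usage with a made-up quadratic value estimate peaked at a = 2.
q_hat = lambda s, a: -(a - 2.0) ** 2
print(sample_action(q_hat, state=None, low=-5.0, high=5.0))
```

Shifting by the minimum keeps the weights non-negative when value estimates can be negative; other monotone transforms (e.g., a softmax) would also work and would change how aggressively the sampler exploits.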
Federated learning encapsulates distributed learning strategies that are managed by a central unit. Since it relies on using a selected number of agents at each iteration, and since each agent, in turn, taps into its local data, it is only natural to study optimal sampling policies for selecting agents and their data.
Federated learning involves a mixture of centralized and decentralized processing tasks, where a server regularly selects a sample of the agents, and these in turn sample their local data to compute stochastic gradients for their learning updates.
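A minimal sketch of such non-uniform agent sampling follows; it is not either paper's algorithm, and the agent interface, the importance measure, and all parameters are assumptions made for illustration. Agents are drawn with replacement with probabilities proportional to an importance measure, and their stochastic gradients are reweighted by 1/(k * p_i) so the aggregated update remains an unbiased estimate of the full-participation gradient.

```python
import numpy as np

def federated_round(w, agents, importance, k, lr, rng):
    """One FedAvg-style round with importance-sampled agent selection.

    agents[i].stochastic_grad(w) is assumed to return a gradient computed
    from a sample of agent i's local data. Sampling is with replacement,
    proportional to `importance`; the 1/(k * p_i) weights make the summed
    update an unbiased estimate of the sum of all agents' gradients.
    """
    probs = np.asarray(importance, dtype=float)
    probs /= probs.sum()
    idx = rng.choice(len(agents), size=k, replace=True, p=probs)
    grad = np.zeros_like(w)
    for i in idx:
        grad += agents[i].stochastic_grad(w) / (k * probs[i])
    return w - lr * grad
```

Uniform sampling is recovered by setting all importances equal; skewing them toward agents with larger or more informative local datasets is where any variance reduction would come from.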
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms.
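For reference, the ratios this framework conditions on are the standard off-policy importance sampling ratios. The sketch below shows a generic trajectory-level importance-sampled return for off-policy evaluation, not the paper's conditional-expectation estimators; pi and mu are assumed policy-probability callables.

```python
def is_return(trajectory, pi, mu, gamma=0.99):
    """Ordinary importance-sampled return for off-policy evaluation.

    trajectory: iterable of (state, action, reward) generated under the
    behavior policy mu. pi(a, s) and mu(a, s) give action probabilities
    under the target and behavior policies. The full-trajectory ratio
    prod_t pi/mu reweights the return so its expectation matches the
    target policy; conditioning on less information than the full ratio
    (the paper's theme) reduces the estimator's variance.
    """
    rho, g, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        rho *= pi(a, s) / mu(a, s)  # importance sampling ratio
        g += discount * r
        discount *= gamma
    return rho * g
```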
We develop a novel computational method for evaluating the extreme excursion probabilities arising from random initialization of nonlinear dynamical systems. The method uses excursion probability theory to formulate a sequence of Bayesian inverse problems.
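The estimator pattern underlying such rare-event computations is importance sampling with a biasing distribution concentrated on the excursion region. The sketch below is a generic mean-shifted Gaussian example, not the paper's Bayesian-inverse-problem construction, with all parameters illustrative.

```python
import numpy as np
from scipy import stats

def excursion_prob_is(threshold, shift, n=100_000, rng=None):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Draws from the shifted biasing density N(shift, 1), where exceedances
    are common, and corrects each sample with the likelihood ratio
    phi(x) / phi(x - shift).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(loc=shift, size=n)                      # biased draws
    lr = stats.norm.pdf(x) / stats.norm.pdf(x, loc=shift)  # likelihood ratio
    return np.mean((x > threshold) * lr)

# P(X > 5) is about 2.9e-7; plain Monte Carlo with 1e5 draws would
# typically see zero exceedances, while the shifted estimator resolves it.
print(excursion_prob_is(threshold=5.0, shift=5.0))
```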
Importance sampling (IS) is a Monte Carlo technique for the approximation of intractable distributions and integrals with respect to them. The origin of IS dates from the early 1950s. In the last decades, the rise of the Bayesian paradigm and the increase in available computational resources have propelled interest in this methodology.
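As a concrete instance of the basic technique, the sketch below estimates an expectation under an unnormalized target density with self-normalized importance sampling; the target, proposal, and test function are all illustrative choices.

```python
import numpy as np
from scipy import stats

def snis_expectation(f, log_target, proposal_rvs, proposal_logpdf,
                     n=50_000, rng=None):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    Accepts an *unnormalized* log target density, as is typical in Bayesian
    inference, since the normalizing constant cancels in the weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = proposal_rvs(n, rng)
    logw = log_target(x) - proposal_logpdf(x)
    w = np.exp(logw - logw.max())          # stabilize before exponentiating
    return np.sum(w * f(x)) / np.sum(w)

# Illustrative target: unnormalized N(1, 0.5^2); proposal: N(0, 1).
est = snis_expectation(
    f=lambda x: x,
    log_target=lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2,
    proposal_rvs=lambda n, rng: rng.normal(size=n),
    proposal_logpdf=lambda x: stats.norm.logpdf(x),
)
print(est)  # close to 1.0, the target mean
```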