ﻻ يوجد ملخص باللغة العربية
We consider a general asynchronous Stochastic Approximation (SA) scheme featuring a weighted infinity-norm contractive operator, and prove a bound on its finite-time convergence rate on a single trajectory. Additionally, we specialize the result to asynchronous $Q$-learning. The resulting bound matches the sharpest available bound for synchronous $Q$-learning, and improves over previous known bounds for asynchronous $Q$-learning.
Motivated by broad applications in reinforcement learning and machine learning, this paper considers the popular stochastic gradient descent (SGD) when the gradients of the underlying objective function are sampled from Markov processes. This Markov
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred
Stochastic approximation, a data-driven approach for finding the fixed point of an unknown operator, provides a unified framework for treating many problems in stochastic optimization and reinforcement learning. Motivated by a growing interest in mul
In machine learning and optimization community there are two main approaches for convex risk minimization problem, namely, the Stochastic Approximation (SA) and the Sample Average Approximation (SAA). In terms of oracle complexity (required number of
Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools and techniques that are popular in multi-agent and distributed control scenarios. To counter Bellmans curse of dimensionality, such algorithms are cou