We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler repeatedly places bets that are either doubled or lost, until a target is reached. It is an early example in the reinforcement learning textbook by Sutton and Barto (2018), who note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but do not investigate it further. We provide the exact formula for the optimal value function in both the discrete and the continuous cases. Simple as the problem may seem, the value function is pathological: it is fractal and self-similar, its derivative takes only the values zero or infinity, and it cannot be written in terms of elementary functions. It is in fact one of the generalized Cantor functions, with a complexity that has so far been uncharted. Our analysis could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning in real applications and implementations.
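To make the discrete case concrete, the following is a minimal sketch that approximates the optimal value function numerically with standard value iteration. It is not the exact formula derived in the paper. The function name, the goal of 100, the win probability of 0.4, and the stopping tolerance are all assumptions chosen for illustration.

```python
def gamblers_value_iteration(goal=100, p_win=0.4, tol=1e-12, max_sweeps=10_000):
    """Approximate the optimal value function of the discrete Gambler's problem.

    v[s] estimates the probability of reaching `goal` from capital s under
    optimal play. All parameter values here are illustrative assumptions.
    """
    v = [0.0] * (goal + 1)
    v[goal] = 1.0  # reaching the target counts as success; v[0] stays 0 (ruin)

    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, goal):
            # A stake a is either won (capital s + a) or lost (capital s - a).
            # Stakes are capped so capital stays within [0, goal].
            best = max(
                p_win * v[s + a] + (1.0 - p_win) * v[s - a]
                for a in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    return v


if __name__ == "__main__":
    values = gamblers_value_iteration()
    print(values[25], values[50], values[75])
```

With these assumed settings the values at capital 25, 50, and 75 should come out close to 0.16, 0.4, and 0.64, which is what staking everything allowed at those states yields. Plotting `values` against capital shows the non-smooth, repeating pattern the abstract describes.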
In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a conti
We consider an open quantum system, with dissipation applied only to a part of its degrees of freedom, evolving via a quantum Markov dynamics. We demonstrate that, in the Zeno regime of large dissipation, the relaxation of the quantum system towards
In this memoir, we develop a general framework which allows for a simultaneous study of labeled and unlabeled near alignment data problems in $\mathbb{R}^D$ and the Whitney near isometry extension problem for discrete and non-discrete subsets of $mathb
We argue that the celebrated Stefan condition on the moving interphase, accepted in mathematical physics up to now, cannot be imposed if energy sources are spatially distributed in the volume. A method based on Tikhonov and Samarskii's ideas for nume
We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to ca