
The Gambler's Problem and Beyond

Published by: Baoxiang Wang
Publication date: 2019
Paper language: English





We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler may repeatedly double or lose the bet until a target capital is reached. This is an early example from the reinforcement learning textbook by Sutton and Barto (2018), where they note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but leave it without further investigation. We provide the exact formula for the optimal value function in both the discrete and the continuous cases. Simple as the problem might seem, the value function is pathological: it is fractal and self-similar, its derivative takes only the values zero and infinity, and it cannot be written in terms of elementary functions. It is in fact one of the generalized Cantor functions, and it carries a complexity that has been uncharted thus far. Our analyses could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning in real applications and implementations.
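The non-smooth optimal value function described above can be reproduced numerically with a short value-iteration sketch for the discrete Gambler's problem, in the spirit of Sutton and Barto's Example 4.3. This is a minimal illustration, not the paper's method; the target capital `GOAL` and head probability `P_HEADS` are illustrative choices.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the paper):
GOAL = 100      # target capital; states are 0..GOAL
P_HEADS = 0.4   # probability the bet is won

def gambler_value_iteration(theta=1e-9):
    """In-place value iteration for the discrete Gambler's problem.

    V[s] is the probability of reaching GOAL from capital s under the
    optimal policy; reward is 1 only on reaching GOAL.
    """
    V = np.zeros(GOAL + 1)
    V[GOAL] = 1.0
    while True:
        delta = 0.0
        for s in range(1, GOAL):
            # A stake of a moves the gambler to s+a (win) or s-a (loss).
            stakes = range(1, min(s, GOAL - s) + 1)
            best = max(P_HEADS * V[s + a] + (1 - P_HEADS) * V[s - a]
                       for a in stakes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

Plotting the returned `V` makes the self-similar, repeatedly non-smooth shape visible; for instance, since betting everything is optimal when `P_HEADS < 0.5`, `V[50]` converges to `P_HEADS` itself.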


Read also

In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state, and a terminal action that moves the learner directly into one of the two terminal states (the goal state and the dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (the expected number of times the goal state is reached), without any prior knowledge of the state transition probabilities. We first prove a result on the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it increases logarithmically over rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.
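The round structure described above can be sketched as a toy simulator. This is a hedged illustration only: the state space, the symmetric step distribution, and the terminal-jump probability below are assumptions for demonstration, not the paper's exact model.

```python
import random

def play_round(n_states=10, start=5, p_up=0.5, p_goal_terminal=0.3,
               policy=None, rng=random):
    """Simulate one GRBP-style round; return 1 if the goal state is reached.

    States are 0..n_states, with 0 the dead-end and n_states the goal.
    `policy` maps a state to "continue" or "terminate"; None always continues.
    All parameters are illustrative assumptions.
    """
    s = start
    while 0 < s < n_states:
        action = policy(s) if policy else "continue"
        if action == "terminate":
            # Risky terminal action: jump straight to goal or dead-end.
            return 1 if rng.random() < p_goal_terminal else 0
        # Conservative continuation action: drift around the current state.
        s += 1 if rng.random() < p_up else -1
    return 1 if s == n_states else 0
```

Averaging the returned reward over many rounds under a fixed policy estimates the per-round success probability that the learner's regret is measured against.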
We consider an open quantum system, with dissipation applied only to a part of its degrees of freedom, evolving via a quantum Markov dynamics. We demonstrate that, in the Zeno regime of large dissipation, the relaxation of the quantum system towards a pure quantum state is linked to the evolution of a classical Markov process towards a single absorbing state. The rates of the associated classical Markov process are determined by the original quantum dynamics. Extension of this correspondence to absorbing states with internal structure allows us to establish a general criterion for having a Zeno-limit nonequilibrium stationary state of arbitrary finite rank. An application of this criterion is illustrated in the case of an open XXZ spin-1/2 chain dissipatively coupled at its edges to baths with fixed and different polarizations. For this system, we find exact nonequilibrium steady-state solutions of ranks 1 and 2.
Steven B. Damelin (2021)
In this memoir, we develop a general framework which allows for a simultaneous study of labeled and unlabeled near alignment data problems in $\mathbb{R}^D$ and the Whitney near isometry extension problem for discrete and non-discrete subsets of $\mathbb{R}^D$ with certain geometries. In addition, we survey related work of ours on clustering, dimension reduction, manifold learning, vision as well as minimal energy partitions, discrepancy and min-max optimization. Numerous open problems in harmonic analysis, computer vision, manifold learning and signal processing connected to our work are given. A significant portion of the work in this memoir is based on joint research with Charles Fefferman in the papers [48], [49], [50], [51].
We argue that the celebrated Stefan condition on the moving interphase, accepted in mathematical physics up to now, cannot be imposed if energy sources are spatially distributed in the volume. A method for the numerical solution of the problem, based on Tikhonov's and Samarskii's ideas, is developed. Mathematical modelling of the energy relaxation of some processes useful in modern ion beam technologies is carried out. The necessity of taking into account effects completely outside the Stefan formulation is demonstrated.
We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capturing inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results demonstrating that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson regression) can actually be much superior to ERM. We demonstrate empirically that our method, instantiated with a well-designed general-purpose mixture likelihood family, can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
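The MLE-then-post-hoc idea above can be sketched for the Poisson regression case: fit a log-linear rate by maximum likelihood once, then read off different point estimates (mean for squared error, an approximate median for absolute error) from the fitted distribution at inference time. This is a minimal sketch under assumed hyperparameters, not the paper's mixture-likelihood method.

```python
import numpy as np

def fit_poisson_mle(X, y, lr=0.1, steps=2000):
    """Fit lambda(x) = exp(w . x) by gradient descent on the Poisson NLL."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        lam = np.exp(X @ w)
        grad = X.T @ (lam - y) / len(y)  # gradient of the average NLL
        w -= lr * grad
    return w

def post_hoc_estimates(x, w):
    """Post-hoc point estimates from one fitted model, per target metric."""
    lam = np.exp(x @ w)
    mean = lam                                # optimal for squared error
    median = np.floor(lam + 1/3 - 0.02/lam)   # standard Poisson median approx.
    return mean, median
```

The point of the design is that a single likelihood fit serves several metrics: swapping the target metric only changes which functional of the fitted Poisson distribution is reported, with no refitting.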
