
POMDPs in Continuous Time and Discrete Spaces

Published by: Bastian Alt
Publication date: 2020
Research field: Informatics Engineering
Language: English





Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such systems, with discrete state and action spaces, under partial observability. This places our work at the intersection of optimal filtering and optimal control. To date, a mathematical description of simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partially observable Markov decision process (POMDP). By leveraging optimal filtering theory, we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach that solves the decision problem offline by learning an approximation of the value function and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning. We show the applicability on a set of toy examples, which pave the way for future methods providing solutions for high-dimensional problems.
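
As a rough illustration of what filtering over a finite state space in continuous time can look like, the sketch below propagates a belief vector under a fixed action and then conditions it on one noisy reading. It is not the paper's method: it assumes a hypothetical two-state system with a made-up rate matrix, a simple Euler discretization of the master equation db/dt = Q^T b, and an observation that arrives at a single discrete time point, whereas the paper's own observation model and filter may differ. All names (`propagate_belief`, `bayes_update`, `RATES`) are illustrative.

```python
def propagate_belief(belief, rates, dt, steps):
    """Euler-integrate the master equation db/dt = Q^T b over steps * dt time units.

    rates[i][j] is the transition rate from state i to state j (i != j);
    each diagonal entry makes its row sum to zero.
    """
    n = len(belief)
    b = list(belief)
    for _ in range(steps):
        # Net probability flow into each state j.
        db = [sum(rates[i][j] * b[i] for i in range(n)) for j in range(n)]
        b = [b[j] + dt * db[j] for j in range(n)]
    return b


def bayes_update(belief, likelihood):
    """Condition the belief on an observation, given p(observation | state) per state."""
    unnormalized = [l * b for l, b in zip(likelihood, belief)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state system under one fixed action (assumed values):
# rate 1.0 from state 0 to state 1, rate 0.5 from state 1 back to state 0.
RATES = [[-1.0, 1.0],
         [0.5, -0.5]]

belief = [1.0, 0.0]                                              # start in state 0 with certainty
belief = propagate_belief(belief, RATES, dt=0.001, steps=1000)   # one time unit without observations
belief = bayes_update(belief, likelihood=[0.2, 0.8])             # noisy reading that favours state 1
```

Because every row of the rate matrix sums to zero, the Euler step preserves the total probability mass, so the belief stays normalized between observations. A decision-making layer would then act on this belief vector rather than on the hidden state.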




Read also

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
In a batch of synchronized queues, customers can only be serviced all at once or not at all, implying that service remains idle if at least one queue is empty. We propose that a batch of $n$ synchronized queues in a discrete-time setting is quasi-stable for $n \in \{2,3\}$ and unstable for $n \geq 4$. A correspondence between such systems and a random-walk-like discrete-time Markov chain (DTMC), which operates on a quotient space of the original state space, is derived. Using this relation, we prove the proposition by showing that the DTMC is transient for $n \geq 4$ and null-recurrent (hence quasi-stable) for $n \in \{2,3\}$ by evaluating infinite power sums over skewed binomial coefficients. Ignoring the special structure of the quotient space, the proposition can be interpreted as a result of Pólya's theorem on random walks, since the dimension of said space is $d-1$.
Let a labeled dataset be given with scattered samples and consider the hypothesis of the ground truth belonging to the reproducing kernel Hilbert space (RKHS) of a known positive-definite kernel. It is known that out-of-sample bounds can be established at unseen input locations, thus limiting the risk associated with learning this function. We show how computing tight, finite-sample uncertainty bounds amounts to solving parametric quadratically constrained linear programs. In our setting, the outputs are assumed to be contaminated by bounded measurement noise that can otherwise originate from any compactly supported distribution. No independence assumptions are made on the available data. Numerical experiments are presented to compare the present results with other closed-form alternatives.
Weiming Xiang, 2021
This paper deals with the stability analysis problem of discrete-time switched linear systems with ranged dwell time. A novel concept called L-switching-cycle is proposed, which contains sequences of multiple activation cycles satisfying the prescribed ranged dwell time constraint. Based on the L-switching-cycle, two sufficient conditions are proposed to ensure the global uniform asymptotic stability of discrete-time switched linear systems. It is noted that the two conditions are equivalent in stability analysis with the same L-switching-cycle. These two sufficient conditions can be viewed as generalizations of the clock-dependent Lyapunov and multiple Lyapunov function methods, respectively. Furthermore, it is proven that the proposed L-switching-cycle eventually achieves nonconservativeness in stability analysis as long as a sufficiently long L-switching-cycle is adopted. A numerical example is provided to illustrate our theoretical results.
This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms. The algorithms are subjected to a time-normalization constraint that avoids a reparametrization of time in order to make the discussion of continuous-time convergence rates meaningful. We reduce the multi-dimensional problem to a single dimension, recover well-known lower bounds from the discrete-time setting, and provide insight into why these lower bounds occur. We present algorithms that achieve the proposed lower bounds, even when the function class under consideration includes certain nonconvex functions.
