ﻻ يوجد ملخص باللغة العربية
Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description for simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partial observable Markov decision process (POMDP). By leveraging optimal filtering theory we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning we approximately solve the resulting partial integro-differential equation. We present (i) an approach solving the decision problem offline by learning an approximation of the value function and (ii) an online algorithm which provides a solution in belief space using deep reinforcement learning. We show the applicability on a set of toy examples which pave the way for future methods providing solutions for high dimensional problems.
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-ba
In a batch of synchronized queues, customers can only be serviced all at once or not at all, implying that service remains idle if at least one queue is empty. We propose that a batch of $n$ synchronized queues in a discrete-time setting is quasi-sta
Let a labeled dataset be given with scattered samples and consider the hypothesis of the ground-truth belonging to the reproducing kernel Hilbert space (RKHS) of a known positive-definite kernel. It is known that out-of-sample bounds can be establish
This paper deals with the stability analysis problem of discrete-time switched linear systems with ranged dwell time. A novel concept called L-switching-cycle is proposed, which contains sequences of multiple activation cycles satisfying the prescrib
This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms. The algorithms are subjected to a time-normalization constraint that avoids a reparametrization of time in order to make the discussi