
POMDPs in Continuous Time and Discrete Spaces


Added by Bastian Alt

Publication date 2020

Fields: Informatics Engineering

Language: English

Authors Bastian Alt - Matthias Schultheis - Heinz Koeppl


Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description for simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partially observable Markov decision process (POMDP). By leveraging optimal filtering theory, we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach that solves the decision problem offline by learning an approximation of the value function and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning. We demonstrate applicability on a set of toy examples, which pave the way for future methods that provide solutions for high-dimensional problems.
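To make the filtering side of the abstract concrete, the following is a minimal sketch of belief tracking for a finite-state continuous-time Markov chain. It is not the authors' method: the abstract does not specify the observation model, so this sketch assumes observations arrive at discrete time points with a known likelihood vector. The rate matrix `Q`, the likelihood values, and the function names are hypothetical and chosen only for illustration. Between observations the belief follows the prior dynamics db/dt = Qᵀb, integrated here with forward Euler.

```python
import numpy as np


def propagate_belief(belief, rate_matrix, duration, n_steps=1000):
    """Integrate db/dt = Q^T b with forward Euler (assumed prior dynamics)."""
    b = np.asarray(belief, dtype=float).copy()
    dt = duration / n_steps
    for _ in range(n_steps):
        b = b + dt * (rate_matrix.T @ b)
    # Rows of Q sum to zero, so mass is conserved; renormalize against round-off.
    return b / b.sum()


def observation_update(belief, likelihood):
    """Bayes update at an observation time (assumed discrete-time observation)."""
    posterior = np.asarray(likelihood, dtype=float) * belief
    return posterior / posterior.sum()


# Hypothetical two-state system under one fixed action.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
b0 = np.array([1.0, 0.0])

b_pred = propagate_belief(b0, Q, duration=0.5)
b_post = observation_update(b_pred, likelihood=np.array([0.2, 0.8]))
```

In a decision-making setting, the rate matrix would depend on the chosen action, and the resulting belief vector would serve as the input to a value function or policy defined on belief space.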


Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg (2019)

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.

Machine Learning, Robotics, Systems and Control

Stability Results on Synchronized Queues in Discrete-Time for Arbitrary Dimension

Richard Schoeffauer, Gerhard Wunder (2020)

In a batch of synchronized queues, customers can only be serviced all at once or not at all, implying that service remains idle if at least one queue is empty. We propose that a batch of $n$ synchronized queues in a discrete-time setting is quasi-stable for $n \in \{2,3\}$ and unstable for $n \geq 4$. A correspondence between such systems and a random-walk-like discrete-time Markov chain (DTMC), which operates on a quotient space of the original state-space, is derived. Using this relation, we prove the proposition by showing that the DTMC is transient for $n \geq 4$ and null-recurrent (hence quasi-stability) for $n \in \{2,3\}$ via evaluating infinite power sums over skewed binomial coefficients. Ignoring the special structure of the quotient space, the proposition can be interpreted as a result of Pólya's theorem on random walks, since the dimension of said space is $d-1$.

Probability, Systems and Control

Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach

Paul Scharnhorst, Emilio T. Maddalena, Yuning Jiang (2021)

Let a labeled dataset be given with scattered samples and consider the hypothesis of the ground-truth belonging to the reproducing kernel Hilbert space (RKHS) of a known positive-definite kernel. It is known that out-of-sample bounds can be established at unseen input locations, thus limiting the risk associated with learning this function. We show how computing tight, finite-sample uncertainty bounds amounts to solving parametric quadratically constrained linear programs. In our setting, the outputs are assumed to be contaminated by bounded measurement noise that can otherwise originate from any compactly supported distribution. No independence assumptions are made on the available data. Numerical experiments are presented to compare the present results with other closed-form alternatives.

Machine Learning, Systems and Control

Necessary and Sufficient Conditions for Stability of Discrete-Time Switched Linear Systems with Ranged Dwell Time

Weiming Xiang (2021)

This paper deals with the stability analysis problem of discrete-time switched linear systems with ranged dwell time. A novel concept called the L-switching-cycle is proposed, which contains sequences of multiple activation cycles satisfying the prescribed ranged dwell time constraint. Based on the L-switching-cycle, two sufficient conditions are proposed to ensure the global uniform asymptotic stability of discrete-time switched linear systems. The two conditions are shown to be equivalent for stability analysis when the same L-switching-cycle is used. These two sufficient conditions can be viewed as generalizations of the clock-dependent Lyapunov and multiple Lyapunov function methods, respectively. Furthermore, it is proven that the proposed conditions become nonconservative in stability analysis as long as a sufficiently long L-switching-cycle is adopted. A numerical example is provided to illustrate the theoretical results.

Optimization and Control, Systems and Control

Continuous-time Lower Bounds for Gradient-based Algorithms

Michael Muehlebach, Michael I. Jordan (2020)

This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms. The algorithms are subjected to a time-normalization constraint that avoids a reparametrization of time in order to make the discussion of continuous-time convergence rates meaningful. We reduce the multi-dimensional problem to a single dimension, recover well-known lower bounds from the discrete-time setting, and provide insight into why these lower bounds occur. We present algorithms that achieve the proposed lower bounds, even when the function class under consideration includes certain nonconvex functions.

Optimization and Control, Systems and Control
