
Policy Synthesis for Switched Linear Systems with Markov Decision Process Switching

Published by: Bo Wu
Publication date: 2020
Paper language: English





We study the synthesis of mode switching protocols for a class of discrete-time switched linear systems in which the mode jumps are governed by Markov decision processes (MDPs). We call such systems MDP-JLS for brevity. Each state of the MDP corresponds to a mode in the switched system. The probabilistic state transitions in the MDP represent the mode transitions. We focus on finding a policy that selects the switching actions at each mode such that the switched system that follows these actions is guaranteed to be stable. Given a policy in the MDP, the considered MDP-JLS reduces to a Markov jump linear system (MJLS). We consider both mean-square stability and stability with probability one. For mean-square stability, we leverage existing stability conditions for MJLSs and propose efficient semidefinite programming formulations to find a stabilizing policy in the MDP. For stability with probability one, we derive new sufficient conditions and compute a stabilizing policy using linear programming. We also extend the policy synthesis results to MDP-JLS with uncertain mode transition probabilities.
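For intuition on the MJLS stability conditions the abstract leverages, the classical second-moment test (Costa, Fragoso & Marques) can be sketched in NumPy: the MJLS $x_{k+1} = A_{\theta_k} x_k$ is mean-square stable iff the spectral radius of the second-moment operator is below one. The function name and the two-mode toy example are illustrative, not from the paper:

```python
import numpy as np

def ms_stability_spectral_radius(A_modes, P):
    """Spectral-radius test for mean-square stability of a discrete-time
    Markov jump linear system x_{k+1} = A_{theta_k} x_k, where theta_k is a
    Markov chain with transition matrix P (P[i, j] = Prob(i -> j)).
    The system is mean-square stable iff the spectral radius of the
    block matrix Lambda, with block (j, i) = P[i, j] * kron(A_i, A_i),
    is strictly less than 1."""
    m = len(A_modes)
    n = A_modes[0].shape[0]
    Lam = np.zeros((m * n * n, m * n * n))
    for i, Ai in enumerate(A_modes):
        Ki = np.kron(Ai, Ai)
        for j in range(m):
            Lam[j*n*n:(j+1)*n*n, i*n*n:(i+1)*n*n] = P[i, j] * Ki
    return max(abs(np.linalg.eigvals(Lam)))

# Two scalar modes, one contracting and one expanding; the switched system
# can still be mean-square stable if the expanding mode is visited rarely.
A = [np.array([[0.5]]), np.array([[1.2]])]
P = np.array([[0.9, 0.1],
              [0.9, 0.1]])
rho = ms_stability_spectral_radius(A, P)
print(rho < 1.0)  # True: stable despite the unstable mode
```

A policy in an MDP-JLS fixes the transition matrix `P`, so a synthesis loop could score candidate policies with this test; the paper instead encodes the condition directly as a semidefinite program.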


Read also

106 - John Jackson 2021
We present a data-driven framework for strategy synthesis for partially-known switched stochastic systems. The properties of the system are specified using linear temporal logic (LTL) over finite traces (LTLf), which is as expressive as LTL and enables interpretations over finite behaviors. The framework first learns the unknown dynamics via Gaussian process regression. It then builds a formal abstraction of the switched system in terms of an uncertain Markov model, namely an Interval Markov Decision Process (IMDP), by accounting for both the stochastic behavior of the system and the uncertainty in the learning step. We then synthesize a strategy on the resulting IMDP that maximizes the satisfaction probability of the LTLf specification and is robust against all the uncertainties in the abstraction. This strategy is then refined into a switching strategy for the original stochastic system. We show that this strategy is near-optimal and provide a bound on its distance (error) to the optimal strategy. We experimentally validate our framework on various case studies, including both linear and non-linear switched stochastic systems.
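The robust step on the IMDP can be illustrated with interval value iteration: for each state-action pair the adversary picks the worst distribution consistent with the probability intervals. The sketch below is a minimal reachability version in NumPy with made-up interval data; it is not the paper's implementation:

```python
import numpy as np

def worst_case_value(values, lo, hi):
    """Adversarial expectation over an interval-ambiguity set: choose p with
    lo <= p <= hi and sum(p) = 1 minimizing p @ values.  Greedy: start from
    the lower bounds and push the remaining mass onto low-value states."""
    p = lo.copy()
    slack = 1.0 - lo.sum()
    for s in np.argsort(values):          # lowest-value successors first
        take = min(slack, hi[s] - lo[s])
        p[s] += take
        slack -= take
        if slack <= 0:
            break
    return float(p @ values)

def robust_value_iteration(lo, hi, goal, iters=50):
    """Maximal robust reachability probability on an IMDP.
    lo/hi: (S, A, S) arrays of transition-probability intervals;
    goal: boolean array of absorbing target states."""
    S, A, _ = lo.shape
    V = goal.astype(float)
    for _ in range(iters):
        Q = np.array([[worst_case_value(V, lo[s, a], hi[s, a])
                       for a in range(A)] for s in range(S)])
        V = np.where(goal, 1.0, Q.max(axis=1))  # max over actions, min over intervals
    return V

# Tiny example: from state 0, each action splits mass between the goal
# (state 1) and a sink (state 2); the adversary resolves the intervals.
lo = np.zeros((3, 2, 3)); hi = np.zeros((3, 2, 3))
lo[0, 0] = [0.0, 0.3, 0.3]; hi[0, 0] = [0.0, 0.7, 0.7]
lo[0, 1] = [0.0, 0.5, 0.4]; hi[0, 1] = [0.0, 0.6, 0.5]
for s in (1, 2):                          # absorbing states
    for a in (0, 1):
        lo[s, a, s] = hi[s, a, s] = 1.0
goal = np.array([False, True, False])
V = robust_value_iteration(lo, hi, goal)
print(V[0])  # 0.5: action 1 guarantees reaching the goal with prob >= 0.5
```

Action 0 only guarantees 0.3 in the worst case, so the robust strategy picks action 1; full LTLf synthesis would run the same recursion on a product with an automaton for the specification.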
In this paper, we study the structural state and input observability of continuous-time switched linear time-invariant systems with unknown inputs. First, we provide necessary and sufficient conditions for their structural state and input observability that can be efficiently verified in $O((m(n+p))^2)$, where $n$ is the number of state variables, $p$ is the number of unknown inputs, and $m$ is the number of modes. Moreover, we address the minimum sensor placement problem for these systems by adopting a feed-forward analysis and by providing an algorithm with a computational complexity of $O((m(n+p)+\alpha)^{2.373})$, where $\alpha$ is the number of target strongly connected components of the system's digraph representation. Lastly, we explore different assumptions on both the system and unknown input (latent space) dynamics that add more structure to the problem and thereby enable us to render algorithms with lower computational complexity, which are suitable for implementation in large-scale systems.
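Structural properties depend only on the zero/nonzero pattern of the system matrices, not their values. A common numerical shortcut, shown below as an illustration (this is not the paper's graph-theoretic $O((m(n+p))^2)$ test), is that random instantiations of a pattern attain its generic rank with probability one:

```python
import numpy as np

def generic_rank(pattern, trials=5, seed=0):
    """Estimate the generic (structural) rank of a structured matrix: the
    rank attained by almost every numerical realization of its zero/nonzero
    pattern.  A few random instantiations achieve it with probability 1,
    so we take the maximum rank over several draws."""
    rng = np.random.default_rng(seed)
    pattern = np.asarray(pattern, dtype=float)
    best = 0
    for _ in range(trials):
        M = pattern * rng.uniform(1.0, 2.0, size=pattern.shape)
        best = max(best, np.linalg.matrix_rank(M))
    return best

# This pattern has full generic row rank 2, even though specific numerical
# choices of the nonzeros could be rank-deficient:
patt = [[1, 1, 0],
        [0, 1, 1]]
print(generic_rank(patt))  # 2
```

For structural observability one would apply such rank and connectivity reasoning to the pattern of the mode matrices and output maps; the paper's digraph conditions avoid numerical linear algebra altogether.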
This work attempts to approximate a linear Gaussian system with a finite-state hidden Markov model (HMM), which is found useful in solving sophisticated event-based state estimation problems. An indirect modeling approach is developed, wherein a state-space model (SSM) is first identified for a Gaussian system and the SSM is then used as an emulator for learning an HMM. In the proposed method, the training data for the HMM are obtained from the data generated by the SSM through building a quantization mapping. Parameter learning algorithms are designed to learn the parameters of the HMM by exploiting the periodic structural characteristics of the HMM. The convergence and asymptotic properties of the proposed algorithms are analyzed. The HMM learned using the proposed algorithms is applied to event-triggered state estimation, and numerical results on model learning and state estimation demonstrate the validity of the proposed algorithms.
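The quantization-mapping idea can be sketched end to end: simulate the identified SSM, discretize its state into cells, and estimate the induced finite-state transition matrix by counting. The scalar model and bin choices below are illustrative only; the paper's algorithms additionally exploit the HMM's periodic structure:

```python
import numpy as np

def learn_hmm_from_ssm(a=0.8, q=0.1, n_bins=8, T=20000, seed=0):
    """Illustrative pipeline: simulate a scalar Gaussian state-space model
    x_{k+1} = a*x_k + w_k, quantize the state into n_bins cells via
    empirical quantiles, and estimate the transition matrix of the
    induced finite-state chain by normalized transition counts."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for k in range(1, T):
        x[k] = a * x[k-1] + rng.normal(0.0, np.sqrt(q))
    # Quantization mapping: equal-probability bins from empirical quantiles.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    z = np.digitize(x, edges)
    # Count state-to-state transitions and normalize each row.
    counts = np.zeros((n_bins, n_bins))
    for k in range(T - 1):
        counts[z[k], z[k+1]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

P = learn_hmm_from_ssm()
print(np.allclose(P.sum(axis=1), 1.0))  # True: rows are valid distributions
```

With a persistent dynamic (`a` close to 1) the estimated matrix concentrates near its diagonal, which is the qualitative behavior an HMM emulator of a slow SSM should reproduce.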
239 - Wenjun Zeng, Yi Liu 2021
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence. Unlike a general sequential decision-making process, these use cases have a simpler flow: upon seeing the recommended content on each page, customers can only respond by moving forward in the process or dropping out of it until a termination state. We refer to this type of problem as sequential decision making in linear-flow. We propose to formulate the problem as an MDP with Bandits, where Bandits are employed to model the transition probability matrix. At recommendation time, we use Thompson sampling (TS) to sample the transition probabilities and allocate the best series of actions with an analytical solution through exact dynamic programming. The way we formulate the problem allows us to leverage TS's efficiency in balancing exploration and exploitation and the Bandits' convenience in modeling action incompatibility. In the simulation study, we observe that the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits. We also find that the proposed algorithm's performance is the most robust to changes in the strength of across-page interdependence.
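The "sample, then plan exactly" loop is simple enough to sketch: draw one advance-probability per (page, content) from Beta posteriors, then solve the linear flow by backward induction, since advancing is the only way to reach the terminal reward. All names and the toy posteriors are hypothetical, not the paper's setup:

```python
import numpy as np

def ts_plan_linear_flow(alpha, beta, terminal_reward=1.0, seed=None):
    """Thompson-sampling planner for a linear-flow MDP: at page t the
    customer either advances (with probability p[t, a] under content a)
    or drops out.  Beta(alpha, beta) posteriors model each advance
    probability; we draw one sample per (page, content) and solve the
    resulting chain exactly by backward dynamic programming."""
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta)            # one posterior sample per (page, content)
    n_pages, _ = p.shape
    V = terminal_reward                  # value of completing the whole flow
    plan = []
    for t in range(n_pages - 1, -1, -1): # backward induction over pages
        q = p[t] * V                     # dropping out earns nothing
        a = int(np.argmax(q))
        plan.append(a)
        V = q[a]
    return plan[::-1], V                 # per-page contents, value at page 0

# 3 pages, 2 contents; sharply peaked posteriors make the sampled plan stable.
alpha = np.array([[900., 100.], [100., 900.], [900., 100.]])
beta  = np.array([[100., 900.], [900., 100.], [100., 900.]])
plan, v0 = ts_plan_linear_flow(alpha, beta, seed=0)
print(plan)  # [0, 1, 0] with overwhelming probability
```

In an online loop one would update `alpha`/`beta` with the observed advance/drop feedback after each customer, which is where TS's exploration-exploitation balance enters.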
128 - Mengqi Xue, Yang Tang, Wei Ren 2020
Extending the classic switched system, the multi-dimensional switched system (MDSS) allows for subsystems (switching modes) with different state dimensions. In this work, we study the stability problem of the MDSS, whose state transition at each switching instant is characterized by the dimension variation and the state jump, without extra constraints imposed. Based on the proposed transition-dependent average dwell time (TDADT) and piecewise TDADT methods, along with the proposed parametric multiple Lyapunov functions (MLFs), sufficient conditions for the practical and asymptotic stability of the MDSS are derived in the presence of unstable subsystems. The stability results for the MDSS are applied to the consensus problem of the open multi-agent system (MAS), which exhibits dynamic circulation behaviors. It is shown that (practical) consensus of the open MAS with disconnected switching topologies can be ensured by (practically) stabilizing the corresponding MDSS with unstable switching modes via the proposed TDADT and parametric MLF methods.
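The MDSS mechanics (dimension variation plus state jump at each switch) can be made concrete by simulation. The sketch below simply propagates a trajectory under a fixed dwell-time schedule; it illustrates the model, not the TDADT/MLF stability certificates, and all matrices are invented for the example:

```python
import numpy as np

def simulate_mdss(modes, jumps, schedule, x0):
    """Simulate a multi-dimensional switched system: modes[i] is the
    (n_i x n_i) dynamics of mode i, and jumps[(i, j)] is the (n_j x n_i)
    state-transition map applied when switching from mode i to mode j
    (capturing both the dimension change and the state jump).
    schedule is a list of (mode, dwell_steps) pairs."""
    x = np.asarray(x0, dtype=float)
    norms = []
    prev = None
    for mode, dwell in schedule:
        if prev is not None:
            x = jumps[(prev, mode)] @ x   # dimension variation + state jump
        for _ in range(dwell):
            x = modes[mode] @ x
            norms.append(np.linalg.norm(x))
        prev = mode
    return norms

# Mode 0 is 2-D and stable; mode 1 is 1-D and mildly unstable.  A long
# enough dwell time in mode 0 keeps the overall trajectory decaying.
modes = {0: np.array([[0.5, 0.1], [0.0, 0.5]]), 1: np.array([[1.05]])}
jumps = {(0, 1): np.array([[1.0, 0.0]]),        # project 2-D state to 1-D
         (1, 0): np.array([[1.0], [0.0]])}      # lift 1-D state back to 2-D
schedule = [(0, 10), (1, 2)] * 5
norms = simulate_mdss(modes, jumps, schedule, [1.0, 1.0])
print(norms[-1] < 1e-3)  # True: decay despite the unstable mode
```

Shortening the stable mode's dwell time in `schedule` makes the trajectory grow, which is the dwell-time trade-off that the TDADT conditions quantify.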