ترغب بنشر مسار تعليمي؟ اضغط هنا

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

384   0   0.0 ( 0 )
 نشر من قبل Nils Jansen
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.

قيم البحث

اقرأ أيضاً

369 - Xiaobin Zhang , Bo Wu , 2017
Partially Observable Markov Decision Process (POMDP) is widely used to model probabilistic behavior for complex systems. Compared with MDPs, POMDP models a system more accurate but solving a POMDP generally takes exponential time in the size of its s tate space. This makes the formal verification and synthesis problems much more challenging for POMDPs, especially when multiple system components are involved. As a promising technique to reduce the verification complexity, the abstraction method tries to find an abstract system with a smaller state space but preserves enough properties for the verification purpose. While abstraction based verification has been explored extensively for MDPs, in this paper, we present the first result of POMDP abstraction and its refinement techniques. The main idea follows the counterexample-guided abstraction refinement (CEGAR) framework. Starting with a coarse guess for the POMDP abstraction, we iteratively use counterexamples from formal verification to refine the abstraction until the abstract system can be used to infer the verification result for the original POMDP. Our main contributions have two folds: 1) we propose a novel abstract system model for POMDP and a new simulation relation to capture the partial observability then prove the preservation on a fragment of Probabilistic Computation Tree Logic (PCTL); 2) to find a proper abstract system that can prove or disprove the satisfaction relation on the concrete POMDP, we develop a novel refinement algorithm. Our work leads to a sound and complete CEGAR framework for POMDP.
Recurrent neural networks (RNNs) have been applied to a broad range of applications, including natural language processing, drug discovery, and video recognition. Their vulnerability to input perturbation is also known. Aligning with a view from soft ware defect detection, this paper aims to develop a coverage guided testing approach to systematically exploit the internal behaviour of RNNs, with the expectation that such testing can detect defects with high possibility. Technically, the long short term memory network (LSTM), a major class of RNNs, is thoroughly studied. A family of three test metrics are designed to quantify not only the values but also the temporal relations (including both step-wise and bounded-length) exhibited when LSTM processing inputs. A genetic algorithm is applied to efficiently generate test cases. The test metrics and test case generation algorithm are implemented into a tool TestRNN, which is then evaluated on a set of LSTM benchmarks. Experiments confirm that TestRNN has advantages over the state-of-art tool DeepStellar and attack-based defect detection methods, owing to its working with finer temporal semantics and the consideration of the naturalness of input perturbation. Furthermore, TestRNN enables meaningful information to be collected and exhibited for users to understand the testing results, which is an important step towards interpretable neural network testing.
Deep neural networks have been widely applied as an effective approach to handle complex and practical problems. However, one of the most fundamental open problems is the lack of formal methods to analyze the safety of their behaviors. To address thi s challenge, we propose a parallelizable technique to compute exact reachable sets of a neural network to an input set. Our method currently focuses on feed-forward neural networks with ReLU activation functions. One of the primary challenges for polytope-based approaches is identifying the intersection between intermediate polytopes and hyperplanes from neurons. In this regard, we present a new approach to construct the polytopes with the face lattice, a complete combinatorial structure. The correctness and performance of our methodology are evaluated by verifying the safety of ACAS Xu networks and other benchmarks. Compared to state-of-the-art methods such as Reluplex, Marabou, and NNV, our approach exhibits a significantly higher efficiency. Additionally, our approach is capable of constructing the complete input set given an output set, so that any input that leads to safety violation can be tracked.
Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captu res uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or images reco gnition, involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside with a quaternion long-short term memory neural network (QLSTM), that take into account both the external relations and these internal structural dependencies with the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to code internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performances than RNN and LSTM in a realistic application of automatic speech recognition. Finally, we show that QRNN and QLSTM reduce by a maximum factor of 3.3x the number of free parameters needed, compared to real-valued RNNs and LSTMs to reach better results, leading to a more compact representation of the relevant information.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا