
Coupling policy iteration with semi-definite relaxation to compute accurate numerical invariants in static analysis

Submitted by: Eric Goubault
Publication date: 2011
Research field: Informatics Engineering
Paper language: English





We introduce a new domain for finding precise numerical invariants of programs by abstract interpretation. This domain, which consists of level sets of non-linear functions, generalizes the domain of linear templates introduced by Manna, Sankaranarayanan, and Sipma. In the case of quadratic templates, we use Shor's semi-definite relaxation to derive computable yet precise abstractions of semantic functionals, and we show that the abstract fixpoint equation can be solved accurately by coupling policy iteration and semi-definite programming. We demonstrate the benefits of our approach on a series of examples (filters, integration schemes), including a degenerate one (symplectic scheme).
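As a rough illustration of the quadratic-template machinery described above, the sketch below (Python with numpy and cvxpy; the library choices and all names are ours, not the paper's) bounds one abstract transfer step for a single quadratic template under a linear assignment using Shor's semi-definite relaxation, and then runs a plain Kleene iteration on the bound. It is a minimal sketch of the relaxation ingredient only; the paper instead solves the abstract fixpoint equation by coupling policy iteration with semi-definite programming.

```python
import numpy as np
import cvxpy as cp

def shor_transfer_bound(A, P, b):
    """Upper bound on (Ax)^T P (Ax) over {x : x^T P x <= b},
    via Shor's relaxation: x x^T is replaced by a PSD matrix X."""
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    problem = cp.Problem(
        cp.Maximize(cp.trace(A.T @ P @ A @ X)),
        [X >> 0, cp.trace(P @ X) <= b],
    )
    problem.solve()
    return problem.value

def kleene_iterate(A, P, b0, max_steps=50, tol=1e-6):
    """Plain Kleene iteration of the relaxed transfer step (not the paper's
    policy iteration): b_{k+1} = max(b0, transfer(b_k)) until it stabilizes."""
    b = b0
    for _ in range(max_steps):
        b_next = max(b0, shor_transfer_bound(A, P, b))
        if abs(b_next - b) <= tol:
            return b_next
        b = b_next
    return b

if __name__ == "__main__":
    # Hypothetical loop body x := A x with a contracting rotation, and the
    # quadratic template p(x) = x^T x (P = identity), initial bound b0 = 1.
    theta = 0.5
    A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
    P = np.eye(2)
    print(kleene_iterate(A, P, b0=1.0))  # stays at 1.0: x^T x <= 1 is invariant
```

For this contracting example the relaxed transfer step returns a bound below the initial one, so the iteration stops immediately and the template bound 1.0 is certified as an invariant.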



Read also

185 - Erhan Bayraktar, Gu Wang 2014
With model uncertainty characterized by a convex, possibly non-dominated set of probability measures, the agent minimizes the cost of hedging a path dependent contingent claim with given expected success ratio, in a discrete-time, semi-static market of stocks and options. Based on duality results which link quantile hedging to a randomized composite hypothesis test, an arbitrage-free discretization of the market is proposed as an approximation. The discretized market has a dominating measure, which guarantees the existence of the optimal hedging strategy and helps numerical calculation of the quantile hedging price. As the discretization becomes finer, the approximate quantile hedging price converges and the hedging strategy is asymptotically optimal in the original market.
352 - Matt Kaufmann 2020
Iterative algorithms are traditionally expressed in ACL2 using recursion. On the other hand, Common Lisp provides a construct, loop, which -- like most programming languages -- provides direct support for iteration. We describe an ACL2 analogue loop$ of loop that supports efficient ACL2 programming and reasoning with iteration.
188 - Zongxia Liang, Jicheng Yao 2010
Starting from the view that solvency and security come first, this paper considers a regular-singular stochastic optimal control problem for a large insurance company facing a positive transaction cost charged by the reinsurer under a solvency constraint. The company controls its proportional reinsurance and dividend pay-out policy to maximize the expected present value of dividend pay-outs until the time of bankruptcy. The paper aims at deriving the optimal retention ratio, the dividend pay-out level, and an explicit value function of the insurance company via stochastic analysis and PDE methods. The results identify the best equilibrium point between maximizing dividend pay-outs and minimizing risk. The paper also derives a risk-based capital standard ensuring that the capital requirement can cover the total given risk. We present numerical results to analyze how the model parameters, such as volatility, premium rate, and risk level, affect the risk-based capital standard, the optimal retention ratio, the optimal dividend pay-out level, and the company's profit.
Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, we present adaptive approximate policy iteration (AAPI), a learning scheme which enjoys a $\tilde{O}(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs. This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation. Our algorithm and analysis rely on online learning techniques, where value functions are treated as losses. The main technical novelty is the use of a data-dependent adaptive learning rate coupled with a so-called optimistic prediction of upcoming losses. In addition to theoretical guarantees, we demonstrate the advantages of our approach empirically on several environments.
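As a generic, hedged illustration of the two ingredients named in that abstract, a data-dependent adaptive learning rate and an optimistic prediction of upcoming losses, here is a minimal optimistic exponential-weights loop in Python/numpy. It is a textbook-style sketch in the experts setting, not the AAPI algorithm or its MDP setting, and every name and constant in it is an assumption of ours.

```python
import numpy as np

def optimistic_hedge(losses, scale=1.0):
    """losses: (T, K) array of per-round losses for K actions in [0, 1]."""
    T, K = losses.shape
    cum_loss = np.zeros(K)      # cumulative observed losses
    prediction = np.zeros(K)    # optimistic guess for the next loss vector
    error_sq = 0.0              # accumulated squared prediction error
    total = 0.0
    for t in range(T):
        eta = scale / np.sqrt(1.0 + error_sq)    # data-dependent step size
        logits = -eta * (cum_loss + prediction)  # act on the predicted future
        logits -= logits.max()                   # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        loss_t = losses[t]
        total += p @ loss_t
        error_sq += float(np.sum((loss_t - prediction) ** 2))
        cum_loss += loss_t
        prediction = loss_t                      # predict "same as last round"
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.uniform(size=(500, 5))
    L[:, 2] *= 0.5                               # action 2 is best on average
    print(optimistic_hedge(L), L[:, 2].sum())    # algorithm vs. best fixed action
```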
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
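That abstract mentions policy iteration methods that approximate fictitious play. As background only, here is a minimal fictitious-play sketch on a two-player zero-sum matrix game (Python/numpy; the game and all names are ours). It shows the best-response-to-the-empirical-average idea in its simplest form, not the approximate best response operator or the Diplomacy-scale training the authors describe.

```python
import numpy as np

def fictitious_play(payoff, iterations=2000):
    """payoff[i, j]: row player's payoff when row plays i and column plays j."""
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)   # empirical play counts of the row player
    col_counts = np.zeros(n_cols)   # empirical play counts of the column player
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixed strategy.
        row_br = np.argmax(payoff @ (col_counts / col_counts.sum()))
        col_br = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    # Matching pennies: the unique equilibrium is (0.5, 0.5) for both players,
    # and the empirical strategies below converge towards it.
    game = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(fictitious_play(game))
```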