
Coupling policy iteration with semi-definite relaxation to compute accurate numerical invariants in static analysis

Submitted by: Eric Goubault
Publication date: 2011
Research field: Informatics Engineering
Paper language: English





We introduce a new domain for finding precise numerical invariants of programs by abstract interpretation. This domain, which consists of level sets of non-linear functions, generalizes the domain of linear templates introduced by Manna, Sankaranarayanan, and Sipma. In the case of quadratic templates, we use Shor's semi-definite relaxation to derive computable yet precise abstractions of semantic functionals, and we show that the abstract fixpoint equation can be solved accurately by coupling policy iteration and semi-definite programming. We demonstrate the benefits of our approach on a series of examples (filters, integration schemes), including a degenerate one (symplectic scheme).
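As a rough illustration of the quadratic-template machinery described above, the sketch below (Python with numpy and cvxpy; the library choices and all names are ours, not the paper's) bounds one abstract transfer step for a single quadratic template under a linear assignment using Shor's semi-definite relaxation, and then runs a plain Kleene iteration on the bound. It is a minimal sketch of the relaxation ingredient only; the paper instead solves the abstract fixpoint equation by coupling policy iteration with semi-definite programming.

```python
import numpy as np
import cvxpy as cp

def shor_transfer_bound(A, P, b):
    """Upper bound on (Ax)^T P (Ax) over {x : x^T P x <= b},
    via Shor's relaxation: x x^T is replaced by a PSD matrix X."""
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    problem = cp.Problem(
        cp.Maximize(cp.trace(A.T @ P @ A @ X)),
        [X >> 0, cp.trace(P @ X) <= b],
    )
    problem.solve()
    return problem.value

def kleene_iterate(A, P, b0, max_steps=50, tol=1e-6):
    """Plain Kleene iteration of the relaxed transfer step (not the paper's
    policy iteration): b_{k+1} = max(b0, transfer(b_k)) until it stabilizes."""
    b = b0
    for _ in range(max_steps):
        b_next = max(b0, shor_transfer_bound(A, P, b))
        if abs(b_next - b) <= tol:
            return b_next
        b = b_next
    return b

if __name__ == "__main__":
    # Hypothetical loop body x := A x with a contracting rotation, and the
    # quadratic template p(x) = x^T x (P = identity), initial bound b0 = 1.
    theta = 0.5
    A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
    P = np.eye(2)
    print(kleene_iterate(A, P, b0=1.0))  # stays at 1.0: x^T x <= 1 is invariant
```

For this contracting example the relaxed transfer step returns a bound below the initial one, so the iteration stops immediately and the template bound 1.0 is certified as an invariant.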



Read also

185 - Erhan Bayraktar, Gu Wang 2014
With model uncertainty characterized by a convex, possibly non-dominated set of probability measures, the agent minimizes the cost of hedging a path dependent contingent claim with given expected success ratio, in a discrete-time, semi-static market of stocks and options. Based on duality results which link quantile hedging to a randomized composite hypothesis test, an arbitrage-free discretization of the market is proposed as an approximation. The discretized market has a dominating measure, which guarantees the existence of the optimal hedging strategy and helps numerical calculation of the quantile hedging price. As the discretization becomes finer, the approximate quantile hedging price converges and the hedging strategy is asymptotically optimal in the original market.
352 - Matt Kaufmann 2020
Iterative algorithms are traditionally expressed in ACL2 using recursion. On the other hand, Common Lisp provides a construct, loop, which -- like most programming languages -- provides direct support for iteration. We describe an ACL2 analogue loop$ of loop that supports efficient ACL2 programming and reasoning with iteration.
188 - Zongxia Liang, Jicheng Yao 2010
Starting from the view that solvency and security come first, this paper considers a regular-singular stochastic optimal control problem for a large insurance company facing a positive transaction cost charged by the reinsurer under a solvency constraint. The company controls its proportional reinsurance and dividend pay-out policy to maximize the expected present value of dividend pay-outs until the time of bankruptcy. The paper aims at deriving the optimal retention ratio, the dividend pay-out level, and an explicit value function of the insurance company via stochastic analysis and PDE methods. The results identify the best equilibrium point between maximizing dividend pay-outs and minimizing risk. The paper also derives a risk-based capital standard ensuring that the capital requirement can cover the total given risk. We present numerical results to analyze how the model parameters, such as volatility, premium rate, and risk level, affect the risk-based capital standard, the optimal retention ratio, the optimal dividend pay-out level, and the company's profit.
Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, we present adaptive approximate policy iteration (AAPI), a learning scheme which enjoys a $\tilde{O}(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs. This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation. Our algorithm and analysis rely on online learning techniques, where value functions are treated as losses. The main technical novelty is the use of a data-dependent adaptive learning rate coupled with a so-called optimistic prediction of upcoming losses. In addition to theoretical guarantees, we demonstrate the advantages of our approach empirically on several environments.
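As a generic, hedged illustration of the two ingredients named in that abstract, a data-dependent adaptive learning rate and an optimistic prediction of upcoming losses, here is a minimal optimistic exponential-weights loop in Python/numpy. It is a textbook-style sketch in the experts setting, not the AAPI algorithm or its MDP setting, and every name and constant in it is an assumption of ours.

```python
import numpy as np

def optimistic_hedge(losses, scale=1.0):
    """losses: (T, K) array of per-round losses for K actions in [0, 1]."""
    T, K = losses.shape
    cum_loss = np.zeros(K)      # cumulative observed losses
    prediction = np.zeros(K)    # optimistic guess for the next loss vector
    error_sq = 0.0              # accumulated squared prediction error
    total = 0.0
    for t in range(T):
        eta = scale / np.sqrt(1.0 + error_sq)    # data-dependent step size
        logits = -eta * (cum_loss + prediction)  # act on the predicted future
        logits -= logits.max()                   # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        loss_t = losses[t]
        total += p @ loss_t
        error_sq += float(np.sum((loss_t - prediction) ** 2))
        cum_loss += loss_t
        prediction = loss_t                      # predict "same as last round"
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.uniform(size=(500, 5))
    L[:, 2] *= 0.5                               # action 2 is best on average
    print(optimistic_hedge(L), L[:, 2].sum())    # algorithm vs. best fixed action
```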
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
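That abstract mentions policy iteration methods that approximate fictitious play. As background only, here is a minimal fictitious-play sketch on a two-player zero-sum matrix game (Python/numpy; the game and all names are ours). It shows the best-response-to-the-empirical-average idea in its simplest form, not the approximate best response operator or the Diplomacy-scale training the authors describe.

```python
import numpy as np

def fictitious_play(payoff, iterations=2000):
    """payoff[i, j]: row player's payoff when row plays i and column plays j."""
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)   # empirical play counts of the row player
    col_counts = np.zeros(n_cols)   # empirical play counts of the column player
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixed strategy.
        row_br = np.argmax(payoff @ (col_counts / col_counts.sum()))
        col_br = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    # Matching pennies: the unique equilibrium is (0.5, 0.5) for both players,
    # and the empirical strategies below converge towards it.
    game = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(fictitious_play(game))
```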