
Sequential decision problems, dependent types and generic solutions

Published by Jürgen Koslowski
Publication date: 2016
Research field: Informatics Engineering
Paper language: English





We present a computer-checked generic implementation for solving finite-horizon sequential decision problems. This is a wide class of problems, including inter-temporal optimizations, knapsack, optimal bracketing, scheduling, etc. The implementation can handle time-step dependent control and state spaces, and monadic representations of uncertainty (such as stochastic, non-deterministic, fuzzy, or combinations thereof). This level of genericity is achievable in a programming language with dependent types (we have used both Idris and Agda). Dependent types are also the means that allow us to obtain a formalization and computer-checked proof of the central component of our implementation: Bellman's principle of optimality and the associated backwards induction algorithm. The formalization clarifies certain aspects of backwards induction and, by making explicit notions such as viability and reachability, can serve as a starting point for a theory of controllability of monadic dynamical systems, commonly encountered in, e.g., climate impact research.
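The implementation itself is written in Idris/Agda with dependent types; as a rough, non-authoritative illustration of the backwards-induction idea only, the following Haskell sketch uses hypothetical names (Problem, meas, bi), fixes the state and control spaces instead of indexing them by time step, and reflects none of the paper's viability/reachability machinery or its machine-checked optimality proof:

```haskell
-- Hypothetical sketch of finite-horizon backwards induction with a
-- monadic transition function.  The paper's Idris/Agda code uses dependent
-- types for time-indexed state/control spaces and proves optimality;
-- neither is captured here.
import Data.List (maximumBy)
import Data.Ord (comparing)

-- A one-step problem: admissible controls, a monadic transition (the
-- uncertainty), a reward, and a measure collapsing monadic values
-- (e.g. expected value for a probability monad).
data Problem m x y = Problem
  { controls :: x -> [y]
  , next     :: x -> y -> m x
  , reward   :: x -> y -> x -> Double
  , meas     :: m Double -> Double
  }

-- A policy picks a control in every state; a policy sequence covers the horizon.
type Policy x y = x -> y

-- Value of following a policy sequence from a given state.
val :: Functor m => Problem m x y -> [Policy x y] -> x -> Double
val _ []       _ = 0
val p (q : qs) x =
  let y = q x
  in meas p (fmap (\x' -> reward p x y x' + val p qs x') (next p x y))

-- Backwards induction: prepend an optimal head policy to an optimal tail.
-- Assumes 'controls' is non-empty in every state it is asked about.
bi :: Functor m => Problem m x y -> Int -> [Policy x y]
bi _ 0 = []
bi p n = best : ps
  where
    ps     = bi p (n - 1)
    best x = maximumBy (comparing (\y -> meas p
               (fmap (\x' -> reward p x y x' + val p ps x') (next p x y))))
               (controls p x)
```

With m instantiated to a probability or non-determinism monad and meas to expectation or worst-case value, bi p n returns a sequence of n policies whose value under val is maximal, under the stated non-emptiness assumption on controls.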


Read also

We present guarded dependent type theory, gDTT, an extensional dependent type theory with a 'later' modality and clock quantifiers for programming and proving with guarded recursive and coinductive types. The later modality is used to ensure the productivity of recursive definitions in a modular, type-based way. Clock quantifiers are used for controlled elimination of the later modality and for encoding coinductive types using guarded recursive types. Key to the development of gDTT are novel type and term formers involving what we call 'delayed substitutions'. These generalise the applicative functor rules for the later modality considered in earlier work, and are crucial for programming and proving with dependent types. We show soundness of the type theory with respect to a denotational model.
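Haskell cannot enforce the guardedness discipline, but the shape of the later modality, its applicative-functor rules, and the guarded fixpoint can be sketched; the following is an illustrative approximation only (in gDTT the type system itself rules out unproductive definitions, which the Later wrapper below does not):

```haskell
-- Shape-only sketch of the 'later' modality and guarded streams.
-- Haskell cannot enforce guardedness, so Later is just a lazy wrapper;
-- in gDTT the typing discipline itself guarantees productivity.
newtype Later a = Next a

instance Functor Later where
  fmap f (Next a) = Next (f a)

instance Applicative Later where
  pure = Next
  Next f <*> Next a = Next (f a)

-- Guarded fixpoint: the recursive occurrence is only available one step later.
gfix :: (Later a -> a) -> a
gfix f = f (Next (gfix f))

-- Guarded streams: the tail lives under Later.
data Str a = Cons a (Later (Str a))

-- Productive definitions written against the applicative rules for Later.
repeatS :: a -> Str a
repeatS x = gfix (Cons x)

zipWithS :: (a -> b -> c) -> Str a -> Str b -> Str c
zipWithS f = gfix (\rec (Cons a as) (Cons b bs) -> Cons (f a b) (rec <*> as <*> bs))
```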
Filip Maric, 2020
The International Mathematical Olympiad (IMO) is perhaps the most celebrated mental competition in the world and as such is among the greatest grand challenges for Artificial Intelligence (AI). The recently formulated IMO Grand Challenge requires building an AI that can win a gold medal in the competition. We present some initial steps that could help to tackle this goal by creating a public repository of mechanically checked solutions of IMO problems in the interactive theorem prover Isabelle/HOL. This repository is actively maintained by students of the Faculty of Mathematics, University of Belgrade, Serbia, within the course Introduction to Interactive Theorem Proving.
Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest in relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only black-box access to the environment is available. We give the first, to our knowledge, regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i) -- and thus also (ii) -- is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encounters new decision points. We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment. Our experiments show it significantly outperforms the prior algorithms for the problem, which do not have such guarantees. It can be used in any application for which regret minimization is useful: approximating Nash equilibrium or quantal response equilibrium, approximating coarse correlated equilibrium in multi-player games, learning a best response, learning safe opponent exploitation, and online play against an unknown opponent/environment.
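The per-decision-point building block of CFR-style methods is regret matching; a minimal Haskell sketch of that rule is given below with illustrative names. It says nothing about the paper's actual contribution of coping with a strategy space that is revealed only incrementally and with black-box payoffs:

```haskell
-- Regret matching, the per-information-set rule underlying CFR.
-- Illustrative names only; assumes a non-empty, known action set.
import qualified Data.Map.Strict as M

type Action = Int

-- Current strategy: proportional to positive cumulative regret,
-- uniform when no action has positive regret.
strategy :: M.Map Action Double -> M.Map Action Double
strategy regrets
  | total > 0 = M.map (\r -> max 0 r / total) regrets
  | otherwise = M.map (const (1 / n)) regrets
  where
    total = sum (M.map (max 0) regrets)
    n     = fromIntegral (M.size regrets)

-- One round: observe per-action utilities, accumulate regret against the
-- utility of the mixed strategy that was actually played.
step :: M.Map Action Double -> M.Map Action Double -> M.Map Action Double
step regrets utils = M.unionWith (+) regrets instantRegret
  where
    sigma         = strategy regrets
    played        = sum (M.intersectionWith (*) sigma utils)
    instantRegret = M.map (subtract played) utils
```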
This paper introduces an expressive class of indexed quotient-inductive types, called QWI types, within the framework of constructive type theory. They are initial algebras for indexed families of equational theories with possibly infinitary operators and equations. We prove that QWI types can be derived from quotient types and inductive types in the type theory of toposes with a natural number object and universes, provided those universes satisfy the Weakly Initial Set of Covers (WISC) axiom. We do so by constructing QWI types as colimits of a family of approximations to them defined by well-founded recursion over a suitable notion of size, whose definition involves the WISC axiom. We developed the proof and checked it using the Agda theorem prover.
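Quotients, indexing, and the WISC-based colimit argument have no counterpart in plain Haskell, but the underlying notion of an initial algebra for a signature functor can be sketched as follows (illustrative only):

```haskell
-- The non-indexed, non-quotient shadow of the construction: inductive types
-- as initial algebras of a signature functor.
newtype Mu f = In (f (Mu f))

-- 'fold' witnesses initiality: every other f-algebra receives a map.
fold :: Functor f => (f a -> a) -> Mu f -> a
fold alg (In t) = alg (fmap (fold alg) t)

-- Example: natural numbers as the initial algebra of Maybe.
type Nat = Mu Maybe

zero :: Nat
zero = In Nothing

suc :: Nat -> Nat
suc = In . Just

toInt :: Nat -> Int
toInt = fold (maybe 0 (+ 1))
```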
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy-gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.
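As a point of reference for the inner "optimize" stage, here is a plain Haskell sketch of tabular value iteration with illustrative names and stopping tolerance; the paper's actual contribution, differentiating through the MDP solution with sampled and low-rank approximate derivatives, is not attempted here:

```haskell
-- Tabular value iteration: the inner optimization that decision-focused
-- learning wraps.  Assumes a finite MDP with at least one action per state.
import qualified Data.Map.Strict as M

type State  = Int
type Action = Int

data MDP = MDP
  { states  :: [State]
  , actions :: [Action]
  , trans   :: State -> Action -> [(State, Double)]  -- next-state distribution
  , rewardF :: State -> Action -> Double
  , gamma   :: Double                                -- discount, assumed < 1
  }

qValue :: MDP -> M.Map State Double -> State -> Action -> Double
qValue mdp v s a =
  rewardF mdp s a
    + gamma mdp * sum [ p * M.findWithDefault 0 s' v | (s', p) <- trans mdp s a ]

bellmanBackup :: MDP -> M.Map State Double -> M.Map State Double
bellmanBackup mdp v =
  M.fromList [ (s, maximum [ qValue mdp v s a | a <- actions mdp ]) | s <- states mdp ]

valueIteration :: MDP -> M.Map State Double
valueIteration mdp = go (M.fromList [ (s, 0) | s <- states mdp ])
  where
    go v
      | delta < 1e-8 = v'
      | otherwise    = go v'
      where
        v'    = bellmanBackup mdp v
        delta = maximum (0 : [ abs (v' M.! s - v M.! s) | s <- states mdp ])
```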