Online Double Oracle

119 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yaodong Yang Mr.

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Le Cong Dinh - Yaodong Yang - Zheng Tian

الذكاء الاصطناعي علوم الكمبيوتر ونظرية الألعاب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method -- emph{Online Double Oracle (ODO)} -- is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO methods, ODO is emph{rationale} in the sense that each agent in ODO can exploit strategic adversary with a regret bound of $mathcal{O}(sqrt{T k log(k)})$ where $k$ is not the total number of pure strategies, but rather the size of emph{effective strategy set} that is linearly dependent on the support size of the NE. On tens of different real-world games, ODO outperforms DO, PSRO methods, and no-regret algorithms such as Multiplicative Weight Update by a significant margin, both in terms of convergence rate to a NE and average payoff against strategic adversaries.

قيم البحث

144 - Favour M. Nyikosa , Michael A. Osborne , Stephen J. Roberts 2019

Financial markets are complex environments that produce enormous amounts of noisy and non-stationary data. One fundamental problem is online portfolio selection, the goal of which is to exploit this data to sequentially select portfolios of assets to achieve positive investment outcomes while managing risks. Various algorithms have been proposed for solving this problem in fields such as finance, statistics and machine learning, among others. Most of the methods have parameters that are estimated from backtests for good performance. Since these algorithms operate on non-stationary data that reflects the complexity of financial markets, we posit that adaptively tuning these parameters in an intelligent manner is a remedy for dealing with this complexity. In this paper, we model the mapping between the parameter space and the space of performance metrics using a Gaussian process prior. We then propose an oracle based on adaptive Bayesian optimization for automatically and adaptively configuring online portfolio selection methods. We test the efficacy of our solution on algorithms operating on equity and index data from various markets.

التعلم الالي التعلم الآلي

Online-Extractability in the Quantum Random-Oracle Model

78 - Jelle Don , Serge Fehr , Christian Majenz 2021

We show the following generic result. Whenever a quantum query algorithm in the quantum random-oracle model outputs a classical value $t$ that is promised to be in some tight relation with $H(x)$ for some $x$, then $x$ can be efficiently extracted wi th almost certainty. The extraction is by means of a suitable simulation of the random oracle and works online, meaning that it is straightline, i.e., without rewinding, and on-the-fly, i.e., during the protocol execution and without disturbing it. The technical core of our result is a new commutator bound that bounds the operator norm of the commutator of the unitary operator that describes the evolution of the compressed oracle (which is used to simulate the random oracle above) and of the measurement that extracts $x$. We show two applications of our generic online extractability result. We show tight online extractability of commit-and-open $Sigma$-protocols in the quantum setting, and we offer the first non-asymptotic post-quantum security proof of the textbook Fujisaki-Okamoto transformation, i.e, without adjustments to facilitate the proof.

التشفير والأمن فيزياء الكم

Advice from the Oracle: Really Intelligent Information Retrieval

74 - Michael J. Kurtz 2018

What is intelligent information retrieval? Essentially this is asking what is intelligence, in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi- imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models

الذكاء الاصطناعي الأجهزة والأساليب للزيئات الفيزياء الفلكية الفيزياء والمجتمع

Uniform Solution Sampling Using a Constraint Solver As an Oracle

344 - Stefano Ermon , Carla P. Gomes , Bart Selman 2012

We consider the problem of sampling from solutions defined by a set of hard constraints on a combinatorial space. We propose a new sampling technique that, while enforcing a uniform exploration of the search space, leverages the reasoning power of a systematic constraint solver in a black-box scheme. We present a series of challenging domains, such as energy barriers and highly asymmetric spaces, that reveal the difficulties introduced by hard constraints. We demonstrate that standard approaches such as Simulated Annealing and Gibbs Sampling are greatly affected, while our new technique can overcome many of these difficulties. Finally, we show that our sampling scheme naturally defines a new approximate model counting technique, which we empirically show to be very accurate on a range of benchmark problems.

الذكاء الاصطناعي

Edge-Weighted Online Bipartite Matching

109 - Matthew Fahrbach , Zhiyi Huang , Runzhou Tao 2020

Online bipartite matching and its variants are among the most fundamental problems in the online algorithms literature. Karp, Vazirani, and Vazirani (STOC 1990) introduced an elegant algorithm for the unweighted problem that achieves an optimal compe titive ratio of $1-1/e$. Later, Aggarwal et al. (SODA 2011) generalized their algorithm and analysis to the vertex-weighted case. Little is known, however, about the most general edge-weighted problem aside from the trivial $1/2$-competitive greedy algorithm. In this paper, we present the first online algorithm that breaks the long-standing $1/2$ barrier and achieves a competitive ratio of at least $0.5086$. In light of the hardness result of Kapralov, Post, and Vondrak (SODA 2013) that restricts beating a $1/2$ competitive ratio for the more general problem of monotone submodular welfare maximization, our result can be seen as strong evidence that edge-weighted bipartite matching is strictly easier than submodular welfare maximization in the online setting. The main ingredient in our online matching algorithm is a novel subroutine called online correlated selection (OCS), which takes a sequence of pairs of vertices as input and selects one vertex from each pair. Instead of using a fresh random bit to choose a vertex from each pair, the OCS negatively correlates decisions across different pairs and provides a quantitative measure on the level of correlation. We believe our OCS technique is of independent interest and will find further applications in other online optimization problems.

بنى وهياكل البيانات والخوارزميات علوم الكمبيوتر ونظرية الألعاب