We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower-bound analysis and possible connections to an optimal algorithm.
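A minimal sketch of the confidence-bound bookkeeping such a maximin strategy relies on (all names and the Hoeffding-style radius are our illustrative assumptions, not the paper's exact rules): each first-player action is valued by the minimum over opponent replies, so its lower/upper bounds are the minima of the per-pair bounds.

```python
import numpy as np

def confidence_radius(n, delta):
    # Hoeffding-style radius for [0, 1]-bounded rewards (illustrative choice).
    return np.sqrt(np.log(2.0 / delta) / (2.0 * np.maximum(n, 1)))

def maximin_bounds(mu_hat, n, delta):
    """Bounds on the maximin value of each first-player action.

    mu_hat[a, b]: empirical mean of action pair (a, b); n[a, b]: sample counts.
    """
    r = confidence_radius(n, delta)
    lcb, ucb = mu_hat - r, mu_hat + r
    # The value of action a is min over opponent replies b, so the maximin
    # value is bracketed by the min of the per-pair bounds.
    return lcb.min(axis=1), ucb.min(axis=1)

rng = np.random.default_rng(0)
mu_hat = rng.uniform(size=(3, 4))   # 3 player actions, 4 opponent replies
n = np.full((3, 4), 10)
lo, hi = maximin_bounds(mu_hat, n, delta=0.05)
best = int(np.argmax(lo))  # current candidate best action by maximin LCB
```

An LUCB-style sampler would then draw more outcomes for the pairs that keep `lo` of the leader and `hi` of its challengers overlapping, stopping once they separate.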
In this paper, we have established a unified framework of multistage parameter estimation. We demonstrate that a wide variety of statistical problems such as fixed-sample-size interval estimation, point estimation with error control, bounded-width co
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash Equilibrium to this GMFG, and demonstrates t
In this paper, we have established a general framework of multistage hypothesis tests which applies to arbitrarily many mutually exclusive and exhaustive composite hypotheses. Within the new framework, we have constructed specific multistage tests wh
In this paper, we study sequential testing problems with \emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean $\mu$ of a Gaussian distribution is $\geq -\epsilon$ or $\leq \epsilon$; if $\mu\in(-\epsilon,\epsilon)$, both
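The overlapping-hypotheses setup can be sketched as follows, under our own assumptions (a simple anytime union-bound radius, not the paper's sharpest one): stop and answer as soon as the running confidence interval for the mean clears $-\epsilon$ from below or $\epsilon$ from above; inside the overlap $(-\epsilon,\epsilon)$ either answer is acceptable.

```python
import math
import random

def sequential_overlap_test(sample, eps, delta, max_n=100_000):
    """Hedged sketch: sample() draws one observation with sub-Gaussian noise.

    Returns the accepted hypothesis and the number of samples used.
    """
    s, n = 0.0, 0
    while n < max_n:
        s += sample()
        n += 1
        # Anytime-valid radius via a crude union bound over n (illustrative).
        r = math.sqrt(2.0 * math.log(2.0 * n * (n + 1) / delta) / n)
        m = s / n
        if m - r >= -eps:
            return ">= -eps", n
        if m + r <= eps:
            return "<= eps", n
    return "undecided", n

random.seed(1)
ans, n_used = sequential_overlap_test(
    lambda: random.gauss(0.5, 1.0), eps=0.1, delta=0.05)
```

Because the hypotheses overlap, a mean inside $(-\epsilon,\epsilon)$ satisfies both, so the test never needs to resolve it exactly; this is what makes the problem easier than a standard two-sided test at the boundary.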
We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of the gradie
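As a generic baseline for such binary-weight problems (our own sketch, not the paper's method), one can run gradient descent on real-valued proxy weights while evaluating the loss at their signs, a straight-through-style projected update:

```python
import numpy as np

def binary_regression(X, y, steps=500, lr=0.01, seed=0):
    """Least squares over weights in {-1, +1}^d via a straight-through
    projected-gradient heuristic (illustrative baseline)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])      # real-valued proxy weights
    for _ in range(steps):
        # Gradient of ||X sign(w) - y||^2 / 2, passed "straight through"
        # the sign() to the proxy weights.
        grad = X.T @ (X @ np.sign(w) - y)
        w -= lr * grad
    return np.sign(w)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
w_true = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
y = X @ w_true                               # noiseless planted instance
w_hat = binary_regression(X, y)
```

On this easy noiseless instance the heuristic recovers the planted signs; the combinatorial hardness the abstract alludes to shows up once the design is ill-conditioned or the observations are noisy.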