We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower-bound analysis and possible connections to an optimal algorithm.
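A minimal sketch of the confidence-bound bookkeeping such a maximin strategy relies on (all names and the Hoeffding-style radius are our illustrative assumptions, not the paper's exact rules): each first-player action is valued by the minimum over opponent replies, so its lower/upper bounds are the minima of the per-pair bounds.

```python
import numpy as np

def confidence_radius(n, delta):
    # Hoeffding-style radius for [0, 1]-bounded rewards (illustrative choice).
    return np.sqrt(np.log(2.0 / delta) / (2.0 * np.maximum(n, 1)))

def maximin_bounds(mu_hat, n, delta):
    """Bounds on the maximin value of each first-player action.

    mu_hat[a, b]: empirical mean of action pair (a, b); n[a, b]: sample counts.
    """
    r = confidence_radius(n, delta)
    lcb, ucb = mu_hat - r, mu_hat + r
    # The value of action a is min over opponent replies b, so the maximin
    # value is bracketed by the min of the per-pair bounds.
    return lcb.min(axis=1), ucb.min(axis=1)

rng = np.random.default_rng(0)
mu_hat = rng.uniform(size=(3, 4))   # 3 player actions, 4 opponent replies
n = np.full((3, 4), 10)
lo, hi = maximin_bounds(mu_hat, n, delta=0.05)
best = int(np.argmax(lo))  # current candidate best action by maximin LCB
```

An LUCB-style sampler would then draw more outcomes for the pairs that keep `lo` of the leader and `hi` of its challengers overlapping, stopping once they separate.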
In this paper, we have established a unified framework of multistage parameter estimation. We demonstrate that a wide variety of statistical problems such as fixed-sample-size interval estimation, point estimation with error control, bounded-width co
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash Equilibrium to this GMFG, and demonstrates t
In this paper, we have established a general framework of multistage hypothesis tests which applies to arbitrarily many mutually exclusive and exhaustive composite hypotheses. Within the new framework, we have constructed specific multistage tests wh
In this paper, we study sequential testing problems with \emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean $\mu$ of a Gaussian distribution is $\geq -\epsilon$ or $\leq \epsilon$; if $\mu\in(-\epsilon,\epsilon)$, both
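The overlapping-hypotheses setup can be sketched as follows, under our own assumptions (a simple anytime union-bound radius, not the paper's sharpest one): stop and answer as soon as the running confidence interval for the mean clears $-\epsilon$ from below or $\epsilon$ from above; inside the overlap $(-\epsilon,\epsilon)$ either answer is acceptable.

```python
import math
import random

def sequential_overlap_test(sample, eps, delta, max_n=100_000):
    """Hedged sketch: sample() draws one observation with sub-Gaussian noise.

    Returns the accepted hypothesis and the number of samples used.
    """
    s, n = 0.0, 0
    while n < max_n:
        s += sample()
        n += 1
        # Anytime-valid radius via a crude union bound over n (illustrative).
        r = math.sqrt(2.0 * math.log(2.0 * n * (n + 1) / delta) / n)
        m = s / n
        if m - r >= -eps:
            return ">= -eps", n
        if m + r <= eps:
            return "<= eps", n
    return "undecided", n

random.seed(1)
ans, n_used = sequential_overlap_test(
    lambda: random.gauss(0.5, 1.0), eps=0.1, delta=0.05)
```

Because the hypotheses overlap, a mean inside $(-\epsilon,\epsilon)$ satisfies both, so the test never needs to resolve it exactly; this is what makes the problem easier than a standard two-sided test at the boundary.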
We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of the gradie
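As a generic baseline for such binary-weight problems (our own sketch, not the paper's method), one can run gradient descent on real-valued proxy weights while evaluating the loss at their signs, a straight-through-style projected update:

```python
import numpy as np

def binary_regression(X, y, steps=500, lr=0.01, seed=0):
    """Least squares over weights in {-1, +1}^d via a straight-through
    projected-gradient heuristic (illustrative baseline)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])      # real-valued proxy weights
    for _ in range(steps):
        # Gradient of ||X sign(w) - y||^2 / 2, passed "straight through"
        # the sign() to the proxy weights.
        grad = X.T @ (X @ np.sign(w) - y)
        w -= lr * grad
    return np.sign(w)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
w_true = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
y = X @ w_true                               # noiseless planted instance
w_hat = binary_regression(X, y)
```

On this easy noiseless instance the heuristic recovers the planted signs; the combinatorial hardness the abstract alludes to shows up once the design is ill-conditioned or the observations are noisy.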