ﻻ يوجد ملخص باللغة العربية
In this work, we aim to create a completely online algorithmic framework for prediction with expert advice that is translation-free and scale-free of the expert losses. Our goal is to create a generalized algorithm that is suitable for use in a wide variety of applications. For this purpose, we study the expected regret of our algorithm against a generic competition class in the sequential prediction by expert advice problem, where the expected regret measures the difference between the losses of our prediction algorithm and the losses of the best expert selection strategy in the competition. We design our algorithm using the universal prediction perspective to compete against a specified class of expert selection strategies, which is not necessarily a fixed expert selection. The class of expert selection strategies that we want to compete against is purely determined by the specific application at hand and is left generic, which makes our generalized algorithm suitable for use in many different problems. We show that no preliminary knowledge about the loss sequence is required by our algorithm and its performance bounds, which are second order, expressed in terms of sums of squared losses. Our regret bounds are stable under arbitrary scalings and translations of the losses.
This work addresses the classic machine learning problem of online prediction with expert advice. We consider the finite-horizon version of this zero-sum, two-person game. Using verification arguments from optimal control theory, we view the task of
This work addresses a classic problem of online prediction with expert advice. We assume an adversarial opponent, and we consider both the finite-horizon and random-stoppi
This work addresses the classic machine learning problem of online prediction with expert advice. A new potential-based framework for the fixed horizon version of this problem has been recently developed using verification arguments from optimal cont
We derive an online learning algorithm with improved regret guarantees for `easy loss sequences. We consider two types of `easiness: (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a nu
We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for contextual bandit, most of them rely on finding the m