ﻻ يوجد ملخص باللغة العربية
We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $widetilde{mathcal{O}}(sqrt{T})$ regret when the losses are adversarial and simultaneously $mathcal{O}(text{polylog}(T))$ regret when the losses are (almost) stochastic. Recent work by [Jin and Luo, 2020] achieves this goal when the fixed transition is known, and leaves the case of unknown transition as a major open question. In this work, we resolve this open problem by using the same Follow-the-Regularized-Leader ($text{FTRL}$) framework together with a set of new techniques. Specifically, we first propose a loss-shifting trick in the $text{FTRL}$ analysis, which greatly simplifies the approach of [Jin and Luo, 2020] and already improves their results for the known transition case. Then, we extend this idea to the unknown transition case and develop a novel analysis which upper bounds the transition estimation error by (a fraction of) the regret itself in the stochastic setting, a key property to ensure $mathcal{O}(text{polylog}(T))$ regret.
This work studies the problem of learning episodic Markov Decision Processes with known transition and bandit feedback. We develop the first algorithm with a ``best-of-both-worlds guarantee: it achieves $mathcal{O}(log T)$ regret when the losses are
We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $mathcal{tilde{O}}(L|X|sqrt{|A|T})$
Interpretability techniques aim to provide the rationale behind a models decision, typically by explaining either an individual prediction (local explanation, e.g. `why is this patient diagnosed with this condition) or a class of predictions (global
We consider the problem of fair allocation of indivisible items among $n$ agents with additive valuations, when agents have equal entitlements to the goods, and there are no transfers. Best-of-Both-Worlds (BoBW) fairness mechanisms aim to give all ag
The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by th