ﻻ يوجد ملخص باللغة العربية
We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $mathcal{S}$. We focus on the known-transition setting where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al (2020) which we prove to be computationally efficient. In contrast, we show the minimax suboptimality grows as $Omega( H^{3/2}/N)$ when $|mathcal{S}|geq 3$ while the unknown-transition setting suffers from a larger sharp rate $Theta(|mathcal{S}|H^2/N)$ (Rajaraman et al (2020)). The lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function, as well as building connections with linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term as Mimic-Mixture, that provably achieves suboptimality $O(1/N)$ for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality $O(sqrt{H}/N)$ with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert optimal assumption in the known transition setting, while Rajaraman et al (2020) showed it does not help when transitions are unknown.
Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonabl
Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users
Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorit
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, there lacks a theoretic
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning. Our result builds upon an analysis for linear stochastic approximation based on Lyapunov equations and applies to both