While multitask representation learning has become a popular approach in reinforcement learning (RL), the theoretical understanding of why and when it works remains limited. This paper analyzes the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs) under a generative model. We consider an agent that learns a representation function $\phi$ from a function class $\Phi$ using $T$ source tasks with $N$ samples per task, and then uses the learned $\hat{\phi}$ to reduce the number of samples required for a new task. We first discover a \emph{Least-Activated-Feature-Abundance} (LAFA) criterion, denoted $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy that is $\tilde{O}\big(H^2\sqrt{\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}+\frac{\kappa d}{n}}\big)$ sub-optimal. Here $H$ is the planning horizon, $\mathcal{C}(\Phi)$ is the complexity measure of $\Phi$, $d$ is the dimension of the representation (usually $d\ll \mathcal{C}(\Phi)$), and $n$ is the number of samples for the new task. Thus the required $n$ is $O(\kappa d H^4)$ for the sub-optimality to be close to zero, which is much smaller than the $O(\mathcal{C}(\Phi)^2\kappa d H^4)$ required in the setting without multitask representation learning, whose sub-optimality gap is $\tilde{O}\big(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2 d}{n}}\big)$. This theoretically explains the power of multitask representation learning in reducing sample complexity. Furthermore, we note that to ensure high sample efficiency, the LAFA criterion $\kappa$ should be small. In fact, $\kappa$ varies widely in magnitude depending on the sampling distribution for the new task, which indicates that adaptive sampling techniques are important to make $\kappa$ depend solely on $d$. Finally, we provide empirical results on a noisy grid-world environment to corroborate our theoretical findings.
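As a quick sanity check on these rates (a back-of-the-envelope calculation using only the bounds quoted above, not part of the original abstract), the $O(\kappa d H^4)$ requirement follows from asking when the new-task term of the bound drops below a target accuracy $\epsilon$:
$$H^2\sqrt{\frac{\kappa d}{n}} \le \epsilon \quad\Longleftrightarrow\quad n \ge \frac{\kappa d H^4}{\epsilon^2},$$
so once the source-task term $\frac{\mathcal{C}(\Phi)^2 \kappa d}{NT}$ has been made negligible by taking $NT$ large, a constant accuracy $\epsilon$ requires only $n = O(\kappa d H^4)$ new-task samples. The same calculation applied to the bound without representation learning, $\tilde{O}\big(H^2\sqrt{\frac{\kappa \mathcal{C}(\Phi)^2 d}{n}}\big)$, gives $n = O(\mathcal{C}(\Phi)^2 \kappa d H^4)$, i.e. a factor of $\mathcal{C}(\Phi)^2$ more samples.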
Continual (sequential) training and multitask (simultaneous) training often attempt to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where conti
An effective approach in meta-learning is to utilize multiple training tasks to learn a good initialization for model parameters that can help solve unseen test tasks with very few samples by fine-tuning from this initialization. Although successful in
The goal of representation learning is different from the ultimate objective of machine learning, such as decision making; it is therefore very difficult to establish clear and direct objectives for training representation learning models. It has been
Partial label learning (PLL) is a class of weakly supervised learning where each training instance consists of a data point and a set of candidate labels containing a unique ground truth label. To tackle this problem, a majority of current state-of-the-art
This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement lea