How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously.
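The abstract leaves the mechanism implicit; as a hedged illustration in the simpler stateless (bandit) setting it references, the sketch below shows how information asymmetry can hide exploration inside recommendations. The class name, exploration rate, and structure are assumptions for illustration, not the paper's MDP algorithm.

```python
import random

# Hedged sketch, not the paper's algorithm: a recommender for the
# stateless (bandit) setting that hides occasional exploration inside
# its recommendations. An agent observes only the recommended arm,
# never whether it came from the explore or exploit branch, so the
# recommendation can remain attractive to follow.
class HiddenExplorationRecommender:
    def __init__(self, n_arms, explore_prob=0.05):
        self.n_arms = n_arms
        self.explore_prob = explore_prob  # illustrative rate, an assumption
        self.counts = [0] * n_arms
        self.reward_sums = [0.0] * n_arms

    def recommend(self):
        # Occasionally recommend the least-sampled arm; otherwise the
        # empirically best one. The two branches are indistinguishable
        # to the agent receiving the recommendation.
        if random.random() < self.explore_prob:
            return min(range(self.n_arms), key=lambda a: self.counts[a])
        means = [self.reward_sums[a] / self.counts[a] if self.counts[a] else 0.0
                 for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: means[a])

    def update(self, arm, reward):
        # Record the agent's observed reward for the recommended arm.
        self.counts[arm] += 1
        self.reward_sums[arm] += reward
```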
This paper investigates whether learning contingency-awareness and controllable aspects of an environment can lead to better exploration in reinforcement learning. To investigate this question, we consider an instantiation of this hypothesis and evaluate it empirically.
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.
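As a hedged illustration of what "adversarial corruptions in both the rewards and the transition probabilities" can mean (the paper's formal model may differ), the sketch below lets an adversary overwrite the reward and next-state draw in a bounded number of episodes, while the learner only ever sees the corrupted feedback. All function names are hypothetical.

```python
import random

# Hedged sketch of an adversarial-corruption model (assumptions, not
# the paper's formalism): in at most `budget` episodes the adversary
# may overwrite both the reward and the sampled next state; the
# learner observes only the corrupted trajectory.
def run_corrupted_episodes(step_fn, policy, horizon, n_episodes, budget):
    corrupted = set(random.sample(range(n_episodes), k=budget))
    returns = []
    for ep in range(n_episodes):
        state, total = 0, 0.0
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = step_fn(state, action)  # true dynamics
            if ep in corrupted:
                reward = -reward                  # corrupt the reward
                next_state = random.randrange(2)  # corrupt the transition
            total += reward
            state = next_state
        returns.append(total)
    return returns
```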
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort to exploring efficiently.
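One generic way to make "exploring efficiently with no reward signal" concrete (a sketch under assumptions, not the paper's method) is to act greedily with respect to a visitation-count bonus that decays as states become familiar:

```python
import math
from collections import defaultdict

# Hedged sketch of reward-free exploration via a count-based bonus
# (a generic scheme, not the paper's algorithm): with no reward
# available, the agent scores actions by how rarely the resulting
# state has been visited and moves toward unfamiliar states.
# Assumes deterministic dynamics we can query for the successor state.
def explore(env_step, actions, start_state, n_steps):
    visits = defaultdict(int)
    state = start_state
    for _ in range(n_steps):
        visits[state] += 1
        # Bonus shrinks as a state's visit count grows.
        def bonus(s):
            return 1.0 / math.sqrt(1 + visits[s])
        # Pick the action whose successor looks least familiar.
        action = max(actions, key=lambda a: bonus(env_step(state, a)))
        state = env_step(state, action)
    return visits
```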
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data.
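The sketch below illustrates the stated principle in its simplest form (a hedged approximation, not the paper's method): restrict the learned policy to actions whose outcomes are supported by the dataset, with "supported" approximated by a state-action visit count. The threshold and all names are assumptions.

```python
from collections import defaultdict

# Hedged sketch (not the paper's method): in offline RL, restrict the
# policy to actions supported by the dataset, here approximated with a
# simple state-action visit count.
def supported_greedy_policy(dataset, q_values, min_count=5):
    # dataset: iterable of (state, action, reward, next_state) tuples
    counts = defaultdict(int)
    for state, action, _, _ in dataset:
        counts[(state, action)] += 1

    def policy(state, actions):
        supported = [a for a in actions if counts[(state, a)] >= min_count]
        candidates = supported or list(actions)  # fall back if none supported
        # Greedy among supported actions w.r.t. the estimated values.
        return max(candidates, key=lambda a: q_values.get((state, a), 0.0))

    return policy
```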
Sparse-reward domains are challenging for reinforcement learning algorithms since significant exploration is needed before encountering reward for the first time. Hierarchical reinforcement learning can facilitate exploration by reducing the number of decisions necessary before reward is obtained.
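A toy calculation makes the claimed benefit concrete (an illustrative sketch, not the paper's setup): on a chain whose only reward sits at the last state, committing to temporally extended moves reaches the sparse reward in far fewer decisions than a flat one-step random walk.

```python
import random

# Illustrative sketch (not the paper's method): on a chain MDP whose
# only reward sits at the last state, temporally extended actions
# ("move in this direction for macro_len steps") reduce the number of
# decisions an undirected explorer makes before first reaching it.
def decisions_until_goal(n_states, macro_len):
    state, decisions = 0, 0
    while state < n_states - 1:
        decisions += 1
        direction = random.choice([-1, 1])  # uninformed exploration
        for _ in range(macro_len):          # commit for macro_len steps
            state = max(0, min(n_states - 1, state + direction))
    return decisions

random.seed(0)
flat = sum(decisions_until_goal(30, 1) for _ in range(200)) / 200
macro = sum(decisions_until_goal(30, 5) for _ in range(200)) / 200
print(f"mean decisions, flat: {flat:.0f}; with macro-actions: {macro:.0f}")
```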