We design a simple reinforcement learning (RL) agent that implements an optimistic version of $Q$-learning and establish through regret analysis that this agent can operate with some level of competence in any environment. While we leverage concepts from the literature on provably efficient RL, we consider a general agent-environment interface and provide a novel agent design and analysis. This level of generality positions our results to inform the design of future agents for operation in complex real environments. We establish that, as time progresses, our agent performs competitively relative to policies that require longer times to evaluate. The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent. Notably, there is no dependence on the complexity of the environment. The ultimate per-period performance loss of the agent is bounded by a constant multiple of a measure of distortion introduced by the agent's state representation. This work is the first to establish that an algorithm approaches this asymptotic condition within a tractable time frame.
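For concreteness, below is a minimal sketch of the optimistic-$Q$-learning idea in the familiar tabular special case: optimistic value initialization plus a count-based exploration bonus. The class name, the $1/\sqrt{n}$ bonus form, the decaying step size, and the reward range assumed in the initialization are all illustrative assumptions; the abstract's agent operates over a general agent-environment interface with agent states, which this toy version does not capture.

```python
import numpy as np

class OptimisticQLearner:
    """Sketch of tabular Q-learning with optimism (assumed details, not the paper's algorithm)."""

    def __init__(self, n_states, n_actions, discount=0.99,
                 bonus_scale=1.0, q_init=None):
        # Optimistic initialization: start Q at an upper bound on achievable value,
        # here assuming rewards lie in [0, 1].
        if q_init is None:
            q_init = 1.0 / (1.0 - discount)
        self.q = np.full((n_states, n_actions), q_init)
        self.counts = np.zeros((n_states, n_actions))
        self.discount = discount
        self.bonus_scale = bonus_scale

    def act(self, state):
        # Act greedily with respect to the optimistic value estimates.
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        self.counts[state, action] += 1
        n = self.counts[state, action]
        # Exploration bonus shrinks as the (state, action) pair is visited more often.
        bonus = self.bonus_scale / np.sqrt(n)
        alpha = 1.0 / n  # simple decaying step size
        target = reward + bonus + self.discount * self.q[next_state].max()
        self.q[state, action] += alpha * (target - self.q[state, action])
```

In an interaction loop, one would call `act`, step the environment, and call `update`; because under-visited state-action pairs retain optimistic estimates, the agent is systematically drawn toward them until the estimates are corrected, which is the mechanism behind optimism-based regret guarantees.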
Object-centric representations have recently enabled significant progress in tackling relational reasoning tasks. By building a strong object-centric inductive bias into neural architectures, recent efforts have improved generalization and data efficiency …
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can …
In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned on information only observed by them. On the other hand, humans readily form beliefs about …
Reinforcement learning in cooperative multi-agent settings has recently advanced significantly in its scope, with applications in cooperative estimation for advertising, dynamic treatment regimes, distributed control, and federated learning. In this …
Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates …