ﻻ يوجد ملخص باللغة العربية
We introduce Robust Restless Bandits, a challenging generalization of restless multi-arm bandits (RMAB). RMABs have been widely studied for intervention planning with limited resources. However, most works make the unrealistic assumption that the transition dynamics are known perfectly, restricting the applicability of existing methods to real-world scenarios. To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) We develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) To enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs. RMABPPO hinges on learning an auxiliary $lambda$-network that allows each arms learning to decouple, greatly reducing sample complexity required for training; (iv) Under minimax regret, the adversary in the double oracle approach is notoriously difficult to implement due to non-stationarity. To address this, we formulate the adversary oracle as a multi-agent reinforcement learning problem and solve it with a multi-agent extension of RMABPPO, which may be of independent interest as the first known algorithm for this setting. Code is available at https://github.com/killian-34/RobustRMAB.
This paper proposes adversarial attacks for Reinforcement Learning (RL) and then improves the robustness of Deep Reinforcement Learning algorithms (DRL) to parameter uncertainties with the help of these attacks. We show that even a naively engineered
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only study the offline setting where problem dynamics are known. We
Reinforcement learning agents are faced with two types of uncertainty. Epistemic uncertainty stems from limited data and is useful for exploration, whereas aleatoric uncertainty arises from stochastic environments and must be accounted for in risk-se
Deep neural networks, including reinforcement learning agents, have been proven vulnerable to small adversarial changes in the input, thus making deploying such networks in the real world problematic. In this paper, we propose RADIAL-RL, a method to
Robust Reinforcement Learning aims to find the optimal policy with some extent of robustness to environmental dynamics. Existing learning algorithms usually enable the robustness through disturbing the current state or simulating environmental parame