Ranking Policy Decisions


Abstract in English

Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them more difficult to analyse and interpret. In a run with $n$ time steps, a policy will decide $n$ times on an action to take, even when only a tiny subset of these decisions delivers value over selecting a simple default action. Given a pre-trained policy, we propose a black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of the decisions made in those states. We evaluate our ranking method by creating new, simpler policies that prune the decisions identified as unimportant, and by measuring the impact on performance. Our experimental results on a diverse set of standard benchmarks (gridworld, CartPole, Atari games) show that in some cases fewer than half of the decisions made contribute to the expected reward. We furthermore show that the decisions made in the most frequently visited states are not the most important for the expected reward.
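The ranking-and-pruning idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corridor environment, the names `policy`, `run_episode` and `rank_states`, the mutation rate, and the choice of the Ochiai measure (one common statistical fault localisation score). The sketch repeatedly replaces the policy's action with a default action in a random subset of states, records whether each run still succeeds, and scores each state by how strongly replacing its decision is associated with failure.

```python
import math
import random

N_STATES = 10          # hypothetical corridor with cells 0..9
TRAPS = {3, 7}         # hypothetical cells where the default action fails
DEFAULT, JUMP = 0, 1   # DEFAULT = plain step, JUMP = action that clears a trap


def policy(state):
    """Stand-in for a pre-trained policy: it jumps everywhere, needed or not."""
    return JUMP


def run_episode(use_default):
    """Walk the corridor; states in `use_default` take DEFAULT instead of the
    policy's action. Returns True if the end of the corridor is reached."""
    for state in range(N_STATES):
        action = DEFAULT if state in use_default else policy(state)
        if state in TRAPS and action != JUMP:
            return False  # fell into a trap
    return True


def rank_states(n_runs=2000, mutation_rate=0.3, seed=0):
    """Rank states by an Ochiai-style score (assumed measure): a state scores
    high when replacing its decision with DEFAULT coincides with failed runs."""
    rng = random.Random(seed)
    mut_fail = [0] * N_STATES  # runs where the state was mutated and the run failed
    mut_pass = [0] * N_STATES  # runs where the state was mutated and the run passed
    total_fail = 0
    for _ in range(n_runs):
        mutated = {s for s in range(N_STATES) if rng.random() < mutation_rate}
        passed = run_episode(mutated)
        if not passed:
            total_fail += 1
        # Simplification: every mutated state is counted, visited or not.
        for s in mutated:
            if passed:
                mut_pass[s] += 1
            else:
                mut_fail[s] += 1
    scores = {}
    for s in range(N_STATES):
        denom = math.sqrt(total_fail * (mut_fail[s] + mut_pass[s]))
        scores[s] = mut_fail[s] / denom if denom else 0.0
    ranking = sorted(range(N_STATES), key=lambda s: scores[s], reverse=True)
    return ranking, scores


ranking, scores = rank_states()
# Pruned policy: keep the original decision only in the two top-ranked states.
pruned_ok = run_episode(use_default=set(ranking[2:]))
```

In this toy setting the two trap cells should come out on top of the ranking, and the pruned policy, which keeps the original decision only in those two states and takes the default action everywhere else, still completes the corridor. Real environments are stochastic and have far larger state spaces, so a practical version would need a state abstraction and a reward threshold to define a passing run.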
