Quantum error correction is widely thought to be the key to fault-tolerant quantum computation. However, determining the best-suited encoding for unknown error channels or specific laboratory setups is highly challenging. Here, we present a reinforcement learning framework for optimizing and fault-tolerantly adapting quantum error correction codes. We consider a reinforcement learning agent tasked with modifying a family of surface code quantum memories until a desired logical error rate is reached. Using efficient simulations of about 70 data qubits with arbitrary connectivity, we demonstrate that such an agent can find near-optimal solutions, in terms of the number of data qubits, for various error models of interest. Moreover, we show that agents trained in one setting can successfully transfer their experience to different settings. This capacity for transfer learning showcases the inherent strengths of reinforcement learning and the applicability of our approach to carrying optimization from off-line simulations over to on-line laboratory settings.