Computationally intractable tasks are frequently encountered in physics and optimization. Such tasks typically comprise a cost function to be optimized over a so-called feasible set, which is specified by a set of constraints. In general, this leads to difficult, non-convex optimization problems. A number of standard methods are used to tackle such problems: variational approaches focus on parameterizing a subclass of solutions within the feasible set; in contrast, relaxation techniques approximate the feasible set from outside, thus complementing the variational approach by providing ultimate bounds on the global optimal solution. In this work, we propose a novel approach combining the power of relaxation techniques with deep reinforcement learning in order to find the best possible bounds within a limited computational budget. We illustrate the viability of the method in the context of finding the ground-state energy of many-body quantum systems, a paradigmatic problem in quantum physics. We benchmark our approach against other classical optimization algorithms, such as breadth-first search and Monte Carlo, and we characterize the effect of transfer learning. We find that the latter may indicate phase transitions in a completely autonomous manner. Finally, we provide tools to generalize the approach to other common applications in the field of quantum information processing.