We train a neural network to represent the universal exchange-correlation functional of density-functional theory, such that it reproduces both the exact exchange-correlation energy and the exact exchange-correlation potential. This functional is extremely non-local, yet it retains the computational scaling of traditional local and semi-local approximations. It therefore holds the promise of solving some of the delocalization problems that plague density-functional theory while maintaining the computational efficiency that characterizes the Kohn-Sham equations. Furthermore, by using automatic differentiation, a capability built into modern machine-learning frameworks, we impose the exact mathematical relation between the exchange-correlation energy and potential (the potential is the functional derivative of the energy with respect to the density), leading to a fully consistent method. We demonstrate the feasibility of our approach on one-dimensional systems with two strongly correlated electrons, where density-functional methods are known to fail, and investigate the behavior and performance of our functional as the degree of non-locality is varied.
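The energy-potential consistency obtained through automatic differentiation can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration, not the trained model described here. The density lives on a uniform one-dimensional grid with spacing `dx`. The hypothetical functional `exc_energy` writes E_xc as dx times the sum over grid points of n_i times an energy per particle. That energy per particle comes from a tiny, randomly initialized tanh network that sees the density within `radius` grid points, which stands in for an adjustable degree of non-locality. The names `make_params` and `exc_potential` and the toy density are likewise illustrative. The potential is obtained by differentiating the energy, v_xc(x_j) = (1/dx) dE_xc/dn_j, which is the grid version of the functional derivative, so the potential is consistent with the energy by construction. To stay dependency-free, the sketch uses forward-mode automatic differentiation with dual numbers, which needs one pass per grid point. A machine-learning framework would typically use reverse mode and obtain the whole potential at roughly the cost of a single energy evaluation.

```python
import math
import random


class Dual:
    """Dual number val + der*eps with eps**2 = 0 (forward-mode automatic differentiation)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def tanh(x):
    """tanh that propagates derivatives: d tanh(x) = (1 - tanh(x)**2) dx."""
    if not isinstance(x, Dual):
        return math.tanh(x)
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


def make_params(radius, hidden, seed=0):
    """Random weights for a hypothetical one-hidden-layer network on a (2*radius+1)-point window."""
    rng = random.Random(seed)
    width = 2 * radius + 1
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(width)] for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    return w1, b1, w2


def exc_energy(n, dx, params, radius):
    """Hypothetical E_xc[n] = dx * sum_i n_i * eps_i, with eps_i built from densities within `radius` of point i."""
    w1, b1, w2 = params
    size = len(n)
    total = 0.0
    for i in range(size):
        # Density values in the window around i; points outside the box count as zero density.
        window = [n[i + k] if 0 <= i + k < size else 0.0 for k in range(-radius, radius + 1)]
        hidden = [tanh(sum(w * x for w, x in zip(row, window)) + b) for row, b in zip(w1, b1)]
        eps_i = sum(w * h for w, h in zip(w2, hidden))
        total = total + n[i] * eps_i
    return dx * total


def exc_potential(n, dx, params, radius):
    """v_xc(x_j) = (1/dx) * dE_xc/dn_j, from one forward-mode pass per grid point j."""
    v = []
    for j in range(len(n)):
        # Seed the derivative of n_j with 1 and all other components with 0.
        seeded = [Dual(nk, 1.0 if k == j else 0.0) for k, nk in enumerate(n)]
        v.append(exc_energy(seeded, dx, params, radius).der / dx)
    return v


if __name__ == "__main__":
    dx, radius = 0.2, 3
    xs = [dx * (i - 20) for i in range(41)]
    # Toy density: two Gaussians, rescaled so that dx * sum(n) = 2 electrons.
    raw = [math.exp(-(x - 1.5) ** 2) + math.exp(-(x + 1.5) ** 2) for x in xs]
    scale = 2.0 / (dx * sum(raw))
    density = [scale * r for r in raw]
    params = make_params(radius, hidden=8)
    print("E_xc =", exc_energy(density, dx, params, radius))
    print("v_xc at x = 0:", exc_potential(density, dx, params, radius)[20])
```

Because each energy-density term reads only a fixed-size window, evaluating the energy costs O(N * radius) for N grid points, which is linear in N at fixed `radius`. Enlarging `radius` makes the sketch more non-local at a proportionally higher cost.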