Methods for solving PDEs using neural networks have recently become a very important topic. We provide an a priori error analysis for such methods which is based on the $mathcal{K}_1(mathbb{D})$-norm of the solution. We show that the resulting constrained optimization problem can be efficiently solved using a greedy algorithm, which replaces stochastic gradient descent. Following this, we show that the error arising from discretizing the energy integrals is bounded both in the deterministic case, i.e. when using numerical quadrature, and also in the stochastic case, i.e. when sampling points to approximate the integrals. In the later case, we use a Rademacher complexity analysis, and in the former we use standard numerical quadrature bounds. This extends existing results to methods which use a general dictionary of functions to learn solutions to PDEs and importantly gives a consistent analysis which incorporates the optimization, approximation, and generalization aspects of the problem. In addition, the Rademacher complexity analysis is simplified and generalized, which enables application to a wide range of problems.