Deep learning has recently emerged as a disruptive technology to solve challenging radio resource management problems in wireless networks. However, the neural network architectures adopted by existing works suffer from poor scalability, generalization, and lack of interpretability. A long-standing approach to improve scalability and generalization is to incorporate the structures of the target task into the neural network architecture. In this paper, we propose to apply graph neural networks (GNNs) to solve large-scale radio resource management problems, supported by effective neural network architecture design and theoretical analysis. Specifically, we first demonstrate that radio resource management problems can be formulated as graph optimization problems that enjoy a universal permutation equivariance property. We then identify a class of neural networks, named emph{message passing graph neural networks} (MPGNNs). It is demonstrated that they not only satisfy the permutation equivariance property, but also can generalize to large-scale problems while enjoying a high computational efficiency. For interpretablity and theoretical guarantees, we prove the equivalence between MPGNNs and a class of distributed optimization algorithms, which is then used to analyze the performance and generalization of MPGNN-based methods. Extensive simulations, with power control and beamforming as two examples, will demonstrate that the proposed method, trained in an unsupervised manner with unlabeled samples, matches or even outperforms classic optimization-based algorithms without domain-specific knowledge. Remarkably, the proposed method is highly scalable and can solve the beamforming problem in an interference channel with $1000$ transceiver pairs within $6$ milliseconds on a single GPU.