When designing large-scale distributed controllers, the information-sharing constraints between sub-controllers, as defined by the communication topology interconnecting them, are as important as the controller itself. Controllers implemented using dense topologies typically outperform those implemented using sparse topologies, but it is also desirable to minimize the cost of controller deployment. Motivated by this tradeoff, we introduce a compact but expressive graph recurrent neural network (GRNN) parameterization of distributed controllers that is well suited for distributed controller and communication topology co-design. Our proposed parameterization enjoys a local and distributed architecture, similar to previous graph neural network (GNN)-based parameterizations, while naturally allowing for joint optimization of the distributed controller and the communication topology needed to implement it. We show that the distributed controller/communication topology co-design task can be posed as an $\ell_1$-regularized empirical risk minimization problem that can be efficiently solved using stochastic gradient methods. We run extensive simulations to study the performance of GRNN-based distributed controllers and show that (a) they achieve performance comparable to GNN-based controllers while having fewer free parameters, and (b) our method allows performance/communication density tradeoff curves to be efficiently approximated.
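
To make the regularized co-design objective concrete, one generic way such a problem might be written is sketched here. The notation is assumed for illustration and is not taken from the paper: $\theta$ collects the controller parameters, $S$ is a trainable matrix whose nonzero off-diagonal entries $S_{jk}$ correspond to communication links between sub-controllers $j$ and $k$, $J$ is a closed-loop cost evaluated on $N$ sampled initial conditions $x_0^{(i)}$, and $\lambda \ge 0$ weights the sparsity penalty.

$$
\min_{\theta,\, S} \;\; \frac{1}{N} \sum_{i=1}^{N} J\bigl(\theta, S;\, x_0^{(i)}\bigr) \;+\; \lambda \sum_{j \neq k} \lvert S_{jk} \rvert
$$

Under this sketch, the first term is the empirical risk and the second is the $\ell_1$ penalty that pushes individual link weights toward zero. Sweeping $\lambda$ and recording the resulting cost together with the number of nonzero $S_{jk}$ would trace out an approximate performance/communication density tradeoff curve, with larger $\lambda$ generally favoring sparser topologies.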