We introduce a general framework for approximating regular conditional distributions (RCDs). Our approximations of these RCDs are implemented by a new class of geometric deep learning models with inputs in $mathbb{R}^d$ and outputs in the Wasserstein-$1$ space $mathcal{P}_1(mathbb{R}^D)$. We find that the models built using our framework can approximate any continuous functions from $mathbb{R}^d$ to $mathcal{P}_1(mathbb{R}^D)$ uniformly on compacts, and quantitative rates are obtained. We identify two methods for avoiding the curse of dimensionality; i.e.: the number of parameters determining the approximating neural network depends only polynomially on the involved dimension and the approximation error. The first solution describes functions in $C(mathbb{R}^d,mathcal{P}_1(mathbb{R}^D))$ which can be efficiently approximated on any compact subset of $mathbb{R}^d$. Conversely, the second approach describes sets in $mathbb{R}^d$, on which any function in $C(mathbb{R}^d,mathcal{P}_1(mathbb{R}^D))$ can be efficiently approximated. Our framework is used to obtain an affirmative answer to the open conjecture of Bishop (1994); namely: mixture density networks are universal regular conditional distributions. The predictive performance of the proposed models is evaluated against comparable learning models on various probabilistic predictions tasks in the context of ELMs, model uncertainty, and heteroscedastic regression. All the results are obtained for more general input and output spaces and thus apply to geometric deep learning contexts.