We demonstrate the ability of convolutional neural networks (CNNs) to mitigate systematics in the virial scaling relation and produce dynamical mass estimates of galaxy clusters with remarkably low bias and scatter. We present two models, CNN$_\mathrm{1D}$ and CNN$_\mathrm{2D}$, which use this deep learning tool to infer cluster masses from the dynamics of member galaxies. Our first model, CNN$_\mathrm{1D}$, infers cluster mass directly from the distribution of member galaxy line-of-sight velocities. Our second model, CNN$_\mathrm{2D}$, extends the input space of CNN$_\mathrm{1D}$ to learn on the joint distribution of galaxy line-of-sight velocities and projected radial distances. We train each model as a regression over cluster mass using a labeled catalog of realistic mock cluster observations generated from the MultiDark simulation and the UniverseMachine catalog. We then evaluate the performance of each model on an independent set of mock observations selected from the same simulated catalog. The CNN models produce cluster mass predictions whose lognormal residuals have a scatter as low as $0.132$ dex, more than a factor of 2 improvement over the classical $M$-$\sigma$ power-law estimator. Furthermore, the CNN approach reduces prediction scatter relative to similar machine learning approaches by up to $17\%$, while requiring training and evaluation times shorter by a factor of 30 and producing considerably more robust mass predictions (improving prediction stability under variations in galaxy sampling rate by $30\%$).
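The classical $M$-$\sigma$ baseline and the dex scatter metric can be made concrete with a short sketch. It assumes a power law $\log_{10} M = \alpha \log_{10}\sigma + \beta$ fitted by ordinary least squares in log space, and takes bias and scatter as the mean and standard deviation of $\log_{10}(M_\mathrm{pred}/M_\mathrm{true})$. The function names (`fit_m_sigma`, `predict_m_sigma`, `residual_stats_dex`) are hypothetical, the demo data are synthetic, and the paper's own fitting procedure and scatter definition may differ.

```python
import numpy as np


def fit_m_sigma(sigma, mass):
    """Fit the assumed power law log10(M) = alpha * log10(sigma) + beta.

    Ordinary least squares in log space (an assumption, not necessarily
    the paper's fitting method); returns (alpha, beta).
    """
    alpha, beta = np.polyfit(np.log10(sigma), np.log10(mass), deg=1)
    return float(alpha), float(beta)


def predict_m_sigma(sigma, alpha, beta):
    """Evaluate the fitted power law to obtain mass predictions."""
    return 10.0 ** (alpha * np.log10(sigma) + beta)


def residual_stats_dex(mass_pred, mass_true):
    """Return (bias, scatter) of the log10 residuals, both in dex.

    bias    = mean of log10(M_pred / M_true)
    scatter = standard deviation of log10(M_pred / M_true)
    """
    resid = np.log10(mass_pred) - np.log10(mass_true)
    return float(np.mean(resid)), float(np.std(resid))


if __name__ == "__main__":
    # Synthetic, illustrative data only (not the MultiDark/UniverseMachine
    # catalog): M proportional to sigma^3 with 0.1 dex lognormal scatter.
    rng = np.random.default_rng(0)
    sigma = rng.uniform(300.0, 1500.0, size=2000)  # velocity dispersion, km/s
    log_mass = 15.0 + 3.0 * np.log10(sigma / 1000.0) + rng.normal(0.0, 0.1, size=sigma.size)
    mass = 10.0 ** log_mass

    alpha, beta = fit_m_sigma(sigma, mass)
    bias, scatter = residual_stats_dex(predict_m_sigma(sigma, alpha, beta), mass)
    print(f"alpha={alpha:.2f} beta={beta:.2f} bias={bias:.3f} dex scatter={scatter:.3f} dex")
```

Because the residuals are measured in $\log_{10}$ mass, a scatter of $0.132$ dex corresponds to a typical multiplicative error of about $10^{0.132} \approx 1.36$ in mass.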