Geometric deep learning has attracted significant attention in recent years, in part due to the availability of exotic data types for which traditional neural network architectures are not well suited. Our goal in this paper is to generalize convolutional neural networks (CNN) to the manifold-valued image case which arises commonly in medical imaging and computer vision applications. Explicitly, the input data to the network is an image where each pixel value is a sample from a Riemannian manifold. To achieve this goal, we must generalize the basic building block of traditional CNN architectures, namely, the weighted combinations operation. To this end, we develop a tangent space combination operation which is used to define a convolution operation on manifold-valued images that we call, the Manifold-Valued Convolution (MVC). We prove theoretical properties of the MVC operation, including equivariance to the action of the isometry group admitted by the manifold and characterizing when compositions of MVC layers collapse to a single layer. We present a detailed description of how to use MVC layers to build full, multi-layer neural networks that operate on manifold-valued images, which we call the MVC-net. Further, we empirically demonstrate superior performance of the MVC-nets in medical imaging and computer vision tasks.