Deep learning driven by large neural network models is overtaking traditional machine learning methods for understanding unstructured and perceptual data domains such as speech, text, and vision. At the same time, the as-a-service business model on the cloud is fundamentally transforming the information technology industry. These two trends, deep learning and as-a-service, are colliding to give rise to a new business model for cognitive application delivery: deep learning as a service in the cloud. In this paper, we discuss the details of the software architecture behind IBM's deep learning as a service (DLaaS). DLaaS gives developers the flexibility to use popular deep learning libraries such as Caffe, Torch, and TensorFlow in the cloud in a scalable and resilient manner with minimal effort. The platform uses a distribution and orchestration layer that facilitates learning from a large amount of data in a reasonable amount of time across compute nodes. A resource provisioning layer enables flexible job management on heterogeneous resources, such as graphics processing units (GPUs) and central processing units (CPUs), in an infrastructure as a service (IaaS) cloud.