Bayesian Dark Knowledge is a method for compressing the posterior predictive distribution of a neural network model into a more compact form. Specifically, the method attempts to compress a Monte Carlo approximation to the parameter posterior into a single network representing the posterior predictive distribution. Further, the authors show that this approach is successful in the classification setting using a student network whose architecture matches that of a single network in the teacher ensemble. In this work, we examine the robustness of Bayesian Dark Knowledge to higher levels of posterior uncertainty. We show that using a student network that matches the teacher architecture may fail to yield acceptable performance. We study an approach to close the resulting performance gap by increasing student model capacity.