Understanding natural language requires common sense, one aspect of which is the ability to discern the plausibility of events. While distributional models, most recently pre-trained Transformer language models, have demonstrated improvements in modeling event plausibility, their performance still falls short of humans'. In this work, we show that Transformer-based plausibility models are markedly inconsistent across the conceptual classes of a lexical hierarchy, inferring, for example, that ``a person breathing'' is plausible while ``a dentist breathing'' is not. We find that this inconsistency persists even when models are softly injected with lexical knowledge, and we present a simple post-hoc method of forcing model consistency that improves correlation with human plausibility judgements.