In clinical care, obtaining a correct diagnosis is the first step towards successful treatment and, ultimately, recovery. Depending on the complexity of the case, the diagnostic phase can be lengthy and ridden with errors and delays. Such errors have a high likelihood to cause patients severe harm or even lead to their death and are estimated to cost the U.S. healthcare system several hundred billion dollars each year. To avoid diagnostic errors, physicians increasingly rely on diagnostic decision support systems drawing from heuristics, historic cases, textbooks, clinical guidelines and scholarly biomedical literature. The evaluation of such systems, however, is often conducted in an ad-hoc fashion, using non-transparent methodology, and proprietary data. This paper presents DC3, a collection of 31 extremely difficult diagnostic case challenges, manually compiled and solved by clinical experts. For each case, we present a number of temporally ordered physician-generated observations alongside the eventually confirmed true diagnosis. We additionally provide inferred dense relevance judgments for these cases among the PubMed collection of 27 million scholarly biomedical articles.