Ensemble learning is gaining renewed interests in recent years. This paper presents EnsembleBench, a holistic framework for evaluating and recommending high diversity and high accuracy ensembles. The design of EnsembleBench offers three novel features: (1) EnsembleBench introduces a set of quantitative metrics for assessing the quality of ensembles and for comparing alternative ensembles constructed for the same learning tasks. (2) EnsembleBench implements a suite of baseline diversity metrics and optimized diversity metrics for identifying and selecting ensembles with high diversity and high quality, making it an effective framework for benchmarking, evaluating and recommending high diversity model ensembles. (3) Four representative ensemble consensus methods are provided in the first release of EnsembleBench, enabling empirical study on the impact of consensus methods on ensemble accuracy. A comprehensive experimental evaluation on popular benchmark datasets demonstrates the utility and effectiveness of EnsembleBench for promoting high diversity ensembles and boosting the overall performance of selected ensembles.