We conduct a series of comparisons between stellar models and spectroscopic and photometric observations of globular clusters to examine the models' predictive power. Data from medium- to high-resolution spectroscopic surveys of lithium allow us to investigate first dredge-up and extra mixing in two clusters well separated in metallicity. Abundances at first dredge-up are satisfactorily reproduced, but there is preliminary evidence to suggest that the models overestimate the luminosity at which the surface composition first changes in the lowest-metallicity system. Our models also begin extra mixing at luminosities that are too high, demonstrating a significant discrepancy with observations at low metallicity. We model the abundance changes during extra mixing as a thermohaline process and determine that the usual diffusive form of this mechanism cannot simultaneously reproduce both the carbon and lithium observations. Hubble Space Telescope photometry provides main-sequence turn-off and bump magnitudes in a large number of globular clusters and offers the opportunity to better test stellar modelling as a function of metallicity. We directly compare the predicted main-sequence turn-off and bump magnitudes, as well as the distance-independent parameter $\Delta M_V ~^{\rm{MSTO}}_{\rm{bump}}$, with the observed values. We require 15 Gyr isochrones to match the main-sequence turn-off magnitude in some clusters and cannot match the bump in low-metallicity systems. Changes to the distance modulus, metallicity scale and bolometric corrections may affect the direct comparisons, but $\Delta M_V ~^{\rm{MSTO}}_{\rm{bump}}$, which is also underestimated by the models, can only be improved through changes to the input physics. Overshooting at the base of the convective envelope, with an efficiency that is metallicity dependent, is required to reproduce the empirically determined value of $\Delta M_V ~^{\rm{MSTO}}_{\rm{bump}}$.
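
The distance independence of the turn-off-to-bump magnitude difference can be made explicit. The following is a sketch only: it assumes the difference is taken as turn-off minus bump (the sign convention is an assumption here), and the symbols $V$ for apparent magnitude and $\mu_V$ for the apparent distance modulus are introduced purely for illustration.

\[
\Delta M_V ~^{\rm{MSTO}}_{\rm{bump}}
\equiv M_V^{\rm{MSTO}} - M_V^{\rm{bump}}
= \left(V^{\rm{MSTO}} - \mu_V\right) - \left(V^{\rm{bump}} - \mu_V\right)
= V^{\rm{MSTO}} - V^{\rm{bump}}
\]

Because both features are measured in the same cluster, the same distance modulus applies to each and cancels in the difference, so an error in the adopted distance shifts the individual magnitudes but leaves their difference unchanged.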