We study the luminosity gap, dm12, between the first and second ranked galaxies in a sample of 59 massive galaxy clusters, using data from the Hale Telescope, HST, Chandra, and Spitzer. We find that the dm12 distribution, p(dm12), is a declining function of dm12, to which we fitted a straight line: p(dm12) propto -(0.13+/-0.02)dm12. The fraction of clusters with large luminosity gaps is p(dm12>=1)=0.37+/-0.08, which represents a 3sigma excess over that obtained from Monte Carlo simulations of a Schechter function that matches the mean cluster galaxy luminosity function. We also identify four clusters with extreme luminosity gaps, dm12>=2, giving a fraction of p(dm12>=2)=0.07+0.05-0.03. More generally, large luminosity gap clusters are relatively homogeneous, with elliptical/disky brightest cluster galaxies (BCGs), cuspy gas density profiles (i.e. strong cool cores), high concentrations, and low substructure fractions. In contrast, small luminosity gap clusters are heterogeneous, spanning the full range of boxy/elliptical/disky BCG morphologies, the full range of cool core strengths and dark matter concentrations, and have large substructure fractions. Taken together, these results imply that the amplitude of the luminosity gap is a function of both the formation epoch, and the recent infall history of the cluster. BCG dominance is therefore a phase that a cluster may evolve through, and is not an evolutionary cul-de-sac. We also compare our results with semi-analytic model predictions based on the Millennium Simulation. None of the models are able to reproduce all of the observational results, underlining the inability of current models to match the empirical properties of BCGs. We identify the strength of AGN feedback and the efficiency with which cluster galaxies are replenished after they merge with the BCG in each model as possible causes of these discrepancies. [Abridged]