Bias-Variance Trade-off and Model Selection for Proton Radius Extractions


Abstract in English

Intuitively, a scientist might assume that a more complex regression model will necessarily yield a better predictive model of experimental data. Herein, we disprove this notion in the context of extracting the proton charge radius from charge form factor data. Using a Monte Carlo study, we show that a simpler regression model can in certain cases be the better predictive model. This is especially true with noisy data where the complex model will fit the noise instead of the physical signal. Thus, in order to select the appropriate regression model to employ, a clear technique should be used such as the Akaike information criterion or Bayesian information criterion, and ideally selected previous to seeing the results. Also, to ensure a reasonable fit, the scientist should also make regression quality plots, such as residual plots, and not just rely on a single criterion such as reduced chi2. When we apply these techniques to low four-momentum transfer cross section data, we find a proton radius that is consistent with the muonic Lamb shift results. While presented for the case of proton radius extraction, these concepts are applicable in general and can be used to illustrate the necessity of balancing bias and variance when building a regression model and validating results, ideas that are at the heart of modern machine learning algorithms.

Download