Tests of sunspot number sequences: 3. Effects of regression procedures on the calibration of historic sunspot data


Abstract in English

We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups $R_B$ above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of $R_B$ are then re-scaled to the observed RGO group number $R_A$ using a variety of regression techniques. It is found that a very high correlation between $R_A$ and $R_B$ ($r_{AB} > 0.98$) does not prevent large errors in the intercalibration (e.g. sunspot maximum values can be over 30% too large even for such levels of $r_{AB}$). In generating the backbone sunspot number, Svalgaard and Schatten [2015] force regression fits to pass through the origin of the scatter plot, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (Q-Q) plots to test for a normal distribution of the residuals is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
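The difference between a fit forced through the origin and an ordinary least squares fit with a free intercept can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the series `r_a` is a synthetic stand-in for annual RGO group numbers, and the lower-acuity series `r_b` is produced by a toy linear-with-offset model rather than by the area-threshold counting applied to the RGO data. The helper `qq_correlation` is a hypothetical name; it returns the correlation between the sorted residuals and the corresponding normal quantiles, which is a numerical summary of how straight a Q-Q plot would be.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Synthetic annual means over four 11-year cycles (NOT RGO data).
years = np.arange(44)
r_a = 1.0 + 5.0 * np.sin(np.pi * years / 11.0) ** 2

# Toy assumption: the lower-acuity observer misses a roughly fixed number
# of small groups, so r_b is not proportional to r_a.
r_b = np.clip(0.8 * r_a - 0.6 + rng.normal(0.0, 0.05, r_a.size), 0.0, None)

# Fit 1: r_a = k * r_b, forced through the origin.
k_origin = np.sum(r_a * r_b) / np.sum(r_b ** 2)
fit_origin = k_origin * r_b

# Fit 2: ordinary least squares with a free intercept, r_a = s * r_b + c.
slope, intercept = np.polyfit(r_b, r_a, 1)
fit_ols = slope * r_b + intercept


def qq_correlation(residuals):
    """Correlation between sorted residuals and normal quantiles.

    Values close to 1 indicate a nearly straight Q-Q plot.
    """
    n = residuals.size
    probs = (np.arange(n) + 0.5) / n  # simple plotting positions
    normal_q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return np.corrcoef(normal_q, np.sort(residuals))[0, 1]


peak_true = r_a.max()
peak_origin = fit_origin.max()
peak_ols = fit_ols.max()

print(f"true peak           : {peak_true:.3f}")
print(f"peak, through origin: {peak_origin:.3f}")
print(f"peak, OLS + offset  : {peak_ols:.3f}")
print(f"Q-Q corr, origin fit: {qq_correlation(r_a - fit_origin):.3f}")
print(f"Q-Q corr, OLS fit   : {qq_correlation(r_a - fit_ols):.3f}")
```

In this toy setting the true relation has a positive intercept, so the forced-through-origin slope is steeper than the true one and the re-scaled peak exceeds the true peak. The size of the effect here depends entirely on the assumed offset and says nothing quantitative about the RGO results.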
