Fisher forecasts are a common tool in cosmology with applications ranging from survey planning to the development of new cosmological probes. While frequently adopted, they are subject to numerical instabilities that need to be carefully investigated to ensure accurate and reproducible results. This research note discusses these challenges using the example of a weak lensing data vector and proposes procedures that help mitigate them.
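A frequent source of such instability is the step size used for the finite-difference derivatives of the data vector. The following minimal Python sketch (not the note's actual procedure; the toy `model` function and all numerical values are hypothetical placeholders) illustrates one standard check: recompute the forecast for a sequence of step sizes and verify that the forecasted errors plateau and that the Fisher matrix stays well conditioned.

```python
import numpy as np

def central_diff(model, theta0, i, step):
    """Central finite difference of the model data vector w.r.t. parameter i."""
    tp, tm = theta0.copy(), theta0.copy()
    tp[i] += step
    tm[i] -= step
    return (model(tp) - model(tm)) / (2.0 * step)

def fisher_matrix(model, theta0, cov, steps):
    """F_ij = d_i^T C^{-1} d_j with finite-difference derivatives d_i."""
    cinv = np.linalg.inv(cov)
    derivs = [central_diff(model, theta0, i, s) for i, s in enumerate(steps)]
    n = len(theta0)
    F = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            F[i, j] = derivs[i] @ cinv @ derivs[j]
    return F

# Toy stand-in for a weak-lensing data vector (hypothetical; in practice this
# would be a full C_ell prediction pipeline).
model = lambda t: np.array([t[0] + t[1]**2, np.sin(t[0] * t[1]), t[0] * t[1]])
theta0 = np.array([1.0, 0.5])
cov = 0.01 * np.eye(3)

# Stability check: the forecasted errors should plateau as the step shrinks;
# if they keep drifting, the derivatives (and hence the forecast) are unconverged.
for step in (1e-1, 1e-2, 1e-3, 1e-4):
    F = fisher_matrix(model, theta0, cov, [step] * len(theta0))
    errs = np.sqrt(np.diag(np.linalg.inv(F)))
    print(f"step={step:.0e}  cond(F)={np.linalg.cond(F):.2e}  sigma={errs}")
```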
Starting from the Fisher matrix for counts in cells, I derive the full Fisher matrix for surveys of multiple tracers of large-scale structure. The key assumption is that the inverse of the covariance of the galaxy counts is given by the naive matrix inverse of the covariance in a mixed position-space and Fourier-space basis. I then compute the Fisher matrix for the power spectrum in bins of the three-dimensional wavenumber k; the Fisher matrix for functions of position x (or redshift z), such as the linear bias of the tracers and/or the growth function; and the cross-terms of the Fisher matrix that express the correlations between estimates of the power spectrum and estimates of the bias. When the bias and growth function are fully specified, and the Fourier-space bins are large enough that the covariance between them can be neglected, the Fisher matrix for the power spectrum reduces to the widely used result first derived by Feldman, Kaiser and Peacock (1994). Assuming isotropy, an exact calculation of the Fisher matrix can be performed in the case of a constant-density, volume-limited survey. I then show how the exact Fisher matrix in the general case can be obtained in terms of a series of volume-limited surveys.
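For reference, the Feldman, Kaiser and Peacock limit mentioned above is usually quoted in the form popularized by Tegmark (1997), where $\bar{n}(\mathbf{x})$ is the mean number density of the tracer and $V_{\rm eff}$ is the effective survey volume:
\[
F_{ij} = \frac{1}{2}\int \frac{d^3k}{(2\pi)^3}\,
\frac{\partial \ln P(\mathbf{k})}{\partial \theta_i}\,
\frac{\partial \ln P(\mathbf{k})}{\partial \theta_j}\,
V_{\rm eff}(\mathbf{k}),
\qquad
V_{\rm eff}(\mathbf{k}) = \int d^3x \left[\frac{\bar{n}(\mathbf{x})\,P(\mathbf{k})}{1+\bar{n}(\mathbf{x})\,P(\mathbf{k})}\right]^2 .
\]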
We compare Baryonic Acoustic Oscillation (BAO) and Redshift Space Distortion (RSD) measurements from recent galaxy surveys with their Fisher-matrix-based predictions. Measurements of the position of the BAO signal lead to constraints on the comoving angular diameter distance $D_{M}$ and the Hubble distance $D_{H}$ that agree well with their Fisher-matrix-based expectations. However, RSD-based measurements of the growth rate $f\sigma_{8}$ do not agree with the predictions made before the surveys were undertaken, even when repeating those predictions using the actual survey parameters. We show that this is due to a combination of effects, including degeneracies with the geometric parameters $D_{M}$ and $D_{H}$ and optimistic assumptions about the scale down to which the linear signal can be extracted. We show that measurements using current data and large-scale modelling techniques extract an amount of signal equivalent to that contained in the linear regime for $k < 0.08\,h\,{\rm Mpc}^{-1}$, remarkably independent of the sample properties and redshifts covered.
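To illustrate how strongly the forecasted $f\sigma_{8}$ error depends on the assumed maximum wavenumber, here is a hedged toy sketch (not the paper's forecasting code): a two-parameter Fisher calculation over the amplitudes $A=b\sigma_8$ and $B=f\sigma_8$ for a cosmic-variance-limited survey with the power-spectrum shape held fixed. The survey volume and fiducial amplitudes are hypothetical.

```python
import numpy as np

# Toy Fisher forecast: P(k, mu) proportional to (A + B*mu^2)^2 with fixed
# shape, in the cosmic-variance-limited regime (nP >> 1).
A, B = 1.2, 0.45        # illustrative values of b*sigma8 and f*sigma8
V = 1.0e9               # hypothetical survey volume in (Mpc/h)^3

def sigma_fs8(kmax, nk=400, nmu=200):
    k = np.linspace(1e-3, kmax, nk)
    mu = np.linspace(0.0, 1.0, nmu)
    K, MU = np.meshgrid(k, mu, indexing="ij")
    d_A = 2.0 / (A + B * MU**2)                 # dlnP/dA
    d_B = 2.0 * MU**2 / (A + B * MU**2)         # dlnP/dB
    w = V / (4.0 * np.pi**2) * K**2             # mode-counting weight
    dk, dmu = k[1] - k[0], mu[1] - mu[0]
    F = np.array([[np.sum(w * di * dj) * dk * dmu
                   for dj in (d_A, d_B)] for di in (d_A, d_B)])
    return np.sqrt(np.linalg.inv(F)[1, 1])      # marginalized over A

for kmax in (0.05, 0.08, 0.10, 0.15):
    print(f"k_max = {kmax:.2f} h/Mpc -> sigma(f*sigma8) = {sigma_fs8(kmax):.4f}")
```

Because the mode count grows as $k_{\rm max}^3$, an optimistic choice of $k_{\rm max}$ inflates the forecasted precision substantially, which is the effect the comparison above quantifies.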
In a Bayesian context, theoretical parameters are correlated random variables. Consequently, the constraints on one parameter can be improved either by measuring that parameter more precisely or by measuring the other parameters more precisely. Especially when many parameters are involved, a lengthy process of guesswork is then needed to determine the most efficient way to improve one parameter's constraints. In this short article, we highlight an extremely simple analytical expression that replaces this guesswork and facilitates a deeper understanding of optimization with interdependent parameters.
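As a concrete illustration of the mechanism (a generic Fisher-matrix sketch, not the article's analytical expression): adding an independent Gaussian prior of width $\sigma_p$ on one parameter adds $1/\sigma_p^2$ to the corresponding diagonal Fisher element, and the marginalized error on any correlated parameter shrinks as a result. All numbers below are hypothetical.

```python
import numpy as np

# Toy 2x2 Fisher matrix with strong correlation between the two parameters.
F = np.array([[4.0, 3.0],
              [3.0, 4.0]])

def marg_error(F, i):
    """Marginalized 1-sigma error on parameter i: sqrt((F^-1)_ii)."""
    return np.sqrt(np.linalg.inv(F)[i, i])

print("sigma(theta_0), no prior:", marg_error(F, 0))

# Tightening a prior on theta_1 alone improves the constraint on theta_0.
for sigma_p in (1.0, 0.3, 0.1):
    Fp = F.copy()
    Fp[1, 1] += 1.0 / sigma_p**2      # Gaussian prior on theta_1 only
    print(f"sigma(theta_0), prior {sigma_p} on theta_1:", marg_error(Fp, 0))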
We show how to obtain constraints on $\beta=f/b$, the ratio of the matter growth rate to the bias, which quantifies the linear redshift-space distortions, that are independent of the cosmological model, using multiple tracers of large-scale structure. For a single tracer the uncertainty on $\beta$ is set by the uncertainties in the amplitude and shape of the power spectrum, which are limited by cosmic variance. However, for two or more tracers this limit does not apply, since cosmic variance cancels out when taking the ratio of power spectra, and in the linear (Kaiser) approximation one measures directly the quantity $(1+\beta_1\mu^2)^2/(1+\beta_2\mu^2)^2$, where $\mu$ is the cosine of the angle between a given mode and the line of sight. We provide analytic formulae for the Fisher matrix for one and two tracers, and quantify the signal-to-noise ratio needed to make effective use of the multiple-tracer technique. We also forecast the errors on $\beta$ for a survey like Euclid.
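To see the cancellation explicitly: in the Kaiser approximation (neglecting shot noise), each tracer has $P_i(k,\mu)=b_i^2(1+\beta_i\mu^2)^2 P_m(k)$, so the matter power spectrum $P_m(k)$, and with it the cosmic variance in the amplitude and shape, drops out of the ratio:
\[
\frac{P_1(k,\mu)}{P_2(k,\mu)}
= \frac{b_1^2\,(1+\beta_1\mu^2)^2\,P_m(k)}{b_2^2\,(1+\beta_2\mu^2)^2\,P_m(k)}
= \frac{b_1^2}{b_2^2}\,\frac{(1+\beta_1\mu^2)^2}{(1+\beta_2\mu^2)^2}.
\]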
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomenon that the choice of the learning rate strongly influences generalization. We first show that stochastic gradient descent (SGD) implicitly penalizes the trace of the Fisher Information Matrix (FIM), a measure of the local curvature, from the start of training. We argue that it acts as an implicit regularizer in SGD by showing that explicitly penalizing the trace of the FIM can significantly improve generalization. We highlight that poor final generalization coincides with the trace of the FIM attaining a large value early in training, which we refer to as catastrophic Fisher explosion. Finally, to gain insight into the regularization effect of penalizing the trace of the FIM, we show that it limits memorization by reducing the learning speed of examples with noisy labels more than that of examples with clean labels.
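A minimal PyTorch sketch of the kind of estimator involved (an assumption on my part, not the paper's released code): the FIM trace is commonly approximated by the squared norm of a mini-batch gradient computed with labels sampled from the model's own predictive distribution (the exact trace would use per-example gradients); `create_graph=True` makes the estimate differentiable, so it can be added to the training loss as an explicit penalty.

```python
import torch
import torch.nn.functional as F_nn

def fisher_trace_proxy(model, inputs):
    """Squared norm of the mini-batch gradient with model-sampled labels,
    a cheap differentiable proxy for the trace of the FIM."""
    logits = model(inputs)
    with torch.no_grad():
        # Sampling labels from p_theta(.|x) distinguishes the FIM from the
        # empirical Fisher (which would use the true labels instead).
        sampled = torch.distributions.Categorical(logits=logits).sample()
    logp = F_nn.log_softmax(logits, dim=1)
    nll = F_nn.nll_loss(logp, sampled)
    grads = torch.autograd.grad(nll, list(model.parameters()), create_graph=True)
    return sum((g ** 2).sum() for g in grads)

# Usage on a toy classifier; the penalty weight would be a tuned hyperparameter.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
x = torch.randn(8, 10)
print(float(fisher_trace_proxy(model, x)))
```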