In a recent paper [\textit{M. Cristelli, A. Zaccaria and L. Pietronero, Phys. Rev. E 85, 066108 (2012)}], Cristelli \textit{et al.} analysed the relation between skewness and kurtosis for complex dynamical systems and identified two power-law regimes of non-Gaussianity, one scaling with an exponent of 2 and the other with an exponent of $4/3$. The authors concluded that the observed relation is universal in complex dynamical systems. Here, we test the proposed universal relation between skewness and kurtosis with a large number of synthetic data and show that it is in fact not universal and arises only from the small number of data points in the data sets considered. The proposed relation is tested using two different non-Gaussian distributions, namely $q$-Gaussian and Lévy distributions. We show that this relation disappears for sufficiently large data sets provided that the second moment of the distribution is finite. We find that, contrary to the claims of Cristelli \textit{et al.} regarding a power-law scaling regime, the kurtosis saturates to a single value, different from the Gaussian value ($K=3$), as the number of data points is increased. On the other hand, if the second moment of the distribution is infinite, the kurtosis never seems to converge to a single value. The converged kurtosis value for finite-second-moment distributions, and the number of data points needed to reach it, depend on the deviation of the original distribution from the Gaussian case. We also argue that using kurtosis to decide which of two distributions deviates more from the Gaussian can lead to incorrect results for small data sets even when the second moment is finite, and is entirely misleading for infinite-second-moment distributions, where the difference depends on $N$ for all finite $N$.
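The dependence of the sample kurtosis on the number of data points $N$ can be illustrated with a minimal sketch. It is not the procedure used in the paper: as an assumption, it draws from a Student-$t$ distribution with $\nu = 10$ degrees of freedom, which coincides with a $q$-Gaussian with $q = (\nu+3)/(\nu+1) = 13/11$. This value is chosen so that the fourth moment exists and the limiting kurtosis is known in closed form, $K = 3 + 6/(\nu - 4) = 4$. The function names `sample_moments`, `student_t` and `kurtosis_vs_n` are hypothetical helpers introduced here.

```python
import math
import random


def sample_moments(xs):
    """Return (skewness, kurtosis) from central sample moments.

    Kurtosis is the non-excess version, so a Gaussian gives K = 3.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def student_t(rng, nu):
    """Draw one Student-t variate as Z / sqrt(chi2 / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu dof
    return z / math.sqrt(chi2 / nu)


def kurtosis_vs_n(nu, sizes, seed=0):
    """Sample kurtosis of independent data sets of increasing size."""
    rng = random.Random(seed)
    result = {}
    for n in sizes:
        xs = [student_t(rng, nu) for _ in range(n)]
        result[n] = sample_moments(xs)[1]
    return result


if __name__ == "__main__":
    # Assumed illustration: nu = 10, i.e. q = 13/11, limiting K = 4.
    for n, k in kurtosis_vs_n(10, [100, 1000, 10000, 100000]).items():
        print(f"N = {n:>6d}  K = {k:.3f}")
```

For small $N$ the estimate scatters widely from one data set to another, and it settles near the closed-form value only for large $N$. Repeating the experiment with a smaller $\nu$ (a larger $q$) is one way to probe how the required $N$ grows as the distribution moves away from the Gaussian.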
We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.
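The optimization target of a flow-based model is the exact log-likelihood given by the change-of-variables formula, $\log p(x) = \log p_z(f(x)) + \log|\partial f/\partial x|$. The following is a minimal one-dimensional sketch of that formula, not the architecture used in the paper: the single invertible layer $z = \operatorname{asinh}((x-\mu)/s)$ and the names `flow_forward`, `flow_log_likelihood` and `mean_negative_log_likelihood` are assumptions made for illustration, and a real flow would stack many such layers with learned parameters.

```python
import math


def flow_forward(x, mu, s):
    """Map data x to the base variable z; return (z, log|dz/dx|).

    Assumed toy layer: z = asinh((x - mu) / s), which is invertible
    and has dz/dx = 1 / (s * sqrt(1 + u^2)) with u = (x - mu) / s.
    """
    u = (x - mu) / s
    z = math.asinh(u)
    log_det = -math.log(s) - 0.5 * math.log1p(u * u)
    return z, log_det


def flow_log_likelihood(x, mu=0.0, s=1.0):
    """Exact log p(x) under a standard-normal base distribution."""
    z, log_det = flow_forward(x, mu, s)
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return log_base + log_det


def mean_negative_log_likelihood(data, mu, s):
    """Training objective: average negative log-likelihood of the data."""
    return -sum(flow_log_likelihood(x, mu, s) for x in data) / len(data)
```

Minimizing `mean_negative_log_likelihood` over `mu` and `s` on mock data would fit this toy flow. The same quantity, evaluated on new data, is what serves as a data-driven likelihood.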
We propose a method to obtain phase portraits for stochastic systems. Starting from the Fokker-Planck equation, we separate the dynamics into a convective and a diffusive part. We show that stable and unstable fixed points of the convective field correspond to maxima and minima of the stationary probability distribution if the probability current vanishes at these points. Stochastic phase portraits, which are vector plots of the convective field, therefore indicate the extrema of the stationary distribution and can be used to identify stochastic bifurcations that change the number and stability of these extrema. We show that limit cycles in stochastic phase portraits can indicate ridges of the probability distribution, and we identify a novel type of stochastic bifurcation, in which the probability maximum moves to the edge of the system through a gap between the two nullclines of the convective field.
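The link between fixed points of the convective field and extrema of the stationary distribution can be sketched in one dimension. The convention chosen for the Fokker-Planck equation and the symbols $f$, $D$, $\alpha$ and $J$ are assumptions introduced here for illustration; the paper's own notation and normalization may differ.

```latex
\begin{align}
\partial_t p &= -\partial_x\!\left[ f(x)\, p \right]
               + \partial_x^2\!\left[ D(x)\, p \right]
              = -\partial_x J, \\
J &= \alpha(x)\, p - D(x)\, \partial_x p,
  \qquad \alpha(x) \equiv f(x) - D'(x), \\
J = 0 \;&\Rightarrow\; \partial_x p = \frac{\alpha(x)}{D(x)}\, p
  \;\Rightarrow\; \partial_x p\,(x^*) = 0 \iff \alpha(x^*) = 0, \\
\partial_x^2 p\,(x^*) &= \frac{\alpha'(x^*)}{D(x^*)}\, p(x^*).
\end{align}
```

Here $p > 0$ and $D > 0$ are assumed. In a one-dimensional stationary state $J$ is constant, so a current that vanishes at one point vanishes everywhere and the last line follows by differentiating the third. A stable fixed point of the convective field, $\alpha'(x^*) < 0$, then gives a maximum of $p$, and an unstable one, $\alpha'(x^*) > 0$, gives a minimum.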
Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the `curse of dimensionality' in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but also allows global structure to be visualized more transparently. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter ($\Lambda$CDM) model as applied to the Cosmic Microwave Background.
An empirical analysis of interest rates in money and capital markets is performed. We investigate a set of 34 different weekly interest rate time series over a period of 16 years between 1982 and 1997. Our study focuses on the collective behavior of the stochastic fluctuations of these time series, which we investigate using a clustering linkage procedure. Without any a priori assumption, we identify a meaningful separation into 6 main clusters organized in a hierarchical structure.
Methods. We perform numerical simulations of the evolution of the cosmic web for the conventional LCDM model. The simulations cover a wide range of box sizes L = 256 - 4000 Mpc/h, mass and force resolutions, and epochs from very early moments z = 30 to the present moment z = 0. We calculate density fields with various smoothing lengths to find the dependence of the density field on smoothing scale. We calculate the PDF and its moments: variance, skewness and kurtosis. Results. We focus on the third (skewness S) and fourth (kurtosis K) moments of the distribution functions: their dependence on the smoothing scale, the amplitude of fluctuations and the redshift. During the evolution the reduced skewness $S_3 = S/\sigma$ and reduced kurtosis $S_4 = K/\sigma^2$ show complex behaviour: at a fixed redshift, the curves $S_3(\sigma)$ and $S_4(\sigma)$ increase steeply with $\sigma$ at $\sigma \le 1$, then flatten out and become constant at $\sigma \ge 2$. If we fix the smoothing scale $R_t$, then after reaching a maximum at $\sigma \approx 2$, the curves start to decline gradually at large $\sigma$. We provide accurate fits for the evolution of $S_{3,4}(\sigma,z)$. At early epochs, skewness and kurtosis approach constant levels that depend on smoothing length: $S_3(\sigma) \approx 3$ and $S_4(\sigma) \approx 15$. Conclusions. Most statistics of dark matter clustering (e.g., halo mass function or concentration-mass relation) are nearly universal: they depend mostly on $\sigma$, with a relatively modest correction for explicit dependence on redshift. We find just the opposite for skewness and kurtosis: the dependence of the moments on evolutionary epoch $z$ and smoothing length $R_t$ is very different; together they determine the evolution of $S_{3,4}(\sigma)$ uniquely. The evolution of $S_3$ and $S_4$ cannot be described by current theoretical approximations.