Scientists use mathematical modelling to understand and predict the properties of complex physical systems. In highly parameterised models there often exist relationships between parameters over which model predictions are identical, or nearly so. These are known as structural or practical unidentifiabilities, respectively. They are hard to diagnose and make reliable parameter estimation from data impossible. They furthermore imply the existence of an underlying model simplification. We describe a scalable method for detecting unidentifiabilities, and the functional relations defining them, for generic models. This allows for model simplification, and appreciation of which parameters (or functions thereof) cannot be estimated from data. Our algorithm can identify features such as redundant mechanisms and fast timescale subsystems, as well as the regimes in which such approximations are valid. We base our algorithm on a novel quantification of regional parametric sensitivity: multiscale sloppiness. Traditionally, the link between parametric sensitivity and the conditioning of the parameter estimation problem is made locally, through the Fisher Information Matrix. This is valid in the regime of infinitesimal measurement uncertainty. We demonstrate the duality between multiscale sloppiness and the geometry of confidence regions surrounding parameter estimates made where measurement uncertainty is non-negligible. Further theoretical relationships are provided linking multiscale sloppiness to the likelihood-ratio test. From this, we show that a local sensitivity analysis (as typically done) is insufficient for determining the reliability of parameter estimation, even with simple (non)linear systems. Our algorithm provides a tractable alternative. We finally apply our methods to a large-scale, benchmark Systems Biology model of NF-$\kappa$B, uncovering previously unknown unidentifiabilities.
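As a small illustration of the local analysis that the abstract above calls traditional (and not of the multiscale sloppiness algorithm itself), the following sketch computes a Fisher Information Matrix for a toy model and reads off its flat direction. The model $y(t) = \exp(-a b t)$, the parameter values, the time grid, the Gaussian noise level `sigma`, and the function names are all assumptions chosen for this example; only the product $ab$ affects the output, so the model is structurally unidentifiable by construction.

```python
import numpy as np

def model(theta, t):
    """Hypothetical toy model: only the product a*b influences the output."""
    a, b = theta
    return np.exp(-a * b * t)

def fisher_information(theta, t, sigma=1.0, h=1e-6):
    """FIM for i.i.d. Gaussian noise of standard deviation sigma.

    The sensitivity (Jacobian) matrix is approximated by central differences.
    """
    theta = np.asarray(theta, dtype=float)
    jac = np.empty((len(t), len(theta)))
    for k in range(len(theta)):
        step = np.zeros_like(theta)
        step[k] = h
        jac[:, k] = (model(theta + step, t) - model(theta - step, t)) / (2 * h)
    return jac.T @ jac / sigma**2

t = np.linspace(0.0, 5.0, 50)       # assumed measurement times
theta = np.array([2.0, 0.5])        # assumed nominal parameters (a, b)

fim = fisher_information(theta, t)
eigvals, eigvecs = np.linalg.eigh(fim)   # eigenvalues in ascending order
flat_direction = eigvecs[:, 0]           # direction of (near-)zero sensitivity

print("eigenvalues:", eigvals)
print("flat direction:", flat_direction)
```

In this toy case the smallest eigenvalue is numerically zero and the associated eigenvector is parallel to $(a, -b)$, the tangent to the curve $ab = \text{const}$. This is a purely local statement at the chosen parameter point, which is the limitation the abstract highlights.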
Fitting a simplifying model with several parameters to real data from complex objects is a highly nontrivial task, but it makes it possible to gain insight into the objects' physics. Here, we present a method to infer the parameters of the model, the model error, and the statistics of the model error. This method relies on the simultaneous analysis of many data sets in order to overcome the problems caused by the degeneracy between model parameters and model error. Errors in the modeling of the measurement instrument can be absorbed into the model error, allowing for applications with complex instruments.
We show that density models describing multiple observables with (i) hard boundaries and (ii) dependence on external parameters may be created using an auto-regressive Gaussian mixture model. The model is designed to capture how observable spectra are deformed by hypothesis variations, and is made more expressive by projecting data onto a configurable latent space. It may be used as a statistical model for scientific discovery in interpreting experimental observations, for example when constraining the parameters of a physical model or tuning simulation parameters according to calibration data. The model may also be sampled for use within a Monte Carlo simulation chain, or used to estimate likelihood ratios for event classification. The method is demonstrated on simulated high-energy particle physics data considering the anomalous electroweak production of a $Z$ boson in association with a dijet system at the Large Hadron Collider, and the accuracy of inference is tested using a realistic toy example. The developed methods are domain agnostic; they may be used within any field to perform simulation or inference where a dataset consisting of many real-valued observables has conditional dependence on external parameters.
We briefly review the concepts underlying complex systems and probability distributions. The latter are often taken as the first quantitative characteristics of complex systems, allowing one to detect the possible occurrence of regularities, providing a step toward defining a classification of the different levels of organization (the ``universality classes''). A rapid survey covers the Gaussian law, the power law and the stretched exponential distributions. The fascination for power laws is then explained, starting from the statistical physics approach to critical phenomena, out-of-equilibrium phase transitions, self-organized criticality, and ending with a large but not exhaustive list of mechanisms leading to power law distributions. A check-list for testing and qualifying a power law distribution from your data is described in 7 steps. This essay enlarges the description of distributions by proposing that ``kings'', i.e., events even beyond the extrapolation of the power law tail, may reveal information which is complementary and perhaps sometimes even more important than the power law distribution. We conclude with a list of future directions.
Many complex systems, including networks, are not static but can display strong fluctuations at various time scales. Characterizing the dynamics in complex networks is thus of the utmost importance in the understanding of these networks and of the dynamical processes taking place on them. In this article, we study the example of the US airport network in the time period 1990-2000. We show that even if the statistical distributions of most indicators are stationary, an intense activity takes place at the local (`microscopic') level, with many disappearing/appearing connections (links) between airports. We find that connections have a very broad distribution of lifetimes, and we introduce a set of metrics to characterize the link dynamics. We observe in particular that the links which disappear have essentially the same properties as the ones which appear, and that links which connect airports with very different traffic are very volatile. Motivated by this empirical study, we propose a model of dynamical networks, inspired by previous studies on firm growth, which reproduces most of the empirical observations both for the stationary statistical distributions and for the dynamical properties.
Scaling regions -- intervals on a graph where the dependent variable depends linearly on the independent variable -- abound in dynamical systems, notably in calculations of invariants like the correlation dimension or a Lyapunov exponent. In these applications, scaling regions are generally selected by hand, a process that is subjective and often challenging due to noise, algorithmic effects, and confirmation bias. In this paper, we propose an automated technique for extracting and characterizing such regions. Starting with a two-dimensional plot -- e.g., the values of the correlation integral, calculated using the Grassberger-Procaccia algorithm over a range of scales -- we create an ensemble of intervals by considering all possible combinations of endpoints, generating a distribution of slopes from least-squares fits weighted by the length of the fitting line and the inverse square of the fit error. The mode of this distribution gives an estimate of the slope of the scaling region (if it exists). The endpoints of the intervals that correspond to the mode provide an estimate for the extent of that region. When there is no scaling region, the distributions will be wide and the resulting error estimates for the slope will be large. We demonstrate this method for computations of dimension and Lyapunov exponent for several dynamical systems, and show that it can be useful in selecting values for the parameters in time-delay reconstructions.
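The ensemble procedure described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `scaling_region`, the minimum interval size `min_points`, the use of a weighted histogram to locate the mode, the RMS residual as the "fit error", and the rule of reporting the heaviest interval in the modal bin as the region's extent are all choices made here for brevity.

```python
import numpy as np

def scaling_region(x, y, min_points=5, bins=50, eps=1e-12):
    """Estimate the slope and extent of a scaling region from an ensemble of fits.

    Every interval [x[i], x[j]] with at least `min_points` points gets a
    least-squares line. Each slope is weighted by the length of the fitted
    segment divided by the squared fit error (RMS residual, an assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes, weights, spans = [], [], []
    n = len(x)
    for i in range(n - min_points + 1):
        for j in range(i + min_points - 1, n):
            xs, ys = x[i:j + 1], y[i:j + 1]
            slope, intercept = np.polyfit(xs, ys, 1)
            fit_error = np.sqrt(np.mean((ys - (slope * xs + intercept)) ** 2))
            dx = xs[-1] - xs[0]
            length = np.hypot(dx, slope * dx)   # length of the fitted segment
            slopes.append(slope)
            weights.append(length / (fit_error ** 2 + eps))
            spans.append((xs[0], xs[-1]))
    slopes = np.array(slopes)
    weights = np.array(weights)

    # Mode of the weighted slope distribution (histogram stand-in for a density).
    hist, edges = np.histogram(slopes, bins=bins, weights=weights)
    k = int(np.argmax(hist))
    mode_slope = 0.5 * (edges[k] + edges[k + 1])

    # Extent: the most heavily weighted interval whose slope lies in the modal bin.
    in_mode = (slopes >= edges[k]) & (slopes <= edges[k + 1])
    best = np.flatnonzero(in_mode)[np.argmax(weights[in_mode])]
    return mode_slope, spans[best]
```

Calling `scaling_region(log_r, log_C)` on, say, the logarithms of scale and correlation integral would return a slope estimate and one representative interval. The cost grows quadratically with the number of points, and a wide, multi-peaked histogram would suggest that no clear scaling region exists.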