We present a model-free data-driven inference method that enables inferences on system outcomes to be derived directly from empirical data without the need for intervening modeling of any type, be it modeling of a material law or modeling of a prior distribution of material states. We specifically consider physical systems with states characterized by points in a phase space determined by the governing field equations. We assume that the system is characterized by two likelihood measures: one, $\mu_D$, measuring the likelihood of observing a material state in phase space; and another, $\mu_E$, measuring the likelihood of states satisfying the field equations, possibly under random actuation. We introduce a notion of intersection between measures which can be interpreted to quantify the likelihood of system outcomes. We provide conditions under which the intersection can be characterized as the athermal limit $\mu_\infty$ of entropic regularizations $\mu_\beta$, or thermalizations, of the product measure $\mu = \mu_D \times \mu_E$ as $\beta \to +\infty$. We also supply conditions under which $\mu_\infty$ can be obtained as the athermal limit of carefully thermalized sequences $(\mu_{h,\beta_h})$ of empirical data sets $(\mu_h)$ approximating weakly an unknown likelihood function $\mu$. In particular, we find that the cooling sequence $\beta_h \to +\infty$ must be slow enough, corresponding to quenching, in order for the proper limit $\mu_\infty$ to be delivered. Finally, we derive explicit analytic expressions for expectations $\mathbb{E}[f]$ of outcomes $f$ that are explicit in the data, thus demonstrating the feasibility of the model-free data-driven paradigm as regards making convergent inferences directly from the data without recourse to intermediate modeling steps.
The data-driven computing paradigm initially introduced by Kirchdoerfer and Ortiz (2016) enables finite element computations in solid mechanics to be performed directly from material data sets, without an explicit material model. In terms of computational effort, the most challenging task is the projection of admissible states at material points onto their closest states in the material data set. In this study, we develop and compare several possible data structures for solving the nearest-neighbor problem. We show that approximate nearest-neighbor (ANN) algorithms can accelerate material data searches by several orders of magnitude relative to exact searching algorithms. The approximations are suggested by, and adapted to, the structure of the data-driven iterative solver and result in no significant loss of solution accuracy. We assess the performance of the ANN algorithm with respect to material data set size with the aid of a 3D elasticity test case. We show that computations on a single processor with up to one billion material data points are feasible within a few seconds of execution time, with a speedup of more than $10^6$ with respect to exact k-d trees.
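As a point of reference for the projection step described above, the following is a minimal sketch of an exact brute-force nearest-neighbor search, i.e. the baseline that tree-based and approximate searches aim to accelerate. It assumes a one-dimensional (strain, stress) phase space and an energy-like distance weighted by a numerical constant `C`; the function name `project_onto_data`, the weighting, and the sample data are illustrative assumptions and are not taken from the study.

```python
from math import inf

def project_onto_data(state, data, C=1.0):
    """Return the data point closest to `state`, and its distance.

    Assumed distance (illustrative): 0.5*C*(de)^2 + 0.5*(ds)^2/C,
    where de and ds are the strain and stress differences.
    """
    e, s = state
    best, best_d = None, inf
    for e_i, s_i in data:  # exact search: visit every data point
        d = 0.5 * C * (e - e_i) ** 2 + 0.5 * (s - s_i) ** 2 / C
        if d < best_d:
            best, best_d = (e_i, s_i), d
    return best, best_d

# Hypothetical material data set and a local admissible state
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
closest, dist = project_onto_data((0.9, 2.1), data, C=2.0)
print(closest, dist)
```

The cost of this loop grows linearly with the number of data points for every material point and every solver iteration, which is why the choice of search data structure dominates the overall run time for large data sets.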
Programmers often leverage data structure libraries that provide useful and reusable abstractions. Modular verification of programs that make use of these libraries naturally relies on specifications that capture important properties about how the library expects these data structures to be accessed and manipulated. However, these specifications are often missing or incomplete, making it hard for clients to be confident they are using the library safely. When library source code is also unavailable, as is often the case, the challenge of inferring meaningful specifications is further exacerbated. In this paper, we present a novel data-driven abductive inference mechanism that infers specifications for library methods sufficient to enable verification of the library's clients. Our technique combines a data-driven learning-based framework to postulate candidate specifications, along with SMT-provided counterexamples to refine these candidates, taking special care to prevent generating specifications that overfit to sampled tests. The resulting specifications form a minimal set of requirements on the behavior of library implementations that ensures safety of a particular client program. Our solution thus provides a new multi-abduction procedure for precise specification inference of data structure libraries guided by client-side verification tasks. Experimental results on a wide range of realistic OCaml data structure programs demonstrate the effectiveness of the approach.
We discuss two projects in non-linear cosmostatistics applicable to very large surveys of galaxies. The first is a Bayesian reconstruction of galaxy redshifts and their number density distribution from approximate, photometric redshift data. The second focuses on cosmic voids and uses them to construct cosmic spheres that allow reconstructing the expansion history of the Universe using the Alcock-Paczynski test. In both cases we find that non-linearities enable the methods or enhance the results: non-linear gravitational evolution creates voids and our photo-z reconstruction works best in the highest density (and hence most non-linear) portions of our simulations.
In this paper we show how nuisance parameter marginalized posteriors can be inferred directly from simulations in a likelihood-free setting, without having to jointly infer the higher-dimensional interesting and nuisance parameter posterior first and marginalize a posteriori. The result is that for an inference task with a given number of interesting parameters, the number of simulations required to perform likelihood-free inference can be kept (roughly) the same irrespective of the number of additional nuisances to be marginalized over. To achieve this we introduce two extensions to the standard likelihood-free inference set-up. Firstly, we show how nuisance parameters can be re-cast as latent variables and hence automatically marginalized over in the likelihood-free framework. Secondly, we derive an asymptotically optimal compression from $N$ data down to $n$ summaries (one per interesting parameter) such that the Fisher information is (asymptotically) preserved, but the summaries are insensitive (to leading order) to the nuisance parameters. This means that the nuisance marginalized inference task involves learning $n$ interesting parameters from $n$ nuisance hardened data summaries, regardless of the presence or number of additional nuisance parameters to be marginalized over. We validate our approach on two examples from cosmology: supernovae and weak lensing data analyses with nuisance parameterized systematics. For the supernova problem, high-fidelity posterior inference of $\Omega_m$ and $w_0$ (marginalized over systematics) can be obtained from just a few hundred data simulations. For the weak lensing problem, six cosmological parameters can be inferred from $\mathcal{O}(10^3)$ simulations, irrespective of whether ten additional nuisance parameters are included in the problem or not.
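To illustrate what a summary that is insensitive to a nuisance parameter can look like, here is a minimal sketch under strong simplifying assumptions: a toy linear model d = mu + a*theta + b*eta + noise with unit-variance Gaussian noise, one interesting parameter theta and one nuisance parameter eta. The summary is the score for theta with the nuisance score projected out using Fisher matrix elements. The model, the vectors `a` and `b`, and the function name `hardened_summary` are hypothetical choices for illustration and may differ from the compression derived in the paper.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def hardened_summary(d, mu, a, b):
    """Score for theta with the nuisance direction projected out.

    Toy model (assumed): d = mu + a*theta + b*eta + unit-variance noise.
    """
    r = [x - m for x, m in zip(d, mu)]   # residual about the fiducial mean
    t_theta = dot(a, r)                  # score w.r.t. the interesting parameter
    t_eta = dot(b, r)                    # score w.r.t. the nuisance parameter
    f_te = dot(a, b)                     # Fisher cross term
    f_ee = dot(b, b)                     # nuisance Fisher information
    return t_theta - (f_te / f_ee) * t_eta

mu = [0.0, 0.0, 0.0]
a = [1.0, 2.0, 0.5]
b = [0.5, -1.0, 2.0]
d = [0.3, 1.1, -0.4]
# Same data realization, but with the nuisance parameter shifted by 0.7
shifted = [x + 0.7 * y for x, y in zip(d, b)]

print(hardened_summary(d, mu, a, b), hardened_summary(shifted, mu, a, b))
```

In this linear toy model the two printed values agree up to rounding, because a shift along `b` changes both scores in a way that cancels exactly; for nonlinear models the cancellation would hold only to leading order.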
Data-driven inference was recently introduced as a protocol that, upon the input of a set of data, outputs a mathematical description for a physical device able to explain the data. The device so inferred is automatically self-consistent, that is, capable of generating all given data, and least committal, that is, consistent with a minimal superset of the given dataset. When applied to the inference of an unknown device, data-driven inference has been shown to always output the true device whenever the dataset has been produced by means of an observationally complete setup, which plays here the same role played by informationally complete setups in conventional quantum tomography. In this paper we develop a unified formalism for the data-driven inference of states and measurements. In the case of qubits, in particular, we provide an explicit implementation of the inference protocol as a convex programming algorithm for the machine learning of states and measurements. We also derive a complete characterization of observational completeness for general systems, from which it follows that only spherical 2-designs achieve observational completeness for qubit systems. This result provides symmetric informationally complete sets and mutually unbiased bases with a new theoretical and operational justification.
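For readers unfamiliar with spherical 2-designs, the following sketch checks the standard moment conditions for a finite set of unit vectors in three dimensions (Bloch vectors, in the qubit case): the vectors must sum to zero, and their second-moment matrix must equal (N/3) times the identity. The function name `is_spherical_2_design` and the example sets are illustrative; the code assumes its inputs are already unit vectors, and it is not the characterization procedure of the paper.

```python
from math import sqrt

def is_spherical_2_design(vectors, tol=1e-9):
    """Check first and second moments of a set of unit 3-vectors."""
    n = len(vectors)
    # First moment: the vectors must sum to zero.
    for i in range(3):
        if abs(sum(v[i] for v in vectors)) > tol:
            return False
    # Second moment: sum of outer products must equal (n/3) * identity.
    for i in range(3):
        for j in range(3):
            m = sum(v[i] * v[j] for v in vectors)
            target = n / 3 if i == j else 0.0
            if abs(m - target) > tol:
                return False
    return True

s = 1 / sqrt(3)
# Regular tetrahedron: Bloch vectors of a qubit SIC set
tetrahedron = [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]
# Octahedron: Bloch vectors of the three qubit mutually unbiased bases
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Three orthogonal directions only: the first moment does not vanish
three_axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

print(is_spherical_2_design(tetrahedron))
print(is_spherical_2_design(octahedron))
print(is_spherical_2_design(three_axes))
```

The tetrahedron and octahedron pass both conditions, which is consistent with the statement above about symmetric informationally complete sets and mutually unbiased bases, while the three positive axis directions alone do not.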