We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these divergences, hereafter referred to as $(f,\Gamma)$-divergences, provide a notion of "distance" between probability measures, and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,\Gamma)$-divergences inherit features from IPMs, such as the ability to compare distributions that are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. Combined, these features yield a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed sample distributions that are not absolutely continuous. We also show improved performance and stability over the gradient-penalized Wasserstein GAN in image generation.
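For orientation, here is a sketch of the form such a divergence typically takes. It is not quoted from this abstract, and the precise admissibility conditions on $f$ and on the test-function class $\Gamma$ are omitted. $f^*$ denotes the Legendre transform of $f$ and $\mathcal{P}(S)$ the probability measures on the underlying space $S$. The first line is the variational definition. The second is the infimal-convolution form behind the "two-stage" reading: $P$ is first redistributed to an intermediate $\eta$ (the $D_f$ cost), and $\eta$ is then transported to $Q$ (the IPM cost $W^\Gamma$).

```latex
D_f^{\Gamma}(Q\|P) = \sup_{g\in\Gamma}\Big\{ E_Q[g] - \Lambda_f^P[g] \Big\},
\qquad
\Lambda_f^P[g] = \inf_{\nu\in\mathbb{R}}\Big\{ \nu + E_P\big[f^*(g-\nu)\big] \Big\},

D_f^{\Gamma}(Q\|P) = \inf_{\eta\in\mathcal{P}(S)}\Big\{ D_f(\eta\|P) + W^{\Gamma}(Q,\eta) \Big\},
\qquad
W^{\Gamma}(Q,\eta) = \sup_{g\in\Gamma}\big\{ E_Q[g] - E_\eta[g] \big\}.
```

Taking $\Gamma$ to be all bounded measurable functions recovers $D_f$. Taking $\Gamma$ to be the $1$-Lipschitz functions makes $W^\Gamma$ the $1$-Wasserstein distance.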
95 - Jeremiah Birrell 2020
Distributionally robust optimization (DRO) is a widely used framework for optimizing objective functionals in the presence of both randomness and model-form uncertainty. A key step in the practical solution of many DRO problems is a tractable reformulation of the optimization over the chosen model ambiguity set, which is generally infinite-dimensional. Previous works have solved this problem in the case where the objective functional is an expected value. In this paper we study objective functionals that are the sum of an expected value and a variance penalty term. We prove that the corresponding variance-penalized DRO problem over an $f$-divergence neighborhood can be reformulated as a finite-dimensional convex optimization problem. This result also provides tight uncertainty quantification bounds on the variance.
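The abstract notes that earlier work handled objectives that are plain expected values. For a KL-divergence ball (one particular $f$-divergence), that case has the well-known one-dimensional dual $\sup_{R(Q\|P)\le\eta} E_Q[F] = \inf_{\lambda>0}\{\lambda\eta + \lambda\log E_P[e^{F/\lambda}]\}$. The Python sketch below evaluates this dual for a discrete $P$. It does not implement the variance-penalized reformulation proved in the paper. The function name and the search bounds on $\log\lambda$ are hypothetical choices.

```python
import math


def kl_dro_upper_bound(f_vals, p, eta, t_lo=-8.0, t_hi=8.0, iters=200):
    """Worst-case E_Q[F] over the KL ball {Q : KL(Q||P) <= eta}, P discrete.

    Evaluates the classical (expected-value) dual
        inf_{lam > 0} lam * eta + lam * log E_P[exp(F / lam)]
    by ternary search over t = log(lam). The dual is convex in lam, hence
    unimodal in t. Search bounds are an illustrative assumption.
    """
    def dual(t):
        lam = math.exp(t)
        scaled = [f / lam for f in f_vals]
        m = max(scaled)  # log-sum-exp shift for numerical stability
        log_mgf = m + math.log(sum(pi * math.exp(s - m) for s, pi in zip(scaled, p)))
        return lam * eta + lam * log_mgf

    lo, hi = t_lo, t_hi
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if dual(a) < dual(b):
            hi = b
        else:
            lo = a
    return dual((lo + hi) / 2)


if __name__ == "__main__":
    f_vals = [0.0, 1.0, 2.0, 3.0]
    p = [0.4, 0.3, 0.2, 0.1]
    # The tilted model Q_c ∝ exp(c F) P attains the sup when eta = KL(Q_c||P).
    c = 0.7
    w = [pi * math.exp(c * f) for f, pi in zip(f_vals, p)]
    q = [wi / sum(w) for wi in w]
    eta = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
    print(kl_dro_upper_bound(f_vals, p, eta), sum(qi * f for qi, f in zip(q, f_vals)))
```

The tilted-model check works because, at $\lambda = 1/c$, the dual objective equals $E_{Q_c}[F]$ exactly. The two printed numbers should therefore agree up to the search tolerance.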
We derive a new variational formula for the Renyi family of divergences, $R_\alpha(Q\|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this Renyi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions and is also key in our development of a consistency theory for Renyi divergence estimators. By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthen…
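As a concrete check, the Python sketch below uses discrete distributions and assumes the normalization $R_\alpha(Q\|P)=\frac{1}{\alpha(\alpha-1)}\log E_P[(dQ/dP)^\alpha]$. It also assumes the objective $\frac{1}{\alpha-1}\log E_Q[e^{(\alpha-1)g}]-\frac{1}{\alpha}\log E_P[e^{\alpha g}]$, which reduces to the Donsker-Varadhan objective as $\alpha\to1$. Both forms are stated here as assumptions, not quoted from the abstract. The sketch checks three things: the optimizer $g^*=\log(dQ/dP)$ attains the divergence, other test functions give smaller values, and adding a constant to $g$ does not change the objective.

```python
import math


def renyi(q, p, alpha):
    """R_alpha(Q||P) = log sum_i p_i (q_i/p_i)^alpha / (alpha (alpha - 1)), assumed normalization."""
    return math.log(sum(pi * (qi / pi) ** alpha for qi, pi in zip(q, p))) / (alpha * (alpha - 1))


def renyi_objective(g, q, p, alpha):
    """Variational objective 1/(alpha-1) log E_Q[e^{(alpha-1)g}] - 1/alpha log E_P[e^{alpha g}]."""
    term_q = math.log(sum(qi * math.exp((alpha - 1) * gi) for qi, gi in zip(q, g))) / (alpha - 1)
    term_p = math.log(sum(pi * math.exp(alpha * gi) for pi, gi in zip(p, g))) / alpha
    return term_q - term_p


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]
    p = [0.2, 0.3, 0.5]
    g_star = [math.log(qi / pi) for qi, pi in zip(q, p)]  # log-likelihood ratio
    for alpha in (0.5, 2.0):
        print(alpha, renyi(q, p, alpha), renyi_objective(g_star, q, p, alpha),
              renyi_objective([0.3, -0.2, 0.5], q, p, alpha))
```

The inequality objective $\le R_\alpha$ for arbitrary $g$ follows from Hölder's inequality, for $\alpha>1$ and for $0<\alpha<1$ alike. In practice, a neural-network estimator maximizes this objective over a parametrized family of $g$.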
Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) they can be estimated from data as statistical averages; 2) such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and thereby accelerating statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, computing the functional Hessian of the objective functionals reveals local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations using neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences on both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
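A familiar instance of tightening through an auxiliary optimization compares two KL objectives. The Legendre-transform objective $E_Q[g]-E_P[e^{g-1}]$ becomes the Donsker-Varadhan objective $E_Q[g]-\log E_P[e^g]$ once it is maximized over a constant shift $\nu$ of $g$. The maximizer is $\nu^*=\log E_P[e^g]-1$. The Python sketch below checks this on discrete distributions. It illustrates the general idea only; whether it matches the paper's specific construction is not claimed, and the helper names are hypothetical.

```python
import math


def dv_objective(g, q, p):
    """Donsker-Varadhan objective E_Q[g] - log E_P[e^g]."""
    return sum(qi * gi for qi, gi in zip(q, g)) - math.log(sum(pi * math.exp(gi) for pi, gi in zip(p, g)))


def lt_objective(g, q, p):
    """Legendre-transform (f-divergence) objective E_Q[g] - E_P[e^(g-1)]."""
    return sum(qi * gi for qi, gi in zip(q, g)) - sum(pi * math.exp(gi - 1) for pi, gi in zip(p, g))


def lt_after_optimal_shift(g, q, p):
    """Auxiliary problem sup_nu LT(g - nu), solved in closed form: nu* = log E_P[e^g] - 1."""
    nu = math.log(sum(pi * math.exp(gi) for pi, gi in zip(p, g))) - 1
    return lt_objective([gi - nu for gi in g], q, p)


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]
    p = [0.2, 0.3, 0.5]
    g = [0.3, -0.2, 0.5]  # an arbitrary, non-optimal test function
    print(lt_objective(g, q, p), lt_after_optimal_shift(g, q, p), dv_objective(g, q, p))
```

Because $x/e\ge\log x$ for all $x>0$, the DV value is never below the Legendre value for the same $g$. Both objectives still share the KL divergence as their supremum.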
Rare events, and more general risk-sensitive quantities-of-interest (QoIs), are significantly impacted by uncertainty in the tail behavior of a distribution. Uncertainty in the tail can take many different forms, each of which leads to a particular ambiguity set of alternative models. Distributional robustness bounds over such an ambiguity set constitute a stress-test of the model. In this paper we develop a method, utilizing Renyi-divergences, of constructing the ambiguity set that captures a user-specified form of tail-perturbation. We then obtain distributional robustness bounds (performance guarantees) for risk-sensitive QoIs over these ambiguity sets, using the known connection between Renyi-divergences and robustness for risk-sensitive QoIs. We also expand on this connection in several ways, including a generalization of the Donsker-Varadhan variational formula to Renyi divergences, and various tightness results. These ideas are illustrated through applications to uncertainty quantification in a model of lithium-ion battery failure, robustness of large deviations rate functions, and risk-sensitive distributionally robust optimization for option pricing.
In this paper we provide performance guarantees for hypocoercive non-reversible MCMC samplers $X_t$ with invariant measure $\mu^*$; our results apply in particular to the Langevin equation, Hamiltonian Monte Carlo, and the bouncy particle and zig-zag samplers. Specifically, we establish a concentration inequality of Bernstein type for ergodic averages $\frac{1}{T}\int_0^T f(X_t)\,dt$. As a consequence we provide performance guarantees: (a) explicit non-asymptotic confidence intervals for $\int f\,d\mu^*$ when using a finite-time ergodic average with given initial condition $\mu$, and (b) uncertainty quantification bounds, expressed in terms of the relative entropy rate, on the bias of $\int f\,d\mu^*$ when using an alternative or approximate process $\widetilde{X}_t$. (Results in (b) generalize recent results (arXiv:1812.05174) from the authors for coercive dynamics.) The concentration inequality is proved by combining the approach via Feynman-Kac semigroups first noted by Wu with the hypocoercive estimates of Dolbeault, Mouhot and Schmeiser (arXiv:1005.1495), developed for the Langevin equation and recently generalized to partially deterministic Markov processes by Andrieu et al. (arXiv:1808.08592).
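To make the estimated quantity concrete, the Python sketch below computes a finite-time ergodic average $\frac{1}{T}\int_0^T f(X_t)\,dt$ along a discretized underdamped Langevin path. This is one of the hypocoercive samplers named above, started from a fixed initial condition. The harmonic potential, unit temperature, friction, step size, and horizon are all assumptions chosen so that $\int f\,d\mu^*$ is known: $\mu^*$ has an $N(0,1)$ position marginal, so $f(x)=x^2$ should average to about $1$. The sketch does not reproduce the paper's concentration inequality or its constants.

```python
import math
import random


def langevin_ergodic_average(f, x0=3.0, v0=0.0, gamma=1.0, dt=0.01, T=2000.0, seed=0):
    """Left Riemann sum for (1/T) * integral_0^T f(X_t) dt along an underdamped Langevin path.

    Assumed dynamics (unit mass, potential U(x) = x^2/2, unit temperature):
        dX = V dt,   dV = (-X - gamma V) dt + sqrt(2 gamma) dW.
    Semi-implicit Euler step (velocity first, then position), which adds O(dt) bias
    to the invariant measure.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    x, v = x0, v0
    n = int(T / dt)
    noise = math.sqrt(2.0 * gamma * dt)
    total = 0.0
    for _ in range(n):
        total += f(x) * dt
        v += (-x - gamma * v) * dt + noise * rng.gauss(0.0, 1.0)
        x += v * dt
    return total / (n * dt)


if __name__ == "__main__":
    print(langevin_ergodic_average(lambda x: x * x))  # target: E_{mu*}[x^2] = 1
```

The finite-$T$ error seen here (statistical fluctuation plus a small transient from $x_0=3$) is the kind of deviation that a Bernstein-type inequality bounds non-asymptotically.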
Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantities of interest on path-space, such as hitting times and exponentially discounted observables, and develop the corresponding UQ bounds. In addition, our method yields tighter UQ bounds, even in cases where previous relative-entropy-based methods also apply, e.g., for ergodic averages. We illustrate these results with examples from option pricing, non-reversible diffusion processes, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times.
Information-theory based variational principles have proven effective at providing scalable uncertainty quantification (i.e. robustness) bounds for quantities of interest in the presence of nonparametric model-form uncertainty. In this work, we combine such variational formulas with functional inequalities (Poincaré, $\log$-Sobolev, Liapunov functions) to derive explicit uncertainty quantification bounds for time-averaged observables, comparing a Markov process to a second (not necessarily Markov) process. These bounds are well-behaved in the infinite-time limit and apply to steady states of both discrete- and continuous-time Markov processes.
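The basic variational UQ bound that such work builds on follows from the Donsker-Varadhan formula. For any $c>0$, $E_Q[F]-E_P[F]\le\frac{1}{c}\big(\log E_P[e^{c(F-E_P[F])}]+R(Q\|P)\big)$, and one takes the infimum over $c$. The Python sketch below evaluates this static form for discrete distributions. It does not implement the paper's time-averaged bounds or the functional-inequality estimates. The finite grid for $c$ is a hypothetical choice; every grid point still yields a valid bound.

```python
import math


def log_mean_exp(weights, exponents):
    """log sum_i w_i exp(a_i), with the log-sum-exp shift to avoid overflow."""
    m = max(exponents)
    return m + math.log(sum(w * math.exp(a - m) for w, a in zip(weights, exponents)))


def kl(q, p):
    """Relative entropy R(Q||P) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def goal_oriented_bound(f_vals, q, p, cs=None):
    """Upper bound on E_Q[F] - E_P[F]:
        inf_{c>0} (1/c) * ( log E_P[exp(c (F - E_P F))] + R(Q||P) ),
    with the infimum approximated on a log-spaced grid of c (an assumption).
    """
    if cs is None:
        cs = [10.0 ** (k / 20) for k in range(-60, 61)]  # c from 1e-3 to 1e3
    mean_p = sum(pi * f for pi, f in zip(p, f_vals))
    r = kl(q, p)
    return min((log_mean_exp(p, [c * (f - mean_p) for f in f_vals]) + r) / c for c in cs)


if __name__ == "__main__":
    f_vals = [0.0, 1.0, 2.0, 3.0]
    p = [0.4, 0.3, 0.2, 0.1]          # baseline model
    q = [0.35, 0.3, 0.22, 0.13]       # alternative model
    diff = sum((qi - pi) * f for qi, pi, f in zip(q, p, f_vals))
    upper = goal_oriented_bound(f_vals, q, p)
    lower = -goal_oriented_bound([-f for f in f_vals], q, p)  # apply the bound to -F
    print(lower, diff, upper)
```

Only $P$ and the scalar $R(Q\|P)$ enter the bound, which is what makes this family of bounds scalable when $Q$ is hard to simulate.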
We show that the non-integer effective number of neutrinos $N^{\mathrm{eff}}_\nu$ can be understood as an effect of lepton $L$ asymmetry in the early Universe carried by the Dirac neutrino cosmic background. We show that $N_\nu^{\mathrm{eff}}=3.36\pm0.34$ (CMB only) and $N_\nu^{\mathrm{eff}}=3.62\pm0.25$ (CMB and $H_0$) require the ratio between baryon number $B$ and lepton number to be $1.16\times 10^{-9}\leqslant B/|L|\leqslant 1.51\times 10^{-9}$. These values are close to the baryon-to-photon ratio $0.57\times 10^{-9}\leqslant B/N_\gamma\leqslant 0.67\times 10^{-9}$. Thus, instead of the usual $|L|\ll N_\gamma$ and $B\simeq |L|$, we propose $0.4\leqslant |L|/N_\gamma\leqslant 0.52$ and $B\ll|L|$ as another natural choice, which resolves the tension between Planck-CMB and $H_0$ and leads to a non-integer value of $N_\nu^{\mathrm{eff}}>3$.
288 - Jeremiah Birrell, Jan Wehr 2018
We study the small-mass (overdamped) limit of Langevin equations for a particle in a potential and/or magnetic field with matrix-valued and state-dependent drift and diffusion. We utilize a bootstrapping argument to derive a hierarchy of approximate equations for the position degrees of freedom that are able to achieve accuracy of order $m^{\ell/2}$ over compact time intervals for any $\ell\in\mathbb{Z}^+$. This generalizes prior derivations of the homogenized equation for the position degrees of freedom in the $m\to 0$ limit, which result in order $m^{1/2}$ approximations. Our results cover bounded forces, for which we prove convergence in $L^p$ norms, and unbounded forces, in which case we prove convergence in probability.
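The Python sketch below illustrates only the leading-order statement. It simulates an inertial Langevin path and its overdamped ($m\to0$) counterpart under the same Brownian increments, then reports their sup-norm gap on $[0,T]$. The gap should shrink as $m$ decreases, roughly like $m^{1/2}$. Scalar constant drag $\gamma$, constant noise $\sigma$, the force $F(q)=-q$, and all numerical parameters are simplifying assumptions. The paper's matrix-valued, state-dependent setting and its higher-order hierarchy are not reproduced.

```python
import math
import random


def max_position_gap(m, gamma=1.0, sigma=1.0, q0=1.0, T=1.0, dt=1e-4, seed=0):
    """Sup over [0, T] of |q_inertial - q_overdamped| for one shared noise path.

    Inertial:    dq = v dt,   m dv = (F(q) - gamma v) dt + sigma dW
    Overdamped:  gamma dq = F(q) dt + sigma dW          (leading-order limit)
    with F(q) = -q (assumed force), both started at q0, inertial velocity v(0) = 0.
    """
    rng = random.Random(seed)  # same seed, hence same increments, for every m
    q, v, q_od = q0, 0.0, q0
    gap = 0.0
    for _ in range(int(T / dt)):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Drag treated implicitly so the step stays stable when m is small relative to dt;
        # as m -> 0 this update reduces exactly to the overdamped Euler step below.
        v = (m * v - q * dt + sigma * dw) / (m + gamma * dt)
        q += v * dt
        q_od += (-q_od * dt + sigma * dw) / gamma
        gap = max(gap, abs(q - q_od))
    return gap


if __name__ == "__main__":
    for m in (1e-1, 1e-2, 1e-3):
        print(m, max_position_gap(m))
```

Reusing one seed makes the comparison pathwise, matching the sense of the convergence statements above over a compact time interval.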