
History matching with probabilistic emulators and active learning

Posted by Alfredo Garbuno-Inigo
Publication date: 2020
Research field: Mathematical Statistics
Paper language: English





The scientific understanding of real-world processes has improved dramatically over the years through computer simulations. Such simulators implement complex mathematical models as computer codes that are often expensive to run. The validity of using a particular simulator to draw accurate conclusions relies on the assumption that the computer code is correctly calibrated. This calibration procedure is typically pursued through extensive experimentation and comparison with data from the real-world process. The problem is that data collection may be so expensive that only a handful of experiments are feasible. History matching is a calibration technique that iteratively discards regions of the input space using an implausibility measure. When the simulator is computationally expensive, an emulator is used to explore the input space. In this paper, a Gaussian process emulator provides a full probabilistic output that is incorporated into the implausibility measure. The identification of regions of interest is accomplished with recently developed annealing sampling techniques. Active learning functions are incorporated into the history matching procedure to refocus the search on the input space and improve the emulator. The efficiency of the proposed framework is tested on well-known examples from the history matching literature, as well as on a proposed testbed of higher-dimensional functions.
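To make the ruling-out step concrete, here is a minimal Python sketch of a single history-matching wave: fit a Gaussian process emulator to a handful of simulator runs, compute the implausibility of candidate inputs against an observation, and keep only candidates within the conventional three-sigma cutoff. The toy simulator, the observed value z_obs, the noise sigma_obs, and the cutoff are illustrative assumptions, not details taken from the paper.

```python
# One history-matching wave with a GP emulator (hedged sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # Cheap stand-in for an expensive computer code (hypothetical).
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
X_design = rng.uniform(-1, 1, size=(20, 2))   # initial design points
y_design = simulator(X_design)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_design, y_design)

z_obs, sigma_obs = 0.3, 0.05                  # observation and its noise (assumed)
X_cand = rng.uniform(-1, 1, size=(5000, 2))   # candidate inputs to screen
mu, sd = gp.predict(X_cand, return_std=True)

# Implausibility: distance to the observation in predictive-std units.
impl = np.abs(z_obs - mu) / np.sqrt(sd**2 + sigma_obs**2)
not_ruled_out = X_cand[impl <= 3.0]           # conventional 3-sigma cutoff
print(f"{len(not_ruled_out)} of {len(X_cand)} candidates survive this wave")
```

A subsequent wave would refit the emulator on new simulator runs drawn from the surviving region, which is where the paper's annealing sampling and active learning functions come in.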




Read also

Data-efficient learning algorithms are essential in many practical applications where data collection is expensive, e.g., in robotics due to wear and tear. To address this problem, meta-learning algorithms use prior experience about tasks to learn new, related tasks efficiently. Typically, a set of training tasks is assumed to be given or randomly chosen. However, this setting does not account for the sequential nature that naturally arises when training a model from scratch in real life: how do we collect a set of training tasks in a data-efficient manner? In this work, we introduce task selection based on prior experience into a meta-learning algorithm by conceptualizing the learner and the active meta-learning setting using a probabilistic latent variable model. We provide empirical evidence that our approach improves data efficiency compared with strong baselines on simulated robotic experiments.
Probabilistic forecasts in the form of probability distributions over future events have become popular in several fields, including meteorology, hydrology, economics, and demography. In typical applications, many alternative statistical models and data sources can be used to produce probabilistic forecasts. Hence, evaluating and selecting among competing methods is an important task. The scoringRules package for R provides functionality for the comparative evaluation of probabilistic models based on proper scoring rules, covering a wide range of situations in applied work. This paper discusses implementation and usage details, presents case studies from meteorology and economics, and points to the relevant background literature.
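For intuition about what such scoring-rule evaluation computes, below is a hedged Python analogue of the sample-based CRPS estimator, CRPS ≈ E|X − y| − ½ E|X − X′| over forecast draws; the ensemble and observation are invented for illustration, and this is not the scoringRules implementation itself.

```python
# Sample-based CRPS estimator (hedged Python analogue).
import numpy as np

def crps_sample(y, samples):
    samples = np.asarray(samples)
    term1 = np.mean(np.abs(samples - y))  # E|X - y|
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))  # 0.5 E|X - X'|
    return term1 - term2

rng = np.random.default_rng(3)
forecast = rng.normal(0.0, 1.0, size=500)   # invented ensemble forecast draws
print(crps_sample(0.2, forecast))           # lower scores are better
```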
An emulator is a fast-to-evaluate statistical approximation of a detailed mathematical model (simulator). When used in lieu of simulators, emulators can expedite tasks that require many repeated evaluations, such as sensitivity analyses, policy optimization, model calibration, and value-of-information analyses. Emulators are developed using the output of simulators at specific input values (design points). Developing an emulator that closely approximates the simulator can require many design points, which becomes computationally expensive. We describe a self-terminating active learning algorithm to efficiently develop emulators tailored to a specific emulation task, and compare it with algorithms that optimize geometric criteria (random Latin hypercube sampling and maximum projection designs) and other active learning algorithms (treed Gaussian processes that optimize typical active learning criteria). We compared the algorithms' root mean square error (RMSE) and maximum absolute deviation from the simulator (MAX) on seven benchmark functions and in a prostate cancer screening model. In the empirical analyses, for simulators with greatly varying smoothness over the input domain, active learning algorithms produced emulators with smaller RMSE and MAX for the same number of design points. In all other cases, all algorithms performed comparably. The proposed algorithm attained satisfactory performance in all analyses, had smaller variability than the treed Gaussian processes (it is deterministic), and, on average, had similar or better performance than the treed Gaussian processes on 6 of 7 benchmark functions and in the prostate cancer model.
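The core loop of such design-point selection can be sketched briefly: the snippet below greedily adds the candidate input with the largest predictive standard deviation. The toy model, the fixed budget, and the uncertainty criterion are simplifying assumptions; the paper's task-specific, self-terminating criterion is not reproduced here.

```python
# Variance-based active learning for emulator design (hedged sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_model(x):
    # Hypothetical stand-in for the simulator being emulated.
    return np.sin(5 * x).ravel()

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(5, 1))             # small initial design
y = expensive_model(X)

for _ in range(15):                            # fixed budget, for illustration only
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    X_cand = rng.uniform(0, 1, size=(1000, 1))
    _, sd = gp.predict(X_cand, return_std=True)
    x_new = X_cand[[np.argmax(sd)]]            # most uncertain candidate
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_model(x_new))   # one new simulator run per step
```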
Time series forecasting is an active research topic in academia as well as industry. Although we see increasing adoption of machine learning methods for some of these forecasting challenges, statistical methods remain powerful when dealing with low-granularity data. This paper introduces a refined Bayesian exponential smoothing model with the help of probabilistic programming languages, including Stan. Our model refinements include an additional global trend, a transformation for the multiplicative form, the noise distribution, and the choice of priors. A benchmark study is conducted on a rich set of time series data sets for our models along with other well-known time series models.
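To illustrate just the "additional global trend" refinement, here is a minimal numpy sketch of exponential smoothing with a global linear trend added to the one-step forecast; the Bayesian treatment in Stan (multiplicative transformation, noise distribution, priors) is deliberately omitted, and the update rule and constants are illustrative assumptions.

```python
# Exponential smoothing with an added global trend (hedged sketch).
import numpy as np

def smooth_with_global_trend(y, alpha=0.3, g=0.02):
    level = y[0]
    fitted = [level]
    for t in range(1, len(y)):
        forecast = level + g * t                       # level plus global linear trend
        level = alpha * y[t] + (1 - alpha) * forecast  # standard smoothing update
        fitted.append(forecast)
    return np.array(fitted)

y = np.array([10.0, 10.5, 11.2, 11.0, 12.1, 12.8, 13.0])
print(smooth_with_global_trend(y))   # one-step-ahead fitted values
```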
Numerical integration and emulation are fundamental topics across scientific fields. We propose novel adaptive quadrature schemes based on an active learning procedure. We consider an interpolative approach for building a surrogate posterior density, combining it with Monte Carlo sampling methods and other quadrature rules. The nodes of the quadrature are sequentially chosen by maximizing a suitable acquisition function, which takes into account the current approximation of the posterior and the positions of the nodes. This maximization does not require additional evaluations of the true posterior. We introduce two specific schemes based on Gaussian and nearest neighbors (NN) bases. For the Gaussian case, we also provide a novel procedure for fitting the bandwidth parameter, in order to build a suitable emulator of a density function. With both techniques, we always obtain a positive estimate of the marginal likelihood (a.k.a. Bayesian evidence). An equivalent importance sampling interpretation is also described, which allows the design of extended schemes. Several theoretical results are provided and discussed. Numerical results show the advantage of the proposed approach, including a challenging inference problem in an astronomical dynamical model, with the goal of revealing the number of planets orbiting a star.
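The acquisition idea, placing the next node where the surrogate posterior is large and existing nodes are far away without evaluating the true posterior during the maximization, can be sketched as follows. The nearest-neighbor surrogate, the Gaussian candidate pool, and the multiplicative acquisition are simplifying assumptions rather than the paper's exact schemes.

```python
# Sequential node selection via an acquisition function (hedged sketch).
import numpy as np
from scipy.spatial.distance import cdist

def log_post(x):
    # Hypothetical unnormalized log-posterior (cheap here, expensive in practice).
    return -0.5 * np.sum(x**2, axis=1)

rng = np.random.default_rng(2)
nodes = rng.normal(size=(5, 2))                # initial quadrature nodes
vals = np.exp(log_post(nodes))                 # posterior values at the nodes

for _ in range(20):
    cand = rng.normal(size=(2000, 2))          # candidate locations
    d = cdist(cand, nodes)
    surrogate = vals[np.argmin(d, axis=1)]     # nearest-neighbor surrogate
    # Acquisition: surrogate mass times distance to the closest node;
    # maximizing it needs no new evaluations of the true posterior.
    acq = surrogate * d.min(axis=1)
    x_new = cand[np.argmax(acq)][None, :]
    nodes = np.vstack([nodes, x_new])
    vals = np.append(vals, np.exp(log_post(x_new)))  # evaluate only the chosen node
```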
