Di Qi, John Harlim (2021)
We propose a Machine Learning (ML) non-Markovian closure modeling framework for accurate predictions of the statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, a configuration that is not desirable in supervised learning with neural network models. In this study with the 40-dimensional Lorenz-96 model, the shortage of data (in time) is due to the stationarity of the statistics beyond the decorrelation time; thus, the only informative content in the training data is the short-time transient statistics. We adopt a unified closure framework on various truncation regimes, including and excluding the detailed dynamical equations for the variances. The closure frameworks employ a Long Short-Term Memory (LSTM) architecture to represent the higher-order unresolved statistical feedbacks, with careful consideration of the intrinsic instability while still producing stable long-time predictions. We find that this unified, model-agnostic ML approach performs well under various truncation scenarios. Numerically, the ML closure model can accurately predict the long-time statistical responses to various time-dependent external forcings that are not in the training dataset, including forcing amplitudes larger than those seen in training.
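A minimal sketch (not the authors' implementation) of the core ingredient described above: an LSTM that maps a short history of resolved statistics to the unresolved statistical feedback term. PyTorch is assumed, and all names (LSTMClosure, n_resolved, n_feedback, the window length) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class LSTMClosure(nn.Module):
    """Non-Markovian closure: a window of resolved statistics -> unresolved feedback."""
    def __init__(self, n_resolved, n_feedback, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_resolved, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_feedback)

    def forward(self, history):
        # history: (batch, memory_length, n_resolved) window of resolved statistics
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :])  # predicted closure term at the current time

# Training on short-time transient statistics (random stand-ins for the data)
model = LSTMClosure(n_resolved=40, n_feedback=40)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.randn(32, 10, 40)   # windows of resolved moments
y = torch.randn(32, 40)       # "true" higher-order feedback from the full model
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

In the actual framework the trained network is coupled back into the truncated statistical equations and integrated forward; the stability considerations mentioned in the abstract are not addressed in this sketch.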
A nonparametric method to predict non-Markovian time series of partially observed dynamics is developed. The prediction problem we consider is a supervised learning task of finding a regression function that takes a delay-embedded observable to the observable at a future time. When delay-embedding theory is applicable, the proposed regression function is a consistent estimator of the flow map induced by the delay embedding. Furthermore, the corresponding Mori-Zwanzig equation governing the evolution of the observable simplifies to only a Markovian term, represented by the regression function. We realize this supervised learning task with a class of kernel-based linear estimators, the kernel analog forecast (KAF), which is consistent in the limit of large data. In a scenario with a high-dimensional covariate space, we employ a Markovian kernel smoothing method that is computationally cheaper than the Nyström projection method for realizing KAF. In addition to the guaranteed theoretical convergence, we numerically demonstrate the effectiveness of this approach on higher-dimensional problems where the relevant kernel features are difficult to capture with the Nyström method. Given noisy training data, we propose a nonparametric smoother as a de-noising method. Numerically, we show that the proposed smoother is more accurate than EnKF and 4D-Var in de-noising signals corrupted by independent (but not necessarily identically distributed) noise, even if the smoother is constructed using a data set corrupted by white noise. We show skillful prediction using the KAF constructed from the de-noised data.
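As an illustration of the kernel analog forecast on a scalar observable, here is a minimal sketch under simplified assumptions (a Gaussian kernel smoother with a fixed bandwidth, no Nyström projection or de-noising step); all function and variable names are hypothetical.

```python
import numpy as np

def delay_embed(x, q):
    # rows are delay vectors (x[i-q+1], ..., x[i])
    return np.stack([x[i - q + 1:i + 1] for i in range(q - 1, len(x))])

def kaf_predict(x_train, q, tau, new_window, bandwidth):
    X = delay_embed(x_train, q)[:-tau]    # delay coordinates of the training series
    y = x_train[q - 1 + tau:]             # observable tau steps ahead of each delay vector
    d2 = np.sum((X - new_window) ** 2, axis=1)
    w = np.exp(-d2 / bandwidth ** 2)
    return np.dot(w, y) / np.sum(w)       # kernel-weighted average over training analogs

# toy usage on a partially observed signal
t = np.linspace(0.0, 40.0, 2000)
x = np.sin(t) + 0.5 * np.sin(0.7 * t)
pred = kaf_predict(x[:1500], q=10, tau=20, new_window=x[1500:1510], bandwidth=0.5)
```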
In this paper, we extend the class of kernel methods known as diffusion maps (DM), and its local kernel variants, to approximate second-order differential operators defined on smooth manifolds with boundaries that naturally arise in elliptic PDE models. To achieve this goal, we introduce the Ghost Point Diffusion Maps (GPDM) estimator on an extended manifold, identified by the point cloud on the unknown original manifold together with a set of ghost points specified along the estimated tangential direction at the sampled boundary points. The resulting GPDM estimator restricts the standard DM matrix to a set of extrapolation equations that estimate the function values at the ghost points. This adjustment is analogous to the classical ghost-point method in finite-difference schemes for solving PDEs on flat domains. As opposed to the classical DM, which diverges near the boundary, the proposed GPDM estimator converges pointwise even near the boundary. Applying the consistent GPDM estimator to solve well-posed elliptic PDEs with classical boundary conditions (Dirichlet, Neumann, and Robin), we establish the convergence of the approximate solution under appropriate smoothness assumptions. We numerically validate the proposed mesh-free PDE solver on various problems defined on simple sub-manifolds embedded in Euclidean spaces as well as on an unknown manifold. Numerically, we also find that GPDM is more accurate than DM in solving elliptic eigenvalue problems on bounded smooth manifolds.
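For orientation, below is a minimal sketch of the standard diffusion maps estimator of the Laplace-Beltrami operator from a point cloud; the ghost-point construction itself (boundary detection, extrapolation equations) is not reproduced, and the normalization and parameter choices shown are illustrative assumptions.

```python
import numpy as np

def dm_laplacian(X, eps):
    # X: (n_points, ambient_dim) samples on the manifold
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (4.0 * eps))
    q = K.sum(axis=1)
    K1 = K / np.outer(q, q)                  # alpha = 1 normalization removes sampling density
    P = K1 / K1.sum(axis=1, keepdims=True)   # Markov normalization
    return (P - np.eye(len(X))) / eps        # estimator of the Laplace-Beltrami operator

# toy usage: points on the unit circle embedded in R^2 (a manifold without boundary)
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
L = dm_laplacian(X, eps=0.01)
```

On a manifold with boundary, this plain estimator loses pointwise consistency near the boundary, which is exactly the issue the ghost points are introduced to repair.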
In this paper, we consider modeling missing dynamics with a nonparametric non-Markovian model, constructed using the theory of kernel embedding of conditional distributions on appropriate Reproducing Kernel Hilbert Spaces (RKHS) equipped with orthonormal basis functions. Depending on the choice of the basis functions, the resulting closure model from this nonparametric modeling formulation takes the form of a parametric model. This suggests that the success of various parametric modeling approaches proposed in various application domains can be understood through the RKHS representations. When the missing dynamical terms evolve faster than the relevant observable of interest, the proposed approach is consistent with the effective dynamics derived from the classical averaging theory. In the linear Gaussian case without a time-scale gap, we show that the proposed non-Markovian model with a very long memory yields an accurate estimate of the nontrivial autocovariance function of the relevant variable of the full dynamics. Supporting numerical results on instructive nonlinear dynamics show that the proposed approach is able to replicate high-dimensional missing dynamical terms on problems with and without a separation of temporal scales.
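A minimal sketch of the underlying idea in its simplest, Markovian special case: the missing term is estimated as a conditional expectation represented in an orthonormal basis of the resolved variable (Hermite polynomials are used here as a stand-in for the data-driven basis; all names are hypothetical). The non-Markovian version would use delay-embedded resolved variables as the covariate.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def fit_closure(x_resolved, z_missing, degree=5):
    # least-squares projection of the missing term onto basis functions of the resolved variable
    Phi = hermevander(x_resolved, degree)            # (n_samples, degree + 1)
    coeffs, *_ = np.linalg.lstsq(Phi, z_missing, rcond=None)
    return lambda x: hermevander(x, degree) @ coeffs

# toy usage: recover a nonlinear missing term from samples
x = np.random.randn(5000)
z = np.tanh(x) + 0.1 * np.random.randn(5000)
closure = fit_closure(x, z)
print(closure(np.array([0.0, 1.0])))
```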
In this paper, we consider a surrogate modeling approach using a data-driven nonparametric likelihood function constructed on a manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function using a spectral expansion formulation known as the kernel embedding of the conditional distribution. To respect the geometry of the data, we employ this spectral expansion using a set of data-driven basis functions obtained from the diffusion maps algorithm. The theoretical error estimate suggests that the error bound of the approximate data-driven likelihood function is independent of the variance of the basis functions, which allows us to determine the amount of training data needed for an accurate likelihood function estimation. Supporting numerical results demonstrating the robustness of the data-driven likelihood functions for parameter estimation are given on instructive examples involving stochastic and deterministic differential equations. When the dimension of the data manifold is strictly less than the dimension of the ambient space, we find that the proposed approach (which does not require knowledge of the data manifold) is superior to likelihood functions constructed using standard parametric basis functions defined on the ambient coordinates. In an example where the data manifold is not smooth and is unknown, the proposed method is more robust than an existing polynomial chaos surrogate model that assumes a parametric likelihood, the non-intrusive spectral projection.
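To make the role of a surrogate likelihood concrete, here is a minimal sketch in which a plain Gaussian kernel density estimate stands in for the diffusion-maps spectral expansion: training data are simulated over a parameter grid, a nonparametric density is built for each parameter value, and the likelihood of the observed data is evaluated on that grid. The observation model and all names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def surrogate_loglik(theta_grid, simulate, observed, n_train=2000):
    logliks = []
    for theta in theta_grid:
        kde = gaussian_kde(simulate(theta, n_train))    # nonparametric estimate of p(y | theta)
        logliks.append(np.sum(np.log(kde(observed) + 1e-300)))
    return np.array(logliks)

# toy usage: estimate the noise amplitude of a scalar observation model
simulate = lambda theta, n: theta * np.random.randn(n)
observed = 1.5 * np.random.randn(200)
thetas = np.linspace(0.5, 3.0, 26)
theta_hat = thetas[np.argmax(surrogate_loglik(thetas, simulate, observed))]
```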
John Harlim (2018)
Modern scientific computational methods are undergoing a transformative change; big data and statistical learning methods now have the potential to outperform the classical first-principles modeling paradigm. This book bridges this transition, connecting the theory of probability, stochastic processes, functional analysis, numerical analysis, and differential geometry. It describes two classes of computational methods to leverage data for modeling dynamical systems. The first is concerned with data fitting algorithms to estimate parameters in parametric models that are postulated on the basis of physical or dynamical laws. The second class is on operator estimation, which uses the data to nonparametrically approximate the operator generated by the transition function of the underlying dynamical systems. This self-contained book is suitable for graduate studies in applied mathematics, statistics, and engineering. Carefully chosen elementary examples with supplementary MATLAB codes and appendices covering the relevant prerequisite materials are provided, making it suitable for self-study.
This paper demonstrates the efficacy of data-driven localization mappings for assimilating satellite-like observations in a dynamical system of intermediate complexity. In particular, a sparse network of synthetic brightness temperature measurements is simulated using an idealized radiative transfer model and assimilated into the monsoon-Hadley multicloud model, a nonlinear stochastic model containing several thousand model coordinates. A serial ensemble Kalman filter is implemented in which the empirical correlation statistics are improved using localization maps obtained from a supervised learning algorithm. The impact of the localization mappings is assessed in perfect-model observing system simulation experiments (OSSEs) as well as in the presence of model errors resulting from the misspecification of key convective closure parameters. In perfect-model OSSEs, the localization mappings, which use adjacent correlations to improve the correlations estimated from small ensemble sizes, produce robust and accurate analysis estimates. In the presence of model error, the filter skills of the localization maps trained on perfect and imperfect model data are comparable.
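A minimal sketch of the localization-map idea (a linear regression stands in for the supervised learning algorithm used in the paper): raw correlations estimated from a small ensemble, together with the correlations at adjacent grid points, are mapped to the correlations obtained from a large reference ensemble. All names and the synthetic data are hypothetical.

```python
import numpy as np

def features(raw_corr):
    # raw_corr: (n_cases, n_grid) small-ensemble sample correlations;
    # features are the correlation at each point and at its two neighbors (periodic grid)
    left, right = np.roll(raw_corr, 1, axis=1), np.roll(raw_corr, -1, axis=1)
    return np.stack([raw_corr, left, right], axis=-1).reshape(-1, 3)

def train_localization_map(raw_corr, reference_corr):
    A, b = features(raw_corr), reference_corr.reshape(-1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lambda c: (features(c) @ w).reshape(c.shape)   # corrected ("localized") correlations

# toy usage with synthetic correlation data on a 40-point periodic grid
rng = np.random.default_rng(0)
reference = np.tile(np.exp(-np.abs(np.arange(40) - 20) / 5.0), (100, 1))
raw = reference + 0.3 * rng.standard_normal(reference.shape)
localize = train_localization_map(raw, reference)
```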
He Zhang, Xiantao Li, et al. (2017)
This paper presents a numerical method to implement the parameter estimation method using response statistics that was recently formulated by the authors. The proposed approach formulates the parameter estimation problem of Itô drift diffusions as a nonlinear least-squares problem. To avoid solving the model repeatedly when an iterative scheme is used to solve the resulting least-squares problem, a polynomial surrogate model is employed on appropriate response statistics with smooth dependence on the parameters. The existence of minimizers of the approximate polynomial least-squares problems that converge to the solution of the true least-squares problem is established under appropriate regularity assumptions on the essential statistics as functions of the parameters. Numerical implementation of the proposed method is conducted on two prototypical examples that belong to classes of models with a wide range of applications, including the Langevin dynamics and stochastically forced gradient flows. Several important practical issues, such as the selection of an appropriate response operator to ensure the identifiability of the parameters and the reduction of the parameter space, are discussed. From the numerical experiments, we find that the proposed approach is superior to the conventional approach that uses equilibrium statistics to determine the parameters.
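A minimal sketch of the surrogate-based least-squares idea, reduced to one scalar parameter and one response statistic: a polynomial in the parameter is fit to the statistic from a few model runs, and the misfit to the observed statistic is then minimized using the surrogate instead of rerunning the model. The stand-in "model response" and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_surrogate(thetas, responses, degree=4):
    coeffs = np.polyfit(thetas, responses, degree)
    return lambda theta: np.polyval(coeffs, theta)

# stand-in model: the response statistic depends smoothly on the parameter
model_response = lambda theta: 1.0 / (2.0 * theta)      # e.g. an equilibrium variance
thetas_train = np.linspace(0.5, 3.0, 8)
surrogate = fit_surrogate(thetas_train, model_response(thetas_train))

observed_stat = model_response(1.7)                      # synthetic "measured" statistic
fit = least_squares(lambda th: surrogate(th[0]) - observed_stat, x0=[1.0], bounds=(0.5, 3.0))
print(fit.x)
```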
Wenrui Hao, John Harlim (2017)
An equation-by-equation (EBE) method is proposed to solve a system of nonlinear equations arising from the moment-constrained maximum entropy problem of multidimensional variables. The design of the EBE method combines ideas from homotopy continuation and Newton's iterative method. Theoretically, we establish local convergence under appropriate conditions and show that the proposed method, geometrically, finds the solution by searching along the surface corresponding to one component of the nonlinear problem. We demonstrate the robustness of the method on various numerical examples, including: (1) a six-moment one-dimensional entropy problem with an explicit solution that contains components of order $10^0$-$10^3$ in magnitude; (2) four-moment multidimensional entropy problems with explicit solutions, where the resulting systems to be solved range from 70 to 310 equations; (3) four- to eight-moment two-dimensional entropy problems whose solutions correspond to the densities of the two leading EOFs of the wind-stress-driven large-scale oceanic model; in this case, we find that the EBE method is more accurate than the classical Newton's method, the MATLAB generic solver, and the previously developed BFGS-based method, which was also tested on this problem; and (4) four-moment constrained entropy problems of up to five dimensions, whose solutions correspond to multidimensional densities of components of the solutions of the Kuramoto-Sivashinsky equation. For the higher-dimensional cases of this example, the EBE method is superior because it automatically selects a subset of the prescribed moment constraints from which the maximum entropy solution can be estimated within the desired tolerance. This selection feature is particularly important since moment-constrained maximum entropy problems do not necessarily have solutions in general.
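For reference, the underlying moment-constrained maximum entropy problem in one dimension can be sketched as follows, using a plain quasi-Newton minimization of the convex dual rather than the EBE scheme; the grid, the number of moments, and the target moments are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-6.0, 6.0, 2001)                 # quadrature grid
dx = x[1] - x[0]

def dual(lam, moments):
    # maxent density p(x) ~ exp(sum_k lam_k x^k); minimize log Z(lam) - lam . moments
    powers = np.vander(x, len(moments) + 1, increasing=True)[:, 1:]   # columns x, x^2, ...
    logZ = np.log(np.sum(np.exp(powers @ lam)) * dx)
    return logZ - lam @ moments

moments = np.array([0.0, 1.0, 0.0, 3.0])          # first four moments of a standard Gaussian
result = minimize(dual, x0=np.zeros(4), args=(moments,), method="BFGS")
lam = result.x                                    # recovered Lagrange multipliers
```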
We propose a nonparametric approach for probabilistic prediction of the AL index trained with AL and solar wind ($vB_z$) data. Our framework relies on the diffusion forecasting technique, which views AL and $vB_z$ data as observables of an autonomous, ergodic, stochastic dynamical system operating on a manifold. Diffusion forecasting builds a data-driven representation of the Markov semigroup governing the evolution of probability measures of the dynamical system. In particular, the Markov semigroup operator is represented in an orthonormal basis acquired from data using the diffusion maps algorithm and Takens delay embeddings. This representation of the evolution semigroup is used in conjunction with a Bayesian filtering algorithm for forecast initialization to predict the probability that the AL index is less than a user-selected threshold over arbitrary lead times and without requiring exogenous inputs. We find that the model produces skillful forecasts out to at least two-hour leads despite gaps in the training data.
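A minimal sketch of the diffusion forecasting pipeline on a toy series, with several simplifications relative to the paper: the data-driven basis is taken from plain diffusion maps on the delay-embedded data, the semigroup is represented by a least-squares (EDMD-style) shift operator in that basis, and the Bayesian filtering initialization and threshold probabilities are omitted. All names and parameter values are hypothetical.

```python
import numpy as np

def delay_embed(x, q):
    return np.stack([x[i - q + 1:i + 1] for i in range(q - 1, len(x))])

def diffusion_basis(X, eps, n_modes):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / eps)
    P /= P.sum(axis=1, keepdims=True)                  # Markov matrix on the samples
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[:n_modes]
    return vecs[:, order].real                         # leading data-driven basis functions

# toy stand-in for the AL / vB_z training series
x = np.cumsum(np.random.randn(400)) * 0.1
phi = diffusion_basis(delay_embed(x, q=6), eps=1.0, n_modes=10)

# one-step shift operator in the basis: phi at time i maps to phi at time i+1
A, *_ = np.linalg.lstsq(phi[:-1], phi[1:], rcond=None)

# propagate the coefficients of an initial (point-mass-like) state two steps ahead
c0 = phi[0]
c2 = c0 @ A @ A
```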