No Arabic abstract
This paper presents a new variational data assimilation (VDA) approach for the formal treatment of bias in both model outputs and observations. This approach relies on the Wasserstein metric stemming from the theory of optimal mass transport to penalize the distance between the probability histograms of the analysis state and an a priori reference dataset, which is likely to be more uncertain but less biased than both model and observations. Unlike previous bias-aware VDA approaches, the new Wasserstein metric VDA (WM-VDA) dynamically treats systematic biases of unknown magnitude and sign in both model and observations through assimilation of the reference data in the probability domain and can fully recover the probability histogram of the analysis state. The performance of WM-VDA is compared with the classic three-dimensional VDA (3D-Var) scheme on first-order linear dynamics and the chaotic Lorenz attractor. Under positive systematic biases in both model and observations, we consistently demonstrate a significant reduction in the forecast bias and unbiased root mean squared error.
In this paper, we present an ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike the Eulerian penalization of error in the Euclidean space, the Wasserstein metric can capture translation and difference between the shapes of square-integrable probability distributions of the background state and observations -- enabling to formally penalize geophysical biases in state-space with non-Gaussian distributions. The new approach is applied to dissipative and chaotic evolutionary dynamics and its potential advantages and limitations are highlighted compared to the classic variational and filtering data assimilation approaches under systematic and random errors.
Optimal control of a stochastic dynamical system usually requires a good dynamical model with probability distributions, which is difficult to obtain due to limited measurements and/or complicated dynamics. To solve it, this work proposes a data-driven distributionally robust control framework with the Wasserstein metric via a constrained two-player zero-sum Markov game, where the adversarial player selects the probability distribution from a Wasserstein ball centered at an empirical distribution. Then, the game is approached by its penalized version, an optimal stabilizing solution of which is derived explicitly in a linear structure under the Riccati-type iterations. Moreover, we design a model-free Q-learning algorithm with global convergence to learn the optimal controller. Finally, we verify the effectiveness of the proposed learning algorithm and demonstrate its robustness to the probability distribution errors via numerical examples.
Many information retrieval algorithms rely on the notion of a good distance that allows to efficiently compare objects of different nature. Recently, a new promising metric called Word Movers Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and provide more accurate and computationally efficient matching between documents using entropic regularization. We evaluate the benefits of both extensions in the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in terms of Mean Reciprocal Rank compared to several baselines.
The use of data assimilation technique to identify optimal topography is discussed in frames of time-dependent motion governed by non-linear barotropic ocean model. Assimilation of artificially generated data allows to measure the influence of various error sources and to classify the impact of noise that is present in observational data and model parameters. The choice of assimilation window is discussed. Assimilating noisy data with longer windows provides higher accuracy of identified topography. The topography identified once by data assimilation can be successfully used for other model runs that start from other initial conditions and are situated in other parts of the models attractor.
The mixed-logit model is a flexible tool in transportation choice analysis, which provides valuable insights into inter and intra-individual behavioural heterogeneity. However, applications of mixed-logit models are limited by the high computational and data requirements for model estimation. When estimating on small samples, the Bayesian estimation approach becomes vulnerable to over and under-fitting. This is problematic for investigating the behaviour of specific population sub-groups or market segments with low data availability. Similar challenges arise when transferring an existing model to a new location or time period, e.g., when estimating post-pandemic travel behaviour. We propose an Early Stopping Bayesian Data Assimilation (ESBDA) simulator for estimation of mixed-logit which combines a Bayesian statistical approach with Machine Learning methodologies. The aim is to improve the transferability of mixed-logit models and to enable the estimation of robust choice models with low data availability. This approach can provide new insights into choice behaviour where the traditional estimation of mixed-logit models was not possible due to low data availability, and open up new opportunities for investment and planning decisions support. The ESBDA estimator is benchmarked against the Direct Application approach, a basic Bayesian simulator with random starting parameter values and a Bayesian Data Assimilation (BDA) simulator without early stopping. The ESBDA approach is found to effectively overcome under and over-fitting and non-convergence issues in simulation. Its resulting models clearly outperform those of the reference simulators in predictive accuracy. Furthermore, models estimated with ESBDA tend to be more robust, with significant parameters with signs and values consistent with behavioural theory, even when estimated on small samples.