Stochastic parameterizations account for uncertainty in the representation of unresolved sub-grid processes by sampling from the distribution of possible sub-grid forcings. Some existing stochastic parameterizations utilize data-driven approaches to characterize uncertainty, but these approaches require significant structural assumptions that can limit their scalability. Machine learning models, including neural networks, are able to represent a wide range of distributions and build optimized mappings between a large number of inputs and sub-grid forcings. Recent research on machine learning parameterizations has focused only on deterministic parameterizations. In this study, we develop a stochastic parameterization using the generative adversarial network (GAN) machine learning framework. The GAN stochastic parameterization is trained and evaluated on output from the Lorenz 96 model, which is a common baseline model for evaluating both parameterization and data assimilation techniques. We evaluate different ways of characterizing the input noise for the model and perform model runs with the GAN parameterization at weather and climate timescales. Some of the GAN configurations perform better than a baseline bespoke parameterization at both timescales, and the networks closely reproduce the spatio-temporal correlations and regimes of the Lorenz 96 system. We also find that, in general, the models that produce skillful weather forecasts are the same ones that yield the best climate simulations.
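A minimal sketch of the kind of conditional GAN described above, not the study's actual architecture: a generator maps the resolved Lorenz 96 state plus a noise vector to a sampled sub-grid forcing, and a discriminator scores (state, forcing) pairs. Layer sizes, the noise dimension, and the training details are illustrative assumptions.

```python
# Hedged sketch of a GAN stochastic parameterization (illustrative, not the paper's code).
import torch
import torch.nn as nn

NOISE_DIM = 4  # assumed size of the latent noise vector

class Generator(nn.Module):
    """Maps (resolved state x_k, noise z) -> sampled sub-grid forcing u_k."""
    def __init__(self, state_dim=1, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + NOISE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

class Discriminator(nn.Module):
    """Scores (x_k, u_k) pairs as real (from the truth run) or generated."""
    def __init__(self, state_dim=1, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, x, u):
        return self.net(torch.cat([x, u], dim=-1))

def train_step(G, D, opt_g, opt_d, x, u_true, bce=nn.BCEWithLogitsLoss()):
    """One adversarial update on a batch of (resolved state, true forcing) pairs."""
    z = torch.randn(x.shape[0], NOISE_DIM)
    u_fake = G(x, z)
    # Discriminator: real pairs -> 1, generated pairs -> 0.
    loss_d = bce(D(x, u_true), torch.ones(x.shape[0], 1)) \
           + bce(D(x, u_fake.detach()), torch.zeros(x.shape[0], 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: try to make generated forcings look real.
    loss_g = bce(D(x, u_fake), torch.ones(x.shape[0], 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

At inference time the trained generator is called with fresh noise at every grid point and time step, which is what makes the parameterization stochastic rather than deterministic.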
A novel method, based on the combination of data assimilation and machine learning, is introduced. The new hybrid approach is designed with a two-fold aim: (i) emulating hidden, possibly chaotic, dynamics and (ii) predicting their future states. The method consists of iteratively applying a data assimilation step, here an ensemble Kalman filter, and a neural network. Data assimilation is used to optimally combine a surrogate model with sparse, noisy data. The resulting analysis is spatially complete and is used as a training set by the neural network to update the surrogate model. The two steps are then repeated iteratively. Numerical experiments have been carried out using the chaotic 40-variable Lorenz 96 model, demonstrating both the convergence and the statistical skill of the proposed hybrid approach. The surrogate model shows short-term forecast skill up to two Lyapunov times, the retrieval of positive Lyapunov exponents, and the recovery of the more energetic frequencies of the power density spectrum. The sensitivity of the method to critical setup parameters is also presented: the forecast skill decreases smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed. The successful synergy between data assimilation and machine learning, demonstrated here with a low-dimensional system, encourages further investigation of such hybrids with more sophisticated dynamics.
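A minimal sketch of the iterative loop described above, not the authors' code: an ensemble Kalman filter analysis uses the current neural surrogate as its forecast model, and the surrogate is then retrained on the spatially complete analysis trajectory. Function names, shapes, and the `surrogate`/`retrain` callables are assumptions.

```python
# Hedged sketch of a hybrid data-assimilation / machine-learning loop.
import numpy as np

def enkf_analysis(Xf, y, H, R):
    """Stochastic EnKF update. Xf: (n, N) forecast ensemble, y: (p,) observations,
    H: (p, n) observation operator, R: (p, p) observation-error covariance."""
    n, N = Xf.shape
    A = Xf - Xf.mean(axis=1, keepdims=True)         # ensemble anomalies
    Pf = A @ A.T / (N - 1)                          # forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Perturbed observations, one realization per ensemble member.
    Y = y[:, None] + np.random.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return Xf + K @ (Y - H @ Xf)                    # analysis ensemble

def hybrid_da_ml(surrogate, retrain, observations, H, R, x0_ens, n_outer=10):
    """surrogate(ens): one-step forecast of the current NN emulator (column-wise);
    retrain(traj): fits the NN on an analysis trajectory and returns the update;
    observations: list of (p,) observation vectors, one per assimilation time."""
    for _ in range(n_outer):                        # outer DA <-> ML iterations
        ens, analyses = x0_ens.copy(), []
        for y in observations:                      # sequential assimilation
            ens = surrogate(ens)                    # forecast step with the NN
            ens = enkf_analysis(ens, y, H, R)       # analysis step
            analyses.append(ens.mean(axis=1))       # spatially complete state
        surrogate = retrain(np.array(analyses))     # update the emulator
    return surrogate
```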
Recently, sampling methods have been successfully applied to enhance the sample quality of Generative Adversarial Networks (GANs). In practice, however, they typically have poor sample efficiency because their proposals are drawn independently from the generator. In this work, we propose REP-GAN, a novel sampling method that allows general dependent proposals by REParameterizing the Markov chains into the latent space of the generator. Theoretically, we show that our reparameterized proposal admits a closed-form Metropolis-Hastings acceptance ratio. Empirically, extensive experiments on synthetic and real datasets demonstrate that REP-GAN markedly improves sample efficiency while simultaneously achieving better sample quality.
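For context, a generic illustration of Metropolis-Hastings sampling in a GAN's latent space with a dependent (random-walk) proposal; this shows the setting REP-GAN improves on, not its exact acceptance rule. The calibrated discriminator supplies a density-ratio correction, and a symmetric Gaussian proposal is assumed so the proposal densities cancel; `G` and `D` are assumed pretrained callables on batched tensors with `D` returning logits.

```python
# Hedged illustration of latent-space Metropolis-Hastings sampling for a GAN.
import torch

def latent_mh(G, D, z0, n_steps=200, step=0.1):
    """Random-walk MH in latent space; z0 has shape (1, latent_dim)."""
    def log_target(z):
        # Standard-normal latent prior plus the discriminator's log density
        # ratio log D/(1-D), which equals the logit when D outputs logits.
        return -0.5 * (z ** 2).sum() + D(G(z)).squeeze()
    samples, z = [], z0.clone()
    with torch.no_grad():
        for _ in range(n_steps):
            z_prop = z + step * torch.randn_like(z)    # symmetric dependent proposal
            log_alpha = log_target(z_prop) - log_target(z)
            if torch.log(torch.rand(())) < log_alpha:  # MH accept/reject
                z = z_prop
            samples.append(G(z))
    return samples
```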
Global climate models represent small-scale processes such as clouds and convection using quasi-empirical models known as parameterizations, and these parameterizations are a leading cause of uncertainty in climate projections. A promising alternative approach is to use machine learning to build new parameterizations directly from high-resolution model output. However, parameterizations learned from three-dimensional model output have not yet been successfully used for simulations of climate. Here we use a random forest to learn a parameterization of subgrid processes from output of a three-dimensional high-resolution atmospheric model. Integrating this parameterization into the atmospheric model leads to stable simulations at coarse resolution that replicate the climate of the high-resolution simulation. The parameterization obeys physical constraints and captures important statistics such as precipitation extremes. The ability to learn from a fully three-dimensional simulation presents an opportunity for learning parameterizations from the wide range of global high-resolution simulations that are now emerging.
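A minimal sketch of this kind of pipeline, under stated assumptions and not the authors' implementation: a random forest is fit to map coarse-grained column profiles from the high-resolution run to diagnosed sub-grid tendencies, then applied inside a coarse model's time step. The shapes and the `coarse_dynamics` function are illustrative.

```python
# Hedged sketch of a random-forest sub-grid parameterization.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_rf_parameterization(X_coarse, Y_subgrid):
    """X_coarse: (n_samples, n_features) coarse-grained states from the
    high-resolution run; Y_subgrid: (n_samples, n_features) diagnosed
    sub-grid tendencies (e.g. heating and moistening rates)."""
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10, n_jobs=-1)
    rf.fit(X_coarse, Y_subgrid)
    return rf

def step_coarse_model(state, rf, coarse_dynamics, dt):
    """One time step: resolved dynamics plus the learned sub-grid tendency.
    Averaging over trees keeps predictions within the range of the training
    tendencies, which helps the coupled run remain stable."""
    resolved_tendency = coarse_dynamics(state)              # assumed user-supplied
    subgrid_tendency = rf.predict(state.reshape(1, -1))[0]  # one tendency per state variable
    return state + dt * (resolved_tendency + subgrid_tendency)
```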
A new framework is proposed for the evaluation of stochastic subgrid-scale parameterizations in the context of MAOOAM, a coupled ocean-atmosphere model of intermediate complexity. Two physically based parameterizations are investigated: the first is based on the singular perturbation of the Markov operator, also known as homogenization; the second is a recently proposed parameterization based on Ruelle's response theory. The two parameterizations are implemented in a rigorous way, assuming however that the relevant statistics of the unresolved scales are Gaussian. They are extensively tested for a low-order version of the model known to exhibit low-frequency variability, and some preliminary results are obtained for an intermediate-order version. Several different configurations of the resolved-unresolved scale separation are then considered. Both parameterizations show remarkable performance in correcting the impact of model errors, being even able to change the modality of the probability distributions. Their respective limitations are also discussed.
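Schematically, and only as an illustrative assumption rather than the exact closures tested in the study, both approaches correct the resolved tendency with deterministic, stochastic, and memory terms:

```latex
% Schematic reduced dynamics for a stochastic subgrid-scale closure (illustration only).
\begin{equation}
  \dot{X} \;=\; F_X(X) \;+\; M_1(X) \;+\; M_2(X)\,\xi(t)
          \;+\; \int_0^{t} M_3\!\big(X(t-s),\,s\big)\,\mathrm{d}s ,
\end{equation}
% where F_X is the resolved tendency, M_1 a deterministic correction, M_2 \xi(t)
% a noise term with \xi(t) Gaussian (consistent with the Gaussian assumption on
% the unresolved-scale statistics), and M_3 a memory kernel.
```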
The advent of generative adversarial networks (GANs) has enabled new capabilities in synthesis, interpolation, and data augmentation heretofore considered very challenging. However, one of the common assumptions in most GAN architectures is that of a simple parametric latent-space distribution. While easy to implement, a simple latent-space distribution can be problematic for uses such as interpolation, because of distributional mismatches when samples are interpolated in the latent space. We present a straightforward formalization of this problem; using basic results from probability theory and off-the-shelf optimization tools, we develop ways to arrive at appropriate non-parametric priors. The obtained prior exhibits unusual qualitative properties in terms of its shape, and quantitative benefits in terms of lower divergence from its mid-point distribution. We demonstrate that our designed prior helps improve image generation along any Euclidean straight line during interpolation, both qualitatively and quantitatively, without any additional training or architectural modifications. The proposed formulation is quite flexible, paving the way to imposing newer constraints on the latent-space statistics.
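A small numerical illustration of the mismatch the paragraph describes (not the paper's method): with a standard Gaussian latent prior in d dimensions, straight-line midpoints of two latent samples have systematically smaller norm than the endpoints, so mid-interpolation codes fall outside the region the generator was trained on. The latent dimension and sample count below are assumptions.

```python
# Hedged demonstration of the latent-space midpoint mismatch under a Gaussian prior.
import numpy as np

d, n = 512, 100_000                      # assumed latent dimension / sample count
rng = np.random.default_rng(0)
z1 = rng.standard_normal((n, d))
z2 = rng.standard_normal((n, d))
mid = 0.5 * (z1 + z2)                    # Euclidean midpoint used in interpolation

print("mean endpoint norm:", np.linalg.norm(z1, axis=1).mean())   # ~ sqrt(d)   ≈ 22.6
print("mean midpoint norm:", np.linalg.norm(mid, axis=1).mean())  # ~ sqrt(d/2) ≈ 16.0
# A prior designed so that midpoints follow the same distribution as the
# endpoints avoids this mismatch without retraining the generator.
```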