In recent years, there has been growing interest in using Precipitable Water Vapor (PWV) derived from Global Positioning System (GPS) signal delays to predict rainfall. However, the occurrence of rainfall depends on a myriad of atmospheric parameters. This paper proposes a systematic approach to analyzing the various parameters that affect precipitation in the atmosphere. Different ground-based weather features such as Temperature, Relative Humidity, Dew Point, Solar Radiation, and PWV, along with Seasonal and Diurnal variables, are identified, and a detailed feature correlation study is presented. While all features play a significant role in rainfall classification, only a few of them, such as PWV, Solar Radiation, Seasonal and Diurnal features, stand out for rainfall prediction. Based on these findings, an optimum set of features is used in a data-driven machine learning algorithm for rainfall prediction. The experimental evaluation using a four-year (2012-2015) database shows a true detection rate of 80.4%, a false alarm rate of 20.3%, and an overall accuracy of 79.6%. Compared to the existing literature, our method significantly reduces the false alarm rate.
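The three scores quoted above can all be read off a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `rain_metrics` is hypothetical, and the false alarm rate is assumed to be FP / (FP + TN), which the abstract does not state explicitly.

```python
def rain_metrics(y_true, y_pred):
    """Return (true detection rate, false alarm rate, accuracy).

    Labels are binary: 1 = rain, 0 = no rain.
    Assumption: false alarm rate = FP / (FP + TN).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # rain, predicted rain
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # rain, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no rain, false alarm
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no rain, correct
    tdr = tp / (tp + fn) if tp + fn else float("nan")
    far = fp / (fp + tn) if fp + tn else float("nan")
    acc = (tp + tn) / len(pairs)
    return tdr, far, acc


# Made-up labels, for illustration only.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
tdr, far, acc = rain_metrics(observed, predicted)
```

Under this definition, a lower false alarm rate means fewer dry intervals are flagged as rain, which is the improvement the abstract claims over earlier work.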
Numerical weather prediction has traditionally been based on physical models of the atmosphere. Recently, however, the rise of deep learning has created increased interest in purely data-driven medium-range weather forecasting, with first studies exploring the feasibility of such an approach. To accelerate progress in this area, the WeatherBench benchmark challenge was defined. Here, we train a deep residual convolutional neural network (ResNet) to predict geopotential, temperature and precipitation at 5.625° resolution up to 5 days ahead. To avoid overfitting and improve forecast skill, we pretrain the model using historical climate model output before fine-tuning on reanalysis data. The resulting forecasts outperform previous submissions to WeatherBench and are comparable in skill to a physical baseline at similar resolution. We also analyze how the neural network creates its predictions and find that, with some exceptions, it is compatible with physical reasoning. Finally, we perform scaling experiments to estimate the potential skill of data-driven approaches at higher resolutions.
Modeling geophysical processes as low-dimensional dynamical systems and regressing their vector field from data is a promising approach for learning emulators of such systems. We show that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), the resulting data-driven models are not only faster than equation-based models but also easier to train than neural networks such as the long short-term memory neural network. They are also more accurate and predictive than the latter. When trained on geophysical observational data, for example the weekly averaged global sea-surface temperature, the proposed technique also shows considerable gains over classical partial differential equation-based models in terms of forecast computational cost and accuracy. When trained on publicly available reanalysis data for the daily temperature of the North American continent, we see significant improvements over classical baselines such as climatology and persistence-based forecast techniques. Although our experiments concern specific examples, the proposed approach is general, and our results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.
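For readers unfamiliar with the two classical baselines named above, the sketch below shows one simple way to form them for a scalar time series. It is an illustration under stated assumptions rather than the paper's setup: the function names are hypothetical, climatology is taken as a single training-period mean (real climatologies are usually computed per calendar day or week), and the temperature values are made up.

```python
import math


def persistence_forecast(series, lead):
    """Predict the value at time t + lead to be the value observed at time t.

    The returned list is aligned with the targets series[lead:].
    """
    if lead < 1 or lead >= len(series):
        raise ValueError("lead must satisfy 1 <= lead < len(series)")
    return list(series[:-lead])


def climatology_forecast(train, n):
    """Predict the training-period mean for each of n target times."""
    mean = sum(train) / len(train)
    return [mean] * n


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Made-up daily temperatures, for illustration only.
temps = [1.0, 2.0, 3.0, 4.0]
targets = temps[1:]
persistence_error = rmse(persistence_forecast(temps, lead=1), targets)
climatology_error = rmse(climatology_forecast(temps[:3], len(targets)), targets)
```

A learned emulator is usually considered skilful only if its error falls below both of these reference errors at the lead time of interest.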
Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used to predict global weather patterns days in advance. First studies show promise, but the lack of a common dataset and evaluation metrics makes intercomparison between studies difficult. Here we present a benchmark dataset for data-driven medium-range weather forecasting, a topic of high scientific interest for atmospheric and computer scientists alike. We provide data derived from the ERA5 archive that has been processed to facilitate its use in machine learning models. We propose simple and clear evaluation metrics which will enable a direct comparison between different methods. Further, we provide baseline scores from simple linear regression techniques, deep learning models, as well as purely physical forecasting models. The dataset is publicly available at https://github.com/pangeo-data/WeatherBench and the companion code is reproducible with tutorials for getting started. We hope that this dataset will accelerate research in data-driven weather forecasting.
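The abstract does not spell out the evaluation metrics. A common choice for global fields on a regular latitude-longitude grid is an RMSE weighted by the cosine of latitude, so that the small grid cells near the poles do not dominate the score. The sketch below illustrates that idea; the function name `lat_weighted_rmse` is hypothetical, and whether it matches the benchmark's exact definition is an assumption.

```python
import math


def lat_weighted_rmse(forecast, truth, lats_deg):
    """Area-weighted RMSE for fields stored as rows of latitude.

    forecast, truth: lists of rows, one row per latitude, one value per longitude.
    lats_deg: latitude in degrees of each row.
    Each row is weighted by cos(lat) divided by the mean of cos(lat) over rows.
    """
    if not (len(forecast) == len(truth) == len(lats_deg)) or not lats_deg:
        raise ValueError("forecast, truth and lats_deg must have matching, non-zero length")
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_lat) / len(cos_lat)
    weights = [c / mean_cos for c in cos_lat]

    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Tiny made-up 3 x 2 grid: the forecast is wrong by 1 only on the equator row.
lats = [-60.0, 0.0, 60.0]
truth = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
forecast = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
score = lat_weighted_rmse(forecast, truth, lats)
```

Because the equator row carries the largest weight, the same error placed on a high-latitude row would yield a smaller score.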
We introduce a data assimilation method to estimate model parameters with observations of passive tracers by directly assimilating Lagrangian Coherent Structures. Our approach differs from the usual Lagrangian Data Assimilation approach, where parameters are estimated based on tracer trajectories. We employ the Approximate Bayesian Computation (ABC) framework to avoid computing the likelihood function of the coherent structure, which is usually unavailable. We solve the ABC by a Sequential Monte Carlo (SMC) method, and use Principal Component Analysis (PCA) to identify the coherent patterns from tracer trajectory data. Our new method shows remarkably improved results compared to the bootstrap particle filter when the physical model exhibits chaotic advection.
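To make the likelihood-free idea concrete, the sketch below shows plain ABC rejection sampling on a toy problem. This is a deliberately simpler stand-in for the Sequential Monte Carlo scheme the abstract describes: the Gaussian toy model, the sample-mean summary statistic (standing in for the PCA-based coherent patterns), the uniform prior, and every function name here are assumptions made for illustration.

```python
import random


def abc_rejection(observed_summary, simulate_summary, sample_prior,
                  tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary. No likelihood is ever evaluated."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate_summary(theta, rng) - observed_summary) < tolerance:
            accepted.append(theta)
    return accepted


def toy_prior(rng):
    """Hypothetical prior: uniform on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


def toy_summary(theta, rng, n=20):
    """Hypothetical forward model: mean of n draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
posterior = abc_rejection(observed_summary=2.0,
                          simulate_summary=toy_summary,
                          sample_prior=toy_prior,
                          tolerance=0.2,
                          n_draws=5000,
                          rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Rejection sampling wastes most prior draws when the tolerance is small. SMC variants of ABC address this by propagating a population of particles through a sequence of decreasing tolerances.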
Weak lensing by large-scale structure is a powerful probe of cosmology if the apparent alignments in the shapes of distant galaxies can be accurately measured. We study the performance of a fully data-driven approach, based on MetaDetection, focusing on the more realistic case of observations with an anisotropic PSF. Under the assumption that PSF anisotropy is the only source of additive shear bias, we show how unbiased shear estimates can be obtained from the observed data alone. To do so, we exploit the finding that the multiplicative shear bias obtained with MetaDetection is nearly insensitive to the PSF ellipticity. In practice, this assumption can be validated by comparing the empirical corrections obtained from observations to those from simulated data. We show that our data-driven approach meets the stringent requirements for upcoming space and ground-based surveys, although further optimisation is possible.