No Arabic abstract
We present a comparative study of different probabilistic forecasting techniques on the task of predicting the electrical load of secondary substations and cabinets located in a low voltage distribution grid, as well as their aggregated power profile. The methods are evaluated using standard KPIs for deterministic and probabilistic forecasts. We also compare the ability of different hierarchical techniques in improving the bottom level forecasters performances. Both the raw and cleaned datasets, including meteorological data, are made publicly available to provide a standard benchmark for evaluating forecasting algorithms for demand-side management applications.
Demand forecasting is a central component of the replenishment process for retailers, as it provides crucial input for subsequent decision making like ordering processes. In contrast to point estimates, such as the conditional mean of the underlying probability distribution, or confidence intervals, forecasting complete probability density functions allows to investigate the impact on operational metrics, which are important to define the business strategy, over the full range of the expected demand. Whereas metrics evaluating point estimates are widely used, methods for assessing the accuracy of predicted distributions are rare, and this work proposes new techniques for both qualitative and quantitative evaluation methods. Using the supervised machine learning method Cyclic Boosting, complete individual probability density functions can be predicted such that each prediction is fully explainable. This is of particular importance for practitioners, as it allows to avoid black-box models and understand the contributing factors for each individual prediction. Another crucial aspect in terms of both explainability and generalizability of demand forecasting methods is the limitation of the influence of temporal confounding, which is prevalent in most state of the art approaches.
Intermittency is a common and challenging problem in demand forecasting. We introduce a new, unified framework for building intermittent demand forecasting models, which incorporates and allows to generalize existing methods in several directions. Our framework is based on extensions of well-established model-based methods to discrete-time renewal processes, which can parsimoniously account for patterns such as aging, clustering and quasi-periodicity in demand arrivals. The connection to discrete-time renewal processes allows not only for a principled extension of Croston-type models, but also for an natural inclusion of neural network based models---by replacing exponential smoothing with a recurrent neural network. We also demonstrate that modeling continuous-time demand arrivals, i.e., with a temporal point process, is possible via a trivial extension of our framework. This leads to more flexible modeling in scenarios where data of individual purchase orders are directly available with granular timestamps. Complementing this theoretical advancement, we demonstrate the efficacy of our framework for forecasting practice via an extensive empirical study on standard intermittent demand data sets, in which we report predictive accuracy in a variety of scenarios that compares favorably to the state of the art.
Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used to predict global weather patterns days in advance. First studies show promise but the lack of a common dataset and evaluation metrics make inter-comparison between studies difficult. Here we present a benchmark dataset for data-driven medium-range weather forecasting, a topic of high scientific interest for atmospheric and computer scientists alike. We provide data derived from the ERA5 archive that has been processed to facilitate the use in machine learning models. We propose simple and clear evaluation metrics which will enable a direct comparison between different methods. Further, we provide baseline scores from simple linear regression techniques, deep learning models, as well as purely physical forecasting models. The dataset is publicly available at https://github.com/pangeo-data/WeatherBench and the companion code is reproducible with tutorials for getting started. We hope that this dataset will accelerate research in data-driven weather forecasting.
Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continent/global level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction.
Platelet products are both expensive and have very short shelf lives. As usage rates for platelets are highly variable, the effective management of platelet demand and supply is very important yet challenging. The primary goal of this paper is to present an efficient forecasting model for platelet demand at Canadian Blood Services (CBS). To accomplish this goal, four different demand forecasting methods, ARIMA (Auto Regressive Moving Average), Prophet, lasso regression (least absolute shrinkage and selection operator) and LSTM (Long Short-Term Memory) networks are utilized and evaluated. We use a large clinical dataset for a centralized blood distribution centre for four hospitals in Hamilton, Ontario, spanning from 2010 to 2018 and consisting of daily platelet transfusions along with information such as the product specifications, the recipients characteristics, and the recipients laboratory test results. This study is the first to utilize different methods from statistical time series models to data-driven regression and a machine learning technique for platelet transfusion using clinical predictors and with different amounts of data. We find that the multivariate approaches have the highest accuracy in general, however, if sufficient data are available, a simpler time series approach such as ARIMA appears to be sufficient. We also comment on the approach to choose clinical indicators (inputs) for the multivariate models.