Ranking earthquake forecasts using proper scoring rules: Binary events in a low probability environment

Published by Francesco Serafini
Publication date: 2021
Research field: Mathematical statistics
Language: English





Operational earthquake forecasting for risk management and communication during seismic sequences depends on our ability to select an optimal forecasting model. To do this, we need to compare the performance of competing models with each other in prospective forecasting mode, and to rank their performance using a fair, reproducible and reliable method. The Collaboratory for the Study of Earthquake Predictability (CSEP) conducts such prospective earthquake forecasting experiments around the globe. One metric that has been proposed to rank competing models is the Parimutuel Gambling score, which has the advantage of allowing alarm-based (categorical) forecasts to be compared with probabilistic ones. Here we examine the suitability of this score for ranking competing earthquake forecasts. First, we prove analytically that this score is in general improper, meaning that, on average, it does not prefer the model that generated the data. Even in the special case where it is proper, we show it can still be used in an improper way. Then, we compare its performance with two commonly used proper scores (the Brier and logarithmic scores), taking into account the uncertainty around the observed average score. We estimate confidence intervals for the expected score difference, which allow us to determine if and when one model can be preferred over another. Our findings suggest the Parimutuel Gambling score should not be used to distinguish between multiple competing forecasts. They also enable a more rigorous approach to distinguishing between the predictive skills of candidate forecasts, in addition to ranking them.
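To make the comparison described in the abstract concrete, the following is a minimal sketch of scoring two binary forecasts with the Brier and logarithmic scores and attaching an interval to the average score difference. Everything named here is an assumption for illustration: the function names, the positive orientation of the scores (higher is better), the constant probabilities 0.01 and 0.03, and the normal-approximation interval, which is a simple stand-in and not necessarily the interval construction used in the paper.

```python
import math
import random
import statistics


def brier_score(p, x):
    """Positively oriented Brier score for a binary outcome x in {0, 1} (assumed convention)."""
    return -(p - x) ** 2


def log_score(p, x):
    """Positively oriented logarithmic score for a binary outcome x in {0, 1}."""
    return math.log(p) if x == 1 else math.log(1.0 - p)


def mean_score_difference(score, p1, p2, outcomes, z=1.96):
    """Average of score(p1) - score(p2) with a normal-approximation interval.

    Returns (mean, lower, upper). A positive mean favours forecast p1.
    """
    diffs = [score(a, x) - score(b, x) for a, b, x in zip(p1, p2, outcomes)]
    mean = statistics.fmean(diffs)
    half_width = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - half_width, mean + half_width


def simulate(n_bins=20000, p_true=0.01, p_alt=0.03, seed=0):
    """Hypothetical low-probability setting: every bin has event probability p_true."""
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < p_true else 0 for _ in range(n_bins)]
    forecast_true = [p_true] * n_bins  # the model that generated the data
    forecast_alt = [p_alt] * n_bins    # a competing, miscalibrated model
    return {
        name: mean_score_difference(fn, forecast_true, forecast_alt, outcomes)
        for name, fn in (("brier", brier_score), ("log", log_score))
    }


if __name__ == "__main__":
    for name, (mean, lower, upper) in simulate().items():
        print(f"{name}: mean difference {mean:.5f}, interval [{lower:.5f}, {upper:.5f}]")
```

Under this sketch, a model would be preferred only when the interval for the score difference excludes zero; when it straddles zero, the observed ranking alone does not separate the two forecasts.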




Read also

We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number $m$ of derivatives of the density at the outcome, and describe a large class of such $m$-local proper scoring rules: these exist for all even $m$ but no odd $m$. We further show that for $m \geq 2$ all such $m$-local rules can be computed without knowledge of the normalizing constant of the distribution.
We examine the precursory behavior of geoelectric signals before large earthquakes by means of an algorithm including an alarm-based model and binary classification. This algorithm, introduced originally by Chen and Chen [Nat. Hazards., 84, 2016], is improved by removing a time parameter for coarse-graining of earthquake occurrences, as well as by extending the single station method into a joint stations method. We also determine the optimal frequency bands of earthquake-related geoelectric signals with the highest signal-to-noise ratio. Using significance tests, we also provide evidence of an underlying seismoelectric relationship. It is appropriate for machine learning to extract this underlying relationship, which could be used to quantify probabilistic forecasts of impending earthquakes, and to get closer to operational earthquake prediction.
A scoring rule is a loss function measuring the quality of a quoted probability distribution $Q$ for a random variable $X$, in the light of the realized outcome $x$ of $X$; it is proper if the expected score, under any distribution $P$ for $X$, is minimized by quoting $Q=P$. Using the fact that any differentiable proper scoring rule on a finite sample space $\mathcal{X}$ is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of $x$. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space $\mathcal{X}$. A useful property of such rules is that the quoted distribution $Q$ need only be known up to a scale factor. Examples of the use of such scoring rules include Besag's pseudo-likelihood and Hyvärinen's method of ratio matching.
All proper scoring rules incentivize an expert to predict accurately (report their true estimate), but not all proper scoring rules equally incentivize precision. Rather than treating the expert's belief as exogenously given, we consider a model where a rational expert can endogenously refine their belief by repeatedly paying a fixed cost, and is incentivized to do so by a proper scoring rule. Specifically, our expert aims to predict the probability that a biased coin flipped tomorrow will land heads, and can flip the coin any number of times today at a cost of $c$ per flip. Our first main result defines an incentivization index for proper scoring rules, and proves that this index measures the expected error of the expert's estimate (where the number of flips today is chosen adaptively to maximize the predictor's expected payoff). Our second main result finds the unique scoring rule which optimizes the incentivization index over all proper scoring rules. We also consider extensions to minimizing the $\ell^{\text{th}}$ moment of error, and again provide an incentivization index and optimal proper scoring rule. In some cases, the resulting scoring rule is differentiable, but not infinitely differentiable. In these cases, we further prove that the optimum can be uniformly approximated by polynomial scoring rules. Finally, we compare common scoring rules via our measure, and include simulations confirming the relevance of our measure even in domains outside where it provably applies.
The use of tiered warnings and multicategorical forecasts is ubiquitous in meteorological operations. Here, a flexible family of scoring functions is presented for evaluating the performance of ordered multicategorical forecasts. Each score has a risk parameter $\alpha$, selected for the specific use case, so that it is consistent with a forecast directive based on the fixed threshold probability $1-\alpha$ (equivalently, a fixed $\alpha$-quantile mapping). Each score also has use-case specific weights so that forecasters who accurately discriminate between categorical thresholds are rewarded in proportion to the weight for that threshold. A variation is presented where the penalty assigned to near misses or close false alarms is discounted, which again is consistent with directives based on fixed risk measures. The scores presented provide an alternative to many performance measures currently in use, whose optimal threshold probabilities for forecasting an event typically vary with each forecast case, and in the case of equitable scores are based around sample base rates rather than risk measures suitable for users.
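Several of the abstracts above rely on the definition of a proper scoring rule as a loss whose expected value, under a distribution $P$, is minimized by quoting $Q=P$. The following is a rough numerical sketch of that definition on a three-point sample space. The function names, the example distribution, and the grid search are all assumptions made for illustration; the linear loss is included only as a familiar example of an improper rule and does not come from any of the listed papers.

```python
import math


def log_loss(x, q):
    """Negatively oriented log score: -log of the probability quoted for outcome x."""
    return math.inf if q[x] == 0 else -math.log(q[x])


def linear_loss(x, q):
    """Negatively oriented linear score, a well-known improper rule."""
    return -q[x]


def expected_score(rule, p, q):
    """Expected loss of quoting q when outcomes are actually distributed as p."""
    return sum(p_x * rule(x, q) for x, p_x in enumerate(p) if p_x > 0)


def best_quote_on_grid(rule, p, n=10):
    """Search all 3-point distributions with probabilities in multiples of 1/n."""
    grid = [
        (i / n, j / n, (n - i - j) / n)
        for i in range(n + 1)
        for j in range(n + 1 - i)
    ]
    return min(grid, key=lambda q: expected_score(rule, p, q))


if __name__ == "__main__":
    p = (0.5, 0.3, 0.2)  # hypothetical data-generating distribution
    print("log loss optimum:   ", best_quote_on_grid(log_loss, p))
    print("linear loss optimum:", best_quote_on_grid(linear_loss, p))
```

In this sketch the log loss is minimized by quoting $P$ itself, while the linear loss is minimized by placing all mass on the most likely outcome, which is what makes it improper.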