
Proper local scoring rules on discrete sample spaces

 Added by A. Philip Dawid
 Publication date 2011
Language of the research: English





A scoring rule is a loss function measuring the quality of a quoted probability distribution $Q$ for a random variable $X$, in the light of the realized outcome $x$ of $X$; it is proper if the expected score, under any distribution $P$ for $X$, is minimized by quoting $Q=P$. Using the fact that any differentiable proper scoring rule on a finite sample space $\mathcal{X}$ is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of $x$. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space $\mathcal{X}$. A useful property of such rules is that the quoted distribution $Q$ need only be known up to a scale factor. Examples of the use of such scoring rules include Besag's pseudo-likelihood and Hyvärinen's method of ratio matching.
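As a rough illustration of the two properties named in the abstract (locality with invariance to a scale factor, and propriety), the following Python sketch evaluates a Besag-style negative log pseudo-likelihood on binary vectors of length 2. The function names, the weights, and the choice of "flip one coordinate" as the neighborhood are assumptions made for this example and are not taken from the paper.

```python
import itertools
import math

def pseudo_log_score(q_unnorm, x):
    """Negative log pseudo-likelihood of the binary tuple x (a loss).

    q_unnorm maps every binary tuple to a positive, possibly unnormalized
    weight. Each term uses only the weight at x and at x with one
    coordinate flipped, so any common scale factor cancels.
    """
    total = 0.0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        conditional = q_unnorm[x] / (q_unnorm[x] + q_unnorm[flipped])
        total -= math.log(conditional)
    return total

def expected_score(p, q_unnorm):
    """Expected score of quoting q_unnorm when the outcome follows p."""
    return sum(p[x] * pseudo_log_score(q_unnorm, x) for x in p)

states = list(itertools.product((0, 1), repeat=2))
weights = dict(zip(states, (1.0, 2.0, 3.0, 4.0)))    # hypothetical unnormalized Q
z = sum(weights.values())
p = {x: w / z for x, w in weights.items()}           # the same Q, normalized, taken as P
rescaled = {x: 7.0 * w for x, w in weights.items()}  # Q times an arbitrary constant
uniform = {x: 1.0 for x in states}                   # a competing quote

# Scale invariance: the score is unchanged when Q is multiplied by a constant.
scale_invariant = all(
    math.isclose(pseudo_log_score(weights, x), pseudo_log_score(rescaled, x))
    for x in states
)
# Propriety on this example: quoting P itself beats the uniform quote.
honest_is_better = expected_score(p, weights) < expected_score(p, uniform)
print(scale_invariant, honest_is_better)
```

The check covers a single competing quote only; it illustrates the definition and does not prove it.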



We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number $m$ of derivatives of the density at the outcome, and describe a large class of such $m$-local proper scoring rules: these exist for all even $m$ but no odd $m$. We further show that for $m\geq 2$ all such $m$-local rules can be computed without knowledge of the normalizing constant of the distribution.
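A familiar example with $m=2$ is the Hyvärinen score, $(\log q)''(x) + \tfrac{1}{2}\big((\log q)'(x)\big)^2$. The Python sketch below approximates it by central finite differences and checks that dropping the normalizing constant of a Gaussian density leaves the score unchanged. The step size `h`, the Gaussian parameters, and the function names are assumptions chosen for illustration.

```python
import math

def hyvarinen_score(log_q, x, h=1e-3):
    """Approximate (log q)''(x) + 0.5 * ((log q)'(x))**2 by central differences."""
    d1 = (log_q(x + h) - log_q(x - h)) / (2 * h)
    d2 = (log_q(x + h) - 2 * log_q(x) + log_q(x - h)) / (h * h)
    return d2 + 0.5 * d1 * d1

mu, var = 1.0, 2.0  # hypothetical Gaussian parameters

def log_q_unnormalized(x):
    """Log of a Gaussian density with the normalizing constant dropped."""
    return -(x - mu) ** 2 / (2 * var)

def log_q_normalized(x):
    """Log of the properly normalized Gaussian density."""
    return log_q_unnormalized(x) - 0.5 * math.log(2 * math.pi * var)

x = 0.3
score_unnormalized = hyvarinen_score(log_q_unnormalized, x)
score_normalized = hyvarinen_score(log_q_normalized, x)
# Closed form for a Gaussian: -1/var + (x - mu)^2 / (2 var^2).
closed_form = -1 / var + (x - mu) ** 2 / (2 * var ** 2)
```

Only derivatives of $\log q$ enter the score, so an additive constant in $\log q$ (a multiplicative constant in $q$) cancels.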
Proper scoring rules are commonly applied to quantify the accuracy of distribution forecasts. Given an observation they assign a scalar score to each distribution forecast, with the lowest expected score attributed to the true distribution. The energy and variogram scores are two rules that have recently gained some popularity in multivariate settings because their computation does not require a forecast to have a parametric density function and so they are broadly applicable. Here we conduct a simulation study to compare the discrimination ability between the energy score and three variogram scores. Compared with other studies, our simulation design is more realistic because it is supported by a historical data set containing commodity prices, currencies and interest rates, and our data generating processes include a diverse selection of models with different marginal distributions, dependence structure, and calibration windows. This facilitates a comprehensive comparison of the performance of proper scoring rules in different settings. To compare the scores we use three metrics: the mean relative score, error rate and a generalised discrimination heuristic. Overall, we find that the variogram score with parameter p=0.5 outperforms the energy score and the other two variogram scores.
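Both scores can be estimated from a sample (ensemble) drawn from the forecast distribution, which is why no parametric density is needed. The Python sketch below uses the usual sample estimators, with unit weights in the variogram score. The weighting, the function names, and the toy ensemble are assumptions for illustration and do not reproduce the study's setup.

```python
import math

def energy_score(ensemble, y):
    """Sample energy score: mean ||X - y|| minus half the mean ||X - X'||."""
    m = len(ensemble)
    term1 = sum(math.dist(x, y) for x in ensemble) / m
    term2 = sum(math.dist(a, b) for a in ensemble for b in ensemble) / (m * m)
    return term1 - 0.5 * term2

def variogram_score(ensemble, y, p=0.5):
    """Sample variogram score of order p with unit weights (an assumption)."""
    m, d = len(ensemble), len(y)
    total = 0.0
    for i in range(d):
        for j in range(d):
            expected = sum(abs(x[i] - x[j]) ** p for x in ensemble) / m
            total += (abs(y[i] - y[j]) ** p - expected) ** 2
    return total

# Toy two-member ensemble in two dimensions and one observation (hypothetical values).
ensemble = [(0.0, 0.0), (2.0, 0.0)]
observation = (1.0, 0.0)
es = energy_score(ensemble, observation)
vs = variogram_score(ensemble, observation, p=0.5)
```

Lower values are better for both scores. The variogram score depends only on pairwise differences between components, so it is insensitive to a common shift of all components.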
Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how the two distributions differ. Given samples from two densities $f_1$ and $f_0$, we consider the task of localizing occurrences of the inequality $f_1 > f_0$. To avoid the challenges associated with high-dimensional space, we propose a general hypothesis testing framework where hypotheses are formulated adaptively to the data by conditioning on the combined sample from the two densities. We then investigate a special case of this framework where the notion of locality is captured by a random walk on a weighted graph constructed over this combined sample. We derive a tractable testing procedure for this case employing a type of scan statistic, and provide non-asymptotic lower bounds on the power and accuracy of our test to detect whether $f_1>f_0$ in a local sense. Furthermore, we characterize the test's consistency according to a certain problem-hardness parameter, and show that our test achieves the minimax detection rate for this parameter. We conduct numerical experiments to validate our method, and demonstrate our approach on two real-world applications: detecting and localizing arsenic well contamination across the United States, and analyzing two-sample single-cell RNA sequencing data from melanoma patients.
This paper forges a strong connection between two seemingly unrelated forecasting problems: incentive-compatible forecast elicitation and forecast aggregation. Proper scoring rules are the well-known solution to the former problem. To each such rule s we associate a corresponding method of aggregation, mapping expert forecasts and expert weights to a consensus forecast, which we call *quasi-arithmetic (QA) pooling* with respect to s. We justify this correspondence in several ways:
- QA pooling with respect to the two most well-studied scoring rules (quadratic and logarithmic) corresponds to the two most well-studied forecast aggregation methods (linear and logarithmic).
- Given a scoring rule s used for payment, a forecaster agent who sub-contracts several experts, paying them in proportion to their weights, is best off aggregating the experts' reports using QA pooling with respect to s, meaning this strategy maximizes its worst-case profit (over the possible outcomes).
- The score of an aggregator who uses QA pooling is concave in the experts' weights. As a consequence, online gradient descent can be used to learn appropriate expert weights from repeated experiments with low regret.
- The class of all QA pooling methods is characterized by a natural set of axioms (generalizing classical work by Kolmogorov on quasi-arithmetic means).
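The two classical aggregation methods named in the first point can be sketched in a few lines of Python. This shows only the standard linear and logarithmic pools, not the paper's general QA pooling construction. The function names and the example forecasts are assumptions, and the weights are assumed to be non-negative and to sum to one.

```python
import math

def linear_pool(forecasts, weights):
    """Weighted arithmetic mean of the experts' probability vectors."""
    n = len(forecasts[0])
    return [sum(w * f[k] for f, w in zip(forecasts, weights)) for k in range(n)]

def logarithmic_pool(forecasts, weights):
    """Weighted geometric mean of the probability vectors, renormalized to sum to one."""
    n = len(forecasts[0])
    unnormalized = [
        math.prod(f[k] ** w for f, w in zip(forecasts, weights)) for k in range(n)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Two hypothetical experts forecasting a binary outcome, equally weighted.
forecasts = [[0.9, 0.1], [0.5, 0.5]]
weights = [0.5, 0.5]
linear = linear_pool(forecasts, weights)
logarithmic = logarithmic_pool(forecasts, weights)
```

On this example the linear pool is approximately (0.7, 0.3) and the logarithmic pool approximately (0.75, 0.25). The logarithmic pool moves further toward the more confident expert.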
This paper deals with a new Bayesian approach to the standard one-sample $z$- and $t$-tests. More specifically, let $x_1,\ldots,x_n$ be an independent random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. The goal is to test the null hypothesis $\mathcal{H}_0: \mu=\mu_1$ against all possible alternatives. The approach is based on using the well-known formula of the Kullback-Leibler divergence between two normal distributions (sampling and hypothesized distributions selected in an appropriate way). The change of the distance from a priori to a posteriori is compared through the relative belief ratio (a measure of evidence). Eliciting the prior, checking for prior-data conflict and bias are also considered. Many theoretical properties of the procedure have been developed. Besides its simplicity, and unlike the classical approach, the new approach possesses attractive and distinctive features such as giving evidence in favor of the null hypothesis. It also avoids several undesirable paradoxes, such as Lindley's paradox that may be encountered by some existing Bayesian methods. The use of the approach has been illustrated through several examples.
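The Kullback-Leibler divergence between two univariate normal distributions has the closed form $\log(\sigma_2/\sigma_1) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$. The Python sketch below evaluates it. The function name and argument order are assumptions, and the abstract does not say how the paper selects the two distributions, so this shows only the formula itself.

```python
import math

def kl_normal(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) for univariate normals, in nats."""
    return (
        0.5 * math.log(var2 / var1)
        + (var1 + (mu1 - mu2) ** 2) / (2 * var2)
        - 0.5
    )

# Identical distributions are at divergence zero; a unit mean shift with
# unit variances gives divergence 1/2.
same = kl_normal(0.0, 1.0, 0.0, 1.0)
shifted = kl_normal(0.0, 1.0, 1.0, 1.0)
```

The divergence is not symmetric in its two arguments when the variances differ.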
