The role of probability appears unchallenged as the key measure of uncertainty, used among other things for practical induction in the empirical sciences. Yet Popper was emphatic in his rejection of inductive probability and of the logical probability of hypotheses; furthermore, for him, the degree of corroboration cannot be a probability. Instead he proposed a deductive method of testing. This dialectical tension has many parallels in statistics, with the Bayesians on the logico-inductive side and the non-Bayesians, or frequentists, on the other. Simplistically, Popper seems to be on the frequentist side, but recent syntheses on the non-Bayesian side might direct Popperian views to a more nuanced destination. Logical probability seems perfectly suited to measure partial evidence or support, so what can we use if we are to reject it? Over the past 100 years, statisticians have developed a related concept called likelihood, which has played a central role in statistical modelling and inference. Remarkably, this Fisherian concept of uncertainty is largely unknown, or at least severely under-appreciated, in the non-statistical literature. As a measure of corroboration, the likelihood satisfies the Popperian requirement that it not be a probability. Our aim is to introduce the likelihood and its recent extension via a discussion of two well-known logical fallacies, in order to highlight that its lack of recognition may have led to unnecessary confusion in our discourse about falsification and corroboration of hypotheses. We also highlight the 100 years of development of likelihood concepts. The year 2021 will mark the 100-year anniversary of the likelihood, so with this paper we wish it a long life and increased appreciation in the non-statistical literature.
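To make the contrast concrete, the following minimal sketch (illustrative only, not taken from the paper; the binomial example and variable names are our own) shows one sense in which a likelihood is not a probability over hypotheses: viewed as a function of the parameter it need not integrate to one, and only likelihood ratios carry meaning as relative support.

```python
# Illustrative sketch (not from the paper): the binomial likelihood as a
# function of the hypothesis (the success probability theta) is not a
# probability distribution over hypotheses -- it need not integrate to 1.
import numpy as np
from scipy.stats import binom
from scipy.integrate import quad

y, n = 7, 10                      # observed data: 7 successes in 10 trials

def likelihood(theta):
    """L(theta) = P(data | theta), viewed as a function of theta."""
    return binom.pmf(y, n, theta)

area, _ = quad(likelihood, 0.0, 1.0)
print(f"Integral of L(theta) over [0, 1]: {area:.3f}")   # about 1/(n+1), not 1

# Only ratios of likelihoods are meaningful as relative support:
print("L(0.7) / L(0.5) =", likelihood(0.7) / likelihood(0.5))
```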
In 2001, Leo Breiman wrote of a divide between the data modeling and algorithmic modeling cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps and in terms of intellectual boundaries. We argue that this is largely due to the data modelers incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking that Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.
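As a hedged illustration of the two cultures (assuming scikit-learn is available; the synthetic data and model choices are ours, not the paper's), the sketch below fits a parametric "data model" and a Breiman-style Random Forest to the same nonlinear data:

```python
# Minimal sketch of Breiman's two cultures on synthetic data (assumptions:
# scikit-learn available; data-generating process chosen by us for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.3, size=500)  # nonlinear truth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

data_model = LinearRegression().fit(X_tr, y_tr)                      # "data modeling" culture
algo_model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)   # "algorithmic" culture

print("linear model MSE :", mean_squared_error(y_te, data_model.predict(X_te)))
print("random forest MSE:", mean_squared_error(y_te, algo_model.predict(X_te)))
```

On data like this the algorithmic model typically predicts better out of sample, which is the prediction-first argument; the paper's point is that the statistical behaviour of such methods still needs careful analysis.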
We provide accessible insight into the current replication crisis in statistical science by revisiting the old metaphor of the court trial as a hypothesis test. Inter alia, we define and diagnose harmful statistical witch-hunting in both justice and science, which extends to the replication crisis itself, where a hunt against p-values is currently underway.
The random variate m is, in combinatorics, a basis for comparing permutations, as well as the solution to a centuries-old riddle involving the mishandling of hats. In statistics, m is the test statistic for a disused null hypothesis statistical test (NHST) of association, the matching method. In this paper, I show that the matching method has an absolute and relatively low limit on its statistical power. I do so first by reinterpreting Rae's theorem, which describes the joint distributions of m with several rank correlation statistics under a true null. I then derive this property solely from m's unconditional sampling distribution, on which basis I develop the concept of a deficient statistic: a statistic that is insufficient, inconsistent, and inefficient with respect to its parameter. Finally, I demonstrate an application for m that makes use of its deficiency to qualify the sampling error in a jointly estimated sample correlation.
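For intuition, the following sketch (an illustrative simulation under our own assumptions, not the paper's derivation) computes m as the number of fixed points of a uniformly random permutation and compares its null distribution with the Poisson(1) limit, one way to see how little the statistic changes as the sample size grows.

```python
# Illustrative simulation (not the paper's code): the matching statistic m is
# the number of fixed points of a random permutation -- the "hats returned to
# their owners" riddle.  Under the null of no association its distribution is
# close to Poisson(1) for every n, so the statistic barely changes as the
# sample grows.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000

perms = np.array([rng.permutation(n) for _ in range(reps)])
m = (perms == np.arange(n)).sum(axis=1)           # matches per permutation

for k in range(5):
    print(f"P(m = {k}) ~ {np.mean(m == k):.3f}")  # ~ e^(-1) / k! for each k
print("mean of m:", m.mean())                     # ~ 1, whatever n is
```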
Which type of statistical uncertainty -- frequentist statistical (in)significance with a p-value, or a Bayesian probability -- better helps evidence-based policymaking? To investigate this, I ran a survey experiment on a sample from the population of Ireland and obtained 517 responses. The experiment asked the participants to decide whether or not to introduce a new bus line as a policy to reduce traffic jams. The treatment was the type of statistical uncertainty information presented: statistical (in)significance with a p-value, or the probability that the estimate is correct. Within each type, the uncertainty was set to be either low or non-low. It turned out that participants shown the frequentist information exhibited a much more deterministic tendency toward adopting or not adopting the policy than those shown the Bayesian information, given the actual difference between the low-uncertainty and non-low-uncertainty conditions that the experimental scenarios implied. This finding suggests that policy-relevant quantitative research should present the uncertainty of statistical estimates using the probability of the associated policy effects rather than statistical (in)significance, to allow the general public and policymakers to correctly evaluate the continuous nature of statistical uncertainty.
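The sketch below (a toy calculation, not the experiment's materials; the numbers are hypothetical) shows the same estimated policy effect presented in the two ways compared in the experiment: a two-sided p-value, and, under a flat prior with a normal approximation, the probability that the policy works.

```python
# Toy illustration (hypothetical numbers, not the experiment's vignettes):
# one estimated policy effect, reported two ways.
from scipy.stats import norm

effect, se = -4.0, 2.5          # hypothetical estimate: minutes of delay reduced

z = effect / se
p_value = 2 * norm.sf(abs(z))                        # frequentist two-sided p-value
prob_effective = norm.cdf(0, loc=effect, scale=se)   # P(true effect < 0 | data), flat prior

print(f"p-value: {p_value:.2f} (not significant at 0.05)")
print(f"Posterior probability the policy reduces delays: {prob_effective:.2f}")
```

Here the dichotomous report ("not significant") and the continuous report (a roughly 95% chance the policy helps) describe the same estimate, which is the contrast the experiment puts to participants.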
We discuss statistical issues in cases of serial killer nurses, focussing on the Dutch case of the nurse Lucia de Berk, arrested under suspicion of murder in 2001, convicted and sentenced to life imprisonment, but declared innocent in 2010; and the case of the English nurse Ben Geen, arrested in 2004 and also given a life sentence. At the trial of Ben Geen, a statistical expert was refused permission to present evidence on statistical biases concerning the way suspicious cases were identified by a hospital team of investigators. The judge ruled that the expert's written evidence was merely common sense. An application to the CCRC to review the case was turned down, since the application only presented statistical evidence and did not re-address the medical evidence presented at the original trials. That rejection has since been successfully challenged in court, and the CCRC has withdrawn it. The paper includes some striking new statistical findings on the Ben Geen case, as well as advice for statisticians involved in future cases, which are not infrequent. Statisticians need to be warned of the pitfalls which await them.
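As a hedged illustration of the kind of selection bias at issue (our own toy simulation, not the paper's analysis and not the actual case data), suppose every nurse has exactly the same incident rate, yet investigators single out the nurse with the most incidents and test her record as if she had been chosen in advance:

```python
# Toy simulation of a selection effect (hypothetical rates; not case data).
# Every nurse has the same incident rate, but the nurse with the most
# incidents is flagged and given a naive post-hoc test.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
n_nurses, shifts, rate = 30, 200, 0.02   # identical risk for everyone
sims = 10_000

false_alarms = 0
for _ in range(sims):
    counts = rng.poisson(shifts * rate, size=n_nurses)
    worst = counts.max()
    # naive test that ignores having searched across 30 nurses
    p_naive = poisson.sf(worst - 1, shifts * rate)
    false_alarms += p_naive < 0.05

print("Share of simulations flagging an innocent nurse:", false_alarms / sims)
```

The flagged nurse looks "suspicious" far more often than the nominal 5% level, which is why the manner in which suspicious cases are identified matters statistically.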