This note proposes a penalty criterion for assessing correct-score forecasts in a soccer match. The penalty is based on hierarchical priorities for such a forecast, i.e., (i) exact prediction of the win, draw, or loss outcome and (ii) the normalized Euclidean distance between actual and forecast scores. The procedure is illustrated on typical scores, and alternatives for the penalty components are discussed.
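A minimal sketch of how such a hierarchical penalty could be computed is given here. The abstract does not state the exact formula, so the function names and the normalization d / (1 + d) are assumptions chosen only so that a wrong win/draw/loss outcome always costs more than any error in the score when the outcome is right.

```python
import math

def outcome(home, away):
    """Return 'W', 'D' or 'L' from the home side's point of view."""
    if home > away:
        return "W"
    if home < away:
        return "L"
    return "D"

def penalty(actual, forecast):
    """Hypothetical two-level penalty for a (home, away) score forecast.

    Level 1: add 1 if the win/draw/loss outcome is wrong.
    Level 2: add the Euclidean distance between the two scores,
             squashed into [0, 1) by d / (1 + d) (an assumed normalization).
    """
    d = math.dist(actual, forecast)
    normalized = d / (1.0 + d)
    miss = 0 if outcome(*actual) == outcome(*forecast) else 1
    return miss + normalized

# Actual score 2-1: a 3-0 forecast keeps the right outcome, a 1-1 forecast does not.
right_outcome = penalty((2, 1), (3, 0))
wrong_outcome = penalty((2, 1), (1, 1))
```

Under this assumed normalization, the 1-1 forecast is penalized more than the 3-0 forecast even though it is closer in Euclidean distance, which is the hierarchy the note describes.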
In basketball and hockey, state-of-the-art player value statistics are often variants of Adjusted Plus-Minus (APM). But APM has not had the same impact in soccer, since soccer games are low-scoring and have few substitutions. In soccer, perhaps the most comprehensive player value statistics come from video games, and in particular FIFA. FIFA ratings combine the subjective evaluations of over 9,000 scouts, coaches, and season-ticket holders into ratings for over 18,000 players. This paper combines FIFA ratings and APM into a single metric, which we call Augmented APM. The key idea is to recast APM in a Bayesian framework and to incorporate FIFA ratings into the prior distribution. We show that Augmented APM predicts better than both standard APM and a model using only FIFA ratings. We also show that Augmented APM decorrelates players that are highly collinear.
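The idea of placing ratings in the prior can be sketched with a ridge-type regression whose prior mean is non-zero. This is an illustration under assumptions, not the paper's model: the Gaussian prior, the function name `map_player_values`, the penalty weight `lam`, and the rating-derived prior means are all hypothetical.

```python
import numpy as np

def map_player_values(X, y, prior_mean, lam):
    """MAP estimate for y ~ N(X b, s^2 I) with prior b ~ N(prior_mean, (s^2 / lam) I).

    X[i, j] is 1 if player j was on the pitch during segment i (assumed encoding),
    y[i] is the goal margin of that segment. Solves
    (X'X + lam I) b = X'y + lam * prior_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y + lam * prior_mean
    return np.linalg.solve(A, b)

# Two perfectly collinear players: they always appear together.
X = [[1, 1], [1, 1], [1, 1]]
y = [1.0, 1.0, 1.0]

# A zero prior mean (ordinary ridge APM) cannot tell the two players apart.
plain = map_player_values(X, y, prior_mean=[0.0, 0.0], lam=1.0)

# Hypothetical rating-derived prior means separate them.
informed = map_player_values(X, y, prior_mean=[0.8, 0.2], lam=1.0)
```

With a zero prior mean both players receive the same value, whereas the informative prior assigns them different values, which illustrates how a prior can separate collinear players.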
The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework using a massive dataset of soccer logs consisting of millions of match events from four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of player evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank (i.e., searching for players and assessing player versatility), showing its flexibility and efficiency, which make it worth using in the design of a scalable platform for soccer analytics.
Accurate forecasting of mortality data is important for the management of pension funds and the pricing of life insurance in actuarial science. Age-specific mortality forecasting in the US poses a challenging problem in high-dimensional time series analysis. Prior attempts use traditional dimension reduction techniques to avoid the curse of dimensionality, and then forecast mortality by forecasting the extracted features. However, a dimension reduction method tailored to forecasting has remained elusive. To address this, we propose a novel approach that seeks features that not only represent the original data well but also capture as much time-serial dependence as possible. The proposed method adapts to the US mortality data and enjoys good statistical performance. In comparison, our method performs better than existing approaches, in particular the Lee-Carter model, a benchmark in mortality analysis. Based on the forecasting results, we generate more accurate estimates of future life expectancies and prices of life annuities than the Lee-Carter model does, which can have a great financial impact on life insurers and social security systems. Furthermore, various simulations illustrate scenarios under which our method has advantages and help interpret its good performance on mortality data.
Motivated by the evidence that real-world networks evolve in time and may exhibit non-stationary features, we propose an extension of Exponential Random Graph Models (ERGMs) that accommodates the time variation of network parameters. Within the ERGM framework, a network realization is sampled from a static probability distribution defined parametrically in terms of network statistics. Inspired by the fast-growing literature on Dynamic Conditional Score-driven models, in our approach each parameter evolves according to an updating rule driven by the score of the conditional distribution. We demonstrate the flexibility of score-driven ERGMs, both as data generating processes and as filters, and we prove the advantages of the dynamic version with respect to the static one. Our method captures dynamical network dependencies that emerge from the data and allows for a test discriminating between static and time-varying parameters. Finally, we corroborate our findings with an application to networks from real financial and political systems exhibiting non-stationary dynamics.
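A score-driven update can be sketched for the simplest ERGM, whose only statistic is the number of edges, so that each of the N possible links is present with probability sigmoid(theta). The code below is an illustration under assumptions rather than the paper's specification: the function and parameter names, the choice of scaling the score by the inverse Fisher information, and the single-statistic model are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_driven_filter(edge_counts, n_dyads, omega, beta, alpha, theta0):
    """Filter a time-varying edge parameter theta_t from observed edge counts.

    Log-likelihood of one snapshot: theta * edges - n_dyads * log(1 + exp(theta)).
    Update rule (generic score-driven recursion):
        theta_{t+1} = omega + beta * theta_t + alpha * scaled_score_t
    """
    theta = theta0
    thetas = [theta0]
    for edges in edge_counts:
        p = sigmoid(theta)
        score = edges - n_dyads * p        # derivative of the log-likelihood
        fisher = n_dyads * p * (1.0 - p)   # Fisher information for theta
        scaled_score = score / fisher      # assumed scaling choice
        theta = omega + beta * theta + alpha * scaled_score
        thetas.append(theta)
    return thetas

# Density matches the current parameter: the score is zero and theta does not move.
flat = score_driven_filter([50, 50], n_dyads=100, omega=0.0, beta=1.0, alpha=0.5, theta0=0.0)

# A denser snapshot than expected pushes theta upward.
jump = score_driven_filter([80], n_dyads=100, omega=0.0, beta=1.0, alpha=0.5, theta0=0.0)
```

The parameter moves only when the observed network statistic departs from its expectation under the current parameter value, which is the mechanism a score-driven rule relies on.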
Droughts are a recurring hazard in sub-Saharan Africa that can impose huge socioeconomic costs. Acting early on alerts provided by early warning systems (EWS) can potentially provide substantial mitigation, reducing the financial and human cost. However, existing EWS tend only to monitor current, rather than forecast future, environmental and socioeconomic indicators of drought, and hence are not always sufficiently timely to be effective in practice. Here we present a novel method for forecasting satellite-based indicators of vegetation condition. Specifically, we focus on the 3-month Vegetation Condition Index (VCI3M) over pastoral livelihood zones in Kenya, which is the indicator used by the Kenyan National Drought Management Authority (NDMA). Using data from MODIS and Landsat, we apply linear autoregression and Gaussian process modeling methods and demonstrate high forecasting skill several weeks ahead. As a benchmark, we predict the drought alert marker used by NDMA (VCI3M < 35). Both of our models were able to predict this alert marker four weeks ahead with a hit rate of around 89% and a false alarm rate of around 4%, or six weeks ahead with a hit rate of around 81% and a false alarm rate of around 6%. The methods developed here can thus identify a deteriorating vegetation condition sufficiently far in advance to help disaster risk managers act early to support vulnerable communities and limit the impact of a drought hazard.
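The two skill scores quoted for the alert marker can be computed from paired observed and forecast VCI3M values as sketched below. The abstract does not define the scores, so this assumes the common forecast-verification definitions: hit rate = hits / (hits + misses) and false alarm rate = false alarms / (false alarms + correct negatives). The function name and the sample values are hypothetical.

```python
ALERT_THRESHOLD = 35.0  # NDMA drought alert marker: VCI3M < 35

def alert_skill(observed_vci3m, forecast_vci3m, threshold=ALERT_THRESHOLD):
    """Return (hit_rate, false_alarm_rate) for the alert VCI3M < threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, fc in zip(observed_vci3m, forecast_vci3m):
        observed_alert = obs < threshold
        forecast_alert = fc < threshold
        if observed_alert and forecast_alert:
            hits += 1
        elif observed_alert:
            misses += 1
        elif forecast_alert:
            false_alarms += 1
        else:
            correct_negatives += 1
    # A rate is undefined (None) when its denominator is empty.
    n_alerts = hits + misses
    n_clear = false_alarms + correct_negatives
    hit_rate = hits / n_alerts if n_alerts else None
    false_alarm_rate = false_alarms / n_clear if n_clear else None
    return hit_rate, false_alarm_rate

# Made-up values for illustration only; not data from the study.
observed = [30, 40, 20, 50, 34, 60]
forecast = [32, 33, 36, 55, 30, 70]
hit_rate, false_alarm_rate = alert_skill(observed, forecast)
```

Note that some authors instead report the false alarm ratio, false alarms / (hits + false alarms), so which definition applies should be checked against the full paper.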