The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework on a massive dataset of soccer-logs consisting of millions of match events spanning four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of player evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms its competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank, i.e., searching for players and assessing player versatility, and show the flexibility and efficiency that make it suitable for the design of a scalable platform for soccer analytics.
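A role-aware evaluation of the kind described above can be pictured as scoring each performance by event counts weighted per role. The sketch below is purely illustrative: PlayeRank learns its weights from data, whereas here the event types and weights are hypothetical placeholders.

```python
def rate_performance(event_counts, role_weights):
    """Toy role-aware rating: a weighted sum of per-event-type counts,
    where the weights depend on the player's role.

    Illustrative only; PlayeRank derives its feature weights from
    match outcomes rather than fixing them by hand.
    """
    return sum(event_counts.get(evt, 0) * w for evt, w in role_weights.items())


# Hypothetical weights for a defender vs. a forward
defender_w = {"tackle": 0.5, "pass": 0.4, "shot": 0.1}
forward_w = {"tackle": 0.1, "pass": 0.4, "shot": 0.5}

match_events = {"pass": 10, "shot": 2}
```

The same match events yield different ratings under different role profiles, which is the essence of role-aware evaluation.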
Understanding the set of elementary steps and the kinetics of each reaction is extremely valuable for making informed decisions about the next generation of catalytic materials. Given the physical and mechanistic complexity of industrial catalysts, it is critical to obtain kinetic information through experimental methods. This work therefore details a methodology that combines transient rate/concentration dependencies with machine learning to measure the number of active sites and the individual rate constants, and to gain insight into the mechanism underlying a complex set of elementary steps. The new methodology was first applied to simulated transient responses to verify its ability to obtain correct estimates of the micro-kinetic coefficients. Experimental CO oxidation data were then analyzed, revealing the Langmuir-Hinshelwood mechanism driving the reaction. As oxygen accumulated on the catalyst, a transition in the mechanism was clearly resolved in the machine learning analysis, owing to the large amount of kinetic information available from transient reaction techniques. This methodology is proposed as a new data-driven approach to characterizing how materials control complex reaction mechanisms, relying exclusively on experimental data.
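The simplest instance of extracting a micro-kinetic coefficient from a transient response is fitting a single first-order rate constant to a decay curve. The sketch below, a minimal stand-in for the far richer multi-step analysis the abstract describes, estimates k from C(t) = C0·exp(-kt) by log-linear least squares.

```python
import math


def fit_first_order_rate(times, concentrations):
    """Estimate a first-order rate constant k from a transient decay
    C(t) = C0 * exp(-k * t) via log-linear least squares.

    A minimal sketch of recovering one micro-kinetic coefficient from
    a transient; the methodology in the abstract combines many such
    transients with machine learning for multi-step mechanisms.
    """
    xs = list(times)
    ys = [math.log(c) for c in concentrations]  # ln C is linear in t
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs
    )
    return -slope  # k is the negative slope of ln C versus t
```

On noiseless synthetic data the fit recovers the true rate constant; with experimental noise, weighting or nonlinear fitting would be preferable.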
The Coronavirus Disease 2019 (COVID-19) pandemic has caused a tremendous number of deaths and has had a devastating impact on economic development all over the world. It is therefore paramount to control its further transmission, which requires uncovering the mechanism of the transmission process and evaluating the effect of different control strategies. To address these issues, we describe the transmission of COVID-19 as an explosive Markov process with four parameters. The state transitions of the proposed Markov process clearly disclose the explosive growth and complex heterogeneity of COVID-19. Based on this, we further propose a simulation approach with heterogeneous infections. Experiments show that our approach can closely track the real transmission process of COVID-19, disclose its transmission mechanism, and forecast transmission under different non-drug intervention strategies. More importantly, our approach can help develop effective strategies for controlling COVID-19 and compare their control effects across different countries and cities.
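Heterogeneous infection, where a few individuals cause most secondary cases, is commonly modeled with an overdispersed offspring distribution. The toy branching-process simulation below illustrates that idea with a negative binomial (gamma-Poisson) offspring count; the parameters r0 and k are illustrative and are not the four parameters of the paper's Markov process.

```python
import math
import random


def _poisson(lam, rng):
    """Knuth's algorithm for sampling a Poisson variate (small lambda)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1


def simulate_outbreak(r0=2.5, k=0.1, generations=5, seed_cases=1, rng=None):
    """Toy branching-process simulation of heterogeneous transmission.

    Each case's offspring count follows a negative binomial with mean r0
    and dispersion k, drawn as a gamma-Poisson mixture; small k means a
    few superspreaders drive the outbreak. Illustrative only; this is
    not the paper's four-parameter explosive Markov process.
    """
    rng = rng or random.Random(42)
    cases_per_generation = [seed_cases]
    current = seed_cases
    for _ in range(generations):
        new = 0
        for _ in range(current):
            lam = rng.gammavariate(k, r0 / k)  # individual infectiousness
            new += _poisson(lam, rng)
        cases_per_generation.append(new)
        current = new
    return cases_per_generation
```

Lowering r0 (e.g., to mimic an intervention) in this sketch shrinks each generation, which is the qualitative behavior the simulation approach is used to forecast.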
The design of symbol detectors in digital communication systems has traditionally relied on statistical channel models that describe the relation between the transmitted symbols and the observed signal at the receiver. Here we review a data-driven framework for symbol detector design that combines machine learning (ML) and model-based algorithms. In this hybrid approach, well-known channel-model-based algorithms such as the Viterbi method, BCJR detection, and multiple-input multiple-output (MIMO) soft interference cancellation (SIC) are augmented with ML-based modules that remove their dependence on the channel model, allowing the receiver to learn to implement these algorithms solely from data. The resulting data-driven receivers are best suited to systems where the underlying channel models are poorly understood, highly complex, or do not capture the underlying physics well. Our approach is unique in that it replaces only the channel-model-based computations with dedicated neural networks that can be trained from a small amount of data, while keeping the overall algorithm intact. Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowledge of the exact channel input-output statistical relationship, even in the presence of channel state information uncertainty.
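The "keep the algorithm, replace the channel-model computation" idea can be illustrated with a Viterbi decoder whose emission log-likelihood is a pluggable callable. This is a generic sketch, not the exact architecture reviewed in the abstract: in the data-driven variant, `emit_logp` would be a small neural network trained from data instead of a closed-form channel model.

```python
def viterbi(observations, n_states, trans_logp, emit_logp):
    """Viterbi decoding with pluggable transition and emission
    log-likelihoods.

    `emit_logp(obs, state)` encapsulates the channel-model-based
    computation; in a data-driven receiver it would be replaced by a
    learned module while the dynamic-programming recursion stays intact.
    """
    # Initialization: log-likelihood of the first observation per state.
    prev = [emit_logp(observations[0], s) for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        cur, ptr = [], []
        for s in range(n_states):
            # Best predecessor state under the transition scores.
            best_p, best_prev = max(
                (prev[sp] + trans_logp(sp, s), sp) for sp in range(n_states)
            )
            cur.append(best_p + emit_logp(obs, s))
            ptr.append(best_prev)
        prev = cur
        back.append(ptr)
    # Backtrace the most likely state sequence.
    state = max(range(n_states), key=lambda s: prev[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))
```

Because only `emit_logp` touches the channel, swapping a neural network in for it leaves the recursion and backtrace untouched, which is what lets such hybrids train from small datasets.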
Recent data-driven approaches have shown great potential for early prediction of battery cycle life by utilizing features from the discharge voltage curve. However, these studies caution that data-driven approaches must be combined with a specific design of experiments that limits the range of aging conditions, since the expected life of Li-ion batteries is a complex function of various aging factors. In this work, we investigate the performance of the data-driven approach for battery lifetime prognostics with Li-ion batteries cycled under a variety of aging conditions, in order to determine when the data-driven approach can be applied successfully. Results show a correlation between the variance of the discharge capacity difference and the end of life for cells aged under a wide range of charge/discharge C-rates and operating temperatures. This holds even though the varied conditions affect not only the cycling of the batteries but also the features themselves: the features are calculated directly from cycling data, without separate slow characterization cycles at a controlled temperature. However, the correlation weakens considerably when the voltage window for feature extraction is reduced, or when features are taken from the charge voltage curve instead of the discharge curve. As deep constant-current discharges rarely happen in practice, this imposes new challenges for applying the method in real-world systems.
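The central feature mentioned above, the variance of the discharge capacity difference, is the spread of ΔQ(V) between an early and a later cycle evaluated on a common voltage grid. A minimal sketch, assuming the two curves have already been interpolated onto the same grid (interpolation and cycle selection are omitted here):

```python
import statistics


def delta_q_variance(q_early, q_late):
    """Variance of the discharge-capacity difference curve, Var(dQ(V)).

    q_early and q_late are capacity values sampled at the same voltage
    grid, e.g. from an early cycle and a later cycle. A larger variance
    of the difference curve has been reported to correlate with shorter
    cycle life. Hypothetical helper: alignment/interpolation of the two
    curves onto a shared grid is assumed to have been done already.
    """
    if len(q_early) != len(q_late):
        raise ValueError("curves must share the same voltage grid")
    delta = [late - early for early, late in zip(q_early, q_late)]
    return statistics.pvariance(delta)
```

The abstract's caveat maps directly onto this sketch: shrinking the voltage grid (fewer samples in `q_early`/`q_late`) or using charge instead of discharge curves degrades how well this single number predicts end of life.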
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and the age biases of the evaluated metrics. Outcomes of these evaluation metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores they produce, whereas other popular metrics, including citation count, HITS, and Collective Influence, produce significantly worse ranking results.
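One common form of the "simple transformation" that suppresses age bias is to z-score each item's raw score against items of similar age, so that old, citation-rich items no longer dominate purely by seniority. The sketch below assumes scores are ordered chronologically and uses a symmetric window of neighbours; the exact transform used in the paper may differ.

```python
import statistics


def rescale_by_age(scores, window=3):
    """Suppress age bias by z-scoring each item's score against items
    of similar age.

    `scores` must be ordered by publication time; each item is compared
    to a window of its chronological neighbours. A sketch of one common
    rescaling for citation metrics such as PageRank; not necessarily
    the paper's exact procedure.
    """
    rescaled = []
    n = len(scores)
    for i, s in enumerate(scores):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        peers = scores[lo:hi]  # items published around the same time
        mu = statistics.mean(peers)
        sigma = statistics.pstdev(peers)
        rescaled.append(0.0 if sigma == 0 else (s - mu) / sigma)
    return rescaled
```

After rescaling, an item is ranked by how exceptional its score is relative to its contemporaries, which is why age-biased metrics like PageRank can perform well once transformed.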