Keywords in scientific articles play an important role in information filtering and classification. In this article, we empirically investigate the statistical characteristics and evolutionary properties of keywords in a well-known journal, the Proceedings of the National Academy of Sciences of the United States of America (PNAS), including the frequency distribution, temporal scaling behavior, and decay factor. The empirical results indicate that the keyword frequency in PNAS approximately follows Zipf's law with exponent 0.86. In addition, there is a power-law correlation between the cumulative number of distinct keywords and the cumulative number of keyword occurrences. Extensive empirical analysis of data from several other journals is also presented, with the decaying trends of the most popular keywords being monitored. Interestingly, top journals from various subjects share a very similar decaying tendency, while journals with low impact factors exhibit completely different behavior. These empirical characteristics may shed some light on the in-depth understanding of semantic evolutionary behaviors. In addition, the analysis of keyword-based systems is helpful for the design of corresponding recommender systems.
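As an illustration of how a Zipf exponent such as the one quoted above could be estimated, the following minimal sketch fits a straight line to log(frequency) against log(rank) by ordinary least squares. The function name `zipf_exponent` and the synthetic data are assumptions made for this example; the abstract does not state which fitting procedure the authors used.

```python
import math

def zipf_exponent(frequencies):
    """Return the negated least-squares slope of log(frequency) vs. log(rank).

    Hypothetical helper for illustration only; not the authors' procedure.
    """
    # Rank 1 is the most frequent keyword; zero counts are dropped.
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic frequencies that follow f(r) = 1000 * r^(-0.86) exactly (assumed data).
synthetic = [1000.0 * rank ** -0.86 for rank in range(1, 501)]
estimate = zipf_exponent(synthetic)
print(round(estimate, 4))
```

On real keyword counts the low-frequency tail is noisy, so a log-log least-squares fit of this kind is only a rough estimate.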
In this Brief Report, we propose a new index of user similarity, namely the transferring similarity, which involves all high-order similarities between users. Accordingly, we design a modified collaborative filtering algorithm, which provides remarkably more accurate predictions than the standard collaborative filtering. More interestingly, we find that the algorithmic performance approaches its optimal value when the parameter contained in the definition of transferring similarity gets close to its critical value, below which the series expansion of transferring similarity converges and above which it diverges. Our study is complementary to the one reported in [E. A. Leicht, P. Holme, and M. E. J. Newman, Phys. Rev. E \textbf{73}, 026120 (2006)], and is relevant to the missing link prediction problem.
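The convergence behaviour described above can be illustrated with a small sketch. The abstract does not give the definition of transferring similarity, so the code assumes a recursive form T = eps * S * T + S, whose series expansion is T = S + eps * S^2 + eps^2 * S^3 + ...; under that assumption the series converges only when eps is below the reciprocal of the largest eigenvalue of S. The matrix `S`, the parameter name `eps`, and the helper functions are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def truncated_transfer(s, eps, terms):
    """Partial sum S + eps*S^2 + ... + eps^(terms-1) * S^terms (assumed form)."""
    n = len(s)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in s]  # holds S^m for the current term m
    coeff = 1.0                    # holds eps^(m-1)
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += coeff * power[i][j]
        power = matmul(power, s)
        coeff *= eps
    return total

# Toy two-user similarity matrix with largest eigenvalue 0.5,
# so the assumed critical value is eps = 1 / 0.5 = 2.
S = [[0.0, 0.5], [0.5, 0.0]]

below = truncated_transfer(S, 1.0, 60)        # eps below the critical value
above_short = truncated_transfer(S, 3.0, 20)  # eps above the critical value
above_long = truncated_transfer(S, 3.0, 40)

print(below[0][1])                           # settles near 2/3
print(above_short[0][1], above_long[0][1])   # keeps growing with more terms
```

For eps = 1 the partial sums agree with the closed form S (I - eps S)^{-1}, whose off-diagonal entry is 2/3 for this toy matrix, while for eps = 3 adding more terms only makes the entries larger.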
Image-based jet analysis is built upon the jet image representation of jets, which enables a direct connection between high energy physics and the fields of computer vision and deep learning. Through this connection, a wide array of new jet analysis techniques has emerged. In this text, we survey jet image based classification models, built primarily on convolutional neural networks, examine the methods used to understand what these models have learned and how sensitive they are to uncertainties, and review the recent successes in moving these models from phenomenological studies to real-world applications in experiments at the LHC. Beyond jet classification, several other applications of jet image based techniques, including energy estimation, pileup noise reduction, data generation, and anomaly detection, are discussed.
We present a new high-resolution global renewable energy atlas ({REatlas}) that can be used to calculate customised hourly time series of wind and solar PV power generation. In this paper, the atlas is applied to produce 32-year-long hourly model wind power time series for Denmark for each historical and future year between 1980 and 2035. These are calibrated and validated against real production data from the period 2000 to 2010. The high number of years allows us to discuss how the characteristics of Danish wind power generation vary between individual weather years. As an example, the annual energy production is found to vary by $\pm 10\%$ from the average. Furthermore, we show how the production pattern changes as small onshore turbines are gradually replaced by large onshore and offshore turbines. Finally, we compare our wind power time series for 2020 to corresponding data from a handful of Danish energy system models. The aim is to illustrate how current differences in model wind may result in significant differences in technical and economic model predictions. These include differences of up to $15\%$ in installed capacity and $40\%$ in system reserve requirements.
Much recent empirical evidence shows that \textit{community structure} is ubiquitous in real-world networks. In this Letter, we propose a growth model to create scale-free networks with a tunable strength (denoted by $Q$) of community structure and investigate the influence of community strength upon the collective synchronization induced by the SIRS epidemiological process. Global and local synchronizability of the system is studied by means of an order parameter, and the relevant finite-size scaling analysis is provided. The numerical results show that a phase transition occurs at $Q_c \simeq 0.835$ from global synchronization to desynchronization and that the local synchronization is weakened in a range of intermediately large $Q$. Moreover, we study the impact of the mean degree $\langle k \rangle$ upon synchronization on scale-free networks.
The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording, and analyzing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series and over 9000 time-series analysis algorithms are analyzed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines, and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heart beat intervals, speech signals, and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.