In the same way that ecosystems tend to increase maturity by decreasing the flow of energy per unit biomass, we should move towards a more mature science by publishing fewer but higher-quality papers and by moving away from joining large teams in small roles. That is, we should decrease our scientific productivity for good.
Scientific journals are the repositories of mankind's gradually accumulating knowledge about the world around us. Just as our knowledge is organised into classes ranging from major disciplines, subjects and fields to increasingly specific topics, journals can also be categorised into groups using various metrics. In addition to being characterised by their set of topics, journals can be ranked by their relevance in terms of overall influence. One widespread measure is the impact factor, but in the present paper we intend to reconstruct a much more detailed description by studying the hierarchical relations between journals based on citation data. We use a measure related to the notion of m-reaching centrality and find a network that shows the level of influence of a journal in terms of the direction and efficiency with which information spreads through the network. We also obtain an alternative network by applying a suitably modified nested hierarchy extraction method to the same data. The results are weakly methodology-dependent and reveal non-trivial relations among journals. The two alternative hierarchies are largely similar but show some striking differences, and together they provide a complex picture of the intricate relations between scientific journals.
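To make the reach-based idea in the abstract above concrete, here is a minimal sketch of an m-reach style centrality on a toy directed graph. It is not the measure the authors actually use (they describe theirs only as "related to" m-reaching centrality). The function name, the toy journals, the edge direction (an edge u -> v is taken to mean that information flows from u to v) and the normalisation by N - 1 are all assumptions made for illustration.

```python
from collections import deque


def m_reach_centrality(graph, source, m):
    """Fraction of the other nodes reachable from `source` in at most m directed steps.

    `graph` maps each node to a list of its out-neighbours (assumed convention).
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    if len(nodes) < 2:
        return 0.0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == m:
            continue  # do not expand beyond m steps
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return (len(seen) - 1) / (len(nodes) - 1)


# Hypothetical toy network of four journals.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

reach_a_1 = m_reach_centrality(toy_graph, "A", 1)  # A reaches B and C
reach_a_2 = m_reach_centrality(toy_graph, "A", 2)  # A reaches B, C and D
reach_d_2 = m_reach_centrality(toy_graph, "D", 2)  # D reaches nothing
```

In this toy example, journal A sits at the top of the flow hierarchy because it reaches every other node within two steps, while D sits at the bottom.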
Science is built upon scholarly consensus that changes over time. This raises the question of how revolutionary theories and assumptions are evaluated and accepted into the norm of science as the setting for the next science. Using two recently proposed metrics, we identify novel papers by their high atypicality, which models how a paper draws upon unusual combinations of prior research in crafting its own contribution, and we evaluate the recognition of novel papers by citation and disruption, which captures the degree to which a research article creates a new direction by eclipsing citations to the prior work it builds upon. Only a small fraction of papers (2.3%) are highly novel, and there are fewer novel papers over time, with a nearly threefold decrease from 3.9% in 1970 to 1.4% in 2000. A highly novel paper indeed has a much higher chance (61.3%) of disrupting science than a conventional paper (36.4%), but this recognition comes only in the distant future as reflected in citations, and it typically takes 10 years or longer for the disruption score of a paper to stabilize. In comparison, only about 20% of scholars survive in academia over this long period, as measured by publications. We also provide the first computational model reformulating atypicality as the distance across the latent knowledge spaces learned by neural networks, as a proxy for the socially agreed relevance between distinct fields of scientific knowledge. The evolution of this knowledge space characterizes how yesterday's novelty forms today's scientific conventions, which condition the novelty--and surprise--of tomorrow's breakthroughs. This computational model may be used to inform science policy that aims to recognize and cultivate novelty, so as to mitigate the conflict between individual career success and collective advance in science and to direct human creativity to the unknown frontier of scientific knowledge.
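The disruption score mentioned in the abstract above is not defined there. A commonly used formulation is D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers that cite the focal paper but none of its references, n_j counts those that cite both, and n_k counts those that cite only the references. The sketch below assumes this formulation. The function name, the data layout and the toy papers are hypothetical.

```python
def disruption_index(focal, references, later_papers):
    """Disruption score D = (n_i - n_j) / (n_i + n_j + n_k) under the assumed definition.

    `later_papers` maps each subsequent paper to the set of papers it cites.
    """
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in later_papers.values():
        cites_focal = focal in cited
        cites_refs = bool(refs & set(cited))
        if cites_focal and not cites_refs:
            n_i += 1  # builds on the focal paper alone
        elif cites_focal and cites_refs:
            n_j += 1  # treats the focal paper as a continuation of prior work
        elif cites_refs:
            n_k += 1  # bypasses the focal paper entirely
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Hypothetical toy data: focal paper "F" cites "R1" and "R2".
later = {
    "P1": {"F"},
    "P2": {"F"},
    "P3": {"F", "R1"},
    "P4": {"R2"},
    "P5": {"X"},  # unrelated paper, ignored
}
d_score = disruption_index("F", ["R1", "R2"], later)
```

Here n_i = 2, n_j = 1 and n_k = 1, so D = 0.25. Values near 1 indicate that later work cites the focal paper without its predecessors, while values near -1 indicate that it consolidates existing work.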
Whether a scientific paper is cited is related not only to the influence of its author(s) but also to the journal publishing it. Scientists, whether experienced or junior, usually submit their most important work to prestigious journals, which receive more citations than ordinary ones. How to model the role of scientific journals in citation dynamics is therefore of great importance. In this paper we address this issue from two angles. One is the intrinsic heterogeneity of a paper, determined by the impact factor of the journal publishing it. The other is the mechanism by which a paper is cited, which depends on its citations and prestige. We develop a model for citation networks via an intrinsic nodal weight function and an intuitive ageing mechanism. The node's weight is drawn from the distribution of impact factors of journals, and the ageing transition is a function of the citation and the prestige. The node-degree distribution of the resulting networks shows nonuniversal scaling: the distribution decays exponentially for small degree and has a power-law tail for large degree, hence the dual behaviour. The higher the impact factor of the journal, the larger the tipping point and the smaller the power-law exponent. As the journal rank increases, this phenomenon fades and the distribution evolves to a pure power law.
Inspired by the social and economic benefits of diversity, we analyze over 9 million papers and 6 million scientists to study the relationship between research impact and five classes of diversity: ethnicity, discipline, gender, affiliation, and academic age. Using randomized baseline models, we establish the presence of homophily in ethnicity, gender and affiliation. We then study the effect of diversity on scientific impact, as reflected in citations. Remarkably, of the classes considered, ethnic diversity had the strongest correlation with scientific impact. To further isolate the effects of ethnic diversity, we used randomized baseline models and again found a clear link between diversity and impact. To further support these findings, we use coarsened exact matching to compare the scientific impact of ethnically diverse papers and scientists with closely-matched control groups. Here, we find that ethnic diversity resulted in an impact gain of 10.63% for papers, and 47.67% for scientists.
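The coarsened exact matching step in the abstract above can be sketched roughly as follows: covariates are binned, units are matched exactly on their bin signature, strata lacking either a treated or a control unit are dropped, and the outcome difference is averaged over the remaining treated units. This is only an illustration of the general technique, not the authors' pipeline. The covariate names, cutpoints, outcome values and function names are all invented for the example.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Index of the bin that `value` falls into, given sorted cutpoints."""
    return sum(value >= c for c in cutpoints)


def cem_effect(units, cutpoints):
    """Average treated-minus-control outcome difference over matched strata.

    Strata are weighted by their number of treated units; returns None if
    no stratum contains both treated and control units.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for u in units:
        key = tuple(coarsen(u["covariates"][name], cuts)
                    for name, cuts in cutpoints.items())
        strata[key][u["treated"]].append(u["outcome"])
    weighted_diff, n_treated = 0.0, 0
    for groups in strata.values():
        treated, control = groups[True], groups[False]
        if treated and control:  # keep only strata with both groups
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            weighted_diff += diff * len(treated)
            n_treated += len(treated)
    return weighted_diff / n_treated if n_treated else None


# Hypothetical papers: "treated" marks an ethnically diverse author team,
# "outcome" is a citation count.
cuts = {"year": [2000, 2010], "team_size": [3, 6]}
papers = [
    {"treated": True, "covariates": {"year": 2003, "team_size": 4}, "outcome": 12},
    {"treated": False, "covariates": {"year": 2007, "team_size": 5}, "outcome": 10},
    {"treated": True, "covariates": {"year": 2012, "team_size": 2}, "outcome": 30},
    {"treated": False, "covariates": {"year": 2015, "team_size": 1}, "outcome": 20},
    {"treated": False, "covariates": {"year": 2011, "team_size": 2}, "outcome": 24},
    {"treated": True, "covariates": {"year": 1995, "team_size": 8}, "outcome": 50},
]
effect = cem_effect(papers, cuts)
```

The last paper has no control unit in its stratum and is discarded, which is the price matching pays for comparing only like with like. The two remaining strata give differences of 2 and 8 citations, for an average of 5.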
The majority of scientific papers are distributed in PDF, which poses challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published 2010--2019 and sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce the SciA11y system to offset some of the issues around inaccessibility. SciA11y incorporates several machine learning models to extract the content of scientific PDFs and render this content as accessible HTML, with added novel navigational features to support screen reader users. An intrinsic evaluation of extraction quality indicates that the majority of HTML renders (87%) produced by our system have no or only some readability issues. We perform a qualitative user study to understand the needs of BLV researchers when reading papers, and to assess whether the SciA11y system could address these needs. We summarize our user study findings into a set of five design recommendations for accessible scientific reader systems. User response to SciA11y was positive, with all users saying they would be likely to use the system in the future, and some stating that the system, if available, would become their primary workflow. We successfully produce HTML renders for over 12M papers, of which an open access subset of 1.5M are available for browsing at https://scia11y.org/