Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Typically, researchers who study mentorship use article co-authorship and doctoral dissertation datasets. However, available datasets of this type focus on narrow selections of fields and miss early-career and non-publication-related interactions. Here, we describe MENTORSHIP, a crowdsourced dataset of 743,176 mentorship relationships among 738,989 scientists across 112 fields that avoids these shortcomings. We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and semantic representations of research using deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimates of these factors. We perform extensive validations of the profile-publication matching, semantic content, and demographic inferences. We anticipate this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists' career outcomes.
Computer science is a relatively young discipline combining science, engineering, and mathematics. The main flavors of computer science research involve the theoretical development of conceptual models for the different aspects of computing and the more applicative building of software artifacts and assessment of their properties. In the computer science publication culture, conferences are an important vehicle to quickly move ideas, and journals often publish deep
The paper citation network is a traditional social medium for the exchange of ideas and knowledge. In this paper, we view citation networks from the perspective of information diffusion. We study the structural features of the information paths through the citation networks of publications in computer science, and analyze the effect of various citation choices on the subsequent impact of the article. We find that citing recent papers and papers within the same scholarly community garners a slightly larger number of citations on average. However, this correlation is weaker among well-cited papers, implying that for high-impact work, citing within one's field is of lesser importance. We also study differences in information flow for specific subsets of citation networks: books versus conference and journal articles, different areas of computer science, and different time periods.
A preprint is a version of a scientific paper that is publicly distributed before formal peer review. Since the launch of arXiv in 1991, preprints have increasingly been distributed over the Internet rather than as paper copies. This allows open online access that disseminates original research within a few days, often at a very low operating cost. This work surveys how preprints have evolved and affected the research community over the past thirty years alongside the growth of the Web. We first report that the number of preprints has grown exponentially, increasing 63-fold in 30 years, although preprints account for only 4% of research articles. Second, we quantify the benefits that preprints bring to authors: preprints reach an audience 14 months earlier on average and are associated with five times more citations than their non-preprint counterparts. Last, to address concerns about the quality of preprints, we find that 41% of preprints are ultimately published at a peer-reviewed destination, and that the venues in which they are published are as influential as those of papers without a preprint version. Additionally, we discuss the unprecedented role of preprints in communicating the latest research data during recent public health emergencies. In conclusion, we provide quantitative evidence of the positive impact of preprints on individual researchers and the community. Preprints make scholarly communication more efficient by disseminating scientific discoveries more rapidly and widely with the aid of Web technologies. The measurements we present in this study can help researchers and policymakers make informed decisions about how to use a preprint culture effectively and embrace it responsibly.
Our current knowledge of scholarly plagiarism is largely based on the similarity between full-text research articles. In this paper, we propose a novel conceptualization of scholarly plagiarism in the form of reuse of explicit citation sentences in scientific research articles. Note that while full-text plagiarism is an indicator of gross-level behavior, copying of citation sentences is a more nuanced micro-scale phenomenon observed even among well-known researchers. The current work poses several questions and attempts to answer them by empirically investigating a large bibliographic text dataset from computer science containing millions of lines of citation sentences. In particular, we report evidence of massive copying behavior. We also present several striking real examples throughout the paper to showcase the widespread adoption of this undesirable practice. Contrary to popular perception, we find that the tendency to copy increases as an author matures. Copying behavior exists in all fields of computer science; however, the theoretical fields show more copying than the applied fields.
Knowledge of how science is consumed in public domains is essential for a deeper understanding of the role of science in human society. While science is heavily supported by public funding, common depictions suggest that scientific research remains an isolated or ivory tower activity, with weak connectivity to public use, little relationship between the quality of research and its public use, and little correspondence between the funding of science and its public use. This paper introduces a measurement framework to examine public good features of science, allowing us to study public uses of science, the public funding of science, and how use and funding relate. Specifically, we integrate five large-scale datasets that link scientific publications from all scientific fields to their upstream funding support and downstream public uses across three public domains: government documents, the news media, and marketplace invention. We find that the public uses of science are extremely diverse, with different public domains drawing distinctively across scientific fields. Yet amidst these differences, we find key forms of alignment in the interface between science and society. First, despite concerns that the public does not engage with high-quality science, we find universal alignment, in each scientific field and public domain, between what the public consumes and what is highly impactful within science. Second, despite myriad factors underpinning the public funding of science, the resulting allocation across fields presents a striking alignment with the fields' collective public use. Overall, public uses of science present a rich landscape of specialized consumption, yet collectively science and society interface with remarkable, quantifiable alignment between scientific use, public use, and funding.