Predicting the Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models

Added by Jian Wu

Publication date 2021

Field: Informatics Engineering

Language: English

Authors Jian Wu - Rajal Nivargi - Sree Sai Teja Lanka

Digital Libraries, Artificial Intelligence, Computation and Language

In recent years, significant effort has been invested in verifying the reproducibility and robustness of research claims in social and behavioral sciences (SBS), much of which has involved resource-intensive replication projects. In this paper, we investigate prediction of the reproducibility of SBS papers using machine learning methods based on a set of features. We propose a framework that extracts five types of features from scholarly work that can be used to support assessments of reproducibility of published research claims. Bibliometric features, venue features, and author features are collected from public APIs or extracted using open source machine learning libraries with customized parsers. Statistical features, such as p-values, are extracted by recognizing patterns in the body text. Semantic features, such as funding information, are obtained from public APIs or are extracted using natural language processing models. We analyze pairwise correlations between individual features and their importance for predicting a set of human-assessed ground truth labels. In doing so, we identify a subset of 9 top features that play relatively more important roles in predicting the reproducibility of SBS papers in our corpus. Results are verified by comparing the performance of 10 supervised predictive classifiers trained on different sets of features.
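The abstract mentions that statistical features such as p-values are extracted by recognizing patterns in the body text. The following is only a minimal sketch of what such pattern matching could look like, not the authors' extractor: the regular expression, the function name `extract_p_values`, and the set of accepted notations are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): matches
# notations such as "p < 0.05", "p = .032", or "P<=0.001".
P_VALUE_RE = re.compile(
    r"\bp\s*(<=|>=|<|>|=)\s*(\d*\.\d+|\d+)",
    re.IGNORECASE,
)


def extract_p_values(text):
    """Return a list of (comparator, value) pairs reported in the text."""
    results = []
    for match in P_VALUE_RE.finditer(text):
        comparator, raw_value = match.groups()
        results.append((comparator, float(raw_value)))
    return results


if __name__ == "__main__":
    sample = "We found an effect (p < 0.05) and another, p = .032."
    print(extract_p_values(sample))
```

A real extractor would likely need to handle more notations (scientific notation, "n.s.", typeset inequality symbols), so this pattern should be read as a starting point only.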

TweetPap: A Dataset to Study the Social Media Discourse of Scientific Papers

Naman Jain, Mayank Singh, 2021

Nowadays, researchers have moved to platforms like Twitter to spread information about their ideas and empirical evidence. Recent studies have shown that social media affects the scientific impact of a paper. However, these studies only utilize the tweet counts to represent Twitter activity. In this paper, we propose TweetPap, a large-scale dataset that introduces temporal information of citation/tweets and the metadata of the tweets to quantify and understand the discourse of scientific papers on social media. The dataset is publicly available at https://github.com/lingo-iitgn/TweetPap

Digital Libraries, Social and Information Networks

A paper's corresponding affiliation and first affiliation are consistent at the country level in Web of Science

Jianfei Yu, Chunxiao Yin, Linlin Liu, 2021

The purpose of this study is to explore the relationship between the first affiliation and the corresponding affiliation at different levels via scientometric analysis. We select over 18 million papers in the core collection database of Web of Science (WoS) published from 2000 to 2015, and measure the percentage of match between the first and the corresponding affiliation at the country and institution level. We find that a paper's first affiliation and corresponding affiliation are highly consistent at the country level, with a match of over 98% on average. However, the match at the institution level is much lower and varies significantly with time and country. Hence, for studies at the country level, using the first affiliation or the corresponding affiliation gives almost the same result, but more caution is needed in selecting the affiliation when the institution is the focus of the investigation. Meanwhile, we find some evidence that the recorded corresponding information in the WoS database has undergone some changes since 2013, which sheds light on future studies comparing different databases or assessing the affiliation accuracy of WoS. Our findings rely on the records of WoS, which may not be entirely accurate. Given the scale of the analysis, our findings can serve as a useful reference for further studies when country allocation or institute allocation is needed. Existing studies comparing straight counting methods usually cover a limited number of papers, a particular research field, or a limited range of time. More importantly, raw counts alone cannot sufficiently tell whether the corresponding and first affiliations are similar. This paper uses a metric similar to Jaccard similarity to measure the percentage of the match and performs a comprehensive analysis based on a large-scale bibliometric database.
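The abstract says only that the match is measured with "a metric similar to Jaccard similarity" and does not give the exact definition. As a rough illustration, the sketch below uses the plain Jaccard index on the sets of countries attached to the first and corresponding affiliations of each paper. The function names, the input layout, and the convention for two empty sets are assumptions, not the authors' method.

```python
def jaccard_match(first, corresponding):
    """Plain Jaccard index |A & B| / |A | B| between two affiliation sets."""
    a, b = set(first), set(corresponding)
    if not a and not b:
        # Assumed convention: two empty sets count as no match.
        return 0.0
    return len(a & b) / len(a | b)


def mean_match(pairs):
    """Average the per-paper match over (first, corresponding) pairs."""
    scores = [jaccard_match(first, corr) for first, corr in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical records: country codes per paper, invented for illustration.
    papers = [
        ({"CN"}, {"CN"}),        # identical sets
        ({"CN", "US"}, {"CN"}),  # partial overlap
    ]
    print(mean_match(papers))
```

Unlike a comparison of raw counts per country, a set-based score of this kind is computed per paper, so it can tell whether the two affiliations actually coincide.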

Digital Libraries

Towards Long-term and Archivable Reproducibility

Mohammad Akhlaghi, Raul Infante-Sainz, Boudewijn F. Roukema, 2020

Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free software. As a proof of concept, we introduce Maneage (Managing data lineage), enabling cheap archiving, provenance extraction, and peer verification, which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed and we conclude with the benefits for the various stakeholders. This paper is itself written with Maneage (project commit eeff5de).

Digital Libraries

The role of mainstreamness and interdisciplinarity for the relevance of scientific papers

Stefan Thurner, Wenyuan Liu, Peter Klimek, 2019

There is demand from science funders, industry, and the public that science should become more risk-taking, more out-of-the-box, and more interdisciplinary. Is it possible to tell how interdisciplinary and out-of-the-box scientific papers are, or which papers are mainstream? Here we use the bibliographic coupling network, derived from all physics papers that were published in the Physical Review journals in the past century, to try to identify them as mainstream, out-of-the-box, or interdisciplinary. We show that the network clusters into scientific fields. The position of individual papers with respect to these clusters allows us to estimate their degree of mainstreamness or interdisciplinarity. We show that over the past decades the fraction of mainstream papers has increased, the fraction of out-of-the-box papers has decreased, and the fraction of interdisciplinary papers has remained constant. Studying the rewards of papers, we find that in terms of absolute citations, both mainstream and interdisciplinary papers are rewarded. In the long run, mainstream papers perform worse than interdisciplinary ones in terms of citation rates. We conclude that to avoid a trend towards mainstreamness a new incentive scheme is necessary.
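In a bibliographic coupling network, two papers are linked when their reference lists overlap, and the link weight is commonly taken to be the number of shared references. The sketch below builds such weighted links from a small mapping of papers to cited works. The function name, the input layout, and the sample identifiers are hypothetical, and the abstract does not say which weighting or clustering procedure the authors used.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Map each coupled pair of papers to its number of shared references.

    `references` maps a paper identifier to an iterable of the works it cites.
    Pairs with no shared reference are left out of the result.
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    # Hypothetical identifiers, invented for illustration.
    refs = {
        "P1": ["R1", "R2"],
        "P2": ["R2", "R3"],
        "P3": ["R4"],
    }
    print(bibliographic_coupling(refs))
```

Comparing all pairs is quadratic in the number of papers, so a corpus spanning a century of journals would in practice call for an inverted index from references to citing papers. The pairwise loop is kept here only for readability.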

Digital Libraries, Physics and Society, Applications

Predicting human decisions with behavioral theories and machine learning

Ori Plonsky, Reut Apel, Eyal Ert, 2019

Behavioral decision theories aim to explain human behavior. Can they help predict it? An open tournament for prediction of human choices in fundamental economic decision tasks is presented. The results suggest that integration of certain behavioral theories as features in machine learning systems provides the best predictions. Surprisingly, the most useful theories for prediction build on basic properties of human and animal learning and are very different from mainstream decision theories that focus on deviations from rational choice. Moreover, we find that theoretical features should be based not only on qualitative behavioral insights (e.g. loss aversion), but also on quantitative behavioral foresights generated by functional descriptive models (e.g. Prospect Theory). Our analysis prescribes a recipe for derivation of explainable, useful predictions of human decisions.

Artificial Intelligence, Computer Science and Game Theory, Machine Learning
