
Should citations be field-normalized in evaluative bibliometrics? An empirical analysis based on propensity score matching

Added by Dr. Lutz Bornmann
Publication date: 2019
Language: English





Field-normalization of citations is the bibliometric standard. Despite the observed differences in citation counts between fields, the question remains how strongly fields influence citation rates beyond the effect of attributes or factors possibly influencing citations (FICs). In this study, we considered several FICs, such as the number of pages and the number of co-authors, and asked whether there is a separate field effect besides these other effects. To answer this question, we applied inverse probability of treatment weighting (IPW). Using Web of Science data (a sample of 308,231 articles), we investigated whether mean differences in citation rates among subject categories remain even after the subject categories are made comparable in the field-related attributes (e.g., comparable numbers of co-authors and pages) by IPW. In a diagnostic step of our statistical analyses, we included propensity scores as covariates in regression analyses to examine whether the differences between the fields in FICs vanish. The results revealed that the differences did not vanish completely but were strongly reduced. We obtained similar results when we calculated the mean differences between fields after IPW, which represent the causal or unconfounded field effects on citations. However, field differences in citation rates remain. The results indicate that field-normalization is a prerequisite for citation analysis and cannot be replaced by controlling for any set of FICs.
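To make the IPW procedure concrete, the following Python sketch illustrates the idea for a simplified binary comparison of two fields (the study itself covers many subject categories, which would require a generalized, e.g. multinomial, propensity model). All column names and the helper function are hypothetical, not taken from the study's data or code.

```python
# A minimal IPW sketch, assuming a DataFrame with hypothetical columns
# 'field' (0/1), 'n_pages', 'n_authors', and 'citations'.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_field_effect(df: pd.DataFrame) -> float:
    """Estimate the field effect on citations after balancing the FICs."""
    X = df[["n_pages", "n_authors"]].to_numpy()
    t = df["field"].to_numpy()
    y = df["citations"].to_numpy()

    # Step 1: propensity score = P(field = 1 | FICs), via logistic regression.
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # Step 2: inverse-probability weights make the fields comparable in FICs.
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # Step 3: the weighted mean citation difference is the unconfounded
    # ("causal") field effect on citations.
    mean1 = np.average(y[t == 1], weights=w[t == 1])
    mean0 = np.average(y[t == 0], weights=w[t == 0])
    return mean1 - mean0
```

If this weighted difference were close to zero, the FICs alone would explain the field differences; the study's finding is that a substantial difference remains, motivating field-normalization.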



Related research

Inverse probability of treatment weighting (IPTW) is a popular propensity score (PS)-based approach to estimate causal effects in observational studies at risk of confounding bias. A major issue when estimating the PS is the presence of partially observed covariates. Multiple imputation (MI) is a natural approach to handle missing data on covariates, but its use in the PS context raises three important questions: (i) should we apply Rubin's rules to the IPTW treatment effect estimates or to the PS estimates themselves? (ii) does the outcome have to be included in the imputation model? (iii) how should we estimate the variance of the IPTW estimator after MI? We performed a simulation study focusing on the effect of a binary treatment on a binary outcome with three confounders (two of them partially observed). We used MI with chained equations to create complete datasets and compared three ways of combining the results: combining the treatment effect estimates (MIte); combining the PS across the imputed datasets (MIps); or combining the PS parameters and estimating the PS of the average covariates across the imputed datasets (MIpar). We also compared the performance of these methods to complete case (CC) analysis and the missingness pattern (MP) approach, a method which uses a different PS model for each pattern of missingness. We also studied the consistency of these three MI estimators empirically. Under a missing at random (MAR) mechanism, CC and MP analyses were biased in most cases when estimating the marginal treatment effect, whereas the MI approaches performed well in reducing bias as long as the outcome was included in the imputation model. However, only MIte was unbiased in all the studied scenarios, and Rubin's rules provided good variance estimates for MIte.
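As a rough illustration of the MIte strategy described above, the following sketch pools per-imputation IPTW results with Rubin's rules. The per-dataset estimation step is abstracted away; `estimate_iptw` is a hypothetical helper returning an (effect, variance) pair for one imputed dataset.

```python
# A hedged sketch of Rubin's rules for pooling MIte results, assuming
# each imputed dataset has already been analyzed by IPTW.
import numpy as np

def rubin_pool(estimates, variances):
    """Combine per-imputation (estimate, variance) pairs via Rubin's rules."""
    m = len(estimates)
    q_bar = np.mean(estimates)          # pooled point estimate
    w_bar = np.mean(variances)          # within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, total_var

# Hypothetical usage, given m imputed datasets:
#   results = [estimate_iptw(d) for d in imputed_datasets]
#   effect, variance = rubin_pool([r[0] for r in results],
#                                 [r[1] for r in results])
```

This corresponds to combining the treatment effect estimates themselves (question (i) above), which is the variant the simulation study found unbiased across all scenarios.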
Selective inference (post-selection inference) is a methodology that has attracted much attention in recent years in the fields of statistics and machine learning. Naive inference based on data that are also used for model selection tends to overstate significance, so selective inference conditions on the event that the model was selected. In this paper, we develop selective inference in propensity score analysis with a semiparametric approach, which has become a standard tool in causal inference. Specifically, for the most basic causal inference model, in which the causal effect can be written as a linear sum of confounding variables, we conduct Lasso-type variable selection by adding an $\ell_1$ penalty term to the loss function that gives a semiparametric estimator. Confidence intervals are then given for the coefficients of the selected confounding variables, conditional on the event of variable selection, with asymptotic guarantees. An important property of this method is that it does not require modeling of nonparametric regression functions for the outcome variables, as is usually the case with semiparametric propensity score analysis.
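For orientation, the sketch below shows only the Lasso-type selection step, using an ordinary squared-error loss in place of the paper's semiparametric loss; the paper's actual contribution, the confidence intervals conditional on this selection event, is not reproduced here. All names are illustrative.

```python
# A minimal sketch of l1-penalized confounder selection, assuming X holds
# candidate confounders and y the outcome; the penalty alpha is arbitrary.
import numpy as np
from sklearn.linear_model import Lasso

def select_confounders(X: np.ndarray, y: np.ndarray, alpha: float = 0.1):
    """Return indices of confounders kept by the l1-penalized fit."""
    fit = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    return np.flatnonzero(fit.coef_)  # nonzero coefficients = selected set
```

Selective inference then asks what the distribution of the selected coefficients is, conditional on exactly this set having been chosen, rather than treating the selection as fixed in advance.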
Many altmetric studies analyze which papers are mentioned, and how often, in specific altmetrics sources. In order to study the potential policy relevance of tweets from another perspective, we investigate which tweets were cited in papers. If many tweets were cited in publications, this might demonstrate that tweets have substantial and useful content. Overall, a rather low number of tweets (n=5,506) were cited, by fewer than 3,000 papers. Most tweets do not seem to be cited because of any cognitive influence they might have had on the citing studies; rather, they were study objects. Most of the papers citing tweets are from the subject areas Social Sciences, Arts and Humanities, and Computer Sciences. Most of the papers cited only one tweet; at most, 55 tweets were cited in a single paper. This research-in-progress does not support a high policy-relevance of tweets. However, a content analysis of the tweets and/or papers might lead to a more detailed conclusion.
Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing, and analysis of usage data remains an area of active research. This article provides a review of the state of the art in usage-based informetrics, i.e., the use of usage data to study the scholarly process.
Academic papers are the main vehicles for disseminating expertise. Naturally, paper citation pattern analysis is an efficient and essential way of investigating the knowledge structure of science and technology. For decades, it has been observed that citations to the scientific literature follow a heterogeneous, heavy-tailed distribution, and many studies suggest a power-law distribution, log-normal distribution, or related distributions. However, many of these studies are limited to small-scale approaches, so their findings are hard to generalize. To overcome this problem, we investigate 21 years of citation evolution through a systematic analysis of the entire citation history of 42,423,644 scientific publications published from 1996 to 2016 and contained in SCOPUS. We tested six candidate distributions for the scientific literature at three distinct levels of the Scimago Journal & Country Rank (SJR) classification scheme. First, we observe that the raw number of annual citation acquisitions tends to follow the log-normal distribution for all disciplines, except for the first year after publication. We also find significant disparities in the yearly acquired citation numbers among journals, which suggests that it is essential to remove the citation surplus inherited from the prestige of the journals. Our simple method for separating the citation preference of an individual article from the inherited citation of the journals reveals an unexpected regularity in the normalized annual acquisitions of citations across the entire field of science. Specifically, the normalized annual citation acquisitions have power-law probability distributions with an exponential cut-off, with exponents around 2.3, regardless of publication and citation year. Our results imply that journal reputation has a substantial long-term impact on citations.
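To illustrate the kind of distribution fitting this abstract describes, the following sketch fits a log-normal to annual citation counts and checks the fit; the toy data array is hypothetical, and the study itself tests six candidate distributions at a vastly larger scale.

```python
# An illustrative log-normal fit to annual citation counts, assuming
# a small toy sample; not the study's actual data or pipeline.
import numpy as np
from scipy import stats

citations = np.array([1, 3, 2, 8, 5, 13, 4, 2, 21, 7])  # toy annual counts

# Fit the log-normal on positive counts (its support is x > 0);
# fixing loc=0 gives the standard two-parameter log-normal.
shape, loc, scale = stats.lognorm.fit(citations, floc=0)

# Check goodness of fit with a Kolmogorov-Smirnov test.
ks_stat, p_value = stats.kstest(citations, "lognorm", args=(shape, loc, scale))
print(f"sigma={shape:.2f}, scale={scale:.2f}, KS p={p_value:.3f}")
```

Fitting per journal and year, and then rescaling each article's counts by its journal's fitted level, is one way to implement the journal-normalization step the abstract alludes to.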