The inconsistency of h-index: a mathematical analysis


Abstract in English

Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers that simulate real series of citations to investigate the consistency of the h index. Using the lognormal cumulative distribution function, we formulate the equation that defines the h index; this equation shows that h has a complex dependence on the number of papers (N). We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The correlation is strongest at the 100-citation threshold, but numerous data points still deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper that exceeds any of the citation thresholds. In contrast with the h index, the total number of citations correlates highly with the number of papers exceeding the thresholds of 10 and 50 citations, and the mean number of citations correlates with the probability of publishing a paper that exceeds any citation threshold. Thus, in synthetic series, the total number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We also discuss further difficulties that arise in real citation distributions.
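To make the dependence of h on N concrete, the following is a minimal sketch, not the authors' code. It assumes the common continuous approximation in which h solves N · [1 − F(h)] = h, where F is the lognormal cumulative distribution function; the abstract does not state the exact form of the paper's equation. The function names and the parameter values `mu` and `sigma` are illustrative assumptions.

```python
import math
import random


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def lognormal_sf(x, mu, sigma):
    """Survival function 1 - F(x) of a lognormal distribution."""
    if x <= 0:
        return 1.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2))
    return 0.5 * math.erfc(z)


def expected_h(n, mu, sigma):
    """Solve n * (1 - F(h)) = h by bisection (assumed continuous approximation)."""
    lo, hi = 0.0, float(n)
    for _ in range(100):
        mid = (lo + hi) / 2
        # n * sf(h) - h decreases in h, so the root lies where the sign flips.
        if n * lognormal_sf(mid, mu, sigma) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def synthetic_series(n, mu, sigma, seed):
    """Draw n lognormal values and truncate them to integer citation counts."""
    rng = random.Random(seed)
    return [int(rng.lognormvariate(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    mu, sigma = 1.0, 1.0  # illustrative values, not taken from the paper
    for n in (50, 200, 1000):
        series = synthetic_series(n, mu, sigma, seed=n)
        print(n, h_index(series), round(expected_h(n, mu, sigma), 2))
```

Running the script prints, for each N, the h index of one synthetic series next to the value predicted by the assumed equation. Changing `mu` or `sigma` while holding N fixed shows how h mixes the size of the series with the shape of its distribution.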
