I-score: An indicator quantifying scientific research output

Abstract

This study presents a new bibliometric indicator, the i-score, which is designed to measure the scientific output of individuals or groups. Similar to the Hirsch index (h-index), the i-score uses a single number to assess research performance. However, it goes a step further by considering the geometric area under the publication–citation curve, offering a more refined measurement that addresses the deficiencies of the h-index. Through three empirical studies, this study shows that the i-score offers a more consistent and accurate assessment of research performance by incorporating both productivity and quality.

Keywords

i-score; h-index; indicator; research output

1. Introduction

The Hirsch index (h-index) [1], introduced by American physicist Jorge E. Hirsch in 2005, is defined as “the number of papers with citation number $\geq$ h, as a useful index to characterize the scientific output of a researcher” [1]. This single number has become a popular tool for representing both research productivity and quality [2,3], making the h-index a key focus in bibliometrics and research evaluation since its inception.

However, the h-index has been widely criticized for its deficiencies and limitations in evaluating research [3–10]. Numerous variations of the h-index have been proposed to overcome these issues, but each typically addresses only one or two flaws.

The purpose of this study is to introduce a new geometric indicator that measures both research productivity and quality, addressing the deficiencies and limitations of the h-index. Like the h-index, the proposed i-score summarizes a researcher’s or group’s scientific output and impact into a single number, which is often preferred by research administrators. Although the deficiencies and limitations of the h-index are well-known, it remains widely used because combining multiple dimensions of research impact into one simple measure is convenient and appealing. The i-score aims to provide the same simplicity while overcoming the h-index’s deficiencies and limitations, making it a more effective tool for research evaluation.

The remainder of the paper is structured as follows. In the “Literature review” section, I review the literature on the h-index and summarize its main deficiencies and limitations. In the “i-score” section, I introduce and define the new indicator, the i-score, which uses the geometric area of rectangles under the publication–citation curve to measure research productivity and quality. In the “Empirical studies” section, I report three empirical studies of the new indicator. In the “Discussion” section, I address the limitations of the i-score in research evaluation as well as some possible solutions. Finally, the “Conclusion” section gives my concluding remarks.

2. Literature review

Since 2005, Hirsch’s paper introducing the h-index has been widely cited, leading to numerous related publications. However, the validity and reliability of the h-index have been challenged by bibliometricians and other scholars [2,11–13]. As a result, experts in bibliometrics have recommended against proposing new citation impact indicators unless they clearly demonstrate added value [14].

2.1. Deficiencies and limitations

While the h-index effectively balances research productivity and quality, addressing the limitations of some indicators that are overly influenced by a few highly cited papers [11,15], it also has inherent flaws and practical limitations [3–10,16–19], as summarized in Table 1.

Table 1.

The major deficiencies and limitations of the h-index.

Category		Explanation
Deficiency	Inconsistency	The h-index may behave inconsistently when the underlying data change in ways that should logically increase the score.
	Insensitiveness	The h-index does not respond well to changes in publications or citations outside the h-core.
	Degree of discrimination	The h-index cannot easily distinguish between scholars with different performance levels when their h-indices are similar.
Limitation	Disciplinary bias	Different academic fields have different publication and citation cultures. The h-index cannot account for these differences.
	Counting method	The h-index can be inflated by self-citations and co-authorship.
	Career index	The h-index systematically advantages older or more senior researchers.
	Database coverage	The h-index depends on the database (Web of Science, Scopus, Google Scholar), which can produce different h-index values.

Waltman and van Eck [12] point out that the definition of the h-index is arbitrary, given that the number of citations and the number of publications are two independent entities. They argue that the h-index could equally be defined as “a scientist has an h-index of h if h of his publications each have at least 2h citations and his remaining publications each have fewer than 2(h + 1) citations” or “a scientist has an h-index of h if h of his publications each have at least h/2 citations and his remaining publications each have fewer than (h + 1)/2 citations” [12], and that such an arbitrary definition may produce inconsistent results when calculating the h-index.
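
The arbitrariness of the threshold can be made concrete with a small sketch. The function below is illustrative only: the name `scaled_h_index`, the `alpha` parameter, and the citation record are assumptions introduced here, not part of the cited work. Setting `alpha` to 1 reproduces the standard h-index, while 2 and 0.5 correspond to the two alternative definitions quoted above.

```python
def scaled_h_index(citations, alpha=1.0):
    """Largest h such that h papers each have at least alpha * h citations.

    alpha = 1 gives the standard h-index; alpha = 2 and alpha = 0.5 give the
    two alternative definitions quoted in the text. The name and the alpha
    parameter are hypothetical and used for illustration only.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # Citations fall and alpha * rank rises, so the first failure is final.
        if cites >= alpha * rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation record, not taken from the paper.
record = [10, 8, 5, 4, 3]
standard = scaled_h_index(record, alpha=1.0)
doubled = scaled_h_index(record, alpha=2.0)
halved = scaled_h_index(record, alpha=0.5)
print(standard, doubled, halved)
```

The same record yields three different values depending on a scaling factor that has no substantive justification, which is the point of the criticism.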

Previous studies suggest that the h-index is primarily determined by a few highly cited papers that make up the h-core, while it overlooks papers and citations outside of this core [11,13,20,21]. In addition, some researchers argue that the h-index is not effective in distinguishing between scholars with similar research performance [16,22].

The h-index is a commonly used metric for assessing research performance, but it has several well-known limitations when used in research evaluation. It is field-dependent, making cross-disciplinary comparisons challenging [1–4]. In addition, the h-index can be skewed by self-citations [23–25] and co-authorship [3,18], which may artificially inflate its value. The h-index also favours more senior scholars, as it increases with the number of publications and citations over time, and never decreases [2,17,26]. Furthermore, the h-index relies on data from bibliographic databases such as Web of Science and Scopus, which have varying and limited coverage [3,19].

2.2. h-indices

To overcome the limitations of the h-index, more than 50 variants have been proposed and developed, as shown in Table 2. Of these, 18 variants aim to improve the accuracy of the h-core by including more highly cited papers, while 17 focus on adjusting the counting method to account for self-citations and co-authorship. In addition, six variants address disciplinary bias, and eleven consider the academic age of researchers. Eight other variants seek to expand the h-index’s applications. However, only a few variants have addressed issues related to inconsistency and discrimination in the h-index.

Table 2.

List of h-index variants by what they address.

Deficiencies
Inconsistency	Insensitiveness	Degree of discrimination
h_α-index [27]; m-score [28]; rec-index [29]	A-index, R-index, AR-index [30]; h(2)-index [31]; e-index [32]; f-index, t-index [33]; g-index [21]; hg-index [20]; h_w-index [34]; IQp [35]; Maxprod [36]; m-index [1]; π-index [37]; q²-index [38]; h_T-index [39]; W-index [40]	h_m-index [22]; m-score [28]; multidimensional h-index [41]
Limitations
Disciplinary bias	Counting method	Career index
h_I index [42]; the contemporary h-index, trend h-index, normalized h-index [43]; HF-rating [44]; IQp [35]; n-index [45]	Adapted pure h-index [46]; selectivity (S) and amplitude (A) indices [47]; b-index [48]; h(2)-index [31]; degree h-index [49]; h_I index [42]; h-index of first authored papers [50]; index ℏ (hbar) [51]; h_α index [52]; h_m-index [53]; excluding self-citations from h-index [54]; k-index [55]; h-maj [56]; RC-index and CC-index [57]; weighted h-index [58]; w-index [59]	A-index, R-index, AR-index [30]; citation speed index [60]; contemporary h-index, trend h-index, normalized h-index [43]; dynamic h-type index [61]; h_w-index [34]; IQp [35]; k-index [62]; m-index [1]; h-rate [63]; index hpd [64]; s-index [65]
Applications
Index h_m [66]; h₁⁺, h₁^Δ and h₁* [67]; modified impact index (MII) [68]; multidimensional h-index [41]; h₁ index and h₂ index[69]; q²-index [38]; v-index [70]; w-index [71]

From the perspective of bibliometricians, inconsistency is regarded as a crucial deficiency; Waltman and van Eck [12] go so far as to declare that the h-index is not a valid indicator for measuring research output. Unfortunately, although inconsistency is a serious deficiency that weakens the validity and reliability of the h-index, it has received little attention in previous studies and among these variants.

2.3. Geometric indicators: m-score and rec-index

Shu [28] introduced a new indicator called the m-score, using the geometric area under the publication–citation curve to measure both research productivity and quality. The m-score addresses the inconsistencies of the h-index and offers better discrimination. Building on this idea, Fenner et al. [72] developed another geometric indicator, the rec-index, which was defined as the maximum rectangle between the citation vector x and the number of publications. The i-score is an improvement on the rec-index.

Figure 1 illustrates how to use the geometric area under the publication–citation curve to measure research output. For each scholar, we plot their research output on a publication–citation curve, where each point on the curve shows the number of publications (P) that have been cited at least a certain number of times (C). At any given point on the curve, the rectangle CBPO represents the scholar’s research output for having P publications with at least C citations. We can also draw a triangle (2C-2P-O) that is exactly twice the size of the rectangle CBPO. This triangle covers most of the research output but does not include the two tails of the curve, which represent very highly cited or very lowly cited papers. The rec-index [72] identifies the largest rectangle CBPO that can be placed under the publication–citation curve, while the h-index identifies the largest square under the curve, whose area is h².
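
As a minimal sketch of this geometric reading, the code below builds the points of the publication–citation curve and derives the largest rectangle and the largest square from them. The function names and the citation record are assumptions made for illustration; they are not taken from the cited papers.

```python
def publication_citation_points(citations):
    """Return (C, P) pairs meaning: P papers have at least C citations (C > 0)."""
    ranked = sorted(citations, reverse=True)
    points = []
    for rank, cites in enumerate(ranked, start=1):
        # Keep one point per distinct citation count: the last paper of each tie.
        is_last_of_tie = rank == len(ranked) or ranked[rank] < cites
        if cites > 0 and is_last_of_tie:
            points.append((cites, rank))
    return points


def rec_index(citations):
    """Area C x P of the largest rectangle under the curve."""
    return max((c * p for c, p in publication_citation_points(citations)), default=0)


def h_index(citations):
    """Side of the largest square under the curve."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Hypothetical citation record, not taken from the paper.
record = [30, 12, 5, 4, 1]
points = publication_citation_points(record)
largest_rectangle = rec_index(record)
largest_square = h_index(record) ** 2
print(points, largest_rectangle, largest_square)
```

Because a square is a special case of a rectangle, the rec-index can never be smaller than h² for the same record.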

Figure 1.

The geometric area under the citation–publication curve.

Measuring research output by using the area of the largest rectangle can avoid some of the problems with the h-index, but it can still be skewed by a few extremely highly cited papers or a large number of lowly cited or uncited papers. To address this issue, Shu [28] used the logarithms of both the number of papers and citations when calculating the m-score, while Harris et al. [29] proposed a two-dimensional bibliometric index (rec_i, rec_p) that assesses both the influential and the prolific aspects of a researcher’s output. However, neither method provides a single number, like the h-index, that captures both productivity and quality of research.

3. i-score

In this study, I propose a new geometric indicator called the i-score (impact score) to assess both research productivity and quality. The i-score is based on the same method using geometric area under the publication–citation curve. As shown in Figure 1, for any given point B on this curve, the rectangle CBPO represents the research output of an individual researcher or a group, where the individual/group has P publications that were cited at least C times. Unlike the rec-index proposed by Fenner et al. [72], which uses the area of the largest rectangle CBPO to measure research output, the i-score is calculated as the average area of all rectangles CBPO under the publication–citation curve.

3.1. Definition

The i-score mirrors the rec-index definition in Fenner et al. [72], but instead of taking the maximum rectangle, the i-score takes the average of all rectangles defined at distinct thresholds, providing a more balanced indicator that incorporates the entire citation curve rather than just the single dominating rectangle.

Let the citation curve be the histogram obtained by plotting the number of citations (on the vertical axis) against the ranked publications (on the horizontal axis), with the citation vector x = (x₁, x₂, …, x_n) sorted in descending order. The rec-index [72] corresponds to the largest rectangle that can fit under the citation curve. The i-score generalizes this by considering all maximal rectangles that appear when the curve is viewed at distinct citation thresholds.

At each threshold C, we construct a rectangle with height C and width P(C), where P(C) is the number of papers with at least C citations. The i-score is then the average area of these rectangles.

$$i(x) = \frac{1}{|T|} \sum_{C \in T} C \cdot P(C)$$

where T is the set of distinct citation thresholds (the distinct nonzero citation counts in x) and |T| is the number of such thresholds.

3.2. Computing of i-score

The i-score is computed in four steps: (1) rank the scholar’s publications in descending order of the number of citations each has received; (2) for each distinct citation count “C,” identify “P” as the number of publications that have been cited at least “C” times; (3) calculate the value of “rec” at each such point as rec = P × C; and (4) take the average of these “rec” values as the i-score.

As demonstrated in Table 3 and Figure 2, a scholar published 15 papers, of which 12 were cited between 1 and 70 times each. To compute the i-score, we follow the four-step procedure mentioned above.

Step 1. Construct the ranked list.

Each paper is ordered from the most cited to the least cited. In this case, the most cited paper received 70 citations, the second and third papers each received 18 citations, the fourth received 8 citations, and so on, until the twelfth paper received 1 citation. The last three papers received 0 citations.

Step 2. Transform into a cumulative publication–citation distribution.

For each distinct citation threshold “C,” we count the number of papers “P” that have at least C citations. This produces points on the publication–citation curve (Figure 2), illustrating that:

1 paper was cited at least 70 times,

3 papers were cited at least 18 times,

4 papers were cited at least 8 times,

8 papers were cited at least 5 times,

10 papers were cited at least 4 times,

11 papers were cited at least 2 times,

12 papers were cited at least 1 time.

Step 3. Compute rec values.

Each rec value is calculated as the product of “C” (the citation threshold) and “P” (the number of papers meeting that threshold). Only the points where a change occurs in the cumulative curve are included, so we obtain the following seven rec values: 70, 54, 32, 40, 40, 22, and 12.

Step 4. Average the rec values.

The i-score is defined as the mean of these rec values:

$$\text{i-score} = \frac{70 + 54 + 32 + 40 + 40 + 22 + 12}{7} = \frac{270}{7} \approx 38.57$$
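
The four steps can be sketched in a few lines of Python. This is one possible reading of the procedure rather than a reference implementation: the function names are assumptions, and zero-citation papers are excluded from the thresholds because the worked example omits them.

```python
def rec_values(citations):
    """Steps 1 to 3: one rec value (C x P) per distinct nonzero citation count."""
    ranked = sorted(citations, reverse=True)  # Step 1: rank by citations
    values = []
    for rank, cites in enumerate(ranked, start=1):
        # Step 2: the last paper of each tie gives P, the number of papers
        # cited at least C times. Uncited papers are skipped (assumption
        # inferred from the worked example).
        is_last_of_tie = rank == len(ranked) or ranked[rank] < cites
        if cites > 0 and is_last_of_tie:
            values.append(cites * rank)  # Step 3: rec = C x P
    return values


def i_score(citations):
    """Step 4: mean of the rec values (0.0 if nothing has been cited)."""
    values = rec_values(citations)
    return sum(values) / len(values) if values else 0.0


# The 15-paper record of the worked example (ranks 1 to 15).
example = [70, 18, 18, 8, 5, 5, 5, 5, 4, 4, 2, 1, 0, 0, 0]
print(rec_values(example))          # [70, 54, 32, 40, 40, 22, 12]
print(round(i_score(example), 2))   # 38.57
```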

Table 3.

Computing the i-score using an example.

Rank (by citation)	C value	P value	Rec (C×P)
1	70	1	70
2	18	2
3	18	3	54
4	8	4	32
5	5	5
6	5	6
7	5	7
8	5	8	40
9	4	9
10	4	10	40
11	2	11	22
12	1	12	12
13	0	13
14	0	14
15	0	15

Figure 2.

Calculating the i-score by a sample citation–publication curve.

4. Empirical studies

To validate the new indicator, three empirical studies were conducted. The first study addressed the inconsistency in the h-index identified by Waltman and van Eck [12]. The second study compared the i-score with the h-index, m-score, and rec-index using the same data set reported by Cronin and Meho [73]. The third study analysed the correlation between the i-score and other bibliometric indicators using a large data set.

4.1. Inconsistency test

Waltman and van Eck [12] argue that the arbitrary calculation of the h-index can lead to inconsistent results, as demonstrated through three examples. These examples were also used in this study to validate the m-score, rec-index, and i-score, all of which are based on the geometric area under the publication–citation curve.

In the first example, Waltman and van Eck [12] point out that the h-index can violate the principle that the ranking of two scientists should remain consistent when their relative performance is the same. As shown in Table 4, when Scientists A and B both double their research output, Scientist A’s h-index increases from 9 to 12, while Scientist B’s h-index jumps from 7 to 14. As a result, their rankings reverse, even though their relative performance remains the same. In contrast, the m-score, rec-index, and i-score all maintain consistent rankings. Scientist A’s m-score, rec-index, and i-score increase from 1.0299 to 1.3547, 108 to 216, and 78 to 156, respectively, while Scientist B’s scores rise from 0.9939 to 1.348, 105 to 210, and 77.5 to 155, respectively, preserving the original ranking between the two scientists.

Table 4.

Example 1 of Waltman and van Eck [12].

		h-index	m-score	rec-index	i-score
Scientist A
Before	12 publications, 3 with 4 citations and 9 with 12 citations each	9	1.0299	108	78
After	24 publications, 6 with 4 citations and 18 with 12 citations each	12	1.3547	216	156
Scientist B
Before	10 publications, 3 with 5 citations and 7 with 15 citations each	7	0.9939	105	77.5
After	20 publications, 6 with 5 citations and 14 with 15 citations each	14	1.348	210	155
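
The h-index and i-score columns of Table 4 can be re-derived from the publication records listed in the table. The sketch below assumes the i-score reading used in the worked example of Section 3.2 (one rectangle per distinct nonzero citation count); the function and variable names are illustrative.

```python
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def i_score(citations):
    ranked = sorted(citations, reverse=True)
    # One rectangle per distinct nonzero citation count (last paper of each tie).
    recs = [c * r for r, c in enumerate(ranked, start=1)
            if c > 0 and (r == len(ranked) or ranked[r] < c)]
    return sum(recs) / len(recs) if recs else 0.0


# Records from Table 4; "after" doubles every group of papers.
records = {
    "A before": [12] * 9 + [4] * 3,
    "A after": [12] * 18 + [4] * 6,
    "B before": [15] * 7 + [5] * 3,
    "B after": [15] * 14 + [5] * 6,
}
results = {name: (h_index(rec), i_score(rec)) for name, rec in records.items()}
for name, (h, i) in results.items():
    print(f"{name}: h = {h}, i = {i}")
```

The h-index ordering flips from A ahead of B to B ahead of A, whereas the i-score doubles for both scientists and keeps A marginally ahead.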

In the second example, Waltman and van Eck [12] observe that the h-index can violate the principle that the ranking of two scientists should remain consistent when they add the same research record. For instance, when Scientists X and Y each add the same number of publications and citations within the same timeframe (as shown in Table 5), Scientist X’s h-index remains at 5, while Scientist Y’s h-index jumps from 4 to 6. This change reverses their rankings despite their identical additions. The m-score and rec-index show the same inconsistency: Scientist X’s m-score and rec-index increase from 0.4886 to 0.5907 and from 25 to 35, respectively, while Scientist Y’s m-score and rec-index rise from 0.4685 to 0.6055 and from 24 to 36, so that Scientist Y overtakes Scientist X on both indicators. In contrast, the i-score provides consistent rankings: Scientist X’s i-score increases from 19.5 to 21.7, and Scientist Y’s rises from 22.5 to 24.3, keeping Scientist Y ranked higher than Scientist X both before and after the addition of the new research records.

Table 5.

Example 2 of Waltman and van Eck [12].

		h-index	m-score	rec-index	i-score
Scientist X
Before	7 publications, 5 with 5 citations and 2 with 2 citations each	5	0.4886	25	19.5
After	9 publications, 2 with 8 citations, 5 with 5 citations, and 2 with 2 citations each	5	0.5907	35	21.7
Scientist Y
Before	7 publications, 4 with 6 citations and 3 with 3 citations each	4	0.4685	24	22.5
After	9 publications, 2 with 8 citations, 4 with 6 citations, and 3 with 3 citations each	6	0.6055	36	24.3

In the third example, Waltman and van Eck [12] demonstrate that the h-index fails to uphold the principle that the ranking of two groups should align with the rankings of their individual members. As illustrated in Table 6, Group A has a lower h-index than Group B, despite its members having higher h-indices than those in Group B. In contrast, when using the m-score, rec-index, and i-score, the group rankings are consistent with the rankings of their individual members. Specifically, since the m-scores, rec-indices, and i-scores of Scientists X1 and X2 are higher than those of Scientists Y1 and Y2, Group A’s overall m-score, rec-index, and i-score are also higher than those of Group B.

Table 6.

Example 3 of Waltman and van Eck [12].

	h-index	m-score	rec-index	i-score
Group A: 14 publications, each is cited 9 times	9	1.0937	126	126
Scientist X1: 7 publications, each is cited 9 times	7	0.8064	63	63
Scientist X2: 7 publications, each is cited 9 times	7	0.8064	63	63
Group B: 12 publications, each is cited 10 times	10	1.0792	120	120
Scientist Y1: 6 publications, each is cited 10 times	6	0.7782	60	60
Scientist Y2: 6 publications, each is cited 10 times	6	0.7782	60	60
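
The group-level comparison in Table 6 can be checked in the same way by pooling the members’ records. As before, this is a sketch under the i-score reading of Section 3.2, with illustrative names; a group is modelled simply as the concatenation of its members’ citation lists.

```python
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def rec_index(citations):
    ranked = sorted(citations, reverse=True)
    return max((c * r for r, c in enumerate(ranked, start=1)), default=0)


def i_score(citations):
    ranked = sorted(citations, reverse=True)
    recs = [c * r for r, c in enumerate(ranked, start=1)
            if c > 0 and (r == len(ranked) or ranked[r] < c)]
    return sum(recs) / len(recs) if recs else 0.0


# Records from Table 6; each group pools the papers of its two members.
x1, x2 = [9] * 7, [9] * 7
y1, y2 = [10] * 6, [10] * 6
group_a, group_b = x1 + x2, y1 + y2

for label, rec in [("Group A", group_a), ("X1", x1), ("Group B", group_b), ("Y1", y1)]:
    print(label, h_index(rec), rec_index(rec), i_score(rec))
```

With a single distinct citation count per record, the rec-index and i-score coincide, which is why the last two columns of Table 6 are identical.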

According to Waltman and van Eck [12], the arbitrary nature of the h-index, in which h publications are cited at least h times, can lead to inconsistencies. In contrast, the m-score, rec-index, and i-score, which reflect the geometric area under the publication–citation curve, provide a more reliable measure. While the m-score and rec-index may still show some inconsistencies, as in the second example, the i-score avoids this issue in all three examples.

4.2. Comparison test

An additional empirical study was conducted to compare the i-score with the h-index, m-score, and rec-index in ranking 31 influential information scientists, originally reported by Cronin and Meho [73]. The findings reveal that indicators like the m-score, rec-index, and i-score, which use the geometric area under the publication–citation curve, can differentiate scholars with similar research performance, whereas the h-index cannot.

Table 7 illustrates that the h-index struggles to distinguish between scholars with similar performance, as it results in ties at eight h-index values: 27, 23, 22, 17, 13, 12, 10, and 4. Two scholars share the same h-index at six of these eight values, three scholars tie at 22, and four scholars tie at 12. This lack of differentiation can be addressed by using indicators based on the geometric area under the publication–citation curve, which provide unique rankings for all scholars. For instance, Bates, White, and Dillon all share an h-index of 22, despite having different research outputs with 66, 69, and 90 publications, and 2742, 3150, and 2155 citations, respectively. However, when assessed using the m-score, rec-index, and i-score, their research outcomes yield distinct values: m-scores of 1.9947, 1.9466, and 1.9057; rec-indexes of 880, 1242, and 856; and i-scores of 449.57, 449.12, and 354.49, respectively.

Table 7.

Comparison of the h-index, m-score, rec-index, and i-score, along with their rankings (in brackets) for 31 information scientists.

Name	No. of papers	No. of citations	h-index	m-score	rec-index	i-score
Belkin, Nicholas J.	74 (14)	3412 (1)	25 (4)	2.163 (2)	1212 (4)	555.1 (1)
Borgman, Christine L.	126 (3)	3183 (4)	27 (1)	2.1583 (3)	1250 (1)	525.73 (2)
Cronin, Blaise	112 (6)	3079 (6)	27 (1)	2.0714 (5)	824 (9)	525.38 (3)
Saracevic, Tefko	106 (8)	3409 (2)	24 (5)	2.2382 (1)	1232 (3)	474.17 (4)
Tenopir, Carol	421 (1)	3195 (3)	26 (3)	2.0735 (4)	777 (10)	473.97 (5)
Bates, Marcia J.	66 (17)	2742 (8)	22 (8)	1.9947 (6)	880 (7)	449.57 (6)
White, Howard D.	69 (16)	3150 (5)	22 (8)	1.9466 (8)	1242 (2)	449.12 (7)
McCain, Katherine W.	77 (13)	2882 (7)	23 (6)	1.8995 (10)	954 (5)	405.88 (8)
Marchionini, Gary	113 (5)	2314 (9)	23 (6)	1.9756 (7)	660 (12)	379.75 (9)
Dillon, Andrew	90 (10)	2155 (10)	22 (8)	1.9057 (9)	856 (8)	354.49 (10)
Budd, John M.	120 (4)	1160 (17)	19 (11)	1.7426 (11)	448 (17)	267.87 (11)
Spink, Amanda	55 (21)	1210 (15)	17 (13)	1.6527 (13)	408 (18)	240.28 (12)
McClure, Charles R.	166 (2)	1064 (18)	17 (13)	1.6022 (14)	370 (20)	234.11 (13)
Kantor, Paul B.	109 (7)	1547 (12)	18 (12)	1.6755 (12)	600 (13)	224.49 (14)
Case, Donald O.	57 (20)	1176 (16)	15 (16)	1.5814 (15)	372 (19)	216.1 (15)
Kuhlthau, Carol C.	27 (27)	1569 (11)	12 (19)	1.3809 (18)	925 (6)	205.91 (16)
Wildemuth, Barbara M.	65 (18)	837 (20)	16 (15)	1.4763 (16)	279 (23)	186.48 (17)
Fidel, Raya	10 (30)	551 (22)	6 (28)	1.1289 (25)	300 (22)	178.14 (18)
Buckland, Michael K.	70 (15)	1502 (13)	13 (17)	1.3617 (19)	675 (11)	168.78 (19)
Griffiths, José-Marie	65 (18)	1317 (14)	13 (17)	1.4231 (17)	556 (15)	166.24 (20)
Losee, Robert M.	50 (22)	515 (25)	12 (19)	1.3072 (20)	198 (25)	138.74 (21)
Schamber, Linda	31 (26)	888 (19)	8 (27)	1.1814 (22)	482 (16)	136.3 (22)
Van House, Nancy	32 (25)	516 (24)	12 (19)	1.1664 (23)	207 (24)	126.45 (23)
Smith, Linda C.	9 (31)	384 (29)	4 (30)	0.8147 (29)	304 (21)	114.83 (24)
Carbo, Toni	12 (28)	620 (21)	4 (30)	0.5315 (31)	586 (14)	114.17 (25)
Larson, Ray R.	81 (12)	548 (23)	12 (19)	1.2202 (21)	171 (27)	111.06 (26)
Koenig, Michael E.D.	82 (11)	454 (26)	10 (24)	1.1223 (27)	138 (30)	91.59 (27)
Hernon, Peter	91 (9)	420 (28)	11 (23)	1.1239 (26)	156 (28)	81.52 (28)
Soergel, Dagobert	46 (23)	437 (27)	10 (24)	1.149 (24)	144 (29)	76.35 (29)
Eisenberg, Mike	12 (28)	319 (30)	5 (29)	0.699 (30)	187 (26)	70.75 (30)
Liddy, Elizabeth D.	46 (23)	311 (31)	9 (26)	1.0399 (28)	119 (31)	69.56 (31)
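
The tie pattern in Table 7 can be tallied directly from its h-index and i-score columns. The sketch below only transcribes those two columns in table order; the variable names are illustrative.

```python
from collections import Counter

# h-index column of Table 7, in table order (Belkin ... Liddy).
h_values = [25, 27, 27, 24, 26, 22, 22, 23, 23, 22, 19, 17, 17, 18, 15, 12,
            16, 6, 13, 13, 12, 8, 12, 4, 4, 12, 10, 11, 10, 5, 9]

# i-score column of Table 7, in the same order.
i_values = [555.1, 525.73, 525.38, 474.17, 473.97, 449.57, 449.12, 405.88,
            379.75, 354.49, 267.87, 240.28, 234.11, 224.49, 216.1, 205.91,
            186.48, 178.14, 168.78, 166.24, 138.74, 136.3, 126.45, 114.83,
            114.17, 111.06, 91.59, 81.52, 76.35, 70.75, 69.56]

# h-index values shared by more than one scholar, with the size of each tie.
h_ties = {h: n for h, n in sorted(Counter(h_values).items(), reverse=True) if n > 1}
i_ties = {i: n for i, n in Counter(i_values).items() if n > 1}

print(h_ties)   # tied h-index values and how many scholars share each
print(i_ties)   # empty: every i-score in the table is unique
```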

These indicators, based on the area under the publication–citation curve, are influenced by a set of highly cited papers. For example, although Hernon has more publications (91 vs. 10) and a higher h-index (11 vs. 6) than Fidel, his m-score (1.1239 vs. 1.1289), rec-index (156 vs. 300), and i-score (81.52 vs. 178.14) are all lower than Fidel’s. Fidel’s advantage comes from her four highly cited papers, which received 233, 117, 80, and 75 citations, respectively, despite only having six papers cited at least 6 times. In contrast, Hernon’s most cited paper received only 53 citations, even though he has 11 papers cited at least 11 times. Another example is Schamber, who was ranked 27th by h-index (8). She received 888 citations for her 31 papers, including three that were cited more than 100 times (293, 241, and 158 citations). As a result, her ranking improves to 22nd by m-score and i-score and 16th by rec-index.

Of the three indicators that use the area under the publication–citation curve, the rec-index can be skewed by a small number of highly cited papers, which produce the largest area under the publication–citation curve. In contrast, the m-score and i-score, which use logarithms or averages of these areas, respectively, help to minimize the influence of outliers. For example, although White has fewer publications (69 vs. 74) and fewer citations (3150 vs. 3412) than Belkin, White’s rec-index is higher (1242 vs. 1212). However, his m-score (1.9466 vs. 2.163) and i-score (449.12 vs. 555.1) are both lower than Belkin’s. This discrepancy arises because White’s two most highly cited papers have received 726 and 621 citations, respectively, creating a larger rectangle under his publication–citation curve (2 × 621 = 1242). While Belkin has more papers (11 vs. 7) with 100 or more citations than White, his two most cited papers received only 671 and 466 citations. Belkin’s rec-index of 1212 comes from 15 papers that were cited at least 15 times, which is lower than White’s rec-index, driven by just two papers with at least 621 citations each.

4.3. Correlation test

The last empirical study compared the i-score with other common bibliometric indicators used to assess research productivity and quality, focusing on their correlations. The analysis included researchers who had a unique author ID in Scopus, along with their publication records. Data were retrieved from Scopus in March 2023 and cover all publications and citations recorded up to February 2023. To reduce the impact of outliers, only authors with at least 30 publications were considered. In total, 1,059,309 authors were analysed. Both Pearson and Spearman correlation tests were conducted to evaluate the relationships among the following bibliometric indicators:

Total number of publications (TP)

Total number of citations received (TC)

The h-index (h)

The citation rate (CR)

Citation distribution index (CDI)

The i-score (i)

The CR is the average number of citations a researcher’s publications receive over the whole period. The CDI (previously referred to as the “Relative Integration Score”) measures the overall shape of a citation distribution by using the relative citation levels of papers across all 10 deciles. A higher CDI means that an entity has fewer papers in the low-citation deciles or more papers in the high-citation deciles. By definition, the world average CDI is 0. In theory, CDI scores range from −50 (if all papers fall into the lowest citation decile) to +50 (if all papers fall into the highest decile) [74].
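
For readers who want to reproduce this kind of analysis on their own data, the two correlation tests can be sketched with the standard library alone. This is not the code used in the study: the function names are assumptions, the sample values are hypothetical, and Spearman’s coefficient is computed as the Pearson correlation of average ranks, which is the usual treatment of ties.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five authors, not taken from the Scopus data set.
tp = [30, 30, 45, 60, 120]              # total publications
i = [40.0, 95.5, 80.0, 150.0, 400.0]    # i-scores
print(pearson(tp, i), spearman(tp, i))
```

Tied publication counts, such as the two authors with 30 papers above, receive the same average rank, which is relevant to the discussion of ties in the Spearman results.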

As shown in Table 8, the i-score demonstrates strong linear relationships with two fundamental indicators of research output: TP and TC. Its Pearson correlations with TP (0.8090) and TC (0.9225) are both higher than those of the h-index, which reports values of 0.7609 and 0.8207, respectively. These results indicate that the i-score increases in a stable and predictable manner as researchers produce more work and accumulate more citations. In other words, the i-score reflects both research productivity and cumulative citation impact more closely than the h-index.

Table 8.

Pearson correlation coefficient among bibliometric indicators.

	CDI	CR	h	i	TC	TP
CDI		0.4901	0.5532	0.3887	0.3800	0.2000
CR	0.4901		0.4299	0.4030	0.5325	0.1027
h	0.5532	0.4299		0.8600	0.8207	0.7609
i	0.3887	0.4030	0.8600		0.9225	0.8090
TC	0.3800	0.5325	0.8207	0.9225		0.7402
TP	0.2000	0.1027	0.7609	0.8090	0.7402

In contrast, the i-score shows weaker correlations with relative-impact indicators such as CR and the CDI. This pattern is expected. Measures of relative impact divide citations by the number of publications; thus, researchers with many publications may achieve a strong overall impact (high TC) but moderate relative citation levels. The i-score follows this logic as it is designed to capture the cumulative contribution of a researcher’s full publication and citation record, not relative citation efficiency. Therefore, its weaker correlations with CR and CDI do not indicate inconsistency; rather, they reflect the different conceptual focus of the indicator.

Table 9 presents the Spearman correlations among the same set of indicators. Here, the i-score continues to align closely with the rankings produced by established metrics. Its rank correlations with CDI (0.6711), CR (0.7983), and especially TC (0.9679) are all higher than the corresponding Pearson correlations in Table 8. This pattern indicates that while these measures capture different aspects of research performance, the i-score produces researcher rankings that are broadly consistent with widely used bibliometric indicators.

Table 9.

Spearman correlation coefficient among bibliometric indicators.

	CDI	CR	h	i	TC	TP
CDI		0.7629	0.6373	0.6711	0.6360	0.1506
CR	0.7629		0.7413	0.7983	0.8422	0.1573
h	0.6373	0.7413		0.9756	0.9513	0.6592
i	0.6711	0.7983	0.9756		0.9679	0.5914
TC	0.6360	0.8422	0.9513	0.9679		0.5843
TP	0.1506	0.1573	0.6592	0.5914	0.5843

The only exception occurs with the TP, where the i-score shows a lower Spearman correlation (0.5914) than that of the h-index (0.6592). This can be explained by how both the data and the h-index behave. As many researchers have the same number of publications, this creates many ties in the rankings. The h-index, due to its well-known insensitivity, varies less and ends up matching the tied publication ranks more closely, resulting in a higher Spearman correlation with TP. In contrast, the i-score incorporates citation distribution and responds to meaningful citation growth, producing more differentiated rankings among researchers with identical publication counts. This leads to a weaker Spearman correlation, which is an expected and theoretically consistent outcome given its design as a more sensitive and discriminating performance indicator.

Overall, the results from Tables 8 and 9 show that the i-score behaves in a stable way with respect to core dimensions of productivity and cumulative impact. Importantly, the fact that the h-index also shows strong correlations does not weaken the reliability of the i-score. Rather, it indicates that the i-score aligns with well-established measures while offering a more balanced and nuanced representation of a researcher’s overall contribution. These properties collectively support the conclusion that the i-score is a dependable and informative indicator of research performance.

5. Discussion

While the i-score addresses key shortcomings of the h-index, such as inconsistency, insensitiveness, and the degree of discrimination, it shares similar limitations with the h-index. These include counting methods (size dependence), disciplinary bias (field dependence), career index (age dependence), and database coverage. Such limitations are common to bibliometric indicators based on publication and citation data and, therefore, cannot be fully eliminated. However, the geometric design of the i-score allows these limitations to be addressed through explicit mitigation strategies without redefining the indicator itself.

5.1. Counting (size dependence)

As co-authorship becomes more common, accurately measuring each co-author’s contribution presents a significant challenge in bibliometric analysis [75]. Size dependence occurs when researchers with many publications or those involved in large collaborative teams obtain systematically higher indicator values. Because the i-score is constructed using publication counts and citation frequencies based on full counting, it is affected by this issue. Nevertheless, several practical strategies can be applied to reduce size-related inflation.

First, size effects associated with large collaborative teams can be reduced by restricting the input data to publications in which the researcher holds a key authorship role, such as first author or corresponding author. This approach focuses the indicator on publications that more directly reflect the researcher’s leadership or substantial involvement. Second, fractional counting of publications and citations can be applied during data preparation, where credit is divided among co-authors before constructing the publication–citation curve. For example, if a paper is assigned a weight of 1/5 under fractional counting, it is counted as 1/5 of a publication rather than one full publication when calculating P. Likewise, each of its citations is counted as 1/5 of a citation when calculating C. Third, size effects can also be reduced by normalizing the i-score by publication volume, for example by dividing the raw i-score by a function of the TP.
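The fractional counting step can be sketched as follows. This is a minimal illustration that assumes credit is split equally among co-authors (weight 1/n for a paper with n authors); the function name `fractional_totals` and the sample record are hypothetical, and other weighting schemes could be substituted during data preparation.

```python
from fractions import Fraction

def fractional_totals(papers):
    """Return (P, C) under equal-split fractional counting.

    `papers` is a list of (citations, n_authors) pairs. Each paper and each
    of its citations is credited with weight 1 / n_authors (an assumption).
    """
    p_total = Fraction(0)
    c_total = Fraction(0)
    for citations, n_authors in papers:
        weight = Fraction(1, n_authors)
        p_total += weight              # the paper counts as 1/n of a publication
        c_total += weight * citations  # each citation counts as 1/n of a citation
    return p_total, c_total

# Hypothetical record: (citations, number of co-authors).
papers = [(10, 5), (4, 1), (6, 2)]
p_frac, c_frac = fractional_totals(papers)
print(p_frac, c_frac)  # 17/10 9
```

Under full counting the same record would give P = 3 and C = 20, so the five-author paper no longer contributes as much as the single-author one.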

5.2. Disciplinary bias (field dependence)

A well-known limitation of all citation-based indicators is their sensitivity to disciplinary differences. Publishing and citation practices vary widely across fields: some disciplines are characterized by rapid publication cycles, large collaborative teams, and high citation density (e.g. biomedicine and physics), while others produce fewer outputs and have slower citation accrual and lower citation density (e.g. humanities and mathematics) [75–77]. When raw publication and citation counts are used, the i-score is field-dependent, and direct comparisons across disciplines are, therefore, inappropriate. This limitation reflects structural differences in scholarly communication systems rather than a weakness specific to the i-score.

Field effects can be reduced by normalizing the i-score within each discipline. This can be done by replacing raw publication and citation counts with field-normalized publication and citation scores before constructing the publication–citation curve. For example, consider two disciplines, A and B. In discipline A, the average number of publications per author and the average number of citations per paper are both 5; in discipline B, the average number of publications per author is 20 and the average number of citations per paper is 25. If the discipline averages are each set to a score of 100, then in discipline A each paper receives a publication score of 20 (100/5) and each citation a citation score of 20 (100/5), whereas in discipline B each paper receives a publication score of 5 (100/20) and each citation a citation score of 4 (100/25). Using these publication and citation scores to calculate P and C reduces the differences in publication volume and citation frequency between disciplines A and B. Alternatively, i-scores can be rescaled relative to discipline averages to obtain a field-normalized i-score. These approaches allow cross-disciplinary comparisons while preserving the geometric meaning of the indicator.
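The arithmetic of the two-discipline example can be checked with a short sketch. The function name `field_scores` and its `reference` parameter are introduced here for illustration only; the averages are those of the example above.

```python
def field_scores(avg_pubs_per_author, avg_cites_per_paper, reference=100):
    """Score of one paper and of one citation when the field average maps to `reference`."""
    publication_score = reference / avg_pubs_per_author
    citation_score = reference / avg_cites_per_paper
    return publication_score, citation_score

score_a = field_scores(5, 5)     # discipline A
score_b = field_scores(20, 25)   # discipline B
print(score_a, score_b)          # (20.0, 20.0) (5.0, 4.0)

# An author with exactly the field-average output reaches the same
# publication total in both disciplines.
print(5 * score_a[0], 20 * score_b[0])  # 100.0 100.0
```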

5.3. Career index (age dependence)

Another limitation common to citation-based indicators is their dependence on career length. Because both publications and citations accumulate over time, senior researchers will almost always have higher i-scores than early-career researchers, independent of relative research quality or influence. This makes direct comparisons across career stages problematic.

To address this, bibliometric studies often restrict the analysis to specific time windows, such as a 5- or 10-year publication window. These adjustments can also be applied to the i-score, thereby allowing a more balanced evaluation of scholars at different career stages. Age-related effects can also be reduced by introducing temporal weighting. In this approach, recent publications and citations are given more weight, while older ones receive progressively lower weight before constructing the publication–citation curve. This allows the i-score to better reflect recent research activity while still considering past contributions.
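Both adjustments can be sketched in a few lines. The study does not prescribe a particular weighting function, so the exponential decay with a five-year half-life used below is purely an assumption, as are the helper names `in_window` and `temporal_weight` and the sample record.

```python
def in_window(papers, current_year, window_years):
    """Keep papers published within the last `window_years` years, counting the current year."""
    return [p for p in papers if 0 <= current_year - p["year"] < window_years]

def temporal_weight(year, current_year, half_life=5.0):
    """Assumed exponential decay: a paper `half_life` years old gets weight 0.5."""
    age = max(0, current_year - year)
    return 0.5 ** (age / half_life)

# Hypothetical record.
papers = [
    {"year": 2024, "citations": 3},
    {"year": 2019, "citations": 12},
    {"year": 2009, "citations": 40},
]

recent = in_window(papers, current_year=2024, window_years=10)
weights = [temporal_weight(p["year"], 2024) for p in papers]
weighted_citations = [w * p["citations"] for w, p in zip(weights, papers)]

print(len(recent))           # 2
print(weights)               # [1.0, 0.5, 0.125]
print(weighted_citations)    # [3.0, 6.0, 5.0]
```

The weighted publication and citation values would then replace the raw counts when the publication–citation curve is constructed. A shorter half-life emphasizes recent activity more strongly, while a longer one approaches the unweighted case.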

5.4. Database coverage

According to Okubo [75], bibliometric studies usually begin by selecting the databases that best represent the target population. The validity of the i-score, like that of any bibliometric indicator, is constrained by the coverage of the database used. Bibliometric databases such as Web of Science, Scopus, and Google Scholar differ substantially in scope, indexing techniques, and coverage. As a result, the same researcher may have different i-scores depending on the database chosen. This variation complicates comparisons across institutions or countries that rely on different data sources.

To ensure a fair and consistent comparison, it is recommended that evaluators specify the database used and apply the i-score within the same data set. However, even with consistency, the problem of incomplete or uneven coverage persists as fields such as computer science (conference papers) or the humanities (books and monographs) may be systematically underrepresented. Therefore, while the i-score can provide meaningful insights within a given data set, it cannot fully overcome the biases introduced by database limitations.

5.5. Extreme cases

Extreme cases, such as researchers with one very highly cited paper or many marginally cited publications, present challenges for many bibliometric indicators because they reduce discriminatory power. The h-index, which is based on the h-core of the publication–citation curve, tends to undervalue researchers with a small number of extremely highly cited publications or many uncited works. In contrast, indicators based on areas under the publication–citation curve (e.g. the i-score, rec-index, and m-score) may be influenced by such extreme cases.

Because the i-score uses an intermediate approach by averaging the areas of all rectangles under the publication–citation curve, the influence of extreme cases is reduced. Some additional adjustments can further limit their impact. For example, upper bounds can be applied to publication or citation thresholds, or diminishing weights can be assigned to rectangles with very high publication or citation counts. These adjustments preserve the basic structure of the i-score while reducing sensitivity to extreme values.
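The effect of an upper bound on citations can be sketched with the 15-paper example whose Rec (C×P) values are tabulated in this paper. The sketch computes the rectangle area C×P at each distinct non-zero citation level and then the plain mean of those areas. That mean is used only as a stand-in for the averaging described above; the formal definition of the i-score is given earlier in the paper and may differ in detail. The cap of 20 citations and the function names are hypothetical.

```python
def corner_rectangles(citations):
    """Areas C x P at each distinct non-zero citation level.

    P is the number of papers with at least C citations.
    """
    ranked = sorted(citations, reverse=True)
    areas = []
    for rank, c in enumerate(ranked, start=1):
        # A rectangle is recorded at the last paper of each citation level.
        is_last_of_level = rank == len(ranked) or ranked[rank] != c
        if c > 0 and is_last_of_level:
            areas.append(c * rank)
    return areas

def capped(citations, cap):
    """Apply an upper bound to each paper's citation count."""
    return [min(c, cap) for c in citations]

citations = [70, 18, 18, 8, 5, 5, 5, 5, 4, 4, 2, 1, 0, 0, 0]

raw_areas = corner_rectangles(citations)
cap_areas = corner_rectangles(capped(citations, cap=20))
print(raw_areas)  # [70, 54, 32, 40, 40, 22, 12]
print(cap_areas)  # [20, 54, 32, 40, 40, 22, 12]

mean_raw = sum(raw_areas) / len(raw_areas)
mean_cap = sum(cap_areas) / len(cap_areas)
print(round(mean_raw, 2), round(mean_cap, 2))  # 38.57 31.43
```

Only the rectangle generated by the single 70-citation paper changes, which shows how a bound limits the contribution of one extreme value while leaving the rest of the curve untouched.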

5.6. Implications

The mitigation strategies discussed above clarify the intended role of the i-score in research evaluation and help explain its contribution compared with existing indicators, especially the h-index. Although the i-score shares several structural limitations with other citation-based measures, its geometric design allows these limitations to be addressed in a systematic and transparent way. By using information from the entire publication–citation curve, the i-score provides better discrimination and avoids some formal problems of the h-index, while remaining adaptable to different evaluation contexts. In addition, the use of normalized or fractional-weighted values instead of raw publication or citation counts offers more opportunities to reduce size, field, and age effects. In contrast, the h-index relies on a single threshold and is, therefore, less flexible for normalization by authorship role, discipline, or time.

For these reasons, the i-score should not be viewed as a replacement for the h-index, but as a complementary indicator. When used with appropriate normalization and together with other quantitative and qualitative measures, the i-score can provide a more detailed and balanced representation of research output and impact than the h-index alone.

6. Conclusion

This study introduces the i-score, a new geometric indicator inspired by Hirsch’s concept of using a single number to measure research output. Unlike the h-index, the i-score utilizes the area under the publication–citation curve to provide a more nuanced measurement. Initial tests demonstrate that the i-score overcomes key deficiencies of the h-index, such as inconsistency, insensitivity, and lack of discrimination, by offering a more consistent and accurate assessment that considers both productivity and quality. In the final test, the i-score is compared with other bibliometric indicators using a larger data set, further validating its accuracy and reliability.

The study shows that the i-score provides deeper insights into the relationship between research productivity and quality than the h-index and other geometric indicators such as the m-score and rec-index, which are more limited in this regard. This research contributes to bibliometric evaluation, and future studies are encouraged to further validate the i-score and explore its potential applications. Future research could also examine and validate the mitigation strategies discussed in this study. Empirical tests could explore how authorship-based counting, field normalization, and temporal weighting affect the i-score and its sensitivity to size, disciplinary, and age-related effects. Applying these strategies across different data sets, disciplines, and career stages would help assess their effectiveness and clarify the conditions under which the i-score provides the most reliable evaluation results.

Footnotes

ORCID iD

Fei Shu

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study is supported by the National Science Foundation of China (# 72274048).

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Hirsch. An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A 2005; 102: 16569–16572. https://doi.org/10.1073/pnas.0507655102

2. Bornmann. h-index research in scientometrics: a summary. J Informetr 2014; 8: 749–750. https://doi.org/10.1016/j.joi.2014.07.004

3. Franceschini, Maisano. Analysis of the Hirsch index’s operational properties. Eur J Oper Res 2010; 203: 494–504.

4. Alonso, Cabrerizo, Herrera-Viedma, et al. h-Index: a review focused in its variants, computation and standardization for different scientific fields. J Informetr 2009; 3: 273–289.

5. Bornmann, Daniel. What do we know about the h index? J Am Soc Inf Sci Technol 2007; 58: 1381–1385.

6. Bornmann, Daniel. The state of h index research: is the h index the ideal way to measure research performance? EMBO Rep 2009; 10: 2–6.

7. Egghe. The Hirsch index and related impact measures. Annu Rev Inf Sci Technol 2010; 44: 65–114.

8. Norris, Oppenheim. The h-index: a broad review of a new bibliometric indicator. J Doc 2010; 66: 681–705.

9. Panaretos, Malesios. Assessing scientific research performance and impact with single indices. Scientometrics 2009; 81: 635–670.

10. Thompson, Callen, Nahata. New indices in scholarship assessment. Am J Pharm Educ 2009; 73.

11. Costas, Bordons. The h-index: advantages, limitations and its relation with other bibliometric indicators at the micro level. J Informetr 2007; 1: 193–203.

12. Waltman, van Eck. The inconsistency of the h-index. J Am Soc Inf Sci Technol 2012; 63: 406–415.

13. Bornmann, Mutz, Daniel. Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. J Am Soc Inf Sci Technol 2008; 59: 830–837.

14. Waltman. A review of the literature on citation impact indicators. J Informetr 2016; 10: 365–391.

15. Vanclay. On the robustness of the h-index. J Am Soc Inf Sci Technol 2007; 58: 1547–1550.

16. Ding, Liu, Kandonga. Exploring the limitations of the h-index and h-type indexes in measuring the research performance of authors. Scientometrics 2020; 122: 1303–1322. https://doi.org/10.1007/s11192-020-03364-1

17. Egghe, Rousseau. The h-index formalism. Scientometrics 2020; 126: 6137–6145. https://doi.org/10.1007/s11192-020-03699-9

18. Ghani, Qayyum, Afzal, et al. Comprehensive evaluation of h-index and its extensions in the domain of mathematics. Scientometrics 2019; 118: 809–822. https://doi.org/10.1007/s11192-019-03007-0

19. Wang, et al. Which h-index? An exploration within the web of science. Scientometrics 2020; 123: 1225–1233.

20. Alonso, Cabrerizo, Herrera-Viedma, et al. hg-index: a new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics 2010; 82: 391–400. https://doi.org/10.1007/s11192-009-0047-5

21. Egghe. Theory and practise of the g-index. Scientometrics 2006; 69: 131–152.

22. Prathap. Is there a place for a mock h-index? Scientometrics 2010; 84: 153–165.

23. Purvis. The h index: playing the numbers game. Trends Ecol Evol 2006; 21: 422.

24. Van Raan. Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics 2006; 67: 491–502.

25. Zhivotovsky, Krutovsky. Self-citation can inflate h-index. Scientometrics 2008; 77: 373–375.

26. Kelly, Jennions. The h index and career assessment by numbers. Trends Ecol Evol 2006; 21: 167–170.

27. van Eck, Waltman. Generalizing the h- and g-indices. J Informetr 2008; 2: 263–271. https://doi.org/10.1016/j.joi.2008.09.004

28. Shu. M-score: an indicator quantifying individual’s scientific research output. In: Atanassova, Bertin, Mayr (eds) 16th International conference on Scientometrics & Informetrics. International Society for Scientometrics & Informetrics (ISSI), 2017, pp. 460–465.

29. Levene, Harris, Fenner. A two-dimensional bibliometric index reflecting both quality and quantity. Scientometrics 2020; 123: 1235–1246. https://doi.org/10.1007/s11192-020-03454-0

30. Jin, Liang, Rousseau, et al. The R- and AR-indices: complementing the h-index. Chin Sci Bull 2007; 52: 855–863.

31. Kosmulski. A new Hirsch-type index saves time and works equally well as the original h-index. ISSI Newsl 2006; 2: 4–6.

32. Zhang C-T. The e-index, complementing the h-index for excess citations. PLoS ONE 2009; 4: e5429.

33. Tol RSJ. The h-index and its alternatives: an application to the 100 most prolific economists. Scientometrics 2009; 80: 317–324. https://doi.org/10.1007/s11192-008-2079-7

34. Egghe, Rousseau. An h-index weighted by citation impact. Inf Process Manag 2008; 44: 770–780. https://doi.org/10.1016/j.ipm.2007.05.003

35. Antonakis, Lalive. Quantifying scholarly impact: IQp versus the Hirsch h. J Am Soc Inf Sci Technol 2008; 59: 956–969. https://doi.org/10.1002/asi.20802

36. Kosmulski. MAXPROD - A new index for assessment of the scientific output of an individual, and a comparison with the h-index. Cybermetrics 2007; 5.

37. Vinkler. The π-index: a new indicator for assessing scientific impact. J Inf Sci 2009; 35: 602–612.

38. Cabrerizo, Alonso, Herrera-Viedma, et al. q2-Index: quantitative and qualitative evaluation based on the number and impact of papers in the Hirsch core. J Informetr 2010; 4: 23–28.

39. Anderson, Hankin, Killworth. Beyond the Durfee square: enhancing the h-index to score total publication output. Scientometrics 2008; 76: 577–588.

40. The w-index: a measure to assess scientific impact by focusing on widely cited papers. J Am Soc Inf Sci Technol 2010; 61: 609–614.

41. García-Pérez. A multidimensional extension to Hirsch’s h-index. Scientometrics 2009; 81: 779. https://doi.org/10.1007/s11192-009-2290-1

42. Batista, Campiteli, Kinouchi. Is it possible to compare researchers with different scientific interests? Scientometrics 2006; 68: 179–189. https://doi.org/10.1007/s11192-006-0090-4

43. Sidiropoulos, Katsaros, Manolopoulos. Generalized Hirsch h-index for disclosing latent facts in citation networks. Scientometrics 2007; 72: 253–280.

44. Fassin. The HF-rating as a universal complement to the h-index. Scientometrics 2020; 125: 965–990. https://doi.org/10.1007/s11192-020-03611-5

45. Namazi, Fallahzadeh. N-index: a novel and easily-calculable parameter for comparison of researchers working in different scientific fields. Indian J Dermatol Venereol Leprol 2010; 76: 229.

46. Chai, Hua, Rousseau, et al. The adapted pure h-index. In: Kretschmer, Havemann (eds). Proceedings of WIS 2008, July 29-August 1, 2008, Berlin, Germany. Humboldt-Universität zu Berlin, Institute for Library and Information Science (IBI); 2008.

47. Valentinuzzi, Laciar, Atrio. Two new discipline-independent indices to quantify individual’s scientific research output. J Phys Conf Ser 2007; 90: 012018.

48. Brown. A simple method for excluding self-citation from the h-index: the b-index. Online Information Review 2009; 33: 1129–1136.

49. Schubert, Korn, Telcs. Hirsch-type indices for characterizing networks. Scientometrics 2009; 78: 375–382.

50. Opthof, Wilde. The Hirsch-index: a simple, new tool for the assessment of scientific output of individual scientists. Neth Heart J 2009; 17: 145–154.

51. Hirsch. An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics 2010; 85: 741–754.

52. Hirsch. h_α: an index to quantify an individual’s scientific leadership. Scientometrics 2019; 118: 673–686. https://doi.org/10.1007/s11192-018-2994-1

53. Schreiber. A modification of the h-index: the hm-index accounts for multi-authored manuscripts. J Informetr 2008; 2: 211–216.

54. Schreiber. Self-citation corrections for the Hirsch index. Europhys Lett 2007; 78: 30002.

55. Kaptay. The k-index is introduced to replace the h-index to evaluate better the scientific excellence of individuals. Heliyon 2020; 6: e04415. https://doi.org/10.1016/j.heliyon.2020.e04415

56. Rousseau, Chen. In those fields where multiple authorship is the rule, the h-index should be supplemented by role-based h-indices. J Inf Sci 2010; 36: 73–85.

57. Abbasi, Altmann, Hwang. Evaluating scholars based on their academic collaboration activities: two indices, the RC-index and the CC-index, for quantifying collaboration activities of researchers and scientific communities. Scientometrics 2010; 83: 1–13. https://doi.org/10.1007/s11192-009-0139-2

58. Lee, Kraus, Couldwell. Use of the h index in neurosurgery. J Neurosurg 2009; 111: 387–392.

59. Zhang. A proposal for calculating weighted citations based on author rank. EMBO Rep 2009; 10: 416–417.

60. Bornmann, Daniel H-D. The citation speed index: a useful bibliometric indicator to add to the h index. J Informetr 2010; 4: 444–446.

61. Rousseau. A proposal for a dynamic h-type index. J Am Soc Inf Sci Technol 2008; 59: 1853–1855.

62. Rousseau. Probing the h-core: an investigation of the tail–core ratio for rank distributions. Scientometrics 2010; 84: 431–439. https://doi.org/10.1007/s11192-009-0099-6

63. Burrell. Hirsch index or Hirsch rate? Some thoughts arising from Liang’s data. Scientometrics 2007; 73: 19–28. https://doi.org/10.1007/s11192-006-1774-5

64. Kosmulski. New seniority-independent Hirsch-type index. J Informetr 2009; 3: 341–347. https://doi.org/10.1016/j.joi.2009.05.003

65. De Visscher. An index to measure a scientist’s specific impact. J Am Soc Inf Sci Technol 2010; 61: 319–328. https://doi.org/10.1002/asi.21240

66. Molinari J-F. Mathematical aspects of a new criterion for ranking scientific institutions based on the h-index. Scientometrics 2008; 75: 339–356.

67. Ruane, Tol RSJ. Rational (successive) h-indices: an application to economics in the Republic of Ireland. Scientometrics 2008; 75: 395–405. https://doi.org/10.1007/s11192-007-1869-7

68. Sypsa, Hatzakis. Assessing the impact of biomedical research in academic institutions of disparate sizes. BMC Med Res Methodol 2009; 9: 33.

69. Mitra. Hirsch-type indices for ranking institutions scientific research output. Curr Sci 2006; 91: 1439.

70. Riikonen, Vihinen. National research contributions: a case study on Finnish biomedical research. Scientometrics 2008; 77: 207. https://doi.org/10.1007/s11192-007-1962-y

71. Wohlin. A new index for the citation curve of researchers. Scientometrics 2009; 81: 521. https://doi.org/10.1007/s11192-008-2155-z

72. Levene, Fenner, Bar-Ilan. Characterisation of the $\chi$-index and the rec-index. Scientometrics 2019; 120: 885–896. https://doi.org/10.1007/s11192-019-03151-7

73. Cronin, Meho. Using the h-index to rank influential information scientists. J Am Soc Inf Sci Technol 2006; 57: 1275–1278.

74. Campbell, Tippett, Côté, et al. (eds). An approach for the condensed presentation of intuitive citation impact metrics which remain reliable with very few publications. In: 21st international conference on science and technology indicators-STI 2016 book of proceedings. Editorial Universitat Politecnica de Valencia, 2016, pp. 12.

75. Okubo. Bibliometric indicators and analysis of research systems methods and examples. OECD Publishing, 1997.

76. Glänzel. Bibliometrics as a research field: a course on theory and application of bibliometric indicators. Universidade Federal De Pernambuco, 2003.

77. Larivière, Archambault, Gingras, et al. The place of serials in referencing practices: comparing natural sciences and engineering with social sciences and humanities. J Am Soc Inf Sci Technol 2006; 57: 997.

Rank (by citation)	C value	P value	Rec (C×P)
1	70	1	70
2	18	2
3	18	3	54
4	8	4	32
5	5	5
6	5	6
7	5	7
8	5	8	40
9	4	9
10	4	10	40
11	2	11	22
12	1	12	12
13	0	13
14	0	14
15	0	15
