Abstract

While the UK Research Assessment Exercise (RAE) has generated many critics – indeed, one sometimes wonders if it has any advocates within the universities – few disagree that it has been instrumental in raising research standards across the university sector. The 1996 RAE, the first large-scale evaluation exercise, classified 43 per cent of the research that it evaluated as being of ‘national’ or ‘international’ standard; the corresponding figure for the 2001 RAE was 63 per cent. Regular visitors to British universities over the past two decades (including both the current authors) have observed a steady improvement in standards and professionalism, in attendance at major national and international conferences and in publications in the major journals. Whether such progress could have been achieved in the absence of the RAE is debatable.
The RAE process has not come without a cost. Most departments have made meeting the RAE's goals the centrepiece of their research plans and hiring strategies over an extended period. Most academic staff have found that it has placed an added pressure on their professional lives; in this issue Andrew Russell (2009) talks about ‘the intolerable pressure’ of the RAE and notes ‘the time-consuming and resource-hungry’ nature of the whole process (p. 63). Not least, rising academic standards induced by greater efforts have generated the expectation of more resources, an expectation that has so far remained unfulfilled.
Research evaluation is here to stay, but how do we make the methodology more transparent, the outcome more accurate and the whole process less burdensome for those within its purview? Our original article aimed to test the proposition that a portfolio of quantitative indicators, with citations at its core, could replace the peer-review process employed in the 2001 RAE for political science. The four commentaries on our approach rehearse a variety of criticisms. Ron Johnston takes the view that it is impossible to quantify anything associated with research quality and that peer review remains the only way forward. Andrew Russell and Claire Donovan are mainly concerned with measurement issues, as well as (especially in Russell's case) the costs involved in implementing a metrics approach. Finally, Albert Weale, in the most thoughtful of the four contributions, highlights a range of conceptual and empirical issues. We group the major criticisms under three headings – the reliability of citations, measurement, and implementation – and address each of them in turn.
Are Citations Reliable?
At the heart of most criticisms of a bibliometric approach to measuring research quality is the view that citations are inherently biased and can only provide, at best, a partial view. The logical extension of this argument, expressed most bluntly by Johnston (2009), is that only peer review can deal with the complexities of evaluating research quality. The other three commentators are more familiar with bibliometrics and have raised several points about our analysis.
Donovan (2009) questions the citation window that we use in our analysis. She is correct in saying that as we based our analysis on citation counts at the end of 2006, the temporal fit with the 2001 RAE is not exact. However, work by Thed Van Leeuwen (2006) on the social sciences shows clearly that if there are reasonable levels of aggregation (as the RAE analysis definitely shows) then research that is highly cited very early on will be highly cited later as well. It is therefore unlikely that rerunning the analysis with a shorter citation window would significantly alter our results. Of course, there will always be some individual papers that will buck the trend, but these have been shown to be rare (Van Raan, 2004). Donovan also questions how we treated co-authored publications. Where a publication was co-authored by researchers from different institutions, both were fully credited with the citations from the publication if both put it forward for assessment.
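The crediting rule for co-authored publications described above amounts to full (whole) counting rather than fractional counting. As a minimal sketch, assuming a simple list of publication records (the function name, field names, institutions and citation counts here are all hypothetical and are not drawn from our data), it could be expressed as:

```python
from collections import defaultdict


def credit_citations(publications):
    """Full counting: every institution that submitted a publication
    receives the publication's entire citation count."""
    totals = defaultdict(int)
    for pub in publications:
        # set() guards against an institution being listed twice for one paper
        for institution in set(pub["submitted_by"]):
            totals[institution] += pub["citations"]
    return dict(totals)


# Invented records, for illustration only
pubs = [
    {"title": "Paper 1", "citations": 12, "submitted_by": ["Univ X", "Univ Y"]},
    {"title": "Paper 2", "citations": 5, "submitted_by": ["Univ X"]},
]
totals = credit_citations(pubs)
```

Under this rule the citations to a jointly submitted paper appear in the totals of both institutions; a fractional rule would instead divide them between the submitting institutions.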
Both Donovan and Russell comment on the restrictive nature of journals covered in the ISI Web of Science database and, by implication, the problems of using citation analysis for non-STEM disciplines. This is a valid point, but as we explain in our article, we covered all publications entered in the 2001 RAE, whether they appeared in books, ISI-indexed journals or in some other publication. Russell is correct that our citation counts included only those journals listed in ISI, but would an expanded list including more marginal journals really change our results? We think not, but this is – as they say – an empirical question. In addition, Donovan cannot assume that target-expanded metrics (i.e. analysis that incorporates citations to books, chapters, etc.) will not be used in future exercises. As more research output is placed in institutional repositories, and data analysis increases (even more) in sophistication, it may well become feasible and cost-effective to analyse not only citations in indexed journals, but also citations to books and book chapters.
Questions of Measurement
Johnston and Weale comment on our simulation exercise, in which we estimate the RAE rank that a department would have received based on our portfolio of indicators, excluding the indicator for having a department member on the RAE panel. Johnston views this simulation exercise as inadequate as it is not a good match to the RAE scores. We take a different view. First, only one department moved by more than one point (and that was the only department awarded a 2) and this indicates to us relative stability in our model. Second, only one-third of the departments changed their grade; Johnston regards this as a large number; we would disagree. Third, can Johnston really claim that the peer review panels always ‘got it right’? The problem is that there is no absolute measure against which to judge either peer review or metrics. Weale (2009) makes a related point by criticising us for characterising peer review as ‘subjective’ and metrics as ‘objective’; this is a fair point that we acknowledge.
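The two summary figures cited above (the share of departments whose grade changed, and the departments that moved by more than one point) are straightforward to compute once actual and simulated grades are placed side by side. The following is only a sketch: the department labels and grades are invented for illustration, the numeric grade scale is an assumption, and the function name is hypothetical.

```python
def compare_grades(actual, simulated):
    """Summarise how far simulated grades depart from the grades awarded."""
    diffs = {dept: simulated[dept] - actual[dept] for dept in actual}
    changed = [dept for dept, d in diffs.items() if d != 0]
    moved_more_than_one = [dept for dept, d in diffs.items() if abs(d) > 1]
    return {
        "share_changed": len(changed) / len(actual),
        "moved_more_than_one": moved_more_than_one,
    }


# Invented grades on an assumed numeric scale, for illustration only
actual = {"Dept A": 5, "Dept B": 4, "Dept C": 2, "Dept D": 6, "Dept E": 5, "Dept F": 3}
simulated = {"Dept A": 5, "Dept B": 4, "Dept C": 4, "Dept D": 5, "Dept E": 5, "Dept F": 3}
summary = compare_grades(actual, simulated)
```

A summary of this kind describes agreement between the two methods; it does not, of course, say which of them is closer to the truth.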
Several of the comments concern our methodology. Both Johnston and Russell criticise the use of OLS regression techniques to estimate our model, and Weale notes the non-linearities in our dependent variable (RAE rank). These are, of course, reasonable points. However, we did conduct the analyses using logistic regression techniques (as we state in endnote 17 to our article), which are more appropriate for our dependent variable, and reached the same substantive results. We relied on OLS regression in the published article because it simplified the simulation exercise in predicting department grades using a metrics model. In a related criticism, Weale queries our procedure of rounding the regression estimates to arrive at predicted RAE grades. This seems to us a simple, objective and transparent method; one wonders how the RAE panel handled cases where departments straddled two grades. Did the panel round up or down in cases of uncertainty, and on what basis? Our data suggest that they erred on the side of generosity.
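The rounding step can be illustrated with a short sketch. It assumes that grades are coded on a numeric scale from 1 to 7, that estimates falling exactly halfway between two grades are rounded up, and that estimates outside the scale are pulled back to its end points; these are illustrative assumptions rather than a description of our published procedure, and the function name and estimates are hypothetical.

```python
import math


def predicted_grade(estimate, lowest=1, highest=7):
    """Round a regression estimate to the nearest grade on an assumed scale."""
    # floor(x + 0.5) rounds halves up; Python's built-in round() would
    # instead round halves to the nearest even integer.
    nearest = math.floor(estimate + 0.5)
    # Keep the prediction inside the assumed grade scale
    return max(lowest, min(highest, nearest))


# Invented regression estimates, for illustration only
estimates = {"Dept A": 4.62, "Dept B": 5.49, "Dept C": 7.3}
grades = {dept: predicted_grade(value) for dept, value in estimates.items()}
```

Whatever tie rule is chosen, the point is that it is stated in advance and applied identically to every department.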
The Cost of a Metrics Evaluation
The cost of the 2001 RAE was huge – in time, resources and in the planning that went into it, not least by the departments themselves. In addition to putting forward a more objective evaluation method than peer review, our intention has been to produce a method that substantially reduces the overall cost of future research evaluation exercises. Russell queries whether the cost of our approach would indeed be less than full-blown peer review. We believe that a metrics approach would be much cheaper, easier and more efficient to implement. Russell is correct that some academic oversight would be required, but only in the planning and writing-up stages.
Both Russell and Donovan question the use of a metrics-only approach, or one solely based on citations. In fact, we suggest that a portfolio of measures should be used, including not only citations but research income, graduate student load and so on. Nor do we suggest eliminating peer review completely; rather, we suggest using it sparingly for planning, and for the dilemmas and difficult cases that will inevitably emerge. Simply mechanically feeding numbers into a funding formula without detailed scrutiny – not just in social sciences, but also in science – would wholly defeat the purpose of achieving a reliable and cost-efficient evaluation process.
Overall, the comments on our article bear out our original hypothesis – that a metrics approach with citations at its core is an achievable, reliable and efficient method of evaluating research quality. The commentaries by Donovan, Russell and Weale all raise constructive points, most of which could be incorporated into an eventual model that could be applied to non-STEM disciplines. If our work has generated a debate that can resolve these issues, then we will have achieved our major goal.
