Abstract
Olof Hallonsten (2021) raised some very thought-provoking issues regarding the current evaluation regime in science. Like many of us, I share his concerns regarding the immense pressure the university is currently facing to demonstrate its utility (Mirowski, 2011). Much of this pressure comes under the guise of economic and budgetary concerns, but deeper ideological reasons are obviously also at play (Bourdieu, 2004; Moore et al., 2011). Utility is, after all, an abstract concept that can be given almost any substantive definition (Kymlicka, 2002). In light of constant economic pressure and now the ecological crisis, the dominant discourse defines utility in terms of innovation and sustainable growth. This urgency has led some to advocate for a more intimate relationship between the university and the market, much to the dismay of others, who see it as akin to merchants desecrating the temple. The dominant discourse is contingent, however, as neoliberal ideology would recommend a closer relationship between the university and the market regardless of global warming or any other crisis (Slaughter et al., 2004; Mirowski, 2013). Independently of these crises, technological developments in computer science have created new possibilities for tracking and evaluating knowledge (Van Noorden, 2010). All of these factors converge on essentially one issue: should science be further rationalized, and if so, to what extent and how? Hallonsten (2021) criticizes what appears to be the growing consensus, or perhaps implicit consent, in society that science should be further rationalized through exogenous interference.
Despite the provocative tone, Hallonsten (2021) ultimately supports a middle-of-the-road approach, as he repeats Linda Butler’s plea for sanity. As quantitative performance measures were being introduced in Australia and the United Kingdom, Butler (2007) promoted a balanced approach, arguing that although metrics had their place, for instance by making the process more efficient and cost-effective, qualitative peer review should remain the keystone of research evaluation. This implies that both extremes should be avoided: i) the categorical refusal of any metric for any purpose; and ii) the dystopia in which peer review is abolished in favor of an algorithm. Whether we like them or not, metrics are here to stay, but it is up to us how we use them in our own practice and how far we tolerate others using them without our blessing. I think the balanced approach is ultimately the one to follow, but this is easier said than done, as it necessitates a concerted effort to simultaneously use, develop and criticize metrics. Opinions are bound to vary on both the choice of metrics and how they are used for scientific and administrative purposes. Supporting Butler (2007), Hallonsten (2021) contributes to this discussion by making his own plea that historical evidence be taken into consideration when evaluating the productivity of science. Basically, he argues that science was immensely productive well before performance benchmarks were ever conceived – so productive, in fact, that he proposes shifting the burden of proof onto those who dare claim otherwise.
The perverse effects of the current benchmarks
Hallonsten (2021) argues that much of the current practice in quantitative performance evaluation is pointless and counterproductive. While the former predicate adjective is needlessly provocative, there is already a growing literature criticizing the unbalanced use of metrics in the evaluation of science (Gendron, 2008; Adler and Harzing, 2009; Espeland and Sauder, 2009, 2016). Many of these negative effects may be labeled unintended consequences or perverse effects (see Merton, 1936; Boudon, 2016). For the sake of brevity, I shall limit myself to highlighting only a few of them, grouped under three broad themes: i) the modality of scientific research; ii) power and research; and iii) the peer relationship.
Firstly, the current evaluation regime can actually foster a conformity to benchmarks that is counter-productive to originality, scientific rigour and ethical conduct. For instance, it can incentivize researchers to ignore worthwhile fields or issues that demonstrate less potential for quick publication and citation.
Secondly, the fetishization of the citation index (or other similar metrics) can reinforce practices that are more akin to power and prestige-seeking than actual scientific progress (Hazelkorn, 2015). While ideally scientists should adhere to reason in their quest for truth, the pursuit of rankings and indicators can redirect that quest towards visibility and status.
Finally, I believe that the current evaluation regime cultivates animosity, resentment and disdain between colleagues (i.e., beyond the unavoidable minimum). It fuels this between those who perform well on the chosen indicators and those who do not.
Concrete actions towards a more diverse and just evaluation regime
Regarding accusations that the university suffers from a want of productivity, Hallonsten’s (2021) recommendation to reverse the burden of proof is bold. That being said, brilliant as it may be as counter-rhetoric, public officials and university administrators are unlikely to be particularly impressed, given the immense pressure they are under to rationalize public services for the sake of a narrow and ideological kind of ‘efficiency’ (e.g. new public management; see Chandler et al., 2002; Lorenz, 2012). Reversing the burden of proof is certainly a worthy long-term goal, but it cannot be accomplished overnight. More concrete actions are needed in the short term. Let us discuss what this might entail.
Firstly, I believe that a plea for intellectual diversity is in order. There is much discussion at the moment about inclusion and diversity in the social sense – discussions which are long overdue. However, there is also another kind of diversity that requires attention: the recognition that there exist different types of scholars and worthwhile contributions. The administration can evaluate us all they want and create hundreds of metrics, but we need to stand firm in recognizing different career paths. If you regularly apply for funding – good. If you only apply from time to time when a project is ready and well-developed – just as good. If you never apply and pursue scholarship that requires little external funding – just as good.
Secondly, I believe that researchers should be more involved in the field of science studies (Latour, 1999; Latour and Woolgar, 2013) and in other key disciplinary fields whose work is of general scientific interest, such as the sociology of valuation (Knorr Cetina, 2009; Lamont, 2012; Helgesson and Muniesa, 2013). While most of us are not actively involved in these fields, we are all concerned parties given our scientific practice. Such fields are more necessary than ever to address the profusion of performance metrics that are already here (Van Noorden, 2010) and those that are to come. We should keep in mind that new measures are regularly created, and they will inevitably be used in unintended ways. This was the case for the Journal Impact Factor, which was originally developed to guide libraries in their purchasing and indexing decisions (see Garfield, 1963; McKiernan et al., 2019). It is now often used in some capacity in the review, promotion and tenure (RPT) process (Abbott et al., 2010). In many respects, most researchers have no idea that Pandora’s box has already been opened. Our only hope is to play a greater part in the conversation and to apply critical thinking to how we choose to use metrics in our individual and collective practice. The balanced approach means encouraging attempts to develop ‘responsible’ metrics and practices (see Wilsdon et al., 2017). While much work remains to be done, I would argue that most researchers are also currently unaware of the considerable effort to criticize and modify the current metrics (Sauder and Espeland, 2006). For instance, Emilio Ferrara and Alfonso Romero (2013) have suggested ways to mitigate the self-citation bias in the h-index.
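To make the stakes concrete, here is a minimal sketch, in Python, of the kind of adjustment at issue: an h-index computed once on raw citation counts and once after discarding citations whose citing paper shares an author with the cited paper. The toy data and the crude exclusion rule are my own illustrative assumptions; this is not Ferrara and Romero’s (2013) actual discounting method, only an indication of how such a correction can be operationalized.

    # Illustrative sketch only; the exclusion rule and the data are hypothetical,
    # not Ferrara and Romero's (2013) method.

    def h_index(citation_counts):
        # Largest h such that at least h papers have at least h citations each.
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, c in enumerate(counts, start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    def counts_without_self_citations(papers):
        # Treat a citation as a self-citation if the citing paper shares any
        # author with the cited paper (a deliberately crude rule).
        adjusted = []
        for paper in papers:
            kept = [citers for citers in paper["citing_authors"]
                    if not (citers & paper["authors"])]
            adjusted.append(len(kept))
        return adjusted

    # Hypothetical toy data: each paper lists its authors and, for every
    # citation received, the set of authors of the citing paper.
    papers = [
        {"authors": {"A"},      "citing_authors": [{"A"}, {"B"}, {"C"}]},
        {"authors": {"A", "B"}, "citing_authors": [{"B"}, {"C"}]},
        {"authors": {"A"},      "citing_authors": [{"A"}, {"A", "D"}]},
    ]

    raw_counts = [len(p["citing_authors"]) for p in papers]
    print("raw h-index:", h_index(raw_counts))                                   # 2
    print("adjusted h-index:", h_index(counts_without_self_citations(papers)))   # 1

Even this naive version shows how sensitive the indicator is to the definition of a self-citation – precisely the kind of methodological choice that researchers should be scrutinizing rather than leaving to administrators or database vendors.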
Finally, as a corollary to the previous point, much is to be gained by applying the comparative approach to the meta-evaluation inherent in science studies. This approach would provide the reflexive insights needed to develop a truly informed and balanced approach. It would highlight how evaluation practices differ between disciplines, universities and countries, and it would help identify the implicit values behind all evaluative methods and their consequences. In some ways, I am less concerned by the practices themselves than by the intentions and values behind them. As Baruch Spinoza would say, nothing is inherently bad in and of itself; it rather depends on how it is used. In short, while we do not have the power to stop the ‘evaluation frenzy’ (Hallonsten, 2021: 14), we do possess the capacity to evaluate their evaluative practices in return. Accountability goes both ways: evaluate not, lest ye be evaluated. This comparative meta-evaluation might even inspire new measures. Why not create a whole series of metrics for important issues in science that are being swept under the rug? Measures could be created to track and compare the level of government support for science. Other potential benchmarks: i) the percentage of GDP spent on higher education; ii) the ratio of tenure-track positions to enrolled students; and iii) an index of academic freedom. There will be purists who refuse to play this game, arguing that creating alternative measures serves to legitimize the quantification of performance. But it seems to me that a pragmatic approach should be adopted. One is almost tempted to formulate a law for this modern age:
