Abstract
Oh, the joy and honor of having such a distinguished group of colleagues respond, with such eloquent and thoughtful pieces, to a call for debate on the basis of my provocative article ‘Stop evaluating science: A historical and sociological argument’. I personally never doubted the article’s relevance and timeliness, but I was somewhat doubtful of its ability to make any difference. After all, one alleged main consequence of the current evaluation hysteria in science is that journals are nowadays flooded with manuscripts submitted mostly to advance the career(s) of the author(s), with less attention paid to the actual contributions they make. In that flood of publications, what difference can one article make? This is mostly outside the author’s control. I would therefore like to commend and deeply thank the editors of this journal for their decision to let my article be the starting point for further debate. Not only was attention thus drawn to the article itself; more importantly, colleagues were invited to counter and amend my arguments and analytical assertions, to the greater benefit of the topic and how it is handled in scholarly work.
A slightly more analytical reaction gives rise to two important inferences: first, that colleagues are usually better at summarizing the key arguments of one’s work; second, that colleagues are also better suited to spot the gaps, fill them, and build on them to offer interpretations that are both supplemental and superior. Before highlighting the latter capacity of the responses, I will dwell on the former and try to recapitulate the (intended) message of my article ‘Stop evaluating science’, using mostly excerpts from the responses.
The currently ubiquitous practice of evaluating science with the aim of assessing its usefulness and productivity is essentially pointless, for several reasons. First, ‘evaluation is antagonistic’ to many of the key values of science, including innovation (D’Agostino and Malpass, in this issue). Second, it ‘inflates bureaucracy in unnecessary and counterproductive ways, wasting and misdirecting precious resources’ (Brighenti, in this issue). Third, because the metrics used ‘are shallow, over-simplified and inaccurate’ (Knaapen, in this issue), they cannot capture the real meaning and benefit of science, since this is unavailable to most of us (Shinn and Marcovich, in this issue). ‘Nobody can be really sure about long-term consequences of any seemingly ‘useless’ research’ (Khomyakov, in this issue), and therefore we should hesitate to judge what is good and bad science, other than on the basis of substantial historical evidence. Such evidence clearly shows that science ‘has been immensely productive well before performance benchmarks were ever conceived’ (Lizotte, in this issue), and that it would therefore be productive also ‘without shallow quantitative managerial devices’ (Hannud Abdo, in this issue). Another way of making the same argument is to say that ‘the basic aim of science is not to be ‘productive’ – whatever the exact meaning of that term’ (Gingras, in this issue). In this capacity, science is evidently successful, although this is a tough sell, since ‘while reformers can draw on popular rhetoric of how science should operate, critics must wade into the murky waters of real scientific practice’ (Peterson and Panofsky, in this issue). Nonetheless, the ‘growing consensus, or perhaps implicit consent, in society that science should be further rationalized through exogenous interference’ (Lizotte) is fallacious. Hence the injunction in the title of the article: stop evaluating science.
Debate on the basis of this statement, and the arguments behind it, is not only inspiring and useful but probably the most responsible and productive way forward for the scholarly community. Although my article ‘Stop evaluating science’ was ‘deliberately pointed and provocative’ (Hallonsten, 2021a) and could perhaps be (mis)read as a closed statement of sweeping claims, with no interest on the author’s part in hearing counterarguments, it also ended with the note that ‘[s]erious debate [. . .] should ensue to handle this important matter’. Importantly, it is not only impossible for an author to anticipate all counterarguments to a historical-sociological argument but indeed ill-advised to try. In no small part, the already expressed joy over the turnout of the call for debate is due to a personal conviction that no scientific publication, regardless of how comprehensively presented, marks the end of anything, except perhaps (paraphrasing Winston Churchill) the end of the beginning of an inquiry.
Science (including social science) deserves to be understood as a
Three topics from the varied collection of responses are, in my view, particularly suitable for further discussion that hopefully can inspire and incentivize further valuable scientific work in this important area of study, in the spirit of the Popperian, Mertonian, and Habermasian theory of scientific knowledge briefly recapitulated above. To them, I have taken the liberty of adding a fourth topic, which is not prevalent among the responses but which it was part of the purpose of my original article to explore, and which I hope can be of some significance in the continuing debate.
Quantification
A key issue in the debate is whether the blame should be put on numbers and evaluations themselves or merely on their irresponsible use. Some of the collected responses claim that ‘numbers are not
While I recognize the limitations of the frame of analysis provided by the functionalist sociology of science, and of the conceptual tools it provides, I remain convinced that there is a point to be made about science’s inner logic being compromised or corrupted by quantitative performance evaluation, because this means the colonization of the sphere or institution of science by (parts of) the sphere or institution of the economy (and bureaucracy). I have argued this in other recent publications (Hallonsten, 2021b), pointing to evidence suggesting that the incentive structures in science are corrupted by evaluation and the governance and resource distribution schemes tied to it, something that several of the responses seem to agree with. For example, ‘overemphasis on these benchmarks can promote cronyism’ and ‘focusing only on the evaluated criteria will ultimately overshadow the core missions of learning and research’ (Lizotte); ‘measures replace the object they initially seek to measure’ because ‘concern for evaluation becomes more important than concern for the real phenomena evaluation is supposed to assess’ (Brighenti); and ‘auditable mechanisms of science evaluation invite gaming the system in pursuit of individual success and are highly prone to giving rise to distortions in perceptions and behavior’ (D’Agostino and Malpass). To this should perhaps be added my own point that quantification has the consequence that accountability no longer means responsibility for conduct but merely the capacity to be counted, so that the only way science can responsibly carry out its mission in society (whatever that is) is by subjecting itself to performance evaluation with quantitative and thus oversimplified metrics. If this belief takes root, and I believe it already has among a significant share of those in charge of governing science, it is of little use to try to find a middle way and reclaim control over metrics and evaluation practices.
Lizotte’s ‘law for this modern age’, that ‘
Introspection
Related to the previous topic is the contentiousness of my simplified dichotomy of ‘internal’ and ‘external’ evaluation. While most of my critique in ‘Stop evaluating science’ was directed at the bureaucratic structures that impose themselves on science and feed on the distrust of science’s abilities that began to spread in the 1960s and 1970s, it is also clear that science itself, and scientists, should not be spared blame for this development. My article could perhaps be read as a Foucauldian argument that we are caught up in a vicious circle of surveillance and discipline (cf. Brighenti), but it also highlighted that the internal reward systems of science seem particularly prone to lend themselves to performance evaluation and rankings of the most superficial sort, with the benign support of many scientists. Also highly inappropriate and invalid indicators such as the ‘
More specifically, we should undertake further serious analyses of the alleged flaws of peer review, not least the causal relationships between changes in academic practice and changes in politics and bureaucracy. There are many problematic developments for which the division of responsibilities is unclear and which are therefore in great need of investigation and further discussion. For example, on the one hand, we have identified ‘academic capitalism’ as the operation of universities as profit-seeking businesses, with not only grant money but also publications and citations as currency and capital, which makes the achievements of individual scientists and groups mere means towards this capital accumulation (e.g. Münch, 2014). Journals and citation indexes provide the techniques for sustaining this system and constitute a business of their own (e.g. Macdonald, 2015). On the other hand, we have allegations of a deeply corrupt peer review system, both for journal publication and for grant allocation (to the extent that some suggest the latter should even be replaced by lotteries, see Roumbanis, in this issue) – it is, they say, riddled with arbitrariness and chance, incompetence and ineffectiveness, conservatism, nepotism, abuse of power, and discrimination (e.g. Laudel, 2006; Miller, 2006; see also Brighenti; Schneider, Horbach and Aagaard). Who is to blame? What came first? What caused what? Clearly, these are important matters that must not be handled too lightly but deserve serious analyses.
One promising starting point for such work is to hypothesize with the help of new or adapted concepts and conceptualizations. The concept of evaluation is itself very flexible and riddled with ambiguity. As the editors of this journal pointed out in the very first paragraph of their editorial inviting this debate (Jaclin and Wagner, 2021), evaluation is and has always been an integral part of scientific practice, and so it might make little sense to distinguish too harshly between ‘internal’ and ‘external’ evaluation. Well-deserved critique of my own use of this admittedly oversimplified dichotomy was issued in the responses (see especially Schneider, Horbach and Aagaard), and quite evidently, as an analytical tool, the dichotomy is far from sufficient. Maxim Khomyakov is therefore doing us all a favor when he instead suggests a taxonomy of ‘substantial evaluation’, ‘moral evaluation’, and ‘utilitarian evaluation’, with the first being part and parcel of scientific inquiry (in the Popperian and Mertonian meaning, see above), the second being a means for society or its institutions to evaluate the consequences and risks of scientific activities in a wider sense, and the third representing the evaluation of the utility of science through some cost/benefit analysis. While Khomyakov settles for noting that the third is the ‘most problematic’, I would take the argument one step further: utilitarian evaluation, in this taxonomy, is the form of evaluation that is (in my words) ‘essentially pointless and mostly counterproductive’ (Hallonsten, 2021a). But how do we distinguish between utilitarian evaluation and the other two? How can we amend and adapt the taxonomy to enhance the explanatory value of our analyses? Once again we find here a useful starting point for further studies, and recognize that giving rise to further questions is an extraordinarily useful feature of any concept, conceptual scheme, or taxonomy.
Democratization
Yet another topic of crucial interest, which goes to the heart of the question of science’s role in society, is the relationship between scientists and their audience, or between expertise and laypeople. Max Weber argued that one of the main advances of civilization into modernity was the willful ignorance of us all about ‘the conditions of life under which we exist’ (Weber, 1946: 139). Leaving aside the question of whether this is generally a good or bad development, we can conclude that it is inseparable from the growing and indispensable role of institutionalized science in modern society, a role it has played well, to say the least. The corollary question, in this context, is then whether it is possible to reverse this development without destabilizing or damaging the chances for further contributions of science to human and societal progress. Quoting Weber again, since science ‘cannot tell anyone what he should do – but rather what he can do’ (Weber, 1949: 54), let me make absolutely clear that the question here is not whether this is desirable, only whether it is possible.
Clearly, science has something that could be called an ‘internal and lawful autonomy’ and is governed by norms that prescribe proper behavior and are conducive to continued productivity (Weber, 1946; Merton, 1973; Hallonsten, 2021b). Some would perhaps, with a more up-to-date terminology, call it ‘institutional logics’ (e.g. Thornton et al., 2012). I have argued, in ‘Stop evaluating science’ and elsewhere, that peer review (or organized skepticism) is inherent and central to this self-governance, and that the self-governance is being compromised by exogenous forces that can be summarized as bureaucratization, politicization, marketization, and the like. To this should be added democratization, meaning both public control over knowledge development, dissemination, and knowledge use, and the form of
Alexandre Hannud Abdo’s argument about ‘deep democratization’ of knowledge – that the knowledge society is now a ‘fulfilled promise’ because knowledge has been made available at great scale to each and everyone – is fascinating and offers a very optimistic view of what others have called the ‘death of expertise’ (Nichols, 2017) and the rise of populism in its various shades, including recent and more vulgar varieties of ‘knowledge resistance’ (Klintman, 2019). I am not questioning either of these views, but once again wish for a sociologically and historically informed discussion of the boundaries between the institution of science and the society it serves and lives off. It seems to me that both Hannud Abdo’s ‘deep democratization’ and my own argument about general distrust in science, seconded by Khomyakov, who notes that science ‘today is unknown, unintelligible and frightening activity’, hold. The natural next step is of course deeper exploration of this very issue, and here I would like to bring the toolbox of classical sociology (of science) into the mix. Knaapen argues that peer review is indeed important but ‘unlikely to be enough to challenge the economization of science’, and suggests instead that ‘more diverse external evaluation of science’ be applied in order to ‘assure science pursues a much broader range of public values, such as truth, democracy, well-being and other forms of social, economic and epistemic justice’. While I am generally sympathetic to this idea, I wonder how, and if, it can really be done. It was implicit in my argument in ‘Stop evaluating science’ that the outcomes of science – its products, for lack of a better word – should be evaluated with attention to their contributions to society in the widest sense possible (and certainly not restricted to economic growth), but I nonetheless see great risk in the ambition to open up science to even more scrutiny, even with the sincerest of intentions.
David Peterson and Aaron Panofsky offer a rather convincing argument for this when they note that ‘too often reformers lack practical knowledge about the domain in which they tinker’.
Economization (again)
‘Stop evaluating science’ started off with the assumption that the current pressure on the institution of science to demonstrate its money’s worth is due to a dual erroneous belief: that its key (or even only) purpose is to drive economic growth, and that it is fulfilling this purpose unsatisfactorily. It continued by arguing that neither logic nor evidence lies behind these assumptions and the evaluation frenzy they have created, and presented as its main analytical contribution a sociologically oriented historical review of the developments that have led to this situation. As part of this, the article also argued that the ‘evolutionary, cumulative, serendipitous, recombinant, and interactive processes by which scientific research contribute to technological and social innovation’ (Hallonsten, 2021a), often stretching far out in time and leading to unpredictable results that show up in unpredictable places, have been convincingly demonstrated in key works in the history and sociology of science, and that anyone taking the time to ponder these stories and the lessons they hold will note the relentless inability of the quantitative and superficial performance metrics currently in use to capture these impacts and thus do justice to the scientific activities behind them.
But performance evaluation is ubiquitous nonetheless, and the reasons for this must therefore be sought elsewhere. Quite evidently, the true understanding of the nature of scientific inquiry and how it is productive and contributory to technical and social innovation has fallen short compared to other dominant narratives, such as the idea that the purpose of any societally mandated or supported activity is to drive economic growth. This is what I have chosen to call
Broadening the analytical frame somewhat, there is also much to suggest that the current ubiquity of evaluation in science, just as in society generally, is part of the same overall development of the 20th century (and beyond): intensified bureaucratic and political efforts to control and correct things and do away with pluralism, uncertainty, and spontaneous order, in favor of rationality, predictability, and control. Such attempts are part of the universal solution of bureaucratic management that supposedly ‘protects us against chaos and inefficiency’ and guarantees that ‘organizations, people, and machines do what they claim to do’ (Parker, 2002: 2). Although such management ideology of course has roots in Taylorism and is thus inseparably tied to capitalism (though not necessarily to anything like market fundamentalism), it has become increasingly difficult to separate from public sector bureaucracy and its various attempts to control life. The most recent model is ‘New Public Management’ (e.g. Hood, 1991), a set of bureaucratic governance tools that notably includes both quasi-marketization and quantification. Statism and expansionist welfare state policies are to blame for this just as much as market fundamentalism or ‘neoliberalism’. Great sociology is available for those interested in the deeper meaning, causes, and consequences of these developments. The modernization project is both continuous and basically apolitical, because the system’s invasion of the lifeworld (Habermas, 1984) is based on the inherent expansionist character of the instrumentally rational (‘zweckrational’) at the expense of other values (Weber, 1957: 115ff.). Peterson and Panofsky describe something similar, in different terms but connected specifically to science vis-à-vis society’s other institutions, in their erudite discussion of efficiency: ‘Our inability to chart basic scientific progress undermines the ability to measure efficiency. The notion of efficiency only makes sense in the context of established means/ends relationships. The goal is to organize the means in the optimal way to achieve the desired end. The problem is that, in the area of basic science, the end is unknown’ (Peterson and Panofsky). Analyses of ‘reflexive modernization’ as a continuation of modernity demonstrate that the recognition and documentation of risks of all sorts, and the evaluation of the abilities to mitigate them, become a major task of society’s institutions (Beck, 1994), which further promotes economic thinking, together with bureaucratization, and replaces accountability as responsibility for conduct with accountability as the capacity to be counted.
Here as well, a call for introspection: What did we, as social scientists analyzing the role of science in society, do to hinder or bolster this development? It is quite clear that the structural transformation of (Western) economies from the 1970s onward coincided with (and reciprocally reinforced) the renewal and expansion of the explanatory ambitions and reach of the economic sciences to include knowledge and technological development as factors in economic performance (e.g. Freeman and Soete, 1997; Landau and Rosenberg, 1986). The most evident feature of this development was likely the coining of the term ‘knowledge economy’ to denote an economy where knowledge has replaced raw materials and physical labor as the most crucial production factors. As social scientists, we are partly to blame for the proliferation of this policy
Essentially, the idea of the ‘knowledge economy’ invites further invasion of the institution of science by the systems or spheres of capitalist economy, politics, and bureaucracy, and their logics. The reason is of course that if knowledge is identified as the most crucial resource in today’s economy, then obviously the institutions, organizations, and people that produce, maintain, disseminate, and develop knowledge must be governed and evaluated like any other unit of production. There is little to suggest that the intentions of the politicians and bureaucrats enacting university reforms and imposing quantitative performance evaluation schemes on (academic) science have been anything but sincere: the logic by which they organize their domains of production, policymaking, management, and administration is one of instrumental rationality. This is, by all accounts, apposite in the market economy and the bureaucratic state, and so the attempts of politicians and bureaucrats to reform universities to make them more efficient naturally follow the same logic, in the name of efficiency and proper goal attainment. Whether or not their good intentions make the prospects of breaking or reversing this development better or worse is another matter for further study and discussion. But broken or reversed it must be, because no matter how we interpret the details and where exactly we place the blame, I maintain that current performance evaluation in science is indeed pointless and mostly counterproductive. The many interesting and highly contributory responses to ‘Stop evaluating science’ have added much insight and perspective, yet they have not convinced me that this is in any way an exaggeration. Thus it seems to hold, as a point of departure for discussion, far beyond what has been accomplished here.
