Abstract
Introduction
Background and problem statement
A major buzzword in current International Development practice and academia is ‘evidence-basedness’ (White and Raitzer, 2017). Following discussions of aid effectiveness of the 1990s and 2000s, a shared recognition has emerged among scientists and professionals that learning and accountability should be central concerns (Doucouliagos and Paldam, 2008; Easterly, 2007). Banerjee and Duflo (2011) famously pioneered these concerns in their experimental poverty research. One straightforward way in which actors and institutions in the International Development sector attempt to be (more) evidence-based is through evaluation of policies and programmes. However, the rise of ‘evidence-based’ Development Cooperation policy has caused evaluations to overemphasise accountability at the cost of learning (Kogen, 2018). This tension, between learning (i.e. reflecting on past programmes in hopes of improving these) and accountability (showing the ways in which taxpayer money is spent), is commonly referred to as the ‘dual purpose’ of evaluation. What is more, it is found that the goal of accountability often overshadows learning purposes of evaluations (Bjørkdahl et al., 2017).
One reason for this is that quantitative studies, such as randomised controlled trials (RCTs), can demonstrate direct impacts of programmes, while the benefits of qualitative research focused on policy learning are much less easily measurable and interpretable; these unfold over time and emerge from complex factors and stakeholders interacting in the process of programme implementation (Slade et al., 2020). As such, quantitative evaluations tend to focus on accountability between donors, implementing organisations and beneficiaries, overlooking the learning that emerges during implementation. As one author observes: ‘Compared to the large volume of publications on “good practices” and “best practices”, far less scholarly attention has been paid to “bad practices” or “worst practices” despite their widespread prevalence. As a result, public officials have failed to learn valuable lessons from these experiences’ (p. 4).
As Dunlop states, analysing cases where learning did not happen (or policy failures) is important, not least because failures may prove a breeding ground for learning, according to May (1992):

Cases involving policy failure are useful to consider since failure serves as a trigger for considering policy redesign and as a potential occasion for policy learning. One of the basic tenets of the organisational learning literature is that dissatisfaction with program performance serves as a stimulus for a search for alternative ways of doing business . . . Policy successes might be said to provide a stronger basis for learning by making it possible to trace conditions for success. However, dissatisfaction serves as a stronger stimulus for a search for new ideas than success. (p. 341)
In short, policy learning scholarship is dominated by survey-based research and its focus on policy success skews our perception of policy learning.
A recent study by Pattyn and Bouterse (2020) stresses the importance of focusing on interactions between policymakers and evaluations in learning processes. They find that engaging policymakers in the evaluation design increases evaluation use (Pattyn and Bouterse, 2020). Finally, Barbrook-Johnson et al. (2020) show that the views of evaluators influence evaluation practice. For instance, the variety of backgrounds that evaluators come from leads to different conceptions of what constitutes an evaluation in the first place (Barbrook-Johnson et al., 2020). Hence, this study asks the question, ‘How do evaluators and policymakers interact and what, if any, adjustments follow from the illustrative evaluation?’
This study focuses on learning (rather than accountability), using a mix of qualitative methods. It is focused on the position of evaluators and their interaction with policymakers. Finally, it analyses the adjustments made by policymakers and their managers, by following an illustrative evaluation as-it-happened. Because the study’s data collection took place as the evaluation process unfolded, the subsequent policy changes were not yet known. In this way, the study avoided the tendency of focusing on usual suspects and stories of successful policy change. In short, this article aims to address the following knowledge gaps:
○ Addressing the lack of processual qualitative studies in policy learning scholarship by researching the interactions between evaluators and policymakers, and
○ Refocusing attention from accountability to institutional learning by analysing the follow-up of an unfolding evaluation process.
Theoretical framework
In order to situate this study within current policy evaluation scholarship, this section first discusses institutional learning. Second, it provides an overview of existing types of evaluation use, the lens used to analyse learning. Third and finally, it sheds light on the positions of policymakers and evaluators.
Institutional learning
An important source for understanding policy change and learning is Hall’s 1993 article ‘Policy Paradigms, Social Learning and the State’. Hall distinguishes between three potential ways in which states change policies. A first-order change refers to changing the levels or settings of existing instruments (e.g. an increase in tax rates); a second-order change involves changing the instruments themselves; a third-order change alters the overarching goals of policy and amounts to a shift in the policy paradigm.
Evaluation use
Government-commissioned evaluations are expected to not only serve accountability, but also stimulate institutional learning. As such, practitioners are ‘utilization-focused’, implying that evaluations are constructed with a specific user in mind and valued according to their usefulness (Patton, 2011: 315).
Table 1. Types of evaluation use and learning found in policy evaluation scholarship.
Instrumental, conceptual and empowerment use are especially relevant, for these are the uses through which learning takes place (Bouterse, 2016). In order to understand the variety of ways in which evaluations may be used, it is important to take a closer look at their users (policymakers) and creators (evaluators).
Policymakers and evaluators
It is advisable to analyse policymakers and evaluators at the individual level, since they are best positioned to describe their own changes in learning. In a recent study, Schmidt-Abbey et al. (2020: 205) call for an increased need to focus on evaluators themselves, given their ‘embeddedness within an evaluand’. Grob (1992) studied policymakers and evaluators, which according to him sometimes appear to be worlds apart. He characterises evaluators as critical and concerned, and eager to make a difference, yet often ending up frustrated when their findings are ignored or misused. Policymakers, on the contrary, complain that evaluations are too long, published too late or at times irrelevant (Grob, 1992).
Policymakers and evaluators therefore have separate spheres of influence (see Figure 1). Nonetheless, Pattyn and Bouterse (2020) show that their interaction may result in improved uptake of evaluation lessons. What is more, increased cooperation (e.g. developing a research question together, holding regular feedback interviews) between policymakers and evaluators may benefit learning through a process called developmental evaluation, or adaptive evaluation (Patton, 2011: 305). Hence, it is worthwhile to study the interaction between evaluators and policymakers, visualised in Figure 1.

Figure 1. Conceptual scheme detailing the chronological process of an evaluation trajectory and policymakers’ and evaluators’ spheres of influence.
Conceptual scheme: Key concepts and operational definitions
The conceptual scheme in Figure 1 guides the analysis of this study by highlighting its key concepts and relationships, showing an evaluation process. It will be used to structure the presentation of the results. Given the variety of contextual factors at play, it is impossible to establish a causal relationship, hence the exploratory nature of this study. Nonetheless, a number of key concepts will be disentangled and their relationships analysed. The main concepts of this study are evaluation, evaluandum (the object of evaluation) and adjustment. On the one hand, the study aims to analyse the position of evaluators and their interactions with policymakers; this part of the study sits within the evaluators’ sphere of influence. On the other hand, it analyses the interactive learning process of policymakers and evaluators by tracing the managerial adjustments following the illustrative evaluation.
Methodology
Research setting
Empirical data collection took place within the Dutch Ministry of Foreign Affairs’ Evaluation Department. This is a relevant research setting for three reasons. First, carrying out research here ensured access to rich qualitative data (e.g. Terms of Reference and interviews), which improved the robustness of the study. Second, the Evaluation Department is one of the first government evaluation units for development aid (founded in the 1970s), resulting in a long tradition of evaluation expertise and a high level of ‘maturity’ (Pattyn and Bouterse, 2020). As such, the Netherlands has a strong evaluation culture (Dahler-Larsen and Boodhoo, 2019). Third and finally, the setting provides the researcher with the opportunity of studying evaluation and policymaking ‘as it occurs’, increasing the ecological validity of the study. As the day-to-day business of policymaking is included in the analysis, the study paints a rich description of learning processes. This study’s units of analysis include evaluations, evaluators and policymakers. The units of observation are employees of the Evaluation Department, policymakers of the Ministry of Foreign Affairs and evaluation reports.
Data collection
To answer the research question, the following data sources were used: three evaluation reports (ranging from development cooperation to foreign trade and international relations–themed studies), semi-structured interviews with evaluators and policymakers, participant observations and meeting minutes (see Supplementary Table S1 for a full overview).
The illustrative evaluation process, used to study learning specifically, concerns the publication of, and response to, the report ‘Less Pretension, More Realism’ (Directie Internationaal Onderzoek en Beleidsevaluatie, 2019a). It is referred to as the ‘illustrative evaluation’ for the remainder of the article. Using a snowball sampling technique, interviews were held with evaluators and policymakers, including the author of the policy response and the director of the respective policy department (Directie Internationaal Onderzoek en Beleidsevaluatie, 2019b).
All interview transcripts, meeting minutes and documents were uploaded to Atlas.ti, coded using two cycles (starting with hypothesis coding, ending with evaluation coding) and subsequently thematically analysed. A detailed overview of the collected data can be found in Supplementary Table S1.
Limitations and data quality
This section briefly lists potential limitations and assesses the study’s data quality. The study cannot infer causality, as there is no way of establishing a counterfactual, that is, what would have happened in a given situation if there had not been an evaluation. Moreover, it must be emphasised that the adjustments that follow evaluations are not per se attributable to the evaluation itself; other factors may equally have driven them.
To decrease selection bias among interviewees, all employees of the evaluation department were interviewed and asked the same questions, increasing the replicability of the study in different thematic fields or other locations (Bryman, 2012; LeCompte and Goetz, 1982). At the same time, focusing on one evaluation department limits external validity (Bryman, 2012; LeCompte and Goetz, 1982). In this study, data collection took place in a mature evaluation setting: with decades of experience, this department has built a strong reputation and extensive knowledge of past, current and future programmes. Hence, the study’s findings and recommendations cannot be generalised to just any evaluation setting, but may prove relevant for other mature evaluation contexts.
Results
This section presents the main results of the analysis along two spheres of influence of the conceptual scheme. The model also portrays the illustrative evaluation process. First, it presents the position of evaluators and, second, it illustrates policymakers’ adjustments in response to the illustrative evaluation. A full overview of the variety of data collected (interviews, participant observations and documents) for this study can be found in Supplementary Table S1.
Speaking truth? Evaluators play different roles and are uniquely positioned
In the semi-structured interviews with policymakers and evaluators, respondents were asked about their perceptions of evaluators. A number of themes recurred in the interviews surrounding questions about their perceived impact as well as their position within the Ministry.
A number of assumptions and views surrounding what evaluators ought to do, or not do, became apparent. For instance, several respondents indicated that the evaluation department is too academic, as it desires to be ‘the expert’. As one policymaker put it, ‘The evaluation department has the tendency to want to come up with new methods, and first becoming experts in a domain rather than using existing material and moving ahead’ (policymaker, interviewee 30, 2019). Interestingly, respondents held contrasting views about how critical evaluators should be. Several respondents indicated that evaluators need to be more critical, as the evaluation department is precisely the department that can afford to do so, because its reputation and budget are strong. As such, it should not shy away from writing critical reports. It differs from consultancy and nongovernmental organisation (NGO)-based research in that it suffers less from positive bias, which arises when evaluators over-report positive findings (or even exclude negative ones) in order to uphold a good relationship with the organisation funding the evaluation. Other respondents, on the contrary, urged evaluators to strike a more diplomatic tone: ‘Evaluators need to avoid “attacking” policymakers by writing more diplomatically. Though there is a risk of writing too diplomatically; this requires pedagogic skills’ (evaluator, interviewee 27, 2019).
Furthermore, several notions of the relationship between policymakers and evaluators surfaced from the interviews. A recurring concern among policymakers and evaluators alike was the apparent divide in understanding of each other’s context:

It is important for evaluators to understand the limits (in terms of workload, political sensitivity) of policymakers, and what their spheres of influence are. For instance, a recommendation to increase capacity is applauded by employees, but at the same time, they cannot decide to hire people themselves. (Policymaker, interviewee 30, 2019)
Besides this perceived lack of understanding, there is certainly a sense of appreciation for each other’s work: Policymakers speak highly of evaluators and acknowledge their independent position:

I also tell them (fellow policymakers, ed.) to, when in doubt, ask IOB (the evaluation department) for advice; they can be seen as neutral experts, and their advice only sharpens conclusions we as policymakers draw about an evaluation. The reputation of IOB is high, both in the Netherlands and abroad. (Policymaker, interviewee 30, 2019)
Finally, the expert status is recognised by policymakers, who indicate there is a recent desire to improve monitoring and evaluation (M&E) capacity in several departments: ‘At the same time, I think now, there’s more desire for having ex-evaluators in policy departments, because evaluators have time, unlike policymakers, to get really deeply informed with a topic, which means they become almost experts’ (policymaker, interviewee 30, 2019).
In summary, respondents hold a variety of views regarding the position of evaluators. On the basis of the interview data presented above, it was found that various, and at times contrasting, functions were attributed to the evaluation department. To this end, a typology was created of roles, characteristics, outcomes and a discussion of their advantages and disadvantages. This typology is presented in Table 2.
Table 2. A typology of the evaluation department’s roles, based on interviews with evaluators and policymakers.
These various roles lie within the sphere of influence of the evaluator and may therefore serve as a deliberation tool. If evaluators are conscious of their respective roles, within the team and the institution, they become more aware of their acquired understandings and of the partiality, and potential complementarity, of those understandings. The specific implications of the typology are discussed in the ‘Implications and recommendations for M&E practitioners’ section.
Finally, the interview data comprised many views of the interactions between policymakers and evaluators. This policymaker–evaluator nexus, where varying types of evaluation use surfaced, and hence learning may take place, will be discussed in the next section.
Speaking truth to power? Policymakers and managers adjust in various ways
This section presents the results of interviews conducted with policymakers and evaluators, as well as a document analysis (i.e. the evaluation report and policy response letter), all pertaining to one illustrative evaluation trajectory. Three different types of evaluation use (symbolic, instrumental and empowerment) were found and will be discussed below.
Symbolic
Evaluators found that the achievement and sustainability of results had been impaired by high levels of fragmentation: funding was spread too thinly across various small and geographically distant activities. The policy response letter of the Cabinet (signed by the Minister of Development Cooperation and Trade) recognised this recommendation. The Ministry asserted that it has started limiting the number of activities, as more focus will increase the quality of Dutch efforts in development cooperation (Directie Internationaal Onderzoek en Beleidsevaluatie, 2019b).
During the interviews, several policymakers indicated that this lesson is not new: fragmentation had been a recurring issue in development cooperation spending. However, two policymakers did point out that the document enables policymakers to ‘make their case’ for reducing fragmentation better, vis-à-vis their managers, but also towards implementing organisations like NGOs. As such, the evaluation is used as substantiation in the ongoing fragmentation discussion within the Ministry.
Instrumental
In response to the recommendations, a number of tangible actions have been taken, first among them the establishment of an internal working group for defragmentation efforts and for exploring alternatives to tendering, which includes important so-called ‘change agents’ within the Ministry. The goal of this group is to investigate the existing bottlenecks in defragmentation efforts and to find the best way to reduce the number of activities of departments by about 30 per cent. It was one of the first instances in which a dedicated working group was established after an evaluation, thus setting the stage for a ‘learning team’ in which collective learning could come to full fruition.
Empowerment
Evaluators find an overemphasis on accountability vis-à-vis learning in current M&E efforts. The use of standardised indicators is justified, but their dominance damages the use of M&E for learning purposes. Policymakers face pressure from Parliament to report results. As a consequence, results frameworks developed in advance hardly suit the changing and fragile contexts in which programmes take place. In this way, neither NGOs nor the Ministry are incentivised to reflect and learn, or to report negative results, either fearing the loss of funding or facing parliamentary criticism (Directie Internationaal Onderzoek en Beleidsevaluatie, 2019a).
The Cabinet acknowledges that monitoring and evaluation should be given more attention across the board. Hence, it promises to increase the capacity for M&E staff as well as training current employees, both within the Ministry and at embassies (Directie Internationaal Onderzoek en Beleidsevaluatie (2019b)).
In interviews, policymakers recognise a tentative rising interest in M&E across the Ministry. There appears to be more room to do something around ‘lessons learnt’ and M&E. One policymaker thought that, on the one hand, external pressures, like politicians asking for transparency about results, drive this development. On the other hand, she observed an internal drive to organise M&E better, although this differs per subject and level: ‘At the activity level, there is a lot of opportunity for change and amendment. It gets trickier at higher levels, where political wishes may run counter to lessons we learn about effectiveness’ (policymaker, interviewee 31, 2019).
In summary, the illustrative evaluation trajectory showed a variety of adjustments and interactions between policymakers and evaluators. Symbolic (the evaluation is used as substantiation in internal discussions about fragmentation), instrumental (the goal of a 30% activity reduction and the establishment of a working group) and empowerment (the call to increase staff capacity in the Ministry) uses of evaluations were found. These findings are presented in Table 3, adapted from Bouterse (2016), which recaps evaluation uses and presents an illustration from this evaluation trajectory.
Table 3. Types of evaluation use and illustrations of learning found in the case study data.
Implications and recommendations for M&E practitioners
This penultimate section takes the study’s key findings and, based on their implications, formulates a number of recommendations to M&E practitioners. A snapshot of these findings, implications and recommendations can be found in Table 4.
Table 4. Key findings, implications and resultant recommendations.
As reported in the ‘Results’ section, two key findings were distilled from the study’s data.
First, evaluators play different roles and are uniquely positioned. The typology of roles, presented in Table 2, gives an idea of these roles, their typical characteristics and corresponding products. This is not the first study to challenge the idea of evaluators as singularly oriented to research methods and models. Skolits et al. (2009) find that evaluators take on a wide variety of demands and recommend a more ‘situational’ perspective on the role of the evaluator. They find that the expected evaluation activities, their particular demands and the required products (e.g. types of deliverables) warrant careful consideration of roles when recruiting evaluation team members (Skolits et al., 2009). Therefore, this study recommends deliberation of required roles at the very outset of an evaluation trajectory. However, role deliberation is by no means definitive. Evaluators may, where possible, take on multiple roles throughout an evaluation trajectory. Verwoerd et al. (2020) find that combining the roles of evaluator and facilitator, for instance, resulted in an evaluation that better matched the project under scrutiny. This flexibility in roles can provide an evaluation with emergent qualities, where adjustments can be made in response to the needs of policymakers, (external) researchers or changing political realities (Verwoerd et al., 2020). Hence the benefit of an evaluation trajectory with emergent qualities, one that allows evaluators to change roles when necessary. Furthermore, the study found that evaluators are deemed independent and to have the time to get deeply involved in a project. Grob (2012) shows that while decisions in policymaking are never made by one person or organisational entity, evaluators have a unique position because of their independent and helpful reputation.
What is more, the nature of their work allows evaluators to build their knowledge, since they have the time to get deeply acquainted with programmes under scrutiny, as well as state-of-the-art research of ‘what works’ (Tourmen et al., 2021). Their unique, independent position, as well as the time they have to build a strong basis of knowledge, implies their added value lies with acting as knowledge brokers while recognising the partiality of their own knowledge and need for knowledge exchange with others.
Second, three types of managerial adjustments were found when analysing the illustrative evaluation trajectory: symbolic, instrumental and empowerment. A more detailed overview of these adjustments was presented in Table 3. Managers may use evaluations in a symbolic way, for instance, to substantiate an already ongoing discussion. This entails a risk of evaluators being pressured to report previously held beliefs (Pleger and Sager, 2018). However, Pleger and Sager find that these influences are not necessarily negative, but may also be positive. They offer three differentiating questions to evaluators to discern the type of influence at hand, the first being whether the attempt to influence is made consciously or unknowingly (Pleger and Sager, 2018).
Discussion
This section highlights key contributions of the study and subsequently outlines potential avenues for future research.
The first key result, that evaluators play different roles, was summarised in Table 2.
Second, three types of managerial adjustments were found in response to the illustrative evaluation trajectory. The Ministry has, in response to this evaluation, taken concrete actions, summarised in Table 3. Examples include the postponement of a parliamentary debate (the Ministry wanted to await the evaluation’s findings and lessons before publishing the new subsidy framework), the target to reduce activities by about 30 per cent, and the goal of increasing M&E capacity and cutting the number of activities per policymaker. Simultaneously, these illustrations highlight three types of evaluation use (see Table 1), corresponding with Bouterse’s (2016) overview of evaluation uses: symbolic, instrumental and empowerment use of evaluations. Interestingly, the illustrative evaluation process shows resemblances with Hall’s (1993) fundamental framework for policy change. For instance, the goal of reducing fragmentation, to which policy departments have responded by initiating 30 per cent cuts in activities, portrays a first-order change, a mere decrease in the level or ‘setting’ of an instrument. Furthermore, the suggestion to move away from tendering as the method of contracting implementing organisations portrays a second-order change, the changing of instruments. Finally, the typical recommendation for evidence-based programmes hints at a paradigmatic change. This also shows that the lively debate surrounding ‘what works’, in which evaluators have a role to play as knowledge brokers, is alive and well. As Hall (1993) points out, in the realm of first- and second-order change, there is room for expert judgement. The paradigm, however, provides the context in which potential adjustments are made. Paradigms are not directly amenable to change because they refer to reigning worldviews and are the result of political contestations, determining, for instance, who is deemed an expert.
Although evaluators can hardly influence the dominant paradigm, a government can look at evaluation departments for inspirations and input about alternative, perhaps better, paradigms than the status quo. In this combination, of looking back and reflecting, but also offering alternative ways of thinking and acting, lies the worth of an evaluation department.
In conclusion, we believe this study’s empirically based typology of evaluator roles constitutes a novel contribution to policy learning scholarship. These roles call for careful composition of evaluation teams and the incorporation of emergent qualities in evaluation trajectories. The role of knowledge broker is promising, since evaluators’ time and reputable status give them credibility and extensive insight into programmes. Managers adjust to evaluations in various ways. Yet evaluators are equipped to respond to potential pressures by knowing how to discern positive from negative influences, as well as by engaging proactively with stakeholders. Finally, the study addresses an ongoing methodological gap in the evaluation literature identified by Moyson et al. (2017). Using a mix of qualitative methods, and analysing an evaluation as-it-happened, the study presents unprecedented insights into evaluation processes within a Ministry.
Several suggestions for future research arise from this article. A replication study could be executed in another context, for instance, in the Ministry of Foreign Affairs of another country, which may have different organisational structures, or within another Dutch Ministry. It would be interesting to analyse whether follow-up and learning work through similar mechanisms in other policy areas. Future studies could incorporate elements of systems thinking and institutional analyses to discern bottlenecks and path dependencies in policy learning. Furthermore, in terms of methodology, future research could use time series methods, to analyse whether evaluations’ recommendations stick in the long-term, or comparative studies to analyse the follow-up of several evaluations, instead of one illustrative evaluation. Finally, future studies could dig deeper into the enabling circumstances for learning, in order to move closer to the ideal of ‘evidence-based’ policymaking. An example research question could be, ‘What factors incentivise, or constrain, policymakers to learn from evaluation?’ There’s a lot to learn.
Supplemental Material
sj-docx-1-evi-10.1177_13563890221109620 – Supplemental material for ‘Speaking truth to power: Exploring a Ministry’s evaluation department through evaluators’ and policymakers’ eyes’ by Lotte Levelt and Nicky Pouw in Evaluation.
