Abstract
Introduction: AI innovation in Finnish public administration
Attempts to automate public services and administrative decision-making through AI are ongoing internationally, nationally endorsed as an inevitable and economically desired technological revolution and embedded in a broader global wave of AI policies and strategies (Bareis and Katzenbach, 2021; cf. OECD.AI, 2025). The allure of AI to enhance the efficiency, availability and quality of public services and welfare benefit distribution has also become compelling within Finland's public administration. This appeal is driven by demographic challenges (e.g. an ageing population and labour shortages), fiscal pressures and organizational aspirations to be regarded as innovative and proactive in adopting and leveraging emerging technologies. However, Finnish public organizations are still in the early stages of innovating and integrating AI, constrained by robust legislation on data protection and decision-making automation (Koivisto et al., 2024). At the same time, the present government's programme urges public organizations to expand automated decision-making and create legislative conditions for the ‘full use of the opportunities provided by digitalisation and artificial intelligence’ (Finnish Government, 2023, 6; see also, e.g. 44, 125). Finnish public organizations have shown a notable enthusiasm for experimenting with rapidly proliferating AI (Kuntaliitto, 2024). While AI technologies frequently fail to fulfil their promises, their development often continues for a significant period before they are cancelled or overhauled (Ratner and Schrøder, 2024). This issue is particularly timely, as the launch of ChatGPT in late 2022 accelerated global investments in generative AI and the infrastructures enabling its development and use (Hao, 2025) while also propelling Finnish public organizations to experiment enthusiastically with such technologies.
This article examines this growing interest by tracing the innovation process of a generative AI decision-support tool within the Finnish public sector and how technical challenges are addressed to keep AI innovation going. More specifically, we ask the following research question: How is a generative AI tool's innovation sustained through the boundary-crossing and justificatory practices of the innovator team, despite recurring failures to deliver on its promises? In doing so, we respond to Suchman's (2023) call for more critical attention to the ‘thingness of AI’ by showing how the tool is made an irresistible imaginary object that keeps the innovation process in motion despite its unmet promises.
Our study followed the development of a large language model (LLM)-based generative AI tool designed to assist the organization's internal claims specialists in identifying up-to-date and relevant guidelines needed for them to make decisions on citizens’ claims. The tool was supposed to solve a widely recognized problem resulting from a vast array of complex, scattered, and ever-changing guidance documents that were considered obstacles to better, faster and more efficient decision-making. For its users, the tool functioned as a bot and appeared similar to popular generative AI software such as ChatGPT; its answers were obtained and refined with additional user queries and prompts. The tool's responses were accompanied by links to the identified relevant parts of guideline documents so that users could validate an answer or search for more information. Technically, the tool was based on a document database, a search engine and an LLM for response generation.
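The article describes the tool only at this architectural level. As an illustration, a retrieval-augmented pipeline of the kind described – a document database, a search engine and an LLM for response generation, with answers accompanied by links to the relevant guideline passages – can be sketched as follows. This is a minimal sketch under our own assumptions, not the organization's implementation; all function names, data and the stubbed LLM call are hypothetical.

```python
# Minimal sketch of a retrieval-augmented decision-support bot, assuming the
# three components named in the text: a document database, a search engine
# and an LLM for response generation. All names and data are hypothetical.

def tokenize(text):
    """Split text into a set of lowercase tokens (toy tokenizer)."""
    return set(text.lower().split())

def search(query, documents, top_k=2):
    """Toy search engine: rank guideline snippets by token overlap with the query."""
    scored = [(len(tokenize(query) & tokenize(doc["text"])), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def answer(query, documents, llm):
    """Retrieve relevant passages, generate a response, attach source links."""
    hits = search(query, documents)
    context = "\n".join(doc["text"] for doc in hits)
    response = llm(f"Answer using only these guidelines:\n{context}\n\nQuestion: {query}")
    # Links let users validate the answer against the guideline documents,
    # as described for the studied tool.
    return {"answer": response, "sources": [doc["link"] for doc in hits]}

# Example with a stub in place of a real LLM call:
guidelines = [
    {"text": "Housing allowance claims require proof of rent.", "link": "doc/housing#2"},
    {"text": "Sickness benefit requires a medical certificate.", "link": "doc/sickness#5"},
]
stub_llm = lambda prompt: "See the retrieved guideline passages."
result = answer("What is needed for a housing allowance claim?", guidelines, stub_llm)
```

In a production system, the token-overlap search would typically be replaced by a vector or hybrid search over the document database, but the overall shape – retrieve, generate, cite – matches the architecture described above.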
To examine the tool's development process, we conducted nearly 1.5 years of ethnographic fieldwork, observing the organization's innovation team through four fast-paced phases characterized by intensive collaboration and experimentation. Our research focused on half of this team – 15 in-house experts whose work centred on concept design, technological exploration and experimentation – that we refer to as ‘innovators’. They occupied roles such as IT architects, software developers, innovation managers, service designers and business analysts. Their expertise spanned technological development, data science, organizational processes, project management, product ownership, strategy work, social sciences, foresight, design thinking and emerging technologies. Due to the innovation team's multiprofessional composition, their daily work involved boundary-spanning tasks both within the team and in collaboration with other organizational units such as frontline work. Their innovation activities were also supported by design and innovation consultants from a private company.
This study contributes to research on generative AI innovation in the digital welfare state in three ways. First, compared with more deterministic AI systems, we show how generative AI tools can be particularly difficult to challenge as nonfunctional, helping to explain the accumulation of failing AI innovations in the public sector despite their portrayal as successful projects. Second, we show how the tool's development unfolded as a highly politicized, cross-boundary process shaped by diverse justifications and organizational dynamics and how innovators responded to setbacks through discursive practices, new experiments and a shift towards organizational and user-related explanations of failure. Third, we identify key actors, boundaries and categories of boundary work that sustained the tool's innovation. Together, these findings explain how fictional technology promises rendered the tool an irresistible imaginary object and kept the innovation process in motion in the face of significant challenges.
The article proceeds as follows. First, we introduce the theoretical framework of boundary work and regimes of justification. Then, we describe the research methodology and the AI tool's development phases. This is followed by the presentation of findings, which trace the enactment of nine justification frames grouped into three categories according to their justificatory basis. Finally, we discuss the study's main contributions, note its limitations and suggest avenues for further research.
Theoretical framework
Boundary work and boundaries
Boundary work refers to the activities, both intentional and unintentional, through which individuals and groups create, maintain or modify boundaries that distinguish professions, occupations and groups (Langley et al., 2019). These boundaries are not static; they shift as professionals seek to protect their resources and interests, assert authority or adapt to disruptions (Abbott, 1988; Gieryn, 1983). Beyond competition, boundary work also enables collaboration by creating shared spaces that allow actors from different social worlds to engage in joint activities (Bowker and Star, 1999; Choroszewicz, 2025; Choroszewicz and Alastalo, 2021). The literature highlights various forms of boundary work, with existing boundaries remade or new ones established to create and preserve distinctions between groups and safeguard current resources or obtain new ones (Zietsma and Lawrence, 2010). Following Callon (1986), we understand boundary work as encompassing translation moves that define problems, roles and ‘obligatory passage points’, aligning heterogeneous actors and interests around a specific sociotechnical configuration – in our case, the generative AI tool as a route for addressing perceived organizational challenges (see Van Lente and Rip, 1998).
Our analysis draws on Langley et al.'s (2019) conceptualization of boundary work to capture the variety of activities that sustain the AI tool's development. Unlike earlier notions that emphasize competition (Abbott, 1988; Gieryn, 1983), Langley et al. distinguish three co-existing forms: competitive, collaborative and configurational.
Langley and colleagues' competitive boundary work resonates with the original studies on boundary work (Abbott, 1988; Gieryn, 1983), which focus on people's acts of defending, maintaining and producing boundaries to exert privilege, power and legitimacy. This form of boundary work includes mechanisms through which established or privileged groups – such as innovators – draw on differences and distinctions to guard or enlarge their sphere of influence, and through which individual innovators make explicit to other organizational actors the collective and individual boundaries of their competence.
Collaborative boundary work enables collaborative goals and efforts despite competing interests and perspectives. It focuses on downplaying demarcations, conflicts and differences in favour of building connections and aligning interests. It involves negotiation, collaboration, coordination, adjustment and compromise between actors to perform the required work. As Levina and Vaast (2005) have shown, collaborative boundary work can be facilitated by individuals known as boundary spanners who possess a unique or desired expertise or position that enables the initiation and creation of joint interests by transitioning between and navigating different social worlds. Boundary spanners might be especially important in transitioning technologies from conceptual design to active use.
In configurational boundary work, powerful actors (e.g. organizational managers, lead innovators) usually work through boundaries to reshape the existing landscape. Their goal is to influence others’ behaviour so that new, temporary activities serve collective purposes. This may be particularly useful for powerful organizational actors to change the activities of others by temporarily separating or integrating people, ideas or objects into new arrangements, despite their continuing to belong to their original boundaries.
In our analysis, we trace the boundary work initiated by the Finnish innovation team as they ‘innovate under conditions of uncertainty’ (Beckert, 2016, 185). We examine how their efforts to justify the tool's development involved crossing and reshaping boundaries through experiments, strategic collaborations and translation that aligned expectations and enrolled other organizational actors.
Regimes of justification
Luc Boltanski and Laurent Thévenot's ‘sociology of critical capacity’ (1999) highlights discontinuities, contestations and negotiations as key constituents of social life, focusing on moments of dispute and justification to uncover underlying social assumptions (Atkinson, 2019, 312). In this approach, people's critical capacities are activated in moments of actual or expected dispute, which often arise from differing conceptions of ‘worth’ or from disagreement over whether a certain worth is at risk. In opposition to structuralist tendencies to conflate action with structural forces, Boltanski and Thévenot (1999) foreground people's agentic capacities to oppose and negotiate such social forces. People do not passively submit to social circumstances but actively debate, contest and influence the directions that social processes take (cf. Velkova and Kaun, 2019). Notably, Boltanski and Thévenot show how people navigate such disputes and uncertainties by referring to structured and recurring ‘orders of worth’. These orders underlie and motivate the discursive practices, critiques and justifications at play in disputes and represent different understandings or aspects of the moral and common good.
Boltanski and Thévenot's major work, On Justification (2006), systematizes these orders of worth – including the market, industrial, civic, domestic, inspired and fame orders – as distinct principles for evaluating what is valuable and for settling disputes.
We draw on Boltanski and Thévenot's justification framework to analyse how justifications are constructed in anticipation of disputes, which orders of worth are mobilized in discursive practices and how these articulations of ‘good’ evolve as both driving forces and protective layers in the tool's innovation process. We use the term justification frames to refer to these situated enactments of orders of worth in the innovation process.
Our analysis thus speaks to longstanding work in the sociology of innovation, especially the sociology of expectations, which has shown that expectations, promises and future-oriented narratives are constitutive of innovation processes rather than mere reflections of technical possibilities (Borup et al., 2006; Dandurand et al., 2020). Building on this literature, we start from the assumption that AI innovation processes are saturated with expectations about desirable technological futures. We treat regimes of justification as a key resource for understanding how promissory narratives gain traction and draw on boundary work to analyse how these expectations are enacted, negotiated and stabilized in situ through justificatory practices around a concrete generative AI project, thereby grounding promissory narratives in specific moral orders.
Researching generative AI innovation in public administration
Ethnographic fieldwork
The research material was collected through multi-site digital ethnography (Pink et al., 2016) in a Finnish public organization between May 2023 and September 2024, a period marked by rapidly growing interest in LLMs in the wake of ChatGPT's release in late 2022. The tool's innovation process began three months into our fieldwork, which centred on observing the organization's innovation team. Initially, we focused on their weekly meetings and co-creation sessions with external consultants. The team then prioritized the development of this generative AI tool – selected as the first among seven design concepts – because of its perceived potential to catalyse the remaining ideas. Development then progressed rapidly and intensively. It accelerated when the decision was made to develop the tool using an LLM-based solution. From that point onward, for over a year (August 2023–September 2024), our fieldwork followed the team's intensive innovation activities around the tool, which had quickly become one of its central innovation projects and was widely seen as highly promising.
We typically spent 3–7 h per week observing the team's activities surrounding the tool's innovation process. Except for a few face-to-face workshops and public events, all other sessions were conducted online, as the innovators who composed the team were geographically dispersed, and the team operated primarily through online meetings. This included formal and informal discussions, design workshops, demonstrations of technical developments, the planning of six rounds of user experiments (five of which were executed), and numerous internal and public presentations by the innovators and their collaborators. Many of the more concrete innovation activities were performed in small-group meetings to which we were almost always given access upon request and, over time, invited by default. Thus, our access to the tool's innovation process was nearly unrestricted. The only exceptions were a few key meetings with higher-ranking managers and internal decision-makers from which we were excluded due to concerns that our presence might influence the discussions.
Research data and analysis
Our material includes fieldnotes from 221 h of observation, documents produced by the innovators for internal and external use, selected media coverage of the organization's AI innovation efforts and 27 interviews with innovators, consultants and claims specialists who tested the tool.
Our investigation employed an abductive analytical approach characterized by iterative movements between empirical data and theory (Van Hulst and Visser, 2024). This process began during fieldwork, prompted by our puzzlement at the rapid pace of the AI tool's innovation, marked by sudden turns of events, a multitude of activities and rich discursive practices. We began systematically tracking the decisive cross-boundary activities and evolving discursive practices, pointing us to the concepts of ‘boundary work’ (Langley et al., 2019) and ‘justification worlds’ (Boltanski and Thévenot, 2006) to further explore and structure our observations. Following the conclusion of our fieldwork, we synthesized our initial observations and analyses to construct a comprehensive overview of the AI tool's development process (see Figure 1), identifying critical phases and events, key actors and their roles, the justifications invoked to initiate and sustain the development and the evolving boundaries being negotiated and reshaped. Our ethnographic approach, drawing on multiple data sources including observations, interviews and close readings of related documents, enabled us to capture the tense dynamic between the innovators’ assumptions, expectations and verbalizations on the one hand and their concrete actions and outcomes on the other as the AI tool's development unfolded. We adapted Boltanski and Thévenot's ‘dynamic realism’ – a ‘theoretical orientation that seeks to grasp action in its relation to uncertainty’ – into an ethnographic lens on experiments as ‘reality tests’ (2006, 17, 133–138) through which actors attempt to settle uncertainty. 
This approach allowed us to trace how the innovators’ assumptions and practices were repeatedly challenged by the tool's opacity and its failures to meet expectations of accuracy, precision and consistency, thereby highlighting the need for continuous revisiting of assumptions and a rearticulation of the arguments sustaining the innovation process. Methodologically, we align here with calls to study algorithms ethnographically by following the multiplicity of practices through which people engage with them (Seaver, 2017), while we also highlight their dynamic otherness as crucial to the uncertain process of human-machine interaction.
Figure 1. Phases of the AI tool's development and critical innovation activities.
Finally, we iteratively examined how each justification frame emerged along the innovation trajectory (see Table 1) to highlight their specific function to deflect attention from the tool's unmet promises. Together, these justifications created a driving engine and a protective structure around the tool's shortcomings in accuracy, precision and consistency (see Figure 2), sustaining the process by enabling further experiments aimed at demonstrating the tool's imagined value and keeping the innovation in motion.
Table 1. Descriptions of the identified justification frames.
The development phases of the tool
We identified four fast-paced phases marked by intensive experimentation and collaboration (Figure 1). Across all phases, the innovation team engaged in constant teaming, re-teaming and sub-teaming to plan, execute and evaluate experiments. Regular weekly meetings and long co-design sessions were complemented by ad hoc gatherings initiated by the lead innovators and often involving organizational and external stakeholders (e.g. consultants and managers of the frontline work).
Framing and safeguarding AI innovation across boundaries
Our findings trace how nine justification frames (see Table 1) were enacted by the innovators in collaboration with other organizational actors and external consultants to sustain the momentum of the AI tool's innovation despite the tool's unmet promises. We group these frames into three categories based on their justificatory basis: the tool and its fictional expectations, the innovation process and its conduct and the innovation ideology and mindset. These findings highlight how the identified justifications form a justificatory package, a protective structure for the tool. This structure is related not only to the tool itself but also to the broader context and process of its development (Figure 2).
Tool-oriented justification frames: From fictional promises to failure
Five justification frames were enacted by the lead and technical innovators, in collaboration with key consultants and organizational managers, to construct and reinforce the imagined value of the AI tool and a ‘prospective structure’ for its development (Beckert, 2016, 177–178; Van Lente and Rip, 1998, 206). These frames drew on the industrial, vitalist, fame, market and civic orders of worth, making the tool appear compelling to both internal and external stakeholders. Invoking these frames legitimized the tool's continued development despite disappointing results and persistent uncertainty about how to resolve its technical limitations, particularly in ensuring accurate, precise and consistent outputs.
The industrial and vitalist frames were the first to emerge as catalysts of the tool's development. The industrial frame conveyed the tool's promise to enhance claims specialists’ efficiency, productivity and consistency in decision-making processes. It was soon joined by the vitalist frame, forming a power couple that reinforced each other throughout the innovation process. The first frame emphasized efficiency benefits for the organization, while the second stressed care for the claims specialists’ well-being and job satisfaction, which was expected to reduce their proneness to decision-making errors. The specific argument in the vitalist frame was that claims specialists were cognitively burdened by complex, scattered and ever-changing guidance documents and that the tool would ease this strain by helping them find and use relevant guideline documents and information for decision-making on citizens’ claims: There are errors being made. (…) The high cognitive strain raises the possibility of errors. (…) There is “noise” in [the organization's] decisions, meaning that similar applications do not always receive similar decisions. (…) There's a cognitive overload on staff due to the volume, updating activity and fragmentation of instructions. [The organization] has received notices from [external officials] to reduce the cognitive load. (Innovators and consultants, team meetings and experiment negotiations with test user team, winter 2023–2024)
The continuity of both frames was ensured by intense collaboration between the innovators and external consultants to create and sustain a speculative imaginary of the tool (Beckert, 2016, 173–178) that could be achieved if prompted properly and skilfully and if supported by the right organizational processes. This ideal of a tool that was accurate, fast, precise and consistent in its outputs permeated the lead and technical innovators’ communications and discussions about the tool, leaving little room for critical questions or doubts raised by other innovators or organizational actors. The primary focus remained on continuous learning through rapid cycles of testing and user feedback, which was presented in ways to provide more evidence, motivation and justification to continue developing the tool. A prominent technique for maintaining this imaginary was the argument about the tool's accuracy, understood as the perceived correctness of its outputs, presented using clear and high percentages – 70%, 75% and 80% – and repeatedly emphasized by the lead innovators across meetings throughout the entire innovation process as evidence. These percentages were scarcely ever questioned; when they were, no further discussion ensued, as the following fieldnote excerpt illustrates: During a meeting on the last experiment's results, a leading innovator sought a clear answer from the leader of the experimenting user team about whether the precision of the tool was satisfactory for them to implement this tool in their work. The leading innovator claimed that the tool's rate of correct answers was at the level of 75%–80% before the experiment and argued that the experiment was successful in slightly increasing this percentage of correct answers. At the very end of the meeting, a manager from the collaborating welfare domain returned to the innovator's estimated rate of the tool's correct answers with a follow-up question: “Was that 75% really, entirely correct? 
Was it verified?” The leading innovator replied that “75% is my recollection. But I can check”. (Discussion of the fifth experiment's results, summer 2024).
The innovators generally attributed the tool's failure to deliver on its promises of accuracy, precision and consistency to organizational processes and often referred to users’ unrealistic expectations of AI technologies. The required fixes to those processes and user capabilities were bounced around in meetings between lead and technical innovators and key consultants. The exchange of ideas was especially dynamic when they reflected on the user feedback from the experiments and its implications for the continuity of the tool's innovation, including introducing further experiments: The challenge lies in users’ inability to effectively interact with it [the tool]; they are tied too much to the old way of working. How could [users] learn away from using it as they would use Google? (…) And it takes time to learn away from that! 75% [the estimate of the tool's correct outputs] sounds good. But they’re [the test users and their superiors] saying it needs to be 100% correct. We will never get that. (Innovators and consultants, internal meetings on reflecting lessons from experiments, autumn 2023–summer 2024).
The remaining three frames – fame, market and civic – emerged during phases 2 and 3 in response to mounting evidence from the experiments showing the tool's limitations: specifically, the inaccuracy, inconsistency and errors in its outputs. The fame frame was built on the often-expressed popularity and desirability of the tool within the organization and highlighted the widening recognition of the innovators’ work, portraying them and the organization as forerunners in employing generative AI technologies for societal good. The frame strengthened after the fourth experiment, which produced mixed results and highlighted the persistent boundary between innovation and frontline work. Because users had produced a thick volume of critical insights into the problematic aspects of the tool's functioning, it was impossible to declare the experiment a success and continue the innovation process at the same intensity and scale.
However, while the lead and technical innovators, together with key consultants, were engaged in collaborative efforts to interpret user feedback, narratives circulating across the organization increasingly portrayed the tool as delivering on its promises. Consequently, the lead innovators were contacted about the tool's availability for testing in other organizational units. Soon, the innovators reported in internal meetings that the tool had gained widespread demand within the organization – it was now ‘desired by everyone’. The innovators and consultants perceived these inquiries as proof of the tool's potential value, which justified the innovators’ further efforts to prepare it for production and scaling: Everyone is really excited about this [tool]! The managers have a strong desire to get this [tool] – do we give them a beta or…? The attraction around it [the tool] is huge; [a team of users] wants it for the summer for their substitute workers. Customer Service have asked if they could get their own. Additionally, several other possible use cases have been identified. (Team meetings, winter–spring 2024)
Simultaneously, two other frames – market and civic – were enacted during phases 2 and 3 to redirect attention towards what the organization and its users could do to improve the tool's outputs in terms of accuracy, precision and consistency. The market frame highlighted the tool's potential cost savings for the organization. The technical and lead innovators, together with consultants, jointly sought information to estimate the tool's potential cost-effectiveness and prepare concrete business case calculations for the collaborating organizational managers. Yet, proving the tool's market worth turned out to be impossible; in fact, the experimental results produced evidence against it by indicating that realizing the tool's potential value would probably require excessive resources from frontline workers and other organizational actors. However, the leading innovator continued to insist on the market frame in internal discussions with other innovators: The business impact of this is, like, really big, and I don’t think that our team has even realized just how big it is.
Have you calculated it? No, but (…). (Exchange between a leading innovator and a team member during a discussion of the likelihood of leadership supporting further experiments, winter 2023–2024).
Simultaneously, the industrial (the comparative precision of the tool vis-à-vis humans) and civic (more equal and fair decisions) frames were enacted more intensively to compensate for the market frame's failure to show the tool's value. Their combination deepened the discussions among the lead and technical innovators and the consultants, which revolved around human experts’ limited technical capabilities and errors versus the potential of AI technologies to enhance human experts’ decisions. To counter criticisms of the tool's imprecision and inaccuracy, organizational data on annual citizen complaints about claims decisions were retrieved to illustrate the scale of human errors. The innovators emphasized the tool's potential to streamline, standardize and improve the accuracy of information retrieval, thereby potentially reducing complaint-related costs. The ideas from the market, industrial and civic frames circulated actively in small group meetings of innovators and consultants, highlighting how complaints increase workload, strain resources and escalate organizational expenses: When we talk about generative language models, we just have to accept that these will make mistakes. You never get rid of that. And [human] mistakes are being made all the time in [claims] processing even now! If the data are correct, the AI should not be biased towards fatigue and so on. Now, the assumption is that the machine would be worse than the human. So, this could be a quality-enhancing tool, and we could escape the black-and-white comparison [between AI and human decisions]. If only we could break that illusion of infallibility, that [the organization] is infallible; a human is not infallible. This is an opportunity to make a big impact [with the tool's innovation]! (Small group meetings, winter 2023–2024).
Later, once it became clear that the tool's outputs required additional interpretative work by human experts, the innovators’ argumentation evolved into emphasizing attentive collaboration between claims specialists as users and the AI tool, with the aim of improving the specialists’ decisions while reducing technological risks (cf. Elish, 2019). The lead innovators defended the tool by stressing how it merely supports human decision-makers and that the tool's positive impacts on efficiency and productivity can be achieved only if users understand the assistive ‘nature of AI’, as this fieldnote excerpt demonstrates: Instead of considering it a decision support tool that requires precision, we could advocate for it as a sparring buddy, where its outputs are not strictly right or wrong; instead, they need interpretation rather than accepting them as given. (Team meeting, winter 2023–2024)
A year into the AI tool's innovation process, its inconsistent and inaccurate functioning persisted, and its operations remained largely a black box (Bender et al., 2021) – ambiguous, uncertain and uncontrollable – even to the most seasoned technical innovators. Yet, the lead innovators were keen on continuing the tool's development to production and scalability, as the promises embedded in the five tool-oriented justification frames sustained strong momentum behind the project.
The process-oriented frames: sustaining innovation through testing
Two justification frames – the control-emphasizing industrial frame and the flexibility-highlighting project frame – were collectively enacted from phase 1, with justifications grounded in how the innovation process itself was organized and controlled by the innovators in collaboration with key consultants.
The industrial frame dominated the tool's development process through meticulous collective efforts to establish a controlled, evidence-based approach that relied on the best intellectual resources and systematic experimentation to validate the tool's value. The multiprofessional team of innovators, supported by key consultants, was mastering innovation as both a praxis and a competence. They drew primarily on frameworks, templates and best practices from the private sector design industry. These intellectual resources were provided to the team by the external consultants. They were then deliberated, tested and iterated in workshops and meetings throughout our fieldwork to align with the team's needs. The resulting setup was subsequently and repeatedly presented to internal and external stakeholders, effectively reinforcing the image of a well-controlled and methodical innovation process: ‘Let's prepare a logical argument for it. (…) Now we have enough facts to move on’ (Leading innovator, team workshop, autumn 2023).
The project frame was enacted through presenting the innovation process as fast, agile and adaptable collaborative work, capable of flexibly adjusting plans and teams in response to emerging information and experimental results (see Pisano, 2019). This involved continual teaming, re-teaming and sub-teaming to run experiments, evaluate them and define next steps with pace. The experiments conducted were not incidental disruptions but deliberate, constitutive elements of the innovation process, designed to probe, stretch, downplay and occasionally redraw organizational boundaries while negotiating tensions and competing values. The experiments also made the innovation process more predictable, with its structure of moving from one experiment to another, each introducing new collectively formulated hypotheses and new groups of test users.
Through the experiments, the innovators engaged in collaborative, competitive and configurational boundary work with other organizational actors, particularly managers of the frontline work and claims specialists, whose involvement appeared critical for sustaining the tool's development and process around it. The collaborating managers facilitated five rounds of experiments and provided two experienced claims specialists for ad hoc consultation. While these specialists contributed valuable insights from their daily work, their influence on the tool's design was limited. As their roles were defined by the innovators, they never fully acquired the role of boundary spanners (Levina and Vaast, 2005).
Importantly, following Boltanski and Thévenot (2006, 133–138), these experiments can be understood as ‘reality tests’ that ‘enable judgments to reach a grounded and legitimate agreement’ (Boltanski and Thévenot, 1999, 367), thereby enacting the industrial frame as a justificatory act. These were moments when the innovators and stakeholders came together to experiment with a generally shared conception of values they sought to advance and protect and to assess whether those values were being realized or put at risk through the tool. Key worths to be tested here were industrial and civic – work efficiency and productivity and the accuracy and uniformity of claims specialists’ decisions for fair and equal treatment of citizens. The innovators, consultants, managers and claims specialists aligned around the expectation that the tool would deliver accurate, precise and consistent outputs to support decision-making. When the tests revealed that this goal was unattainable, the lead and technical innovators creatively proposed new directions for development and testing instead of treating the test results as grounds for pulling back. The experiments gave them a controlled ‘specific epistemic advantage’ (Jackson, 2014, 229), enabling justification for continuing the tool's innovation process and framing unexpected outcomes as part of a learning trajectory. In this way, further experiments enacted the project frame by continually generating new findings on which the team could flexibly iterate. The sense of being on a learning curve also discouraged open criticism of the tool itself and redirected attention towards organizational and user-related factors that were assumed to affect its performance. 
The justifications for rapid experiments were developed by the lead innovators and key consultants and communicated repeatedly within and beyond the organization as follows: Without a rapidly set-up experiment, we might have made wider investments in a technology that would not have answered customers’ needs and expectations. We also acquired valuable insights going forward. (Lead innovator in media coverage, autumn 2023)
The tool's failure to perform as promised was further normalized as ‘business as usual’ and presented as a prerequisite for collaborative learning essential to successful AI innovation. This framing emphasized the need for constant revision of plans and expectations, making the tool irresistible to powerful organizational actors as a ‘thing’ (Suchman, 2023) deemed worthy of continued development through future releases: The important thing here would be that “continuous development”, that it is already delivering value and benefits, but we have to keep developing it. It [the tool] already deserves to exist. This is a change in the way we think; it [the tool] will develop through future releases. [Failures] are really annoying when you are very close. But being very close can mean that this is already worthwhile. (Lead innovator, team meeting, summer 2024)
Furthermore, the innovation team adopted new initiatives, such as consultant-led AI ethics workshops and design-oriented research on the tool's user interface, in response to internal criticism emerging within the team during phase 4. The lead innovators enthusiastically highlighted these initiatives in discussions with team members and collaborating stakeholders: We also talked with [the researcher] about improving the tool's user interface. It can help us solve the technical problems and improve the tool's functioning by steering users to use it in certain ways. (Team meeting, spring 2024)
However, when these activities did not improve the technical core of the tool – the LLM-based solution, which appeared to be the key reason for the tool's failure to meet its promises – these initiatives were largely ignored by the lead and technical innovators. It was the managers of the frontline work who objected to further testing and implementation due to the problems with the tool's reliability. Yet, a few months later, we learned that the innovation process had resumed at the request of other organizational managers and was being advanced to the piloting stage.
The innovation ideology-oriented frames: legitimizing boldness and speed
Two ideology-grounded justification frames – project and civic – contributed to legitimizing the bold initiative and rapid pace of the tool's development. These frames were enacted by the lead innovators in collaboration with key consultants and organizational managers and were maintained throughout the innovation process by all innovators.
The project frame was characterized especially by highlighting the need for a fluid and experimental mindset – the willingness and capability to ‘move fast, fail fast’ and, importantly, to learn more. This emphasis on speed and iterative learning reflected not only a response to technological uncertainty but also the diffusion of private sector innovation principles into public sector innovation practices, increasingly shaping state digitalization agendas. Mediated by consultants and innovation networks, these principles introduce a market-oriented rationality that frames agility, experimentation and risk-taking as virtues in public administration (Sharon, 2018). Thus, the innovation process involved not just adopting a new technology but enacting an ideology of constant adaptation and innovation acceleration rooted in private sector practices. The innovators’ alliances with organizational managers resulted in configurational boundary work enabling the creation of ‘experimental spaces’ (Zietsma and Lawrence, 2010) for continued real-world testing of the tool by claims specialists. Rapid cycles of development and experimentation were framed as sources of continuous learning and safeguards against costly technology missteps, thus presenting the team as agile and adaptable. The lead innovators, who served as key brokers between the innovation team and organizational management, applauded the team for their bold initiative and speed in planning the next rounds of tool testing: Absolutely – this is such a big thing that we need to press the accelerator! I like the speed here, straight to an experiment. (…) This is awesome! (…) Great thing, this was fantastic news to end the day with. (Team meeting, autumn 2023) The organization now has good speed – many other organizations are still wondering what this [generative AI] could mean. (Internal presentation to leadership, summer 2023)
The civic frame was enacted specifically by the lead innovators and a key consultant, who drew on their powerful organizational positions and wide innovation networks to secure an official mandate for the innovation team to leverage new technologies such as generative AI. This choice of technology was actively legitimized by them in media appearances and at various public events. The civic frame was enacted through imaginaries about AI as an inevitable force for positive social change (Bareis and Katzenbach, 2021; Wirtz et al., 2019), and an opportunity society could not miss. They emphasized the anticipatory power of AI technologies to improve citizens’ access to public services and the recognition of citizens’ diverse and complex individual needs: Our goal is for AI to improve daily lives in Finland. AI helps serve people in need of support. It is impossible to respond to changing individual needs without smart automation or AI. (Lead innovator in media coverage, winter 2024)
Within the organization, the civic frame was enacted by the lead innovators in their calls for organizational changes, including revising restrictive cloud policies and guidelines to facilitate processing by AI technologies and equipping future users of AI technologies with the necessary skills.
Figure 2. The justificatory package of tool-, process- and ideology-oriented frames, highlighting a mutually reinforcing configuration and the persuasive power of the whole. Emerging criticism related to the tool's technical limitations is highlighted in red.
Discussion and conclusion
The rapid proliferation of generative AI has intensified innovation efforts and sparked widespread optimism about its potential to optimize public administration. These dynamics are located within a wider international policy landscape around AI in public services (OECD.AI, 2025). Our study unfolded alongside this new wave of excitement and urgency, providing a timely opportunity to examine how an innovation team in the Finnish public administration sought to realize the emerging promises of generative AI. To capture how the generative AI tool's innovation process was kept in motion despite the technical limitations that prevented it from delivering on its promises, we combined frameworks of boundary work (Langley et al., 2019) and regimes of justification (Boltanski and Thévenot, 2006). Alongside recent analyses of how AI systems are continuously evaluated and maintained across their production chains (Wirth et al., 2025), we conceptualize how organizational actors sustain AI innovation trajectories through promissory expectations and justificatory work. In doing so, our findings align with research on flawed or faltering AI innovations in the public sector (e.g. Ratner and Schrøder, 2024), highlighting the often-overlooked challenges behind such initiatives. Below, we discuss our study's contributions in greater detail.
Situating our study at the intersection of boundary work and regimes of justification deepens our understanding of innovation as a highly politicized process (see, e.g. Braunsmann et al., 2022; Suchman and Bishop, 2000). This perspective reveals how nine powerful justification frames (Table 1) were collaboratively constructed to mediate tensions between technical limitations and the innovators’ ambitions, shielding the tool from internal and external criticism. In doing so, these frames kept the innovation process in motion through ongoing experimentation despite the absence of demonstrable technical success. They formed a protective structure around the tool (Figure 2), rendering the tool and its development increasingly irresistible within the organization. The five tool-oriented frames drew on common promises of AI technologies – efficiency, cost reduction and fairness (Wirtz et al., 2019) – and were particularly vivid during problematic phases when the tool repeatedly failed to perform as promised. They safeguarded the tool's imagined value from criticism concerning its lack of accuracy, precision and consistency. Meanwhile, ideology- and process-based justifications created institutional conditions that legitimized bold initiative, speed and an experiment-driven agenda. Far from being passive accompaniments, these frames actively shaped the innovation's trajectory by normalizing setbacks, sustaining momentum and legitimizing the overall process. This perspective complements the sociology of expectations, which has shown how promissory narratives organize innovation (Borup et al., 2006; Dandurand et al., 2020), by specifying how such narratives become powerful through their anchoring in justification frames and the plurality of moral goods they invoke.
Importantly, our findings show that the enactment of the justification frames required the innovation team to navigate within, across and beyond organizational and professional boundaries. Boundaries within the innovation team, between innovators and management, between innovation and frontline work and between public sector innovation and the private sector design industry were strategically reshaped to shield the tool from criticism and sustain the innovation process, despite the innovators’ limited capacity to improve the tool itself.
Collaborative and configurational boundary work enabled the innovators’ alliances with organizational managers and consultants, while competitive boundary work, especially by the lead and technical innovators, reinforced the divide between the flexible world of innovation and the controlled routines of frontline work (Choroszewicz, 2025; Davies, 1983). For example, when the tool's promises proved unattainable, the lead innovators reframed the tool's success as dependent on organizational changes and appropriate user engagement, emphasizing human involvement as central to AI technologies (Le Ludec et al., 2023; Tubaro et al., 2020). This strategy reflects what Siffels and Sharon (2024) describe as the constructivist aspect of technosolutionism, where problems are redefined to fit a pre-existing technological solution, shifting the burden of adaptation onto users and organizations rather than questioning the tool itself. Together, these dynamics show how justificatory regimes depend on boundary work to prevent or neutralize criticism, obscure failures and constrain alternative innovation pathways, thereby advancing our understanding of how organizational dynamics and discursive practices jointly sustain innovation trajectories, even when technical success remains elusive.
Our findings further underscore that the persistence of the innovation process was inseparable from how failures to meet the tool's promises were managed, reframed and justified. Rather than treating repeated breakdowns as grounds for halting work or reconsidering alternative technologies, the evolving justification frames prevented these breakdowns from challenging the imagined worth of the tool and at times even transformed technical failures into signs of progress. In this sense, the five experiments failed to function as reality tests (Boltanski and Thévenot, 2006, 133–138) of the tool's viability and its capability to deliver on its promises, succeeding instead as engines for sustaining the innovation process through the robust justificatory framing of setbacks as necessary steps in a learning trajectory and the continual rearticulation of the justificatory packages more broadly. These experiments were also successful in uniting the hands-on innovators and in fostering alliances with powerful organizational actors to address the uncertainty surrounding rapidly emerging technologies like generative AI (see Ananny, 2024). This dynamic reveals a paradox: although justification frames sustained the innovation process for over a year, they simultaneously constrained the tool's innovation by discouraging the exploration of radically alternative technological pathways. Internal criticism within the innovation team played a role, but its effectiveness was limited by the fact that these critical voices were not deeply involved in the experiments or in assessing their outcomes in ways that could have strengthened the experiments’ role as reality tests. Internal criticism was further constrained by the flexible enforcement of technical expertise, which allowed the lead and technical innovators to manage expectations and maintain the momentum of the AI tool's innovation despite its persistent shortcomings.
By foregrounding failure as both a technical and discursive phenomenon, our findings advance theoretical insight into how justificatory regimes not only legitimize innovation but also shape its trajectory, often at the expense of genuine learning and technological diversity. In Callon's (1986) terms, the AI tool and its justificatory package came to function as an ‘obligatory passage point’ for addressing organizational challenges, making alternative technological trajectories harder to articulate and pursue.
Taken together, these insights invite us to view justification frames not only as isolated units but also as interconnected sets that form tactical combinations that are particularly appealing across multiple organizational and professional boundaries. The environment in which justifications are made and received is crucial. Specifically, we identified an enduring combination of the industrial, vitalist and civic frames – promising efficiency for the organization, well-being for frontline workers and more equal, transparent and consistent decisions for citizens – that formed a powerful trio capable of enacting broad value resonance, especially within the context of the Nordic welfare state (Winthereik et al., 2024). As the innovators were able to enact such a compelling justificatory package, the AI tool seemed unstoppable. This helps explain why certain technological visions, fictional expectations and promissory narratives (Beckert, 2016; Borup et al., 2006; Dandurand et al., 2020) can become insistent drivers of innovation processes and crystallize into what Van Lente and Rip (1998) call a
Our study also contributes to the literature on regimes of justification by extending its application to the domain of generative AI innovation, where justification frames do not merely defend existing arrangements but actively constitute the ‘thingness’ of emerging technologies (Suchman, 2023). We show that understanding innovation requires attention to the collectively enacted discursive infrastructures that make technologies appear inevitable and irresistible, even when their actual performance remains limited.
Our study also shows that in the context of generative AI innovation, acknowledging the failure of these innovations is particularly challenging due to the opacity of LLM-based tools and the strength of surrounding narratives. These tools function as black boxes (Bender et al., 2021), making it difficult for innovators and other actors to identify the causes of tools’ failures and possibilities for correcting them to meet expectations of accuracy, precision and consistency. This opacity creates a need to shift attention from technical issues towards transformative practices (Houston et al., 2016) aimed at either
Finally, the study has certain limitations that can be addressed in future research. It is based on a single case study in one public organization during an early period of excitement around generative AI. Comparative studies across multiple public organizations and later stages of the hype cycle could reveal whether our findings apply more broadly to other generative AI innovation processes over time. Future research could examine how justifications and boundary work unfold not only during concept building and experimentation but also in later piloting and scaled use of similar AI tools. Further research should also examine the boundary work of frontline workers who experiment with and use these tools in their everyday practices, focusing on their responses to disruptions and to the justifications enacted by innovators.
Acknowledgements
The authors thank the anonymous reviewers and the editor, Matthew Zook, for their thoughtful comments, which helped improve this manuscript. The authors are grateful to Tuukka Lehtiniemi for the invitation to present an early version of this paper at the seminar ‘Repairing (with) Algorithmic Systems’, and its participants, especially Minna Ruckenstein and Alison Powell, for their valuable comments. Finally, the authors would like to thank the public organization, its employees (the organizational managers, the innovation experts and test users) as well as the collaborating consultants and other stakeholders for granting access to the observed meetings and events and agreeing to be interviewed.
Author contributions
The authors contributed equally to this work.
Funding statement
The authors received the following financial support for the research: Marta Choroszewicz received a two-year grant from the Finnish Cultural Foundation (Teresia and Rafael Lönnström Fund) for the project ‘Perpetual piloting and invisible work of automating public services in Finland’ and a one-year grant from the Ella and Georg Ehrnrooth Foundation for the project ‘Navigating experimental AI tools within technology-enthusiastic context of Finnish public administration’; Antti Rannisto received financial support from the research project ‘Civic Agency in AI? Examining the AI Act and Democratizing Algorithmic Services in the Public Sector’ (no. 357349) funded by the Research Council of Finland.
Declaration of conflicting interest
Marta Choroszewicz declares no potential conflicts of interest with respect to the research, authorship, or publication of this article. Antti Rannisto is employed part-time by a consultancy that provides digital and AI-related advisory services to organizations, including the public sector organization studied in this article. He had no involvement in any consultancy work related to the case reported here, did not participate in or oversee any such advisory projects, and had no relationship with the organization beyond his role as an academic researcher.
