Abstract
A defining feature of generative artificial intelligence (GenAI) is its capacity to generate novel and creative outputs, a capability that sets it apart from traditional discriminative AI (Acar et al., 2024; Bouschery et al., 2023). This has given rise to renewed interest in artificial intelligence (AI) as a tool for innovation. As the majority of data pertinent to innovation projects is textual (market insights, customer needs, technological knowledge, concept descriptions, project documentation, etc.), the growing capabilities of large language models (LLMs) such as OpenAI’s ChatGPT, Google’s Gemini, or Meta’s Llama have contributed to the increased interest in AI as a novel innovation tool (Piller et al., 2023). While prior literature has discussed the relationship between GenAI and innovation in general (Bilgram & Laarmann, 2023; Cooper & Brem, 2024; Füller & Hutter, 2024; Roberts & Candi, 2024), we ask in this article: How can organizations navigate the opportunities of GenAI for their innovation processes?
We propose that answering three questions can guide organizations in navigating the opportunities of AI (especially GenAI) for innovation. First, firms must assess their tasks (and desired outcomes) in terms of the level of trust required when engaging GenAI to support a development process. Second, managers need to know when general AI models are sufficient and when customized models trained for a firm’s specific domain are required to generate a trustworthy outcome. Third, different AI models require different skills and capabilities from their users. Only when these skills match the task requirements will using GenAI lead to higher innovation performance.
Question 1: Do We Need Trust in the GenAI Outcome?
One of the most prominent concerns organizations have when deploying GenAI is its inaccuracy (McKinsey, 2023), such as LLMs’ infamous hallucinations, i.e., making up facts and references (Ji et al., 2023). Recent developments, such as combining a generative output with a web search, have reduced this tendency. Still, anyone who has asked ChatGPT to draft their curriculum vitae (CV) will have noticed that the algorithm makes up facts. While easily spotted in one’s own CV, this can lead to severe concerns when summarizing technical literature or prompting the system to identify latent customer needs. Ultimately, decisions along the innovation process must be based on accurate attributes developed from robust prediction models using the best available data. This points to the need for these systems to be developed and trained specifically for an industry or product line to derive trustworthy results from domain-specific data, leading to informed decisions.
However, there are also situations wherein inaccuracy is not inherently bad. For example, in upfront ideation and discovery tasks, organizations do not need to trust GenAI’s results. Here, GenAI’s hallucinations are a feature that fosters creativity. In ideation, creativity and out-of-the-box thinking are the goal, and trained models could lead to path dependencies and unwanted biases. Thus, contrary to popular belief, we suggest that the issue is not to find a way to make all AI-generated results accurate so that they are always trustworthy, but rather to be able to assess the level of trust that specific use cases require and adjust the accuracy of GenAI results accordingly.
The level of accuracy of, and therefore trust in, GenAI depends on the kind of data and resources invested in its development and training. Organizations must first understand the factors, and their interplay, that make up a particular task to assess whether the GenAI’s underlying models sufficiently reflect these in their ground truth (Lebovitz et al., 2023). Second, they must ensure the quality and quantity of data that the model training process requires to achieve the desired accuracy of AI-generated results. By understanding the trust dimension, organizations can manage expectations and allocate resources effectively. For example, asking ChatGPT about customer sentiment toward a certain product will yield an answer, but its accuracy will be low. In contrast, applying an LLM to internal customer reviews can provide highly accurate insights into customer preferences (Yuan et al., 2022). Some starting questions to explore the trust dimension further are: What are the output expectations? Will the AI’s output be used as a thought starter or as the basis for a decision? What type of information is required as input for the model: internal or external data? If external information is sufficient, how credible and verifiable are the underlying data and training?
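The workflow implied here, classifying the sentiment of each internal review and aggregating the labels into an overall distribution, can be illustrated with a minimal sketch. Note that `classify_sentiment` is a hypothetical stand-in: in practice it would wrap a call to an LLM with a classification prompt, while here a trivial keyword heuristic is used so the sketch runs offline.

```python
from collections import Counter

def classify_sentiment(review: str) -> str:
    """Hypothetical stand-in for an LLM call (e.g., a prompt such as
    'Classify this review as positive or negative'). A trivial
    keyword heuristic is used here so the sketch runs offline."""
    positive = {"great", "love", "excellent", "reliable"}
    negative = {"broken", "poor", "disappointing", "slow"}
    words = set(review.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score >= 0 else "negative"

def summarize_reviews(reviews: list[str]) -> dict[str, float]:
    """Aggregate per-review labels into a sentiment distribution."""
    counts = Counter(classify_sentiment(r) for r in reviews)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("positive", "negative")}

reviews = [
    "Great battery life, love the design",
    "Screen arrived broken with poor packaging",
    "Excellent value and very reliable",
]
print(summarize_reviews(reviews))
```

The design point is that the per-review classifier is interchangeable: swapping the heuristic for a general-purpose LLM, or for a model fine-tuned on the firm’s own review corpus, changes the accuracy of the labels but not the structure of the pipeline, which is exactly the general-versus-expert-model trade-off discussed in Question 2.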
Question 2: Do We Need a General or an Expert Model?
AI models can be differentiated into general and task-specific expert models.
Additionally, the structure of the models and their built-in fluency can help summarize and categorize information extremely efficiently. Finally, these abilities can be accessed at a very low cost. At the same time, however, the apparent knowledgeability of these systems clouds people’s understanding and expectations of what they can actually do. Using ChatGPT for ideation can yield interesting ideas (Bouschery et al., 2024; Guzik et al., 2023; Meincke et al., 2024). However, the underlying model is not explicitly trained for a specific market. Therefore, even if a response sounds credible because the underlying language model excels at fluency, it is not credible enough to support sound design and business decisions at advanced innovation stages that require specificity and accuracy.
Navigating the Possibilities of GenAI for Innovation Management
A synthesis of these two questions and decision flows reveals a landscape that can facilitate navigation of the potential applications of GenAI in innovation management, as shown in Figure 1. The horizontal axis asks about the level of trust and confidence a task requires. The vertical axis shows the type of GenAI, either a general model or an expert system. The two left quadrants (#1 and #2) correspond to tasks where trust in the outcome is unimportant, as there will be other checks and evaluations of the results. These are generally questions in the problem space of an innovation project to develop compelling concept alternatives. The quadrants on the right (#3 and #4) address the solution space, where, for example, the best concept is transformed into a technical solution and launched into the market. Hence, these situations require high trust in the AI’s prediction. In the following, we explore these four quadrants in more detail.

Figure 1. Matching GenAI applications to specific innovation tasks.
Additionally, these models assist in navigating larger general knowledge fields by quickly assimilating and presenting information that would otherwise require extensive manual research, thereby enabling innovation teams to focus on applying this knowledge to develop and refine their concepts. Beyond knowledge extraction, text-to-image generators such as DALL-E, Stable Diffusion, or Midjourney are robust general models that can generate basic mock-ups to make “crazy” ideas more tangible. Much of the prior literature on GenAI for innovation can be placed in this field.
The promise of GenAI at this stage is to navigate the complexity that comes with traditional tools and to find optimal solutions more quickly. At the same time, engineers and designers often focus too much on the technical solutions they already know and are comfortable with. GenAI can help explore a more extensive solution space and overcome such path dependencies. We see this combination of established (but complex to operate) expert systems with GenAI capabilities as a significant opportunity (Davenport et al., 2023; Salvador & Sting, 2022). Marion et al. (2024) discuss the case of
Question 3: Do We Have the Right AI Capabilities Within the Innovation Process?
Our discussion of the first two questions proposes that organizations navigating the deployment of GenAI for innovation face a nuanced challenge, balancing concerns about inaccuracies, especially in critical tasks, with the benefits of GenAI’s creativity during ideation. The trustworthiness of GenAI hinges on understanding the specific requirements of different innovation tasks and adjusting its accuracy accordingly. This involves discerning between versatile general models, exemplified by widely used systems like ChatGPT, and task-specific expert models tailored to handle domain-specific complexities. Firms need to ensure GenAI aligns effectively with the intricacies of the specific use cases on the task level within their innovation process.
Beyond understanding the required trust at the task level and the difference between general and expert models, organizations have to ask a third question: how to assess and develop the human capabilities of their innovation teams (Gama & Magistretti, 2023; Igna & Venturini, 2023; Kemp, 2024). Given the speed with which these technologies are evolving, the question of integration within the organization becomes more critical. As Siemens demonstrates, a top-down approach of strategically investing in internal development and strategic partnerships is one option. However, such initiatives take time, and the latency of implementation may result in technology that is already dated at deployment. Hence, bottom-up, more democratized approaches are required, too, where teams and individuals select, use, and build tools as they see fit. While this comes with organizational challenges, such as control and security, this “citizen development” approach fosters speed and agility—and is greatly facilitated by the capabilities of GenAI (Davenport et al., 2023).
Conclusions and Outlook
Incorporating GenAI into the innovation process presents a unique set of opportunities and challenges. As previously stated, the most effective means of leveraging GenAI is understanding the subtle nuances of trust, the distinction between general and expert models, and the alignment of human capabilities with AI tools. The decision of how much, and in what manner, to trust AI outputs is not merely a technical consideration; it is a strategic one that can influence the trajectory of innovation within an organization. Our investigation underscores the significance of contextual factors in determining the optimal degree of trust in AI-generated outcomes. In the ideation and discovery phases, where creativity and unconventional thinking are paramount, the inaccuracies of GenAI can be utilized as a distinctive feature rather than a shortcoming. Conversely, in the later stages of product development, where precision and domain-specific knowledge are essential, reliance on expert models becomes indispensable.
This dual approach to AI deployment suggests several avenues for future research. Scholars and practitioners alike should investigate how disparate industries and organizational contexts shape the equilibrium between creativity and accuracy in AI-generated outcomes. Furthermore, the ever-changing landscape of AI capabilities, such as the current emergence of AI agents, necessitates further investigation to ascertain the optimal ways of utilizing these tools to enhance decision-making and innovation. In this context, Hagendorff et al. (2023) suggest studying LLMs’ behavior and reasoning abilities. They propose to engage LLMs in behavioral experiments that have traditionally been aimed at understanding human cognition and behavior to better understand their “reasoning” and the factors that generate a specific response. Our findings highlight the necessity of a strategic and multifaceted approach to AI integration for practitioners. It is imperative that organizations not only invest in the requisite technological infrastructure but also cultivate the human expertise necessary to interpret and leverage AI outputs effectively—as well as the human expertise to understand the nature of the task in the first place. The potential of GenAI to democratize innovation processes through “citizen development” initiatives, as well as the risks associated with inadequate training and oversight, require careful consideration.
In conclusion, as GenAI continues to reshape the innovation landscape, the critical challenge for researchers and practitioners is to develop frameworks that balance the inherent trade-offs between creativity and accuracy, generalization and specialization, and automation and human judgment. By addressing these challenges head-on, we may come closer to unlocking the full potential of GenAI for innovation.
