Introduction
In the field of architecture, effective ideation hinges on the ability to represent ideas. Traditionally, drawings, photographs, and other visual media have been used to stimulate ideation and communicate design concepts. However, recent advances in generative artificial intelligence (AI) have made it possible to generate detailed and realistic representations of architectural concepts, using prompting in natural language as a general-purpose interface.1–6 The use of generative AI and procedural design is not new to architecture, and their use dates back to the 1970s.7 However, text-guided generation marks a paradigm shift that could affect the architectural design process. Text-to-image generation tools3,8 are one such example of generative AI: they allow for quick conceptualization of ideas in natural language during the idea generation process. Thus, these tools have the potential to transform the way architects and designers develop and communicate their ideas.
In this article, we study how different text-to-image generators can support creativity during the “fuzzy front end”9 of new concept development in the early stages of the architectural design process. In particular, we investigate:
1. How can text-to-image generators support creativity and ideation during the early stages of architectural design?
2. How effective are out-of-the-box text-to-image generators in the context of architectural design, and what future considerations could developers take into account?
3. What are the typical challenges of text-to-image generator use and text prompting for novice users?
In a laboratory study, 17 participants developed a concept for a culture center using three popular text-to-image generators: Midjourney,10 Stable Diffusion,3 and DALL-E.8 Through standardized questionnaires on creativity support tools and group interviews, we learned that image generation could be a meaningful part of the design process when design constraints and imaginative ideation are carefully considered. Generative tools support the serendipitous discovery of ideas and an imaginative mindset, which can enrich the design process. Through our study, we highlight several challenges of image generators and provide considerations for software developers and educators to support creativity and emphasize designers’ imaginative mindset.
Related work
To contextualize the use of text-to-image generators in architectural design, we first describe the different ways creativity has been approached in the architectural design process. Then, we present current literature on how creativity-supporting generative tools have been applied in the architectural design process.
Creativity in the architectural design process
Architecture is highly relevant to creativity as a field concerned with solving problems in contextual and effective ways.11 The prior literature has developed multiple notions for understanding what aspects affect creativity and how it can be supported in architectural design. For instance, Sarkar and Chakrabarti formulated creativity as a function of novelty and usefulness, allowing for the assessment of different design outcomes.12 This two-factor definition, dividing creativity into novelty (or related synonyms such as originality, unusualness, or uniqueness) and usefulness (or related synonyms such as effectiveness, fit, or appropriateness), has become popular in scholarly literature as a way to operationalize and measure the concept of creativity.13 However, creativity in architecture is not only a matter of producing a final outcome that is novel and useful; it also involves the application of one’s creative skills during the creative design process. Casakin et al. found that creative thinking skills are more related to verbal skills than to figural skills.14 Consequently, the authors proposed that creative skills in architecture are also generalizable to other problem-solving areas in life. Additionally, Baghai Daemei & Safari found that students’ experience of design processes is a critical aspect of creativity.15
Creativity has also been operationalized in architecture using various tools. Park et al. studied how text stimuli can improve a person’s imagination in nonlinear architectural design tasks.16
Using text snippets from Italo Calvino’s
Text-to-image generation in architectural design
Text-guided diffusion models1–4 have become a popular means of synthesizing novel digital images from input prompts written in natural language, and generative AI is increasingly being employed in academia and industry. Generative AI in architectural design has been explored in two surveys. Through reviewing machine learning research trends in architecture, Ozerol & Arslan Selçuk found that generative AI is rising in popularity.20 However, machine learning was more often applied in 3D generative methods than in 2D. This suggests that the fast development of text-to-image generative methods has not yet reached the architectural research community. In a survey of generative systems in architectural, engineering, and construction research between 2009 and 2019, BuHamdan et al. found that many generative methods focus on the architectural, structural engineering, and urban design disciplines.21 However, the most popular architectural use cases (facade design, form generation, layout generation) represent more geometric processes, and the role of more conceptual creativity in generative systems remains unexplored.
Text-to-image generation systems provide an easy-to-use interface due to their ability to respond to natural language prompts. However, the creativity of text-to-image generation currently still hinges on the skill of its users.22 To control the output, users have to resort to special keywords in the prompts to produce images in a certain style or quality.23 Longer prompts also typically produce images of higher quality.24 While text-to-image generation tools can be intuitive, their application in the context of architecture remains largely unexplored.
Seneviratne et al. used a systematic grammar to explore the robustness of a text-to-image generator in the context of the built environment. 25 The study found that the image generator was broadly applicable in the context of architecture. However, architectural semantics contain ambiguities, 26 and the real-world benefit of text-guided image generation remains to be explored. In this article, we specifically focus on how text-to-image generation tools can support human divergent creativity during the early-stage concept design process.
Method
We designed a laboratory study in which architecture students engaged with text-to-image generators in a short architectural design task. The study was tested in a formative pilot in which two authors and three colleagues used text-to-image generators to create their dream home. The pilot informed the design of the main study, as follows.
Study design and procedure
We conducted three sessions (henceforth
The study procedure was as follows. Participants were first asked to provide informed consent. Participants were then given an introduction to text-to-image generators. The text-to-image user interfaces were not modified in any way. Participants were introduced to the tools with a short presentation on how to use the text-to-image generators for architecture and some unrelated example images produced by each of the three tools. Participants were then given a short interactive tutorial task that revolved around generating and iterating an increasingly complicated image of a pineapple. Only the basic functionalities of the tools (text prompts and generating variations or upscaled versions of generated images) were allowed during the session to ensure comparability of the three tools. Advanced features, such as inpainting or using generated images as the basis for further generations, were not allowed, since not all three tools offer them. Participants were then presented with a design brief and began working on the task.
Participants had between 1 h 15 min and 1 h 25 min to work on the task. The duration varied according to how long the initial stages of the study lasted and at what point the participants felt they had completed the task. In each session, all three tools were used, with 1–2 participants each using one of the three tools. Each participant had their own laptop (either personal or borrowed) to work with. While they worked individually towards their own solutions, participants could talk and discuss freely with one another, as well as take breaks when needed. This decision was made to emulate a collaborative work environment in an organization and to support the participants’ individual creative needs. One researcher was present at all times to answer questions. A second researcher acted as an observer, taking notes of any interesting discussions and observations during the design session. The sessions were recorded using a conference microphone that could capture the audio in the whole meeting room. Once participants finished working on the design task, each participant presented their work to the group. To further motivate the participants, we asked them to vote for the best work using ranked choice, ranking their own design lowest. Participants were compensated with a 15 EUR gift card, and the winner of each session was awarded an additional 15 EUR gift card.
Data collection
At the end of the session, we administered the Creativity Support Index (CSI)27 to evaluate how the image generators supported the participants’ creative processes. Based on the recommendations by Cherry and Latulipe,27 collaboration was marked as an optional item. This approach was adopted as participants were allowed to collaborate, but some opted not to do so actively. After the ideation sessions, we conducted semi-structured group interviews focused on three main aspects: (1) how well the tools could produce the required images, (2) whether the tools provided novel solutions, and (3) how participants thought the tools could be used in their design practice. As we learned more about the tools in the first session, we also focused on participants’ comments about an “ideal” tool that would support their design tasks. Two researchers conducted the interview, with one leading the discussion and another acting as a scribe, taking comprehensive notes. The participants’ comments during each session were written down by a researcher during the interview and complemented with transcribed audio recordings. The full commentary consists of 2905 words, and the audio recordings total approximately 50 min. The commentary was analyzed using content analysis28 to identify the participants’ experiences using the tools.
Image generation tools
The study was conducted with three text-to-image generation tools: Midjourney (version 4;
Participants
The list of 17 study participants and their ages, genders, self-reported years of studying architecture, study session, image generation tool, and prior experience with image generation.
Results
In the three sessions, participants produced images with diverse prompts. In the following, we describe the generated images, analyze the prompt language used by the participants, and then present the interview data and general feedback from the sessions. In the qualitative section, we evaluate the efficacy of the image generators in facilitating the design task, examine the participants’ utilization of prompts to visualize their ideas, and discuss the qualitative insights gleaned from the group interviews.
Generated images
Overall, the participants’ concepts were understandable, as expressed through the generated images of floorplans, material samples, and indoor views. All participants were able to deliver all required images for their short presentation and had ample time for ideation and exploration during the session.
The participant-chosen winners of each session are depicted in Figure 1. Participants P4 (SD), P9 (SD), and P12 (DE) won in their respective sessions S1–S3. As the voting was used to motivate the participants, we refrain from making exhaustive comparisons between the winners and the other works, but in the following, we describe some aspects we noticed. The winners’ generated images suggest a strong sense of presence and shape language. Notably, all three winners focused on using wood in their designs.
Figure 1. The three participants’ floorplans, interior views, and facade materials voted the best works from their respective sessions S1–S3: (a) P4 floorplan; (b) P4 interior; (c) P4 material; (d) P9 floorplan; (e) P9 interior; (f) P9 material; (g) P12 floorplan; (h) P12 interior; (i) P12 material.
Additionally, the winning designs use organic forms that would be difficult and time-consuming to model and produce with 3D modeling software. However, while the winning concepts use organic forms, that was not the case for all participants, as approximately half of the conceptual images used rectilinear forms. Some participants approached the design task through a biophilic design language, taking inspiration from plants, trees, and mushrooms. For instance, the concept by P4 (see Figure 1(d)) uses a scaled-up mushroom to convey space and atmosphere, as if a person were standing under a mushroom.
Besides the winners, we highlight selected images that exemplify the participants’ more adventurous experiments with the text-to-image generation tools in Figure 2. Participants typically followed a straightforward methodology for generating their ideas, but especially at the beginning of each session, some exploration was conducted. For instance, P1 (see Figure 2(a)) prompted “watercolour plan view thick black walls.”
Figure 2. Selection of images highlighting how the participants used image generators in unexpected ways. In (a), P1 focused on representing a floorplan in an abstract style with the prompt “watercolour plan view thick black walls.” In (b), P7 used a more ornate style for the facade. In (c), P9 wanted to create a “honeycomb”-style material, which produced an actual honeycomb. In (d), P13 employed the design task’s site by including factory chimneys, effectively suggesting location. In (e), P3 experimented with using strawberry as a facade material, and in (f), P13 could not generate usable floorplans, so they went for a more experimental approach.
Generating floorplans proved to be especially challenging (see the floorplans in Figures 1 and 2). The floorplans were rarely the black-and-white plan drawings that participants had learned to expect; instead, they were usually colored, three-dimensional, or had nonsensical layouts (e.g., rooms with sharp corners or missing doors). Material samples were also challenging, as the image generators were ineffective in generating facades. Thus, many participants defaulted to simple images of the material or used exterior perspective views. At times, the participants had to settle for images that did not meet their criteria, which often occurs when building prompts.6
Creativity support
The mean and standard deviation values for the counts, scores, and weighted scores for each of the six subfactors in the CSI, across all three image generators. The mean factor counts are the number of times participants chose that particular factor as important to the task (out of six factors). The factor scores range between 2 (worst) and 20 (best). Weighted factor scores are calculated by multiplying a participant’s factor agreement score by the factor count.
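The scoring scheme described above can be sketched in a few lines of code. This is a minimal illustration assuming the standard CSI scheme from Cherry and Latulipe (six factors, pairwise-comparison counts summing to 15, and per-factor agreement scores in the 2–20 range); the function name and factor labels are ours, not taken from the paper.

```python
# Sketch of Creativity Support Index (CSI) scoring, after Cherry & Latulipe.
# Assumptions: six factors; each factor's count is the number of pairwise
# comparisons it won (counts sum to 15); each factor's agreement score is the
# sum of two 1-10 ratings, giving the 2-20 range mentioned above.

def csi_scores(factor_counts, factor_scores):
    """Return per-factor weighted scores and the overall CSI (0-100)."""
    weighted = {f: factor_scores[f] * factor_counts[f] for f in factor_counts}
    # Maximum possible weighted sum is 15 * 20 = 300, so dividing by 3
    # normalizes the overall index to a 0-100 scale.
    overall = sum(weighted.values()) / 3.0
    return weighted, overall
```

For example, a participant who rated every factor at the maximum (20) would reach an overall CSI of 100 regardless of how the counts are distributed.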
Prompts
Participants wrote a total of 588 prompts during the three sessions with an average of 39.2 prompts per participant (
The participants’ prompts were overall descriptive, using appropriate terms to describe an intended outcome. Some examples of prompts include:
• architectural floorplan of concert hall and a grand staircase (P1, DE)
• elevation detail photo of brutalist style culture center in winter time (P11, MJ)
• floor plan of zaha hadid wooden culture center over the estuary in bright winter day (P12, DE)
• akosnometric explosion image of cultural center floorplans (P13, DE)
• architecture, floorplan, round, realistic materials, cultural center, HQ, 4k (P14, SD)
• experiential architectural wooden pavilion building in the park (P17, MJ)
Prompt sequences
Participants would typically only start over with completely new prompts when results were unsatisfactory or when starting another task. We manually grouped similar prompts together and identified breaks where a participant switched from one sequence of prompts to another. These sequences (i.e., groupings of prompts belonging to one idea) consist of an initial prompt that was iteratively extended by the participant with keywords to improve the resulting images (see Figure 3). On occasion, participants would return to previous sequences later on, for example, Seq2 → Seq3 → Seq4 → Seq2. The following is an excerpt of the prompt sequence P13-Seq1 (DE), with changes between prompts highlighted:
1. Modernistic and ecological cultural center in a waterfront environment with people standing in the foreground of the image
2. Modernistic
3. Modernistic, abstract and ecological cultural center in a waterfront environment
4. Modernistic, abstract and ecological cultural center in a waterfront environment in an island with people standing in the foreground of the image in a sunset
The length of participants’ prompt sequences demonstrates the commitment of participants to a train of thought during the ideation session. Most ideas would spawn at least a few (

The sequences of prompts were up to 24 prompts in length (for P10, MJ) with an average sequence length of 8.4 prompts (
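The manual grouping of prompts into sequences could also be approximated programmatically. The sketch below is our illustration, not the paper's method: it assumes a simple word-overlap (Jaccard) threshold to decide whether consecutive prompts continue the same idea, and the threshold value is hypothetical.

```python
def group_sequences(prompts, threshold=0.3):
    """Split an ordered list of prompts into sequences of related prompts.

    A new sequence starts when the Jaccard word overlap with the previous
    prompt falls below `threshold` (a hypothetical cut-off, not the paper's).
    """
    sequences = []
    for prompt in prompts:
        words = set(prompt.lower().split())
        if sequences:
            prev = set(sequences[-1][-1].lower().split())
            union = words | prev
            jaccard = len(words & prev) / len(union) if union else 1.0
            if jaccard >= threshold:
                # Similar enough: continue the current sequence.
                sequences[-1].append(prompt)
                continue
        # Dissimilar (or first prompt): start a new sequence.
        sequences.append([prompt])
    return sequences
```

A word-overlap heuristic like this would miss the cases the authors handled manually, such as a participant returning to an earlier sequence after a detour, but it illustrates the basic segmentation idea.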
Prompt language
Figure 4 depicts the most common keywords and keyword combinations used by participants. The keywords largely focus on the design task given to participants: creating a concept for a regional culture center. We find that most participants adopted the language used in the design brief, as is evident in their prompts. For instance, “floorplan” and “facade” were often used. Only a few participants veered off from this given path and experimented with other terms, such as synonyms (e.g., using “wireframe” or “layout” instead of “floorplan”). As for the “prompt modifiers” commonly used by practitioners of text-to-image generation,22,23 a minority of participants used such modifiers to enhance their images. These modifiers include, for instance, HQ, 4k, photo realistic (P14, SD) and 8k, unreal engine (P15, SD). Names of architects were used very sparingly by participants in their prompts. The three names of architects were
Figure 4. Most frequently used tokens in participant-written prompts. The plot on the left depicts the 25 most frequent tokens with stop words removed. The plot on the right depicts the 25 most frequent n-grams.
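A frequency analysis like the one behind Figure 4 can be reproduced with a few lines of Python. This is a generic sketch assuming a whitespace tokenizer and a small hand-picked stop list; the paper does not specify its exact preprocessing.

```python
from collections import Counter

# Hypothetical stop list; the paper does not state which stop words it removed.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "with", "for"}

def top_tokens(prompts, n=25):
    """Most frequent single tokens across all prompts, stop words removed."""
    tokens = [t for p in prompts for t in p.lower().split()
              if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)

def top_bigrams(prompts, n=25):
    """Most frequent two-token combinations (n-grams with n = 2)."""
    grams = []
    for p in prompts:
        toks = p.lower().split()
        grams.extend(" ".join(pair) for pair in zip(toks, toks[1:]))
    return Counter(grams).most_common(n)
```

Running these over the 588 collected prompts would yield frequency tables comparable to the two plots in Figure 4.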
Qualitative insights
During the introduction, the participants quickly learned how to build basic prompts, and while working on the design task, they developed strategies that supported their creative goals. Participants stated the image generators were “
Synthetic images in the architectural ideation process
The participants in all three groups had different views on how the generated images supported their architectural design and ideation processes. As the participants had different backgrounds and levels of experience with architectural design, their needs for creativity support tools varied. To understand how image generators can support their creative process, the participants compared image generation to familiar applications. The most frequent comparison was to Pinterest, a website that the students use as a source of inspiration for architectural ideation. One benefit of image generation was the capability to produce unique images that are closer to what the person aims for with their specific design concept—
While the tools were useful for generating images, participants had mixed opinions on how the tools and the images could be used for ideation. One participant explained that relying on image generation alone is not meaningful for ideation, “
However, participants had mixed opinions about how generative software can be part of a design process. On the one hand, the designer needs to actively make sense of the systems and how they are useful for meeting design goals. On the other hand, the systems were recognized to have an influence on the design process. As such, while the image generators provided flow-state creative experiences, they were not seen as neutral or value-free tools. As architectural design processes have grown ever more digital, architects will in the future require a better understanding of the value of image generators:
Randomness and sensemaking
The participants found that the tools produced results that seemed random relative to what they expected from the prompts. Using image generation was seen as “
Learning strategies
In order to generate meaningful images, participants implemented different strategies. Using relevant vocabulary was a key aspect. Participants found themselves using words and phrases that they use in their usual design processes but that were not conducive to generating useful images. As floorplans were found especially tricky to generate, participants attempted to get better results or interesting variations by using alternative words such as
The length of the prompts was another point of strategizing. Participants had differing opinions on how the applications handled the prompts. For instance, P5 (MJ) found that modifiers like
Participants’ improvement recommendations
Drawing inspiration from other applications, such as Pinterest, one participant found value in being able to have a curated algorithm that supports a person’s personal style. Liking certain generated images could help this algorithm to produce similar images or recommend other generated images. Additionally, refining some specific parts of the image was seen as useful. While advanced editing features of the image generators, such as inpainting and outpainting, were not used in our study, it is interesting that participants recognized the utility of such features. Being able to specify and re-generate certain parts of the image would help to form a more precise concept.
Finally, some participants valued having constraints in the system. Architecture often works around the many parameters brought by the site’s context, so having them present would help the creative ideation process. For instance, one participant suggested: “
Unexpected results and prompting challenges
The participants faced several struggles while building their prompts, some of which were rather unexpected. Many participants tried to visualize ideas using references to local buildings and city features. These features are unlikely to be recognized by the image generation system, and while the generated images included parts of such buildings (e.g., factory pipes), features (e.g., islands or bridges), or landmarks, the images would not reflect the actual location of the design task in any meaningful way. For instance, P5 (MJ) tried the prompt of a
Another struggle of participants was removing objects and text from the generated images. Negatively weighted prompts23 are a technical feature of image generation systems that can address this shortcoming. However, very few participants made use of this feature, and others tried to emulate the feature in their prompts, adding terms such as
While participants typically pursued a coherent idea, occasionally, after exploring many variations of the same prompt, they noticed they were going down a path that did not lead to the expected results. These participants regressed to earlier versions of their own prompts. This implies that even with fine-tuned prompts, it is unlikely that the user can achieve exactly what they want. While image quality can be improved with skillful text prompting, prompting can also become a limiting factor in the creative process if overused.
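For context, the negatively weighted prompts mentioned above are, in diffusion-based generators, commonly implemented through classifier-free guidance: the negative prompt embedding replaces the unconditional one, steering each denoising step away from the unwanted content. A common formulation (our summary; implementations vary between tools) is

$$\hat{\epsilon} = \epsilon_\theta(x_t, c_{\text{neg}}) + s \cdot \left( \epsilon_\theta(x_t, c_{\text{pos}}) - \epsilon_\theta(x_t, c_{\text{neg}}) \right)$$

where $\epsilon_\theta$ is the model's noise prediction at step $t$, $c_{\text{pos}}$ and $c_{\text{neg}}$ are the embeddings of the positive and negative prompts, and $s > 1$ is the guidance scale. With an empty negative prompt, this reduces to standard classifier-free guidance, which is why appending "no text" to the positive prompt, as some participants tried, behaves very differently from a true negative prompt.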
Discussion
One implication of our study is that text-to-image generators could serve as ideation tools in the context of creativity in architecture. In current systems, a lot of effort has to be put into the prompt language, as the choice of words has a great impact.24 Recent research in the field of Human-Computer Interaction (HCI) also points out that text prompting is a skill with a learning curve.6 Typically, longer prompts lead to higher-quality results.24 Chang et al. describe prompt writing as an art form in which “artists”—in our case, designers—also benefit from being highly skilled with natural language.29 And indeed, when using AI for design, “A Word is Worth a Thousand Pictures,” as Kulkarni et al. eloquently phrased it in the title of their work.30
Ideation and creativity with AI
In a 2013 lecture, the renowned contemporary architect Bernard Tschumi explained the role of visual media in architecture as “
Use of AI in the architectural design process—lessons learned
Through our design tasks with image generators, we learned how the students adopted the image generators and how our findings can inform the use of generative methods in the future. To this end, we propose points of consideration for developing image generators for the domain of architecture and practices for adopting image generation in architectural education.
Considerations for the design of image generators
The image generators we used supported the participants’ creativity equally. However, for architectural design purposes, we point out two main aspects that can improve the experience of concept ideation.
Our study shows that the image generators failed to generate conventional floorplan drawings. We suggest that, because floorplans are more abstract representations than, for example, perspective visualizations, the training datasets of floorplan images have insufficient labeling for architectural design purposes. While generative methods have been popular,21 fine-tuned language models can support generating more refined floorplans through text.35 However, the availability of architectural floor plan datasets is still an open issue, as well as the appropriate size of training data and data curation for “
Additionally, the participants found that it was difficult to track the flow of different ideas in the vertically scrolling interfaces of the image generators. Participants suggested that using a more spatial layout would help to explore different aspects of the design concept while also being able to zoom out and see how the ideation has progressed.
Considerations for educators
Many researchers believe that large language models (LLMs), such as ChatGPT and GPT-4, will affect how education is carried out in higher education. LLMs can be applied for general problem solving and can act as personal intelligent tutors.37 This new capacity for solving given design tasks could also affect how architecture is taught. The capacity of LLMs to respond to natural language prompts is a result of their emergent behavior.38 Prompt-based interaction with the LLM is a means to trigger this emergent behavior. However, architectural design is more than visualization, and careful consideration of the value of image generation is needed. Understanding how the given methods work in the larger context of problem-solving in spatial design is part of the designer’s expertise. Additionally, educators need to recognize how new technologies shape and support students’ understanding of the principles of architecture, such as materials, form, and function, as well as the site-specific context that is critical for architectural projects. For instance, through observations of student design skills after adopting CAD techniques, Meneely and Danko suggested that design education should focus on the question of “why” to promote reflection on the usefulness of different technologies for the design process.39 The authors state: “
Limitations
We acknowledge that text-to-image generators can be used very effectively, especially by expert users and when using advanced features. In our study setting, the participants were largely inexperienced and started using the tools from scratch, at least in the context of the design task. Better instructions or extended tasks could improve the generated images and help sort out the challenges that came up in our experiment. Additionally, as we motivated the participants with an additional prize for the best-voted ideas in each session, the winning works are only representative of the participants’ views. Moreover, our study was not focused on the quality of the participants’ work but rather on how the students adopted the tools to support their creativity. To focus on quality, future studies could evaluate the ideation outcomes with experts.
Conclusions
The recent rapid development in image generation has the potential to transform the design processes in architecture—a field heavily concerned with the production of visual media. We conducted a laboratory study with 17 architecture students to understand how they adopted image generation in the early stages of architectural concept ideation. Using standardized questionnaires on creativity support, prompt analysis, and group interviews, we learned that the participants approached image generators with different creative mindsets. In order for image generators to be effective and meaningful in architectural design, the design of the image generators needs to support creative exploration, and architectural educators need to emphasize appropriate usage and teach advanced usage.
Supplemental Material
Supplemental Material - Using text-to-image generation for architectural design ideation
Supplemental Material for Using text-to-image generation for architectural design ideation by Ville Paananen, Jonas Oppenländer and Aku Visuri in International Journal of Architectural Computing