Abstract
Keywords
Introduction
Population expansion and climate change have been the main driving forces behind the challenge of providing adequate housing in line with the sustainable development goals. 1 However, the building design process is iterative, time-consuming, and costly. 2 Despite the climatic and contextual differences between projects, facilitating this process would therefore make a significant contribution. Detailed technical drawings, such as floor plans, are prepared in the design development stage to illustrate more refined aspects of the design. 3 A floor plan is arguably the visual representation most used by architects; it serves as a compact blueprint of a building, effectively communicating the arrangement of spaces, structural elements, and openings. In addition, floor plans convey the architectural qualities of a building through the function of its spaces and their connection to the outside.
While computational design methods formalize the input features and some of the objectives through parameterized functions, casting the design task as an optimization problem, 4 architectural qualities are often linked to complex functions or to unknown, high-dimensional parameters that cannot be expressed in closed form. One approach to formulating the generative process is to learn from real-world design examples, in which architectural intelligence is manifested. In recent years, data-driven approaches, most prominently deep learning models, have been shown to be capable of extracting useful information from complex, high-dimensional data. 5 Such approaches are therefore particularly interesting for the design of floor plans, which is a complex generative task. Nevertheless, formalizing evaluation metrics to validate the competence of deep learning models is challenging, and it is the focus of this article.
The advancement of artificial intelligence technologies has made it possible for computers to display human-like capabilities for comprehending, interpreting, and generating unique solutions. 6 Human-machine collaboration has been investigated in many areas, from transferring human visual knowledge to robot vision in disassembly tasks 7 to autonomic architectures for smart buildings. 8 The way machines assist architects in designing floor plans harkens back to the onset of digital culture in architecture. The idea of automatically generating and evaluating floor plans goes back to the conceptual foundation of the “Flatwriter: Choice by Computer” project. 9 The fundamental concept behind the Flatwriter not only laid the foundation for machine-aided generative design but also tapped into the concepts of human-machine interaction and participatory design. 10 The notion of using intelligent machines to generate and evaluate floor plans has been a persistent theme in data-supported architectural design. Today, similar questions are being explored, but under different computational conditions and with more sophisticated digital tools.
In automated floor plan generation, the format of the generated output is a crucial factor for flexibility and further editing. Compared to raster images, vector graphic formats (e.g., DWG and SVG) offer better integration into the computer-aided architectural design (CAAD) workflow, 11 better performance at global reasoning, 12 and no need for post-processing. 13 Moreover, owing to the inherent characteristics of architectural technical drawings, such as collinearity, orthogonality, and corner sharing between adjacent spaces, generating floor plans is fundamentally different from generating natural images or language.11,12 As a result, developing intelligent machines for floor plan generation is challenging, especially when the inputs of both machines and designers are considered in the design process.
In this study, the interplay among
Figure 1. The framework of the interplay among data, machine, and human in a floor plan generation task.
Related work
The related works are organized based on the three bubbles: data, machine, and designer (Figure 1). Specifically, the following is discussed: the publicly available floor plan datasets; the involvement of machines in the floor plan generation task; the involvement of the designer in the evaluation process; and their interrelationships.
Floor plan datasets
Data plays a critical role in relation to both machine and designer. Data determines what, and to what extent, generative models can learn and, depending on its format, which evaluation strategies can be used. Architectural design datasets vary in complexity, annotation style, and size, as they are tailored to different purposes. 14 For instance, ROBIN 15 was designed for plan retrieval, and HouseExpo 16 was compiled for indoor layout learning. Since the affordances of the employed dataset strongly affect the problem formulation and the complexity of floor plan experiments, choosing a suitable dataset is critical. In floor plan generation studies specifically, the RPLAN dataset has been widely used.11,13,17–22 Despite its availability and size, RPLAN lacks some characteristics of real-world design: its floor layouts are limited to single apartment units (as opposed to floors with multiple units), and it contains no data on elevation or furniture.
As technical architectural drawings convey not only geometrical but also topological and semantic information, a comprehensive analysis of floor plans requires sufficient data on all three levels. 14 More recently, studies have been conducted on Swiss Dwellings (SD), 23 a dataset providing comprehensive geometrical, environmental, and infrastructural data on 45k apartments in Switzerland. Although previous research has investigated the importance of the feature selection process in predicting room labels 24 and the visualization of the micro-climate context of floor plan instances in SD, 25 no generation task has yet been performed on this dataset.
Floor plan generation
Machines often play an intermediate role between data and designer, linking the curated data with the designers’ critical insights through floor plan generation and evaluation. Automated floor layout generation has attracted considerable attention over recent decades, even before data-driven methods began to flourish.26,27 Since the advent of deep learning (DL), 28 learning-based methods have become the de facto approach to generating floor plans.11–13,17–22,29 The inputs to these generative models have taken different formats, addressing geometrical constraints, topological requirements, or both. Accordingly, learning-based floor plan generation studies can be divided into three groups by input modality: building boundary, graph, and combined building boundary and graph.
Building boundary as input
Floor plan generation workflows that take the building boundary as fixed input are closer to real-world design circumstances, in which a building must be designed for a specific site. In the RPLAN study, 17 an encoder-decoder network was trained to predict the positions of internal walls, given the fixed building boundary as input together with predicted room types. A post-processing step then converted the pixel-level wall predictions into a vectorized representation. Later, the WallPlan model 13 also considered the front door location as a fixed input besides the building boundary. After the building boundary was initialized with predicted windows, a graph generation network and a semantics generation network were combined to predict the internal walls as well as the room types.
Graph as input
The input type shifted from the building boundary to a graph, representing the bubble diagram of the architectural design workflow, in order to gain control over the spatial relationships between spaces. In the HouseGAN model, 12 a generative adversarial network (GAN) was trained on the rooms’ types, number, and spatial adjacencies, gathered in a single input graph. Subject to the defined architectural constraints, the model produced a set of axis-aligned room bounding boxes as possible solutions. Integrating the same network strategy with a conditional GAN, the HouseGAN++ model 19 was trained to convert input bubble diagrams into segmentation masks of rooms and doors. With a similar layout connectivity graph as input, the HouseDiffusion model 20 was later introduced to generate vector floor plans directly by predicting the coordinates of rooms and doors through a denoising diffusion process. In another study using the connection graph as input, the GTGAN model 29 employed a graph transformer GAN to control the relations between rooms represented as graph nodes. As these studies show, using a graph as the input to the floor plan generation pipeline preserves the topological requirements but overlooks geometrical constraints (e.g., area or building boundary).
Building boundary and graph as inputs
Considering both the building boundary and a graph as inputs to the floor plan generation pipeline addresses both geometrical and topological design constraints. The Graph2Plan model 18 was devised to take the building boundary and user constraints, in the form of room numbers, locations, and adjacencies, as input to the floor plan generation framework. In this pipeline, graph layouts were retrieved from a dataset and then adjusted to the given input boundary, yielding a set of room bounding boxes and a raster (i.e., pixel-based) floor plan image. In another study, introducing the FLNet model, 22 user inputs comprised the building boundary, room types, and spatial relationships expressed as graphs. The space layout obtained from the embedded vectors of the input graph was aligned inside the given boundary, resulting in a raster floor layout. Although the two studies differed in the order in which boundary and graph constraints were applied, both produced floor plans satisfying topological and geometrical requirements.
Floor plan evaluation
As the last element of the presented framework, evaluation of generated floor plans influences the way the performance of computer vision models is measured and the extent to which the designer’s insights are incorporated. In general, two types of evaluation methods have been implemented in the related studies. The first category incorporated quantitative analysis of the generated floor plans such as diversity, compatibility, and mean positional error (MPE), whereas the second category corresponded to qualitative analysis (also referred to as user study).
Quantitative approach in floor plan evaluation
Different quantifiable metrics have been either adopted from other research domains or devised specifically for floor plan analysis. One of the earliest methods computed the actual distance, in meters, from the predicted locations of generated internal walls to those in the ground truth. 17 This measure is limited to applications in which the correspondence between each pixel of the floor plan image and real-world measurements is known. Beyond distances between corresponding elements, the diversity of the generated floor plans has also been used as a quantifiable metric of the creativity of the trained computer vision models. Diversity has been assessed through the Fréchet inception distance (FID),11,12,19,20,29 which compares the distribution of generated floor plan images with the distribution of a set of real images known as the ground truth. Moreover, in studies that represent floor plans as graphs, the graph edit distance 30 was employed to assess the topological aspects of the generated layouts. This metric quantifies the compatibility between the input bubble diagram and the one reconstructed from the generated floor plan.12,19,20,29 Building on both image and graph distances, a metric called SSIG was later introduced to evaluate the structural similarity of floor plans. 31 SSIG was designed to address the lack of structural awareness of pixel-based metrics (e.g., Intersection-over-Union) as well as the practical difficulties of pairwise graph matching.
Qualitative approach in floor plan evaluation
Besides the abovementioned quantitative analysis, some studies employed qualitative assessments in the form of user studies to integrate expert insights into the floor plan generation process. In earlier studies, a group of participants assessed the plausibility of the generated floor plans against real ones, 18 in one case also including a vigilance test to verify the experimental results. 17 With the same idea, the
Combining domain-specific quantitative metrics with user studies would yield an effective mixed-methods framework for evaluating floor plans. Despite the importance of incorporating expert knowledge into floor plan generation workflows, the quantitative evaluation approach has received considerably more attention in previous studies. Possible reasons are the relative ease of computing such metrics and the lack of access to expert evaluators. Moreover, all the studies that followed the qualitative approach sought only to verify the realism of the generated floor plans. However, a comprehensive qualitative assessment of architectural layouts goes beyond realism, also addressing topological and geometrical aspects.
Current study scope
Benchmark datasets for training and evaluating generative methods are crucial for the development of deep learning models. In this study, the focus is on evaluating the floor plans generated by two computer vision models trained on the MSD dataset. Specifically, attention is given to human evaluation beyond merely comparing the results with dataset samples. Accordingly, the novelties of this study are as follows:
• Applying two computer vision models trained on the newly introduced MSD dataset to the floor plan generation task.
• Proposing a hybrid evaluation scheme for AI-generated floor plans.
• Highlighting the role of the designer in intelligent floor plan generation and evaluation.
The remainder of the paper is organized as follows: the methods and materials, including the dataset, the computer vision models, and the hybrid evaluation scheme for generated floor plans, are presented in Section 3; the quantitative and qualitative results of the floor plan evaluation are reported in Section 4; and the discussion and conclusion, including the interpretation of the results, the limitations of the current study, and further developments, are given in Sections 5 and 6, respectively.
Method and materials
Based on the framework presented in Figure 1, the structure of this section also contains three main aspects, each corresponding to one of the “data”, “machine”, and “designer” bubbles. Accordingly, the development of a novel floor plan dataset, the specifications of two computer vision models trained on the introduced dataset, and the details of the hybrid floor plan evaluation scheme are presented.
Modified Swiss dwelling (MSD) dataset
MSD 32 originates from the SD database, 23 an extensive collection of building layouts in numerical format. The geometrical data in SD is stored in a DataFrame in which each row is an architectural entity describing either a stand-alone space (e.g., living room, corridor) or an element (e.g., wall segment, window, furniture). The columns hold the related data of each architectural entity, for instance, “geometry” as a polygon in WKT format, “entity type” categorizing the entity as, for example, “feature” or “area”, and an identifier (ID) that associates the geometrical detail with the site, building, plan, floor, apartment, and unit to which it belongs. The SD dataset was further processed into MSD by visualizing the plans and curating a well-balanced, ML-ready dataset. The dataset focuses on medium- to large-scale residential buildings, excludes mixed-use floor plans, and minimizes similar data instances. The cleaning process includes:
• Removing features such as kitchen sinks and bathtubs
• Filtering out non-residential floor layouts by detecting certain room types
• Sampling one floor level per building to avoid duplicate data instances
• Keeping more complex layouts by eliminating layouts with few areas
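The four cleaning steps listed above can be sketched as a simple filter over per-floor records. This is a minimal illustration under stated assumptions, not the MSD curation code: the record fields (`building_id`, `areas`, `features`), the sets of dropped feature types and non-residential room types, and the `min_areas` cut-off are all hypothetical, since the text does not specify them.

```python
import random
from collections import defaultdict

# Hypothetical labels: the text names kitchen sinks and bathtubs as removed
# features but does not list the exact SD type names or non-residential rooms.
DROPPED_FEATURES = {"kitchen_sink", "bathtub"}
NON_RESIDENTIAL_TYPES = {"office", "shop", "restaurant"}


def clean_plans(records, min_areas=15, seed=0):
    """Apply MSD-style cleaning steps to a list of per-floor records.

    Each record is a hypothetical dict with keys 'building_id',
    'areas' (list of room-type strings) and 'features' (list of
    feature-type strings). `min_areas` is an illustrative cut-off.
    """
    rng = random.Random(seed)
    floors_by_building = defaultdict(list)
    for rec in records:
        # Remove features such as kitchen sinks and bathtubs (copy, no mutation).
        rec = dict(rec, features=[f for f in rec["features"]
                                  if f not in DROPPED_FEATURES])
        # Filter out non-residential layouts by detecting certain room types.
        if any(a in NON_RESIDENTIAL_TYPES for a in rec["areas"]):
            continue
        # Keep more complex layouts by dropping those with few areas.
        if len(rec["areas"]) < min_areas:
            continue
        floors_by_building[rec["building_id"]].append(rec)
    # Sample one floor level per building to avoid near-duplicate instances.
    # Sampling last is an ordering assumption: it picks among floors that
    # survived the other filters.
    return [rng.choice(floors_by_building[b]) for b in sorted(floors_by_building)]
```

Doing the per-building sampling last guarantees that every retained building contributes exactly one floor that already satisfies the other criteria.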
Ultimately, MSD contains 5,372 floor plans. Each floor plan is defined as an image in which the pixel value indicates the room type. In addition, a corresponding access graph is attached to each floor plan. The nodes of the graph are defined by room type, zone type, centroid, and polygon. The edges of the graph indicate access connectivity and are of type “door”, “front door”, or “passage”.
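The access graph described above can be held in a small container. The sketch below is illustrative only: the `AccessGraph` class, its method names, and the storage layout are hypothetical and do not reflect the MSD file format. It derives each node's centroid from its polygon with the standard shoelace (area-weighted) formula and restricts edges to the three connection types named in the text.

```python
from dataclasses import dataclass, field

# Connection types named in the text for access-graph edges.
EDGE_TYPES = {"door", "front door", "passage"}


def polygon_centroid(points):
    """Area-weighted centroid of a simple, non-degenerate polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2); works for either orientation.
    return (cx / (3 * area2), cy / (3 * area2))


@dataclass
class AccessGraph:
    """Hypothetical container: nodes keyed by room id, edges as (u, v, type)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_room(self, node_id, room_type, zone_type, polygon):
        self.nodes[node_id] = {
            "room_type": room_type,
            "zone_type": zone_type,
            "centroid": polygon_centroid(polygon),
            "polygon": list(polygon),
        }

    def connect(self, u, v, kind):
        if kind not in EDGE_TYPES:
            raise ValueError(f"unknown connection type: {kind!r}")
        self.edges.append((u, v, kind))
```

For example, adding a 2 × 2 square bedroom at the origin stores the centroid `(1.0, 1.0)`, and connecting it with `"window"` raises a `ValueError`.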
Computer vision models
Even though the focus of this paper is the evaluation of the generated floor plans, we provide the technical details of the baseline computer vision models so that the results can be related to each approach. From the floor plan generation competition on the MSD dataset, 33 the two top-ranked approaches were selected for evaluation in our study. The inputs are the structural elements of the floor plan (i.e., external walls and interior load-bearing walls) and an access graph representing the connections between grouped areas (i.e., zones). The zones are defined based on the environmental and architectural requirements of interior residential spaces as follows: (1) zone 1: bedrooms; (2) zone 2: living room, kitchen, dining room, etc.; (3) zone 3: storerooms, bathrooms, and toilets; and (4) zone 4: balconies. The models are trained to turn these inputs into a complete floor plan. The two baselines are referred to as (1) the segmentation-based and (2) the generative-based model.
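The zoning described above can be illustrated by mapping room types to the four zones and collapsing a room-level access graph into a zone-level one. This is a hedged sketch: the room-type strings are hypothetical, and merging directly connected rooms of the same zone into one zone node (via union-find) is our assumption about how areas are grouped, not a documented MSD procedure.

```python
# Zone assignment from the text; the room-type strings are hypothetical labels.
ROOM_TO_ZONE = {
    "bedroom": 1,
    "living room": 2, "kitchen": 2, "dining room": 2,
    "storeroom": 3, "bathroom": 3, "toilet": 3,
    "balcony": 4,
}


def zone_graph(room_types, access_edges):
    """Collapse a room-level access graph into a zone-level graph.

    room_types: dict mapping room id -> room-type string.
    access_edges: iterable of (u, v) room-id pairs.
    Returns (zone_of_node, zone_edges): zone nodes are representative room
    ids, and zone_edges is a set of frozenset pairs of such ids.
    """
    zone = {r: ROOM_TO_ZONE[t] for r, t in room_types.items()}
    parent = {r: r for r in room_types}

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    access_edges = list(access_edges)
    # Merge directly connected rooms that belong to the same zone (assumption).
    for u, v in access_edges:
        if zone[u] == zone[v]:
            parent[find(u)] = find(v)
    zone_of_node = {find(r): zone[r] for r in room_types}
    zone_edges = {frozenset((find(u), find(v)))
                  for u, v in access_edges if find(u) != find(v)}
    return zone_of_node, zone_edges
```

With a living room connected to a kitchen, a bedroom, and a bathroom, the living room and kitchen merge into a single zone-2 node, leaving three zone nodes and two zone edges.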
U-GCN
In the segmentation-based approach, 34 a U-Net 35 was used to encode the building structure into a compressed feature representation and subsequently decode this representation into the rasterized floor plan. To condition the U-Net on the zoning graph, a graph convolutional network (GCN) 36 was used to encode the zoning graph into a feature vector. This feature vector was concatenated with the latent representation of the U-Net, and the result was used as input to the decoder part of the U-Net. The U-Net and GCN were updated simultaneously throughout training. In addition, the pre-trained
• The encoder of the U-Net consisted of four down-sampling convolutional blocks, each doubling the number of channels (64 → 128 → 256 → 512). Each block comprised a learnable 3 × 3 convolution, batch normalization, ReLU activation, and 2 × 2 max pooling, in that order. The decoder had the same structure as the encoder but used up-sampling convolutional layers.
• The GCN was a stack of two graph convolutional layers, each with a hidden node feature dimension of 256. Global mean pooling over the output node features was used to compute the graph-level feature vector.
• A multi-class cross-entropy loss on the pixel predictions was used, together with the Adam optimizer with an initial learning rate of 0.001 and a batch size of 16.
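The GCN conditioning path described above can be sketched with plain Python lists to show the data flow. It implements the standard GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by global mean pooling and concatenation with a flattened bottleneck vector. The toy dimensions, applying ReLU after every layer, and concatenating onto a flat vector (rather than combining with the spatial bottleneck grid) are simplifying assumptions. The model described above used two layers with 256 hidden features; in practice, a library layer such as PyTorch Geometric's `GCNConv` would typically play this role.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, h, w):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), with D the degrees of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(norm, matmul(h, w))
    return [[max(0.0, x) for x in row] for row in out]


def mean_pool(h):
    """Global mean pooling over nodes -> one graph-level feature vector."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


def condition_bottleneck(bottleneck, adj, node_feats, weights):
    """Encode the zoning graph with stacked GCN layers, pool it, and append
    the graph vector to a (flattened) U-Net bottleneck feature vector."""
    h = node_feats
    for w in weights:
        h = gcn_layer(adj, h, w)
    return list(bottleneck) + mean_pool(h)
```

On a two-node graph with scalar features 1 and 3 and an identity weight, the normalized propagation averages the two nodes to 2 each, so the pooled graph vector `[2.0]` is appended to the bottleneck.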
Modified HouseDiffusion
In the generative-based approach, 38 the Modified HouseDiffusion (MHD) model was introduced. Instead of denoising pixels, as conventional diffusion models do, MHD denoises the 2D coordinates of the areas. The MHD architecture is based on learning the relations between the corner points of each room polygon. To condition the model on the building structure, an extra attention module was added between all corner points of the room polygons and the corner points of the building structure. In addition, to associate the correct room type with the given zone type, a graph attention network (GAT) 39 was trained separately. The floor plans generated by the two baseline models were further investigated through a hybrid evaluation scheme. Model and training details are as follows:
• The GAT model consisted of five consecutive graph attention layers. Initial node features were one-hot encodings of the zone type, and edge features depended on the connectivity (i.e., door or wall). The hidden node feature dimension was set to 64, with a ReLU between consecutive layers. The output node feature dimension was equal to the number of room types. The cross-entropy loss between the output node features and the ground truth (a one-hot encoding of the room types) was employed. Moreover, Adam was used with an initial learning rate of 0.001 and a batch size of 128, together with dropout (0.2) for regularization and early stopping.
• The MHD model was built on
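The idea of denoising polygon corner coordinates instead of pixels can be illustrated with the forward (noising) process of a standard DDPM applied to 2D points. This is a generic sketch, not MHD's code: the linear beta schedule, the 1000 steps, and the assumption that coordinates are normalized to roughly [-1, 1] are illustrative choices. A denoising network, conditioned in MHD through attention between room corners and the building structure, would be trained to invert this process.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_corners(corners, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) x_0, (1 - a_t) I) for 2D corners.

    Returns the noisy corners and the Gaussian noise used, which is the usual
    regression target of an epsilon-predicting denoiser.
    """
    a_t = alpha_bars[t]
    noisy, noise = [], []
    for x, y in corners:
        ex, ey = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        noisy.append((math.sqrt(a_t) * x + math.sqrt(1 - a_t) * ex,
                      math.sqrt(a_t) * y + math.sqrt(1 - a_t) * ey))
        noise.append((ex, ey))
    return noisy, noise
```

Because each corner is noised independently but all corners of a plan share the same step `t`, the denoiser must use the relations between corners (and, in MHD, the building structure) to recover coherent room polygons.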
Hybrid evaluation scheme of generated floor plans
Given the necessity of keeping the designer in the loop of the design process, this study proposes a hybrid evaluation scheme for assessing generated floor plans (Figure 2). The scheme consists of two main steps performed sequentially. To test the trained generative models, a sample of the generated floor plans first undergoes a quantitative evaluation, in which the visual and topological similarities between the generated floor plans and the corresponding ground truth from the dataset are measured. Afterward, a fraction of the top results (i.e., those passing the quality threshold) is filtered to go through a qualitative evaluation, in which the features that are better captured by expert human vision are assessed. This yields the final top results. The two steps and their sub-steps are detailed in the following subsections.
Figure 2. The proposed hybrid evaluation scheme for the assessment of generated floor plans.
Quantitative evaluation
The generated floor plans of the two models were evaluated quantitatively at both the image pixel level and the graph level. At the pixel level, the mean Intersection-over-Union (MIoU) is used as a similarity measure between the generated and ground truth floor plans. MIoU derives from a simpler metric, IoU, which divides the pixel-wise intersection of a floor plan pair (the predicted one and the corresponding ground truth) by their union (i.e., combined) pixel area. MIoU goes one step further by measuring the overlap between two floor plans as the average over spaces. This metric captures features related to the number of pixels in different regions of the floor plan, which architecturally translates to the area of each room. MIoU values range from 0 to 1, with higher values indicating greater overlap between the predicted and ground truth floor plans.
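The IoU and MIoU computations described above can be sketched on label maps stored as nested lists, where each pixel holds a room-type id. Averaging over every class present in either map, and the optional `ignore` label (e.g., for background), are assumptions; the text does not state how absent classes or background were handled.

```python
def class_iou(pred, gt, cls):
    """IoU of one class: |pred == cls AND gt == cls| / |pred == cls OR gt == cls|."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            in_p, in_g = p == cls, g == cls
            inter += int(in_p and in_g)
            union += int(in_p or in_g)
    return inter / union


def mean_iou(pred, gt, ignore=None):
    """Average per-class IoU over the classes appearing in either label map."""
    classes = {v for row in pred for v in row} | {v for row in gt for v in row}
    classes.discard(ignore)
    return sum(class_iou(pred, gt, c) for c in classes) / len(classes)
```

For `pred = [[0, 1], [1, 1]]` and `gt = [[0, 1], [0, 1]]`, class 0 scores 1/2 and class 1 scores 2/3, so the MIoU is their mean; with `ignore=0`, only class 1 contributes.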
At the graph level, the consistency between the predicted and ground truth graphs is checked. To compare them, adjacency graphs were first extracted from the predicted geometries. Graph compatibility is then measured as the number of edges in the ground truth room graph that are retained in the predicted adjacency graph, divided by the total number of edges in the ground truth room graph. The graph compatibility metric thus measures the topological qualities of a floor plan. Like MIoU, it ranges from 0 to 1, with values closer to 1 indicating higher topological compatibility between the predicted and ground truth floor plans.
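The graph compatibility metric can be sketched as the share of ground truth edges retained in the predicted adjacency graph. The adjacency extraction shown here is a simplified, hypothetical stand-in for extraction from predicted geometries: it works on a pixel map of room instance ids, assumes adjacent rooms touch directly (no wall pixels between them), and treats two rooms as adjacent when 4-neighbouring pixels carry their ids, with optional `ignore` ids (e.g., the exterior) skipped. It also assumes predicted and ground truth rooms share ids; how rooms were matched is not described in the text.

```python
def adjacency_from_instances(inst, ignore=()):
    """Edges {a, b} for every pair of different room ids on 4-neighbouring pixels."""
    edges = set()
    rows, cols = len(inst), len(inst[0])
    for i in range(rows):
        for j in range(cols):
            # Checking only the right and lower neighbours visits each pixel pair once.
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    a, b = inst[i][j], inst[ni][nj]
                    if a != b and a not in ignore and b not in ignore:
                        edges.add(frozenset((a, b)))
    return edges


def graph_compatibility(gt_edges, pred_edges):
    """Fraction of ground truth edges that also appear in the predicted graph."""
    gt = {frozenset(e) for e in gt_edges}
    if not gt:
        raise ValueError("ground truth graph has no edges")
    pred = {frozenset(e) for e in pred_edges}
    return len(gt & pred) / len(gt)
```

Using unordered `frozenset` pairs makes the comparison independent of edge direction, which matches the undirected nature of room adjacency.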
In the quantitative evaluation phase, 800 floor layout samples were initially selected as the test set. To ensure a balanced distribution of simple and complex floor plans, the test set was divided into five subsets based on the number of spaces. Afterwards, from each subset, 10 samples were randomly selected among those passing the subset-specific IoU threshold. The IoU thresholds for the subsets with 15–19, 20–29, 30–39, 40–49, and 50+ areas were set to 0.35, 0.35, 0.30, 0.25, and 0.22, respectively. As a result of this filtering process, 50 floor plans per model qualified for the qualitative evaluation step, to be further investigated by a group of knowledgeable users with an architectural background.
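The subset-wise filtering above can be sketched as follows and run once per model. The subset bounds and IoU thresholds are taken from the text; the `(sample_id, area_count, iou)` tuple layout, treating "passing" as an IoU greater than or equal to the threshold, the fixed random seed, and keeping all passing samples when fewer than 10 pass are assumptions.

```python
import random

# (min areas, max areas or None, IoU threshold), as listed in the text.
SUBSETS = [(15, 19, 0.35), (20, 29, 0.35), (30, 39, 0.30),
           (40, 49, 0.25), (50, None, 0.22)]


def select_for_user_study(samples, per_subset=10, seed=0):
    """Pick up to `per_subset` random samples per size subset that pass its IoU threshold.

    samples: iterable of (sample_id, area_count, iou) tuples.
    """
    samples = list(samples)
    rng = random.Random(seed)
    chosen = []
    for lo, hi, threshold in SUBSETS:
        passing = [sid for sid, n_areas, score in samples
                   if n_areas >= lo and (hi is None or n_areas <= hi)
                   and score >= threshold]
        # If fewer samples pass than requested, all of them are kept (assumption).
        chosen.extend(rng.sample(passing, min(per_subset, len(passing))))
    return chosen
```

With five subsets and 10 samples each, a model with enough passing samples yields the 50 floor plans per model that enter the qualitative step.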
Qualitative evaluation
In the qualitative evaluation phase, the instances filtered in the quantitative evaluation step went through a user study with 14 participants with an architectural background. A survey was arranged comprising 102 images: 50 floor plans from each model plus two ground truth floor plans for the vigilance test. The participants were asked to evaluate the given floor plans following a step-by-step approach (Figure 3). Based on the vigilance test, the survey results of participants who did not answer the questions on the ground truth floor plans correctly were disregarded, which allowed a more reliable interpretation of the survey results. Consequently, the evaluations of 10 participants qualified for further processing.
Figure 3. The qualitative evaluation steps, the scoring system for each step, and the answer options.
More specifically, the qualitative evaluation of the generated floor plans followed a high-level to detailed approach. Each step offered three answer options for the related question. First, it was checked whether the different units on a building floor level are sufficiently distinguishable. In this study, a “unit” is a complete dwelling arrangement containing all the required spaces. Second, the typology of the dwelling was assessed by considering the number, or the presence or absence, of certain room types. After these first two steps, participants could choose to continue with the assessment of room proportions, topology requirements, or both. The qualitative evaluation process and the scoring system are illustrated in Figure 3. The scores of all steps sum to 1, and each step’s score is divided linearly among its answer options. Of the two parallel steps, the room proportions step was assigned a higher weight than the topology requirements step, since the computer vision models were initially conditioned on the zoning graph. The results of the proposed hybrid evaluation scheme are reported in the following section.
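The scoring rule described above, together with the vigilance filter from the previous paragraph, can be sketched as follows. The step names and weights are hypothetical placeholders (the actual values are shown in Figure 3); they only respect the two constraints stated in the text, namely that the weights sum to 1 and that room proportions outweighs topology. Awarding 0, 1/2, or all of a step's weight to the three ordered options is our reading of "divided linearly".

```python
# Hypothetical step weights: they sum to 1 and weight room proportions above
# topology, as the text states; the actual values are given in Figure 3.
STEP_WEIGHTS = {
    "unit_differentiation": 0.25,
    "typology": 0.25,
    "room_proportions": 0.30,
    "topology": 0.20,
}


def step_score(step, option):
    """Divide a step's weight linearly over its three options (0 = worst, 2 = best)."""
    if option not in (0, 1, 2):
        raise ValueError("each step has exactly three answer options")
    return STEP_WEIGHTS[step] * option / 2


def plan_score(answers):
    """Total score of one floor plan: the sum of its per-step scores (max 1)."""
    return sum(step_score(step, option) for step, option in answers.items())


def passes_vigilance(answers_on_ground_truth, expected):
    """Keep a participant only if every answer on the ground truth plans matches."""
    return all(answers_on_ground_truth.get(k) == v for k, v in expected.items())
```

Since all participants assessed both parallel steps, a plan's total can be computed from all four answers; a participant failing `passes_vigilance` would be dropped before averaging.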
Results
Following the steps of the evaluation scheme presented in the previous section, the results are reported in two parts. First, the quantitative results on the 800 sampled floor plans of the two computer vision models are reported. Subsequently, the qualitative results on the 100 floor plans filtered in the previous step are presented.
Quantitative analysis
Quantitative results on floor plan generation for U-GCN and MHD.
Qualitative analysis
User study results for the evaluation of 100 floor plans based on different architectural criteria (the values correspond to the mean scores per user for each criterion and in total).

Generations of MHD and U-GCN. The inputs (zone graph and building structure) are shown in the two leftmost columns; the generations of MHD (with and without WCA) in the next two columns; those of U-GCN in the fifth column; and the ground truth pixel maps in the final column. Examples range from small (top) to large (bottom) building layouts, as measured by the number of rooms. Colours represent the room types.
To measure the agreement among users, Cohen’s kappa coefficient 40 was calculated for each user against the average responses of all users, separately for each model. The coefficient was calculated pairwise on two sequences for each of the four qualitative evaluation criteria. Negative values indicate poor agreement, whereas values in the ranges 0–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 indicate slight, fair, moderate, substantial, and almost perfect agreement, respectively. Based on the distribution of the data, three categories were defined for each criterion using the 45th and 65th percentiles as cut-offs. Accordingly, each user’s responses for unit differentiation, typology, room proportions, and topology were compared with the corresponding average responses. As shown in Table 2, only 20% of the users showed fair to moderate agreement for the U-GCN model, whereas 40% showed fair to substantial agreement on the four criteria for the MHD model.
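The agreement analysis above can be sketched as follows: the scores for one criterion are binned into three categories at the 45th and 65th percentiles of the pooled scores, and Cohen's kappa, (p_o - p_e) / (1 - p_e), is computed between a user's categories and those of the average response. The linear-interpolation percentile, placing scores equal to a cut-off in the lower category, and pooling scores to obtain the cut-offs are assumptions; the text does not specify them.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    rank = p / 100 * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def discretize(scores, cuts):
    """Map scores to categories 0/1/2 using the (low, high) cut-offs."""
    low, high = cuts
    return [0 if s <= low else 1 if s <= high else 2 for s in scores]


def cohens_kappa(a, b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    # Kappa is undefined when chance agreement is already perfect.
    return math.nan if p_e == 1 else (p_o - p_e) / (1 - p_e)


def user_vs_average_kappa(user_scores, avg_scores, pooled_scores):
    """Kappa for one criterion: user vs. average, on 45th/65th-percentile categories."""
    cuts = (percentile(pooled_scores, 45), percentile(pooled_scores, 65))
    return cohens_kappa(discretize(user_scores, cuts), discretize(avg_scores, cuts))
```

For instance, two raters whose labels agree only as often as chance predicts (`[0, 1, 0, 1]` vs. `[0, 0, 1, 1]`) obtain a kappa of 0, which falls outside the fair-or-better range reported in Table 2.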
The scores of the two models are plotted as boxplots in Figure 5, showing the distribution of the user-study scores, with the size of each box indicating the spread of the distribution. The highest mean score of the U-GCN model corresponds to the typology criterion, whereas MHD achieved a higher mean score in the room proportions criterion. Both models performed relatively poorly at producing proper topology, which indicates their shortcomings in satisfying more complex architectural qualities. The most pronounced difference between the two models appears in the room proportions criterion, demonstrating the superiority of the geometry-based model over the pixel-based one in capturing the measurement qualities of spaces.
Figure 5. Qualitative results on floor plan generation for the U-GCN and MHD models.
Discussion
Comparing the overall performance of the two computer vision models trained on the MSD dataset for floor plan generation, the results can be interpreted on two levels. First, the models can be assessed based on the scores obtained at the different stages of the proposed hybrid evaluation scheme, which gives insights into the robustness of different generative models in the floor plan generation task. Second, the role of the designer in this process can be investigated, focusing on the qualitative analysis. Accordingly, helpful guidelines can be derived to further improve human-machine interaction in the building design process. The two interpretations are detailed as follows:
The performance of models
For the MHD model, although the MIoU scores are lowest for the category of floor plans with 50 or more areas, the graph compatibility scores are highest for this category. Regardless of the model, this shows the importance of evaluating generated floor plans against different geometrical and topological criteria, so that diverse architectural qualities are covered in the assessment. Moreover, implementing the hybrid evaluation scheme in this study allowed the different steps of the evaluation framework to be compared. Although the U-GCN model performed better in the quantitative evaluation step, the MHD model obtained higher scores in the qualitative assessment, in addition to lending itself to assessment with graph-based metrics. Conducting the qualitative evaluation also revealed the extent to which the generated floor plans could be assessed: participants were given the option to continue the assessment based on room proportions, topology requirements, or both. Nonetheless, all participants chose to assess all instances on both of the parallel steps. Furthermore, based on the qualitative study, the AI-generated plans scored on average about half as high as the ground truth (Figure 6). More specifically, the ML-based models were more precise in capturing high-level attributes of the floor plans, namely unit differentiation and typology. Future models should therefore also become robust at capturing more detailed features, such as room proportions and topology.
Figure 6. The overall comparison of the U-GCN and MHD models (left: final qualitative evaluation score; right: kappa coefficient values for each user).
The effectiveness of the evaluation scheme
Besides comparing the two computer vision models with the proposed quantitative and qualitative metrics, this study also sheds light on the performance of the evaluation scheme itself. Whereas in previous works the role of expert evaluators was limited to assessing the realism of the outputs (i.e., whether the generated floor plans look like real cases),11,12,19,20,29 the current study employed a step-by-step, high-level to detailed evaluation addressing multiple architectural aspects. Moreover, the low agreement among users in this study indicates that floor plan evaluation is highly subjective (Figure 6, right). Although the number of participants in the qualitative assessment was limited, the main objective of this assessment was to explore the role of the designer in generative floor layout design. The results showed that expert knowledge is a determining factor in assessing AI-generated floor plans. The role of the designer can be interpreted and further strengthened in two ways: either the designer enters the evaluation process later, at more detailed steps, or the models are fine-tuned on the designer’s input in an interactive setting, such as reinforcement learning.41,42
Conclusion
Due to population expansion and climate change, the demand for adequate housing has risen. Given that the housing design process is inherently iterative, time-consuming, and costly, facilitating the design procedure would have a high impact. In this regard, the task of automated floor plan generation has benefited from recent technologies through two different methodological approaches. While computational methods seek to parametrize the design space, in learning-based approaches using computer vision models the parametrization is shifted to the parameters of the neural network. In other words, integrating AI-based techniques into the building design process can address problems that algorithmic approaches struggle to tackle. Whichever approach is implemented, attention must be paid to the climatic and contextual specifications of the building to be designed or assessed.
In this study, a floor plan generation task using a novel dataset was defined, and two computer vision models were trained and tested accordingly. A hybrid evaluation scheme, composed of quantitative and qualitative analyses, was proposed and then applied to the trained models. The results revealed both the competence of the models and the gap that remains for developing more robust models in future iterations. It was shown that, despite the advancements of computer vision models in the task of floor plan generation, they still struggle to capture architectural qualities that can be assessed by expert knowledge. Moreover, examining the role of the designer in the evaluation of AI-generated floor plans highlighted the need to reconsider the evaluation and design pipelines in this field to adapt to new technological advancements. Consequently, floor plan evaluation was found to be a non-trivial task, calling for unified evaluation systems, metrics, and scoring. Furthermore, employing a coherent evaluation scheme makes the comparison of different studies on architectural datasets more feasible.
The limitations of the current study, and therefore the potential for expanding this line of research, concern the number of expert users and the number of floor plans they assessed. The evaluation scheme could benefit from the collective input of more evaluators, thereby reducing possible sources of bias. Moreover, this study focused on assessing the geometrical and topological qualities of generated floor plans; broader analyses could be conducted based on other architectural qualities, such as environmental demands and, in particular, context-specific architectural characteristics and regional guidelines. Further improvements of the current study can be envisioned in the following directions:
• Devising new evaluative metrics tailored to the architectural, technical, and performative assessment of floor plans
• Fine-tuning the current generative models to better capture more detailed architectural qualities, such as room proportions and topology requirements
• Integrating the designer’s input into the AI-driven building design loop, alongside the data and the machine
The research data, including the test floor plans for the qualitative evaluation phase and their corresponding ground truths, can be accessed upon request.
