Abstract
This research introduces Building-Graph-AI, a three-dimensional (3D) building learning and generation framework based on graph theory and generative AI models. The framework aims to encode 3D building models into graph-structured data suitable for training graph neural networks (GNNs) and to generate layered 3D models at the detailed building component level. We test various encoding methods and neural networks, select the most effective method, and define it as Graph-BIM encoding. The results demonstrate that the Graph-BIM encoding method can reconstruct and generate detailed 3D building models from simple geometries and constraints. Compared to existing methods based on voxels, point clouds and 3D fields, Building-Graph-AI excels in learning and generating detailed, hierarchical 3D models at the building component level, such as walls, columns and floors. By bridging the gap between geometric design and AI-based training and generation, this framework enhances the adaptability and efficiency of AI applications in architectural design.
Introduction
In the concept design phase of architecture projects, simple geometries are progressively refined under design constraints to create detailed building models. Recent progress in Artificial Intelligence (AI) attempts to replicate this design workflow.1–3 Nevertheless, existing generative AI models in architecture design face challenges in learning complex three-dimensional (3D) building spatial layouts at a component level. This makes it difficult to generate highly detailed and layered building models, that is, comprehensive representations that organise elements like walls, floors, and ceilings into distinct layers to illustrate their relationships, which are essential in the design phase.
Chang, Cheng, Luo, et al. 4 use voxels, volume elements in 3D space used to analyse volumetric data, to represent building space and generate buildings at the voxel level, but the results demand manual detailing. Signed-distance field (SDF) methods, which define the shape of a 3D object by the distance from any point in space to the object’s surface, focus on producing the outer structure of building models, making it challenging to capture the interactions among building components.5,6 Zhong et al. 7 use graph representations that replace room details with generated building space but cannot predict the components inside the room. For reference, a point cloud is a collection of points representing specific locations in 3D space, a voxel is a volumetric pixel represented as a small cube, and neural fields are neural network functions that assign values such as color and density to spatial coordinates. The difficulty of generating high-detail models end-to-end from geometry is that deep learning models can hardly capture the spatial connections between the geometric properties of building volumes and the elements of complex layouts.
A robust encoding method is needed to address this challenge so that AI can learn how detailed building components relate to each other. In general, encoding refers to mapping high-dimensional data to a low-dimensional space for data downscaling and feature learning. 8 In this work, encoding refers to translating 3D building models into a data format that AI can process. Point clouds,9,10 voxels, 11 and neural fields5,6,12 are common 3D model representations that can be encoded for AI-based learning. These methods primarily concentrate on the spatial attributes of individual elements and have limited capacity to learn how these elements interrelate within a spatial layout.
We introduce an encoding approach that automatically converts Building Information Modeling (BIM) data into graph-structured data. This encoding, which we call Graph-BIM encoding, combines hierarchical BIM data with the strengths of graph-based representation for AI learning. Graph-based data, consisting of nodes and edges, represents relationships between entities. 13 BIM data includes detailed information on each building component, such as walls, doors, and windows, along with the spatial connections linking these elements. 14 These two data types are compatible with each other. By integrating BIM data and graph representations, building components are encoded as nodes, while physical connections, spatial relations, or functional links are encoded as edges. This graph-based structure allows AI to capture the details of every building element and reflect the relationships that define the spatial layouts. Recent work has shown that graph representations can successfully learn building spatial layouts.4,7,15–18 However, these approaches typically face two main challenges. First, they do not handle varied data formats well, making it difficult to learn irregular architectural configurations. Second, they usually depend on room function bubble diagrams or space voxels, which limits the ability to directly control and generate components such as columns and walls. By contrast, Graph-BIM encoding helps AI learn from various BIM data types and reconstruct or produce detailed BIM models under direct control. To test the viability of Graph-BIM, we use Variational Graph Auto-Encoders (VGAE) within a Graph Neural Networks (GNNs) framework.
GNNs are neural networks designed to process graph-structured data, in which graphs model relationships between building entities that can connect to and influence each other. They are therefore well-suited for capturing complex interactions among nodes. 13 For instance, studies on shear wall analysis have shown the capacity of GNNs to work with graph-based data and interpret spatial layouts.17,18 Despite this strength, many GNN applications still require extensive labelling of training data. Consequently, we implement the VGAE model to improve learning efficiency from BIM data. VGAE is a type of neural network that can learn and predict graph-structured data. It comprises an encoder and a decoder, allowing self-supervised learning of BIM data by recognising nodes’ features and their arrangement in the graph and creating a new graph. 19 Recent research has demonstrated how VGAE can represent molecular structures, learn edges and their attributes, and generate new molecular configurations. 20 VGAE has also recently been used to learn floor plan layout generation in architecture. 21 This indicates that VGAE has the potential to learn 3D building spatial layouts encoded as graph-based data, capturing how different building components are positioned relative to one another. Therefore, we aim to identify effective Graph-BIM encoding strategies and predict building layouts by evaluating VGAE’s performance in spatial reconstruction.
Nevertheless, using Graph-BIM encoding and VGAE to produce detailed, layered 3D buildings directly from simple geometries remains challenging. Simple geometric volumes do not provide the internal spatial nodes and edges necessary for VGAE learning and generation. Some approaches use voxel-based methods for partitioning space into regular grids.22–24 These grids can represent beams and columns 25 or approximate objects with variable quadrilaterals. 6 Grid segmentation yields new nodes and edges, providing extensive spatial information for GNNs. This grid-based approach lays the foundation for learning the generative relationship from building volumes to complex building layouts, so that nodes and edges can be predicted as fundamental building components such as walls, columns, doors, and windows. We define the spatial structure of building volumes divided by the 3D grid as the 3D spatial grid structure. Here, nodes represent points in space, such as an intersection or endpoint of a wall, while edges represent possible connections, such as walls, doors, or structural elements. The 3D spatial grid structure overcomes the lack of interior spatial information in simple geometric volumes by inserting richer spatial details into the graph representation, allowing the VGAE to learn and predict the classification of edges.
Graph-BIM encoding, the 3D Spatial Grid Structure and VGAE together compose Building-Graph-AI, an AI generative framework designed for learning and generating detailed, layered 3D building models (Figure 1). Compared to current AI generative models in architecture design, Building-Graph-AI can encode 3D building models into graph-structured data suitable for training GNNs and utilise the trained models for generating layered 3D models at the detailed building component level. This framework can learn spatial layout relationships between design constraints and building volumes, enabling the generated building models to retain precise control over design details at the same component layer. By bridging the gap between geometric design and AI-based training and generation, Building-Graph-AI enhances the adaptability and efficiency of AI applications in architecture design.
Figure 1. Building-Graph-AI learning and generation framework.
Related work
Graph-structured encoding method
Nauata et al.15,16 develop the Housegan model, which generates housing layouts aligned with graph constraints. The Housegan dataset contains approximately 117,000 real floor plans sourced from a Japanese real-estate information service, offering extensive data about spatial properties in real-world architectural practice. It represents rooms as nodes and their adjacencies as edges. Its high-level abstraction, however, overlooks detailed elements such as columns, walls, and windows. Building-GNN encodes voxels as nodes and captures spatial interrelations through edges, thus supporting precise spatial semantics at the voxel level. 7 Despite this, existing methods have difficulty managing irregular architectural layouts and generating architectural components within voxel space. A shear wall layout approach using GNNs instead encodes walls and windows as edges, with component intersections serving as nodes.17,18 It successfully predicts shear wall layouts and demonstrates that GNNs can control and predict specific architectural elements. Nonetheless, it mainly concentrates on shear wall attributes and does not address the entire spatial layout. The limitations of these works lie in predicting detailed building components, accommodating various data types, and handling overall spatial layouts. Graph-BIM Encoding tackles these issues by treating building components like columns and walls as graph nodes and edges, assigning them spatial coordinates and other positional properties such as component orientation and placement. In addition to capturing the relationships among architectural elements, we suggest adding room function connections as special attributes within our encoding implementation.
VGAE models
Simonovsky and Komodakis 26 introduce GraphVAE, a type of variational autoencoder that generates a probabilistic, fully connected graph of predefined size in a single pass through its decoder. Adjusting its probability thresholds controls the likelihood of the existence of nodes and edges, thereby creating molecular graphs (graphical representations of molecules where atoms and chemical bonds are modelled as nodes and edges, respectively). This inspires us to test various threshold values to maintain more comprehensive reconstructions of spatial layouts. In another study, the Permutation-Invariant Variational Autoencoder addresses graph-level learning by resolving the graph reordering problem and clarifying how node sequence uncertainty can be handled. 27 This offers practical guidance for managing node sequences in architectural layouts using VGAE. Additionally, Shi et al. 28 propose a masked label prediction approach, randomly hiding a segment of input label data before making predictions. Following this concept, we hide parts of the spatial layout data to examine the overall space, assessing how well the VGAE model can learn spatial layout properties when paired with Graph-BIM Encoding. Building on these theoretical and applied findings, we use VGAE’s graph reconstruction capacities to evaluate suitable Graph-BIM Encoding methods while examining VGAE’s capabilities for reasoning about building spatial layouts with encoded BIM data.
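The effect of a probability threshold on a decoded graph can be illustrated with a minimal sketch. The edge probabilities and the function name `edges_above_threshold` below are hypothetical and are not taken from GraphVAE or from our implementation; the sketch only shows how a lower threshold retains more candidate edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edges_above_threshold(edge_probs: Dict[Edge, float], threshold: float) -> List[Edge]:
    """Keep candidate edges whose predicted existence probability meets the threshold."""
    return sorted(edge for edge, p in edge_probs.items() if p >= threshold)


# Hypothetical decoder output: probability that a wall exists between two nodes.
edge_probs = {(0, 1): 0.92, (1, 2): 0.55, (0, 2): 0.08, (2, 3): 0.31}

for threshold in (0.3, 0.5, 0.9):
    kept = edges_above_threshold(edge_probs, threshold)
    print(f"threshold={threshold}: {len(kept)} edges kept -> {kept}")
```

A low threshold favours complete layouts at the risk of spurious walls, while a high threshold keeps only confident predictions and may drop walls.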
3D space partitioning
The 3D Spatial Grid Structure is developed to address the lack of interior spatial information in simple geometric volumes by embedding rich spatial detail into graph representations. It draws on previous work in spatial grid partitioning. For example, Hübner et al. 24 use regular voxel grids to reconstruct indoor models at the building component level from unstructured triangle meshes. Chen et al. 22 apply binary space partitioning (BSP) to form structural grids, showing how partitioning strategies can represent architectural elements. Chang and Cheng 25 investigate using graphs to model structural grids, while Shen et al. 6 employ variable quadrilaterals to adapt to complex 3D objects. These studies highlight how grid-based methods capture spatial relationships between components, forming the basis for our proposed 3D Spatial Grid Structure.
Based on these studies, the 3D Spatial Grid Structure partitions geometries into a uniform grid in which nodes represent spatial points, such as wall intersections or endpoints, and edges represent possible connections, such as walls, doors, or structural elements. This method enhances spatial representation by generating latent nodes and edges, allowing the model to predict the class of each edge. By mapping a 3D grid onto building volumes, the 3D Spatial Grid Structure bridges the gap between abstract geometry and detailed architectural models. This approach enables the Building-Graph-AI framework to learn any BIM model based on a linear wall layout through Graph-BIM encoding and to predict 3D building models with detailed layers directly from simple geometries.
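As an illustration of the partitioning idea, the following minimal sketch builds grid nodes and axis-aligned candidate edges for a box-shaped volume. The function `build_spatial_grid`, the integer grid units and the class label 0 for "no component" are assumptions made for this example and do not reproduce our Revit-based implementation.

```python
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[int, int, int]


def build_spatial_grid(nx: int, ny: int, nz: int):
    """Return grid nodes and axis-aligned candidate edges for a box of nx*ny*nz cells."""
    nodes: List[Node] = list(product(range(nx + 1), range(ny + 1), range(nz + 1)))
    index: Dict[Node, int] = {node: i for i, node in enumerate(nodes)}
    edges: List[Tuple[int, int]] = []
    for (x, y, z) in nodes:
        # Connect each node to its +x, +y and +z neighbour so every edge appears once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbour = (x + dx, y + dy, z + dz)
            if neighbour in index:
                edges.append((index[(x, y, z)], index[neighbour]))
    return nodes, edges


# A volume of 4 x 3 x 1 grid cells (assumed example dimensions).
nodes, edges = build_spatial_grid(nx=4, ny=3, nz=1)
edge_class = {edge: 0 for edge in edges}  # 0 = "no component" (assumed label) until predicted
print(len(nodes), "nodes,", len(edges), "candidate edges")
```

Every candidate edge starts unclassified; a trained model would then assign each edge a component class, which is what turns the empty volume into a layout.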
Methods and results
Graph-BIM encoding test
The workflow of the Graph-BIM encoding experiment is shown in Figure 2. We test three different encoding methods and compare the loss between the building models generated by the trained VGAE and the original dataset to determine which encoding approach achieves the best learning performance. A higher loss indicates that the encoding method fails to capture the spatial layout features present in the dataset. Conversely, a lower loss suggests that the encoding method captures the spatial relationships between architectural components and can reconstruct the spatial layout more accurately (Figure 2(a)). The tested encoding methods comprise three combinations of node features and edge attributes, examining how spatial and environmental information affects the GNN’s ability to learn the 3D building spatial layout (Figure 2(b)). In Graph-BIM encoding, the endpoints of walls and non-overlapping columns are encoded as nodes, with features including spatial positions and environmental data. Walls are encoded as edges, with edge attributes representing wall types.
Figure 2. Graph-BIM encoding test experiments workflow.
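The node and edge convention described above can be sketched in a few lines. This is a simplified illustration, not our Dynamo implementation: the `encode_walls` function and the `WALL_TYPES` vocabulary are hypothetical, node features contain only coordinates (environmental data is omitted), and walls are not split at T-junctions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Wall = Tuple[Point, Point, str]

WALL_TYPES = {"exterior": 0, "interior": 1}  # hypothetical wall-type vocabulary


def encode_walls(walls: List[Wall]):
    """Encode wall endpoints as nodes and walls as typed edges."""
    node_index: Dict[Point, int] = {}
    node_features: List[List[float]] = []
    edge_index: List[Tuple[int, int]] = []
    edge_attr: List[int] = []
    for start, end, wall_type in walls:
        for point in (start, end):
            if point not in node_index:  # shared endpoints become a single node
                node_index[point] = len(node_features)
                node_features.append(list(point))
        edge_index.append((node_index[start], node_index[end]))
        edge_attr.append(WALL_TYPES[wall_type])
    return node_features, edge_index, edge_attr


walls = [
    ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), "exterior"),
    ((4.0, 0.0, 0.0), (4.0, 3.0, 0.0), "exterior"),
    ((4.0, 3.0, 0.0), (0.0, 3.0, 0.0), "exterior"),
    ((0.0, 3.0, 0.0), (0.0, 0.0, 0.0), "exterior"),
    ((2.0, 0.0, 0.0), (2.0, 3.0, 0.0), "interior"),
]
node_features, edge_index, edge_attr = encode_walls(walls)
print(len(node_features), "nodes;", "edges:", edge_index, "types:", edge_attr)
```

The three returned lists correspond to the node features, edge index and edge attributes that a GNN library would typically expect as input.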
Graph-structured data
We implement the Graph-BIM Encoding method in Revit Dynamo, a visual programming platform, to automatically convert the Revit 3D models into graph-structured data stored in Microsoft Excel, as shown in Figure 3. We modify the Housegan dataset and transform it into 3D BIM models in Revit. Each 3D BIM model is saved in two Excel files, one for node features and another for the edge index and edge attributes. Revit’s built-in component IDs and essential attributes make it efficient to convert the 3D models into the graph-structured data required by GNNs. Thus, Graph-BIM Encoding enables data conversion and model training from BIM models to GNNs, improving the efficiency of the generative AI model in learning from BIM data.
Figure 3. Graph-BIM encoding transforming 3D BIM dataset into graph-structured data.
VGAE 3D building models learning
The VGAE workflow for learning 3D building models is shown in Figure 4. First, we transform the 3D BIM models into graph-structured data, including node features, edge index, edge class labels and the adjacency matrix, and input the graph data into the VGAE encoder. Then, the encoder learns the building spatial layouts and samples a feature vector (z) from the latent space according to the input graph data. The decoder receives the feature vector and decodes it into the reconstructed edge class matrix representing the building component classes, which is then compared to the original 3D BIM model dataset to compute the loss. A lower loss means that the encoding method learns the building spatial layout features more accurately. Finally, the predicted edge classes can be rebuilt in Revit as detailed building components. We investigate three encoder neural networks, GCNConv, 29 GINEConv, 13 and TransformerConv, 28 across the three encoding methods to identify the most suitable Graph-BIM encoding. Subsequently, we use Graph-BIM encoding to train the VGAE for spatial reasoning by masking some inner walls in the dataset and retaining only the boundary walls.
Figure 4. GNNs-VGAE 3D building models learning workflow.
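For readers unfamiliar with VGAE, the encoder, sampling step and loss can be summarised in the standard variational form below. The notation is our assumption for illustration: X denotes node features, A the adjacency matrix, Z the latent node vectors, E the edge class matrix and p(Z) a standard normal prior. Writing the reconstruction term as a log-likelihood over edge classes is likewise an assumption consistent with the workflow described above, not a statement of the exact loss used in the experiments.

```latex
\begin{aligned}
q(\mathbf{Z}\mid \mathbf{X},\mathbf{A}) &= \prod_{i=1}^{N} \mathcal{N}\!\left(\mathbf{z}_i \mid \boldsymbol{\mu}_i, \operatorname{diag}(\boldsymbol{\sigma}_i^{2})\right), \\
\mathbf{z}_i &= \boldsymbol{\mu}_i + \boldsymbol{\sigma}_i \odot \boldsymbol{\epsilon}_i, \qquad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0},\mathbf{I}), \\
\mathcal{L} &= -\,\mathbb{E}_{q(\mathbf{Z}\mid \mathbf{X},\mathbf{A})}\!\left[\log p(\mathbf{E}\mid \mathbf{Z})\right] + \operatorname{KL}\!\left(q(\mathbf{Z}\mid \mathbf{X},\mathbf{A}) \,\|\, p(\mathbf{Z})\right)
\end{aligned}
```

Here the means and standard deviations are produced by the graph encoder, the first loss term rewards accurate reconstruction of edge classes, and the KL term regularises the latent space.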
Graph-BIM encoding test results
The encoding test results show that Encoding Method 3 has the best learning performance for the building model (Figure 5(a)). Figure 5(b) illustrates the 3D model reconstruction corresponding to the loss of 3D model learning. For every neural network tested, Encoding Method 3 yields the lowest spatial layout reconstruction loss. We apply TransformerGCN for the test, a neural network integrating Graph Convolutional Networks (GCNs) with Transformer models. The Transformer is a deep learning architecture based on self-attention mechanisms, which weigh the importance of different parts of the input data. Under TransformerGCN, the predicted wall positions and types are almost identical to the original model, with a training loss of approximately 0.098, indicating that this method enables the VGAE to effectively capture architectural components and spatial layouts. Encoding Method 1 produces results similar to Method 3, offering a comprehensive reconstruction of wall layouts and types. With TransformerGCN, it achieves a reconstruction loss of 0.184, missing only a few walls. In the GINE model, the reconstructed spatial layout is missing more walls, resulting in a loss of 0.574. By contrast, Encoding Method 2 is the least effective. It predicts irrelevant diagonal walls or misses wall information. In the GINE model experiment, the test dataset reconstruction loss reaches 0.756, suggesting that relying only on environmental information is insufficient for learning building spatial layouts. These results show that while spatial information is crucial, environmental information also captures local features. Ultimately, Encoding Method 3 is selected as the Graph-BIM Encoding.
Figure 5. GNNs-VGAE 3D building models learning results, training loss and 3D model.
3D spatial grid and Graph-BIM encoding
After evaluating the learning capability of the Graph-BIM encoding method, we experiment with the 3D model generation ability of Building-Graph-AI. We integrate the 3D spatial grid to convert the 3D BIM model dataset into graph-structured data, which is then used to train a VGAE model capable of reasoning from simple geometries and design constraints to generate detailed building models (Figure 6). The bounding box of the building is converted into a 3D spatial grid structure, with edges denoting different wall types and nodes representing wall or column endpoints. Edges of the external wall type also form floors and roofs. In addition to the spatial attributes of nodes and edges, design constraints are included, such as the location of the main entrance. This allows us to adjust the entrance orientation and assess whether the trained model can dynamically adapt to generate detailed building models based on the door location. Graph-BIM encoding transforms the dataset into graph-structured data, including node features, edge features and edge class.
Figure 6. Graph-BIM encoding and 3D spatial grid structure applied in 3D BIM dataset.
Dataset processing
The dataset is derived from modifications to the Housegan dataset.15,16 First, we filter the 27,965 original Housegan plans to select plan data that includes the main functional rooms: entrance corridors, kitchens, living rooms, and bedrooms. Subsequently, the walls in these plans are aligned onto a grid with a unit of 1 m. These two-dimensional grid plans are transformed into 3D Revit models that conform to the 3D spatial grid structure, as illustrated in Figure 7. These 3D BIM models are then further converted into graph-structured data within Revit through Graph-BIM Encoding.
Figure 7. Transforming 2D Housegan dataset into 3D BIM training dataset.
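The grid alignment step can be sketched as a simple snapping operation. This is an assumed procedure for illustration: the functions `snap` and `snap_walls`, the half-up rounding rule and the removal of walls that collapse to a single point are our choices here, and the actual preprocessing may differ.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]
Segment = Tuple[Point2D, Point2D]


def snap(value: float, unit: float = 1.0) -> float:
    """Round a coordinate to the nearest grid line (halves round up)."""
    return math.floor(value / unit + 0.5) * unit


def snap_walls(walls: List[Segment], unit: float = 1.0) -> List[Segment]:
    """Snap wall endpoints to the grid and drop walls that collapse to a point."""
    snapped = []
    for (x1, y1), (x2, y2) in walls:
        a = (snap(x1, unit), snap(y1, unit))
        b = (snap(x2, unit), snap(y2, unit))
        if a != b:
            snapped.append((a, b))
    return snapped


# Hypothetical wall segments in metres before alignment.
raw_walls = [((0.2, 0.1), (3.8, 0.1)), ((3.8, 0.1), (3.9, 2.7)), ((1.1, 1.2), (1.3, 1.4))]
grid_walls = snap_walls(raw_walls)
print(grid_walls)
```

A coarser unit removes more small features, which is one reason layout precision depends on grid resolution.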
VGAE 3D building models generation
For the generation component of Building-Graph-AI, we train a VGAE model for spatial reasoning using the Graph-BIM encoding method combined with the 3D spatial grid. This model is capable of predicting detailed, hierarchical 3D building models from simple geometric shapes and entrance locations (Figure 8). During training, we intentionally mask the interior wall layouts of the 3D BIM models, removing part of the edge information, to encourage the model to infer the internal wall arrangements based solely on the basic spatial geometry and the position of the main entrance. We encode the geometry and the door-location design constraints into graph-structured data, comprising node features and adjacency matrices, and input this data to the model. To find the best building generation performance, we investigate various encoder neural networks: GCNConv, 29 SAGEConv, 30 GATConv, 31 and TransformerConv. 28 The VGAE predicts the class of each edge of the 3D spatial grid structure, and the classified edges are then modelled as detailed 3D building models. During model training, we compare the building models predicted from the masked internal spatial layout datasets with the original complete datasets to compute the loss. The trained VGAE model is evaluated through room accessibility assessments to verify its generalisation ability in generating detailed 3D architectural models under different geometric inputs.
Figure 8. GNNs-VGAE 3D building models generation workflow.
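The masking strategy and the comparison against the complete dataset can be illustrated with a small sketch. The edge classes, the predicted probabilities and the use of mean cross-entropy as the loss are assumptions for this example; they are not outputs of the trained model or a statement of the exact loss used in training.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

NO_WALL, EXTERIOR, INTERIOR = 0, 1, 2  # hypothetical edge classes


def mask_interior(edge_class: Dict[Edge, int]) -> Dict[Edge, int]:
    """Hide interior walls so only the boundary remains in the model input."""
    return {e: (NO_WALL if c == INTERIOR else c) for e, c in edge_class.items()}


def cross_entropy(pred: Dict[Edge, List[float]], target: Dict[Edge, int]) -> float:
    """Mean negative log-probability assigned to the true class of each edge."""
    return -sum(math.log(pred[e][c]) for e, c in target.items()) / len(target)


original = {(0, 1): EXTERIOR, (1, 2): EXTERIOR, (1, 3): INTERIOR, (0, 3): NO_WALL}
model_input = mask_interior(original)

# Hypothetical class probabilities a model might output for the masked input.
predicted = {
    (0, 1): [0.05, 0.90, 0.05],
    (1, 2): [0.10, 0.85, 0.05],
    (1, 3): [0.20, 0.10, 0.70],
    (0, 3): [0.80, 0.10, 0.10],
}
loss = cross_entropy(predicted, original)
print(f"masked input: {model_input}")
print(f"reconstruction loss: {loss:.3f}")
```

Because the target is the unmasked layout, the loss only decreases when the model recovers the hidden interior walls from the boundary and the entrance constraint.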
3D Building models generation results
As shown in Figure 9(a), the VGAE model using the SAGEConv encoder produces the best training results, improving notably as the data volume increases, with the model loss reaching 0.45. The SAGEConv encoder is part of the Graph Sample and Aggregate (GraphSAGE) framework, a convolutional neural network designed to generate node embeddings by aggregating features from local neighborhoods. In our test, it demonstrates a better understanding of the spatial relationships between building components in the dataset. Figure 9(b) illustrates that the VGAE model can predict building layouts and components from simple geometries and design constraints. We define room accessibility as an evaluation metric. In Figure 9(b) (1), when the main entrance is placed on the west, east, or south side, the generated room arrangements and component positions associated with the entrance have accessibility above 75%. Under irregular building volumes, the model sometimes produces enclosed spaces without doors, such as in the first layout of Figure 9(b) (2), reducing accessibility to 45%. However, the other two layouts reach 100% accessibility, indicating robust generalisation. In Figure 9(b) (3), all the generated building layouts have accessibility above 80%. The VGAE using the SAGEConv network successfully transfers learned spatial layout features to new design tasks, improving the efficiency of creating final building models.
Figure 9. GNNs-VGAE 3D building models generation results, training loss and 3D model.
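One plausible way to compute a room accessibility score is the share of rooms reachable from the main entrance through doors. The sketch below rests on that assumption; the function `room_accessibility`, the room names and the door adjacency are hypothetical, and the exact metric used in the evaluation may be defined differently.

```python
from collections import deque
from typing import Dict, List, Set


def room_accessibility(doors: Dict[str, List[str]], rooms: List[str], entrance: str) -> float:
    """Fraction of rooms reachable from the entrance via door connections (BFS)."""
    reachable: Set[str] = {entrance}
    queue = deque([entrance])
    while queue:
        current = queue.popleft()
        for neighbour in doors.get(current, []):
            if neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)
    return sum(room in reachable for room in rooms) / len(rooms)


rooms = ["corridor", "living", "kitchen", "bedroom"]
# Hypothetical door adjacency; the bedroom is enclosed without a door.
doors = {
    "corridor": ["living"],
    "living": ["corridor", "kitchen"],
    "kitchen": ["living"],
}
score = room_accessibility(doors, rooms, entrance="corridor")
print(f"accessibility: {score:.0%}")
```

Under this definition, an enclosed room without a door lowers the score, which matches the failure case observed for irregular building volumes.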
Editable building components in layered structure
Our results show that the generated building models can be edited easily by replacing components of the same type. In Figure 10(a), exterior walls are replaced with glass curtain walls, or external and internal walls can be switched to different brick walls with various materials. Window components on these external walls are similarly replaceable with alternative types (Figure 10(c)). In design tasks, this layered structure significantly reduces the time architects spend adjusting detailed components, providing flexibility similar to manually created end models.
Figure 10. Generated layered 3D building models replaced with different types of building components.
Limitations and future
The dataset utilised in this study primarily consists of spatial layouts defined by walls and columns, which are the foundations for modelling. However, it lacks representation of more detailed building components such as isolated windows, doors, and furniture. Incorporating these elements into the Graph-BIM encoding method would enhance the model’s ability to learn and generate more comprehensive 3D models. The dataset, derived from the Housegan dataset, also presents several shortcomings, including inaccuracies and a lack of diverse room functions. These issues pose challenges during model training, limiting the framework’s ability to comprehend spatial relationships between nodes and edges. Additionally, the constraints and applicability of the Housegan dataset result in orthogonal layouts, limiting the ability to model irregular or non-orthogonal building components. The accuracy of generated spatial layouts depends on grid resolution: smaller grid units can improve layout precision but increase the computational cost, making high-resolution models computationally intensive.
Future work should prioritise improving dataset quality and diversity. We encourage researchers to explore the generation of non-orthogonal and curved shapes and to achieve higher grid resolution while balancing computational efficiency. In the experiments, we employ different VGAE encoder neural networks to aggregate node information. However, integrating edge attributes into node features remains challenging. A neural network model capable of precisely aggregating node and edge attributes would deepen the framework’s understanding of spatial layouts, enhancing its graph reconstruction and generation capabilities. Future work could explore incorporating a transformer 32 or recurrent neural networks (RNNs)33,34 to infer and generate additional nodes and edges.
During the graph generation phase, the Building-Graph-AI framework predicts edge types based on the original graph’s node features. While effective, this process does not generate new nodes or infer overall spatial layouts from minimal input data. Training AI models to generate nodes autonomously could enable the inference of entire spatial layouts, enhancing the framework’s generative capabilities. The framework processes building volumes with straight edges into 3D spatial graph structures, making it well-suited for grid-based building layouts. Nevertheless, it is limited in representing and fitting complex volumes, such as those with intricate surfaces or curved geometries. Future research could integrate neural implicit fields 35 and topologically extendable quadrilateral networks, enabling the model to learn 3D structures of varying geometries more precisely.
In the current study, we evaluate the encoding methods’ ability to learn spatial layouts using loss metrics, and validate the effectiveness of the building models through room accessibility assessments. This approach partially simulates the evaluation process used in practical architectural design. However, to align more closely with real-world design practices, we plan to introduce more sophisticated evaluation strategies in future work, such as involving professional architects and industry stakeholders to assess the generated results. These assessments would focus on specific parameters such as accessibility, functional layout, and space utilization, helping to clearly define directions for model optimization and improvement.
Moreover, the current training dataset, derived from the HouseGAN model based on Japanese residential layouts, incorporates extracted design constraints, specifically the orientation of the main entrance. By setting the main entrance location and geometric form, pretrained VGAE can predict detailed 3D spatial layouts. This has inevitably limited the trained model to the spatial characteristics of the current dataset. As more diverse architectural datasets become available through open-source initiatives, we plan to incorporate more complex design constraints and spatial considerations into future models, including ventilation, daylighting, surrounding green spaces, transportation access, and functional bubble diagrams. Integrating these multiple constraints into the Building-Graph-AI framework would allow a better simulation of practical architectural design tasks.
Conclusion
This paper introduces a 3D AI building learning and generation framework, Building-Graph-AI, comprising a Graph-BIM Encoding method that automatically converts any BIM model based on straight wall layout into graph-structured data and a Variational Graph Autoencoder (VGAE) to learn and generate 3D building models. Graph-BIM Encoding can accurately and automatically learn spatial layouts from any BIM dataset based on straight wall layout. Utilising this encoding, the VGAE can generate high-detail, layered 3D building models from geometries, where the generated building models are based on components such as walls, windows, and columns. Furthermore, the layered models allow for component-level editing. Compared to existing AI generative models applied in architecture, Building-Graph-AI offers a novel 3D building learning and generating framework and better integration into the design workflows of human architects. By bridging the gap between geometric design and AI-based training and generation, this framework enhances the adaptability and efficiency of AI applications in architecture design. Allowing architects to obtain precise 3D models that meet design requirements rapidly, Building-Graph-AI establishes AI as a collaborative design partner in architectural workflows. In conclusion, Building-Graph-AI lays the groundwork for promoting automated architectural design and acquiring high-precision, detail-rich 3D models generated by artificial intelligence.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
