Abstract
The surge in generative AI poses a twofold challenge for architecture: crafting specialized algorithms and curating top-tier databases for machine learning. This article presents PUBLICPLAN, a database featuring social housing floor plans from recent competitions. Architecturally curated databases transcend randomness, embodying coherent compilations with recurring patterns. Historical architectural series like Mies van der Rohe’s courtyard houses, John Hejduk’s Diamond series, and William J. Mitchell’s Palladian Grammar are explored, highlighting the relevance of systematic approaches. Despite digital nuances, both authorial series abstraction and automated learning converge, reshaping architectural authorship paradigms. Architectural databases for AI training tap into collective intelligence, contrasting individual architect authorship. An experiment suggests assessing AI-generated floor plans’ authenticity using PUBLICPLAN, seeking expert input on their human-like quality.
Introduction
Following the surge in the use of generative artificial intelligence in recent years, a double challenge arises in the field of architecture: to develop algorithms trained specifically to carry out architectural projects and to build the high-quality databases that machine learning processes require.
Recent studies 1 prove that the quality of data in a database is fundamental for a better functioning of predictive models of supervised learning enabled by artificial intelligence. Only with a sufficient number of qualified cases that allow learning the inherent emerging patterns, can machine learning algorithms make relevant predictions. Obtaining a sufficient amount of data and ensuring its architectural quality makes the design of databases and their accessibility one of the most pressing fields of development in the use of artificial intelligence in architectural design.
In the context of this urgent need, this article presents PUBLICPLAN, an unprecedented database documenting floor plans of social housing typologies developed in public competitions in Spain in recent years. These floor plans have been used to train machine learning algorithms.
An effective and architecturally qualified database is not simply a random collection of elements, but a coherent compilation in representation with certain binding patterns among elements; that is, it is a series, understood as a group of objects from which relational rules can be extracted to generate specific architectural organizations.
The notion of series in architecture dates back long before the emergence of artificial intelligence. In search of a broader historical and conceptual foundation, this article explores the role of series in design methodology, analysing three cases that we find paradigmatic: Mies van der Rohe’s courtyard houses, John Hejduk’s Diamond series, and William J. Mitchell’s Palladian Grammar proposals. These historical precedents establish a disciplinary framework that helps us better understand the potential role of databases in generative AI-based architecture. Although they are not directly comparable from a methodological or quantitative perspective, these pre-digital authors laid the groundwork for the conceptual and methodological principles of seriality and systematicity that are emerging today and are fundamental to AI-based design.
From the analysis of these serial strategies, categories, and project techniques, conceptual frameworks emerge that enhance our understanding of how databases function in the context of AI, particularly the PUBLICPLAN database, from a broader perspective. In serial design processes, each new case updates the implicit primitive pattern and organizational diagram within the series structure. Similarly, in machine learning processes, the trained model operates in a comparable manner. When exposed to the database, it learns from the series, identifies its latent patterns (often invisible or not evident), and, upon request, updates specific versions and new cases of the series, generating virtual models that respond to the learned process.
The complexity of data management necessary to address algorithmic training revitalizes the importance of previous serial approaches, analysed in the context of the new sensitivity generated by the use of machine learning. The architectural coherence of these works and their systemic condition extend the capacity of projects beyond a specific proposal, and their value shifts, focusing on the procedure and process rather than on a particular iteration of it.
Naturally, there are numerous characteristics of the digital process that distance it and distinguish it from the human process. Despite these differences, the abstraction of authorial series and automated learning processes share a substrate. The review of disciplinary historical material and the cultural significance provided by artificial intelligence procedures are two vectors that feed into the expanded understanding of architectural authorship. While the series of the modern architect builds an individual authorship that conveys research and a particular signature, architecture databases for training learning algorithms appear as latent structures of collective intelligence. To test their authorship capability, an experiment has been designed where, once various generative algorithms have been trained with the PUBLICPLAN database, different experts judge the plausibility of whether the obtained floor plans could have been designed by humans.
Serial architects: Three historical cases
Mies van der Rohe
Among the Masters of architecture, it is likely that Mies van der Rohe is the one who most prominently displays a serial logic. In fact, his entire career can be conceived as the exploration of different series, with the most explicit being the courtyard houses that he developed as mere academic exercises without a client, from the early 1930s, during his years as a professor at the Bauhaus. This exploration materialized in a collection of floor plan drawings, characterized by a very uniform representation. 2
The sequence of courtyard houses is based on the repetition and combination of various elements, grouped according to fairly strict rules: An enclosure 3 defined by orthogonal walls about 3 m high that form a rectangular space; courtyards arranged peripherally touching the boundaries of the enclosure; the imposition of a strict square grid that forms large pavement pieces at the intersections where freestanding pillars are located; exterior enclosures consisting of large glass planes, except for those in service areas where more opaque facades are found; flat roof; interior partitions, isolated or intersected in a T-shape, almost always arranged orthogonally, are freed from any structural function; a generous domestic program, evidenced by predominantly placed furniture to allow lateral light entry, and which basically includes a spacious living room, a kitchen, a bathroom, a bedroom, and sometimes a second room conceived as a study or auxiliary room. By omission, the complete absence of doors stands out. In the latest cases of the series, a voluminous fireplace arranged in the living-dining room also appears as a relevant element.
The elements combine to form different cases within the series (Figure 1). The position of the courtyards determines the interior shape of the house, whether L-shaped, T-shaped, or I-shaped with a courtyard on either side. As the series progresses, the perimeter walls become more hermetic, enclosing the house and isolating it from the exterior. Simultaneously, the amount of glass enclosures increases, practically encompassing the entire perimeter, and the independence of internal partitions intensifies. With their twists, these partitions determine the functionality of the spaces, allowing for situations of greater privacy without breaking the continuity of the space. Thus, as the house closes off from the surrounding environment, it opens up internally, generating a more fluid space projected towards the controlled nature of the courtyards, through the prevalence of horizontal ceiling and floor planes.
Figure 1. Diagram of the sequence of courtyard houses. Drawing by the authors.
In the series of courtyard houses, a pattern emerges that operates in other projects by Mies himself, such as the Barcelona Pavilion of 1929, the House for a Childless Couple at the German Construction Exhibition of 1931, or the Ulrich Lange and Margarete Hubbe houses. 3 Likewise, the different implicit strategies in the series inspire the work of other architects, from Richard Neutra 4 to Palinda Kannangara, 5 passing through Josep Lluís Sert, 6 forming a clear example of how the underlying structure of a series, its pattern of relationship, becomes a driving force for generating new proposals.
John Hejduk
The Architectural League exhibition in New York in 1967 showcased three projects that Hejduk developed over 6 years, focusing on the configuration of form and the implications of the diamond in space organization: The Diamond House A, the Diamond House B, and the Diamond Museum C (1962-1967). The Diamond Projects represent the continuation of the series called Nine-Square (Texas), which Hejduk began around 1962. The architect identified a breakpoint between the series formed by the Nine-Square and the Diamonds: “Suddenly, a change occurred, a change in direction, and the Diamond Projects emerged”. 7
According to Hejduk, the Diamond Project emerges as a purely formal exploration that extends “the diamond canvases of Mondrian for architects of today”. 7
From the diamond, or the rotation of the perimeter as opposed to the horizontality of the grid, spatial organizations structured by columns (House A), by planes (House B), and by biomorphic forms (Museum C) emerge (Figure 2). “The new relationships of form have at least two major consequences: peripheric tensions of the edge and field extensions beyond the building volume that render an expanding space”. 7
Figure 2. Diagram of the Diamond Project layouts. Drawing by the authors.
House A and House B are both four-story buildings. House A explores the implications of the square floor plan, structured around 13 round columns arranged in a grid of eight bays. The circular columns accentuate the lack of direction in relation to the perimeter of the plan, the arrangement of internal elements, and consequently reinforce the centre.
In all variations, a column occupies the centre of the plan. Despite the reinforced centrality and the square shape, both the architectural elements (stairs, walls, etc.) and the objects occupying the plan do not favour symmetry or zenithal organization. The different versions of House A present proposals where, in some cases, partitions and elements respond to the directionality of the grid, while in others they are organized independently of it.
In House B, Hejduk experiments with the spatial organization of planes in relation to the rotated square perimeter. Although it maintains the bays of House A, this series introduces directionality and hierarchy through different thicknesses of the planes.
Finally, Diamond Museum C constitutes a synthesis of all the elements that structure the space in House A and House B: columns, walls, and stairs. The grid densifies and consists of 25 columns distributed in six bays. A single type of column is used (circular pillars of the same size), as well as straight and curved walls that vary in their configuration: open, closed, and combined walls. As indicated by Hejduk in one of the drawings with the annotation “center compression,” the plan densifies in some central areas. The homogeneous expansion of the elements that make up the internal field is tensioned by the densification in the centre.
The projects that followed the Diamond series represent an extension of the fundamental problems identified, such as spatial compression, frontality, or centrifugal force. Any of these projects would be incomprehensible if studied in isolation, as they build on research through the variation that composes the series. These series resonate in periods of progressive research, acquiring various degrees of intensity and concreteness. “I cannot do a building without building a new repertoire of characters of stories of language and it's all parallel. It's not just building per se. It's building worlds.” 8
Shape grammars
The history of seriality in architecture also strongly manifests in formalist approaches, finding one of its most interesting examples in Shape Grammars. 9 A Shape Grammar is a system of shape production based on algorithmic rules that indicate how geometric figures can be transformed or composed. From their conception, Shape Grammars are manifestly serial, as they involve a model of object generation through the combination of repeated elements following pre-established rules.
One of the most relevant Shape Grammars is the one developed by William J. Mitchell 10 for generating Palladian villas, consisting of 69 compositional rules. Among the most fundamental are those enabling the insertion of squares within an orthogonal coordinate system, their displacement, and merging with each other. This generates a tartan-like grid where rooms occupy the interior of squares and the spaces between them represent walls. Once this distribution is established, rules come into play allowing the addition of porticos and columns, as well as generating windows and doors.
In this way, it is possible to algorithmically compose not only those villas that have been effectively constructed but also a potentially infinite set of variations on them; that is, a series encompassing all possible Palladian villas (Figure 3).
Figure 3. Diagram of Series of Palladian Villas. Drawing by the authors.
In the elaboration of Shape Grammars, the implicit mathematical basis of the notion of architectural series emerges, as its geometric operability is defined as an ordered set of numbers related by a function, enabling the generation of new elements, that is, operating as a mathematical sequence.
The series of Palladian villas by Mitchell and Shape Grammars take as a fundamental precedent the analyses of Rudolf Wittkower, 11 who reduces the villas to a series of diagrammatic drawings formed by lines, circles, and stairs that evidence their purely geometric order (Figure 4). In this way, Wittkower inaugurates the formalist approach to the history of architecture, a perspective that would be cultivated with mastery by his disciples such as Colin Rowe 12 or Peter Eisenman. 13
Figure 4. Diagrams of Palladian Villas by Wittkower. Drawing by the authors.
The pattern that emerges from Shape Grammars forms the basis for the development of parametric architecture, which, in a way, can be conceived as a pragmatic extension of the same theoretical idea: the rules of transformation between forms are defined by parameters that users can easily adjust, allowing greater control over the result. Once again, we observe how a series like that of the Palladian villas conceals a generation pattern, in this case more related to a computational-geometric approach than to a specific formal type, which serves as a basis for the development of new design strategies.
The various serial approaches presented help us address with historical foundation how current databases focused on automated learning in architecture work.
To develop the hypothesis that databases constitute an updated version of serial architectural logic, we present PUBLICPLAN, a database built from the social housing proposals submitted to public competitions in Catalonia and the Balearic Islands in recent years. With this database, we have trained algorithms to experiment with their ability to generate floor plans that are perceived as plausibly part of a series.
PUBLICPLAN: An architecturally qualified dataset
As mentioned before, the configuration of a database can conceptually resemble those of series created by Mies, Hejduk, or Mitchell. However, in the case of databases, the demand for consistency is even greater than in the cases of series we have mentioned: many cases with a clear structure of representation and case selection are required for the AI algorithms to assimilate them consistently.
Moreover, while the series mentioned so far are authored by a single individual, databases used for training automated learning algorithms are multi-authorial, thus leveraging these algorithms’ ability to extract patterns from common features of different architects, thereby constructing a virtual collective authorship.
Description of the technique and the domain
The public housing bidding system in Spain provides a well-established and publicly available source of projects of certain quality, ideal for shaping an adequate database for machine learning.
Based on ideas competitions for each project, architects are tasked with submitting proposals that include an overall description of the building and detailed distribution of unit types. Only the winner gets built, and the rest remain unused, with most cases remaining unpublished.
The economic and legal framework is very similar, resulting in a range of layouts with limited differences in terms of dimension and programmatic resolution of the typologies. Typically, plans range between 50 and 90 square meters on average, and normal units consist of two to three bedrooms, with some exceptions of single-bedroom units. While limited in the variability of the types, each entry produces a set of solutions that collectively build a body of knowledge and information.
In summary, the public competitions for subsidized housing in Spain provide us with sufficiently homogeneous and high-quality data to generate the PUBLICPLAN database of housing typologies, with which we can train neural networks capable of producing new interior layouts.
Building the PUBLICPLAN database
Our first step in this process was to reach agreements with three of the main public housing agencies in Catalonia and the Balearic Islands (Impsòl, Incasòl, and Ibavi) to gain access to the public tenders conducted over the past 3 years. We redrew and labelled 2446 different layouts (84 from Ibavi, 753 from Impsòl, 1609 from Incasòl), extracted from 1284 competition entries (36 from Ibavi, 353 from Impsòl, 895 from Incasòl).
Labels of the pixels in every channel of RPLAN.
To ensure accuracy and precision, we chose to redraw each case individually. For each floor plan, we redrew the perimeter walls, the entrance door, interior partitions, interior doors, and the different functional areas of each typology, including: the living room, dining room, kitchen, bathroom, master bedroom, children’s room, second bedroom, guest room, balcony, entrance, hallways, and storage areas or cleaning rooms.
Each floor plan was converted into an image of 256 × 256 pixels with four channels (Figure 5). Channel 0 contains perimeter information, channel 1 defines room types, channel 2 distinguishes rooms within the same type, and channel 3 masks interior and exterior areas. Due to the rigorous control of information sources, all layouts included in our dataset comply with current regulations and standards for social housing and are viable candidates for construction.
Figure 5. Sample of redrawn competition plan with four channels.
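As an illustration of the four-channel encoding described above, the following minimal sketch rasterizes a toy plan made of rectangular rooms. The room-type codes, the helper names (`empty_plan`, `add_room`, `mark_perimeter`), and the rule used to mark perimeter pixels are assumptions made for the example; the text does not specify the label values actually stored in PUBLICPLAN.

```python
SIZE = 256  # image side in pixels, as stated in the text

# Hypothetical room-type codes; the dataset's actual label values are not given.
ROOM_TYPES = {"living": 1, "kitchen": 2, "bedroom": 3, "bathroom": 4}


def empty_plan():
    """Return a SIZE x SIZE grid of 4-channel pixels, all zero (exterior)."""
    return [[[0, 0, 0, 0] for _ in range(SIZE)] for _ in range(SIZE)]


def add_room(plan, room_type, instance, x0, y0, x1, y1):
    """Paint an axis-aligned room covering columns x0..x1-1 and rows y0..y1-1."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            pixel = plan[y][x]
            pixel[1] = ROOM_TYPES[room_type]  # channel 1: room type
            pixel[2] = instance               # channel 2: index among rooms of that type
            pixel[3] = 1                      # channel 3: interior mask


def mark_perimeter(plan):
    """Set channel 0 on interior pixels that touch the exterior (4-neighbourhood)."""
    for y in range(SIZE):
        for x in range(SIZE):
            if plan[y][x][3] == 0:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                off_image = not (0 <= ny < SIZE and 0 <= nx < SIZE)
                if off_image or plan[ny][nx][3] == 0:
                    plan[y][x][0] = 1
                    break


plan = empty_plan()
add_room(plan, "living", 1, 40, 40, 140, 120)
add_room(plan, "bedroom", 1, 140, 40, 200, 120)
add_room(plan, "bedroom", 2, 40, 120, 120, 180)
mark_perimeter(plan)
```

In this toy plan the two bedrooms share type code 3 in channel 1 and are told apart only by channel 2, which is the role the text assigns to that channel.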
After completing the labelling and redrawing of all plans (Figure 6), we have easy access to significant statistics on public housing competitions in Spain. There are many relevant questions that can be asked about the developed database, stemming from the ability to mathematize the different represented elements, thus opening up a precise and innovative form of analysis in architecture.
Figure 6. Samples of PUBLICPLAN layouts.
Thus, we start from the most evidently quantifiable characteristics such as the area or proportions occupied by a type of room, the number of interior doors, linear meters of walls, the number of bedrooms or bathrooms, or whether the floors contain certain programmatic elements.
Once these obvious characteristics are indexed, we can begin to combine them to conduct a more detailed and qualitative numerical analysis of the floor plans in the database. For example, by dividing the perimeter and interior area, we obtain the degree of compactness of the proposals. By relating the perimeter to the number of bedrooms, bathrooms, or doors, we can understand how optimized they are. Studying the number of pixels in contact between the kitchen, dining room, and living room, we can rank them based on the degree of integration of the daytime area. Analysing the number of bedrooms surrounding the living room or balcony, we can establish the degree of centrality of these spaces. Likewise, we can determine if the bedrooms are directly connected to a bathroom or if there are en-suite bathrooms, that is, bathrooms accessible only from a bedroom. We can also assess how close the kitchen is to the entrance, how long the diagonals connecting the rooms are, giving us an idea of how visually unified the layout is, or evaluate if kitchens and bathrooms are adjacent, forming wet cores. Although these statistics go beyond the scope of this work, they provide significant information available for future research. For example, Figure 7 charts the number of units that share the same number of bedroom types.
Figure 7. Sample of statistics of PUBLICPLAN. Number of units with same number of bedroom types.
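Some of the measures listed above can be sketched on a small label grid. The example below is a hedged illustration only: the room-type codes, the toy grid, and the function names are hypothetical, and compactness is assumed to be the ratio of perimeter to interior area, since the text does not state which way the division runs.

```python
# Hypothetical room-type codes (0 = exterior); the dataset's real codes may differ.
LIVING, KITCHEN, BEDROOM, BATH = 1, 2, 3, 4

# A toy 6 x 8 label grid standing in for the room-type channel of one plan.
GRID = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 2, 3, 0],
    [0, 1, 1, 1, 2, 2, 3, 0],
    [0, 1, 1, 1, 4, 4, 3, 0],
    [0, 3, 3, 3, 4, 4, 3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def neighbours(grid, y, x):
    """Yield labels of the 4-connected neighbours; off-grid counts as exterior."""
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        inside = 0 <= ny < len(grid) and 0 <= nx < len(grid[0])
        yield grid[ny][nx] if inside else 0


def area(grid):
    """Number of interior pixels."""
    return sum(1 for row in grid for label in row if label != 0)


def contact(grid, a, b):
    """Number of pixel edges where a pixel labelled a touches a pixel labelled b."""
    return sum(
        1
        for y, row in enumerate(grid)
        for x, label in enumerate(row)
        if label == a
        for other in neighbours(grid, y, x)
        if other == b
    )


def perimeter(grid):
    """Number of pixel edges between any interior pixel and the exterior."""
    interior_labels = {label for row in grid for label in row if label != 0}
    return sum(contact(grid, label, 0) for label in interior_labels)


compactness = perimeter(GRID) / area(GRID)            # assumed: perimeter / area
day_area_integration = contact(GRID, LIVING, KITCHEN)  # living-kitchen contact
wet_core_contact = contact(GRID, KITCHEN, BATH)        # > 0 suggests a wet core
```

Counting shared pixel edges, rather than applying a yes/no adjacency test, lets plans be ranked by degree of integration, as the paragraph above suggests.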
Experiment: Serial authorship potential of PUBLICPLAN
This experiment was conducted among others in a more in-depth study about the impact of qualified data in deep learning methods for automatic generation of housing layouts. 1 We trained three models: Graph2Plan, 15 Deeplayout, 14 which are specific layout generators originally trained on RPLAN, and Pix2Pix, 16 which is a general image-to-image translation model used for various tasks and datasets. The selection of these models is based on an exhaustive literature analysis performed by R. E. Weber, C. Mueller, and C. Reinhart 17 on methods for automated housing design. In the selection process, we identified and included all models that specifically emphasized collective housing and were trained using the RPLAN dataset.
Graph2Plan converts plan perimeters into floor plans using a layout graphs database and a floor plan database. The network takes a layout graph and a boundary as input and outputs refined room boxes and a raster image of the floor plan. The primary process involves Graph Neural Networks (GNN) and Convolutional Neural Networks (CNN), along with a BoxRefine Network.
Deeplayout utilizes sequential logic to locate rooms, starting with the living room and then continuing with others. After all rooms are located, it determines walls with an encoder-decoder network and post-produces the plans by refining the results.
Pix2Pix is a conditional adversarial network designed as a general-purpose solution to image-to-image translation problems, learning not only the mapping from input image to output image but also a loss function to train this mapping.
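The idea of learning a loss function can be made concrete with the objective of the original Pix2Pix formulation, reproduced here as a reminder and not as a description of how the model was configured in this study. In this notation, x is the input image (assumed here to be the plan perimeter), y the target layout, z a noise term, G the generator, D the discriminator that plays the role of the learned loss, and λ the weight of the L1 reconstruction term.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big], \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big], \\
G^{*} &= \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G).
\end{aligned}
```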
To study the ability of the three trained algorithms to learn the original series of social housing floor plans collected in the PUBLICPLAN database, we designed an experiment to test whether the plans generated from PUBLICPLAN could plausibly have been created by humans. We prepared a set of pairs and asked architecture students to distinguish between human-generated and machine-generated layout plans. We paired human plans with designs generated by the different models, creating a total of 15 pairs (see Figure 8). Both plans in each pair had the same perimeter. Social housing layouts are highly constrained and typically have similar dimensions (between 60 and 90 sqm) and between one and three rooms. Within those ranges, we chose 15 cases that covered all the different architectural types (corridor-based, matrix-based, with and without kitchen integrated into the living area, and so on). We mixed five designs generated by each of the three models used (Graph2Plan, Deeplayout, and Pix2Pix), which were shown in random order without revealing the model that generated each case. Each question had to be answered within 20 s. For fairness, the plans were standardized in colour coding and graphic standards, as shown in Figure 9. To ensure legibility and the capacity for analysis while maintaining the agility of the questionnaire, a preparatory exercise was conducted internally within the research group with architects and PhD students, which validated the graphic layout design and the time limit.
Figure 8. All the designs included in the questionnaire, with a total of 15 pairs. From left to right and top to bottom, the pairs were organized as follows, with H representing human-generated and M representing machine-generated designs: 1: H/M, 2: H/M, 3: M/H, 4: H/M, 5: M/H, 6: M/H, 7: H/M, 8: H/M, 9: H/M, 10: M/H, 11: H/M, 12: M/H, 13: M/H, 14: M/H, 15: M/H.
Figure 9. Colour-program code for all layouts and sample of one pair for the experiment. Both plans to be compared have the same perimeter and entrance point. One was designed by humans and the other by the machine.

The results show that, overall, participants identified 48% of the machine-designed layouts as human-designed.
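One plausible way to compute such a rate from questionnaire answers is sketched below; it is not necessarily the procedure used in the study. The pair order is transcribed from the text, while the response format (one string of 'L'/'R' picks per participant), the function names, and the two sample participants are hypothetical and serve only to exercise the code.

```python
# Pair order transcribed from the text: "H/M" means the human plan is on the left.
KEY = ["H/M", "H/M", "M/H", "H/M", "M/H", "M/H", "H/M", "H/M",
       "H/M", "M/H", "H/M", "M/H", "M/H", "M/H", "M/H"]


def human_side(pair):
    """Return 'L' or 'R', the side on which the human-designed plan appears."""
    return "L" if pair.startswith("H") else "R"


def fooled_rate(responses):
    """Share of answers in which the machine plan was picked as the human one.

    `responses` holds one 15-character string of 'L'/'R' per participant,
    giving the side that participant believed to be human-designed.
    """
    fooled = 0
    total = 0
    for answers in responses:
        for pair, pick in zip(KEY, answers):
            total += 1
            if pick != human_side(pair):
                fooled += 1
    return fooled / total


# Hypothetical answers, for illustration only; not data from the study.
responses = [
    "LLRLRRLLLRLRRRR",  # picks the human plan in every pair: never fooled
    "RRLRLLRRRLRLLLL",  # picks the machine plan in every pair: always fooled
]
print(fooled_rate(responses))  # 0.5
```

A rate close to 0.5 would mean that participants could not tell the two sources apart better than chance, which is how the reported 48% can be read.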
Conclusions
Upon analysing the process of developing and utilizing databases to train AI algorithms in architecture, we realize that the notion of modern authorship is being modified as there is a blurring of boundaries between the typical intentionality of the individual author, the ability to incorporate proposals from multiple authors, and the immeasurable analytical capacity of neural networks.
Indeed, firstly, there is a selection of cases that make up the database, where there is a will, whether explicit or implicit, of an author. However, the selection may gather the work of multiple authors and therefore dissolve the individual perspective. On the other hand, the choice of representation form embedded in the database is clearly intentional, again reflecting the purpose of the database author. Next, the training and future productive-predictive capacity of the new cases lies in the immeasurable structure of the neural network, capable of establishing a common pattern with the multiple cases in the database. And finally, we return to an authorial perspective with the analysis and possible modification of the different described phases.
In this sense, databases entail a new paradigm in the architectural discipline: a design process that does not dispense with authorship but expands it, exercising it from a position and function different from those of serial authors. While the mentioned modern authors serialized their individual authorship, the use of AI and the design of databases make the process more complex. The assembly between author, editor, and algorithm is shifting towards a new augmented authorship where it is increasingly difficult to distinguish between analog and digital roles.
The new series are formed through graphs or pixels and identify patterns or styles that allow them to expand the series in different directions, operating as an expansion of analog authorship, mutating human synthetic-creative capacity towards a form of collective synthesis that is no longer solely human, introducing into the discipline a collective intelligence from which collective authorship derives.
Footnotes
Acknowledgements
We are thankful to Impsòl, Incasòl, and Ibavi for providing access to the public competitions they organized and to all the participants in the experiments we conducted.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was primarily sponsored by a State R+D+I Program Oriented to the Challenges of Society. Ministry of Science and Innovation. Spain. AEI/10.13039/501100011033.
