Abstract
As automated retail environments continue to expand globally, their façade design has emerged as a critical factor influencing consumer perception, psychological comfort, and visual appeal. At the same time, current design practices often prioritize technological efficiency over visual wellness. This study proposes a generative AI-assisted design methodology grounded in the Environment-Based Design (EBD) framework. The approach emphasizes visual dimensions of the WELL Building Standard and integrates biophilic design principles to enhance façade aesthetics in automated retail contexts. This study has four research objectives: (1) to extract WELL-aligned visual design variables for façade design through literature, certification mapping, and case analysis; (2) to develop structured prompt strategies and ControlNet-based image generation workflows using Stable Diffusion XL; (3) to evaluate perceptual outcomes through Learned Perceptual Image Patch Similarity (LPIPS) metrics and expert scoring across wellness-relevant dimensions; and (4) to analyze design trade-offs and limitations, and identify opportunities for recursive improvement within the EBD framework. Eight façade images, consisting of the original and seven AI-generated variants, were evaluated by five experts using a 7-point Likert scale across five perceptual criteria. The results show that the "Material + Pattern" strategy received the highest ratings in perceived material quality and natural features; "Color + Material + Pattern" showed the most balanced overall performance. Perceptual similarity was quantitatively assessed using LPIPS, confirming that multidimensional interventions led to greater visual deviation from the original design. Expert comments emphasized the warmth and affinity created by natural textures, while cautioning against excessive decorative complexity. Open-ended feedback was subjected to thematic analysis, revealing nuanced perceptions of design richness, comfort, and realism. 
This study demonstrates the feasibility of operationalizing health-focused visual principles within an AI-assisted design pipeline. The proposed approach offers a scalable and reproducible method for enhancing the emotional and aesthetic quality of automated retail façades. Future research should extend the scope of visual dimensions, including form, signage clarity, and transparency, and incorporate multimodal user experience evaluations to better reflect real-world engagement.
Keywords
Introduction
Background
As automated retail stores spread worldwide, their spatial environment and visual design have become an important topic of public concern. These cashierless, self-checkout facilities employ artificial intelligence, sensor technologies, and advanced automation systems to provide continuous 24-hour service with minimal human intervention (Nam et al., 2025). Despite their significant operational and technological advantages, the architectural design of such stores, particularly the façade, is often neglected or oversimplified, resulting in environments that lack visual appeal, material expression, and psychological comfort (Majid, 2022). Prior research by Yun et al. (2024) has confirmed that the integration of biophilic elements in automated stores positively affects visual attention and emotional response, as demonstrated through eye-tracking experiments and self-report methods. However, those findings were largely perceptual and were not implemented through generative design strategies.
As contemporary design thinking places increasing emphasis on spatial quality, user well-being, and health-oriented design, wellness-focused frameworks such as the WELL Building Standard provide structured principles for improving both physical and psychological aspects of the built environment. Developed by the International WELL Building Institute (IWBI), the WELL v2 standard comprises 10 core concepts that target air, water, light, sound, materials, and mind, among others, to promote holistic health in architectural spaces (IWBI, 2020). Although primarily applied to healthcare, workplace, and residential environments, specific principles pertaining to visual comfort, natural connection, and material clarity are equally relevant to the façade design of automated retail facilities. This trend reflects a broader shift in architecture and facility planning, where built environments are increasingly recognized as key determinants of human health, safety, and well-being (Marberry et al., 2022). However, despite the growing application of these standards in architectural practice, their integration into automated retail environments remains limited.
At the same time, recent advances in image-generative AI, such as Stable Diffusion XL with ControlNet, have created new opportunities for rapid visual exploration in architectural ideation (Cao et al., 2025; Podell et al., 2023). Yet, despite the acknowledged relevance of WELL and biophilic principles, few studies have systematically translated these frameworks into façade design strategies for automated retail stores. Architects and designers lack structured workflows that integrate wellness frameworks, generative design tools, and perception-based evaluation, creating a methodological gap in health-centered façade design research (Ali & Lee, 2023; Sourek, 2024).
To address this gap, this study proposes an Environment-Based Design (EBD) framework that integrates WELL and biophilic principles into a generative façade design strategy. Building on selected visually oriented criteria—color, material, and pattern—derived from WELL guidelines and biophilic design concepts (Kellert & Calabrese, 2015), this study applies the EBD logic in three phases: 1) extracting WELL-aligned visual variables; 2) constructing AI prompts for generative design; and 3) conducting expert-based perceptual evaluation (Zeng & Cheng, 1991).
The aim of this study is to establish a health-centered, AI-assisted façade design workflow by translating visual aspects of WELL and biophilic design concepts into generative parameters and evaluation structures for automated retail environments.
Research Objectives
(1) To extract WELL-aligned visual design variables for façade design through literature, certification mapping, and case analysis.
(2) To develop structured prompt strategies and ControlNet-based image generation workflows using Stable Diffusion XL.
(3) To evaluate perceptual outcomes through LPIPS metrics and expert scoring across wellness-relevant dimensions.
(4) To analyze design trade-offs and limitations, and identify opportunities for recursive improvement within the EBD framework.
Research Questions
RQ0 (Integrative): How can WELL and biophilic design principles be operationalized, within an EBD framework, into promptable façade variables that guide generative AI toward health-aligned, attractive concepts for automated retail stores?
RQ1: How can health-oriented façade design principles (WELL + biophilic) be translated into core visual variables suitable for AI-driven design of automated retail stores?
RQ2: What are the visual impacts of AI-generated façade variations under different prompt strategies (e.g., Color, Material, Pattern)? How can they be evaluated?
RQ3: What trade-offs and perceptual patterns emerge when combining multiple design variables? How can they inform recursive prompt adjustments in future EBD-guided workflows?
RQ4: What methodological limitations exist in using expert-based and AI-driven workflows for WELL-aligned design? How can future research address these limitations?
Literature Review
The Rise of Automated Retail and Spatial Experience Issues
Automated retail stores, which operate without human staff using digital interfaces and AI systems, have evolved into diverse forms such as smart convenience stores, vending modules, and uncrewed supermarkets (Nam et al., 2025). These systems optimize transaction efficiency and labor costs, but often embody a “design vacuum”—a lack of human-centered aesthetic consideration (Majid, 2022). Most façades emphasize branding or technological functionality, often with repetitive, industrialized visual expressions (Jo et al., 2024).
In a recent experiment, Nam et al. (2025) found that consumer preferences for uncrewed stores are significantly influenced not only by operational factors, but also by environmental qualities such as spatial aesthetics and perceived safety. These findings suggest that physical design remains a crucial factor in shaping user acceptance, even in highly automated environments.
Meanwhile, architectural studies have revealed that spatial cues—such as natural light, texture, and form—are critical in establishing environmental legibility and emotional response. However, in automated retail design, literature has focused on interior layout, user-device interaction, and UX/UI optimization (Kim & Lee, 2021), at the expense of façade-related perceptual and psychological concerns.
Recent studies have emphasized the importance of physiological and behavioral responses in evaluating spatial environments. For instance, Kim and Kim (2022) applied biometric tools to quantify emotional responses to architectural stimuli, offering a data-driven perspective on spatial affect. Kim and Lee (2021) assessed consumer attention and arousal using eye-tracking technology in virtual retail environments, demonstrating how visual cues shape user perception and engagement. Similarly, Kim (2024) employed VR-based eye-tracking to capture initial gaze attraction in branded spaces, highlighting the influence of spatial composition on early user attention. These studies reinforce the value of integrating perceptual data into the evaluation of visual design features in commercial and retail spaces.
Environment-Based Design (EBD) in Architectural Reasoning
EBD did not emerge as a formal methodology until 2011 (Zeng, 2011), but its theoretical foundation was laid earlier in Zeng and Cheng's (1991) proposal of recursive logic as the logic of design. Unlike deductive or inductive reasoning, recursive logic frames design as a process in which problems, solutions, and knowledge evolve simultaneously, and always in relation to the environment (Zeng, 2002).
Building on this foundation, Zeng (2011) introduced EBD as a structured methodology premised on the idea that “design starts from the environment, functions for the environment, and brings changes to the environment.” This orientation positions the environment not as a backdrop but as both the source of inspiration and the locus of transformation.
EBD operationalizes this orientation through a recursive cycle of environment analysis, conflict identification, and solution generation, with each new solution immediately reintegrated into the environment for subsequent iterations until no further undesirable conflicts remain. Asking questions is the core of the EBD methodology (Wang & Zeng, 2009), the primary mechanism through which designers probe the environment to elicit hidden requirements, uncover implicit constraints, and generate the knowledge necessary to identify and resolve conflicts. By systematically asking both generic and domain-specific questions, designers expand and refine their understanding of the environment, ensuring that subsequent solutions are grounded in contextual realities rather than abstract assumptions. Finally, Zeng (2015) characterized this process as an environmental evolution, highlighting the co-development of problems, solutions, and knowledge as the environment changes.
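The recursive cycle described above can be illustrated with a minimal Python sketch. It is an illustration only, not an implementation from the EBD literature: the `Environment` class, the `ebd_cycle` function, the `REQUIRED` list, and the two helper functions are hypothetical names introduced here, and a "conflict" is reduced to a required quality that the façade does not yet express.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Environment:
    """Hypothetical stand-in for the design environment (not from the paper)."""
    components: List[str] = field(default_factory=list)


def ebd_cycle(
    env: Environment,
    find_conflicts: Callable[[Environment], List[str]],
    generate_solution: Callable[[Environment, str], str],
    max_iterations: int = 10,
) -> List[str]:
    """Repeat environment analysis, conflict identification, and solution generation."""
    applied: List[str] = []
    for _ in range(max_iterations):
        conflicts = find_conflicts(env)      # analyze the current environment
        if not conflicts:                    # stop once no conflicts remain
            break
        solution = generate_solution(env, conflicts[0])
        env.components.append(solution)      # reintegrate the solution into the environment
        applied.append(solution)
    return applied


# Toy example (assumed): a conflict is any required quality the façade lacks.
REQUIRED = ["calming color", "natural material", "organic pattern"]


def missing_qualities(env: Environment) -> List[str]:
    return [q for q in REQUIRED if q not in env.components]


def add_quality(env: Environment, conflict: str) -> str:
    return conflict  # the "solution" here simply adds the missing quality


facade = Environment(components=["glass curtain wall"])
steps = ebd_cycle(facade, missing_qualities, add_quality)
print(steps)
```

Each pass resolves one conflict and feeds the updated environment back into the next analysis step, which is the property the recursive logic emphasizes; the `max_iterations` guard is an added safety assumption, not part of EBD itself.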
In the context of automated retail design, where human presence is limited and operations are predominantly mediated through digital interfaces, the EBD framework lays a compelling foundation for the reintroduction of environmental quality and user perception into architectural considerations. By emphasizing environmental legibility, material articulation, and affective response, EBD encourages designers to reconceptualize façades not merely as branding surfaces, but also as perceptual interfaces that mediate the relationship between users and spatial environments. In this study, three visual dimensions (color, material, and pattern) are the key perceptual mediators within this framework.
To consolidate the theoretical foundation, these dimensions can be explicitly linked to constructs from the EBD framework. EBD emphasizes the translation of environmental cues into cognitive representations that guide design reasoning (Zeng, 2015). According to this view, color has been shown to influence emotional states and stress regulation (Küller et al., 2006), reflecting EBD's concern with affective responses. Pattern relates to biophilic principles of complexity and order, aligning with EBD's recursive reasoning about environmental coherence (Yang et al., 2023). Material corresponds to the sensory attributes of the built environment, serving as inputs that shape perception and design decisions (Nguyen & Zeng, 2012; Yang et al., 2022). These links clarify the integration of visual dimensions within EBD while suggesting that future research may extend this mapping to additional WELL variables.
Furthermore, the EBD approach demonstrates conceptual alignment with generative design workflows, especially those characterized by iterative visual output and evaluation processes. In both frameworks, the design evolves through a recursive feedback cycle in which environmental requirements or design objectives are translated into visual representations, subsequently assessed and refined based on contextual fit. This structural correspondence underscores the applicability of EBD as a theoretical foundation for guiding the formulation of design instructions and the evaluation of visual outcomes in computational façade design practices.
Biophilic Design and WELL Building Standard
Biophilic design emphasizes humans’ innate tendency to connect with nature by integrating natural patterns, materials, and spatial forms into the built environment (Kellert & Calabrese, 2015). Numerous empirical studies have demonstrated that biophilic elements can reduce stress, support attention restoration, and enhance users’ spatial identity. These effects are especially valuable in high-frequency commercial environments, where functional efficiency often takes precedence over psychological comfort. Recent neuroarchitecture research has substantiated these findings. For instance, Kim and Gero (2022) showed that biophilic features can elicit measurable neurophysiological responses, reinforcing the relevance of biophilic principles in wellness-oriented architectural frameworks. Similarly, Jung et al. (2023) demonstrated in a virtual reality hospital patient room that introducing biophilic design elements such as plant walls and digital nature walls improved users’ emotional state. Questionnaire results indicated that plant walls reduced negative affect, while digital nature walls enhanced positive affect. EEG analysis revealed that biophilic design increased relaxation-related low-frequency activity and decreased tension-related high-frequency activity. These findings provide convergent psychological and neural evidence for the positive impact of biophilic interventions on human well-being (Jung et al., 2023).
The WELL Building Standard integrates biophilia into multiple categories, including Mind, Light, Materials, and Biophilia I/II, highlighting its value in promoting visual wellness and user well-being (IWBI, 2020; Tabassum & Park, 2024). WELL's Biophilia I feature encourages the integration of nature through environmental elements and patterns; Biophilia II promotes a deeper, sustained human-nature connection.
Although WELL and biophilic strategies have been widely adopted in residential and workplace contexts (Kim & Park, 2025), their application to retail environments, particularly automated stores, remains underexplored. Recent bibliometric reviews indicate that WELL-related research is heavily skewed toward office and residential buildings, with commercial sectors receiving much less attention (Kokatnur et al., 2025). This issue is important because spatial design in these settings has a direct impact on user behavior, attentional focus, and emotional responses.
Recent research also highlights the importance of user-centered perception and affective experience in spatial design. Biometric-based methods are increasingly employed to measure emotional responses to architectural stimuli (Kim & Kim, 2022), offering insights beyond superficial aesthetics. These methods support the operationalization of WELL and biophilic concepts not only as theoretical ideals but also as practical design tools. In parallel, recent studies have explored how text-to-image generative AI tools can capture and reinterpret the visual language of biophilic design, offering new expressive possibilities in AI-assisted workflows (Thampanichwat et al., 2025).
AI-Assisted Design and Façade Generation Tools
Recent advances in generative artificial intelligence have introduced new opportunities for architectural design, particularly through image-to-image (img2img) generation workflows. Among these, Stable Diffusion XL (SDXL) has emerged as a powerful open-source tool that enables high-quality visual outputs guided by both textual and visual inputs (Cao et al., 2025). In contrast to closed commercial platforms such as Midjourney, DALL·E, and Imagen, SDXL supports extensible control modules like ControlNet and provides transparent parameterization, which is advantageous for reproducibility and method disclosure in architectural research. When paired with ControlNet, SDXL facilitates structure-preserving img2img workflows particularly suitable for façade contexts where geometric fidelity must be maintained (Podell et al., 2023).
AI approaches to façade design can be compared along several categories. GAN-based methods have been applied to façade and style generation, producing compelling transformations but requiring curated datasets and extensive training, while offering limited semantic controllability (Wang et al., 2022). Commercial diffusion tools such as Midjourney deliver high-quality visualizations and rapid outputs but operate as closed systems, restricting access to intermediate control signals and limiting their integration with wellness-oriented semantics (Jo et al., 2024; Kim & Park, 2025). Parameter-efficient fine-tuning techniques like LoRA provide lightweight adaptation and allow style-specific training with small data requirements; however, they mainly inject stylistic priors and do not independently guarantee geometry preservation or the mapping of certification standards into visual outputs (Petráková & Šimkovič, 2023; Ali & Lee, 2023). In contrast, the SDXL + ControlNet configuration used in this study enables both localized control and transparent prompt-level manipulation of WELL-aligned visual dimensions (Color, Material, Pattern), offering methodological clarity and semantic rigor that other approaches lack.
These capabilities are well-suited for early-stage design ideation. Designers can utilize text- or image-guided diffusion models to explore stylistic variations and reinterpret architectural elements with improved semantic fidelity, as demonstrated by recent advancements in semantic image synthesis (Ali & Lee, 2023; Petráková & Šimkovič, 2023; Wang et al., 2022). Recent studies have also explored the integration of local identity and biophilic design principles into the generative pipeline, thereby enhancing the cultural and psychological relevance of AI-generated façades (Jo et al., 2024; Kim & Park, 2025; Thampanichwat et al., 2025). However, few investigations have aligned these outputs with wellness certification systems such as the WELL Building Standard—an area addressed by this study.
Beyond image generation, researchers have examined how these tools affect design thinking and creativity. Veloso (2025) discusses the use of multimodal large language models and precedent-based reasoning in architectural education, reflecting a growing shift toward collaborative and adaptive design paradigms (Jun & Jia, 2025).
Evaluation methods for generative outcomes are also advancing. LPIPS is widely used to measure visual deviations from baseline designs, while expert-based evaluations provide valuable insight into experiential quality and conceptual fit (Sourek, 2024). Recent studies assess the consistency and usability of synthetic datasets produced by Stable Diffusion (Stöckl, 2023), supporting the development of reliable, quantifiable metrics for AI-assisted façade design.
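For reference, the LPIPS distance between a baseline image and a generated variant is commonly written as follows. This is the general formulation of the metric rather than a detail reported in this study, and the symbol names below are chosen here for illustration.

```latex
d(x, x_0) \;=\; \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
  \left\lVert\, w_l \odot \left( \hat{y}^{\,l}_{hw} - \hat{y}^{\,l}_{0,hw} \right) \right\rVert_2^2
```

Here $x$ is the original façade image, $x_0$ is a generated variant, $\hat{y}^{\,l}$ and $\hat{y}^{\,l}_{0}$ are the channel-normalized activations of the two images at layer $l$ of a pretrained network, $H_l$ and $W_l$ are that layer's spatial dimensions, and $w_l$ is a learned per-channel weight vector. Larger values of $d$ indicate greater perceptual deviation from the baseline.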
Taken together, these comparisons clarify the rationale for the method selection in this paper. GAN-based approaches demand high training costs and offer weak semantic alignment, Midjourney provides closed but visually appealing outputs, and LoRA improves adaptability yet focuses mainly on stylistic adaptation. Against this backdrop, SDXL + ControlNet emerges as the most proportionate and reproducible choice for early-phase façade ideation under WELL/EBD constraints, ensuring that generative outputs are both visually compelling and aligned with wellness- and biophilia-oriented design logic.
Methodology
This study adopts the EBD theory (Zeng, 2011; Zeng & Cheng, 1991) as the guiding methodological framework to develop an AI-assisted, health-oriented façade design process for automated retail stores. EBD emphasizes designing from, for, and to the environment—advocating a recursive logic that connects environmental inputs, design intentions, and feedback outcomes in a dynamic loop. Unlike function-based or purely formal approaches, EBD begins by analyzing and modeling environmental conditions across three domains—human, built, and natural—and transforms them into operable design requirements from the earliest stage.
In the context of automated retail stores—typically characterized by minimal human presence and utilitarian aesthetics—façade design often lacks emotional resonance, natural integration, and perceptual appeal. To fill this gap, this study begins with WELL Building Standard principles and translates selected features into actionable visual dimensions (Color, Material, Pattern). These dimensions serve as the basis for semantic prompt construction, guiding generative design via Stable Diffusion and ControlNet. The aim is to visualize biophilic design intentions within technically constrained, high-frequency retail façades.
To establish a recursive and closed-loop design logic that progresses from environmental intention through AI-generated expression to perceptual evaluation, this study adopts a three-phase methodological structure.
(1) Visual Dimension Framework Development: Through literature synthesis, WELL feature analysis, and representative case studies, this phase identifies three perceptually salient design dimensions: Color, Material, and Pattern. These dimensions form the semantic control structure for constructing prompts in the AI generation stage.
(2) AI Image Generation and Technical Evaluation: Using Stable Diffusion XL with ControlNet, the study performs guided image-to-image inpainting of baseline façades. All generated images are quantitatively compared with the original façade using the LPIPS metric to assess the degree of visual variation and perceptual deviation.
(3) Expert Evaluation and Framework Validation: Expert reviewers in architecture and wellness design assess the images based on pre-defined perceptual criteria. Their feedback is analyzed to trace prompt effectiveness and design coherence. Finally, the overall design logic is formalized using a Recursive Object Model (ROM) to visualize the structure of the EBD-informed generative process.
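The expert-scoring step in the third phase can be sketched in Python as a simple aggregation of 7-point Likert ratings per image and criterion. This is a hedged illustration only: the `ratings` values are placeholders invented for the example and are not data from the study, the `summarize` function is a hypothetical helper, and only two of the five criteria are shown, using the two labels named in the abstract.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Placeholder ratings from five experts on a 7-point scale.
# These numbers are invented for illustration and are NOT study data.
ratings: Dict[str, Dict[str, List[int]]] = {
    "Original": {
        "material_quality": [3, 4, 3, 4, 3],
        "natural_features": [2, 3, 2, 3, 2],
    },
    "Material + Pattern": {
        "material_quality": [6, 6, 5, 7, 6],
        "natural_features": [6, 5, 6, 6, 7],
    },
}


def summarize(
    data: Dict[str, Dict[str, List[int]]]
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Return (mean, sample standard deviation) per image and criterion."""
    summary: Dict[str, Dict[str, Tuple[float, float]]] = {}
    for image, by_criterion in data.items():
        summary[image] = {}
        for criterion, scores in by_criterion.items():
            if any(not 1 <= s <= 7 for s in scores):
                raise ValueError(f"{image}/{criterion}: rating outside the 1-7 scale")
            summary[image][criterion] = (
                round(mean(scores), 2),
                round(stdev(scores), 2),
            )
    return summary


summary = summarize(ratings)
for image, by_criterion in summary.items():
    print(image, by_criterion)
```

Reporting the spread alongside the mean is a design choice made here because five raters is a small panel; the study's own analysis procedure may differ.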
This three-stage approach proposes a novel, environment-led design framework that uses AI tools not merely for image synthesis, but also for translating certification logic into perceptual design quality. It offers an operational model for applying EBD to façade development in human-absent, health-sensitive spaces such as automated retail stores.
To illustrate the recursive design logic central to this study, Figure 1 presents a ROM diagram that visualizes the structured relationship between the environmental context, prompt design, generative image production, and perceptual feedback.

Figure 1. ROM of the AI-Assisted WELL-Based Façade Design Process.
Identified Visual Dimensions from Literature Review
This phase establishes the theoretical foundation for AI-assisted design by identifying three core visual dimensions—Color, Material, and Pattern—through a systematic literature review of biophilic design, architectural visualization, and health-oriented building frameworks.
The selection of these dimensions was guided by three criteria: (1) relevance in existing design and environmental psychology research; (2) controllability in prompt-based image generation; and (3) perceptual clarity for expert evaluation. Among the range of visual attributes found in literature, these three were determined to be the most suitable for façade-level interventions in automated retail stores, where space is limited, interaction time is short, and branding needs are strong.
Color refers to the use of natural, calming, or psychologically supportive hues in façade elements such as signage, frames, or lighting. It is critical for conveying emotional tone, environmental harmony, and spatial legibility in compact urban retail contexts.
Material focuses on the visual representation of surface textures and finishes, including wood, metal, or patterned glass. These influence perceived warmth, safety, and biophilic authenticity—qualities essential in building trust and comfort in unattended commercial spaces.
Pattern encompasses geometric rhythm, modular layering, and ornamental repetition within the composition of façade components. Pattern contributes to visual richness and cognitive engagement, especially important in standardized and visually competitive environments like street-facing retail stores.
Together, color, material, and pattern offer a perceptually grounded and technically operable framework for AI-assisted design generation. They allow for structured prompt development, image variation control, and expert-based evaluation. Most importantly, this dimension system acts as a conceptual bridge between design generation and health-oriented goals, aligning visual aesthetics with principles from biophilic design and WELL-based wellness frameworks.
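How three dimensions can yield a structured set of prompt strategies is sketched below in Python. It assumes, consistently with the strategy names "Material + Pattern" and "Color + Material + Pattern" and with the seven generated variants reported in the abstract, that each strategy is a non-empty combination of the three dimensions. The prompt fragments in `PROMPT_FRAGMENTS` and the function `build_strategies` are hypothetical and do not reproduce the prompts used in the study.

```python
from itertools import combinations
from typing import Dict, Tuple

DIMENSIONS: Tuple[str, ...] = ("Color", "Material", "Pattern")

# Hypothetical prompt fragments, written for illustration only.
PROMPT_FRAGMENTS: Dict[str, str] = {
    "Color": "natural, calming color palette",
    "Material": "wood and patterned glass surface finishes",
    "Pattern": "organic decorative surface motifs",
}


def build_strategies(dimensions: Tuple[str, ...] = DIMENSIONS) -> Dict[str, str]:
    """Map each non-empty combination of dimensions to a combined prompt string."""
    strategies: Dict[str, str] = {}
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            name = " + ".join(combo)
            strategies[name] = ", ".join(PROMPT_FRAGMENTS[d] for d in combo)
    return strategies


strategies = build_strategies()
for name, prompt in strategies.items():
    print(f"{name}: {prompt}")
```

Enumerating the combinations programmatically keeps the single-variable and multi-variable strategies directly comparable, since every combined prompt is built from the same fragments as its single-dimension counterparts.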
Although visual dimensions such as façade shape articulation or form are also frequently discussed in WELL-related and biophilic design literature, this study focuses on Color, Material, and Pattern due to their high frequency across biophilic and wellness-oriented design frameworks, as well as their practical controllability within current AI image generation workflows. Other dimensions were excluded at this stage to maintain experimental focus and prompt clarity. Table 1 summarizes the identified visual dimensions and their theoretical grounding in biophilic and wellness-oriented design frameworks.
Table 1. Visual Dimension Analysis.
Certification Standards Analysis Based on WELL
To ensure that the AI-generated façade designs are aligned with health-oriented and biophilic principles, this study adopts the WELL Building Standard as its primary evaluative framework. A comprehensive review was conducted on the ten core concepts of WELL v2 (IWBI, 2020), in addition to biophilic-related entries from WELL v1, specifically Feature 88 (Biophilia I, Qualitative) and Feature 100 (Biophilia II, Quantitative), to determine their applicability to the visual aspects of façade design.
While categories such as Air, Water, and Sound offer minimal direct relevance to external visual expression, other WELL concepts—such as Light, Materials, Mind, and Community—provide meaningful guidance on how built environments can support human health and perception through design. These features are therefore mapped to the three core visual dimensions identified in Section 3.1.
This mapping enables the translation of abstract certification principles into tangible visual variables that can be operationalized in AI prompt formulation and image evaluation. The Color dimension reflects WELL features that address daylight autonomy, lighting quality, and visual harmony, all of which influence user comfort and emotional response. The Material dimension aligns with WELL requirements for transparency, non-toxicity, and natural sourcing, supporting healthful and biophilic material application. The Pattern dimension draws from WELL's emphasis on wayfinding, spatial rhythm, and community identity, guiding the application of geometric and culturally resonant design elements.
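The mapping described in this paragraph can be represented as a small lookup structure, sketched below in Python. It encodes only the themes named in the prose above; the dictionary name `DIMENSION_THEMES` and the function `dimension_for` are hypothetical, and the sketch does not reproduce the feature-level entries of the mapping tables.

```python
from typing import Dict, List, Optional

# Themes per visual dimension, taken from the paragraph above.
DIMENSION_THEMES: Dict[str, List[str]] = {
    "Color": ["daylight autonomy", "lighting quality", "visual harmony"],
    "Material": ["transparency", "non-toxicity", "natural sourcing"],
    "Pattern": ["wayfinding", "spatial rhythm", "community identity"],
}


def dimension_for(theme: str) -> Optional[str]:
    """Return the visual dimension a WELL-related theme maps to, if any."""
    needle = theme.strip().lower()
    for dimension, themes in DIMENSION_THEMES.items():
        if needle in themes:
            return dimension
    return None


print(dimension_for("Wayfinding"))        # Pattern
print(dimension_for("natural sourcing"))  # Material
print(dimension_for("air quality"))       # None (non-visual, outside the mapping)
```

A structure like this makes the certification-to-variable translation explicit, so a prompt term or an evaluation criterion can be traced back to the theme that motivated it.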
The detailed correspondence between WELL's features and the three visual dimensions is summarized in the following tables:
Table 2-A presents WELL features associated with Color, emphasizing light, emotional tone, and nature-inspired palettes.
Table 2-A. Mapping of WELL Features to the Visual Dimension: Color.
Table 2-B organizes features for Material, highlighting traceability, texture, and biophilic authenticity.
Table 2-B. Mapping of WELL Features to the Visual Dimension: Material.
Table 2-C outlines entries linked to Pattern, focusing on form rhythm, visual orientation, and cultural symbolism.
Table 2-C. Mapping of WELL Features to the Visual Dimension: Pattern.
These structured mappings provide a conceptual and methodological foundation for AI-driven façade generation that prioritizes visual wellness. They also enhance expert evaluation by offering clearly defined references rooted in a globally recognized certification system.
Furthermore, this mapping also reflects the conceptual logic of the EBD framework. Prior studies have demonstrated the adaptability of EBD across domains, such as its application to quality management systems (Sun et al., 2011). By linking color, material, and pattern to WELL variables within an EBD perspective, this study situates visual façade design in a recursive reasoning process, where environmental cues are translated into cognitive representations that guide design decisions. This cross-theoretical connection underscores the broader validity of integrating WELL-based standards into computational façade design.
In addition, this study deliberately limits its scope to visual WELL concepts. Non-visual categories such as Air, Water, and Sound, while essential to occupant well-being, were excluded because they cannot be represented in façade imagery or assessed through visual perception methods. In contrast, the dimensions of Color, Material, and Pattern are directly observable in generative outputs, correspond to WELL principles related to Light, Materials, and Mind, and are consistent with biophilic design theory (Kellert & Calabrese, 2015; Yun et al., 2024). This methodological narrowing ensures consistency between the design variables, the capabilities of AI-based façade generation, and the expert evaluation process, thereby reinforcing the academic rigor of the dimension selection.
Case Analysis of Automated Retail Stores
To support the development of a WELL-based façade design framework for automated retail stores, this study analyzes five cases: Amazon Go (USA), Bingo Box (China), 7-Eleven Shop & Go (Singapore), Lawson Digital Store (Japan), and Super Swift (South Korea). Unlike previous research that focuses on interior layouts and operational systems, this study investigates façade expression and organizes the analysis according to visual elements aligned with WELL criteria.
As discussed in Sections 3.1 and 3.2, a preliminary set of façade design dimensions—color, material, pattern, shape, texture, and transparency—was extracted through literature review and mapped to relevant WELL v1/v2 certification entries. Through further observation of actual design trends in selected cases, Color, Material, and Pattern emerged as the most frequently expressed and perceptually impactful dimensions. These elements consistently conveyed spatial atmosphere, directed user attention, and enhanced natural perception across diverse store types.
The key characteristics of each case are as follows.
Amazon Go (USA): A glass curtain wall with dark metallic frames and warm lighting accents; clean modular divisions and light-dark contrast enhance visibility and night recognition.
Bingo Box (China): Bold orange-white color blocks, steel container materials, and illuminated signage; modular LED strip arrangements introduce a rhythmic visual composition.
7-Eleven Shop & Go (Singapore): Silver-toned aluminum cladding and frameless sensors; subtle horizontal panel lines suggest a calm and minimalistic visual texture.
Lawson Digital Store (Japan): Light matte materials and soft wood-textured panels; subtle repetition of vertical planks and warm signage evoke locality and coherence.
Super Swift (South Korea): Translucent glass and film combined with warm wood-like tones; clean layering and soft graphical decals support biophilic comfort within a transparent layout.
Notably, while the “Pattern” dimension in this case analysis includes compositional rhythm, segmentation, and surface articulation—reflecting common façade design logic in real-world automated stores—this study intentionally narrows its interpretation in the generative design phase. For the purpose of prompt control and AI-based image generation, “Pattern” is defined as two-dimensional surface graphics or decorative motifs, such as organic overlays or geometric ornaments. This ensures consistency in variable manipulation while aligning with WELL's emphasis on visual coherence and biophilic harmony.
These comparative observations are summarized in Table 3, which visualizes the Color, Material, and Pattern characteristics of each case along with corresponding façade images for reference. The analysis validates the practical relevance of the three dimensions in addressing WELL-related façade considerations, thus finalizing them as core input parameters for the design experiment.
Visual Dimensions × Case Study Visual Element Matrix.
Prompt Strategies as Design Direction
To integrate architectural certification standards with AI-assisted façade design, this study establishes a three-layered prompt structure aligned with the three core visual dimensions (Color, Material, and Pattern) defined in the prior visual analysis. The structure unifies the vocabulary (WELL-based and descriptive keywords) with the grammar of sequencing and weighting. This integration provides a controllable and replicable foundation for image generation.
Each prompt group is systematically constructed using the following layered logic:
Core Elements (Nouns): Represent architectural features such as “glass storefront,” “wood-clad façade,” or “entrance wall.” These serve as the semantic anchors of the generated image.
Descriptive Modifiers (Adjectives): Describe visual characteristics including texture, tone, materiality, or lighting, such as “earth-toned,” “modular,” or “leaf-inspired.”
WELL-Based Labels: Directly map to WELL Building Standard (v1/v2) and biophilic design concepts such as “visual comfort,” “material health,” or “spatial identity.”
This prompt logic is not merely descriptive but operational, allowing prompts to act as design direction modules for AI image generation, rather than fixed instructions. Combined with ControlNet execution, this layered vocabulary enables partial inpainting of source images, preserving spatial context while selectively altering the appearance of the façade under different visual intentions.
The system is summarized in Table 4, which outlines how each visual dimension is translated into prompt keywords.
System Prompt Examples by Dimension Category.
Image Generation Process
To generate façade images of automated retail stores that align with the WELL Building Standard and reflect differentiated design intentions across the core visual dimensions of Color, Material, and Pattern, this study adopts a structured workflow based on the Stable Diffusion XL (SDXL) model using img2img local inpainting, combined with ControlNet control modules.
Building upon the three-layered prompt structure introduced in Section 3.4, this system consists of three components: core elements such as façade types and architectural components, descriptive modifiers including textural and chromatic attributes, and WELL-based labels that reflect principles of health, comfort, and biophilic design. Together, these components translate abstract design goals into concrete semantic inputs for generative AI.
Unlike conventional text-to-image generation approaches, this study uses real-world façade photographs as structural seed images to retain the spatial logic and contextual layout of the original buildings within their urban environments. Through localized image-to-image generation, only specific regions of the façade are transformed in style or visually enhanced, thereby preserving the overall structure and avoiding unintended distortions.
In practice, the original façade image of the Super Swift automated retail store was selected as the base input. A Depth-based ControlNet module (using the control_v11f1p_sd15_depth model with the MiDaS v3 preprocessor) was applied to extract a depth map of the façade (see Table 7 in Section 4.1.1), which provided edge-preserving control during inpainting. This approach ensured that architectural boundaries and spatial proportions were maintained, while enabling targeted manipulation of visual attributes. For the sake of transparency and reproducibility, the standardized parameter configuration of the ControlNet Depth module—including resolution, weight, and conditioning steps—is reported in detail in Section 4.1.2, together with the complete negative prompts list used to suppress artifacts.
These control maps work in tandem with the prompt strategies to guide generation in a structured and replicable manner. Each image undergoes iterative refinement through prompt modification, ControlNet parameter adjustment, and manual visual inspection. ChatGPT is used to assist in semantic restructuring and prompt recomposition, helping ensure that the resulting images reflect WELL-oriented design intentions more precisely.
This study emphasizes that the goal of image generation is not to deliver finalized design outputs. Rather, it serves as an AI-assisted tool that offers direction and inspiration aligned with healthy-building principles. The AI-generated images present designers with diverse expressive possibilities—such as natural materials, warm tones, and rhythmic compositions—within localized areas, encouraging exploratory and adaptable design thinking.
To ensure the representativeness of the images used for evaluation, approximately 20 images per visual condition were generated. A panel of five experts with backgrounds in architecture and spatial design took part in an initial selection process. Within each image set, they reached a consensus on the image that best exemplified the intended visual dimension, which was then included in the expert evaluation phase.
Overall, under the guidance of WELL certification logic and prompt-controlled visual semantics, this study constructs a façade design workflow that integrates generative AI with structural control. This approach enables precise manipulation of visual dimensions and provides a systematic and assessable design support tool for future architectural practices. Figure 2 illustrates the workflow of AI-assisted façade design for automated retail stores, integrating visual dimension identification, prompt-based image generation, and expert evaluation.

Workflow of AI-Assisted Façade Design for Automated Retail Stores.
Image Evaluation Methods
To assess the performance of AI-generated images in façade design, this study adopts a dual-method evaluation system combining quantitative technical indicators and expert-based scoring with a Likert scale. This integrated approach ensures comprehensive evaluation from both visual quality and design applicability perspectives.
Technical Evaluation Indicators
The following metric is used in this study.
LPIPS score:
Used to evaluate the perceptual similarity between image variants under the same design dimension. It is more sensitive to structural and semantic differences than pixel-level indicators, and is well-suited for comparing images modified via SD with ControlNet. A lower LPIPS score indicates higher visual similarity between images with minimal unintended noise or distortion.
Expert Evaluation Using Visual Dimension Scoring Table
To evaluate the subjective visual quality of the AI-generated façade images, a structured evaluation form was developed based on key visual dimensions. These criteria (façade material, transparency, color, design complexity, and natural features) were derived from research on the impact of exterior design on retail business performance and customer attraction (Majid, 2022). In addition, elements from the WELL Building Standard and biophilic design theory were referenced to ensure alignment with established wellness-related architectural principles (Browning et al., 2014; Kellert, 2008). The retained evaluation dimensions and their justification are summarized in Table 5, mapping visual aspects to corresponding WELL features.
Evaluation Dimension Scoring Table.
The structure of the evaluation framework was also informed by previous façade design assessment studies to ensure theoretical rigor and practical applicability (Kim & Park, 2025; Shan & Junghans, 2023). An expert panel of five professionals, each with over ten years of experience in architectural, interior, or biophilic design, independently evaluated the generated images using the standardized assessment table. All panel members held professional backgrounds in architecture but represented diverse areas of expertise: two professors specializing in architectural design and spatial planning with a focus on biophilic design, one WELL certification expert, and two PhD researchers specializing in environmental psychology and spatial perception in WELL-oriented design. This breadth of experience ensured that the evaluation incorporated expertise in design, WELL-related knowledge, and user-centered perceptual insights.
A 7-point Likert scale evaluation form was developed based on the retained visual dimensions aligned with WELL standards and biophilic design principles. Experts scored each image on the following criteria, using the scale −3 = Very Poor, −2 = Poor, −1 = Slightly Poor, 0 = Neutral, +1 = Slightly Good, +2 = Good, +3 = Excellent. The higher the score, the more closely the image aligns with the described design feature. The final set of evaluation criteria and scoring framework is presented in Table 6. In addition to the quantitative Likert-scale evaluations, this study collected open-ended responses from the five expert participants to capture more nuanced perceptions and personalized design suggestions. These qualitative responses were analyzed thematically to identify common evaluative criteria, emotional impressions, and critiques that are not fully represented by numerical ratings. This mixed-method approach aims to enrich the evaluation framework and arrive at deeper insights into the semantic impact of AI-generated façade designs.
Evaluation Dimension Scoring Table.
Data Analysis Methods
To evaluate whether the AI-generated façade images demonstrate effective visual differentiation under WELL-based design strategies, this study employed a two-level mixed-methods analysis.
Quantitative analysis was conducted as follows.
Perceptual similarity between original and generated images was assessed using the LPIPS metric. This helped quantify the visual divergence across different prompt strategies and control conditions.
Expert evaluations were collected through a 7-point Likert scale ranging from −3 (Very Poor) to +3 (Excellent), across five dimensions: Façade Material, Transparency, Color, Design Complexity, and Natural Features. Descriptive statistics, including mean scores, standard deviations, and trend charts, were first computed to compare the performance of each image category.
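As a minimal sketch of the descriptive statistics for one Image × Dimension cell, the snippet below computes a mean and a sample standard deviation (n − 1 denominator) from five ratings. The ratings are hypothetical, not the study's raw data; they are simply one integer set on the −3 to +3 scale whose summary matches the baseline Natural Features values reported later (M = −2.60, SD = 0.548). The function name `describe` is illustrative.

```python
from statistics import mean, stdev

# Hypothetical ratings from five experts for one Image x Dimension cell.
# NOT the study's raw data: an integer set on the -3..+3 scale chosen so
# that its summary matches the reported M = -2.60, SD = 0.548.
ratings = [-3, -3, -3, -2, -2]


def describe(cell):
    """Return (mean, sample SD) for one cell of expert ratings."""
    # statistics.stdev uses the n - 1 (sample) denominator.
    return mean(cell), stdev(cell)


m, sd = describe(ratings)
print(f"M = {m:.2f}, SD = {sd:.3f}")
```

That the reported SD is reproduced with the n − 1 denominator suggests, but does not prove, that sample standard deviations were reported.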
Prior to inferential testing, Shapiro–Wilk tests with Q–Q plot inspection were performed for each Image × Dimension cell (n = 5 per cell) to assess the normality assumption. Normality was evaluated at the α = .05 level, and potential deviations were further checked by false discovery rate (FDR) correction and visual inspection of distributions. Sphericity was examined using Mauchly's test, and when violated, Greenhouse–Geisser corrections were applied. In addition, repeated-measures ANOVAs were conducted to test for statistically significant differences across conditions and dimensions. All statistical analyses were performed using IBM SPSS Statistics version 27.
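The text does not name the FDR procedure applied to the per-cell normality tests. Assuming the common Benjamini–Hochberg method, the adjustment can be sketched as follows; the function name and the p-values are hypothetical and serve only to illustrate the arithmetic.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical Shapiro-Wilk p-values for four cells (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

A cell would then be treated as deviating from normality only if its adjusted p-value falls below α = .05.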
To complement the numerical scores and explore more nuanced design perceptions, this study then conducted thematic analysis on the experts’ open-ended responses, following the six-step approach proposed by Braun and Clarke (2006):
Familiarization with the data: All textual feedback from experts was read several times to gain an initial understanding of recurring perceptions.
Generating initial codes: Key phrases related to visual comfort, naturalness, material perception, and biophilic effects were systematically coded.
Searching for themes: Codes were clustered into broader themes such as “natural integration,” “visual coherence,” “material authenticity,” and “aesthetic inconsistency.”
Reviewing themes: The emergent themes were cross-checked against the raw data and adjusted for consistency and distinctiveness.
Defining and naming themes: Each theme was refined and given a clear operational definition to reflect its design relevance and connection to WELL-based visual dimensions.
Producing the report: The finalized themes were integrated into the results discussion to explain expert preferences and design implications beyond quantitative scores.
This multi-layered analytical approach enables the systematic examination of visual differentiation and perceptual responses to WELL-based AI design strategies.
Results
Façade Image Generation Results
Local Redrawing and Structural Control
To preserve the spatial integrity and real-world context of the automated retail store “Super Swift” façade, this study uses a photographic image of the original building as the structural seed input for AI-based regeneration. Given the functional nature of automated retail stores, safety and visibility are critical considerations, making transparency an essential design requirement. Therefore, when redesigning the façade, we preserve transparency by retaining key elements such as the entrance door and clear glass panels, while modifying other aspects to enhance visual quality and aesthetic appeal by integrating WELL and biophilic design principles.
A two-step generation strategy is adopted. First, local inpainting masks are manually applied to designate editable façade regions. Second, ControlNet modules are used to guide structural constraints and ensure consistent alignment with the base image during style transformation.
Three modification zones are defined based on architectural features and design flexibility (see Table 7):
Signage Zone (Green Area): This upper section includes the storefront signage and adjacent wall surfaces. It accommodates adjustments in both material and color, such as wood panels, biophilic overlays, or soft tone finishes, along with potential pattern applications to enhance rhythm and identity.
Framing System (Yellow Area): Covering the window and door frames, this zone supports the replacement or enhancement of materials (e.g., wood or aluminum) and color accents, allowing modulation of warmth, contrast, or tactile appeal.
Glazing Zone (Blue Area): Representing the transparent glass façades and doors, this area is vital for ensuring interior visibility and retail openness. Only minimal and non-obstructive interventions (such as light-permeable patterns or ambient lighting effects) are introduced to maintain transparency and user trust.
Region-Specific Inpainting for Localized Generation Based on ControlNet Structure.
The entrance zone remains unmodified across all generation conditions to safeguard accessibility and spatial legibility. These targeted redrawing strategies help ensure that WELL-based design intentions are precisely mapped onto appropriate façade components without disrupting the overall architectural coherence.
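The zone rules described above can be written down as a small lookup, which makes it explicit where each visual dimension may be applied. This is a sketch only: the dictionary keys, the helper name, and the reduction of the glazing zone to "pattern" (standing in for light-permeable patterns or ambient lighting effects) are illustrative simplifications, not part of the study's tooling.

```python
# Permitted interventions per facade zone, paraphrased from the text.
# Keys and helper name are illustrative, not taken from the study.
ZONE_RULES = {
    "signage": {"color", "material", "pattern"},
    "framing": {"color", "material"},
    "glazing": {"pattern"},  # only light-permeable, non-obstructive treatments
    "entrance": set(),       # never modified in any condition
}


def editable_zones(dimensions):
    """Return the zones where at least one given dimension may be applied."""
    active = {d.lower() for d in dimensions}
    return sorted(zone for zone, allowed in ZONE_RULES.items() if allowed & active)


print(editable_zones(["Material"]))           # zones touched by a Material-only edit
print(editable_zones(["Color", "Pattern"]))   # zones touched by Color + Pattern
```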
Visual Outcomes Under Different Prompt and Dimension Controls
This study evaluates the visual effectiveness of WELL-based façade design strategies by comparing two approaches. The first is a Baseline Design with a conventional commercial appearance, while the second is a WELL-Based Design guided by WELL v2 principles across the visual dimensions of Color, Material, and Pattern.
Both strategies are applied to the same base structure—the Super Swift store—using Stable Diffusion XL with localized inpainting and structure-preserving control through ControlNet. The WELL-Based strategy involves seven controlled generation conditions: three single-dimension cases (Color only, Material only, Pattern only), three dual-dimension combinations (Color + Material, Color + Pattern, Material + Pattern), and one full integration (Color + Material + Pattern).
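The seven conditions are exactly the non-empty combinations of the three dimensions, which the following sketch enumerates. The function and constant names are illustrative.

```python
from itertools import combinations

DIMENSIONS = ("Color", "Material", "Pattern")


def generation_conditions(dimensions=DIMENSIONS):
    """List every non-empty combination of the visual dimensions."""
    return [
        " + ".join(combo)
        for size in range(1, len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


conditions = generation_conditions()
for name in conditions:
    print(name)
```

With three dimensions this yields 2³ − 1 = 7 conditions: three single, three dual, and one full combination.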
The prompt formulation follows a structured three-layered logic:
Core Elements: Key architectural targets such as “storefront,” “glass façade,” or “signage band.”
Descriptive Modifiers: Adjectives expressing visual qualities, including “natural,” “textured,” “modular,” or “warm-lit.”
WELL-Based Labels: Semantic phrases derived from WELL principles, such as “visual comfort,” “natural material,” or “biophilic expression.”
These components are compiled into operative prompts with selective emphasis weights, guiding AI-based image generation while maintaining spatial consistency. For example, a Color + Pattern prompt may emphasize natural tones and curved botanical graphics applied to signage or glazing areas.
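A minimal sketch of how the three layers might be compiled into one weighted prompt string is given below. It assumes the parenthesized `(phrase:weight)` emphasis syntax that also appears in the study's negative prompts; the function names, the chosen phrases, and the weight of 1.2 are hypothetical and do not reproduce any prompt actually used.

```python
def weighted(phrase, weight=1.0):
    """Format a phrase with an emphasis weight, e.g. (phrase:1.2)."""
    return phrase if weight == 1.0 else f"({phrase}:{weight:g})"


def compose_prompt(core, modifiers, well_labels, emphasis=None):
    """Join the three layers into one comma-separated prompt string.

    emphasis maps a phrase to its weight; unlisted phrases keep weight 1.0.
    """
    emphasis = emphasis or {}
    phrases = [*core, *modifiers, *well_labels]
    return ", ".join(weighted(p, emphasis.get(p, 1.0)) for p in phrases)


prompt = compose_prompt(
    core=["signage band", "glass façade"],
    modifiers=["natural", "warm-lit"],
    well_labels=["visual comfort", "biophilic expression"],
    emphasis={"biophilic expression": 1.2},  # hypothetical weight
)
print(prompt)
```

Keeping the layers as separate arguments means a single dimension can be swapped between conditions without touching the rest of the prompt.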
Table 8 summarizes the seven façade generation strategies and their associated evaluation dimensions.
Evaluation Dimension Scoring Table.
Table 9 presents a consolidated overview of each condition's prompt structure, emphasis phrases, generation parameters, and the representative output image. The “Prompt Highlights” field illustrates a representative example of the composed input prompt used in the generation process, reflecting the integrated logic of the three-layer structure. Meanwhile, the “Selection Logic” summarizes the rationale behind the expert-based selection of each output. Unless otherwise stated, all images were generated using the following standardized configuration: Stable Diffusion XL model, DPM++ SDE sampler, 20 sampling steps, CFG scale of 7, and denoising strength between 0.5 and 0.6. The output resolution was fixed at 1125 × 844 pixels. To maintain spatial alignment and architectural structure, ControlNet (Depth) modules were employed to extract edge-preserving features from the original Super Swift façade image for controlled inpainting.
Original Image vs. Generated Images by Dimension Display.
The ControlNet module was configured using the control_v11f1p_sd15_depth model with the MiDaS v3 preprocessor. Unless otherwise stated, all parameters were kept at their default settings. The control resolution was fixed at 512, which is the default balance between feature extraction accuracy and computational efficiency. The control weight was set to 1.0, and the guidance start/stop steps followed the default values of 0 and 1, ensuring full-step conditioning across the generation process. The control mode was configured as Balanced, and the resize mode was left at the default Crop and Resize option, which preserves spatial consistency between the control map and the input image. No additional ControlNet variants were applied, since only the Depth module was employed in this study. In all generation conditions, a standardized set of negative prompts was applied to suppress visual artifacts and enhance image clarity:
over sharpening, dirt, bad color matching, graying, wrong perspective, distorted person, Twisted Car, NSFW, (worst quality:2), (low quality:2), (normal quality:2), lowres, (monochrome), (grayscale), blurry, signature, drawing, sketch, text, word, logo, cropped, out of frame, nsfw.
This negative prompt configuration was consistently used to minimize undesired visual outputs and ensure higher fidelity aligned with architectural realism.
No fixed seed was manually applied during generation, but the consistency of visual intent across attempts was ensured through carefully controlled prompt content and spatial constraints such as ControlNet masking.
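For reproducibility, the reported settings can be collected in one immutable record. The sketch below only restates values given in the text; the class, field, and method names are illustrative and do not correspond to the options of any particular tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Settings reported in the text; field names are illustrative."""
    base_model: str = "Stable Diffusion XL"
    sampler: str = "DPM++ SDE"
    steps: int = 20
    cfg_scale: float = 7.0
    denoising_range: tuple = (0.5, 0.6)
    width: int = 1125
    height: int = 844
    controlnet_model: str = "control_v11f1p_sd15_depth"
    controlnet_preprocessor: str = "MiDaS v3"
    control_resolution: int = 512
    control_weight: float = 1.0
    guidance_start: float = 0.0
    guidance_end: float = 1.0
    control_mode: str = "Balanced"
    resize_mode: str = "Crop and Resize"
    seed: object = None  # no fixed seed was applied

    def check_denoising(self, strength):
        """Return strength if it lies in the reported range, else raise."""
        low, high = self.denoising_range
        if not low <= strength <= high:
            raise ValueError(f"denoising strength {strength} outside [{low}, {high}]")
        return strength


cfg = GenerationConfig()
print(cfg.sampler, cfg.steps, cfg.check_denoising(0.55))
```

Because no seed was fixed, rerunning with these settings would reproduce the configuration but not the exact images.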
Visual Comparison with Original Designs
To assess the visual perceptual differences between the AI-generated façades and the original baseline design, this study applies LPIPS analysis. LPIPS measures perceptual similarity by computing feature distances from a pre-trained neural network, allowing for a more human-aligned assessment of visual changes.
The analysis compares each generated image—based on seven WELL-based visual control conditions—with the original Super Swift façade. A higher LPIPS score represents a greater visual deviation from the baseline; a lower score suggests closer resemblance.
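To make the metric concrete, the sketch below reproduces the core LPIPS arithmetic on toy inputs: channel-wise unit normalization of feature vectors, a weighted squared difference, a spatial average per layer, and a sum over layers. It is not the study's pipeline. In practice the features come from a pretrained network with learned channel weights (the backbone is not stated in the text), whereas here both the features and the weights are hand-made stand-ins, and all names are illustrative.

```python
import math


def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def lpips_style_distance(feats_a, feats_b, channel_weights):
    """Sum over layers of the spatially averaged, weighted squared difference.

    feats_a, feats_b: list of layers; each layer is a list of per-position
    channel vectors. channel_weights: one weight vector per layer.
    """
    total = 0.0
    for layer_a, layer_b, weights in zip(feats_a, feats_b, channel_weights):
        per_position = []
        for vec_a, vec_b in zip(layer_a, layer_b):
            na, nb = unit_normalize(vec_a), unit_normalize(vec_b)
            per_position.append(
                sum(w * (x - y) ** 2 for w, x, y in zip(weights, na, nb))
            )
        total += sum(per_position) / len(per_position)  # spatial average
    return total


# Toy example: one layer, one spatial position, two channels.
same = lpips_style_distance([[[1.0, 2.0]]], [[[1.0, 2.0]]], [[1.0, 1.0]])
different = lpips_style_distance([[[1.0, 0.0]]], [[[0.0, 1.0]]], [[1.0, 1.0]])
print(same, different)
```

Identical features give a distance of zero, and the distance grows as the normalized feature vectors diverge, which is the sense in which a higher score means greater deviation from the baseline.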
Table 10 summarizes the LPIPS scores for each condition, highlighting how design strategies affect the perceptual outcome. Notably, the Material only condition achieved the lowest score (0.1601), indicating minimal change and high visual similarity with the original. Conversely, the Color + Material + Pattern condition produced the highest score (0.3807), reflecting the most significant transformation across all visual aspects.
LPIPS Scores and Interpretation of Visual Deviation.
To illustrate this trend, Figure 3 presents the LPIPS scores in a bar chart. It clearly shows a progressive increase in perceptual deviation as more visual dimensions (Color, Material, Pattern) are combined. This trend quantitatively supports the idea that multidimensional WELL-based façade modifications lead to higher visual impact and distinguishability. These findings reinforce the controllability and sensitivity of prompt-based AI design strategies in modulating the perceptual outcomes.

LPIPS Scores Between Original and Generated Façade Images.
Beyond numerical comparison, the LPIPS results also offer design-relevant insights. Conditions driven primarily by color adjustments, such as the Color-only variant, generated relatively high perceptual deviations but did not lead to meaningful improvements in nature-related qualities. This indicates that noticeable visual change alone is insufficient to enhance restorative potential if the intervention lacks biophilic grounding. In contrast, experts perceived material- and pattern-based modifications, which produced moderate LPIPS deviations, as more coherent and beneficial. These findings suggest that façade design practice should not equate higher perceptual deviation with better outcomes. Instead, effective WELL-oriented interventions require a calibrated balance: introducing enough change to differentiate the design from a conventional baseline, while maintaining perceptual coherence and selectively enhancing natural attributes. Further details on how these deviations align with expert perceptions are elaborated in Section 4.3.
Expert Evaluation Results
To complement the perceptual similarity analysis, this study conducted expert evaluations on eight façade images. These included one original and seven AI-generated variants. The evaluations were based on five key visual dimensions: façade material, transparency, color, design complexity, and natural features. The evaluation employed a 7-point Likert scale ranging from −3 (Very Poor) to +3 (Excellent), and was completed by five experts with backgrounds in architecture and environmental psychology. The internal consistency of the expert evaluation was verified using Cronbach's alpha, which yielded a value of 0.837, indicating a high level of reliability across the five dimensions.
In addition, Kendall's W tests were conducted to assess inter-rater agreement within each visual dimension. Results showed moderate and statistically significant agreement in Material (W = 0.373, p = 0.018) and Naturalness (W = 0.405, p = 0.011), and high agreement in Transparency (W = 0.599, p < 0.001). In contrast, agreement in Color (W = 0.170, p = 0.246) and Design Complexity (W = 0.190, p = 0.194) was low and not statistically significant. These findings suggest that while experts held relatively consistent views on materiality, naturalness, and transparency, their perceptions of color and complexity were more divergent.
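Kendall's W can be sketched from rank sums as below. This version assumes each rater supplies a strict ranking with no ties; Likert ratings usually contain ties and would need average ranks plus a tie correction, which statistical packages apply and this sketch omits. The function name and data are illustrative.

```python
def kendalls_w(rank_rows):
    """Kendall's coefficient of concordance for untied rankings.

    rank_rows: one list of ranks (1..n, no ties) per rater.
    """
    m = len(rank_rows)     # number of raters
    n = len(rank_rows[0])  # number of ranked objects
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy rankings (illustration only).
full_agreement = kendalls_w([[1, 2, 3, 4]] * 3)
no_agreement = kendalls_w([[1, 2, 3], [3, 2, 1]])
print(full_agreement, no_agreement)
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings).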
Table 11 presents the mean scores and standard deviations for each image across the five evaluation dimensions. The results show that the baseline image (Image 1) received a relatively high score in Transparency (M = 2.40, SD = 0.894) but a notably low score in Natural Features (M = −2.60, SD = 0.548) and Color (M = −0.80, SD = 1.924), indicating a lack of biophilic qualities in the original real-world design. This underscores the necessity of integrating WELL-based and biophilic design principles into the visual enhancement of automated retail façades.
Expert Evaluation Scores by Visual Control Condition.
Among the AI-generated images, the Material + Pattern condition (Image 7) achieved the highest ratings in both Façade Material (M = 1.80, SD = 0.837) and Natural Features (M = 2.00, SD = 0.707), suggesting that the combination of natural textures and biomorphic patterns is particularly effective in enhancing expert-perceived design quality. At the same time, experts noted that the strong visual presence of this combination may risk “visual overload,” highlighting the importance of balancing biophilic richness with visual simplicity to maintain comfort and prevent cognitive strain. In contrast, the Color + Material + Pattern condition (Image 8), while aiming for comprehensive optimization, did not achieve the highest scores in any individual dimension. This result implies that overloading multiple visual interventions may introduce perceptual trade-offs or cognitive strain.
Normality checks using Shapiro–Wilk tests with Q–Q plots indicated no systematic violations after FDR correction, supporting the use of repeated-measures ANOVAs. To evaluate whether the observed differences across conditions and dimensions were statistically significant, a series of repeated-measures ANOVAs was conducted with Condition (8 levels) and Dimension (5 levels) as within-subject factors. The analysis revealed a significant main effect of Dimension, F(4,16) = 3.46, p = 0.032, indicating that experts’ ratings differed significantly across perceptual dimensions. The main effect of Condition alone was not significant, F(7,28) = 1.51, p = 0.205. Importantly, the Condition × Dimension interaction was significant, F(28,112) = 3.62, p < 0.001, suggesting that the influence of design interventions varied depending on the evaluation dimension. Follow-up one-way ANOVAs for each dimension revealed that significant effects of Condition were present for Naturalness, Transparency, and Material, but not for Color or Complexity. These results indicate that design strategies influenced perceptions of restorative qualities, materiality, and transparency, whereas evaluations of color and complexity were more divergent. Given the small expert sample (n = 5), these inferential results should be interpreted as exploratory. Taken together, the normality checks and the repeated-measures ANOVA results provide a consistent basis for interpreting expert evaluations. Table 12 reports the per-dimension one-way repeated-measures ANOVA statistics and exact p-values, indicating which dimensions exhibited significant condition effects.
Results of Repeated Measures ANOVAs for Expert Evaluations Across Five Dimensions.
Note: n.s. = not significant; * p < 0.05, ** p < 0.001. Tests are based on one-way repeated measures ANOVAs with Condition (8 levels) as the within-subject factor for each dimension (n = 5 experts).
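The degrees of freedom reported for the two-way analysis follow directly from the design (8 conditions, 5 dimensions, 5 experts). The sketch below recomputes them for a fully within-subject design without sphericity correction; the function name is illustrative.

```python
def rm_anova_df(levels_a, levels_b, n_subjects):
    """Degrees of freedom for a two-way fully within-subject ANOVA.

    Returns (effect df, error df) per term, with no sphericity correction.
    """
    s = n_subjects - 1
    a, b = levels_a - 1, levels_b - 1
    return {
        "A": (a, a * s),
        "B": (b, b * s),
        "A x B": (a * b, a * b * s),
    }


df = rm_anova_df(levels_a=8, levels_b=5, n_subjects=5)
for term, (effect_df, error_df) in df.items():
    print(f"{term}: F({effect_df},{error_df})")
```

These match the reported F(7,28) for Condition, F(4,16) for Dimension, and F(28,112) for the interaction.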
Figure 4 provides a radar chart visualization of the expert ratings, offering a comparative perspective across all eight design strategies. The chart highlights the distinctive perceptual impacts of each visual control. For instance, the Color-only image (Image 2) shows a notable improvement in the Color dimension, while exhibiting minimal influence on Natural Features. The baseline image excels in Transparency but demonstrates significantly lower performance in nature-related dimensions.

Expert Evaluation Trend Across Visual Strategies.
When considered together with the LPIPS results reported in Section 4.2, clear patterns emerge. The Material-only condition showed the smallest deviation and provided little improvement in wellness-related qualities, indicating that minimal change rarely enhances restorative attributes. In contrast, the Color + Material + Pattern condition produced the largest deviation but was not rated most effective by experts, suggesting that excessive transformation may undermine coherence and risk visual overload. The Material + Pattern condition lay between these extremes: it showed a moderate LPIPS score yet received the highest ratings for naturalness and material quality, demonstrating that balanced modifications can enhance restorative potential while preserving legibility.
These correspondences translate into actionable guidance for façade practice.
Prioritize material–pattern synergies to introduce biophilic cues with controlled complexity. Designers should adjust the scale and density of patterns to enrich restorative qualities while avoiding the risk of visual overload.
Use color strategically to convey warmth and harmony. Color interventions should ideally be combined with authentic materials or restrained patterning, since color alone can produce noticeable visual change but contributes little to restorative potential.
Protect transparency and legibility by placing interventions in secondary façade zones, such as signage bands or frames, or by applying partial or fritted treatments instead of large opaque overlays.
Adopt balanced interventions that create perceptible distinction from the baseline while maintaining contextual coherence. In this study, the Material + Pattern condition exemplified such balance and achieved the highest ratings in wellness-related attributes.
Taken together, these findings indicate that WELL-oriented visual prompt strategies can substantially shape expert perception of AI-generated façade designs. They suggest that the most effective façade interventions may not involve maximal visual richness but instead require a careful balance between biophilic attributes and perceptual simplicity. They also support the use of structured environmental cues as a basis for prompt engineering, reinforcing the potential of an EBD approach in guiding visually and emotionally responsive generative outputs.
Qualitative Insights from Expert Feedback
In addition to the quantitative Likert-scale evaluations, the open-ended responses from experts provided valuable qualitative insights into the perceived strengths and limitations of the AI-generated façade designs. Thematic analysis was used to extract recurring patterns and opinions from the responses (Braun & Clarke, 2006). Three key themes emerged:
Theme 1: Appreciation for Natural Features
Several experts emphasized the positive impact of natural elements, using expressions such as “green textures,” “natural materials,” and “biophilic imagery.” One expert remarked that “adding biophilic features makes the space feel more harmonious and welcoming,” affirming the effectiveness of AI-generated designs that integrate nature-based elements in enhancing visual appeal and user comfort.
Theme 2: Caution Toward Overuse of Patterns
While the use of patterns contributed to visual richness, some experts expressed concerns about visual overload. One noted, “The pattern in [Image 6] feels too intense, which may lead to sensory fatigue if applied extensively.” This suggests that pattern dimensions, especially when combined with material modifications, require careful modulation to avoid disrupting visual balance.
Theme 3: Feedback on Semantic Clarity and Realism
Several experts pointed out inconsistencies or ambiguities in the AI-generated outputs. Phrases such as “some images lack realistic material textures” and “the design intent is unclear in certain façades” were mentioned. These comments highlight current limitations in AI's ability to fully capture architectural detail and intent, underscoring the need for continued refinement of prompt strategies and post-processing techniques.
Taken together, the open-ended responses deepen our understanding of expert perceptions and provide qualitative validation for the visual strategies implemented in the study. These insights complement the quantitative findings and support the iterative development of WELL-aligned, biophilic façades using generative AI tools.
Discussion
Key Findings
This study proposes a structured AI-assisted design workflow for generating façade designs of automated retail stores by integrating WELL Building Standard principles and biophilic design principles. Through the identification and application of three core visual dimensions—Color, Material, and Pattern—the study operationalizes abstract wellness-oriented concepts into controllable, generative visual features.
The image generation process employed Stable Diffusion XL (SDXL) with ControlNet for localized modification of real-world façade images, enabling precise integration of health-related design cues. The evaluation framework included a 7-point Likert scale applied to five perceptual dimensions—material, transparency, color, design complexity, and natural features—assessed by a panel of design experts.
In addition to quantitative scores, qualitative feedback was collected through open-ended expert responses. Using thematic analysis, the study extracted deeper insights into expert perceptions, emotional impressions, and practical concerns, enriching the understanding of AI-generated design effects.
The analysis revealed the following key findings.
Color emerged as the most semantically aligned and visually effective dimension. AI-generated outputs focusing on color adjustment consistently received higher scores for aesthetic warmth and environmental harmony, indicating that color is a powerful medium for conveying WELL-aligned design intent.
Material-focused prompts introduced elements like wood and other nature-resembling textures, which enhanced perceptions of naturalness and tactile warmth. However, their visual integration with the original architectural context varied, suggesting that material changes require careful calibration to avoid inconsistencies or visual detachment.
Pattern, which included biophilic motifs such as leaves and waves, introduced the greatest variability across generated images. While this dimension enhanced the natural feature ratings, it was more sensitive to over-decoration and sometimes introduced visual clutter, requiring balanced deployment to preserve coherence.
The combined strategy of Color + Material + Pattern did not yield the highest scores in any single dimension but performed consistently across all, suggesting a trade-off between expressive richness and perceptual clarity. By contrast, the Material + Pattern strategy achieved the highest scores in both the Material (M = 1.80, SD = 0.837) and Natural Feature (M = 2.00, SD = 0.707) dimensions, demonstrating its effectiveness in enhancing biophilic and health-related qualities without overloading visual complexity.
The baseline (original) image scored relatively high on Transparency but received low ratings in Natural Features and Material, reinforcing the research rationale: The absence of biophilic and WELL-aligned cues in existing façade designs underscores the need for design intervention.
Importantly, qualitative expert insights validated these findings. Experts praised the incorporation of natural elements, while also cautioning against visual overload due to excessive patterning. Some also pointed out semantic ambiguity and unrealistic rendering in certain images, highlighting areas for prompt refinement and post-processing improvement.
Overall, these findings support the feasibility of using AI prompt engineering to infuse WELL and biophilic principles into façade design, while also highlighting the perceptual impact of individual and combined visual dimensions. The results validate the effectiveness of a layered prompt structure and expert-in-the-loop evaluation—both quantitative and qualitative—as a method for refining and guiding AI-generated architectural solutions.
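The summary statistics reported above for the Material + Pattern strategy can be checked with a short calculation. The sketch below uses hypothetical ratings from five experts. They are not the study's raw data and were chosen only so that the arithmetic reproduces the reported summaries. The reported values are consistent with a sample standard deviation (n − 1 denominator) over n = 5, although this section does not state which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical ratings from five experts, NOT the study's raw data.
# They are chosen only so the arithmetic reproduces the reported summaries.
material_ratings = [1, 1, 2, 2, 3]
natural_feature_ratings = [1, 2, 2, 2, 3]


def summarize(ratings):
    """Return (mean, sample standard deviation), each rounded to 3 decimals."""
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return round(mean(ratings), 3), round(stdev(ratings), 3)


material_summary = summarize(material_ratings)
natural_summary = summarize(natural_feature_ratings)
print("Material:", material_summary)
print("Natural Feature:", natural_summary)
```

Under these assumed ratings, the two summaries correspond to M = 1.80, SD = 0.837 and M = 2.00, SD = 0.707. With a panel of five, a one-point change in a single rating shifts the mean by 0.2, which illustrates how sensitive such summaries are to individual judgments.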
Implications
This study offers both theoretical and practical implications. From a theoretical perspective, this study extends the theory of EBD, originally proposed by Zeng and Cheng (1991), by applying it to the architectural scale and advancing its implementation through the integration of AI-based generative design tools. EBD emphasizes a recursive design reasoning process grounded in environmental understanding. This study operationalizes it through three iterative stages: visual variable extraction, generative prompt construction, and expert-based perceptual evaluation. However, this process remains conceptually recursive rather than algorithmically recursive, since the generative loop is not yet technically closed or automated. In doing so, the research not only reaffirms the adaptability of EBD across design domains but also demonstrates its compatibility with emerging computational workflows in the context of wellness-oriented façade design.
Building upon this theoretical foundation, the study also links computational design workflows to architectural certification standards by establishing a mapping mechanism that translates WELL principles into three visual design dimensions: Color, Material, and Pattern. This structured approach operationalizes wellness-oriented strategies, such as biophilic connections and material health, into prompt-level design interventions that can be visually expressed and evaluated. Our previous study (Yun & Kim, 2025) explored biophilic façade strategies in the context of urban infrastructure, using eye-tracking and subjective evaluation methods. While that research focused on energy-related public facilities, this study extends the scope to automated retail environments and introduces the WELL Building Standard as a structured evaluation framework for wellness-driven design. This expansion allows for a more standardized and certifiable approach to façade enhancement in commercial settings, bridging user-centered perception studies with AI-assisted generative workflows.
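One way to picture the mapping from WELL-aligned intent to prompt-level interventions is a small lookup table that a prompt builder draws on. The following is a minimal sketch under stated assumptions: the names `WELL_PROMPT_MAP`, `BASE_PROMPT`, and `build_prompt` are hypothetical, and the prompt fragments are illustrative wording rather than the prompts used in the study.

```python
# Hypothetical mapping: visual dimension -> wellness intent and prompt fragment.
# The intents paraphrase the text; the fragments are illustrative only.
WELL_PROMPT_MAP = {
    "Color": {
        "intent": "warmth and environmental harmony",
        "fragment": "warm, harmonious earth-tone color palette",
    },
    "Material": {
        "intent": "material health and tactile warmth",
        "fragment": "natural wood cladding with realistic texture",
    },
    "Pattern": {
        "intent": "biophilic connection",
        "fragment": "restrained leaf and wave motifs",
    },
}

# Hypothetical base description shared by every condition.
BASE_PROMPT = "automated retail store facade, photorealistic, street view"


def build_prompt(dimensions):
    """Compose one layered prompt from the selected visual dimensions."""
    unknown = [d for d in dimensions if d not in WELL_PROMPT_MAP]
    if unknown:
        raise ValueError(f"Unmapped dimensions: {unknown}")
    fragments = [WELL_PROMPT_MAP[d]["fragment"] for d in dimensions]
    return ", ".join([BASE_PROMPT] + fragments)


material_pattern_prompt = build_prompt(["Material", "Pattern"])
print(material_pattern_prompt)
```

Keeping the intent next to each fragment makes the link between a certification concept and its visual expression explicit, so a fragment can be revised without losing track of the wellness goal it is meant to serve.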
Taken together, aligning EBD with WELL establishes a dual framework in which environmental reasoning and health-oriented benchmarks jointly guide the three visual design variables. This integration clarifies how environmental cues can be systematically translated into perceptual prompts for façade generation, thereby reinforcing methodological validity and enhancing the framework's adaptability to broader architectural contexts. Nevertheless, as EBD validation often benefits from linking environmental attributes to measurable perceptual and physiological responses, future integration of behavioral and physiological outcome metrics will be essential to further substantiate the empirical grounding of this framework.
In practice, the proposed AI-assisted image generation process—based on Stable Diffusion XL and ControlNet—offers a flexible and low-cost tool for early-phase façade ideation. It enables designers to visualize directional design alternatives that reflect WELL-aligned features without relying on finalized design proposals. The use of img2img local redraws helps retain the original architectural context while selectively enhancing façade components, making the process especially suitable for adaptive renovation or incremental design tasks. These capabilities are particularly relevant in the context of rapidly expanding uncrewed retail spaces, where spatial appeal, user comfort, and psychological engagement are critical to attracting foot traffic and supporting user well-being.
In addition to the quantitative evaluations, qualitative feedback from experts enriched the interpretation of perceptual outcomes. Experts frequently emphasized the importance of material authenticity, natural detail subtlety, and façade-context harmony when assessing the images. Comments also pointed to issues such as excessive visual complexity and lack of realism in certain AI-generated outputs. These insights reinforced the strengths observed in the color and material dimensions while highlighting the need for refined prompt strategies that balance innovation with environmental coherence. The inclusion of such expert perspectives demonstrates the value of integrating both quantitative and qualitative judgment into the iterative AI design process.
Limitations and Suggestions for Future Research
This study presents a preliminary exploration into applying generative AI tools—guided by selected visual aspects of the WELL Building Standard—for the façade design of automated retail stores. While the findings suggest promising potential, several methodological and contextual limitations were identified during the research process, pointing to future directions for refinement and expansion.
First, the current design strategies did not adopt the full scope of the WELL Building Standard. Instead, this study selectively focused on visual aspects of color, material, and pattern that are particularly relevant to façade appearance. While these dimensions were operationalized through EBD reasoning and WELL alignment, the framework has not yet encompassed other health-related variables. Moreover, given the high transparency and spatial openness required by automated retail formats, only small, localized regions of the original façade were modified. These constraints inevitably limit the transformative impact of the redesign. Future studies should extend the mapping to additional WELL-related dimensions such as lighting ergonomics or spatial proportions, thereby enhancing theoretical rigor and broadening the applicability of the framework. This extension could work in parallel with the immersive and multimodal evaluation strategies outlined later in this section, ensuring that theoretical mapping and empirical validation advance together.
Second, the image generation process produced only seven façade variations corresponding to distinct prompt control strategies. Although these conditions were sufficient to demonstrate visual differentiation across design dimensions, the limited sample size restricts the generalizability and richness of findings. Expanding the visual dataset to include more diverse prompt combinations, structural variations, and context-specific adaptations could strengthen the design repertoire and support more nuanced analysis.
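The size of the condition set can be illustrated with a short enumeration. The sketch below assumes that the seven variations correspond to the non-empty combinations of the three dimensions, which is consistent with the strategy names used in this paper but is not stated explicitly in this section. The fourth dimension, "Form", is a hypothetical addition used only to show how quickly the set grows.

```python
from itertools import combinations


def prompt_conditions(dimensions):
    """List all non-empty combinations of the dimensions, smallest first."""
    return [
        " + ".join(combo)
        for size in range(1, len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


# Assumed structure of the current study: three dimensions.
current = prompt_conditions(["Color", "Material", "Pattern"])
# Hypothetical extension with one additional dimension.
extended = prompt_conditions(["Color", "Material", "Pattern", "Form"])

print(len(current), len(extended))
for name in current:
    print(name)
```

Because the number of conditions is 2^n − 1 for n dimensions, each added dimension roughly doubles the images that experts must rate, which is a practical constraint when expanding the dataset.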
Third, although this study frames its workflow as a recursive reasoning process, the implementation remains linear and expert-driven. Future work should incorporate algorithmic recursion by integrating user feedback into automated prompt regeneration, potentially through reinforcement learning, LLM-based prompt tuning, or human-in-the-loop optimization. Such developments would more rigorously operationalize the recursive principles that lie at the core of EBD and computational design.
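A closed feedback loop of this kind could, in its simplest form, adjust a single prompt parameter in response to ratings. The sketch below is an assumption-laden illustration, not an implementation from the study: `regenerate_until_balanced`, the pattern `density` parameter, the complexity `threshold`, and the `mock_panel` stand-in for image generation plus rating are all hypothetical.

```python
def regenerate_until_balanced(rate_complexity, density=1.0, threshold=5.0,
                              step=0.2, max_rounds=10):
    """Lower pattern density until rated complexity is at or below threshold.

    rate_complexity: callable mapping a density in [0, 1] to a complexity
    rating. In practice it would wrap image generation plus expert or user
    scoring; here it is supplied by the caller.
    Returns the list of (density, rating) pairs visited.
    """
    history = []
    for _ in range(max_rounds):
        rating = rate_complexity(density)
        history.append((round(density, 2), rating))
        if rating <= threshold or density <= 0.0:
            break  # balanced enough, or nothing left to reduce
        density = max(0.0, density - step)
    return history


def mock_panel(density):
    """Hypothetical stand-in for a panel: rating rises linearly with density."""
    return round(1.0 + 6.0 * density, 2)


history = regenerate_until_balanced(mock_panel)
for density, rating in history:
    print(density, rating)
```

A rule-based update like this is the most basic form of human-in-the-loop optimization. Reinforcement learning or LLM-based prompt tuning would replace the fixed step with a learned update, but the loop structure of generating, rating, and adjusting would stay the same.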
Fourth, the expert evaluation primarily drew on the expertise of five professionals in the fields of architecture and wellness design. Their insights were constructive and aligned with the study's objectives. Nevertheless, the relatively small panel size (n = 5) may introduce bias and limit the generalizability of the findings. Future research could expand the panel to include a larger and more diverse set of experts, and where appropriate, complement expert judgment with input from actual users to capture a wider range of perceptual and emotional responses. Moreover, experts noted that certain combinations, such as Material + Pattern, while highly rated, may approach the threshold of “visual overload.” This suggests the need for future studies to investigate the complexity threshold at which biophilic enrichment transitions from being perceived as positive to becoming visually overstimulating. Identifying such a balance point would provide valuable guidance for calibrating façade design strategies to maximize restorative qualities while avoiding cognitive strain.
Fifth, although this study included open-ended comments in addition to quantitative ratings, applying more robust qualitative methods such as semi-structured interviews or evaluation workshops may reveal deeper insights into participants’ aesthetic judgments and design preferences. These methods are particularly helpful in interpreting subtle or ambiguous visual outcomes.
Sixth, the use of static images limits the ability to assess real-world user engagement. Immersive scenarios can be generated through virtual reality (VR) or augmented reality simulations. When combined with physiological and behavioral measures such as eye-tracking, galvanic skin response, and facial expression analysis, these methods offer a more comprehensive understanding of spatial perception and emotional responses. Prior research has shown that immersive eye-tracking can effectively capture attention patterns in virtual environments (Kim, 2024), indicating promising applications for future architectural design studies. Future research should situate these psychophysiological methods more explicitly within existing human–computer interaction and architectural psychology paradigms, so that measures of attention, cognitive load, and affective state can be directly interpreted as indicators of wellness-oriented design impact. In practice, insights from eye-tracking, electrodermal activity, and heart rate variability can be mapped onto higher-level perceptual constructs such as comfort, complexity, and restorative potential. These constructs can then be employed as feedback to refine prompt-generation strategies within recursive algorithmic processes.
Seventh, although LPIPS was employed as a perceptual similarity metric to quantify deviations between the generated façades and the original baseline, it presents several limitations. This study implemented the metric with the VGG backbone, which is commonly used in computer vision tasks for its sensitivity to feature-level perceptual differences. However, while LPIPS effectively measures image divergence at the visual level, it does not directly indicate architectural quality, semantic fidelity, or functional appropriateness. A façade design with a higher LPIPS score may simply represent a more visually distinct alternative, rather than a failure in design logic. Future work should therefore complement LPIPS with additional evaluation metrics—such as semantic segmentation accuracy, realism ratings, or functional alignment measures—to capture both perceptual change and architectural validity.
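For reference, the distance that LPIPS computes can be written as follows, using the general definition of the metric (Zhang et al., 2018) rather than anything specific to this study. The symbols are introduced here for illustration: x is a generated façade image, x_0 is the original baseline, l indexes the layers of the backbone network (VGG in this study), \hat{y}^{l} and \hat{y}^{l}_{0} are the channel-normalized feature maps of the two images at layer l with spatial size H_l × W_l, and w_l is a learned per-channel weight vector.

```latex
d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
  \left\| w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw} \right) \right\|_2^2
```

Every term measures a difference between deep feature activations, so the score rises whenever the two images differ in learned visual features, regardless of whether the change improves or degrades the design. This is why a higher value indicates visual deviation rather than architectural quality.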
Lastly, while this study focused on three visual dimensions, future research could extend to additional WELL-related visual features such as form, shape articulation, light quality, façade transparency, signage clarity, or natural view access, depending on the architectural context. At the same time, the methodology relied exclusively on Stable Diffusion XL (SDXL) with fixed parameters, which ensured internal consistency but also limited the exploration of potential variability across other generative models. Since prompt interpretation, controllability, and visual consistency may differ between architectures such as SD 1.5, SD 2.1, or fine-tuned variants, expanding future investigations to multiple models and parameter settings would enhance the robustness and generalizability of the findings. Such methodological diversification would further improve the adaptability and practical impact of prompt-based AI tools across various façade typologies. Future research should also reflect more deeply on the balance between design controllability and AI creativity. While WELL-based prompts ensure semantic rigor and alignment with certification standards, excessive control may suppress generative diversity. Exploring hybrid strategies that safeguard both semantic integrity and creative exploration will add conceptual depth to the emerging discourse on AI-assisted design.
In summary, although the proposed methodology demonstrates structured integration of WELL-inspired design prompts with generative AI tools, further development is necessary. Key next steps are expanding visual variables, diversifying evaluation perspectives, and leveraging immersive and physiological assessment methods to build a more comprehensive and adaptive framework for AI-assisted façade design in automated retail environments.
Conclusion
This study proposes and validates a generative AI-assisted design methodology that translates health-oriented architectural standards into tangible visual strategies for façade design in automated retail environments. Grounded in the EBD framework, the research integrates principles from the WELL Building Standard and biophilic design to establish an iterative workflow that links environmental intention, prompt-based image generation, and expert visual evaluation.
Focusing on the three key visual dimensions of Color, Material, and Pattern, the study constructs structured prompt strategies that guide AI-generated designs toward wellness-aligned and visually engaging outcomes. The generation process employs Stable Diffusion XL and ControlNet, enabling localized modifications of real façade images while preserving architectural context and spatial continuity.
Expert evaluations validate the effectiveness of this approach. Among the AI-generated variations, the Material + Pattern strategy achieved particularly high ratings in perceptual qualities such as Natural Features and Façade Material, illustrating the visual value of incorporating natural textures and biomorphic patterns. In contrast, the original design, while performing well in Transparency, lacked biophilic attributes, reinforcing the importance of health-driven visual integration. Qualitative feedback from experts contextualized these results. Despite praising naturalistic materials for enhancing warmth and approachability, some participants cautioned against excessive visual complexity. These insights reiterate the importance of visual balance in applying wellness-oriented design elements.
Overall, this research demonstrates a structured and scalable pathway for enhancing the visual and psychological qualities of automated retail façades through generative AI. Future work should broaden the design scope by incorporating additional WELL-related dimensions such as form, signage clarity, light quality, and transparency, and it should also extend toward multisensory experiences in real-world settings. Furthermore, evaluation methods need to be advanced through multimodal user experience assessments, including neuroimaging, eye-tracking, electrodermal activity, and other physiological measures. These extensions will help ensure that AI-generated designs are not only aligned with expert aesthetic standards but also resonate with users across diverse spatial and cultural contexts. Beyond its empirical findings, the study contributes by articulating a methodological framework that operationalizes health-oriented design principles through generative AI, offering a replicable process for both design research and practice. Although demonstrated in the context of automated retail, the proposed framework holds potential for broader application in workplaces, learning environments, healthcare facilities, and public spaces, where visual quality and user well-being remain central to architectural performance.
Footnotes
Acknowledgments
The authors would like to express their sincere gratitude to Dr. Yong Zeng for his valuable advice on the application of Environment-Based Design (EBD) theory. The authors also thank the anonymous reviewers for their insightful comments and constructive suggestions, which have significantly improved the quality of this paper.
Funding
This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF) grant (NRF-2025S1A5A8007949), and by the Korea government (Ministry of Science and ICT) (RS-2025-23523874).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data supporting the findings of this study are available upon request.
