Abstract
Generative artificial intelligence (AI) applications in structural design face persistent challenges due to training data limitations, particularly datasets that do not comply with critical physical and material requirements. This study proposes a structural optimization-based data enhancement method to address quality deficiencies in generative AI training data, specifically targeting shear wall layout design with diffusion-based generative models. The proposed method introduces three key innovations: (1) A novel generative AI design workflow that adds a data enhancement phase and modifies the data preparation and model evaluation phases; (2) Regression formulas, derived from feature distribution analysis, that enable data enhancement when design information is incomplete; (3) A shear wall layout optimization method that simultaneously improves physical and material metrics. Experimental validation shows marked improvements in design outcomes when the training data are enhanced. Specifically, the occurrence frequency of physically non-compliant structural designs decreases by 67%, while the material costs of compliant designs decrease by 0.5%. Additionally, the consistency of design performance across cases improves significantly. By addressing data quality limitations through structural optimization, this approach enhances the practical viability of diffusion models in structural engineering applications while remaining adaptable to advanced AI algorithms. The proposed method is modular and scalable, offering potential for extension to other structural systems (e.g., steel frames, composite structures) and design challenges, thereby advancing AI-driven innovation in structural engineering.
Keywords
Introduction
Generative artificial intelligence (AI) is capable of creating new and original content, and its powerful capabilities have profoundly transformed industries such as natural language processing, computer vision, and design (Bandi et al., 2023). Typical generative AI algorithms include generative adversarial networks, variational autoencoders, and diffusion models (Ho et al., 2020). With the continued accumulation of data in the engineering field, generative AI is gradually being applied to the design tasks of engineering structures in general (Yüksel et al., 2023), and to building structures in particular (Liao et al., 2024). Research on the generative AI design of building structures has already encompassed various aspects, including structural layout (Fu et al., 2023; Zhou et al., 2024), component section sizes (Fei et al., 2022; Feng et al., 2023), and seismic isolators (Liao et al., 2023).
A workflow for generative AI design of building structures typically involves three phases: (1) Data preparation: Collecting design drawings manually completed by structural engineers, preprocessing them to obtain labeled data, and dividing them into training and test sets. (2) Model training: Training a generative AI model using the training set. (3) Model testing and evaluation: Designing the test set cases using the trained generative AI model and evaluating the model performance based on the similarity between the AI design and the one created by engineers. It is evident that such a workflow heavily relies on the quality of data (i.e., engineer designs). If the quality of the engineer designs is not high enough, then: (1) the training set cases for AI to learn would be unreliable, which affects the AI’s performance; (2) the ground truth used for evaluating AI designs would also be unreliable, leading to unreasonable evaluation results. High-quality data is, therefore, crucial for producing successful generative AI design outcomes.
High-quality data, in other words, high-quality structural designs, typically must satisfy two types of design metrics: (1) All physical responses must comply with the design code requirements to ensure structural safety; (2) Material costs for concrete, steel rebar, etc., should be minimized to achieve an economical design. In reality, however, the obtained datasets often cannot fully meet these requirements, because low-quality data are inevitably included for various reasons, such as: (1) The collected design drawings are of poor quality, for example, because they are unverified preliminary designs or were produced by unqualified engineers. (2) The collected design drawings are incomplete, for example, some key structural design information is missing. (3) Preprocessing of the design drawings may introduce errors owing to the highly non-standardized nature of the drawings. It is well recognized that low-quality data can adversely affect the performance of generative AI (Whang et al., 2023), and this issue needs to be addressed urgently.
Currently, existing research primarily focuses on the development of novel AI algorithms (Xu et al., 2022; Xu and Guo, 2025), with less in-depth exploration of how to enhance data quality (Stonebraker and Rezig, 2019). In the domain of machine learning, data cleaning is usually performed to improve data quality, including uncertainty-based approaches (Rottmann and Reese, 2023), loss-based approaches (Liu et al., 2021), counterfactual approaches (Flokas et al., 2022), and outlier-based approaches (Lee et al., 2018). These data cleaning techniques can remove erroneous values from the dataset and thereby improve model performance. However, they primarily target generic data, for example, image, text, tabular, and time-series data (Côté et al., 2024), rather than specialized structural design data, and are therefore not directly applicable to the generative AI design of building structures. Structural design tasks involve complex evaluation criteria and admit multiple feasible solutions, which makes errors very difficult to identify and correct. What is needed is to enhance the overall design quality of the entire dataset from multiple aspects using domain knowledge of structural design, rather than merely removing apparent errors as existing data cleaning methods do.
To enhance the quality of structural design data, structural optimization can be adopted, which is a method commonly used in building structure design practice that can comprehensively improve the performance of design schemes (Afzal et al., 2020). Indeed, numerous studies have attempted to integrate AI methods with optimization techniques (Shirgir et al., 2024; Shirgir and Farahmand-Tabar, 2025). However, existing research has rarely focused on leveraging optimization approaches to enhance AI training data. Recently, Fei et al. (2023) proposed a data augmentation method based on structural optimization, which improves the performance of generative AI on long-tailed datasets. Yet, the method was developed with a primary emphasis on data distribution without exploring the impact of data quality on AI performance. To advance the generative AI design of building structures, it is necessary and desirable to develop a data enhancement method based on structural optimization and to investigate the influence of data quality on the structural design outcomes of generative AI. In the context of this article, data enhancement places emphasis on comprehensively improving the quality of data, data augmentation focuses on expanding the scale and increasing the diversity of the dataset, and data cleaning is centered around removing errors from the data.
In response to the aforementioned challenges, this study proposes a data enhancement method based on structural optimization to address the low data quality encountered in the generative AI design of shear wall structures, particularly shear wall layout. Shear walls are critical load-bearing elements in high-rise buildings, and their layout directly influences structural safety, material efficiency, and architectural usability. Compared with beams or columns, shear walls involve higher computational complexity and more demanding design requirements. The rest of this paper is organized as follows. Section ‘Workflow of generative AI design with data enhancement' proposes a workflow of generative AI design with data enhancement. Section ‘Dataset and regression analysis' presents the dataset of shear wall structures and the regression analysis of necessary design information. Section ‘Structural optimization of shear wall layout' introduces the structural optimization method for shear wall layout design. Section ‘Diffusion model-based structural design with data enhancement' illustrates the generative AI design based on diffusion models and the evaluation of AI designs. Section ‘Case study' provides two typical case studies. Section ‘Conclusions' summarizes the main conclusions of this study.
Workflow of generative AI design with data enhancement
The integration of data enhancement into a traditional workflow of generative AI design yields a new workflow depicted in Figure 1, which consists of four phases: (1) Data preparation (Section ‘Dataset and regression analysis'): Collect design drawings manually produced by structural engineers, preprocess them to obtain labeled data, and divide them into training and test sets. When necessary design information is absent, establish regression formulas based on the distribution characteristics of design information in the collected dataset to supplement the missing information. (2) Data enhancement (Section ‘Structural optimization of shear wall layout'): For each shear wall structure in the training set, establish a detailed finite element (FE) model through parametric modeling, derive its design metrics (physical response and material cost) through physical analysis and reinforcement design, and improve the design scheme through structural optimization. (3) Model training (Section ‘Diffusion model-based structural design with data enhancement'): Train a generative AI model using the optimized training set. (4) Model testing and evaluation (Section ‘Case study'): Design the test set cases using the trained generative AI model and ascertain the necessary design information using regression formulas. Acquire the design metrics of the AI design through the FE model, thereby evaluating the quality of the AI design and the performance of the generative AI.
Figure 1. Workflow of generative AI design with data enhancement.

The advantages of the proposed workflow are: (1) The structural optimization enhances the quality of the manually designed dataset by adjusting the design schemes to be safer and more economical, thereby improving the performance of the generative AI. (2) Evaluating AI designs with design metrics does not depend on the quality of the ground truth data, which allows a more independent and rational assessment of the AI’s performance and accounts for the non-uniqueness of structural design solutions. The proposed workflow is highly modular and can be extended to various structural components, diverse load cases, and different materials by changing the parametric model and optimization objectives.
This study takes the shear wall layout design task as an example to verify the effectiveness of the proposed data enhancement method. The shear wall structure is a popular structural system, often used in high-rise residential buildings in earthquake-prone areas. Shear walls are a key structural component of a shear wall structure, resisting both horizontal and vertical loads. The layout of shear walls significantly affects the physical response and material cost of shear wall structures, and the layout design is an essential task during the schematic design phase (Liao et al., 2021).
To successfully implement the proposed workflow, the design data of shear wall structures must be modeled, analyzed, and optimized automatically, which poses two key technical challenges: (1) Challenge one: Modeling, analysis, and optimization of shear wall structures require complete design information, including structural layout, component section sizes, material grades, etc. Existing generative AIs typically accomplish only an essential part of the structural design task and cannot generate all the necessary design information (Liao et al., 2024). For instance, when designing shear wall structures, generative AI focuses only on the shear wall layout, which is the most critical part (Gu et al., 2024), and does not generate other design information, such as component section sizes and material grades. Although AI algorithms could be trained to predict the missing design information, doing so would complicate the design problem and in turn increase the computational demand. Consequently, this study directly employs explicit regression formulas, established from the collected dataset as detailed in Section ‘Dataset and regression analysis', to determine the necessary design information beyond the shear wall layout. (2) Challenge two: Shear wall structures require efficient multi-objective optimization. Their design must consider multiple physical and material metrics, which leads to a multi-objective optimization problem. In addition, the computational time of structural optimization must be kept manageable for data enhancement. A single modeling and analysis run of a shear wall structure generally takes several minutes, but an entire optimization process may require hundreds of such runs. Optimizing hundreds of design cases would therefore necessitate tens of thousands of runs.
Accordingly, a balance must be sought between optimization time and effectiveness by determining the appropriate optimization parameters as detailed in Section ‘Structural optimization of shear wall layout'.
Dataset and regression analysis
Shear wall structure dataset
To train a generative AI suitable for structural design, it is necessary to collect a large amount of structural design data. A total of 364 shear wall structure design drawings are collected from several renowned architectural design institutes in China. The architectural design, structural design, and design conditions of the shear wall structures are extracted from the collected drawing dataset. Specifically, architectural design includes the planar layout of partition walls, doors, and windows. Structural design includes the planar layout of shear walls (for training the generative AI), as well as such design information as the component section sizes, the material grades, and the number and division of standard stories (for establishing regression formulas). Design conditions include the seismic, site, and height conditions, characterized respectively by the design seismic acceleration
The collected dataset is divided into a training set (303 cases) and a test set (61 cases) in a 5:1 ratio. Typical cases are illustrated in Figure 2. To enrich the diversity of the dataset and thereby enhance the generalization performance of the generative AI, data augmentation methods such as flipping and mirroring (Liao et al., 2021) are employed during training.
Figure 2. Typical cases of the established shear wall structure dataset.
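As an illustration of the split and augmentation described above, the following minimal sketch assumes that each case is stored as a 2D layout image (a NumPy array) and that the split is random with a fixed seed. Neither the storage format, the seed, nor the function names `split_dataset` and `flip_augment` come from the study. A test fraction of 1/6 applied to 364 cases reproduces the 303/61 split.

```python
import numpy as np

def split_dataset(n_cases: int, test_fraction: float = 1 / 6, seed: int = 0):
    """Randomly split case indices into training and test sets.

    A 5:1 ratio corresponds to test_fraction = 1/6; the random split and
    the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_cases)
    n_test = round(n_cases * test_fraction)
    return indices[n_test:], indices[:n_test]

def flip_augment(image: np.ndarray) -> list:
    """Return a layout image plus its flipped variants (hypothetical augmentation set)."""
    return [
        image,
        np.flip(image, axis=1),       # mirror left-right
        np.flip(image, axis=0),       # mirror top-bottom
        np.flip(image, axis=(0, 1)),  # both axes, i.e. a 180-degree rotation
    ]

train_idx, test_idx = split_dataset(364)
print(len(train_idx), len(test_idx))  # 303 61
```

Flipping about both axes is equivalent to a 180-degree rotation, so the four variants cover every combination of the two flips.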
Regression analysis of necessary design information
Using the shear wall structure dataset outlined in Subsection ‘Shear wall structure dataset', regression formulas are established for necessary design information, namely the number and division of standard stories, the material grade, the story height, the shear wall thickness, and the beam height. These regression formulas are then used to supplement the necessary design information for modeling, analysis, and optimization of shear wall structures. The aforementioned design information can be categorized into two types:
The first type of design information, which has a relatively weak correlation with the layout of shear walls, includes the number and division of standard stories, the material grade, the story height, and the shear wall thickness. During the schematic design phase, this design information is primarily determined based on engineering experience and design conditions. During structural optimization, it is reasonable to keep this type of design information consistent with the initial design (the manual design by engineers). For cases without an initial design, that is, the training set cases with missing design information (15 out of 303) and all 61 test set cases, the methods described in Subsections ‘Number and division of standard stories' to ‘Shear wall thickness' are used to fill in the missing information.
The second type of design information, namely the beam heights, is closely related to the layout of shear walls. Although the collected design drawings provide the beam heights, when the shear wall layout changes, the beam spans change accordingly, which may render the original beam heights unrealistic. Therefore, for all the cases involved in structural optimization (training set) and those without beam height information (test set), all the beam heights are determined using the method described in Subsection ‘Beam height'. This approach may raise two issues. First, some shear wall structures may fail to meet physical requirements. Second, unnecessary material consumption may occur. The root cause is that the original beam heights specified in the design drawings are not used. Nevertheless, this does not affect the findings of this study, because after the structural optimization described in Section ‘Structural optimization of shear wall layout', the physical performance of all shear wall structures can be significantly improved and their material costs reduced.
Additionally, to prevent the regression formulas from being heavily influenced by any particular engineering project, the collected dataset is deduplicated: shear wall structures from the same project that have similar building heights and planar layouts are excluded from the dataset. Finally, 102 shear wall structures with significant differences in structural layout, building height, and design seismic acceleration are retained. These structures are located in eight provinces in China, covering all height ranges below 100 m, and their design seismic accelerations (
Number and division of standard stories
In structural design practice in China, to facilitate the design of high-rise buildings, stories with identical structural designs are commonly grouped into a standard story, as illustrated in Figure 3.
Figure 3. Illustration of the number and division of standard stories.
The number of standard stories is influenced by both horizontal loads (primarily seismic loads in inland areas) and vertical loads (primarily gravity loads). Therefore, a linear regression is conducted with the design seismic acceleration
Comparing the engineer designs with the regression results from Equation (1), 53% of the cases have an error of 0, 93% have an error within ±1, and 100% have an error within ±2, which confirms the regression accuracy of Equation (1).
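The error shares quoted above can be computed with a short helper. The sketch below is hypothetical: it assumes the output of Equation (1) is rounded to the nearest integer before comparison, and the sample values are invented for illustration rather than taken from the dataset.

```python
import numpy as np

def error_shares(predicted, actual, tolerances=(0, 1, 2)):
    """Fraction of cases whose |error| in standard-story count is within each tolerance.

    `predicted` may be continuous regression output; it is rounded to the
    nearest integer first (an assumption about how Equation (1) is applied).
    """
    pred = np.rint(np.asarray(predicted, dtype=float))
    errors = np.abs(pred - np.asarray(actual, dtype=float))
    return {tol: float(np.mean(errors <= tol)) for tol in tolerances}

# Invented example with four buildings: rounded errors are 0, 1, 0, 2.
shares = error_shares([2.2, 2.6, 1.0, 3.9], [2, 2, 1, 2])
print(shares)  # {0: 0.5, 1: 0.75, 2: 1.0}
```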
When the number of standard stories is larger than 1, the division ratio of each standard story must be further determined using Equation (2):
Regression results of
Although Figure 3 shows a simplified nine-story example for clarity, the regression model applies to buildings with fewer than 33 stories, because shear wall structures in China are usually less than 100 m (about 33 stories) tall.
Material grade
Statistical patterns of concrete grade.
Validating the statistical patterns in Table 2 against the engineer designs shows that more than 90% of the standard stories have a concrete grade error within ±5 MPa. The mean absolute error (MAE) of the concrete grade is 1.7 MPa for shear walls and coupling beams and 0.8 MPa for frame beams and slabs. This indicates that the statistical patterns in Table 2 are representative and accurate for predicting the concrete grades used in shear wall structures.
Story height
For shear wall structures, the first story sometimes serves as a lobby or commercial space, which may require a greater story height than the other stories. The heights of the first story and of the other stories are therefore analyzed separately, with the results depicted in Figure A1(c) and (d) of Appendix A. For the first story and the remaining stories, story heights of 3.0 m and 2.9 m, respectively, are the most common, and these values can be adopted for the design of story heights.
Shear wall thickness
The distribution of shear wall thickness is shown in Figure A1(e) in Appendix A. The thicknesses of shear walls are affected by seismic loads, so the shear wall structures are first classified according to the design seismic acceleration
Regression results of shear wall thickness.
The process for determining the thickness of a shear wall is as follows: (1) For each standard story, determine the gravity-load height
The thickness design results are compared with the engineer designs. The MAE and mean absolute percentage error of the shear wall thickness are 17 mm and 8.4%, respectively, and the standard deviations of the absolute error and absolute percentage error are 17 mm and 7.4%, respectively. These validation results indicate that the regression formulas are highly precise. Additionally, the high diversity of the dataset supports the generalizability of the formulas, while their low-dimensional, simple structure (e.g., Figure A1(f)–(j)) limits the risk of overfitting.
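For reference, the four error statistics reported above can be computed as follows. This is a minimal sketch; it assumes population standard deviations (NumPy's default `ddof=0`), which the text does not specify, and the example thicknesses are invented.

```python
import numpy as np

def thickness_error_stats(predicted_mm, actual_mm):
    """Error statistics between regression-based and engineer-designed wall thicknesses."""
    pred = np.asarray(predicted_mm, dtype=float)
    act = np.asarray(actual_mm, dtype=float)
    abs_err = np.abs(pred - act)          # absolute error, mm
    abs_pct_err = abs_err / act * 100.0   # absolute percentage error, %
    return {
        "mae_mm": float(abs_err.mean()),
        "mape_pct": float(abs_pct_err.mean()),
        "std_abs_err_mm": float(abs_err.std()),         # population std (ddof=0)
        "std_abs_pct_err_pct": float(abs_pct_err.std()),
    }

# Invented example with three walls.
stats = thickness_error_stats([200, 250, 200], [200, 200, 250])
print(stats["mae_mm"], stats["mape_pct"])  # about 33.3 and 15.0
```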
Beam height
A total of 3486 beams are collected from the dataset. Because coupling beams and frame beams differ in physical behavior and computational models, their heights are analyzed separately. The distributions of beam heights are shown in Figure A1(k) and (l) of Appendix A: 92.0% of the coupling beams have heights between 400 mm and 1000 mm, while 97.1% of the frame beams have heights between 300 mm and 600 mm. Considering the indoor net height requirement of 2.1 m to 2.2 m, and given that the story height is generally 2.9 m (as discussed in Subsection ‘Story height'), the height of coupling beams can be limited to 400 mm to 1000 mm, and that of frame beams to 300 mm to 600 mm, during the schematic design phase.
From a physical perspective, the height of a beam is primarily governed by its span. In engineering practice, a large number of coupling beams have a span-to-height ratio of less than 2.5 (Qian et al., 2018). According to the statistical results of Zhao et al. (2022), the average span-to-height ratio of coupling beams is 2.74, which is close to 2.5. Although 2.74 is the reported average, 2.5 is chosen as a conservative value that aligns with engineering practice. Therefore, within the height range of 400 mm to 1000 mm, the span-to-height ratio of coupling beams is set to 2.5, and the height of coupling beams can be determined by Equation (4):
Given that the height of a frame beam is generally between 1/18 and 1/10 of its span (Qian et al., 2018), an intermediate value of 1/12 can be adopted in the schematic design phase. The height of frame beams can be determined by Equation (5):
During the schematic design phase, beam height is less critical than shear wall configuration. A fixed span-to-height ratio, bounded by statistically derived minimum/maximum values (to avoid unreasonable beam heights), balances simplicity and practicality, ensuring design feasibility without excessive computational overhead.
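Equations (4) and (5) are not reproduced in this text, but the rule described above (a fixed span-to-height ratio bounded by statistically derived minimum and maximum heights) can be sketched as follows. Clipping to the 400 to 1000 mm and 300 to 600 mm ranges is our reading of that description, not a transcription of the equations; spans and heights are in millimetres.

```python
def clip(value: float, lower: float, upper: float) -> float:
    """Bound a value to the closed interval [lower, upper]."""
    return min(max(value, lower), upper)

def coupling_beam_height(span_mm: float) -> float:
    """Span-to-height ratio of 2.5, bounded to 400-1000 mm (assumed clipping)."""
    return clip(span_mm / 2.5, 400.0, 1000.0)

def frame_beam_height(span_mm: float) -> float:
    """Height equal to 1/12 of the span, bounded to 300-600 mm (assumed clipping)."""
    return clip(span_mm / 12.0, 300.0, 600.0)

print(coupling_beam_height(1500.0))  # 600.0
print(frame_beam_height(6000.0))     # 500.0
```

With this form, short spans fall back to the lower bound and very long spans to the upper bound, so beam heights recomputed during layout optimization stay within the statistically observed range.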
Structural optimization of shear wall layout
The design of shear wall structures is based on the architectural design, and the shear wall layout shall not interfere with the functions of the architectural spaces. Meanwhile, structural design should minimize material costs while meeting the physical requirements specified by the design codes (Dehnavipour et al., 2021). In shear wall layout design, if the shear walls are too short, the structural design may not meet the code-specified physical requirements; if they are too long, the design may waste a considerable amount of material (Lou et al., 2021; Tafraout et al., 2019; Zhou et al., 2022). Therefore, this study adopts the shear wall layout manually designed by engineers as the initial design and adjusts the lengths of the shear walls through structural optimization to improve the design quality.
Note that the generative AI utilized in this study is specifically focused on the shear wall layout (to be discussed in Section ‘Diffusion model-based structural design with data enhancement') without considering other design information, such as shear wall thickness and beam height. Consequently, adjusting other design information during the process of structural optimization will not change the training data quality of the generative AI. In this study, the optimization of the shear wall layout is achieved exclusively through the adjustment of the shear wall lengths. Other design information is either kept identical to the initial design or ascertained using the regression formulas outlined in Subsection ‘Regression analysis of necessary design information'.
Design variables and penalty functions
A typical shear wall structure, as shown in Figure 2(a), commonly includes dozens of shear walls, which can only be positioned at the locations of the partition walls. To reduce the number of design variables and thus ease the optimization, connected shear walls are defined as shear wall groups, and the K-Means algorithm is used to cluster the shear wall groups (Qin et al., 2024).
Clustering is performed based on the shape, dimension, and positional features of shear wall groups. Specifically, the first three features characterize the geometric forms of the shear wall groups, the fourth feature represents their total length, and the last two features describe their relative positions. The K-Means algorithm tends to assign shear wall groups to the same cluster when they share similar shapes (e.g., both L-shaped or T-shaped), comparable total lengths, and symmetric positions (about the X or Y axis). Following the recommendation of Qin et al. (2024), an initial cluster number of eight is selected, and additional cluster numbers are tested for validation. Numerical experiments on typical cases reveal that larger cluster numbers (e.g., 14 and 18) yield negligible improvements in the optimization results for a constant computational time, with the additional decrease in the final penalty function remaining below 1%. This is because, although higher cluster numbers give the shear wall optimization more design flexibility, they also expand the search space and problem complexity, which does not necessarily help the optimization algorithm find good solutions efficiently. Therefore, a cluster number of eight is recommended for common cases; for scenarios where the standard story area significantly exceeds that of the typical cases, moderately larger cluster numbers can be considered.
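To make the clustering step concrete, the sketch below runs plain K-Means (Lloyd's algorithm with random restarts, implemented directly in NumPy) on a six-column feature matrix with one row per shear wall group. The feature definitions are assumptions: the text states only that three columns describe shape, one the total length, and two the relative position. Here the positional columns are taken, hypothetically, as absolute offsets from the plan centroid, so that groups placed symmetrically about the X or Y axis receive identical positional features. Columns are standardized so that no single feature dominates the Euclidean distance.

```python
import numpy as np

def position_features(cx, cy, plan_cx, plan_cy, width, depth):
    """Hypothetical positional features: absolute offsets from the plan centroid,
    normalized by plan dimensions, so mirror-image groups get equal values."""
    return abs(cx - plan_cx) / width, abs(cy - plan_cy) / depth

def kmeans(features, k=8, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-Means with random restarts; returns (labels, inertia)."""
    rng = np.random.default_rng(seed)
    f = np.asarray(features, dtype=float)
    # Standardize each column; the epsilon guards against constant columns.
    x = (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-12)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each group to its nearest center (squared Euclidean distance).
            d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of its members; keep empty clusters fixed.
            new_centers = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        # Final assignment against the final centers.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = float(d2[np.arange(len(x)), labels].sum())
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia

# Invented data: eight distinct groups, each with a mirror-image twin (identical rows).
feats = np.repeat(np.random.default_rng(1).random((8, 6)), 2, axis=0)
labels, inertia = kmeans(feats, k=8)
```

Because identical rows are always equidistant from every center, mirrored twins necessarily land in the same cluster; each resulting cluster then corresponds to one design variable in the optimization.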
Based on the eight clusters, a total of eight design variables
Flowchart for structural optimization.
At the same time, the wall lengths are subject to certain design rules: (1) Shear walls shall not extend beyond the partition walls, so as not to interfere with the architectural spaces; (2) According to the Chinese design codes, the length-to-thickness ratio of a shear wall shall not be less than 4, and the length of a single shear wall shall not exceed 8 m (MOHURD, 2010). After the shear wall layout is adjusted, the beam spans are adjusted correspondingly without changing the topological relationship between the beams and the walls. The beam heights are then determined according to the regression formulas in Subsection ‘Beam height'.
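The two length rules can be enforced by clipping each candidate wall length into an admissible interval. The sketch below is hypothetical: `available_m` stands for the length of partition wall available at the wall's location (an assumed input derived from the architectural plan), and all lengths and thicknesses are in metres.

```python
def admissible_length_range(thickness_m: float, available_m: float):
    """Interval of wall lengths allowed by the two rules, or None if it is empty.

    Lower bound: length-to-thickness ratio >= 4.
    Upper bound: single wall <= 8 m and within the available partition wall.
    """
    lower = 4.0 * thickness_m
    upper = min(8.0, available_m)
    return (lower, upper) if lower <= upper else None

def repair_length(length_m: float, thickness_m: float, available_m: float) -> float:
    """Clip a candidate length proposed by the optimizer into the admissible range."""
    bounds = admissible_length_range(thickness_m, available_m)
    if bounds is None:
        raise ValueError("no admissible wall length at this location")
    lower, upper = bounds
    return min(max(length_m, lower), upper)

print(repair_length(9.5, 0.2, 12.0))  # 8.0
print(repair_length(0.5, 0.2, 12.0))  # 0.8
```

Clipping is one common repair strategy for bound constraints; the text does not state whether infeasible lengths are repaired or penalized.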
Design metrics and penalty functions of shear wall structures.
A total penalty function
In the design of shear wall structures, a design scheme is considered safe only when its physical responses meet the design code requirements; on this basis, further reducing material costs makes the scheme economical. According to the definitions given in Appendix B, the penalty function increases dramatically when the physical responses do not meet the design code requirements. Therefore, the optimization algorithm first drives the design toward satisfying the design code requirements and then seeks to reduce material costs.
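The prioritization described above can be illustrated with a toy penalty. The sketch below is hypothetical and is not Equations (6) and (7), whose definitions are given in Appendix B. It assumes that each physical response is expressed as a demand-to-limit ratio (compliant when the ratio is at most 1), multiplies an exponentially growing physical term by a normalized material cost, and uses an arbitrary steepness of 10.

```python
import math

def toy_total_penalty(response_ratios, material_cost, reference_cost, steepness=10.0):
    """Hypothetical penalty: equals the normalized material cost when every
    response ratio is <= 1, and grows exponentially with the worst exceedance."""
    exceedance = max(max(response_ratios) - 1.0, 0.0)
    physical = math.exp(steepness * exceedance)  # 1.0 for compliant designs
    material = material_cost / reference_cost     # < 1 means cheaper than reference
    return physical * material

compliant = toy_total_penalty([0.8, 0.95], 1.05, 1.0)     # 1.05
non_compliant = toy_total_penalty([1.2, 0.9], 0.90, 1.0)  # about 6.65
```

Because the physical term dominates once a limit is exceeded, an optimizer minimizing such a function tends to restore compliance before trading off material cost. In this toy form, however, a very small exceedance can still be outweighed by a cost saving.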
Compared with other mathematical forms of penalty functions such as summation, the penalty function in Equations (6) and (7) offers the following advantages. First, it has a clear design rationale:
Discussion on optimization parameters
To obtain the penalty functions described in Subsection ‘Design variables and penalty functions', a detailed FE model of the shear wall structure is automatically created through parametric modeling in the commercial design software YJK-GAMA, and the required design metrics are obtained through FE analysis (GAMA, 2025). Furthermore, the online learning algorithm provided by YJK-GAMA is used to perform structural optimization. Online learning is a surrogate-assisted evolutionary algorithm, which is more effective than traditional metaheuristic algorithms such as genetic algorithms and simulated annealing (Fei et al., 2025). Specifically, online learning uses several heterogeneous surrogate models, such as polynomial regression, truncated Fourier series, support vector machines, and radial basis function networks, to promote ensemble diversity. As the name “online” indicates, it updates the surrogate models with the new data points collected during the optimization to improve their prediction accuracy. The workflow of online learning is illustrated in Figure 4.
Online learning requires two parameters to be set: the number of FE analysis iterations (i.e., the number of fitness evaluations) and the initial sample size (used to build the initial surrogate models). Following the recommendation in the official documentation, the initial sample size is set to 1/10 of the number of FE analysis iterations. The more FE analyses are performed, the longer the optimization takes and, generally, the better the optimized design. Because this study performs structural optimization on the entire training set (303 cases), which entails a massive computational cost, the number of FE analysis iterations must be chosen to balance optimization time against effectiveness. To understand the relationship between optimization efficiency and effectiveness, optimization experiments are conducted on eight typical shear wall structures.
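The role of the two parameters can be seen in a generic surrogate-assisted loop. The sketch below is not YJK-GAMA's online learning algorithm, whose internals are not described here. As simplifying assumptions, it uses a single Gaussian radial basis function (RBF) surrogate instead of a heterogeneous ensemble, and a mutation-based candidate search around the best design found so far. As in the text, the initial sample size is 1/10 of the evaluation budget, and the surrogate is refitted after every true evaluation.

```python
import numpy as np

def _rbf_matrix(a, b, eps):
    # Gaussian kernel between the rows of a and the rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-eps * d2)

def surrogate_optimize(f, lower, upper, budget=100, n_candidates=200, eps=10.0, seed=0):
    """Minimize f within box bounds using exactly `budget` true evaluations."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scale = upper - lower
    dim = len(lower)

    # Initial sample: 1/10 of the evaluation budget (at least two points).
    n_init = max(budget // 10, 2)
    X = rng.uniform(lower, upper, size=(n_init, dim))
    y = np.array([f(x) for x in X])

    while len(y) < budget:
        # "Online" update: refit the surrogate on all evaluations so far.
        Xn = (X - lower) / scale
        K = _rbf_matrix(Xn, Xn, eps) + 1e-8 * np.eye(len(Xn))  # jitter for stability
        weights = np.linalg.solve(K, y - y.mean())

        # Propose candidates by Gaussian mutation around the current best design.
        best = X[y.argmin()]
        cand = np.clip(best + rng.normal(size=(n_candidates, dim)) * (0.1 * scale), lower, upper)

        # Spend one true (expensive) evaluation on the most promising candidate.
        pred = _rbf_matrix((cand - lower) / scale, Xn, eps) @ weights + y.mean()
        x_new = cand[pred.argmin()]
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))

    i = int(y.argmin())
    return X[i], float(y[i]), len(y)

def sphere(x):
    # Cheap stand-in for an FE-based penalty function.
    return float(np.sum((x - 0.3) ** 2))

# Eight design variables (one per cluster), 100 evaluations, 10 initial samples.
x_best, f_best, n_evals = surrogate_optimize(sphere, [0.0] * 8, [1.0] * 8, budget=100)
```

In this sketch, the budget directly sets the number of expensive evaluations, which is why it dominates the optimization time when each evaluation is an FE analysis.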
Design conditions of typical cases.
For three scenarios with the number of FE analysis iterations set to 50, 100, and 200, the average decreases in penalty functions for Groups A1 and B1 are shown in Figure 5(a) and (b), respectively. For the cases in Group A1, both the total penalty function and the physical penalty function decrease significantly at 50 iterations and remain stable at 100 and 200 iterations. For the cases in Group B1, the total penalty function and the material penalty function decrease significantly at 50 and 100 iterations and only slightly at 200 iterations. Therefore, 100 iterations achieve a good balance between optimization time and effectiveness, as higher iteration numbers do not substantially reduce the penalty function further. The corresponding initial sample size is 100/10 = 10.
Figure 5. Penalty function reductions under different optimization parameters: (a) Group A1; (b) Group B1.
Additionally, it can be observed from Figure 5(a) that the structural optimization can significantly improve the physical performance of Group A1 without substantially increasing the material cost. From Figure 5(b), it can be seen that structural optimization can greatly reduce the material cost of Group B1 while slightly improving the physical performance. These outcomes suggest that the proposed structural optimization method can effectively achieve the goal of multi-objective optimization.
Optimization results of training data
Using the optimization parameters described in Subsection ‘Discussion on optimization parameters’, data enhancement is carried out on the training set (303 cases) described in Subsection ‘Shear wall structure dataset’. The structural optimization runs in parallel on three computers with six threads each and takes approximately one month in total to complete.
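A rough estimate shows what the one-month figure implies for each FE analysis. The sketch assumes a month of 30 days of continuous computation, perfect load balancing across the 18 threads, and one FE analysis running per thread at a time; none of these assumptions is stated in the text, and the function name is hypothetical.

```python
def fe_time_budget(n_cases, fe_per_case, machines, threads_per_machine, wall_clock_hours):
    """Back-of-the-envelope cost of optimizing the whole training set in parallel."""
    workers = machines * threads_per_machine      # concurrent optimization jobs
    total_fe = n_cases * fe_per_case              # FE analyses in total
    cases_per_worker = n_cases / workers          # assumes perfect load balancing
    minutes_per_fe = wall_clock_hours * 60 * workers / total_fe
    return workers, total_fe, cases_per_worker, minutes_per_fe


# 303 cases x 100 FE analyses on 3 machines x 6 threads, ~30 days (assumed).
workers, total_fe, per_worker, minutes = fe_time_budget(303, 100, 3, 6, 30 * 24)
print(workers, total_fe, round(per_worker, 1), round(minutes, 1))  # 18 30300 16.8 25.7
```

Under these assumptions each FE analysis, together with its share of surrogate-model overhead, consumes roughly 26 minutes of wall-clock time, which is why the iteration budget has such a direct effect on total runtime.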
Within the training set (303 cases), 154 cases do not fully comply with the physical requirements and are referred to as Group A2. The remaining 149 cases meet the physical requirements and are referred to as Group B2. Several factors may cause structural designs to fail the physical requirements: (1) some design drawings are of limited quality, for example, preliminary designs that have not yet been verified; (2) the number and division of standard stories, material grade, story height, and shear wall thickness for 15 cases, as well as the beam height for all 303 cases, are estimated using the regression formulas in Subsection ‘Regression analysis of necessary design information’ rather than taken from the drawings; (3) some errors may be unavoidable during preprocessing because the design drawings are highly non-standardized. The prevalence of designs that do not meet the physical requirements highlights the need to enhance data quality through structural optimization.
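The split into Groups A2 and B2 can be expressed as a filter on the physical penalty function. This sketch assumes, without confirmation from the text, that a design meets the physical requirements exactly when its physical penalty is zero (up to a small tolerance); the `Case` record, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    physical_penalty: float  # assumed to be 0 when every physical requirement is met
    material_penalty: float


def split_by_compliance(cases, tol=1e-9):
    """Return (non_compliant, compliant) lists, i.e. Groups A2 and B2."""
    non_compliant = [c for c in cases if c.physical_penalty > tol]
    compliant = [c for c in cases if c.physical_penalty <= tol]
    return non_compliant, compliant


cases = [Case("c1", 0.0, 1.2), Case("c2", 0.35, 0.9), Case("c3", 0.0, 1.0)]
group_a2, group_b2 = split_by_compliance(cases)
print([c.case_id for c in group_a2], [c.case_id for c in group_b2])  # ['c2'] ['c1', 'c3']
```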
Figure 6 shows the comparison of shear wall structure design before and after structural optimization, with major differences marked with circles. Case 1 and Case 2 are typical cases of Group A2. Case 1 has insufficient lateral stiffness before optimization, and the inter-story drift ratio (
Typical cases in the training set before and after optimization.
Mean and standard deviation of penalty functions before and after data enhancement.
The decrease in the means of the penalty functions indicates that, after optimization, Groups A2 and B2 achieve significant quality improvements, primarily in physical response and material cost, respectively. The reduction in the standard deviations of the penalty functions suggests that the design quality of the training set becomes more consistent after optimization. In Section ‘Diffusion model-based structural design with data enhancement’, the diffusion model is trained on the training sets before and after data enhancement, and the resulting model performances are compared.
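The reported changes compare a statistic of a penalty function (its mean or its standard deviation) before and after data enhancement. A minimal sketch, assuming the population standard deviation is used (the text does not state which estimator) and using made-up penalty values; the function names are hypothetical.

```python
import statistics


def relative_change(before: float, after: float) -> float:
    """Percentage change; negative values mean a decrease."""
    return (after - before) / before * 100.0


def compare_penalties(before, after):
    """Relative change of the mean and of the standard deviation of a penalty."""
    d_mean = relative_change(statistics.mean(before), statistics.mean(after))
    d_std = relative_change(statistics.pstdev(before), statistics.pstdev(after))
    return d_mean, d_std


d_mean, d_std = compare_penalties([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(d_mean, 1), round(d_std, 1))  # -50.0 -50.0
```

A decrease in the mean signals better average quality, while a decrease in the standard deviation signals more uniform quality across cases; the two can move independently.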
Diffusion model-based structural design with data enhancement
Diffusion model
The concept of diffusion originates from non-equilibrium thermodynamics, and diffusion models have attracted wide interest due to their favorable mathematical properties (Song et al., 2021). Diffusion models generate samples by starting from pure Gaussian noise and progressively removing small amounts of noise (Ho et al., 2020). Recently, a diffusion model for structural design tasks, called Struct-Diffusion, has been proposed and has shown excellent performance on the shear wall layout task (Gu et al., 2024). In that work, shear wall layout design is formulated as an image inpainting task.
In the domain of image inpainting, diffusion models have emerged as a transformative approach, outperforming traditional generative models like generative adversarial networks (GANs) and variational autoencoders (VAEs) in several critical aspects. Unlike GANs, which often struggle with mode collapse and training instability, especially when reconstructing complex structural details or coherent textures, diffusion models leverage a probabilistic latent space to generate high-fidelity, diverse outputs that maintain semantic consistency with the existing image context. They also avoid the VAE’s inherent compromise between reconstruction accuracy and generative expressivity, as their iterative denoising process allows for fine-grained control over the synthesis of missing regions, enabling superior handling of intricate patterns and multi-modal distributions. While diffusion models entail higher computational costs, techniques like knowledge distillation can substantially accelerate inference speed, making them more practical for real-world applications.
The core strength of diffusion models in inpainting lies in their two-stage framework of forward diffusion and reverse denoising. First, the forward diffusion process gradually adds Gaussian noise to the original image until it becomes pure noise, mapping the data distribution to a simple prior (e.g., the standard normal distribution). In the reverse denoising process, the model learns to invert this diffusion iteratively by predicting the noise added at each step and refining the noisy tensor back into a clean image. For inpainting, the model is conditioned on the known regions of the masked image, so it can focus on synthesizing plausible content for the missing areas while respecting spatial and semantic dependencies. This iterative refinement helps the generated content integrate seamlessly with the surrounding pixels, yielding higher structural coherence and perceptual quality than the adversarial training of GANs or the approximate posterior inference of VAEs.
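The two stages can be sketched with the standard DDPM equations (Ho et al., 2020). This is a generic illustration, not the Struct-Diffusion implementation: the linear noise schedule, the RePaint-style step that re-imposes the noised known region after every reverse step (one common way of conditioning on known pixels), and the `predict_eps` stub standing in for the trained network are all assumptions.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_t and the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def q_sample(x0, t, alpha_bars, rng):
    """Forward process in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def p_step(x_t, t, eps_hat, betas, alphas, alpha_bars, rng):
    """One DDPM reverse step using the predicted noise eps_hat (sigma_t^2 = beta_t)."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def inpaint(x0_known, mask, predict_eps, T=1000, seed=0):
    """Generate the masked-out region; mask == 1 marks known pixels."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(x0_known.shape)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        x = p_step(x, t, predict_eps(x, t), betas, alphas, alpha_bars, rng)
        # Replace the known region by the original image noised to the new level.
        known = q_sample(x0_known, t - 1, alpha_bars, rng) if t > 0 else x0_known
        x = mask * known + (1.0 - mask) * x
    return x


x0 = np.ones((8, 8))
mask = np.zeros((8, 8))
mask[:, :4] = 1.0  # left half known, right half to be generated
out = inpaint(x0, mask, predict_eps=lambda x, t: np.zeros_like(x), T=50)
print(np.allclose(out[:, :4], 1.0))  # True
```

With a trained network in place of the zero stub, `predict_eps` could also receive the masked image and design conditions as extra inputs; the stub here only demonstrates that the known region is preserved exactly.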
The basic workflow of the diffusion model-based structural design is shown in Figure 7. Initially, the architectural design drawings and design conditions are semantically processed to obtain input data. Specifically, colors are used to represent different components, and floating-point numbers are used to represent design conditions. Then, pure Gaussian noise is sampled, and a trained convolutional neural network progressively removes small amounts of noise from it, thereby generating the shear wall layout. Finally, a semantic structural design image is obtained as the output. The mathematical principles and implementation details of this diffusion model are not the focus of this study; readers are referred to the published literature (Gu et al., 2024).
Shear wall layout design based on diffusion models.
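The semantic input preparation can be illustrated as follows. The component list, the palette colors, and the choice to broadcast each design condition as a constant image channel are hypothetical; the encoding actually used by Struct-Diffusion is described by Gu et al. (2024).

```python
import numpy as np

# Hypothetical component ids and colors (index = component id).
COMPONENTS = ["background", "infill_wall", "opening", "shear_wall"]
PALETTE = np.array(
    [[255, 255, 255], [128, 128, 128], [0, 0, 255], [255, 0, 0]], dtype=np.uint8
)


def encode_plan(labels: np.ndarray) -> np.ndarray:
    """Map an H x W array of component ids to an H x W x 3 color image."""
    return PALETTE[labels]


def build_input(labels: np.ndarray, conditions) -> np.ndarray:
    """Stack the normalized color image with one constant channel per condition.

    `conditions` holds design conditions already scaled to floats, e.g. the
    design seismic acceleration in units of g (the scaling is an assumption).
    """
    rgb = encode_plan(labels).astype(np.float32) / 255.0
    h, w = labels.shape
    cond = [np.full((h, w, 1), value, dtype=np.float32) for value in conditions]
    return np.concatenate([rgb, *cond], axis=-1)


labels = np.zeros((4, 5), dtype=int)
labels[1, :] = COMPONENTS.index("infill_wall")
x = build_input(labels, [0.2, 0.5])
print(x.shape)  # (4, 5, 5)
```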
Configurations of the diffusion model.
Evaluation metrics
In traditional generative AI designs for shear wall structures, the similarity between the AI design and the engineer design (e.g., Intersection over Union) is commonly used to evaluate the performance of the AI models (Liao et al., 2024). This evaluation method is advantageous in its high evaluation efficiency, but it also has some limitations: (1) The engineer design, which serves as the ground truth, may not be of high quality, leading to potential misjudgment of high-quality AI designs. (2) Structural design is a creative task with no single correct answer. The difference between the AI design and the engineer design does not necessarily mean that the AI design is unreasonable. (3) In the model application phase, when the engineer design is not available, similarity metrics cannot be used to evaluate the AI design.
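For reference, IoU between an AI-generated and an engineer-designed shear wall layout can be computed on binary masks of shear wall pixels. This is a minimal sketch; how the masks are obtained from the images, and returning 1.0 when both masks are empty, are assumptions.

```python
import numpy as np


def iou(pred_mask, true_mask) -> float:
    """Intersection over Union of two boolean shear wall masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both layouts empty: treat as identical
    return float(np.logical_and(pred, true).sum() / union)


print(iou([[1, 1, 0, 0]], [[0, 1, 1, 0]]))  # 0.3333333333333333
```

As limitation (2) notes, a low IoU does not by itself show that the AI design is unreasonable: two structurally sound layouts can overlap very little.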
Therefore, this study adopts an evaluation method based on design metrics: a detailed FE model is established to obtain the physical response and material cost of the AI design, from which the performance of the AI design is evaluated. First, the coordinates of shear walls are extracted from the semantic structural design images generated by the diffusion model using computer vision techniques and design information (Fei et al., 2023). Second, the other design information required for FE analysis, including component section sizes and material grades, is obtained from the regression formulas given in Section ‘Regression analysis of necessary design information’. Finally, the physical response and material cost (listed in Table 4) of the AI design are obtained from the FE results, and
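The first step can be approximated by color thresholding followed by connected-component labelling. This sketch is not the method of Fei et al. (2023): it assumes axis-aligned rectangular walls, a known shear wall color, and a uniform image scale `metres_per_pixel`, all hypothetical, and it reports each wall as a bounding box in image coordinates (y pointing down).

```python
from collections import deque

import numpy as np


def extract_wall_boxes(image, wall_rgb, metres_per_pixel, tol=30):
    """Return (x_min, y_min, x_max, y_max) boxes, in metres, of shear wall regions.

    image: H x W x 3 uint8 array; pixels within `tol` of `wall_rgb` in every
    channel are treated as shear wall.
    """
    diff = np.abs(image.astype(int) - np.asarray(wall_rgb, dtype=int)).max(axis=-1)
    mask = diff <= tol
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first search over 4-connected wall pixels.
            queue = deque([(r, c)])
            seen[r, c] = True
            r0, r1, c0, c1 = r, r, c, c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            s = metres_per_pixel
            boxes.append((c0 * s, r0 * s, (c1 + 1) * s, (r1 + 1) * s))
    return boxes


img = np.full((10, 10, 3), 255, dtype=np.uint8)
img[2:4, 1:7] = (255, 0, 0)  # a horizontal wall
img[6:9, 8] = (255, 0, 0)    # a vertical wall
print(len(extract_wall_boxes(img, (255, 0, 0), metres_per_pixel=0.1)))  # 2
```

Intersecting walls (e.g., L- or T-shaped junctions) merge into a single component here, so a practical tool would need an additional step to split them into individual wall segments.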
Evaluation results of design outcomes
Mean and standard deviation of design metrics with and without data enhancement.
After data enhancement, for Group A3, the mean of the total penalty function decreases by 21.1% and its standard deviation (δstd) decreases by 19.0%; the mean and δstd of the physical penalty function decrease by 23.1% and 41.7%, respectively; and the mean of the material penalty function increases by 1.5%, with the corresponding δstd increasing by 7.9%. Meanwhile, 10 of the 15 cases in Group A3 now meet the physical requirements. For the remaining 5 cases, which still do not meet the physical requirements, the mean of the physical penalty function decreases by 18.5% and the δstd decreases by 61.6%, indicating that they can be more easily adjusted to meet the physical requirements in the subsequent detailed design phase. For Group B3, the mean of the total penalty function decreases by 2.3% and the δstd decreases by 26.5%; the mean of the physical penalty function decreases by 1.9% and the δstd decreases by 45.5%; and the mean and δstd of the material penalty function decrease by 0.5% and 10.0%, respectively. Additionally, all cases in Group B3 still meet the physical requirements.
By enhancing the data quality of the training set, the number of AI designs that do not meet the physical requirements is reduced by 67%, and the remaining 33% can be more easily adjusted to meet them; the material cost of the AI designs that meet the physical requirements is reduced by 0.5%. Thus, the proposed data enhancement method makes the AI designs significantly safer and marginally more economical. Because meeting the physical requirements is a prerequisite for reducing material costs in structural design, it is expected that the method improves physical performance substantially while reducing material costs only slightly. In addition, the decrease in the standard deviation of the total penalty function indicates that the consistency of AI design quality has also improved.
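The 67% and 33% figures follow from the Group A3 counts reported above: since all Group B3 designs remain compliant, the number of non-compliant AI designs falls from 15 to 5.

```latex
\frac{15 - 5}{15} = \frac{10}{15} \approx 66.7\% \approx 67\%,
\qquad
\frac{5}{15} \approx 33.3\% \approx 33\%
```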
Case study
To illustrate the improvement in the design quality of the diffusion model after data enhancement, two shear wall structure cases from the test set are presented in Figure 8 and Table 9. These two cases are geographically close and thus share the same seismic and site design conditions, both with a design seismic acceleration
Design outcomes of two case studies.
Design metrics of two case studies with and without data enhancement.
The architectural design of Case A is shown in Figure 8(a). Prior to data enhancement, the shear wall layout designed by the diffusion model is shown in Figure 8(c). According to Table 9, the designed shear wall structure shows an excessive inter-story drift ratio (
The architectural design of Case B is depicted in Figure 8(b). Before data enhancement, the shear wall layout designed by the diffusion model is shown in Figure 8(d). According to Table 9, the physical metrics of this design meet the design code requirements, with a material cost
These two cases show that data enhancement improves the shear wall layouts designed by the diffusion model, making them safer and more economical.
Conclusions
This study proposes a data enhancement method based on the combination of structural optimization and diffusion models to address the issue of low data quality exhibited in the generative AI design of shear wall structures. The core contributions of this study are: Firstly, a generative AI design workflow incorporating data enhancement is proposed, which adds a data enhancement phase to the traditional workflow and modifies the data preparation and model evaluation phases. Secondly, a series of regression formulas are established based on the distribution characteristics of the design information, enabling model evaluation and data enhancement with missing design information. Thirdly, a structural optimization method for shear wall layout is presented, capable of simultaneously improving multiple design metrics of shear wall structures. The specific conclusions drawn from this study are as follows: (1) The established regression formulas for necessary design information fit the collected dataset well and can be used to supplement missing information for training cases during data preparation, as well as to determine the necessary information for test cases during model evaluation. (2) The proposed shear wall layout optimization method effectively achieves multi-objective optimization, significantly improving the engineer designs and thereby enhancing the data quality of the training set. After 100 FE analysis iterations, for Group A2, the mean of the physical penalty function decreases by 14.1%, and the standard deviation decreases by 44.8%; for Group B2, the mean and the standard deviation of the material penalty function decrease by 3.0% and 6.1%, respectively. This indicates that the data quality of the enhanced training set is higher and more consistent. (3) The proposed data enhancement method significantly improves the performance of the diffusion models in designing test set cases. 
The number of AI designs with inadequate physical performance is reduced by 67%, and the remaining 33% can be more easily adjusted to meet the physical requirements. For AI designs already meeting the physical requirements, the average material cost is reduced by 0.5%. Additionally, the standard deviations of the total penalty function for Groups A3 and B3 decrease by 19.0% and 26.5%, respectively. This demonstrates that the proposed method can effectively enhance the physical performance of the shear wall layouts designed by the diffusion models, marginally reduce their material costs, and improve the consistency of design quality.
This study has the following limitations. First, the applicability of the proposed data enhancement method to other design tasks and other generative AI algorithms needs further verification. Second, more design metrics could be considered in structural optimization and model evaluation to better support engineering applications. Last, a key limitation is the reliance on Chinese design codes. Adapting the method to other regions would require redefining the penalty functions to align with local codes, a straightforward modification that preserves the validity of the core method.
In the future, machine learning methods (e.g., support vector machines and gradient boosting trees) could be employed to supplement missing design information, potentially providing more reasonable estimates owing to their capacity to model complex nonlinear relationships. Furthermore, integrating pre-trained universal surrogate models into the data enhancement pipeline could lower the computational demand.
Acknowledgment
This work is supported by the Sichuan Science and Technology Program (2025ZNSFSC1312), the Beijing Municipal Natural Science Foundation (8252008), and the National Natural Science Foundation of China (52408348). The authors would like to acknowledge Ms Yuanxin Liu from Shaodong Jianye Engineering Technology Co., Ltd, Mr Hongjing Xue from Beijing Institute of Architectural Design Institute Co., Ltd, and Mr Shulu Zhang from China Southwest Architectural Design & Research Institute Co., Ltd for providing structural design blueprints used in this work and giving valuable advice on design metrics of shear wall structures.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Sichuan Science and Technology Program (2025ZNSFSC1312), the Beijing Municipal Natural Science Foundation (8252008), and the National Natural Science Foundation of China (52408348).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Appendix
Statistical distribution of design information. (a) Story number (b) Design seismic acceleration (c) Story height of the first story (d) Story height of other stories (e) Shear wall thickness (f) Line bearing capacity (
