Data enhancement for generative AI design of shear wall structures incorporating structural optimization and diffusion models

Abstract

Generative artificial intelligence (AI) applications in structural design face persistent challenges due to training data limitations, particularly datasets that lack compliance with critical physical and material requirements. This study proposes a structural optimization-based data enhancement method to address quality deficiencies in generative AI training data, specifically targeting shear wall layout design through diffusion-based generative models. The proposed method introduces three key innovations: (1) a novel generative AI design workflow that incorporates a data enhancement phase and modifies the data preparation and model evaluation phases; (2) regression formulas that enable data enhancement with incomplete design information through feature distribution analysis; (3) a shear wall layout optimization method that simultaneously improves physical and material metrics. Experimental validation reveals marked improvements in design outcomes through enhanced training data. Specifically, the occurrence frequency of physically non-compliant structural designs decreases by 67%, while material costs for compliant designs decrease by 0.5%. Additionally, performance consistency improves significantly. By addressing data quality limitations through structural optimization, this approach enhances the practical viability of diffusion models in structural engineering applications while preserving adaptability to advanced AI algorithms. The proposed method is modular and scalable, offering potential for extension to other structural systems (e.g., steel frames, composite structures) and design challenges, thereby advancing AI-driven innovation in structural engineering.

Keywords

data enhancement; generative AI; structural optimization; regression analysis; diffusion model; shear wall structure

Introduction

Generative artificial intelligence (AI) is capable of creating new and original content, and its powerful capabilities have profoundly transformed industries such as natural language processing, computer vision, and design (Bandi et al., 2023). Typical generative AI algorithms include generative adversarial networks, variational autoencoders, and diffusion models (Ho et al., 2020). With the continued accumulation of data in the engineering field, generative AI is gradually being applied to the design tasks of engineering structures in general (Yüksel et al., 2023), and to building structures in particular (Liao et al., 2024). Research on the generative AI design of building structures has already encompassed various aspects, including structural layout (Fu et al., 2023; Zhou et al., 2024), component section sizes (Fei et al., 2022; Feng et al., 2023), and seismic isolators (Liao et al., 2023).

A workflow for generative AI design of building structures typically involves three phases: (1) Data preparation: Collecting design drawings manually completed by structural engineers, preprocessing them to obtain labeled data, and dividing them into training and test sets. (2) Model training: Training a generative AI model using the training set. (3) Model testing and evaluation: Designing the test set cases using the trained generative AI model and evaluating the model performance based on the similarity between the AI design and the one created by engineers. It is evident that such a workflow heavily relies on the quality of data (i.e., engineer designs). If the quality of the engineer designs is not high enough, then: (1) the training set cases for AI to learn would be unreliable, which affects the AI’s performance; (2) the ground truth used for evaluating AI designs would also be unreliable, leading to unreasonable evaluation results. High-quality data is, therefore, crucial for producing successful generative AI design outcomes.

High-quality data, in other words, high-quality structural design, typically requires two types of design metrics to be met: (1) All physical responses must comply with the design code requirements to ensure structural safety; (2) Material costs for concrete, steel rebar, etc., should be minimized to achieve an economical design. In reality, however, the obtained datasets often cannot fully meet these requirements, as low-quality data are inevitably included for various reasons, such as: (1) The collected design drawings, either being unverified preliminary designs or designed by unqualified engineers, are of poor quality. (2) Incomplete design drawings are collected, for example, some key structural design information is missing. (3) The preprocessing of the design drawings might cause errors due to the highly non-standardized nature of the drawings. As is well recognized, low-quality data can adversely affect the performance of generative AI (Whang et al., 2023), and this issue needs to be rectified urgently.

Currently, existing research primarily focuses on the development of novel AI algorithms (Xu et al., 2022; Xu and Guo, 2025), with less in-depth exploration of how to enhance data quality (Stonebraker and Rezig, 2019). In the domain of machine learning, data cleaning is usually performed to improve data quality, including uncertainty-based approaches (Rottmann and Reese, 2023), loss-based approaches (Liu et al., 2021), counterfactual approaches (Flokas et al., 2022), and outlier-based approaches (Lee et al., 2018). These data cleaning techniques can remove erroneous values from the dataset and thereby improve the model performance. However, they primarily focus on the cleaning of generic data, for example, image, text, table, and time-series data (Côté et al., 2024), instead of the specialized structural design data, and are therefore not directly applicable to the generative AI design of building structures. Structural design tasks require complex evaluation criteria and involve various feasible solutions, making it very difficult to find the error and fix it. An enhancement of the overall design quality of the entire dataset is needed from multiple aspects with domain knowledge of structural design, instead of merely removing apparent errors like existing data cleaning methods.

To enhance the quality of structural design data, structural optimization can be adopted, which is a method commonly used in building structure design practice that can comprehensively improve the performance of design schemes (Afzal et al., 2020). Indeed, numerous studies have attempted to integrate AI methods with optimization techniques (Shirgir et al., 2024; Shirgir and Farahmand-Tabar, 2025). However, existing research has rarely focused on leveraging optimization approaches to enhance AI training data. Recently, Fei et al. (2023) proposed a data augmentation method based on structural optimization, which improves the performance of generative AI on long-tailed datasets. Yet, the method was developed with a primary emphasis on data distribution without exploring the impact of data quality on AI performance. To advance the generative AI design of building structures, it is necessary and desirable to develop a data enhancement method based on structural optimization and to investigate the influence of data quality on the structural design outcomes of generative AI. In the context of this article, data enhancement places emphasis on comprehensively improving the quality of data, data augmentation focuses on expanding the scale and increasing the diversity of the dataset, and data cleaning is centered around removing errors from the data.

In response to the aforementioned challenges, this study proposes a data enhancement method based on structural optimization to address the issue of low data quality encountered by the generative AI design of shear wall structures, particularly shear wall layout. Shear walls are critical load-bearing elements in high-rise buildings, their layout directly influencing structural safety, material efficiency, and architectural usability. Unlike beams or columns, shear walls involve higher computational complexity and design requirements. The rest of this paper is organized as follows. Section ‘Workflow of generative AI design with data enhancement' proposes a workflow of generative AI design with data enhancement. Section ‘Dataset and regression analysis' presents the dataset of shear wall structures and the regression analysis of necessary design information. Section ‘Structural optimization of shear wall layout' introduces the structural optimization method for shear wall layout design. Section ‘Diffusion model-based structural design with data enhancement' illustrates the generative AI design based on diffusion models and the evaluation of AI designs. Section ‘Case study' provides two typical case studies. Section ‘Conclusions' summarizes the main conclusions of this study.

Workflow of generative AI design with data enhancement

The integration of data enhancement into a traditional workflow of generative AI design yields a new workflow depicted in Figure 1, which consists of four phases:

(1) Data preparation (Section ‘Dataset and regression analysis'): Collect design drawings manually produced by structural engineers, preprocess them to obtain labeled data, and divide them into training and test sets. When necessary design information is absent, establish regression formulas based on the distribution characteristics of design information in the collected dataset to supplement the missing information.

(2) Data enhancement (Section ‘Structural optimization of shear wall layout'): For each shear wall structure in the training set, establish a detailed finite element (FE) model through parametric modeling, derive its design metrics (physical response and material cost) through physical analysis and reinforcement design, and improve the design scheme through structural optimization.

(3) Model training (Section ‘Diffusion model-based structural design with data enhancement'): Train a generative AI model using the optimized training set.

(4) Model testing and evaluation (Section ‘Case study'): Design the test set cases using the trained generative AI model and ascertain the necessary design information using regression formulas. Acquire the design metrics of the AI design through the FE model, thereby evaluating the quality of the AI design and the performance of the generative AI.

Figure 1.

Workflow of generative AI design with data enhancement.

The advantages of the proposed workflow are: (1) The structural optimization enhances the data quality of the manually-designed dataset by adjusting the design schemes to be safer and more economical, thereby improving the performance of the generative AI. (2) The evaluation of AI designs using design metrics is not affected by the quality of ground truth data, allowing for a more independent and rational assessment of the AI’s performance and consideration of the non-uniqueness of structural design tasks. The proposed workflow is highly modular and can be extended to various structural components, diverse load cases, and different materials by changing the parametric model and optimization objectives.

This study takes the shear wall layout design task as an example to verify the effectiveness of the proposed data enhancement method. The shear wall structure is a popular structural system, often used in high-rise residential buildings in earthquake-prone areas. Shear walls are a key structural component of a shear wall structure, resisting both horizontal and vertical loads. The layout of shear walls significantly affects the physical response and material cost of shear wall structures, and the layout design is an essential task during the schematic design phase (Liao et al., 2021).

To successfully accomplish the proposed workflow, it is crucial to automatically model, analyze, and optimize the design data of shear wall structures, which faces two key technical challenges:

(1) Challenge one: Modeling, analysis, and optimization of shear wall structures require complete design information, including structural layout, component section sizes, material grades, etc. Existing generative AIs typically can only accomplish an essential part of the structural design task and are unable to generate all necessary design information (Liao et al., 2024). For instance, when designing shear wall structures, generative AI only focuses on the shear wall layout, which is the most critical part (Gu et al., 2024), but cannot generate other design information, such as component section sizes and material grades. Although it is possible to train the AI algorithms to predict missing design information, it can unfortunately complicate the design problem and in turn increase the computational demand. Consequently, this study directly employs explicit regression formulas, which are established based on the collected dataset as detailed in Section ‘Dataset and regression analysis', to determine the necessary design information beyond the shear wall layout.

(2) Challenge two: There is a need for efficient multi-objective optimization of shear wall structures. The design of a shear wall structure requires consideration of multiple physical and material metrics, leading to a multi-objective optimization problem. Additionally, the computational time of structural optimization must be kept manageable for data enhancement. A single round of modeling and analysis of a shear wall structure generally takes several minutes, but hundreds of such rounds may be required in an entire optimization process. The optimization of hundreds of design cases would therefore necessitate tens of thousands of modeling and analysis iterations. Accordingly, a balance must be sought between optimization time and effectiveness by determining the appropriate optimization parameters as detailed in Section ‘Structural optimization of shear wall layout'.

Dataset and regression analysis

Shear wall structure dataset

To train a generative AI suitable for structural design, it is necessary to collect a large amount of structural design data. A total of 364 shear wall structure design drawings are collected from several renowned architectural design institutes in China. The architectural design, structural design, and design conditions of the shear wall structures are extracted from the collected drawing dataset. Specifically, architectural design includes the planar layout of partition walls, doors, and windows. Structural design includes the planar layout of shear walls (for training the generative AI), as well as such design information as the component section sizes, the material grades, and the number and division of standard stories (for establishing regression formulas). Design conditions include the seismic, site, and height conditions, characterized respectively by the design seismic acceleration $a_{d}$ (which is the peak ground acceleration of the design basis earthquake), the site characteristic period $T_{g}$ (which is related to the site geological conditions, and the magnitude and epicenter distance of the design basis earthquake), and the number of stories $n_{s}$ (Zhao et al., 2023).

The collected dataset is divided into a training set (303 cases) and a test set (61 cases) in a 5:1 ratio. Typical cases are illustrated in Figure 2. To enrich the diversity of the dataset and thereby enhance the generalization performance of the generative AI, data augmentation methods such as flipping and mirroring (Liao et al., 2021) are employed during training.

Figure 2.

Typical cases of the established shear wall structure dataset.

Regression analysis of necessary design information

Using the shear wall structure dataset outlined in Subsection ‘Shear wall structure dataset', regression formulas are established for necessary design information, namely the number and division of standard stories, the material grade, the story height, the shear wall thickness, and the beam height. These regression formulas are then used to supplement the necessary design information for modeling, analysis, and optimization of shear wall structures. The aforementioned design information can be categorized into two types:

The first type of design information, which has a relatively minor correlation with the layout of shear walls, includes the number and division of standard stories, the material grade, the story height, and the shear wall thickness. During the schematic design phase, this design information is primarily determined based on engineering experience and design conditions. During structural optimization, it is reasonable to keep this type of design information consistent with the initial design (manual design by engineers). For cases without an initial design, that is, the training set cases with missing design information (15 out of 303) and all the 61 test set cases, the methods described in Subsections ‘Number and division of standard stories' to ‘Shear wall thickness' are used to fill the missing information.

The second type of design information is closely related to the layout of shear walls, specifically the beam heights. Although the collected design drawings provide information on the beam heights, when the layout of the shear walls changes, the beam spans will also change accordingly, making the original beam heights potentially unrealistic. Therefore, for all the cases involved in structural optimization (training set) and those without beam height information (test set), all the beam heights are determined using the method described in Subsection ‘Beam height'. When this approach is taken, two issues may arise. First, some shear wall structures may fail to meet physical requirements. Second, unnecessary material consumption may occur. The root cause is that the original beam heights specified in the design drawings are not being used. Nevertheless, this does not affect the findings of this study, because after performing structural optimization described in Section ‘Structural optimization of shear wall layout', the physical performance of all shear wall structures can be significantly improved and the material cost saved.

Additionally, to prevent the regression formulas from being heavily influenced by any particular engineering project, the collected dataset is deduplicated: shear wall structures from the same project that have similar building heights and planar layouts are excluded from the dataset. After deduplication, 102 shear wall structures with significant differences in structural layout, building height, and design seismic acceleration are retained. These structures are located in eight provinces in China, covering all height ranges below 100 m, and their design seismic accelerations ( $a_{d}$ ) range from 0.05 g to 0.3 g. The distributions of story numbers and design seismic accelerations are shown in Figure A1(a) and (b) in Appendix A.

Number and division of standard stories

In structural design practice in China, to facilitate the design of high-rise buildings, stories with identical structural designs are commonly categorized as a standard story, as illustrated in Figure 3.

Figure 3.

Illustration of the number and division of standard stories.

The number of standard stories is influenced by both horizontal loads (primarily seismic loads in inland areas) and vertical loads (primarily gravity loads). Therefore, a linear regression is conducted with the design seismic acceleration $a_{d}$ and the number of stories $n_{s}$ as independent variables, and the number of standard stories as a dependent variable. The regression outcome is presented in equation (1):

$$n_{std} = \mathrm{Max}\left(\mathrm{Round}\left(7.95 a_{d} + 0.105 n_{s} - 0.577\right), 1\right) \tag{1}$$

where $n_{std}$ is the number of standard stories, $a_{d}$ and $n_{s}$ are previously defined, the Round (·) function rounds the regression result into an integer, and the Max (·) function ensures that the number of standard stories is not smaller than 1.

Comparing the engineer designs with the regression results from Equation (1), 53% of the cases have an error of 0, 93% of the cases have an error within ±1, and 100% of the cases have an error within ±2. Therefore, the regression accuracy of equation (1) is confirmed.
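As a minimal sketch, equation (1) can be evaluated as follows. The function name `number_of_standard_stories` is hypothetical, and rounding halves upward is an assumption, since the text does not state how ties are handled by Round (·).

```python
import math


def number_of_standard_stories(a_d: float, n_s: int) -> int:
    """Evaluate equation (1); a_d is in g and n_s is the number of stories."""
    raw = 7.95 * a_d + 0.105 * n_s - 0.577
    # Assumption: Round(.) rounds halves upward; the source does not specify ties.
    rounded = math.floor(raw + 0.5)
    # Max(., 1) guarantees at least one standard story.
    return max(rounded, 1)


if __name__ == "__main__":
    # 30 stories at a_d = 0.1 g: raw = 0.795 + 3.15 - 0.577 = 3.368
    print(number_of_standard_stories(0.1, 30))
    # 6 stories at a_d = 0.05 g: raw = 0.3975 + 0.63 - 0.577 = 0.4505
    print(number_of_standard_stories(0.05, 6))
```

The explicit `math.floor(raw + 0.5)` is used instead of Python's built-in `round`, which rounds exact halves to the nearest even integer and could therefore differ from a conventional rounding rule.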

When the number of standard stories is larger than 1, it is necessary to further determine the division ratio of each standard story, by equation (2):

$$r_{std, i} = \frac{n_{std, i}^{high}}{n_{s}}, \quad i = 1, 2, 3, 4 \tag{2}$$

where $r_{std, i}$ is the division ratio of the i-th standard story, and $n_{std, i}^{high}$ is the highest story number of the i-th standard story. For the scenario shown in Figure 3, $r_{std, 1} = 2 / 9 = 0.22$ and $r_{std, 2} = 5 / 9 = 0.56$.

For cases with 2, 3, 4, and 5 standard stories, the regression results for $r_{std, i}$ are presented in Table 1. When dividing the standard stories according to Table 1, the mean absolute error (MAE) of $n_{std, i}^{high}$ compared to the engineer designs is less than three stories, indicating that the accuracy of the regression formulas is acceptable.

Table 1.

Regression results of $r_{std, i}$.

Number of standard stories	i-th standard story	Regression result of $r_{std, i}$	MAE of $n_{std, i}^{high}$ (stories)
2	1	0.29	2.00
3	1	0.24	1.58
3	2	0.51	2.05
4	1	0.28	2.11
4	2	0.50	1.89
4	3	0.70	1.94
5	1	0.27	2.25
5	2	0.48	2.75
5	3	0.65	2.50
5	4	0.81	2.25

While Figure 3 shows a simplified example with nine stories for clarity, the regression model is mathematically applicable to buildings with fewer than 33 stories, as the shear wall structures in China are usually less than 100 m or 33 stories tall.
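The division of standard stories can be sketched by inverting equation (2) with the ratios of Table 1. The function name, the half-up rounding, and the safeguards that keep at least one story in every standard story are assumptions made for illustration; the source only provides the ratios.

```python
import math
from typing import List

# Regression ratios r_std,i from Table 1, keyed by the number of standard stories.
DIVISION_RATIOS = {
    1: [],
    2: [0.29],
    3: [0.24, 0.51],
    4: [0.28, 0.50, 0.70],
    5: [0.27, 0.48, 0.65, 0.81],
}


def divide_standard_stories(n_s: int, n_std: int) -> List[int]:
    """Return the highest story number of each standard story, bottom to top."""
    if n_std not in DIVISION_RATIOS or n_std > n_s:
        raise ValueError("n_std must be between 1 and min(5, n_s)")
    tops: List[int] = []
    for index, ratio in enumerate(DIVISION_RATIOS[n_std]):
        # Inverting equation (2): n_high = r * n_s (assumption: halves round up).
        top = math.floor(ratio * n_s + 0.5)
        # Assumption: each standard story must contain at least one story.
        lowest = tops[-1] + 1 if tops else 1
        highest = n_s - (n_std - 1 - index)
        tops.append(min(max(top, lowest), highest))
    tops.append(n_s)  # the last standard story always ends at the top story
    return tops


if __name__ == "__main__":
    print(divide_standard_stories(9, 3))
```

For a nine-story building with three standard stories, the sketch returns upper stories 2, 5, and 9, which matches the division ratios 2/9 and 5/9 quoted for the Figure 3 scenario.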

Material grade

It is observed from the dataset that the rebar grade commonly used in shear wall structures is HRB400, corresponding to a standard yield strength of 400 MPa, while the concrete grade is often variable. Additionally, shear walls and coupling beams typically use the same concrete grade, while frame beams and slabs generally have the same concrete grade. The variations of the concrete grades for standard stories are given in Table 2, where the number following ‘C' represents the standard cubic compressive strength, in MPa.

Table 2.

Statistical patterns of concrete grade.

Number of standard stories	i-th standard story	Shear wall/coupling beam	Frame beam/slab
1	1	C30	C30
2	1	C35	C30
2	2	C30	C30
3	1	C40	C30
3	2	C35	C30
3	3	C30	C30
4	1	C45	C30
4	2	C40	C30
4	3	C35	C30
4	4	C30	C30
5	1	C50	C30
5	2	C45	C30
5	3	C40	C30
5	4	C35	C30
5	5	C30	C30

Upon validating the statistical patterns in Table 2 with the engineer designs, it is found that more than 90% of the standard stories have an error in concrete grades within ±5 MPa. The MAE of concrete grade for shear walls and coupling beams is 1.7 MPa, while that for frame beams and slabs is 0.8 MPa. This indicates that the statistical patterns in Table 2 are representative and accurate for predicting the concrete grades used in shear wall structures.
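The pattern in Table 2 can be written compactly: the top standard story uses C30 for shear walls and coupling beams, and each standard story below it is one grade (5 MPa) higher, while frame beams and slabs use C30 throughout. The closed-form expression below is only a restatement of the table for illustration, not a formula given in the source, and the function name is hypothetical.

```python
from typing import Tuple


def concrete_grades(n_std: int, i: int) -> Tuple[int, int]:
    """Return (shear wall/coupling beam grade, frame beam/slab grade) in MPa
    for the i-th of n_std standard stories, following Table 2."""
    if not 1 <= i <= n_std <= 5:
        raise ValueError("Table 2 covers 1 to 5 standard stories with 1 <= i <= n_std")
    # Top standard story: C30; each lower standard story: 5 MPa higher.
    wall_grade = 30 + 5 * (n_std - i)
    # Frame beams and slabs are C30 in every row of Table 2.
    slab_grade = 30
    return wall_grade, slab_grade


if __name__ == "__main__":
    for story in range(1, 4):
        print(story, concrete_grades(3, story))
```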

Story height

For shear wall structures, the first story may sometimes serve as a lobby or commercial space, which may lead to a greater story height compared to the other stories. Separate statistics are collected on the heights of the first story and the other stories, with the results depicted in Figure A1(c) and (d) of Appendix A. It is evident that story heights of 3.0 m for the first story and 2.9 m for the remaining stories are the most common, and these values can be adopted for the design of story heights.

Shear wall thickness

The distribution of shear wall thickness is shown in Figure A1(e) in Appendix A. The thicknesses of shear walls are affected by seismic loads, so the shear wall structures are first classified according to the design seismic acceleration $a_{d}$ . For cases under the same seismic load, the thickness of a shear wall is primarily governed by the axial compression ratio. Therefore, the thickness is directly proportional to the gravity load on the shear wall and is inversely proportional to the concrete grade of that shear wall. The line bearing capacity of a shear wall is defined in equation (3) as follows:

$$c_{line} = t_{wall} \times s_{conc} \tag{3}$$

where $c_{line}$ is the line bearing capacity of a shear wall (unit: 10³ kN/m), $t_{wall}$ is the wall thickness (unit: m), and $s_{conc}$ is the concrete grade of the shear wall (unit: MPa).

The gravity load is reflected by defining the gravity-load height $h_{above}$ as the building height above the bottom of a standard story (unit: m). For the scenario shown in Figure 3, the gravity-load heights $h_{above}$ for the first, second, and third standard stories are 26.2 m, 20.3 m, and 11.6 m, respectively. The relationship between $c_{line}$ and $h_{above}$ in the dataset can be found in Figure A1(f)-(j) of Appendix A, indicating a significant positive correlation between the two. After comparing linear, polynomial, logarithmic, and exponential regressions, the linear regression is adopted for the best fitting results and explainability. The regression results of $c_{line}$ and $h_{above}$ can be found in Table 3, with a goodness of fit (R²) ranging from 0.522 to 0.666.

Table 3.

Regression results of shear wall thickness.

Design seismic acceleration $a_{d}$ (g)	Regression formula	Goodness of fit (R²)
0.05	$c_{line} = 0.083 h_{above} + 3.774$	0.666
0.1	$c_{line} = 0.036 h_{above} + 5.193$	0.522
0.15	$c_{line} = 0.021 h_{above} + 5.318$	0.532
0.2	$c_{line} = 0.060 h_{above} + 5.068$	0.536
0.3	$c_{line} = 0.087 h_{above} + 5.178$	0.645

The process for determining the thickness of a shear wall is as follows: (1) For each standard story, determine the gravity-load height $h_{above}$ and the concrete grade $s_{conc}$ (Table 2). (2) Using the regression formulas from Table 3, obtain the line bearing capacity $c_{line}$ of the shear wall from $h_{above}$ . (3) Using equation (3), calculate the shear wall thickness $t_{wall}$ from $c_{line}$ and $s_{conc}$ , and round it to the nearest standard thickness (180/200/220/250/300 mm). (4) If the shear wall thickness of a lower standard story is smaller than that of a higher standard story, enlarge it so that the thickness does not increase with height.
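Steps (2) to (4) of this procedure can be sketched as follows. The coefficients are copied from Table 3; the function name is hypothetical, and enlarging a lower story to exactly the thickness of the story above it is an assumed reading of step (4). The example pairs the Figure 3 gravity-load heights with $a_{d}$ = 0.3 g, which is an assumed value chosen only to exercise step (4).

```python
from typing import List

# Table 3: (slope, intercept) of c_line = slope * h_above + intercept, keyed by a_d in g.
LINE_CAPACITY_FIT = {
    0.05: (0.083, 3.774),
    0.1: (0.036, 5.193),
    0.15: (0.021, 5.318),
    0.2: (0.060, 5.068),
    0.3: (0.087, 5.178),
}
STANDARD_THICKNESSES_MM = [180, 200, 220, 250, 300]


def wall_thicknesses(a_d: float, h_above: List[float], s_conc: List[float]) -> List[int]:
    """Shear wall thickness (mm) of each standard story, listed bottom to top."""
    slope, intercept = LINE_CAPACITY_FIT[a_d]
    thicknesses = []
    for height, grade in zip(h_above, s_conc):
        c_line = slope * height + intercept   # step (2): Table 3, in 10^3 kN/m
        t_mm = c_line / grade * 1000.0        # step (3): equation (3), m to mm
        # Round to the nearest standard thickness.
        nearest = min(STANDARD_THICKNESSES_MM, key=lambda t: abs(t - t_mm))
        thicknesses.append(nearest)
    # Step (4): moving downward, a story is never thinner than the story above it.
    for k in range(len(thicknesses) - 2, -1, -1):
        thicknesses[k] = max(thicknesses[k], thicknesses[k + 1])
    return thicknesses


if __name__ == "__main__":
    # Figure 3 heights; grades C40/C35/C30 from Table 2 for three standard stories.
    print(wall_thicknesses(0.3, [26.2, 20.3, 11.6], [40, 35, 30]))
```

In this example the raw thicknesses are about 186 mm, 198 mm, and 206 mm, which round to 180 mm, 200 mm, and 200 mm; step (4) then raises the first standard story to 200 mm.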

The thickness design results are compared with the engineer designs. The MAE and mean absolute percentage error of the shear wall thickness are found to be 17 mm and 8.4%, respectively. The standard deviations of absolute error and absolute percentage error are 17 mm and 7.4%, respectively. The validation results demonstrate high precision in the regression formulas. Additionally, the dataset’s high diversity ensures strong generalizability, while the low-dimensional, simple structure of the formulas (e.g., Figure A1(f)–(j)) minimizes overfitting potential.

Beam height

A total of 3486 beams are collected from the dataset. Due to differences in physical behavior and computational models between the coupling and frame beams, it is necessary to analyze the heights of different beams separately. The distributions of beam heights are shown in Figure A1(k) and (l) of Appendix A. It can be found that 92.0% of the coupling beams have heights ranging between 400 mm and 1000 mm, while 97.1% of the frame beams have heights ranging between 300 mm and 600 mm. Considering the indoor net height requirement of 2.1 m to 2.2 m, and given that the story height is generally 2.9 m (as discussed in Subsection ‘Story height'), the range for the height of coupling beams can be set between 400 mm and 1000 mm, and for frame beams, it is between 300 mm and 600 mm during the schematic design phase.

From a physical perspective, the height of a beam is primarily influenced by its span. In engineering practice, a large number of coupling beams have a span-to-height ratio of less than 2.5 (Qian et al., 2018). According to the statistical results of Zhao et al. (2022), the average span-to-height ratio for coupling beams is 2.74, which is also close to 2.5. While 2.74 reflects the dataset average, 2.5 was chosen as a conservative proxy that aligns with engineering practices. Therefore, within the height range of 400 mm to 1000 mm, the span-to-height ratio for coupling beams is set to 2.5. The height of coupling beams can be determined by equation (4):

$$h_{cb} = \begin{cases} 0.4, & l_{cb} < 1 \\ 0.4 \, l_{cb}, & 1 \leq l_{cb} \leq 2.5 \\ 1.0, & l_{cb} > 2.5 \end{cases} \tag{4}$$

where $h_{cb}$ and $l_{cb}$ are the height and the span of the coupling beams (unit: m), respectively.

Given that the height of a frame beam is generally between 1/18 and 1/10 of its span (Qian et al., 2018), an intermediate value of 1/12 can be taken in the schematic design phase. The height of frame beams can be determined by equation (5):

$$h_{fb} = \begin{cases} 0.3, & l_{fb} < 3.6 \\ l_{fb} / 12, & 3.6 \leq l_{fb} \leq 7.2 \\ 0.6, & l_{fb} > 7.2 \end{cases} \tag{5}$$

where $h_{fb}$ and $l_{fb}$ are the height and the span of the frame beams (unit: m), respectively.

During the schematic design phase, beam height is less critical than shear wall configuration. A fixed span-to-height ratio, bounded by statistically derived minimum/maximum values (to avoid unreasonable beam heights), balances simplicity and practicality, ensuring design feasibility without excessive computational overhead.
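Reading equations (4) and (5) as a fixed span-to-height ratio clamped to the statistically derived height range, both rules reduce to a one-line clamp. The function names below are hypothetical; the constants come from the text (ratio 2.5 and 0.4 m to 1.0 m for coupling beams, ratio 12 and 0.3 m to 0.6 m for frame beams).

```python
def coupling_beam_height(span_m: float) -> float:
    """Equation (4): height = span / 2.5, clamped to the range 0.4 m to 1.0 m."""
    return min(max(0.4 * span_m, 0.4), 1.0)


def frame_beam_height(span_m: float) -> float:
    """Equation (5): height = span / 12, clamped to the range 0.3 m to 0.6 m."""
    return min(max(span_m / 12.0, 0.3), 0.6)


if __name__ == "__main__":
    for span in (0.5, 2.0, 3.0):
        print("coupling beam", span, coupling_beam_height(span))
    for span in (2.0, 6.0, 9.0):
        print("frame beam", span, frame_beam_height(span))
```

Both piecewise functions are continuous at their breakpoints (for example, a 3.6 m frame beam span gives 3.6 / 12 = 0.3 m), so a small change in span caused by a wall length adjustment never produces a jump in beam height.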

Structural optimization of shear wall layout

The design of shear wall structures is based on architectural design, and the shear wall layout shall not interfere with the functions of the architectural spaces. Meanwhile, structural design should aim at saving material costs as much as possible while meeting the physical requirements specified by the design codes (Dehnavipour et al., 2021). In the design of a shear wall layout, if the lengths of the shear walls are too short, the structural design may not meet the code-specified physical requirements; if they are too long, the design may lead to a serious waste of materials (Lou et al., 2021; Tafraout et al., 2019; Zhou et al., 2022). Therefore, this study will adopt the shear wall layout manually designed by engineers as the initial design and adjust the lengths of the shear walls through structural optimization to improve the design quality.

Note that the generative AI utilized in this study is specifically focused on the shear wall layout (to be discussed in Section ‘Diffusion model-based structural design with data enhancement') without considering other design information, such as shear wall thickness and beam height. Consequently, adjusting other design information during the process of structural optimization will not change the training data quality of the generative AI. In this study, the optimization of the shear wall layout is achieved exclusively through the adjustment of the shear wall lengths. Other design information is either kept identical to the initial design or ascertained using the regression formulas outlined in Subsection ‘Regression analysis of necessary design information'.

Design variables and penalty functions

A typical shear wall structure, as shown in Figure 2(a), commonly includes dozens of shear walls, which can only be positioned within the partition walls. To reduce the number of design variables and thus ease the optimization difficulty, connected shear walls are defined as shear wall groups, and the K-Means algorithm is used to cluster the shear wall groups (Qin et al., 2024).

Clustering is performed based on the shape, dimension, and positional features of shear wall groups. Specifically, the first three features characterize the geometric forms of shear wall groups, the fourth feature represents their total length, and the last two features describe their relative positions. The K-Means algorithm tends to put shear wall groups into the same cluster when they share similar shapes (e.g., both L-shaped or T-shaped), comparable total lengths, and symmetric layouts (along the X or Y axis). Following the recommendation of Qin et al. (2024), an initial cluster number of eight was selected, with additional cluster numbers tested for validation. Numerical experiments on typical cases revealed that, for the same computational time, increasing the cluster number (e.g., to 14 or 18) led to negligible improvements in the optimization results, with the additional decrease in the final penalty function remaining below 1%. The reason is that a higher cluster number enhances design flexibility for shear wall optimization but also expands the search space and problem complexity, which does not necessarily benefit the solution efficiency of the optimization algorithm. Therefore, a cluster number of eight is recommended for common cases; for scenarios where the standard story area significantly exceeds that of the typical cases, a moderate increase in the cluster number can be considered.
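The clustering step can be illustrated with a small, dependency-free K-Means (Lloyd's algorithm). This is only a sketch: the six-component feature encoding in `EXAMPLE_GROUPS` (three shape indicators, a normalized total length, and two absolute position offsets) is hypothetical and not the encoding of Qin et al. (2024), and the deterministic farthest-point seeding is an assumption chosen to make the example reproducible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical features: (is_L, is_T, is_straight, total length, |x| offset, |y| offset).
EXAMPLE_GROUPS: List[Point] = [
    (1, 0, 0, 0.60, 0.8, 0.7),  # L-shaped group
    (1, 0, 0, 0.62, 0.8, 0.7),  # its mirrored counterpart
    (0, 0, 1, 0.30, 0.1, 0.5),  # straight wall
    (0, 0, 1, 0.28, 0.1, 0.5),
    (0, 1, 0, 0.90, 0.4, 0.0),  # T-shaped group
    (0, 1, 0, 0.88, 0.4, 0.0),
]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-Means; returns the cluster index of every point."""
    centers = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers no longer move
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


if __name__ == "__main__":
    print(kmeans(EXAMPLE_GROUPS, 3))
```

Using absolute position offsets is one simple way to let mirrored groups fall into the same cluster; all wall groups in a cluster would then share one design variable.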

Based on the eight clusters, a total of eight design variables $x_{i}$ $(i = 1, \dots, 8)$ are defined. Each $x_{i}$ is a floating-point number in the range $-1 \leq x_{i} \leq 1$ that represents the relative change in the wall lengths of the i-th shear wall group cluster. The initial design corresponds to $x_{i} = 0$. When $x_{i} = -1$, the wall lengths in the i-th shear wall group cluster are reduced by 100%. When $x_{i} = 1$, they are increased by 100%. For intermediate values $-1 < x_{i} < 1$, the wall lengths are adjusted correspondingly during structural optimization. It is worth noting that the range of the design variables can be adjusted based on the evaluation of the initial design to improve optimization efficiency, as shown in Figure 4. For example, if the overall lateral stiffness of the initial design is too low, the range of the design variables can be set to $-0.2 \leq x_{i} \leq 1$, so that the optimization mainly lengthens the shear walls.

Figure 4.

Flowchart for structural optimization.

At the same time, the wall lengths are subject to certain design rules: (1) The shear walls shall not extend beyond the partition walls, so as not to interfere with the architectural spaces; (2) According to the Chinese design codes, the length-to-thickness ratio of shear walls shall not be less than 4, and the length of a single shear wall shall not exceed 8 m (MOHURD, 2010). After adjusting the shear wall layout, the beam spans can be correspondingly adjusted without changing the topological relationship between the beams and the walls. The beam heights are then determined according to the regression formula in Subsection ‘Beam height’.
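The mapping from a design variable to a feasible wall length can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the function name `adjusted_length` and its parameters are hypothetical, the scaling is assumed to be linear in $x_{i}$, and a wall that would violate the minimum length-to-thickness ratio is assumed to be removed.

```python
def adjusted_length(initial_length, x, thickness, available_length,
                    max_length=8.0, min_ratio=4.0):
    """Map a design variable x in [-1, 1] to a feasible wall length in metres."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    target = initial_length * (1.0 + x)        # x = 0 keeps the initial design
    upper = min(max_length, available_length)  # 8 m code limit and partition wall extent
    lower = min_ratio * thickness              # length-to-thickness ratio of at least 4
    if target < lower or upper < lower:
        return 0.0  # assumption: a wall that cannot satisfy the ratio is removed
    return min(target, upper)

print(adjusted_length(3.0, 0.0, 0.2, 10.0))   # initial design unchanged
print(adjusted_length(5.0, 1.0, 0.2, 20.0))   # doubled length capped at 8 m
print(adjusted_length(3.0, 1.0, 0.2, 5.0))    # capped by the partition wall
```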

This study considers four physical metrics (namely the inter-story drift ratio, the torsional period ratio, the shear weight ratio, and the axial compression ratio) and one material metric (which comprehensively considers concrete and steel rebars) in structural optimization (Jin et al., 2024; Lou et al., 2021), as summarized in Table 4. The relationship between the design metrics and the penalty functions can be found in Appendix B.

Table 4.

Design metrics and penalty functions of shear wall structures.

Type	Design metrics	Target	Penalty function
Physical response	Inter-story drift ratio $p_{IDR}$	Smaller than code threshold	$f_{IDR}$
	Torsional period ratio $p_{TPR}$	Smaller than code threshold	$f_{TPR}$
	Shear weight ratio $p_{SWR}$	Larger than code threshold	$f_{SWR}$
	Axial compression ratio $p_{ACR, i}$	Smaller than code threshold	$f_{ACR}$
Material cost	Concrete volume $v_{c}$ and steel mass $m_{s}$	As low as possible	$f_{MC}$

A total penalty function $f_{total}$ is defined to transform the multi-objective problem into a single-objective one (Qin et al., 2024), as expressed in Equations (6) and (7). This total penalty function shall be minimized.

$$f_{total} = f_{MC} \cdot f_{PHY} \tag{6}$$

$$f_{PHY} = f_{IDR} \cdot f_{TPR} \cdot f_{SWR} \cdot f_{ACR} \tag{7}$$

where $f_{MC}$ is the material penalty function; $f_{PHY}$ is the physical penalty function; and $f_{IDR}$, $f_{TPR}$, $f_{SWR}$, and $f_{ACR}$ are the penalty functions for inter-story drift ratio, torsional period ratio, shear weight ratio, and axial compression ratio, respectively.

In the design of shear wall structures, a design scheme is considered safe only when the physical responses meet the design code requirements. On this basis, further reducing material costs makes the design scheme economical. According to the definitions given in Appendix B, the penalty function increases dramatically when the physical responses do not meet the design code requirements. Therefore, the optimization algorithm will first ensure that the physical responses meet the design code requirements, and then seek to reduce material costs.

Compared with other mathematical forms of penalty functions such as summation, the penalty function in Equations (6) and (7) offers the following advantages. First, it has a clear design rationale: $f_{total}$ represents the material cost adjusted by physical performance requirements. When physical indices fall within a reasonable range, $f_{total}$ directly reflects the material cost in an intuitive manner. Second, since material and physical indices have different dimensions and physical requirements must be prioritized, summation-based formulations require to-be-determined weight coefficients to balance the terms, an issue that the multiplicative structure avoids. Third, in scenarios where multiple physical indices fail to meet requirements, the multiplicative form generates significantly larger penalty terms than summation does. This characteristic encourages optimization algorithms to proactively avoid such undesirable scenarios by imposing stronger penalties on combined violations.
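The compounding effect of the multiplicative form can be illustrated with a short sketch. The individual penalty values below are illustrative placeholders only (the actual definitions are given in Appendix B), and the function name `total_penalty` is hypothetical.

```python
from math import prod

def total_penalty(f_mc, physical_penalties):
    """Equations (6) and (7): f_total = f_MC * f_IDR * f_TPR * f_SWR * f_ACR."""
    return f_mc * prod(physical_penalties)

f_mc = 220.0  # illustrative material penalty

# Order of the physical penalties: (f_IDR, f_TPR, f_SWR, f_ACR).
compliant = total_penalty(f_mc, [1.0, 1.0, 1.0, 1.0])       # equals f_MC
one_violation = total_penalty(f_mc, [1.5, 1.0, 1.0, 1.0])
two_violations = total_penalty(f_mc, [1.5, 1.5, 1.0, 1.0])

print(compliant, one_violation, two_violations)
```

With these placeholder values, a second violation of the same size raises the total penalty by a further 50% of its already increased value, whereas a summation would add only a fixed increment whose size depends on a chosen weight.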

Discussion on optimization parameters

To obtain the penalty functions described in Subsection ‘Design variables and penalty functions', a detailed FE model of the shear wall structure is automatically created through parametric modeling in the commercial design software YJK-GAMA, and the required design metrics are obtained through FE analysis (GAMA, 2025). Furthermore, the online learning algorithm provided by YJK-GAMA is used to perform structural optimization. Online learning is a surrogate-assisted evolutionary algorithm, which is more effective than traditional evolutionary algorithms such as genetic algorithms and simulated annealing (Fei et al., 2025). Specifically, online learning utilizes several heterogeneous surrogate models for promoting ensemble diversity, such as polynomial regression, truncated Fourier series, support vector machine, and radial basis function networks. It updates the surrogate models to improve prediction accuracy using the new data points collected during the optimization, as the name “online” indicates. The workflow of online learning is illustrated in Figure 4.

Online learning requires the setting of two parameters, that is, the number of FE analysis iterations (which corresponds to the number of fitness evaluations) and the initial sample size (related to the surrogate model). According to the recommendation from the official documentation, the initial sample size is set to 1/10 of the number of the FE analysis iterations. Obviously, the more FE analyses are performed, the longer the optimization process will take, and generally, the better the optimized design will be. This study requires performing structural optimization on the entire training set (303 cases), which induces a massive computational cost. Therefore, it is necessary to find a balance between the optimization time and effectiveness by selecting an appropriate number of FE analysis iterations. To understand the relationship between the optimization efficiency and effectiveness, optimization experiments are conducted on eight typical shear wall structures.
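The roles of the two parameters can be made concrete with a generic surrogate-assisted search loop. The sketch below is not the online learning algorithm of YJK-GAMA: it replaces the ensemble of surrogate models with a deliberately crude inverse-distance-weighted predictor, and the objective is a toy function standing in for an FE analysis. All function names are hypothetical.

```python
import random

def idw_predict(x, xs, ys, eps=1e-12):
    """Inverse-distance-weighted surrogate built from past evaluations."""
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = 1.0 / (d2 + eps)
        num += w * yi
        den += w
    return num / den

def surrogate_assisted_search(objective, dim=8, n_eval=100, lo=-1.0, hi=1.0, seed=0):
    rng = random.Random(seed)
    n_init = max(1, n_eval // 10)  # initial sample size = 1/10 of the evaluation budget
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    while len(ys) < n_eval:
        best = xs[min(range(len(ys)), key=ys.__getitem__)]
        # Offspring: Gaussian mutations of the current best, clipped to the bounds.
        candidates = [
            [min(hi, max(lo, v + rng.gauss(0.0, 0.2))) for v in best]
            for _ in range(30)
        ]
        # Pre-screen with the surrogate; spend one true evaluation on the best guess.
        pick = min(candidates, key=lambda c: idw_predict(c, xs, ys))
        xs.append(pick)
        ys.append(objective(pick))  # the new data point also updates the surrogate
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], n_init

def toy_objective(x):
    """Stand-in for an FE-based penalty evaluation (sphere function)."""
    return sum(v * v for v in x)

best_x, best_y, n_init = surrogate_assisted_search(toy_objective, n_eval=100)
```

Only the true evaluations count against the budget, which is why the number of FE analysis iterations dominates the optimization time.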

The design conditions for the typical shear wall structures are shown in Table 5. The eight cases are selected to represent dataset diversity, covering typical design conditions: design seismic acceleration (0.1 g–0.3 g), site characteristic period (0.35 s–0.55 s), the number of stories (7–26), and the number of standard stories (1–4). Among them, the initial designs of four cases do not fully meet the physical requirements and are referred to as Group A1. The optimization algorithm applied will primarily reduce the physical penalty function to make these designs safer. For the other four cases, their initial designs have already met the physical requirements and are referred to as Group B1. The optimization algorithm will aim to reduce the material penalty function to make these designs more economical while still meeting the physical requirements.

Table 5.

Design conditions of typical cases.

Group	ID	Design seismic acceleration $a_{d}$ (g)	Site characteristic period $T_{g}$ (s)	Number of stories $n_{s}$	Number of standard stories $n_{std}$	Satisfy physical metrics?
A1	1	0.2	0.45	26	4	No
	2	0.1	0.45	24	3	No
	3	0.2	0.40	26	4	No
	4	0.1	0.55	18	2	No
B1	5	0.1	0.35	12	1	Yes
	6	0.3	0.55	9	3	Yes
	7	0.2	0.55	11	2	Yes
	8	0.15	0.45	7	1	Yes

For three scenarios with the number of FE analysis iterations being set to 50, 100, and 200, the average decreases in penalty functions for Groups A1 and B1 are shown in Figure 5(a) and (b), respectively. For the cases in Group A1, both the total penalty function and the physical penalty function show a significant decrease at 50 iterations and remain stable at 100 and 200 iterations. For the cases in Group B1, the total penalty function and the material penalty function show a significant decrease at 50 and 100 iterations and a slight decrease at 200 iterations. Therefore, the choice of 100 iterations is shown to achieve a good balance between the optimization time and effectiveness, and higher iteration numbers will not induce a substantial reduction in the penalty function. The corresponding initial sample size is 100/10 = 10.

Figure 5.

Penalty function reductions under different optimization parameters. (a) Group A1 (b) Group B1.

Additionally, it can be observed from Figure 5(a) that the structural optimization can significantly improve the physical performance of Group A1 without substantially increasing the material cost. From Figure 5(b), it can be seen that structural optimization can greatly reduce the material cost of Group B1 while slightly improving the physical performance. These outcomes suggest that the proposed structural optimization method can effectively achieve the goal of multi-objective optimization.

Optimization results of training data

Using the optimization parameters described in Subsection ‘Discussion on optimization parameters’, data enhancement is carried out on the training set (303 cases) described in Subsection ‘Shear wall structure dataset’. The structural optimization is run on three computers, each with six parallel threads, and takes approximately 1 month in total to complete.

Within the training set (303 cases), 154 cases do not fully comply with the physical requirements and are referred to as Group A2. The remaining 149 cases meet the physical requirements and are referred to as Group B2. Several factors may cause the structural designs to fall short of the physical requirements: (1) The quality of the design drawings is not high enough; for example, they are preliminary designs that have not yet been verified. (2) The number and division of standard stories, material grade, story height, and shear wall thickness for 15 cases, as well as the beam height for all 303 cases, are derived using the regression formulas in Subsection ‘Regression analysis of necessary design information’. (3) Errors may be unavoidable during the preprocessing of the design drawings due to the highly non-standardized nature of these drawings. The fact that many structural designs do not meet the physical requirements highlights the necessity of enhancing data quality through structural optimization.

Figure 6 shows the comparison of shear wall structure design before and after structural optimization, with major differences marked with circles. Case 1 and Case 2 are typical cases of Group A2. Case 1 has insufficient lateral stiffness before optimization, and the inter-story drift ratio ( $p_{IDR}$ ) and shear weight ratio ( $p_{SWR}$ ) do not meet the code requirements (Figure 6(a)). After optimization, the length of several shear walls increases, and all physical responses meet the code requirements (Figure 6(b)). Case 2 has poor torsional performance before optimization, and the torsional period ratio ( $p_{TPR}$ ) does not meet the code requirements (Figure 6(c)). After optimization, the length of the peripheral shear walls increases, and all physical responses meet the code requirements (Figure 6(d)). Case 3 is a typical case of Group B2. Before optimization, all physical responses meet the code requirements, but the design is conservative and uneconomical (Figure 6(e)). After optimization, the length of several shear walls is appropriately shortened, reducing material costs by 6.6% while meeting the code requirements (Figure 6(f)).

Figure 6.

Typical cases in the training set before and after optimization.

After conducting structural optimization on the 303 cases, the changes in the mean and standard deviation (δ_std) of the penalty functions, before and after data enhancement, are presented in Table 6. For the 154 cases in Group A2, the mean of the total penalty function decreases by 14.4% with a δ_std decreasing by 28.7%; the mean and the δ_std of the physical penalty function decrease by 14.1% and 44.8%, respectively; the material penalty function remains essentially unchanged. For the 149 cases in Group B2, the mean of the total penalty function decreases by 5.1% with a δ_std decreasing by 5.5%; the same values of the material penalty function decrease by 3.0% and 6.1%, respectively; and for the physical penalty function, the reduction in the mean and the corresponding δ_std are 2.9% and 66.7%, respectively.

Table 6.

Mean and standard deviation of penalty functions before and after data enhancement.

Group	Data enhancement	Total penalty (mean)	Total penalty (δ_std)	Material penalty (mean)	Material penalty (δ_std)	Physical penalty (mean)	Physical penalty (δ_std)
A2	Before	291.7	81.5	226.4	29.4	1.28	0.29
	After	249.6	58.1	226.2	29.4	1.10	0.16
	Change	−14.4%	−28.7%	−0.1%	0.0%	−14.1%	−44.8%
B2	Before	219.2	27.3	211.4	26.2	1.04	0.03
	After	208.1	25.8	205.0	24.6	1.01	0.01
	Change	−5.1%	−5.5%	−3.0%	−6.1%	−2.9%	−66.7%

The decrease in the mean of the penalty functions indicates that after optimization, Groups A2 and B2 have achieved significant quality improvements, primarily in terms of physical response and material cost, respectively. The reduction in the standard deviation of the penalty functions suggests that the design quality of the training set has become more consistent after optimization. In Section ‘Diffusion model-based structural design with data enhancement', the diffusion model will be trained using the training sets before and after data enhancement, and the model performances will be compared.
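The percentage changes reported in Table 6 follow directly from the before and after values. As a worked check, the sketch below recomputes four of the Group A2 entries; the helper name `relative_change` is hypothetical.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`, rounded to one decimal."""
    return round(100.0 * (after - before) / before, 1)

# (before, after) pairs copied from Table 6, Group A2.
group_a2 = {
    "total mean": (291.7, 249.6),
    "total std": (81.5, 58.1),
    "physical mean": (1.28, 1.10),
    "physical std": (0.29, 0.16),
}
changes = {name: relative_change(b, a) for name, (b, a) in group_a2.items()}
print(changes)
```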

Diffusion model-based structural design with data enhancement

Diffusion model

The concept of diffusion originates from the field of non-equilibrium thermodynamics and has attracted wide interest due to its superior mathematical properties (Song et al., 2021). Diffusion models accomplish the generation task by progressively removing small amounts of noise, starting from pure Gaussian noise (Ho et al., 2020). Recently, a diffusion model for structural design tasks, called Struct-Diffusion, has been proposed and has shown excellent performance on the shear wall layout task (Gu et al., 2024). In that work, the shear wall layout design is transformed into an image inpainting task.

In the domain of image inpainting, diffusion models have emerged as a transformative approach, outperforming traditional generative models like generative adversarial networks (GANs) and variational autoencoders (VAEs) in several critical aspects. Unlike GANs, which often struggle with mode collapse and training instability, especially when reconstructing complex structural details or coherent textures, diffusion models leverage a probabilistic latent space to generate high-fidelity, diverse outputs that maintain semantic consistency with the existing image context. They also avoid the VAE’s inherent compromise between reconstruction accuracy and generative expressivity, as their iterative denoising process allows for fine-grained control over the synthesis of missing regions, enabling superior handling of intricate patterns and multi-modal distributions. While diffusion models entail higher computational costs, techniques like knowledge distillation can substantially accelerate inference speed, making them more practical for real-world applications.

The core strength of diffusion models in inpainting lies in their two-stage denoising framework. First, the forward diffusion process gradually adds Gaussian noise to the original image until it becomes a pure noise tensor, mapping the data distribution to a simple prior (e.g., standard normal distribution). In the reverse denoising process, the model learns to iteratively reverse this diffusion by predicting the noise added at each step and refining the latent noisy tensor back into a clean image. For inpainting specifically, the model is conditioned on the known regions of the masked image, allowing it to focus on synthesizing plausible content for the missing areas while respecting spatial and semantic dependencies. This iterative refinement ensures that the generated content seamlessly integrates with the surrounding pixels, producing results with higher structural coherence and perceptual quality compared to GANs’ adversarial training paradigm or VAE’s approximate posterior inference.
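For reference, the forward and reverse processes described above can be written in the standard notation of denoising diffusion probabilistic models (Ho et al., 2020). The symbol $c$ for the conditioning input (the known regions of the masked image and the design conditions) is introduced here for illustration only; the exact conditioning used by Struct-Diffusion is given in Gu et al. (2024).

```latex
\begin{align}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right) \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t}\left(1-\beta_s\right) \\
\mathcal{L} &= \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, c, t) \right\rVert^2\right],
\qquad \epsilon \sim \mathcal{N}(0, I)
\end{align}
```

Here $\beta_t$ is the noise variance at step $t$, and $\epsilon_\theta$ is the network that predicts the added noise; minimizing $\mathcal{L}$ is a mean square error objective on the predicted noise.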

The basic workflow of the diffusion model-based structural design is shown in Figure 7. Initially, the architectural design drawings and design conditions are semantically processed to obtain input data. Specifically, colors are used to represent different components, and floating-point numbers are used to represent design conditions. Then, pure Gaussian noise is sampled, and a trained convolutional neural network is used to progressively remove the noise in small steps, thereby generating the layout of shear walls. Finally, a semantic structural design image is obtained as the output. The mathematical principles and the implementation details of the aforementioned diffusion model are not the focus of this study, and readers are referred to the published literature (Gu et al., 2024).

Figure 7.

Shear wall layout design based on diffusion models.

Using the aforementioned method, the diffusion model is trained based on the enhanced training set (303 cases), and the trained diffusion model is used to design the test set (61 cases). For comparison, the same operations are performed based on the original training set without data enhancement. The technical details of the diffusion models can be found in Table 7.

Table 7.

Configurations of the diffusion model.

Configuration	Choice	Configuration	Choice
U-net type	Guided diffusion	Beta schedule	Linear
Image size	256 × 256	Number of timesteps	2000
Input channel	6	Linear start	$1 \times 10^{-6}$
Inner channel	64	Linear end	0.01
Output channel	1	Activation function	SiLU
Residual blocks	2	Learning rate	$5 \times 10^{-5}$
Channel multipliers	[1, 2, 4, 4]	Batch size	4
Dropout ratio	0.2	Loss function	Mean square error
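The noise schedule implied by Table 7 can be sketched as follows. It is assumed here that ‘Linear start’ and ‘Linear end’ denote the first and last noise variances $\beta_1$ and $\beta_T$ of a linear schedule with $T = 2000$ steps; the function names are hypothetical, and a flat list of numbers stands in for an image.

```python
import math
import random

def linear_beta_schedule(n_steps=2000, start=1e-6, end=0.01):
    """Linearly spaced noise variances beta_1..beta_T (values from Table 7)."""
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]

def cumulative_alpha(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), the remaining signal fraction."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out

def noisy_sample(x0, alpha_bar_t, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    s, n = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [s * v + n * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule()
alpha_bar = cumulative_alpha(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 1.0, -1.0]                      # placeholder pixel values
x_mid = noisy_sample(x0, alpha_bar[999], rng)    # partially noised
x_end = noisy_sample(x0, alpha_bar[-1], rng)     # almost pure noise
```

Under this assumption the final value of `alpha_bar` is close to zero, so the last forward step leaves essentially no trace of the original image, which is what allows generation to start from pure Gaussian noise.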

Evaluation metrics

In traditional generative AI designs for shear wall structures, the similarity between the AI design and the engineer design (e.g., Intersection over Union) is commonly used to evaluate the performance of the AI models (Liao et al., 2024). This evaluation method is advantageous in its high evaluation efficiency, but it also has some limitations: (1) The engineer design, which serves as the ground truth, may not be of high quality, leading to potential misjudgment of high-quality AI designs. (2) Structural design is a creative task with no single correct answer. The difference between the AI design and the engineer design does not necessarily mean that the AI design is unreasonable. (3) In the model application phase, when the engineer design is not available, similarity metrics cannot be used to evaluate the AI design.
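For reference, the Intersection over Union between an AI-designed and an engineer-designed shear wall mask can be computed as in the following sketch. The masks are small placeholders, and the function name `intersection_over_union` is hypothetical.

```python
def intersection_over_union(mask_a, mask_b):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 1.0  # two empty masks count as identical

ai_design = [[1, 1, 0],
             [0, 1, 0]]
engineer_design = [[1, 0, 0],
                   [0, 1, 1]]
iou = intersection_over_union(ai_design, engineer_design)
print(iou)
```

The score depends only on pixel overlap with the engineer design, which is why it cannot distinguish a poor layout from a different but equally valid one.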

Therefore, this study adopts an evaluation method based on design metrics, which involves establishing a detailed FE model to obtain the physical response and material cost of the AI design and then evaluating the performance of the AI design. Firstly, based on computer vision techniques and design information, the coordinates of shear walls are extracted from the semantic structural design images generated by the diffusion model (Fei et al., 2023). Secondly, based on the regression formulas given in Subsection ‘Regression analysis of necessary design information’, other necessary design information required for FE analysis is obtained, including component section sizes and material grades. Finally, the physical response and material cost (listed in Table 4) of the AI design are obtained according to the FE results, and $f_{total}$, $f_{MC}$, and $f_{PHY}$ are used as evaluation metrics.

Evaluation results of design outcomes

The diffusion models, both with and without data enhancement, are used to design the 61 cases in the test set. The evaluation metrics of their design outcomes are shown in Table 8. Among the 61 cases, 15 cases do not meet the physical requirements without data enhancement, and are referred to as Group A3; the remaining 46 cases already meeting the physical requirements without data enhancement are referred to as Group B3.

Table 8.

Mean and standard deviation of design metrics with and without data enhancement.

Group	Data enhancement	Total penalty (mean)	Total penalty (δ_std)	Material penalty (mean)	Material penalty (δ_std)	Physical penalty (mean)	Physical penalty (δ_std)
A3	Without	343.3	70.4	222.0	13.9	1.56	0.36
	With	270.7	57.0	225.4	15.0	1.20	0.21
	Change	−21.1%	−19.0%	1.5%	7.9%	−23.1%	−41.7%
B3	Without	245.7	28.3	231.5	21.0	1.06	0.11
	With	240.1	20.8	230.3	18.9	1.04	0.06
	Change	−2.3%	−26.5%	−0.5%	−10.0%	−1.9%	−45.5%

After data enhancement, the mean of the total penalty function for Group A3 decreases by 21.1%, and the standard deviation (δ_std) decreases by 19.0%; the mean and the δ_std of the physical penalty function decrease by 23.1% and 41.7%, respectively; the mean of the material penalty function increases by 1.5%, with the corresponding δ_std increasing by 7.9%. Meanwhile, out of the 15 cases of Group A3, 10 cases can now meet the physical requirements. The remaining 5 cases, although still not meeting the physical requirements, see the mean of the physical penalty function decreasing by 18.5% and the δ_std decreasing by 61.6%, indicating that they can be more easily adjusted to meet the physical requirements in the subsequent detailed design phase. For Group B3, the mean of the total penalty function decreases by 2.3%, and the δ_std decreases by 26.5%; the mean of the physical penalty function decreases by 1.9%, and the δ_std decreases by 45.5%; the mean and the δ_std of the material penalty function decrease by 0.5% and 10.0%, respectively. Additionally, all cases in Group B3 still meet the physical requirements.

By enhancing the data quality of the training set, the number of AI designs that do not meet the physical requirements has been reduced by 67% (from 15 to 5 cases), and the remaining 33% can be more easily adjusted to meet the physical requirements; the material cost for the AI designs that meet the physical requirements has been reduced by 0.5%. The proposed data enhancement method can make the AI designs significantly safer and marginally more economical. Since meeting the physical requirements is a prerequisite for reducing the material costs in structural design tasks, it is reasonable that the proposed data enhancement method has a more significant effect on improving the physical performance, while its effect on reducing the material costs is, as expected, minor. At the same time, the decrease in the standard deviation of the total penalty function of the AI designs indicates that the consistency of the AI design quality has also been improved.

Case study

To intuitively demonstrate the improvement in the design quality of the diffusion models after data enhancement, two shear wall structure cases from the test set are presented in Figure 8 and Table 9. These two cases are geographically close and thus share the same seismic and site design conditions, both with a design seismic acceleration $a_{d} = 0.2 g$ and a site characteristic period $T_{g} = 0.4 s$. The two cases have different building heights, with Case A being 32.0 m tall ($n_{s} = 11$) and Case B being 29.1 m tall ($n_{s} = 10$). According to the Chinese design codes (MOHURD, 2010), both shear wall structure cases must meet the physical requirements $p_{IDR} \leq 0.1\%$, $p_{TPR} \leq 0.9$, and $p_{SWR} \geq 0.032$. Additionally, the material cost $f_{MC}$ and the proportion of shear walls with excessive axial compression ratios $r_{ACR}$ should be as small as possible.

Figure 8.

Design outcomes of two case studies.

Table 9.

Design metrics of two case studies with and without data enhancement.

Case	Data enhancement	$f_{total}$	$f_{MC}$	$p_{IDR}$	$p_{TPR}$	$p_{SWR}$
A	Without	339.4	231.9	0.131%	0.687	0.049
	With	240.8	238.5	0.071%	0.621	0.070
B	Without	236.5	234.4	0.074%	0.556	0.072
	With	231.3	229.1	0.087%	0.788	0.069

The architectural design of Case A is shown in Figure 8(a). Prior to data enhancement, the shear wall layout designed by the diffusion model is shown in Figure 8(c). According to Table 9, the designed shear wall structure shows an excessive inter-story drift ratio ($p_{IDR} > 0.1\%$), indicating that the structural stiffness is insufficient and that more shear walls are needed. After incorporating data enhancement, the shear wall layout designed by the diffusion model is shown in Figure 8(e), with a noticeable increase in shear wall lengths (highlighted by circles). According to Table 9, the inter-story drift ratio of the new design complies with the design code requirements ($p_{IDR} \leq 0.1\%$), suggesting a more reasonable shear wall layout.

The architectural design of Case B is depicted in Figure 8(b). Before data enhancement, the shear wall layout designed by the diffusion model is shown in Figure 8(d). According to Table 9, the physical metrics of this design meet the design code requirements, with a material cost $f_{MC} = 234.4$ . After incorporating data enhancement, the shear wall layout designed by the diffusion model is illustrated in Figure 8(f), where a reduction in shear wall lengths is evident (highlighted by circles). As indicated in Table 9, the physical metrics of the new design remain compliant with the design code requirements, and its material cost is reduced to $f_{MC} = 229.1$ , reflecting a 2.3% decrease in material costs.

It is evident that data enhancement improves the shear wall layouts designed by the diffusion model, making them safer and more economical.
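The compliance statements for the two case studies can be verified against the thresholds quoted above with a short sketch. The metric values are copied from Table 9; the dictionary layout and the function name `violations` are hypothetical, and the axial compression ratio is omitted because Table 9 does not list it.

```python
# Code thresholds quoted for the two case studies (MOHURD, 2010).
LIMITS = {"p_IDR": ("max", 0.001), "p_TPR": ("max", 0.9), "p_SWR": ("min", 0.032)}

def violations(metrics):
    """Return the names of the metrics that break their threshold."""
    bad = []
    for name, (kind, limit) in LIMITS.items():
        value = metrics[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            bad.append(name)
    return bad

table_9 = {
    ("A", "without"): {"p_IDR": 0.00131, "p_TPR": 0.687, "p_SWR": 0.049, "f_MC": 231.9},
    ("A", "with"):    {"p_IDR": 0.00071, "p_TPR": 0.621, "p_SWR": 0.070, "f_MC": 238.5},
    ("B", "without"): {"p_IDR": 0.00074, "p_TPR": 0.556, "p_SWR": 0.072, "f_MC": 234.4},
    ("B", "with"):    {"p_IDR": 0.00087, "p_TPR": 0.788, "p_SWR": 0.069, "f_MC": 229.1},
}

report = {key: violations(m) for key, m in table_9.items()}

# Material cost saving for Case B, in percent.
b_without = table_9[("B", "without")]["f_MC"]
b_with = table_9[("B", "with")]["f_MC"]
saving_b = round(100.0 * (b_without - b_with) / b_without, 1)
print(report, saving_b)
```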

Conclusions

This study proposes a data enhancement method based on the combination of structural optimization and diffusion models to address the issue of low data quality exhibited in the generative AI design of shear wall structures. The core contributions of this study are: Firstly, a generative AI design workflow incorporating data enhancement is proposed, which adds a data enhancement phase to the traditional workflow and modifies the data preparation and model evaluation phases. Secondly, a series of regression formulas are established based on the distribution characteristics of the design information, enabling model evaluation and data enhancement with missing design information. Thirdly, a structural optimization method for shear wall layout is presented, capable of simultaneously improving multiple design metrics of shear wall structures. The specific conclusions drawn from this study are as follows:

(1) The established regression formulas for necessary design information fit the collected dataset well and can be used to supplement missing information for training cases during data preparation, as well as to determine the necessary information for test cases during model evaluation.

(2) The proposed shear wall layout optimization method effectively achieves multi-objective optimization, significantly improving the engineer designs and thereby enhancing the data quality of the training set. After 100 FE analysis iterations, for Group A2, the mean of the physical penalty function decreases by 14.1%, and the standard deviation decreases by 44.8%; for Group B2, the mean and the standard deviation of the material penalty function decrease by 3.0% and 6.1%, respectively. This indicates that the data quality of the enhanced training set is higher and more consistent.

(3) The proposed data enhancement method significantly improves the performance of the diffusion models in designing test set cases. The number of AI designs with inadequate physical performance is reduced by 67% and the remaining 33% can be easily adjusted to meet the physical requirements. For AI designs already meeting the physical requirements, their average material cost is reduced by 0.5%. Additionally, the standard deviations of the total penalty function of Groups A3 and B3 decrease by 19.0% and 26.5%, respectively. This demonstrates that the proposed method can effectively enhance the physical performance of the shear wall layouts designed by the diffusion models, marginally reduce their material costs, and improve the consistency of design quality.

This study has the following limitations: Firstly, the applicability of the proposed data enhancement method to design tasks other than those presented in this study and the associated generative AI algorithms needs further verification. Secondly, more design metrics could be considered in structural optimization and model evaluation to better facilitate engineering applications. Lastly, a key limitation is the reliance on Chinese design codes. Adapting the method to other regions would require redefining penalty functions to align with local codes, a straightforward modification that preserves the core method’s validity.

In the future, machine learning methods (support vector machines, gradient boosting trees, etc.) could be employed to supplement missing design information, potentially providing a more reasonable design scheme due to their capacity to model complex nonlinear relationships. Furthermore, the integration of pre-trained universal surrogate models into the data enhancement pipeline would lower the computational demand.

Footnotes

Acknowledgment

This work is supported by the Sichuan Science and Technology Program (2025ZNSFSC1312), the Beijing Municipal Natural Science Foundation (8252008), and the National Natural Science Foundation of China (52408348). The authors would like to acknowledge Ms Yuanxin Liu from Shaodong Jianye Engineering Technology Co., Ltd, Mr Hongjing Xue from Beijing Institute of Architectural Design Institute Co., Ltd, and Mr Shulu Zhang from China Southwest Architectural Design & Research Institute Co., Ltd for providing structural design blueprints used in this work and giving valuable advice on design metrics of shear wall structures.

ORCID iDs

Yifan Fei

Xinzheng Lu

Wenjie Liao

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Sichuan Science and Technology Program (2025ZNSFSC1312), the Beijing Municipal Natural Science Foundation (8252008), and the National Natural Science Foundation of China (52408348).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix

Figure A1.

Statistical distribution of design information. (a) Story number (b) Design seismic acceleration (c) Story height of the first story (d) Story height of other stories (e) Shear wall thickness (f) Line bearing capacity ( $a_{d} = 0.05 g$ ) (g) Line bearing capacity ( $a_{d} = 0.1 g$ ) (h) Line bearing capacity ( $a_{d} = 0.15 g$ ) (i) Line bearing capacity ( $a_{d} = 0.2 g$ ) (j) Line bearing capacity ( $a_{d} = 0.3 g$ ) (k) Height of coupling beam (l) Height of frame beam

References

Afzal, Liu, Cheng JCP, et al. (2020) Reinforced concrete structural design optimization: a critical review. Journal of Cleaner Production 260: 120623.

Bandi, Adapa PVSR, Kuchi YEVPK (2023) The power of generative AI: a review of requirements, models, input–output formats, evaluation metrics, and challenges. Future Internet 15(8): 260.

Côté P-O, Nikanjam, Ahmed, et al. (2024) Data cleaning and machine learning: a systematic literature review. Automated Software Engineering 31(2): 54.

Dehnavipour, Meshki, Naderpour (2021) Torsion-based layout optimization of shear walls using multi-objective water cycle algorithm. Advances in Structural Engineering 24(13): 3030–3042.

Fei, Liao, Huang, et al. (2022) Knowledge-enhanced generative adversarial networks for schematic design of framed tube structures. Automation in Construction 144: 104619.

Fei, Liao, et al. (2023) Semi-supervised learning method incorporating structural optimization for shear-wall structure design using small and long-tailed datasets. Journal of Building Engineering 79: 107873.

Fei, Qin, Liao, et al. (2025) Graph neural network-assisted evolutionary algorithm for rapid optimization design of shear-wall structures. Advanced Engineering Informatics 65: 103129.

Feng, Fei, Lin, et al. (2023) Intelligent generative design for shear wall cross-sectional size using rule-embedded generative adversarial network. Journal of Structural Engineering 149(11): 04023161.

Flokas, Liu, et al. (2022) Complaint-driven training data debugging at interactive speeds. In: Proceedings of the 2022 International Conference on Management of Data. SIGMOD ’22. Association for Computing Machinery, 369–383. DOI: 10.1145/3514221.3517849 (accessed 24 February 2025).

Fu, Gao, Wang (2023) Dual generative adversarial networks for automated component layout design of steel frame-brace structures. Automation in Construction 146: 104661.

GAMA (2025) YJK-GAMA secondary development guide. Available at: https://gitee.com/NonStructure/yjk-gama-secondary-development/

Gu, Huang, Liao, et al. (2024) Intelligent design of shear wall layout based on diffusion models. Computer-Aided Civil and Infrastructure Engineering 39(23): 3610–3625.

Ho, Jain, Abbeel (2020) Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems. Curran Associates, Inc, Vol. 2020, 6840–6851. Available at: https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html (accessed 11 May 2024).

(2020) Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems. Curran Associates, Inc, Vol. 2020, 6840–6851. Available at. https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html (accessed 11 May 2024).

14.

Jin

Yang

Xiao

, et al. (2024) Shear wall layout optimization of multi-tower buildings based on conceptual design and extended evolutionary structural optimization method. Engineering Optimization 56(4): 486–505.

15.

Lee

K-H

Zhang

, et al. (2018) CleanNet: transfer learning for scalable image classifier training with label noise. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 5447–5456. Available at: https://ieeexplore.ieee.org/document/8578669 (accessed 24 February 2025).

16.

Liao

Huang

, et al. (2021) Automated structural design of shear wall residential buildings using generative adversarial networks. Automation in Construction 132: 103931.

17.

Liao

Wang

Fei

, et al. (2023) Base-isolation design of shear wall structures using physics-rule-co-guided self-supervised generative adversarial networks. Earthquake Engineering & Structural Dynamics 52(11): 3281–3303.

18.

Liao

Fei

, et al. (2024) Generative AI design for building structures. Automation in Construction 157: 105187.

19.

Liu

Zhou

Rekatsinas

(2021) Picket: guarding against corrupted data in tabular data during learning and inference. The VLDB Journal 31(5): 927–955.

20.

Lou

Gao

Jin

, et al. (2021) Shear wall layout optimization strategy for high-rise buildings based on conceptual design and data-driven tabu search. Computers & Structures 250: 106546.

21.

MOHURD (2010) Technical Specification for Concrete Structures of Tall Building (JGJ 3-2010). China Architecture & Building Press.

22.

Qian

Zhao

, et al. (2018) Design of Tall Building Structures. China Architecture & Building Press.

23.

Qin

Guan

Liao

, et al. (2024) Intelligent design and optimization system for shear wall structures based on large language models and generative artificial intelligence. Journal of Building Engineering 95: 109996.

24.

Rottmann

Reese

(2023) Automated detection of label errors in semantic segmentation datasets via deep learning and uncertainty quantification. In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 3213–3222. Available at: https://ieeexplore.ieee.org/abstract/document/10030499 (accessed 24 February 2025).

25.

Shirgir

Farahmand-Tabar

(2025) An enhanced optimum design of a Takagi-Sugeno-Kang fuzzy inference system for seismic response prediction of bridges. Expert Systems with Applications 266: 126096.

26.

Shirgir

Farahmand-Tabar

Aghabeigi

(2024) Optimum design of real-size reinforced concrete bridge via charged system search algorithm trained by nelder-mead simplex. Expert Systems with Applications 238: 121815.

27.

Song

Sohl-Dickstein

Kingma

, et al. (2021) Score-based generative modeling through stochastic differential equations. In: Proceedings of the 9th International Conference on Learning Representations. IEEE.

28.

Stonebraker

Rezig

(2019) Machine learning and big data: what is important? IEEE Data Eng. Bull, Epub ahead of print 2019.

29.

Tafraout

Bourahla

, et al. (2019) Automatic structural design of RC wall-slab buildings using a genetic algorithm with application in BIM environment. Automation in Construction 106: 102901.

30.

Whang

Roh

Song

, et al. (2023) Data collection and quality challenges in deep learning: a data-centric AI perspective. The VLDB Journal 32(4): 791–813.

31.

Guo

(2025) Advances in AI-powered civil engineering throughout the entire lifecycle. Advances in Structural Engineering 0(0): 13694332241307721.

32.

Qian

, et al. (2022) Typical advances of artificial intelligence in civil engineering. Advances in Structural Engineering 25(16): 3405–3424.

33.

Yüksel

Börklü

Sezer

, et al. (2023) Review of artificial intelligence applications in engineering design perspective. Engineering Applications of Artificial Intelligence 118: 105697.

34.

Zhao

Liao

Xue

, et al. (2022) Intelligent design method for beam and slab of shear wall structure based on deep learning. Journal of Building Engineering 57: 104838.

35.

Zhao

Fei

Huang

, et al. (2023) Design-condition-informed shear wall layout design based on graph neural networks. Advanced Engineering Informatics 58: 102190.

36.

Zhou

Wang

Liu

, et al. (2022) Automated structural design of shear wall structures based on modified genetic algorithm and prior knowledge. Automation in Construction 139: 104318.

37.

Zhou

Leng

Meng

, et al. (2024) StructDiffusion: End-To-End intelligent shear wall structure layout generation and analysis using diffusion model. Engineering Structures 309: 118068.

Number of standard stories	i-th standard story	Shear wall/coupling beam	Frame beam/slab
1	1	C30	C30
2	1	C35	C30
2	2	C30	C30
3	1	C40	C30
3	2	C35	C30
3	3	C30	C30
4	1	C45	C30
4	2	C40	C30
4	3	C35	C30
4	4	C30	C30
5	1	C50	C30
5	2	C45	C30
5	3	C40	C30
5	4	C35	C30
5	5	C30	C30
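
The rows of the table above appear to follow a regular pattern: the frame beam/slab grade is always C30, and the shear wall/coupling beam grade drops by one step of 5 for each successive standard story, reaching C30 at the last one. The sketch below reproduces the table under that assumption. The function name `concrete_grades` and the closed-form rule are illustrative and inferred only from the listed rows (1 to 5 standard stories); they are not a procedure stated by the authors.

```python
def concrete_grades(n_standard_stories: int) -> list[tuple[int, str, str]]:
    """Return (i, wall/coupling-beam grade, frame-beam/slab grade) for i = 1..n.

    Assumption inferred from the table: wall grade = C(30 + 5 * (n - i)),
    frame beam/slab grade = C30. Only n = 1..5 is covered by the table.
    """
    if not 1 <= n_standard_stories <= 5:
        raise ValueError("the table only covers 1 to 5 standard stories")
    rows = []
    for i in range(1, n_standard_stories + 1):
        wall_strength = 30 + 5 * (n_standard_stories - i)
        rows.append((i, f"C{wall_strength}", "C30"))
    return rows


# Example: the three rows listed for a building with 3 standard stories.
print(concrete_grades(3))
# [(1, 'C40', 'C30'), (2, 'C35', 'C30'), (3, 'C30', 'C30')]
```

Values outside the tabulated range are rejected rather than extrapolated, since the table gives no basis for grades above C50.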
