Abstract
Accurate prediction of the seismic response of buildings is crucial for their structural assessment and performance evaluation. To this end, leveraging recent advancements in deep learning, this study introduces a convolutional long short-term memory neural network with attention mechanism (CNN-LSTM-ATT) for predicting the seismic response of moment frame and shear wall-frame structures. The effectiveness of the convolutional and attention blocks was validated through ablation experiments. Furthermore, using transfer learning, the CNN-LSTM-ATT model was fine-tuned to predict the seismic response of different target buildings. Two distinct transfer learning scenarios were investigated: 1) transfer between finite element models of the same structure with different parameters; and 2) transfer from a finite element model to the corresponding actual structure. These scenarios demonstrate that model-based transfer learning significantly enhances the prediction accuracy of CNN-LSTM-ATT. Compared with the finite element models, the fine-tuned transfer learning model accurately predicted the nonlinear behavior of the structures in each scenario. Thus, the proposed method is applicable to easy modeling and rapid prediction of the dynamic response of various building structures under earthquakes.
Keywords
Introduction
Earthquakes, as destructive and hazardous natural disasters, present significant challenges to the safety and reliability of buildings. To accurately assess the response of structures under earthquakes, various effective prediction models and analytical methods have been developed. Generally, the dynamic response analysis of structures under seismic actions involves using physics-based models, namely nonlinear time-history analysis (NLTHA) based on mechanical principles. Considerable research has been conducted over the past few decades in this area, resulting in numerous successful computational implementations in the field of civil engineering (Ayoub et al., 2022; Çavdar and Bayraktar, 2013), among which the finite element analysis (FEA) stands as one of the most popular numerical methods in structural dynamic analysis. Finite element models, commonly used in structural analysis, provide detailed structural response information. However, finite element models used for nonlinear dynamic analysis often require finely discretized mesh elements and smaller computational time steps, which result in high computational costs for nonlinear time-history analysis. Moreover, relying solely on finite element models for earthquake response prediction poses limitations due to model imperfections and parameter uncertainties.
To address challenges, such as high computational costs and numerical convergence difficulties, proxy models have been widely adopted in engineering design. These proxy models encompass various techniques, including generalized linear regression models (Guo et al., 2020), radial basis functions (Das and Choudhury, 2019), polynomial response surface models (Moodi et al., 2018), Kriging models (Ghosh et al., 2019; Hoang et al., 2021; Zhang and Wu, 2017), Chebyshev Polynomials Method (Zeng et al., 2024), and artificial neural networks (ANN) (Jahangir and Rezazadeh Eidgahee, 2021; Kalakonas and Silva, 2021; Mangalathu et al., 2018; Tran and Kim, 2020; Wang et al., 2021). However, these proxy models also exhibit the following drawbacks: 1) unbalanced sample distributions and imposed assumptions can diminish the predictive capabilities of these models; and 2) with an increase in problem dimensions, the construction of proxy models requires significantly more sample points and time, potentially compromising their accuracy and affecting their reliability.
In recent years, the rapid development of deep learning technologies has offered new insights and solutions for predicting the earthquake response of structures. The emergence of deep learning models, such as convolutional neural networks (CNN), long short-term memory networks (LSTM), attention mechanisms (ATT), and transfer learning, has significantly improved the modeling and prediction for the earthquake response of structures (Ahmed et al., 2022; Huang and Chen, 2021; Oh et al., 2020a; Perez-Ramirez et al., 2019; Wang et al., 2022, 2023; Zhang et al., 2020). Oh et al. (2020c) proposed a method for predicting the seismic response of building structures using CNN, based on acceleration and displacement time histories, and validated its effectiveness through numerical and experimental analyses. Xu et al. (2022) utilized LSTM to predict nonlinear structural earthquake response under arbitrary lengths and sampling rates. Zhang et al. (2019) introduced two LSTM network schemes that replaced traditional physics-based numerical analysis methods in a data-driven manner, to accurately predict the nonlinear earthquake response of buildings. Liao et al. (2023) proposed a recursive neural network based on deep learning and attention mechanisms (AttLSTM) to accurately predict bridge response under dynamic loads, and demonstrated superior accuracy and reliability compared to traditional LSTM models. Ning et al. (2023) utilized three different deep learning models (i.e., LSTM, WaveNet, and 2D CNN) to predict the time history response of various structures under earthquakes. Sadeghi Eshkevari et al. (2021) designed a physics-based recurrent neural network for estimating the dynamic characteristics of linear and nonlinear multi-degree-of-freedom systems subjected to ground motions. Recent studies, such as Zhang et al. (2024) and Zhao et al. (2024), have developed CNN-LSTM and LSTM-based models to improve seismic response prediction for railroads and train-bridge systems, respectively.
Although deep learning models, including CNN and LSTM, have made significant strides in predicting earthquake response of structures, several crucial challenges are still encountered for practical applications. Firstly, deep learning models typically require large-scale labeled data for training. However, obtaining such data in the field of earthquake response prediction is highly challenging and costly. Experimental and observational data in earthquake engineering usually provide limited information on seismic events, which requires an improved training technique for deep learning models for accurate predictions with limited data. Secondly, the diversity in structural characteristics and seismic environments poses challenges to the generalization performance of the trained models. Each building has unique geometric shapes, material properties, and structural parameters, while seismic events vary based on location and time. As deep learning models need robust generalization capabilities to adapt to the diversity of different structures and seismic events, further research is required to enhance the generalization performance. Despite the significant potential of deep learning models in predicting the earthquake response of structures, overcoming challenges, such as data acquisition and generalization performance, is crucial to better satisfy the practical requirements for earthquake engineering.
This study utilized a finite element model as a source model for pre-training to establish a deep learning model (i.e., CNN-LSTM-ATT). Subsequently, fine-tuning was performed using actual earthquake response data from structures to enhance the prediction performance of the deep learning model. In this model, CNN effectively extracts the local feature patterns and frequency components from structural vibration signals (Oh et al., 2020b), LSTM captures the temporal relationships and long-term dependencies in sequential data (Li et al., 2022), and ATT filters the crucial information for the current task from vast amounts of data. This study designed two different application scenarios to verify the robustness of this transfer learning method: 1) transfer between structures with different material strengths; and 2) transfer from finite element models to the same actual structures. In each scenario, a CNN-LSTM-ATT model was established and then fine-tuned with a small amount of data.
This transfer learning approach is expected to address issues such as data scarcity and differences among various structures, enabling the model to better adapt to diverse structural earthquake response prediction tasks. By integrating the physical knowledge from finite element models and the advantages of deep learning models, this study aims to achieve more accurate and reliable earthquake response predictions. The finite element model provides an in-depth understanding of structural behavior, encompassing detailed information on the material properties, geometric shapes, and boundary conditions. This information can be utilized to initialize the deep learning model, aiding its comprehension of the fundamental characteristics of structures. This fusion of physical knowledge and deep learning holds the potential for significant breakthroughs in predicting the earthquake response of structures.
Structural information and modeling
Information of models.
Shear wall-frame structure model
Figure 1 shows a twelve-story shear wall-frame structure. The story height is 4.5 m at the first floor and 3.5 m at the floors above (i.e., total height of 43.0 m). The dead load is 5.0 kN/m², and the live load is 2.0 kN/m². OpenSees was used to establish a finite element model for the shear wall-frame structure with a damping ratio of 0.05, using a Rayleigh damping model. The Concrete01 constitutive model (Kent and Park, 1971) was used for the unconfined concrete, and the Concrete02 model (Yassin, 1994) was used for the confined core concrete. The Steel02 constitutive model was used for reinforcing bars. The compressive strength of concrete is 30 MPa, and the yield strength of both longitudinal reinforcement and stirrups is 388 MPa. The moment frame and beam-column connections were modeled using force-based fiber elements (forceBeamColumn Element), and the shear wall was modeled using the multiple vertical line element model (MVLEM). The P-Delta effect was also considered. For nonlinear analysis, the Newton-Raphson algorithm was employed. Component section information is provided in Table 2. The material constitutive relationships for concrete and reinforcement are shown in Figure 2.
Figure 1. Analysis model for shear wall-frame structure (unit: mm): (a) structural floor plan; (b) equivalent overall model of shear wall-frame structure.
Table 2. Component dimensions and reinforcement details.
Figure 2. Material constitutive relationships: (a) Concrete01 constitutive model; (b) Concrete02 constitutive model; (c) Steel02 constitutive model.

To improve the efficiency of analysis, the model is simplified by selecting one frame in Y-direction and one segment of coupled shear wall from the three-dimensional structure. An equivalent planar model in Y-direction is established, where the principle of equivalence maintains the relative stiffness of the frame and shear wall (i.e., the structural stiffness characteristic value
Moment frame structure model
This model represents a two-story frame structure, selected from the literature (Ma et al., 2013). The frame is regular both horizontally and vertically, with plan dimensions of 1.6 m × 1.6 m and a floor height of 1.2 m, as illustrated in Figure 3. The scale ratio is 1:4. The dead load is 5.0 kN/m², and the live load is 2.0 kN/m². The column section dimensions are 100 mm × 100 mm, beam section dimensions are 100 mm × 70 mm, and the slab thickness is 60 mm. The mass of each floor in the model is 630 kg (including the weight of the reinforcement in concrete). The compressive strength of concrete at 28 days is 12 MPa. The yield strength of the longitudinal reinforcement and stirrups is 335 MPa and 190 MPa, respectively. For the finite element model of this structure, nonlinear beam-column elements (nonlinearBeamColumn) from OpenSees were employed to simulate the frame beams and columns, using fiber sections. The unconfined concrete adopted the Concrete01 (Kent and Park, 1971) material model from OpenSees, while the confined concrete used the Concrete02 material (Yassin, 1994). The reinforcing bars used the Steel02 constitutive model.
Figure 3. Analysis model for moment frame structure (unit: mm).
Ground motion data
Ground motion data.
CNN-LSTM-ATT deep learning model
This study adopted a deep learning model (CNN-LSTM-ATT) that combined CNN, LSTM, and ATT to predict the nonlinear earthquake response of structures. In this model, CNN is utilized to capture the spatial features of time-series data, followed by passing these features to LSTM for time-series analysis and modeling. Finally, ATT is employed to enhance the representational and generalization capabilities of the trained model.
Convolutional neural networks (CNN)
CNN is a type of deep learning model particularly well-suited for image processing and pattern recognition tasks. CNN, based on convolutional operations, extracts features from input data by sliding a window (convolutional kernel). In time series data, this is similar to identifying local patterns and trends (i.e., small-scale patterns and overall changes in the data). This aids the model in better understanding the local structure of the input data. By employing CNN, the model can more effectively capture the local features and patterns in time series data, rather than treating the entire sequence as a whole. This enhances the model’s sensitivity to the internal complexity of the data and allows it to better adapt to local variations, thereby improving its ability to model time series data. The principles of CNN can be summarized as two main steps of convolution operation and pooling operation (Figure 4).
Convolution operation is the core step used to extract local features from input data. The core idea of convolutional operations is to detect and extract specific patterns throughout the entire time series by applying filters (or convolutional kernels) to different parts of the input data. Specifically, for a given input data, the calculation formula for the convolution operation is as follows:
Pooling operation is used to reduce the size of feature maps, to extract the essential features, and to reduce computational complexity. The most widely used pooling operations are the max pooling and average pooling. The max pooling (equation (4)) selects the maximum value within a local region as the pooled result, whereas the average pooling (equation (5)) calculates the mean value within a local region.
Figure 4. Architecture of 1D CNN block.
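The convolution and pooling steps described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study (which relies on PyTorch layers); the function names, the kernel values, and the toy acceleration record are all hypothetical, and the convolution is written in the sliding-window (cross-correlation) form commonly used by deep learning libraries, with no padding and a stride of 1.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def pool1d(values, size, mode="max"):
    """Non-overlapping pooling; samples that do not fill a window are dropped."""
    pooled = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        pooled.append(max(window) if mode == "max" else sum(window) / size)
    return pooled


# Hypothetical acceleration samples and a difference-type kernel (assumptions).
accel = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, 0.0]
features = relu(conv1d(accel, kernel=[-1.0, 0.0, 1.0]))
max_pooled = pool1d(features, size=2, mode="max")
avg_pooled = pool1d(features, size=2, mode="avg")
```

Max pooling keeps the strongest local activation, whereas average pooling smooths it, so the two pooled outputs differ whenever the values inside a window differ.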

Long short-term memory neural networks (LSTM)
LSTM is a type of recurrent neural network structure primarily used for handling sequential data. In the CNN-LSTM-ATT model, LSTM is employed to perform sequential modeling on the local feature sequences extracted by CNN, enabling the learning of dynamic evolution and temporal characteristics within sequences. Through memory units and gate mechanisms in LSTM, the model can effectively handle long sequences and mitigate issues such as vanishing or exploding gradients during training. By modeling the sequence data, LSTM integrates contextual information of input data into the model, which enables a better understanding of the inherent patterns within sequential data and consequently provides more accurate prediction results. Therefore, within the CNN-LSTM-ATT model, LSTM plays a crucial role by modeling the temporal evolution and long-term dependencies of sequence data, further enhancing the model’s predictive performance on time-series data. As seen in Figure 5(a), the LSTM architecture consists of a series of self-connected LSTM units. As shown in Figure 5(b), each unit comprises three interconnected components: a forget gate, an input gate, and an output gate. The specific computational process of the unit is defined as follows:
Figure 5. Architecture of LSTM block: (a) architecture of LSTM neural network with multiple hidden layers; (b) architecture of an LSTM cell.
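As a rough illustration of the forget, input, and output gates, the sketch below implements one time step of the widely used standard LSTM cell for scalar inputs and states. It assumes the standard formulation rather than reproducing this paper's notation, and the parameter dictionary, gate names, and zero-valued weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (standard formulation).

    params maps each gate name to a tuple (w_x, w_h, b); names are illustrative.
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f_t = gate("forget", sigmoid)        # how much of the old cell state to keep
    i_t = gate("input", sigmoid)         # how much new information to admit
    g_t = gate("candidate", math.tanh)   # candidate cell-state update
    o_t = gate("output", sigmoid)        # how much of the cell state to expose
    c_t = f_t * c_prev + i_t * g_t       # updated cell state
    h_t = o_t * math.tanh(c_t)           # updated hidden state
    return h_t, c_t


# With all-zero weights every sigmoid gate equals 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_cell_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

The additive update of the cell state is what allows information, and gradients, to persist across many time steps.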

Attention mechanisms (ATT)
ATT layer is an implementation of the attention mechanism within deep learning models, which is used to dynamically adjust the focus of the model for different positions within input sequences. The ATT layer calculates the attention weights to encode the importance of different time steps. These weights are then used to perform a weighted summation across the input sequence, thereby generating representations of key information within the sequence. The role of the ATT layer is to provide a degree of focus on the input sequence, enabling the model to pay attention to parts meaningful for the current task. Specifically, given a query vector (
In this study, a self-ATT is employed (refer to Figure 6). For an input sequence (
Figure 6. Architecture of attention block.
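The weighting-and-summation idea behind self-attention can be sketched as follows. The scoring function used in this study is not recoverable from the text here, so the sketch assumes the common scaled dot-product form; the projection matrices `w_q`, `w_k`, and `w_v`, the helper names, and the toy sequence are all hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a list of feature vectors."""
    queries = [matvec(w_q, x) for x in sequence]
    keys = [matvec(w_k, x) for x in sequence]
    values = [matvec(w_v, x) for x in sequence]
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        alpha = softmax(scores)  # attention weights over all time steps
        weights.append(alpha)
        outputs.append([
            sum(a * v[d] for a, v in zip(alpha, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights


# Identity projections and a three-step toy sequence (assumptions).
identity = [[1.0, 0.0], [0.0, 1.0]]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(sequence, identity, identity, identity)
```

Each row of `weights` sums to one, so every output is a convex combination of the value vectors, with larger weights on the time steps judged more relevant to the query.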

CNN-LSTM-ATT for seismic response of structures
The CNN-LSTM-ATT model was used to predict the seismic response of building structures. It takes the seismic excitation (i.e., acceleration) and structural response (i.e., displacement) within a fixed time window as input and produces the structural response at the next time step. For instance, it might use the ground acceleration and structural displacement for the previous n time steps as input and predict the displacement at the (n + 1)-th time step. In this context, n represents the length of the input vector for the CNN-LSTM-ATT model, and can be any integer less than the total number of time steps in the seismic event. Therefore, the CNN-LSTM-ATT model is not constrained by the overall duration of the seismic motion or the number of time steps in individual samples. Generally, the CNN-LSTM-ATT model performs single-step predictions of the structural response, forecasting only one time step at a time. If multiple time steps need to be predicted, the predicted values are used as known values for subsequent predictions in chronological order.
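The recursive use of single-step predictions can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in the study: `predict_next` stands in for the trained network, and `toy_model` is a hypothetical placeholder that only makes the example runnable.

```python
def rollout(predict_next, accel, disp_known, n):
    """Extend a displacement history one step at a time.

    predict_next(window_acc, window_disp) returns the displacement at the step
    following the two n-sample windows. Predicted values are appended and then
    reused as inputs for later steps.
    """
    disp = list(disp_known)
    if len(disp) < n:
        raise ValueError("at least n known displacement samples are required")
    for t in range(len(disp), len(accel)):
        window_acc = accel[t - n:t]
        window_disp = disp[t - n:t]
        disp.append(predict_next(window_acc, window_disp))
    return disp


def toy_model(window_acc, window_disp):
    """Hypothetical stand-in for the trained network."""
    return window_disp[-1] + 0.1 * window_acc[-1]


accel = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = rollout(toy_model, accel, disp_known=[0.0, 0.0], n=2)
```

Because each prediction feeds the next one, errors can accumulate over long rollouts, which is one reason single-step accuracy matters.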
The CNN-LSTM-ATT is a deep learning architecture derived from the LSTM model. Figure 7 shows the architecture of the CNN-LSTM-ATT that comprises the input layer, convolutional layers, LSTM blocks, attention blocks, and output layer. The convolutional layers consist of two one-dimensional convolutional layers (Conv1d) and two max-pooling layers (MaxPool1d). The LSTM block consists of three stacked LSTM layers. The attention block comprises three fully connected (FC) layers used to generate vectors (
Figure 7. Architecture of CNN-LSTM-ATT.
Rectified Linear Unit (ReLU) activation functions were employed in both the CNN and FC layers, to reduce the computational load and mitigate gradient vanishing to some extent. Dropout layers were applied after the attention block and FC layers during the training phase to prevent overfitting. Given a dropout probability of p (equal to 0.1 in this study), each neuron was dropped out with a probability of p during each iteration. During the training of the CNN-LSTM-ATT model, the Adam optimizer was selected, and the mean squared error (MSE) widely used in regression tasks was chosen as the loss function. The Adam optimizer is a gradient descent-based optimization algorithm that integrates momentum and adaptive learning rate features. The algorithm maintains the first and second moment estimates of gradients for each parameter, incorporating bias correction for efficient parameter updates. Due to the large volume of data, the training data are fed to the model in batches using batch size. The batch size represents the number of samples selected for a single training iteration and affects the optimization level and speed of the CNN-LSTM-ATT model. In this study, a batch size of 64 was used for all subsequent models, trained over 100 epochs. The entire training process was executed in a Python environment, utilizing PyTorch as the deep learning library.
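To make the description of the optimizer concrete, the following sketch shows the MSE loss and a single-parameter Adam update with first and second moment estimates and bias correction. It is an illustration only: the study itself uses PyTorch, and the hyperparameter defaults, the learning rate of 0.05, and the toy linear regression problem are assumptions chosen for the example.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update one scalar parameter; state holds the step count and moments."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy problem (assumption): fit w in y = w * x to data generated with w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]
w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
initial_loss = mse([w * x for x in xs], ys)
for _ in range(200):
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w = adam_step(w, grad, state, lr=0.05)
final_loss = mse([w * x for x in xs], ys)
```

On the first step the bias-corrected moments make the update magnitude approximately equal to the learning rate, regardless of the gradient scale.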
Performance validation of CNN-LSTM-ATT model
To verify the feasibility of the proposed CNN-LSTM-ATT model in predicting the dynamic response of structures, the sixth floor of the SF-FEM structure model with small residual displacements was taken as an example. Twenty sets of earthquake ground motion data, along with the corresponding displacement data from finite element analysis, are split into training, validation, and testing sets in an 8:1:1 ratio for the deep learning model. Therefore, among the 20 datasets, there are 2 test sets, numbered TS1 and TS2, respectively. Prior to inputting data, the training data need to be normalized, mapping all data into the range
Considering the requirement of consistent input sequence length in LSTM, a common approach is to use a fixed-length sliding window to obtain samples and labels. In all the models presented in this paper, the sliding window length is set to 100, corresponding to a time duration of 1 second. Models using sliding windows of different lengths exhibit significant performance differences. Typically, longer windows yield higher model performance than shorter ones. However, longer windows require more input information, which can diminish the model’s generalizability. To address this, this study proposes a data processing strategy: a block of zeros, equal in length to the sliding window, is prepended to both the seismic wave and the displacement data. This strategy is crucial because it allows the model to predict the entire displacement sequence, without requiring any measured displacement, when the seismic wave is known.
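A minimal sketch of this windowing strategy is given below. The function name and the toy records are hypothetical; the default window length of 100 follows the text, and the example uses a shorter window only to keep the output readable.

```python
def make_windows(accel, disp, window=100):
    """Build (input, label) pairs after prepending `window` zeros to both series.

    Each input holds the previous `window` acceleration and displacement
    samples, and the label is the displacement at the following step.
    """
    padded_acc = [0.0] * window + list(accel)
    padded_disp = [0.0] * window + list(disp)
    samples, labels = [], []
    for t in range(window, len(padded_disp)):
        samples.append((padded_acc[t - window:t], padded_disp[t - window:t]))
        labels.append(padded_disp[t])
    return samples, labels


# Toy records (assumptions) with a short window for illustration.
samples, labels = make_windows([0.1, 0.2, 0.3], [1.0, 2.0, 3.0], window=2)
```

With the padding, every original time step receives a label, and the first input window consists entirely of zeros, so a prediction can start without any measured displacement.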
This study selected four different model structures. Firstly, the basic LSTM model was used as the baseline, followed by the introduction of the attention mechanism (ATT) to enhance the representational capacity of the LSTM model, resulting in the LSTM-ATT model. Subsequently, an attempt was made to integrate the CNN module into the LSTM model, forming the CNN-LSTM model. Finally, both the CNN and ATT modules were integrated into the LSTM model simultaneously, resulting in the CNN-LSTM-ATT model. To validate the performance improvement of these two modules in structural seismic response prediction, an ablation study was conducted. In the field of machine learning, especially in complex deep neural networks, ablation study is used to describe the process of removing certain parts of the network to better understand the behavior of the network. During the training phase, all network architectures utilized the same training dataset, employed the Adam optimizer, and adopted the mean squared error (MSE) loss function, with the maximum of 100 epochs. The initial learning rate was set to
Figure 8 shows the training processes of four models: LSTM, LSTM-ATT, CNN-LSTM, and CNN-LSTM-ATT, with minimum losses of
Figure 8. Comparison of the loss functions in four networks for predicting the seismic response of the shear-wall frame structure.
Figure 9 demonstrates the prediction performance of the four models on the test set with a time step of 0.01 seconds. Observing the waveforms, it is evident that both the CNN-LSTM and LSTM-ATT curves exhibit superior fitting results, compared to LSTM. The predictions of the CNN-LSTM-ATT model closely matched the actual displacement, achieving the highest prediction accuracy. As the system exhibited a strongly nonlinear phase, LSTM generated significant displacement deviations due to its inability to accurately capture the dynamic response characteristics of the structure.
Figure 9. Prediction results of different models for SF-FEM(C30): (a) time history curves for TS1; (b) scatter plot for TS1; (c) time history curves for TS2; (d) scatter plot for TS2.
Prediction metrics of different models for the shear-wall frame structure.
The Bold values indicate the highest prediction accuracy among the compared models.

Prediction metrics of different models for the shear-wall frame structure: (a) TS1; (b) TS2.
This ablation study highlights the contribution of both the CNN and ATT mechanisms in improving model performance. By integrating CNN for local feature extraction and ATT for focusing on important time steps, CNN-LSTM-ATT is able to better handle complex sequential data, improving both prediction accuracy and reliability. The incorporation of the ATT mechanism enables the model to focus on key seismic features, leading to stronger prediction capabilities in earthquake response tasks.
Prediction methods based on transfer learning
Transfer learning
Transfer learning is a machine learning method that leverages a model trained in one domain (Task 1) and applies it to another related domain (Task 2) to enhance the model’s performance in the target domain. This approach finds applications across various fields, such as computer vision, natural language processing, and recommendation systems. Although transfer learning offers numerous advantages, its use requires attention to the similarity between the source and target domains, as well as to how models are selected or adapted to address the challenges of new tasks.
In practical engineering scenarios, obtaining extensive, authentic seismic response data of structures is often prohibitively expensive and difficult. This is a realistic issue in many engineering fields, such as structural design and assessment of buildings, bridge engineering, safety assessment of nuclear power plants, design of oil and gas pipelines and tanks, and underground structures and tunnel engineering. Because large amounts of seismic response data for evaluating the stability and safety of structures are difficult to obtain, seismic response models must be trained on limited samples. Traditional machine learning methods rely on statistical characteristics among samples, so the uncertainty of statistical estimation degrades model performance in small-sample contexts. In particular, unstable parameter estimation can further result in overfitting or underfitting. Transfer learning, through mechanisms including knowledge transfer, feature sharing, and domain adaptation, helps overcome the challenges of learning with small samples, thereby enhancing the generalization and performance of the trained model. By leveraging existing data and knowledge, transfer learning renders machine learning tasks in small-sample scenarios more feasible and effective. Thus, when data are limited or costly to obtain, transfer learning can effectively mitigate the impact of data constraints.
Strategy of transfer learning
The strategy employed in this study is a finite element model-based transfer learning approach, in which a well-trained model from the source domain is adapted through fine-tuning for application in the target domain. The source domain is the domain in which the model is initially trained and performs well. The target domain is the new domain to which the model is transferred, and the goal is to adapt the model so that its performance there improves. Fine-tuning is a specific method of transfer learning. This methodology provides a pragmatic solution to the scarcity of structural response data in practical engineering, enabling the acquisition of a higher-performing surrogate model at a reduced cost. Pre-trained models typically exhibit robust feature extraction capabilities. Consequently, during fine-tuning, adjustments are primarily made to the weights and parameters of the final layers. Additionally, compared to training models from scratch, fine-tuning requires fewer epochs. However, the learning rate during this phase should be carefully controlled to prevent excessive modifications to the learned features that could degrade performance. This approach makes full use of the rich information in the source domain data and the learned features, which allows more efficient and feasible application in the target domain.
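The pre-train-then-fine-tune idea can be sketched with a deliberately tiny two-parameter model, in which a "feature" weight plays the role of the early layers and a "head" weight plays the role of the final layer. Everything in the sketch is hypothetical (the model, the data, the learning rates, and the epoch counts); it only illustrates freezing the early parameters and updating the final one with a smaller learning rate and fewer epochs. In PyTorch, freezing would typically be done by setting `requires_grad = False` on the corresponding parameters.

```python
def predict(params, x):
    """Toy model: output = head * (feature * x)."""
    return params["head"] * params["feature"] * x


def loss(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr, epochs, trainable):
    """Gradient descent on the MSE, updating only the parameters in `trainable`."""
    for _ in range(epochs):
        grad = {"feature": 0.0, "head": 0.0}
        for x, y in data:
            err = predict(params, x) - y
            grad["feature"] += 2.0 * err * params["head"] * x / len(data)
            grad["head"] += 2.0 * err * params["feature"] * x / len(data)
        for name in trainable:
            params[name] -= lr * grad[name]
    return params


# Source domain (assumption): plentiful data following y = 2.0 * x.
source = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
# Target domain (assumption): only two samples following y = 2.5 * x.
target = [(0.5, 1.25), (1.0, 2.5)]

params = train({"feature": 1.0, "head": 1.0}, source,
               lr=0.1, epochs=200, trainable=("feature", "head"))
frozen_feature = params["feature"]
loss_before = loss(params, target)

# Fine-tune the final parameter only, with a smaller learning rate and fewer epochs.
params = train(params, target, lr=0.05, epochs=50, trainable=("head",))
loss_after = loss(params, target)
```

Keeping the frozen parameter untouched preserves what was learned in the source domain, while the small number of target samples is spent on adjusting the output mapping.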
The robustness of transfer learning in predicting structural response was verified through two combinations of source models and prediction models. These combinations include SF-FEM(C30) and SF-FEM(C50), F-FEM and F-AS, where C30 and C50 represent the compressive strengths of concrete of 30 MPa and 50 MPa, respectively. In these combinations, SF-FEM(C30) and F-FEM indicate the source domain models, while SF-FEM(C50) and F-AS indicate the target domain models. The model is pre-trained using data generated from the source domain models, and then fine-tuned with a small amount of data from the target domain models to achieve large-scale predictions in the target domain. Considering the scenario with the limited data available, this study only considers the first 4 seconds of data as known data.
Prediction of finite element model response
In the ablation study, the prediction performance of four models was compared. For the same structure, the CNN-LSTM-ATT model was able to fully reconstruct the structural response under seismic excitations that were not included in the training set. However, changes in structural parameters affect the corresponding seismic response. To adapt the model to new structures with varying strength parameters, a detailed fine-tuning strategy was employed. This involved collecting a small amount of response data from the new structure and fine-tuning the model by adjusting the weights and parameters of the final layers to better capture the response characteristics of the new structure. Note that careful control of the learning rate is necessary during this process to avoid excessive modifications to the learned features.
In this transfer learning process, the response data from SF-FEM(C30) was used to pre-train the model, with SF-FEM(C50) as the target prediction structure. The difference between these two models lies in the concrete strength: SF-FEM(C30) has a concrete strength of 30 MPa, while SF-FEM(C50) has a concrete strength of 50 MPa. As a result, the stiffness of the two finite element models differs, which affects the dynamic response of the structure. Figure 11 compares the prediction results of different models using 400 input time steps. As seen in Table 5, the
Figure 11. Prediction results of different models with 400 input time steps for SF-FEM(C50): (a) time history curves for TS1; (b) scatter plot for TS1; (c) time history curves for TS2; (d) scatter plot for TS2.
Table 5. Prediction metrics of different models with 400 input time steps for SF-FEM(C50). Bold values indicate the highest prediction accuracy among the compared models.
Prediction metrics of different models according to input time steps: (a)

The advantage of transfer learning in this section lies in its ability to significantly reduce the reliance on target-domain data while maintaining high prediction accuracy. By leveraging the model pre-trained on SF-FEM(C30) and fine-tuning it with a small amount of response data from the new structure SF-FEM(C50), the CNN-LSTM-ATT model was able to adapt to the new structure with a different concrete strength, demonstrating its ability to generalize across different structural configurations. This approach minimizes the need for large amounts of target-domain data, making it a cost-effective and efficient method for predicting the seismic response of structures with varying parameters.
Prediction of actual structure response
In this section, F-FEM is the source domain model, and F-AS is the target domain model. F-AS represents the real structure from the shaking table test, while F-FEM is the corresponding finite element model of F-AS. The difference between the two lies in the fact that F-FEM is an imprecise fit of F-AS, resulting in a significant discrepancy in the dynamic response of the structure, including phase and amplitude. Figure 13 compares time history curves and scatter plot of each prediction model including 400 input time steps in the F-AS structure. As summarized in Table 6, the CNN-LSTM-ATT model consistently outperformed the other models in terms of prediction accuracy, showing an
Figure 13. Prediction results of different models for F-AS: (a) time history curves; (b) scatter plot.
Table 6. Prediction metrics of different models with 400 input time steps for F-AS. Bold values indicate the highest prediction accuracy among the compared models.
Figure 14 compares the performance of these models in accordance with input time steps, specifically evaluating Prediction metrics of different models according to input time steps: (a) 
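As an illustration of how predicted and measured time histories can be compared numerically, the sketch below computes two commonly used measures, the root-mean-square error and the Pearson correlation coefficient. These are assumed examples only; they are not necessarily the metrics reported in Table 6 or Figure 14, and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical measured and predicted displacement samples.
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 0.9, 2.1, 2.9]

error = rmse(measured, predicted)
corr = pearson_r(measured, predicted)
print(error, corr)
```

An error measure such as the RMSE captures amplitude discrepancies in physical units, while a correlation coefficient reflects how well the predicted points cluster along the diagonal of a scatter plot, so the two are often read together.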
In this section, the advantage of transfer learning lies in its ability to effectively reduce reliance on target-domain data while maintaining high prediction accuracy. Despite certain dynamic response differences between the F-FEM model and the F-AS structure, transfer learning successfully improved prediction accuracy through fine-tuning the pre-trained F-FEM model with a small amount of F-AS response data. The CNN-LSTM-ATT model outperformed other models, especially in handling complex nonlinear time-series predictions, showing stronger robustness. Through transfer learning, the model can quickly adapt to new structures or experimental data without the need for large amounts of target-domain data, thus greatly reducing data collection and experimental costs.
Conclusions
This study employed a convolutional long short-term memory neural network model with attention mechanism (CNN-LSTM-ATT), which integrates the feature extraction capability of convolutional neural networks (CNN), the information identification ability of the attention mechanism (ATT), and the sequential modeling proficiency of long short-term memory networks (LSTM). The model aims to simulate the nonlinear dynamic response of building structures under earthquakes. The efficacy of the CNN block and the ATT block was validated through ablation experiments and numerical model integration. Additionally, transfer learning was used to fine-tune models for predicting the seismic response of different target building structures. The main conclusions can be summarized as follows:
(1) The performance of CNN-LSTM-ATT was validated using a numerical model of a twelve-story shear wall-frame structure. Ground acceleration was selected as the input, and the inter-story displacement of the sixth story was used as the output. The ablation experiments showed that LSTM-ATT and CNN-LSTM both outperformed LSTM, with CNN-LSTM-ATT exhibiting the highest predictive accuracy. This confirms that the model can predict the nonlinear seismic response with low error and can capture nonlinear features.
(2) Two transfer scenarios were devised: transfer between finite element models of the same structure with different parameters, and transfer from a finite element model to the corresponding actual structure. The effect of the number of input time steps on the prediction results was also compared. As the number of input time steps increased, the prediction accuracy of the model gradually improved. The outcomes highlight the effectiveness of model-based transfer learning in enhancing predictive accuracy and support the robustness of fine-tuning in seismic response prediction.
(3) Compared to finite element models, the fine-tuned model effectively transferred the nonlinear structural behaviors learned from numerical models to actual structures, resulting in more accurate predictions of the target structure's response. The fine-tuned models achieved higher accuracy than the finite element models in the scenarios considered, highlighting their applicability and versatility.
(4) In the field of seismic engineering, the proposed CNN-LSTM-ATT model can be used for rapid assessment of nonlinear structural responses under seismic actions, providing data support for seismic design optimization and structural health monitoring. Although this study primarily focused on a specific type of structure (i.e., a twelve-story shear wall-frame structure), the proposed method is also applicable to other types of structures, such as bridges, high-rise buildings, and underground structures, making the findings potentially generalizable across different structural types.
A limitation of this study is its reliance on numerical models for initial training, which may not fully capture the complexities and variations of real-world structures. Additionally, fine-tuning was conducted using a limited amount of real structural data. This was effective for adapting the model to the target domain, but it may not encompass all possible structural behaviors under various seismic conditions. Although pre-training on finite element simulations significantly reduces the reliance on costly experimental or field data, a small amount of target-domain data (e.g., only 1–2 s) is still required for fine-tuning and adapting to specific structural configurations. With the current small-scale trained model, transfer learning cannot be realized without any target-domain data. Future improvements could include integrating data augmentation techniques (e.g., simulating different types of seismic waves or varying structural parameters) to generate more diverse training samples. Further, unsupervised domain adaptation methods could reduce the dependency on target-domain data, allowing the model to adapt to new domains without labeled structural data. Further studies are therefore needed to enhance the generalization of the proposed method across various structural types and seismic scenarios.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Natural Science Foundation of China (Grant No. 52278498 and No. 51878268), the Huxiang Youth Talent Support Program of Hunan Province, China (Grant No. 2021RC3041), the Natural Science Foundation of Hunan Province, China (Grant No. 2020JJ4195), a grant (RS-2022-00143493) from Digital-Based Building Construction and Safety Supervision Technology Research Program funded by Ministry of Land, Infrastructure and Transport of Korean Government, and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2024-00333882 and RS-2024-00455788).
