Abstract
In response to the limited monitoring resources within regional bridge networks, this study presents a transfer learning–driven digital twin framework designed for collaborative cross-bridge response prediction and efficient operation and maintenance, realizing continuous structural-condition perception for each bridge in the regional transport network. By integrating vehicle-load statistical modeling, finite element simulations, and a Transformer architecture, we establish mappings from traffic loads to structural responses and between responses at different measurement points. Transfer learning is employed to share model parameters across different bridges. Numerical validations show satisfactory results in scenarios such as transfer from a continuous girder bridge to simply supported bridges and from simulated to measured data. Field experiments further validate the effectiveness of the framework, revealing that transfer learning improves predictive accuracy by around 20% while reducing training time by 12%. The proposed method delivers an efficient, high-precision solution for the coordinated, smart management of regional bridge infrastructures. These findings offer a novel technical pathway for digitalized infrastructure maintenance and structural health monitoring in resource-constrained environments.
Introduction
Digital twin (DT) technology, defined as a dynamic virtual representation of physical systems, has emerged as a transformative paradigm in civil infrastructure management. By integrating sensing networks, computational models, and historical data, DTs enable bidirectional interaction between physical assets and their digital counterparts, supporting real-time monitoring, predictive analysis, and data-driven decision-making (Chang et al., 2024; Lin et al., 2021; Tao et al., 2019; Tao and Qi, 2019; Tian et al., 2022). This paradigm shift is particularly valuable for large-scale and aging transportation infrastructure, where condition-based maintenance strategies are increasingly required to enhance safety and optimize limited resources (Grieves and Vickers, 2017; Tuegel, 2012).
The construction industry has begun to witness preliminary implementations of DT, particularly in the domain of transportation infrastructure (Cheng et al., 2019; Jiménez et al., 2023; Zhang et al., 2024a). Significant progress has been made in applying DT to individual bridges, with demonstrated use cases spanning real-time safety warning systems (Dan et al., 2022; Zhang et al., 2025), damage identification and localization (Xu et al., 2011; Zhang et al., 2023, 2024b), and asset management (Kaewunruen et al., 2023; Saback et al., 2022). For instance, Lai et al. (2024) developed a non-destructive testing approach integrated with DT technology for structural health monitoring (SHM) of a suspension bridge. Jiang et al. (2021a, 2021b) proposed a fatigue life evaluation framework for steel bridge components driven by monitored data and physical simulation. While these studies underscore the value of DT, they predominantly concentrate on individual, well-instrumented bridges, often treated as isolated entities.
The implementation of network-wide DT faces a fundamental barrier: the prohibitive cost of deploying dense SHM systems across every bridge (Tronci et al., 2022; Xu et al., 2015, 2018; Zanelli et al., 2023). This economic constraint necessitates a paradigm shift from direct, resource-intensive sensing to indirect, model-driven inference. In this context, vehicle loads, quantitatively captured by Weigh-in-Motion (WIM) systems, emerge as a critical and scalable interface between the physical and digital realms (Dan et al., 2020; Hou et al., 2020; Tang et al., 2024; Yu et al., 2022). By using traffic load information collected at a small number of instrumented bridges to drive physics-based simulations, structural responses of unmonitored bridges can be indirectly inferred. This vehicle-load-driven paradigm decouples sensing locations from assessment targets, thereby offering a feasible pathway toward regional DT implementation under resource constraints.
A critical difficulty in such a framework lies in transferring predictive knowledge from a few data-rich bridges to numerous data-scarce ones with different structural configurations. Transfer learning (TL) offers an effective solution by leveraging knowledge learned in a source domain—such as a well-instrumented benchmark bridge or a high-fidelity numerical model—to improve prediction performance in target domains with limited data (Pan and Yang, 2010; Teng et al., 2023; Zhuang et al., 2021). Previous studies have demonstrated the effectiveness of TL in bridge damage detection (Mayya et al., 2024; Song et al., 2024; Xiao et al., 2024; Zhao et al., 2024) and seismic response prediction (Pan et al., 2023), indicating its strong potential for cross-bridge response modeling.
Despite these advancements, a significant gap persists in the development of practical and scalable DT frameworks for entire regional bridge networks. Existing research largely neglects the synergistic potential and knowledge transfer among bridges, leaving a wealth of data underutilized. Moreover, the prevailing assumption that comprehensive monitoring systems can be deployed across all bridges is economically unrealistic, limiting the practical impact of many proposed DT concepts. To address this, we propose a TL-driven DT framework tailored for resource-constrained bridge networks. Central to this approach is a resource-optimization strategy that designates a fully instrumented benchmark bridge as the knowledge source. A Transformer-based sequence-to-sequence model first establishes high-fidelity load-response mappings on this bridge, and TL then propagates this knowledge to unmonitored bridges, accommodating structural variations. This approach is rigorously validated through a case study integrating three concrete girder bridges, employing a hybrid data strategy that combines simulation data with field measurements to validate both intra-bridge (load-to-response) and inter-bridge (response-to-response) prediction mechanisms. By explicitly reconciling the ambition of network-wide DT with the constraint of finite resources, this work provides a novel, technically grounded, and immediately applicable pathway for the coordinated and intelligent management of regional bridge infrastructures.
Methodology
Digital twin model framework
The proposed framework, illustrated in Figure 1, is designed to realize the core functionalities of a DT for regional bridge networks through a four-layer architecture that ensures scalability and practical applicability. In this study, a regional bridge network refers to a group of bridges with similar structural forms and loading conditions (for example, concrete girder bridges subjected to highway traffic), while allowing variations in design parameters including span length and support conditions. This definition delineates the scope within which the proposed TL methodology is developed and validated.
Figure 1. Intelligent management framework for regional bridge network based on DT and TL.
The DT framework consists of four interconnected layers. The Physical Layer comprises the bridge network, traffic flows, and monitoring systems (SHM/WIM). The Data Connection Layer is responsible for data acquisition, standardization, and integration. The Model and Simulation Layer serves as the computational core, combining physics-based simulations with a Transformer-based encoder–decoder model to establish load–response and response–response mappings, which are efficiently transferred to data-scarce bridges via TL. The Service Application Layer converts model outputs into actionable services, including visualization, predictive maintenance, and integration with bridge management systems.
While the framework supports full-layer implementation, this study focuses on the Model and Simulation Layer, addressing the development of an accurate, adaptable, and transferable predictive model for network-wide response prediction. This focus provides a foundational technical component for deploying practical DT systems under resource-constrained conditions.
Gaussian distribution and K-S test
The fidelity of a DT depends critically on the realism of its input loads. To model the stochastic characteristics of traffic loads, including the multi-modal distributions of vehicle velocity and mass across different vehicle types, a Gaussian Mixture Model (GMM) is adopted. By combining multiple Gaussian components with associated weights, the GMM effectively captures non-Gaussian and heterogeneous traffic characteristics observed in real-world vehicle data (An et al., 2022; Dou et al., 2023).
For a given vehicle type, particularly when classified by axle number, the velocity distribution can be represented by multiple Gaussian components, making the GMM a suitable statistical model. Accordingly, the probability density function of vehicle velocity is expressed as:
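Fitting effectiveness is quantified using the Kolmogorov–Smirnov (K–S) test, which assesses the discrepancy between the empirical cumulative distribution function of the data and the cumulative distribution function of the fitted model.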
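As a minimal sketch of the two ingredients named in this subsection, the snippet below evaluates a one-dimensional Gaussian mixture density and the one-sample K–S statistic against an arbitrary cumulative distribution function. The function names and any parameter values passed to them are illustrative assumptions, not the fitted values or the implementation used in this study.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture: sum_k w_k * N(x | mu_k, sigma_k^2)."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gmm_cdf(x, weights, means, stds):
    """Cumulative distribution of the same mixture, via the error function."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


def ks_statistic(samples, cdf):
    """One-sample K-S statistic: largest gap between the empirical CDF
    of `samples` and the reference `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

A smaller statistic indicates a closer fit; converting it into a p-value additionally requires the K–S distribution for the given sample size, which is not shown here.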
Transformer model
Accurate sequence-to-sequence mapping from vehicle loads to structural responses is fundamental to the predictive capability of a DT. Although recurrent neural networks (RNNs), such as LSTM and GRU models (Chen et al., 2019; Sherstinsky, 2020), have been widely used for temporal modeling, their ability to capture long-range dependencies is limited by sequential processing and gradient degradation. To address these limitations, this study adopts a Transformer architecture (Xu et al., 2023), whose self-attention mechanism enables efficient modeling of long-range temporal dependencies. As shown in Figure 2, the Transformer consists of stacked encoder–decoder layers incorporating positional encoding and multi-head attention.
Figure 2. Architecture of the Transformer (N = 3).
Positional encodings are injected so that the model can exploit the order of the time steps within the input sequence. In this study, we employ sine and cosine functions of different frequencies:
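A minimal sketch of a sinusoidal encoding of this kind is given below. It assumes the widely used formulation in which even embedding indices carry a sine term and odd indices the matching cosine term, with a frequency base of 10000; the base and the function name are assumptions rather than values stated in this study.

```python
import math


def positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of sinusoidal position codes.

    Even index i holds sin(pos / base**(i / d_model)); the following odd
    index holds the cosine of the same angle.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (base ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

For a sequence of length 177 and an embedding dimension of 256, `positional_encoding(177, 256)` yields one 256-element code per time step, which is added to the embedded input before the first attention layer.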
The attention mechanism captures long-range dependencies through the following equation:
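For reference, the standard scaled dot-product attention, which the description here appears to follow, can be written as below. The symbols Q, K, V (query, key, and value matrices) and d_k (key dimension) are the conventional ones and are assumed rather than taken from this study's notation.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each output row is a weighted average of the value vectors, with weights determined by how strongly the corresponding query matches every key, so a time step can draw directly on any other time step regardless of their separation.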
The attention results propagate to the encoder's subsequent feed-forward module, a two-layer network employing ReLU and linear activations in its first and second layers, respectively. The transformation is computed as:
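A position-wise feed-forward block with a ReLU first layer and a linear second layer is conventionally written as follows; the weight and bias symbols W_1, b_1, W_2, b_2 are assumed notation.

```latex
\mathrm{FFN}(x) = \max\left(0,\; x W_1 + b_1\right) W_2 + b_2
```

With the dimensions reported for this model, W_1 would map from 256 to 1024 features and W_2 back from 1024 to 256.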
Moreover, the system can simultaneously focus on different semantic aspects at multiple sequence points by employing parallel attention heads, thereby enhancing its ability to capture a wide range of contextual information.
The Transformer architecture surmounts key drawbacks of conventional recurrent networks by enabling (1) effective modeling of long‐distance dependencies, (2) parallel extraction of features, and (3) avoidance of gradient explosion and vanishing. Although LSTM and GRU offer partial solutions for these challenges, they still suffer from cumulative weight updates over sequential steps. Furthermore, by stacking self‐attention layers, Transformers can form very deep networks suited to training on datasets comprising millions, or even billions of samples.
Current Transformer implementations mainly exist in three architectural forms: (1) encoder‐only models for tasks like classification, (2) decoder‐only models for language modeling, and (3) encoder–decoder architectures for applications such as machine translation (Flah et al., 2021). Since bridge response prediction is inherently a sequence‐to‐sequence task, the method presented in this study utilizes an encoder–decoder architecture to capture the spatiotemporal dynamic correlations inherent in the multi-sensor data. The model consists of three encoder and three decoder layers, with four attention heads per layer, an embedding dimension of 256, and a feed-forward network dimension of 1024. ReLU activation is used in the feed-forward networks, and a dropout rate of 0.1 is applied for regularization. Sinusoidal positional encoding is employed to model temporal order, and a causal mask is applied to prevent information leakage from future time steps. The model is trained using the Adam optimizer (initial learning rate = 0.001, batch size = 50) with the warm-up learning rate schedule, and the same learning rate configuration (including the initial rate and warm-up schedule) is maintained during the fine-tuning phase of transfer learning. The best model is selected based on validation loss. The model was implemented in Python 3.9 using PyTorch 2.6.0 framework. All computations were performed on a workstation equipped with a 12th Generation Intel Core i5-12400F processor and an NVIDIA GeForce RTX 2060 GPU.
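The causal mask mentioned above can be illustrated with a short sketch. It builds a lower-triangular mask and applies it inside a row-wise softmax so that each time step receives zero weight from later steps; the function names are hypothetical, and the pure-Python form is chosen for clarity rather than to mirror the PyTorch implementation.

```python
import math


def causal_mask(n):
    """mask[i][j] is True where position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, ignoring masked-out positions."""
    out = []
    for row, allowed in zip(scores, mask):
        kept = [s for s, a in zip(row, allowed) if a]
        m = max(kept)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

Because the diagonal is always allowed, every row keeps at least one valid entry, so the normalization never divides by zero.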
The loss function is the root mean square error (RMSE).
Three quantitative measures are adopted for model performance analysis: T_a, the relative mean absolute error (RMAE), and the coefficient of determination R^2. The formulations of these evaluation metrics are presented as follows:
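As a hedged illustration, the RMSE loss and the coefficient of determination can be computed with their standard textbook definitions as sketched below. T_a and RMAE are deliberately not included, because their exact normalization is fixed by this study's own equations and should not be guessed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R^2 of 1 corresponds to a perfect prediction, while a value of 0 means the model does no better than predicting the mean of the reference signal.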
Transfer learning
Having established a Transformer-based model for single-bridge response prediction, we extend this capability to the network scale using TL, thereby avoiding the need to retrain models from scratch for each bridge. TL is defined as a learning paradigm in which knowledge acquired from a source domain D_S and task T_S is leveraged to improve prediction performance in a target domain D_T and task T_T. Each domain D consists of two components:
Analogously, a task T is defined as:
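For reference, the domain and task definitions commonly used in the TL literature (for example in the survey by Pan and Yang, 2010, cited above) take the following form. The symbols for the feature space, marginal distribution, label space, and predictive function are the conventional ones and are assumed here.

```latex
D = \{\mathcal{X},\, P(X)\}, \qquad T = \{\mathcal{Y},\, f(\cdot)\}
```

Here \mathcal{X} is the feature space with marginal distribution P(X), \mathcal{Y} is the label (output) space, and f(\cdot) is the predictive function learned from the training pairs.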
By employing TL techniques, knowledge can be transferred across distinct domains to improve model performance in the target domain with limited data. The ultimate goal of TL is to learn a more accurate decision function in the target domain. The entire TL process is illustrated in Figure 3(a).
Figure 3. Schematic of TL. (a) Fundamental concept; (b) Implementation in a neural network.
The type of transferable knowledge determines the choice of TL strategy. According to the knowledge representation, TL methods can be categorized into instance-based, feature-based, model-based, and relation-based approaches (Cai et al., 2019; Jang et al., 2019; Ritto et al., 2022; Weiss et al., 2016). In this study, a model-based TL strategy is adopted, as the proposed DT framework focuses on continuous response prediction rather than classification tasks. By transferring learned model parameters, accurate prediction performance can be achieved with limited target-domain data.
As illustrated in Figure 3(b), the TL-based model construction follows two main steps. First, a baseline neural network is trained using a data-rich source dataset to learn a representative load–response mapping. Second, selected initial layers of the baseline model are transferred to a new model for a similar target dataset, where their parameters are initialized from the baseline model and frozen, while the remaining layers are fine-tuned to obtain a well-fitted target model.
In the Transformer model, the first two layers of both the encoder and decoder are frozen during fine-tuning to preserve general temporal representations learned from the source domain, so the multi-head attention modules and feed-forward blocks within these frozen layers are identical in the source and target models. The remaining layers, including the third layer and the output projection layers, remain trainable to adapt to the target task. The recurrent baselines (LSTM, BiLSTM, BiGRU) also use an encoder-decoder structure but process the sequence step by step. Freezing the first two layers of both their encoder and decoder, as done for the Transformer, leads to poorer performance, because these models rely heavily on the sequential flow of information through each layer and freezing too many layers hinders their ability to capture crucial temporal dependencies. Therefore, for the recurrent models, only the first two layers of the encoder are frozen, allowing the remaining layers to adapt to the target task.
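The freezing strategy described above can be summarized as a small sketch that lists which parameter groups stay trainable. The group names (`encoder.layer1`, `output_projection`, and so on) and the function itself are hypothetical and do not correspond to identifiers in the actual implementation.

```python
def freeze_plan(num_layers=3, frozen_layers=2, freeze_decoder=True):
    """Map hypothetical parameter-group names to a trainable flag.

    The first `frozen_layers` encoder layers are always frozen; the matching
    decoder layers are frozen only when `freeze_decoder` is True.
    """
    plan = {}
    for i in range(num_layers):
        trainable = i >= frozen_layers
        plan[f"encoder.layer{i + 1}"] = trainable
        plan[f"decoder.layer{i + 1}"] = trainable or not freeze_decoder
    plan["output_projection"] = True  # always fine-tuned on the target task
    return plan


# Transformer setting: first two encoder and decoder layers frozen.
transformer_plan = freeze_plan(freeze_decoder=True)
# Recurrent setting: only the first two encoder layers frozen.
recurrent_plan = freeze_plan(freeze_decoder=False)
```

In PyTorch, such a plan would typically be applied by setting `requires_grad = False` on the parameters of each group marked as not trainable, so the optimizer leaves them unchanged during fine-tuning.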
A 0–1 min–max normalization strategy was adopted with task-specific adaptations to accommodate different data types across tasks and bridges. For vehicle-to-response prediction, each vehicle-related input feature is normalized independently to account for differences in physical units, while structural responses are normalized globally. For response-to-response prediction, both inputs and outputs are globally normalized, with each bridge using its own normalization parameters to preserve intrinsic structural characteristics. During transfer learning, normalization parameters computed from the source task are used in pre-training, whereas those derived from the target task are applied during fine-tuning, ensuring domain-wise consistency and preventing information leakage. All normalization statistics are computed exclusively from the training data and reused for validation and testing. In addition, the output layer weights are initialized from the pre-trained model rather than randomly re-initialized, which preserves learned feature representations while allowing task-specific adaptation during fine-tuning.
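The normalization protocol can be sketched as follows, assuming plain Python lists and hypothetical helper names. The key point illustrated is that the scaling parameters are fitted on training data only and then reused unchanged for validation and test data.

```python
def fit_min_max(train_values):
    """Compute min-max parameters from training data only."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("constant data cannot be min-max scaled")
    return lo, hi


def apply_min_max(values, lo, hi):
    """Scale values with previously fitted parameters (no refitting)."""
    return [(v - lo) / (hi - lo) for v in values]


def fit_per_feature(train_rows):
    """Fit one (lo, hi) pair per input column, e.g. per vehicle feature."""
    return [fit_min_max(list(column)) for column in zip(*train_rows)]
```

Because the parameters come from the training set, unseen samples may fall slightly outside the 0–1 range; for example, parameters fitted on values between 2 and 6 map an unseen value of 8 to 1.5.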
Numerical validation
A numerical simulation was conducted to validate the proposed DT framework using a regional bridge network comprising three concrete beam bridges (A–C). Due to practical cost constraints, only Bridge A is equipped with SHM and WIM systems, while Bridges B and C are uninstrumented, reflecting a realistic monitoring scenario. Traffic loads with statistically consistent characteristics were generated from measured vehicle data and applied to finite element models to obtain corresponding structural responses. The initial simulation produced 1500 paired samples of vehicle loads and displacement responses at monitoring points. To improve model generalization, data augmentation was performed by injecting zero-mean Gaussian noise with a variance of 0.001, expanding the dataset to 4500 samples. Transformer-based response prediction models were then developed for the bridge network, including three vehicle-load-to-response scenarios and four response-to-response scenarios, to evaluate the effectiveness of transfer learning.
Vehicle loads and numerical model
Statistical analysis of vehicle
The stochastic vehicle loads for the simulations were derived from a 5-month WIM dataset from Bridge A, covering over six million records. Key load parameters, such as vehicle mass and velocity, were modeled using Gaussian Mixture Models (GMMs) and validated with the Kolmogorov-Smirnov test. Markov Chain Monte Carlo (MCMC) sampling ensured the generated loads reflected the empirical distributions. Detailed methodology is provided in our previous work (Zhao et al., 2025). This probabilistic load model was used as input for the finite element simulations of the three bridges.
Loading
Bridge A is a three-span continuous box-girder bridge with 25 m spans. Bridges B and C share the same five-cell narrow box-girder cross-section with transverse diaphragms; B is a single 25 m span, and C a 30 m span. The similar cross-sections enable TL-based predictive modeling. Vehicle load schematics are shown in Figure 4(a). The finite element models were built in ANSYS using linear elastic concrete and steel, with transient dynamic analysis via the full solution method and a linear solver.
Figure 4. Load and monitoring scheme. (a) Schematic diagram of vehicle loading on Bridge B; (b) Sensor placement configuration of Bridge A; (c) Sensor placement configuration of Bridges B and C.
Given the transient dynamic analysis requirements and solid-element modeling complexity, a mesh size of 0.5 m was selected to achieve an optimal trade-off between accuracy and computational cost. Furthermore, according to the actual WIM statistics, real-world traffic exhibits a highly imbalanced distribution of vehicle types: two-axle vehicles dominate at 74.07%, whereas three-, four-, and five-axle vehicles together account for less than 10%. If the simulated traffic load distribution strictly followed this actual lane-wise distribution, the overwhelming prevalence of two-axle vehicles would leave severely insufficient samples of the other vehicle types for neural network training. The trained model would then become heavily biased toward the dominant type, yielding high prediction accuracy for two-axle vehicles but poor performance for low-sample types (with R^2 consistently below 0.5), thereby impeding a comprehensive investigation of load effects across different vehicle types. To mitigate this imbalance and ensure that the model achieves high prediction accuracy for all vehicle types, each lane was assigned an equal representation of five vehicle types, with 100 vehicles per type, amounting to 500 vehicles per lane. As the emergency lane seldom accommodates vehicles, it was excluded from the study. Thus, vehicle loads were applied exclusively to the fast, middle, and slow lanes, with each bridge subjected to 1500 vehicle passages.
Several monitoring points were positioned at the bottom of small box girders to capture the spatiotemporal relationships among different measurement points. As illustrated in Figure 4(b), Bridge A is instrumented with seven measurement points: five in its central span and one on each end span. In contrast, the single‐span Bridges B and C each carry five monitoring points, as shown in Figure 4(c).
Response prediction based on loads
All vehicle loads were applied to the finite element models, yielding responses for all three bridges. The responses were then standardized by zero-padding the end of each output sequence, resulting in a uniform sequence length of 177 for Bridge A, and 77 and 87 for Bridges B and C, respectively. The sequence length is determined by the length of the bridge. This standardization procedure generated time-aligned displacement matrices with consistent dimensions, enabling batch-processable tensor representations for neural network architectures.
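Terminal zero-padding of this kind can be sketched in a few lines. The helper name is hypothetical, and the target lengths in the usage note are the per-bridge values reported above.

```python
def pad_to_length(sequence, target_len, pad_value=0.0):
    """Append pad_value to the end of a sequence until it has target_len items."""
    if len(sequence) > target_len:
        raise ValueError("sequence longer than target length")
    return list(sequence) + [pad_value] * (target_len - len(sequence))
```

For example, a Bridge B response with 60 recorded steps would be extended with 17 trailing zeros by `pad_to_length(response, 77)`, so that all passages of one bridge can be stacked into a single array.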
Gaussian noise (μ = 0, σ^2 = 0.001) was applied to augment the dataset beyond the initial 1500 samples, whose number was limited by computational constraints. To prevent data leakage, data augmentation was strictly confined to the training set, while the validation and test sets consisted exclusively of original, non-augmented signals. The dataset was split at the vehicle-passage level, with 80% (3600 samples) assigned to training, 10% (450 samples) to validation, and 10% (450 samples) to testing, ensuring complete independence across all subsets. No time-based or random temporal splitting was used. Consequently, the training set contained both original and augmented samples, whereas the validation and test sets contained only original samples. This data partitioning and augmentation protocol was applied consistently to all experimental cases (Cases 1–10), with identical sample counts across cases. Experimental results indicated diminishing performance gains beyond approximately 3500 augmented samples. Accordingly, a final dataset of 4500 samples was adopted, which achieved optimal prediction accuracy without evidence of overfitting. We explicitly emphasize that the test set was not used at any stage of model training or transfer learning, ensuring a fully unbiased evaluation of model performance.
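One reading of this protocol that reproduces the reported sample counts is sketched below: the 1500 original passages are first split into 600 training, 450 validation, and 450 test passages, and each training passage is then accompanied by five noisy copies (600 × 6 = 3600). The number of copies, the shuffling of passages, and the function name are assumptions made for illustration; only the noise variance and the subset sizes come from the text.

```python
import random


def split_then_augment(samples, n_val, n_test, copies, variance=0.001, seed=0):
    """Split at the passage level first, then add noisy copies to training only."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)  # assumption: passages are assigned to subsets at random
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]

    std = variance ** 0.5  # random.gauss expects a standard deviation
    train = []
    for i in train_idx:
        train.append(list(samples[i]))  # keep the original signal
        for _ in range(copies):
            train.append([x + rng.gauss(0.0, std) for x in samples[i]])

    val = [list(samples[i]) for i in val_idx]
    test = [list(samples[i]) for i in test_idx]
    return train, val, test
```

Splitting before augmenting guarantees that no noisy copy of a validation or test passage can appear in the training set.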
Three representative TL scenarios were investigated to assess the Transformer model’s cross-domain generalizability, encompassing both intra-bridge and inter-bridge adaptability:
Case 1: Spatial response transfer within Bridge A from point 3 to point 2.
Case 2: Cross-structural transfer from Bridge A's point 3 to Bridge B's point 3 (continuous-span to single-span system).
Case 3: Inter-single-span transfer between Bridges B and C (point 3 to point 3 with differing material properties).
Table 1. Evaluation of the prediction results for the three cases using TL.
Table 1 systematically evaluates displacement prediction performance under three cases. Specifically, three TL configurations are compared in every case: Transfer-1 (no layers frozen), Transfer-2 (the first two layers of both encoder and decoder frozen), and Transfer-3 (all three encoder/decoder layers frozen). Compared to Transfer-2, Transfer-1 yields comparable prediction accuracy but requires increased training time, while Transfer-3 significantly reduces training time at the cost of a noticeable drop in accuracy. Therefore, the Transfer-2 strategy is adopted in subsequent analyses as it optimally balances prediction accuracy and computational efficiency. Notably, this TL methodology (Transfer-2) significantly enhances conventional training approaches, improving prediction accuracy while reducing computational costs to below 80% of conventional training durations. This computational resource efficiency proves particularly advantageous in DT implementations for bridge networks, where numerous predictive models must be established.
Bridge response prediction based on response
Considering potential WIM system unavailability, this study focuses on mutual prediction between bridge measuring points. Displacement data from a single sensor are used to predict another sensor’s response (one-to-one scheme), offering a concise structure, high computational efficiency, and sufficient accuracy for engineering purposes. Many-to-one approaches provide minimal accuracy improvement at higher computational cost. Therefore, the one-to-one modeling approach is adopted for response prediction.
To further evaluate TL performance both within a single bridge and across different bridges, four cases are studied.
Case 4: Explores knowledge transfer within the same bridge (Bridge A). The source task is to predict the response at measurement point 2 from point 1, while the target task is to predict the response at measurement point 3 from point 1.
Case 5: Investigates knowledge transfer from Bridge A to a different bridge (Bridge B). The source task is to predict the response at measurement point 1 of Bridge B from point 1 of Bridge A, while the target task is to predict the response at point 3 of Bridge B from point 3 of Bridge A.
Case 6: Examines cross-bridge transfer from Bridge A to Bridge B, sharing the same source task as Case 5. The target task, however, is to predict the response at point 3 of Bridge B from point 1 of Bridge A.
Case 7: Examines cross-bridge transfer from Bridge A to Bridge C. The source task is to predict the response at measurement point 1 of Bridge C from point 1 of Bridge A, while the target task is to predict the response at point 3 of Bridge C from point 1 of Bridge A.
Figure 5 systematically compares TL performance across the four cases using multiple evaluation metrics, T_a, RMAE, and R^2, as defined in equations (11)–(13). Panels (a) and (b) demonstrate that TL achieves satisfactory improvements in three cases; only in Case 6 does it fail to outperform a model trained from scratch. The R^2 values across all cases are consistently close to 1, rendering the distinctions in panel (c) less perceptible. Cases 5 and 6 share the same source model, which predicts point 1 of Bridge B from measurement point 1 of Bridge A, but differ in their target tasks: Case 5 predicts point 3 of Bridge B from point 3 of Bridge A, whereas Case 6 predicts point 3 of Bridge B from point 1 of Bridge A. Given that point 1 is located on the side span and point 3 on the midspan, these results suggest that patterns learned from side-span mutual predictions transfer more effectively to mid-span mutual predictions. Since this study is primarily concerned with the application of TL within a bridge-network DT framework, further discussion of Case 6 is omitted.
Figure 5. The predictive performance of the four cases. (a) T_a of different cases; (b) RMAE of different cases; (c) R^2 of different cases.
Experimental validation
Bridge A is outfitted with SHM and WIM systems, which facilitates the pre-training of predictive models using simulated data. These models can subsequently be enhanced with measured data through TL techniques to forecast the bridge's responses. Figure 6 depicts the layout of Bridge A together with its installed dynamic displacement sensors.
Figure 6. Deployment of sensors on Bridge A. (a) Bridge A; (b) substructure of Bridge A; (c) dynamic displacement sensor; (d) sensor deployment of the midspan cross section.
Because the bridge's fundamental natural frequencies lie below 5 Hz and TL requires a data representation consistent with the simulated dataset, the measured signals were downsampled. The sampling rate was decimated by a factor of 10, resulting in an effective frequency of 20 Hz, closely matching that of the simulated dataset. The preprocessing is illustrated in Figure 7; panel (b) shows a data segment of length 177 after downsampling.
Figure 7. Down-sampling and comparison of evaluation metrics. (a) Displacement response of Bridge A; (b) sample of down-sampling; (c) T_a before and after TL; (d) RMAE before and after TL; (e) R^2 before and after TL.
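A factor-of-10 reduction to 20 Hz implies an original sampling rate of 200 Hz, and the resulting Nyquist frequency of 10 Hz remains above the 5 Hz band of interest. The sketch below shows one simple way to decimate, by averaging non-overlapping blocks, which acts as a crude low-pass filter before the rate reduction. The filtering method actually used is not stated in the text, so block averaging and the function name are assumptions.

```python
def decimate_by_averaging(signal, factor=10):
    """Reduce the sampling rate by `factor` using non-overlapping block means.

    Trailing samples that do not fill a complete block are discarded.
    """
    n_blocks = len(signal) // factor
    return [
        sum(signal[b * factor:(b + 1) * factor]) / factor
        for b in range(n_blocks)
    ]
```

Under these assumptions, a raw segment of 1770 samples yields the 177-point sequence expected by the model.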
The network underwent an initial pre-training phase using 4500 simulated load–response samples. Subsequently, an additional 4500 measured samples were used to fine-tune the Transformer model. Specifically, data from measurement point 1 on Bridge A were employed to forecast displacements at measurement point 3. Figure 7 compares key evaluation metrics between the model trained solely on real data and the transfer-learning model. The transfer-learning method reduced both T_a (0.0035 to 0.0030) and RMAE (0.1252 to 0.1097) and improved R^2 (0.9963 to 0.9995). These findings indicate that pre-training on simulated datasets enhances the predictive accuracy of the Transformer model on actual measurements, mitigating the constraints imposed by scarce field measurement data. It should be emphasized that the experimental validation in this study primarily serves as a proof of concept for the sim-to-real transfer learning paradigm on a single instrumented bridge. Due to the absence of monitoring systems on other bridges within the current network, cross-bridge transfer between real-world structures could not be experimentally validated, which is acknowledged as a limitation of this work and an important direction for future field investigations.
Discussion
Zero padding
In the preceding inter-bridge TL analyses for predicting vehicle-load responses, the number of output data points varied across bridges, with Bridge A yielding 177 data points, Bridge B 77, and Bridge C 87. Consequently, response predictions were transferred by adjusting the output dimension during the TL process.
Alternatively, by zero-padding the outputs of Bridges B and C to a length of 177, TL can be applied between the bridges without changing the output layer. Table 2 compares the two strategies for Cases 2 and 3 discussed in Section 3.2. The table reveals that: (1) both output-layer modification and zero-padding substantially enhance predictive accuracy through TL; (2) while zero-padding yields higher accuracy, it also leads to longer training times. This trade-off arises because modifying the output layer requires retraining only a limited number of model parameters, whereas zero-padding increases the total number of parameters, which enhances precision at the cost of increased computational time.
Table 2. Comparison of changing the output layer and zero padding.
Comparison with other neural networks
A rigorous comparative analysis was conducted between the Transformer, LSTM, BiLSTM, and BiGRU networks under identical conditions. This assessment was carried out across three separate scenarios.
Case 8: Investigates the cross-bridge transfer of vehicle load-to-response models. The source task is to predict the response at measurement point 3 of Bridge A from vehicle loads, while the target task is to predict the response at point 3 of Bridge B from vehicle loads.
Case 9: Examines the cross-bridge transfer of response-to-response models. The source task is to predict measurement point 1 of Bridge B from point 1 of Bridge A, while the target task is to predict point 3 of Bridge B from point 3 of Bridge A.
Case 10: Focuses on the simulation-to-real transfer within Bridge A. The source task is to predict measurement point 3 from point 1 based on simulated data, while the target task is to predict point 3 from point 1 based on measured data.
Figure 8(a) compares the performance of the Transformer, LSTM, BiLSTM, and BiGRU models under Case 10. The Transformer accurately reconstructs the full displacement trajectory, capturing both primary peaks/troughs and secondary oscillations. In contrast, the recurrent models show limited dynamic modeling ability, resulting in amplitude miscalibrations and phase shifts. These observations are quantified in Figure 8(b): while the recurrent networks exhibit peak errors near 1 mm, the Transformer maintains sub-millimeter precision (<0.1 mm) throughout. The results highlight the Transformer's superior temporal modeling, in which the attention mechanism resolves both macro-waveform patterns and micro-dynamics, a key advantage over sequentially constrained recurrent architectures.
Prediction performance and evaluation metrics of Case 10. (a) Prediction performance of four models; (b) difference between predicted and actual value; (c) Ta under different models; (d) RMAE under different models; (e) R2 under different models.
The evaluation results of predictive performance for Case 10 are illustrated in Figure 8. The Transformer architecture clearly outperforms the other models. In terms of the key evaluation indicators, its Ta and RMAE were roughly 20% of those of the baseline models, and its R2 reached 0.99, compared with less than 0.8 for the comparison models. Notably, applying TL further enhances the prediction accuracy of the Transformer, whereas the conventional models exhibit varying degrees of performance degradation under equivalent conditions.
This performance disparity can be attributed to differences in architectural capacity, parameterization characteristics, and the effectiveness of systematic hyperparameter tuning. To ensure a fair comparison, all models were optimized using a comprehensive grid search. For the Transformer, the number of layers {2, 3, 4}, attention heads {2, 4, 8}, model dimension {128, 256, 512}, and feed-forward dimension {512, 1024, 2048} were explored. For recurrent architectures (LSTM, BiLSTM, and BiGRU), the hidden size {32, 64, 128} and number of layers {2, 3, 4} were evaluated. The final configurations correspond to Pareto-optimal trade-offs between prediction accuracy and training efficiency, with recurrent models consistently converging to hidden_size = 64 and num_layers = 3. Importantly, these hyperparameter settings were kept fixed during fine-tuning to isolate the impact of transfer learning strategies from that of hyperparameter adjustments.
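To give a sense of the size of the search described above, the following Python sketch enumerates the two grids using the values listed in the text. The dictionary keys other than `hidden_size` and `num_layers`, and the helper `expand_grid`, are hypothetical names; how each configuration was trained and scored is not shown here.

```python
from itertools import product

# Search spaces as listed in the text (key names partly hypothetical).
transformer_space = {
    "num_layers": [2, 3, 4],
    "num_heads": [2, 4, 8],
    "d_model": [128, 256, 512],
    "d_ff": [512, 1024, 2048],
}
recurrent_space = {
    "hidden_size": [32, 64, 128],
    "num_layers": [2, 3, 4],
}


def expand_grid(space):
    """Return every combination in `space` as a list of config dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]


transformer_grid = expand_grid(transformer_space)
recurrent_grid = expand_grid(recurrent_space)

# Multi-head attention needs d_model divisible by the number of heads;
# every combination listed in the text satisfies this.
assert all(c["d_model"] % c["num_heads"] == 0 for c in transformer_grid)

print(len(transformer_grid), "Transformer configurations")
print(len(recurrent_grid), "configurations per recurrent architecture")
```

The full Transformer grid is nine times larger than each recurrent grid, which is worth keeping in mind when comparing tuning cost across architectures.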
Evaluation metrics for comparing different models of three cases.
The performance metrics for the three operational scenarios presented in Table 3 indicate that, under the tested settings, the predictive accuracy of conventional models remains largely stable or exhibits a slight decline. Although these models achieve lower computational costs, their prediction accuracy does not meet the high-precision requirements of the cross-bridge prediction tasks considered in this study. This result suggests that, within the scope of our experiments, the inherent parameter limitations and architectural designs of the conventional models hinder effective knowledge transfer across bridges. The observed trade-off between computational efficiency and prediction accuracy highlights the importance of architectural capacity.
Data scaling effects
This section investigates the influence of training-set size on transfer-learning efficacy and compares the predictive performance of four neural-network architectures. Focusing on Case 9 (Table 3) and using the Ta metric for accuracy assessment, Figure 9 illustrates how Ta evolves for the four models as the number of training samples increases from 900 to 4500.
Data scaling effects on different models. (a) Transformer; (b) LSTM; (c) BiLSTM; (d) BiGRU.
Transformer: Ta decreases consistently with more data, demonstrating significant gains in precision. However, the marginal benefit of TL diminishes as sample size grows, indicating diminishing returns once ample target-domain data are available.
Recurrent Models (LSTM, BiLSTM, BiGRU): These architectures display marked data insensitivity, with Ta plateauing around 0.03 regardless of dataset size. TL yields a slight accuracy improvement at 20% data volume but degrades performance as more samples are added.
In summary, TL markedly accelerates convergence and enhances accuracy for the Transformer under limited-data conditions, but its advantage wanes with larger datasets. In contrast, recurrent models derive little to no benefit from additional data or TL, reflecting inherent limitations in their capacity to leverage transferred knowledge.
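A data-scaling experiment of this kind can be organized around nested training subsets, so that each larger set contains the smaller ones. The Python sketch below is a minimal illustration under stated assumptions: the 900-sample increment, the fixed random seed, and the helper `nested_subsets` are hypothetical, since the text gives only the endpoints (900 and 4500) and the 20% data volume.

```python
import random

FULL_SIZE = 4500  # largest training set mentioned in the text
MIN_SIZE = 900    # smallest training set mentioned in the text
STEP = 900        # assumed increment; the text gives only the endpoints


def nested_subsets(n_total, sizes, seed=0):
    """Return {size: index list}; smaller subsets are prefixes of larger ones."""
    rng = random.Random(seed)
    order = list(range(n_total))
    rng.shuffle(order)
    return {size: order[:size] for size in sizes}


sizes = list(range(MIN_SIZE, FULL_SIZE + 1, STEP))
subsets = nested_subsets(FULL_SIZE, sizes)

for size in sizes:
    fraction = size / FULL_SIZE
    print(f"{size} samples ({fraction:.0%} of the full set)")
```

Nesting the subsets keeps the comparison between sample sizes from being confounded by which samples happen to be drawn at each size.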
Implications for a full-scope digital twin
This study establishes a core predictive capability for cross-bridge response, recognizing that a fully realized Digital Twin also requires geometric modeling, semantic interoperability, and management-system integration (Grieves and Vickers, 2017; Tao et al., 2019). The developed model serves as a foundational component: its outputs can animate Building Information Models with dynamic behavioral data, while its physics-grounded load-to-response mapping provides an intrinsic semantic layer for generating code-based alerts. Future work should integrate this predictive core into a comprehensive BIM environment, develop an ontology-based platform for semantic interoperability, and create robust APIs to connect with Bridge Management and Geographic Information Systems. By providing both a key enabling technology and a clear technical pathway, this work bridges high-fidelity predictive modeling with the operational needs of regional bridge networks, supporting a shift toward predictive, data-driven asset management.
Conclusion
This study presents a transfer learning (TL)-based digital twin (DT) framework for regional bridge networks, aiming to address the prevalent lack of structural health monitoring (SHM) systems and the need for intelligent operational optimization. Three key scenarios are examined: (1) predicting displacement responses from vehicle loads; (2) cross-bridge response estimation using data from other bridges in the network; and (3) transferring knowledge from simulation to real-world measured data. These cases demonstrate the framework’s robustness to disturbances and its potential for enhancing bridge network maintenance. The study also investigates zero-padding and output adjustment issues in TL, compares the performance of Transformer against LSTM, BiLSTM, and BiGRU models, and analyzes the impact of data volume on TL outcomes.
The experimental findings yield five critical insights for TL applications in regional bridge networks: (1) For the Transformer model, TL delivers a dual advantage: it simultaneously enhances prediction accuracy and reduces the time required to develop individual models. While computationally more intensive than simpler recurrent networks, this efficiency gain within the high-performance Transformer framework is crucial for the practical deployment of a network-wide DT, as it directly lowers the cost of scaling the system across many bridges. (2) Transferring from simulated data to real measured data improves predictive accuracy and addresses the issue of occasionally insufficient monitoring data. (3) In bridge network applications, TL may involve two tasks with differing output dimensions. In such cases, it is essential to determine whether to employ zero-padding (for higher accuracy) or to modify the network architecture (for greater efficiency). (4) Transformer models demonstrate superior performance compared to recurrent architectures. These traditional models tend to exhibit lower accuracy, and the application of TL does not improve their predictive performance unless the dataset is exceptionally small. (5) When the dataset is sufficiently large, it may be appropriate to reduce the amount of data in TL to achieve an optimal balance between accuracy and efficiency.
The developed DT framework leverages structural monitoring data from a limited number of instrumented bridges to significantly enhance the intelligent management of regional bridge networks. By integrating Transformer-based response prediction with TL, the proposed approach not only improves predictive accuracy but also substantially accelerates the development of bridge-specific models, demonstrating clear advantages for practical, resource-constrained network-level applications.
While the proposed framework demonstrates effective knowledge transfer within a simulated regional bridge network and from simulation to real data on a single bridge, its validation across multiple real bridges in a network remains an essential next step for full-scale field deployment. In addition, it should be noted that the current framework is validated on concrete box-girder bridges subjected to similar traffic loading conditions, where structural forms and dominant response mechanisms are broadly consistent. For bridge typologies with fundamentally different structural systems, such as arch or cable-stayed bridges, direct transfer using the same pretrained model may require additional adaptation to account for distinct load-transfer mechanisms and dynamic characteristics. Future research will therefore focus on extending the proposed framework to heterogeneous bridge types through bridge-type-aware transfer strategies, as well as exploring TL-based anomaly detection to further enhance the completeness and robustness of regional bridge digital twin systems.
Footnotes
Acknowledgments
The authors acknowledge the financial support provided by Major Science and Technology Project of Yunnan Province (202502AD080007), Doctoral Fund Support Project of University of Jinan (XRC2563), Research Fund for Advanced Ocean Institute of Southeast University (GP202409) and the National Natural Science Foundation of China (52408343).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support was provided by Major Science and Technology Project of Yunnan Province (202502AD080007), Doctoral Fund Support Project of University of Jinan (XRC2563), Research Fund for Advanced Ocean Institute of Southeast University (GP202409) and the National Natural Science Foundation of China (52408343).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data and code that support the findings of this study are available from the corresponding author upon reasonable request.
