Abstract
To quantify the uncertainties in the anomaly detection process for large span cable-supported bridges, a probabilistic approach is developed based on confidence interval estimation of extreme value analytics. First, raw signals from the structural health monitoring system are pre-processed, including missing data imputation using a moving time window mean imputation approach and thermal response separation through a multi-resolution wavelet-based method. Then, an energy index is extracted from the time domain signals to enhance the robustness of the detection performance. A resampling-based method, namely the bootstrap, is adopted for confidence interval estimation. Four confidence levels are defined for the anomaly trend detection in this study, namely 95%, 80%, 50%, and 20%. Finally, the effectiveness of the proposed anomaly trend detection methodology is validated using in-situ cable force measurements from the Nanjing Dashengguan Yangtze River Bridge. The four-level anomaly detection triggers, determined by confidence interval estimation based on the cable force measurements in 2007, are 58,671, 48,862, 42,499, and 39,035, respectively. Subsequently, three cases are presented: spike detection, overloading vehicle detection, and snow disaster detection. The spike detection case verifies that the energy index is capable of tolerating signal spikes. Three overloading events are simulated for the overloading vehicle detection, and all three are detected successfully with different confidence levels. The snow disaster is detected with more than 80% confidence based on the field measurements during the snow storm time window.
Introduction
Large span bridges face potential threats every day owing to complex external loadings and aggressive environments (Liu et al., 2021). Although regular inspections are scheduled, it is difficult to respond to sudden incidents in a timely manner, where the sudden incidents include significant changes in operational loadings (e.g., overloading vehicles) and structural capacity (e.g., structural damage) (Liu et al., 2020). Structural health monitoring (SHM) systems, on the other hand, provide real-time screening of structural responses, external loadings, and environmental factors. SHM systems are often devised and installed on large-scale civil infrastructures worldwide, especially large span bridges, to provide quantitative information for further analytics (Fujino et al., 2019; Xu and Xia, 2011; Ou and Li, 2010). This makes it possible to conduct timely anomaly detection for large span bridges based on real-time measurements from SHM systems to ensure operational and structural safety.
In order to avoid sudden structural failures, anomalous events, including sudden structural damage and accidents, should be detected as early as possible. Since SHM data lay a solid foundation for anomaly detection investigations, related studies have been broadly conducted all over the world within the last few decades. Initially, dynamic fingerprints of structures (e.g., frequency) were used to detect structural damage since the dynamic characteristics are stable structural parameters, which in theory reflect changes in the condition of structures. The natural frequencies, damping values, mode shapes, and curvature mode shapes were explored to detect structural damage (Das et al., 2016; Pandey et al., 1991; Salawu, 1997). Although the effectiveness of dynamic parameter-based damage detection methods was verified both in theory and in the laboratory, challenges arose when they were applied to large-scale complex structures. The two primary factors are the contamination by noise and the influence of environmental variations (Peeters et al., 2001). It has been shown that variations of dynamic indexes caused by temperature effects were larger than those induced by structural damage for a cable-stayed bridge (Xu and Wu, 2007). Peeters et al. (2001) suggested distinguishing thermal effects from damage events when detecting damage.
In view of the limitations of dynamic characteristics in practical applications, researchers have explored the use of static indexes (e.g., strain) for anomaly detection of infrastructures. Yu et al. (2016) took advantage of the deflection time history of a beam for damage detection using the wavelet transform and Lipschitz exponent, where the effectiveness of the method was verified by a model experiment. Hua et al. (2009) used the changes in cable forces for damage detection of a cable-stayed bridge based on the fact that damage occurring in the bridge girders would cause a redistribution of forces in the stay cables, where the validity of the approach was illustrated by numerical studies. Ni et al. (2020) adopted expansion joint displacements to detect damage of expansion joints in a Bayesian context, where in-situ measurements acquired from a cable-stayed bridge were employed to validate the effectiveness of the proposed method. Similar to dynamic characteristics, static parameter-based anomaly detection is affected by environmental factors, especially temperature effects. Many methodologies for thermal effect modeling and separation have been developed, such as regression models (Kromanis and Kripakaran, 2014; Ren et al., 2019), wavelet transform (Xu et al., 2020a; Wu et al., 2014), blind source separation (Zhu et al., 2018), numerical model-based methods (Xu et al., 2019; Zhou and Sun, 2019), and the Bayesian dynamic linear model (Wang et al., 2019). Considering the influence of thermal effects on static parameter-based anomaly detection, various temperature-driven anomaly detection approaches have been developed. Xu et al. (2020b) proposed a two-level anomaly detection method for a suspension bridge using girder deflection measurements, where the multi-resolution wavelet-based method was used to separate thermal responses from the recorded signals. Zhu et al. (2019) introduced blind source separation to improve the performance of moving principal component analysis for anomaly detection using strain data, where three cases were used to verify the effectiveness of the proposed methodology. Tome et al. (2020) and Fan et al. (2020) presented a strategy for early damage detection for a large cable-stayed bridge based on multivariate cointegration analysis and statistical process control, where the effects of environmental and operational variations were suppressed using cointegration analysis. Huang et al. (2020) proposed a strain-based anomaly detection method for bridge main girders, where the correlation between the temperature and strain of the main girder was established.
The majority of the aforementioned anomaly detection methodologies are deterministic, and the inherent uncertainty is not taken into account. However, uncertainties inevitably exist in the monitoring data, induced by environmental variability, measurement noise, and the estimated parameters. In this context, confidence interval estimation, instead of point estimation, is used to determine the trigger for anomaly detection so that the uncertainty inherent in the detection process is considered. Compared with point estimation, confidence interval estimation specifies a range within which the parameter is estimated to lie (DiCiccio and Efron, 1996; Efron, 1987), and it has been widely applied to between-subject designs (Loftus and Masson, 1994), clinical research (Cho et al., 2020; Schober and Vetter, 2020), distribution locational marginal price (Wei et al., 2020), etc. To the best of the authors' knowledge, however, no study has investigated the anomaly detection of large span bridges in the context of confidence interval estimation of extreme value analytics.
In this paper, a probabilistic anomaly detection approach for large span bridges is developed in the context of confidence interval estimation of extreme value analytics. The measured static response data from SHM systems are first pre-processed, where missing data are filled to restore continuity and the thermal response is separated. Subsequently, the signal energy within a determined time window is extracted as the index to detect anomalous trends. The trigger associated with a confidence coefficient for anomaly detection is determined based on confidence interval estimation of the generalized Pareto distribution (GPD). Finally, the effectiveness of the proposed anomaly detection method is verified using cable force monitoring data of the Nanjing Dashengguan Yangtze River Bridge, where three cases are presented.
Methodology for probabilistic anomaly trend detection
The general flowchart for probabilistic anomaly trend detection is shown in Figure 1. In the data pre-processing stage, raw signals from SHM systems are first processed using the moving time window method for missing data imputation, which enhances the continuity of the recorded signals (Kalaycioglu et al., 2016; Nevalainen et al., 2009). Subsequently, the multi-resolution wavelet-based approach is adopted to separate thermal effects based on their distinct frequency bandwidths (Ni et al., 2012; Xu et al., 2020a). Once the pre-processed signals are obtained, the anomaly detection index is extracted as the energy within a certain time window, where two principal parameters (i.e., the length of the window and the number of overlaps) need to be determined. Based on relatively long-term monitoring data, the GPD, an extreme value analysis model, is used to determine the trigger for anomaly trend detection, where confidence interval estimation is employed to rate the uncertainty.
Figure 1. Flowchart for probabilistic anomaly trend detection.
Data pre-processing
Missing data are a common issue in the data mining of SHM signals, leading to information loss or even algorithm failures. According to the scale of the missing data, imputation methods can be classified into two categories: imputation of large portions of missing data and imputation of small, discrete portions of missing data. The maximum likelihood method (Enders, 2001), artificial neural networks (Martinez-Luengo et al., 2019), the Bayesian inference approach (Lai and Kuok, 2019), etc., are preferred when large portions of data are missing. For small, discrete portions of missing data, the moving time window imputation (Hawthorne et al., 2005) is most often employed in practice owing to its simplicity. During pre-processing in this study, only small portions of missing data were observed. Thus, the moving time window mean imputation is adopted to address the missing data. The imputation value $x_k$ subject to the missing location $k$ is calculated as
The multi-resolution wavelet-based method is applied to separate thermal responses from the recorded data on the basis of their distinct frequency bandwidths. The decomposition level n is determined by
Index extraction
In existing static response-based anomaly detection investigations, physical quantities (e.g., deflection) and their transformed forms (e.g., cointegration residual) have commonly been adopted as anomaly detection indexes owing to their straightforwardness and practicability (Tome et al., 2020; Xu et al., 2020b). However, these indexes are sensitive to signal spikes, which are common in measurements from SHM systems, leading to false detection. Thus, an energy anomaly detection index, which is tolerant to signal spikes, is extracted from the measured data.
The anomaly detection index is defined as the average energy within a determined time window, which is expressed as
Two parameters must be determined when computing the energy index in practical applications: the overlap and the length of the window. Since the extracted indexes will be used to predict the anomaly detection trigger by means of the GPD, which requires samples to be independent and identically distributed, no overlap is employed. The length of the window influences the effectiveness of the anomaly detection. If the window is too short, the tolerance of the energy index to spikes is insufficient, while if the window is too long, the effectiveness of anomaly detection is weakened because the peaks of anomalous signals are averaged out. In this context, the length of the window should be determined by a trade-off that accounts for the duration of anomalous events. In theory, the optimal length of window is the minimum duration of all the potential anomalous events, which could be defined as
Trigger determination
In general, the trigger is the extreme value of the defined index (Liu et al., 2015). In this study, the GPD is used to estimate the extreme values of the indexes. Compared with the typical block maximum method, the GPD makes fuller use of the limited available information (Deng et al., 2018).
The cumulative GPD of a variate x takes the form
To determine the quantile value corresponding to a T-year return period, the data need to be resampled to meet the requirement of the GPD analysis that samples be independent and identically distributed. In this paper, the maxima of the index within 24 h are adopted.
Based on the extracted daily maxima, an appropriate threshold is then determined for the GPD analysis. If the threshold is set too high, the number of exceedances is small, resulting in large statistical uncertainty. On the other hand, if the threshold is too low, the excess quantity differs significantly from the maximum value, leading to a biased estimator. The mean excess function of the GPD is introduced to determine a proper threshold, which is
To consider the uncertainty in the parameter determination process, confidence interval estimation is used to estimate probabilistic triggers (Kysely, 2010). A resampling method, the bootstrap, is adopted for interval estimation (Castillo and Hadi, 1997). The specific steps using the bootstrap to estimate the shape parameter ζ are summarized as follows (Chen et al., 2017):
(1) Estimate the shape and scale parameters.
(2) Form the estimated standard error of the shape parameter estimate.
(3) Generate B bootstrap samples, each of size n, from the GPD with the estimated parameters.
(4) For each bootstrap sample, estimate ζ using the point estimation approach.
(5) Obtain the equal-tailed 100(1-α)% confidence interval for ζ from the bootstrap estimates.
The procedure for estimating the confidence interval of the scale parameter is similar.
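The resampling idea can be illustrated with a minimal sketch. It is not the authors' exact procedure: it assumes a method-of-moments point estimator and a simple percentile interval, and the function names, the synthetic excesses, and the parameter values (shape -0.2, scale 8000) are hypothetical choices made only for illustration.

```python
import random
import statistics

def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape, scale) of a GPD fitted to excesses."""
    mean = statistics.fmean(excesses)
    ratio = mean * mean / statistics.variance(excesses)
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)

def sample_gpd(shape, scale, size, rng):
    """Inverse-transform sampling of GPD excesses (shape must be non-zero)."""
    return [scale / shape * ((1.0 - rng.random()) ** (-shape) - 1.0)
            for _ in range(size)]

def bootstrap_shape_interval(excesses, n_boot=500, alpha=0.05, seed=0):
    """Equal-tailed percentile interval for the shape from a parametric bootstrap."""
    rng = random.Random(seed)
    shape, scale = fit_gpd_moments(excesses)
    # Refit the shape on n_boot samples drawn from the fitted GPD.
    estimates = sorted(
        fit_gpd_moments(sample_gpd(shape, scale, len(excesses), rng))[0]
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return shape, (lower, upper)

# Hypothetical synthetic excesses standing in for real threshold exceedances.
excesses = sample_gpd(-0.2, 8000.0, 200, random.Random(42))
shape_hat, (lower, upper) = bootstrap_shape_interval(excesses)
print(f"shape estimate {shape_hat:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The same loop applied to the second return value of `fit_gpd_moments` would give an interval for the scale parameter.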
Finally, the quantile value is estimated as the anomaly detection trigger. Within the reference period of T years, the cumulative probability p corresponding to a certain guarantee rate $P_r$ is
The quantile value subject to a 100 (1-α)% confidence coefficient is
Generally, the trigger is defined as the quantile value corresponding to a 95% guarantee rate within a 100-year reference period (i.e., a return period of 1950 years).
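The stated return period can be checked with a short calculation. The sketch assumes that the annual non-exceedance probability is the guarantee rate raised to the power 1/T and that the return period is the reciprocal of the annual exceedance probability; the function name is hypothetical.

```python
def return_period(guarantee_rate, reference_years):
    """Return period (years) for a guarantee rate over a reference period."""
    annual_p = guarantee_rate ** (1.0 / reference_years)  # annual non-exceedance
    return 1.0 / (1.0 - annual_p)

print(round(return_period(0.95, 100)))  # 1950
```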
Case study
The Nanjing Dashengguan Yangtze River Bridge and its SHM System
The Nanjing Dashengguan Yangtze River Bridge, a vital transportation link, crosses the Yangtze River and connects Liuhe District with Nanjing City; its site plan is shown in Figure 2. The steel cable-stayed bridge has a total length of 1288 m, of which the main span is 648 m. The configuration of the Nanjing Dashengguan Yangtze River Bridge is shown in Figure 3. The bridge deck is supported by a total of 167 stay cables, and each cable consists of 109–241 wires of 7 mm diameter.
Figure 2. Site plan of the Nanjing Dashengguan Yangtze River Bridge.
Figure 3. Configuration of the Nanjing Dashengguan Yangtze River Bridge.

An SHM system was devised and installed to monitor the environmental, loading, and response information in the second year after the completion of the bridge in 2005. A total of 599 sensors were employed, including anemometers, temperature sensors, vehicle weighing systems, anchor load cells, and others. The cable forces of all 167 stay cables were recorded using JC1-type anchor load cells with a sampling frequency of 10 Hz and a relative measurement error of ±1%. In this paper, cable force measurements of the stay cable NJX21, highlighted in Figure 4, are adopted for the anomaly trend detection analysis.
Figure 4. The studied stay cable NJX21.
Data pre-processing
Raw cable force measurements of three stay cables (i.e., NJX21, NJX20, and NJX19, as shown in Figure 4) over 30 s are plotted in Figure 5, where missing data are widespread. As discussed earlier, the moving time window mean imputation is employed to address the missing data, with a window length of 2 s. The cable force signals after missing data imputation are plotted in Figure 6 and show good continuity.
Figure 5. Raw cable force signals of the three stay cables in 30 s.
Figure 6. Signals after missing data imputation.
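A minimal sketch of moving time window mean imputation is given here for illustration. It assumes that missing samples are marked as NaN and that each one is replaced by the mean of the valid samples in a centered window; the function name and the toy signal are hypothetical. With a 2 s window at 10 Hz, a half-width of about 10 samples would correspond to the setting above.

```python
import math

def impute_moving_mean(signal, half_window):
    """Fill NaN entries with the mean of valid neighbours inside a centered window."""
    filled = list(signal)
    for k, value in enumerate(signal):
        if not math.isnan(value):
            continue
        lo = max(0, k - half_window)
        hi = min(len(signal), k + half_window + 1)
        # Only originally measured samples are used, never previously imputed ones.
        neighbours = [v for v in signal[lo:hi] if not math.isnan(v)]
        if neighbours:  # leave the gap untouched if the whole window is missing
            filled[k] = sum(neighbours) / len(neighbours)
    return filled

nan = float("nan")
print(impute_moving_mean([1.0, nan, 3.0, 4.0, nan, nan, 7.0], half_window=1))
# [1.0, 2.0, 3.0, 4.0, 4.0, 7.0, 7.0]
```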

Cable force signals of the stay cable NJX21 over 24 h, as shown in Figure 7, are taken as an example to explain the thermal separation procedure using the multi-resolution wavelet-based approach. The decomposition level is set as 27 according to equation (2), and the wavelet basis function is “coif5”. The daily thermal effect lies in the 19th detail level, whose energy, as shown in Figure 8, is not very large because the daily temperature variation is limited to 4°C. The extracted temperature-induced cable force is plotted in Figure 7, and its variation trend is in line with that of the monitored temperatures. Figure 9 shows the cable force signal without the influence of thermal actions, which is used for index extraction, trigger determination, and anomaly trend detection.
Figure 7. Thermal effect separation procedure.
Figure 8. Energy in each detail level.
Figure 9. Signals after thermal effect separation.
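The location of the daily thermal component can be checked with a short calculation. The sketch assumes the usual dyadic approximation in which detail level j of a discrete wavelet decomposition nominally covers frequencies between fs/2^(j+1) and fs/2^j; the function name is hypothetical.

```python
import math

def detail_level_for_period(period_s, sampling_hz):
    """Detail level j whose nominal band (fs/2**(j+1), fs/2**j] contains 1/period."""
    return math.floor(math.log2(sampling_hz * period_s))

level = detail_level_for_period(24 * 3600, 10.0)  # daily cycle sampled at 10 Hz
print(level)  # 19
print(10.0 / 2 ** (level + 1), 1 / 86400, 10.0 / 2 ** level)  # band edges around the daily frequency
```

Under this assumption, a 24 h cycle sampled at 10 Hz falls in detail level 19, which agrees with the level reported above.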


Index extraction
The length of the window should be determined based on the minimum duration of anomalous events. Considering the relatively long duration of structural damage (from the instant the damage occurs until maintenance), damage-induced events are not the controlling cases for determining the window length. In this study, overloading vehicle events are adopted as the governing cases to determine the length of the window.
The overall length of the Nanjing Dashengguan Yangtze River Bridge is 1288 m, and the significant positive part of the influence line of the studied stay cable, shown in Figure 10, spans approximately 905 m of the bridge. Moreover, based on the statistical results of heavy vehicle speeds on this bridge, also shown in Figure 10, the average speed of heavy vehicles is 52.6 km/h. The average active time window of overloading vehicle events is therefore calculated as 61.9 s, that is, 619 data points at the sampling frequency of 10 Hz. In this paper, the length of the time window is set to 60 s.
Figure 10. Duration time window discussion subject to overloading vehicle events.
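The arithmetic behind the active time window can be reproduced as follows. This is only a check of the figures quoted above; the function name is hypothetical.

```python
def active_window(influence_length_m, speed_kmh, sampling_hz):
    """Crossing time (s) over the influence length and the matching sample count."""
    duration_s = influence_length_m / (speed_kmh / 3.6)  # km/h converted to m/s
    return duration_s, round(duration_s * sampling_hz)

duration_s, n_points = active_window(905.0, 52.6, 10)
print(f"{duration_s:.1f} s, {n_points} points")  # 61.9 s, 619 points
```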
According to equation (3), the energy anomaly detection index along the timeline is extracted from the time history shown in Figure 9 and is plotted in Figure 11. Since the GPD analysis requires independent and identically distributed samples, the daily maximum index highlighted in Figure 11 is adopted for the trigger estimation. The extracted daily maximum indexes in 2007 are plotted in Figure 12 and form the database for the GPD analysis.
Figure 11. Extracted energy index on March 5, 2007.
Figure 12. Daily maximum energy index sequence in 2007.
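A minimal sketch of the index extraction is given here, assuming that the average energy is the mean of the squared samples in consecutive non-overlapping windows. The function name and the toy signals are hypothetical; a 60 s window at 10 Hz corresponds to 600 samples.

```python
def energy_index(signal, window_length):
    """Mean of squared samples in consecutive non-overlapping windows."""
    n_windows = len(signal) // window_length  # an incomplete trailing window is dropped
    indexes = []
    for w in range(n_windows):
        segment = signal[w * window_length:(w + 1) * window_length]
        indexes.append(sum(x * x for x in segment) / window_length)
    return indexes

baseline = [1.0] * 600      # one 60 s window of unit-amplitude signal
spiky = list(baseline)
spiky[300] = 50.0           # a single-sample spike
sustained = [5.0] * 600     # a sustained increase over the whole window

print(energy_index(baseline, 600), energy_index(spiky, 600), energy_index(sustained, 600))
```

In this toy example the single 50-unit spike raises the index from 1 to about 5.2, whereas a sustained 5-unit response raises it to 25, which illustrates why the index is less sensitive to isolated spikes than to events lasting the whole window.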

Trigger determination
To predict the anomaly detection trigger, the threshold first needs to be determined based on the characteristics of the mean excess function and the standardized residual. According to equations (6) and (7), the mean excess function and the standardized residual derived from the daily maximum indexes in 2007 are plotted in Figure 13. It is observed that when the threshold is 17 000, the corresponding standardized residual approaches its lowest point, and the mean residual life plot approximately follows a straight line. Therefore, the optimal threshold of the daily maximum energy indexes for the GPD analysis is set as 17 000.
Figure 13. Mean excess function and standardized residual.
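The empirical mean excess used in such a threshold check can be sketched as follows, assuming the standard definition as the average of the excesses over the threshold. The function name and the sample values are hypothetical and do not come from the monitoring data.

```python
def mean_excess(samples, threshold):
    """Empirical mean excess: average of (x - u) over samples with x > u."""
    excesses = [x - threshold for x in samples if x > threshold]
    if not excesses:
        raise ValueError("no samples exceed the threshold")
    return sum(excesses) / len(excesses)

daily_maxima = [12.0, 15.0, 18.0, 21.0, 30.0, 45.0]  # hypothetical values
for u in (10.0, 15.0, 20.0):
    print(u, mean_excess(daily_maxima, u))
```

Evaluating this function over a grid of candidate thresholds gives the mean residual life plot; a threshold is usually chosen in the range where the plot is approximately linear.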
Following the method of confidence interval estimation (i.e., the bootstrap), the shape and scale parameters subject to the 95%, 80%, 50%, and 20% confidence coefficients are estimated as (−0.1782, 8964.1), (−0.2346, 8289.2), (−0.285, 7728.5), and (−0.32, 7362), respectively. Histograms of the exceedances and the GPD fitting results corresponding to the four confidence coefficients are plotted in Figure 14.
Figure 14. Pareto distribution with various confidence coefficients.
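How the parameter pairs translate into quantile levels can be sketched with the standard GPD quantile of the excess distribution. The tail probability used here is a hypothetical value chosen only for illustration, since it depends on the guarantee rate, the reference period, and the exceedance rate; the function name is also an assumption.

```python
def gpd_quantile(threshold, shape, scale, tail_prob):
    """Level exceeded with probability tail_prob, given an exceedance of the threshold.

    Uses the standard GPD form and requires a non-zero shape parameter.
    """
    return threshold + scale / shape * (tail_prob ** (-shape) - 1.0)

THRESHOLD = 17000.0
TAIL_PROB = 5e-5  # hypothetical tail probability, for illustration only
PARAMETERS = {95: (-0.1782, 8964.1), 80: (-0.2346, 8289.2),
              50: (-0.285, 7728.5), 20: (-0.32, 7362.0)}

levels = {conf: gpd_quantile(THRESHOLD, shape, scale, TAIL_PROB)
          for conf, (shape, scale) in PARAMETERS.items()}
for conf, level in levels.items():
    print(f"{conf}% confidence: {level:.0f}")
```

With these parameter pairs the quantile level decreases as the confidence coefficient decreases, so a higher confidence coefficient corresponds to a higher trigger.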
According to equation (9), the triggers of anomaly trend detection with 95%, 80%, 50%, and 20% confidence coefficients are predicted as 58,671.47, 48,862.53, 42,499.48, and 39,035.2, respectively.
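Once the four triggers are available, rating a detection reduces to a threshold comparison. A minimal sketch follows, using the trigger values quoted above; the function name and the example index values are hypothetical.

```python
# Triggers from the text, ordered from the highest confidence level downwards.
TRIGGERS = [(95, 58671.47), (80, 48862.53), (50, 42499.48), (20, 39035.2)]

def detection_confidence(energy_index):
    """Highest confidence level (in %) whose trigger is exceeded, or None."""
    for level, trigger in TRIGGERS:
        if energy_index > trigger:
            return level
    return None

for value in (60000.0, 50000.0, 40000.0, 30000.0):
    print(value, detection_confidence(value))
```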
Case study 1: spike detection
Signal spikes are frequently observed in measured signals from SHM systems and are caused by electrical transients in voltage, current, or transferred energy in an electrical circuit. Spikes are not signals of interest for anomaly detection, and they may even be coupled with signals induced by anomalous scenarios, resulting in false detection. For instance, the cable force signals of the studied stay cable on 17 Jan. 2008 are influenced by a spike, as shown in Figure 15. False detection is observed if the absolute cable force value is employed as the index; the corresponding detection result is illustrated in Figure 16. Based on Figure 16, an anomalous event is detected with more than 95% confidence. In reality, however, the detection was triggered by the spike signal, and the structure operated as usual.
Figure 15. Measured cable force signals on Jan 17, 2008.
Figure 16. Detection result using absolute cable force index.

In addition, the cointegration residual of the monitored cable force was used as the index to examine its response to the spike. The details of using cointegration techniques to detect anomalies are presented in our previous paper (Fan et al., 2020). The detection result is shown in Figure 17. The spike again triggered the anomaly detection, resulting in a false alarm.
Figure 17. Detection result using cointegration residual.
Considering that spikes are characterized by a short duration and a large absolute value, the energy index is proposed in this paper to improve the tolerance of the index to spikes. The raw cable force signals shown in Figure 15 are again used to conduct the anomaly trend detection, and the detection result is shown in Figure 18. Compared with the absolute cable force index and the cointegration residual, the energy index is more tolerant to spikes. Thus, when the energy index is used, it is not necessary to remove spikes during data pre-processing.
Figure 18. Detection result using energy index.
Case study 2: overloading vehicle detection
Since no confirmed overloading vehicle scenarios were recorded, simulated cases are introduced in this study. It is assumed that three 100-ton overloading vehicles simultaneously cross the bridge at a constant speed of 60 km/h, and the corresponding simulated cable force signal of the stay cable NJX21 is shown in Figure 19. The signal is then merged into the real-time monitored cable force data at instant T on Jan 22, 2008 to simulate the event in which the overloading vehicles join the normal operational traffic flow. The simulated cable force signals after data pre-processing are shown in Figure 20, where the simulated overloading event is highlighted. Moreover, it is observed that between 3:00 a.m. and 6:00 a.m. the signals are concentrated around zero since the traffic volume is very low during this time window, which further verifies the effectiveness of the proposed thermal response separation methodology.
Figure 19. Simulated cable force signal with three 100-ton vehicles going through the bridge.
Figure 20. Simulated cable force signals after data pre-processing.

Similarly, the energy indexes are calculated and plotted in Figure 21 together with the triggers from point estimation and the four confidence levels. The details of the trigger calculation using the point estimation method are presented in our previous paper (Xu et al., 2020b). The trigger derived from point estimation is lower than that of the 20% confidence level and is therefore prone to false detection. Based on Figure 21, the anomalous event is detected using point estimation, and it is detected with more than 95% confidence using confidence interval estimation.
Figure 21. Detection result of the three 100-ton vehicles through the bridge.
Two additional overloading events are assumed, namely two 100-ton vehicles and a single 100-ton vehicle crossing the bridge, respectively. The detection results of these two overloading events are shown in Figure 22 and Figure 23. The event with two 100-ton vehicles is detected with more than 80% confidence, and the event with a single 100-ton vehicle is detected with almost 20% confidence. The level of detection confidence therefore increases with the total weight of the overloading vehicles. Although both overloading cases could also be detected using point estimation, the detection results from confidence interval estimation provide more information for decision-making.
Figure 22. Detection result of the two 100-ton vehicles through the bridge.
Figure 23. Detection result of the single 100-ton vehicle through the bridge.

Case study 3: snow disaster detection
Nanjing City suffered a heavy snow storm at the end of January 2008. With the accumulation of snow on the pavement of the Nanjing Dashengguan Yangtze River Bridge, the bridge gradually carried extra snow loads. The recorded cable force data of the stay cable NJX21 during the snow storm time window (Jan 26, 2008) are plotted in Figure 24. After data pre-processing and index extraction, the energy index on Jan 26, 2008 is plotted in Figure 25.
Figure 24. Raw cable force signal during the snow storm.
Figure 25. Detection result of the snow storm.

Based on Figure 25, the anomalous scenario induced by the snow storm is detected via point estimation, and it is detected with more than 80% confidence using confidence interval estimation. The anomalous scenario resulted from two facts: (1) the bridge carried extra snow loads as snow accumulated on the pavement; and (2) the traffic volume was extremely large when the bridge was re-opened, since the short-term shutdown of the bridge generated a large number of waiting vehicles. In view of structural safety, the bridge was shut down for the whole day on 27 Jan. 2008.
Conclusions
In this paper, a probabilistic anomaly trend detection method is developed for large span cable-supported bridges, where an energy index is proposed to achieve robust detection performance and confidence interval estimation is used to measure the uncertainty within the anomaly detection procedure. The concluding remarks are summarized as follows:
(1) Data pre-processing (i.e., missing data imputation and thermal response separation) is a critical step in the anomaly detection process. Moving time window mean imputation is adopted for missing data imputation, and the multi-resolution wavelet-based approach is applied to separate thermal effects from the recorded structural response signals.
(2) An energy index in the time domain is proposed for the probabilistic anomaly detection. Compared with the absolute value-based index, the energy index is more tolerant to spikes. Thus, spike removal is not necessary in the data pre-processing when the energy index is used.
(3) Confidence interval estimation is used to predict triggers with different confidence levels in the GPD analysis for anomaly detection, where four confidence levels (i.e., 95%, 80%, 50%, and 20%) are defined.
(4) The effectiveness of the anomaly detection methodology is verified using field cable force measurements of the Nanjing Dashengguan Yangtze River Bridge. The triggers with different confidence levels are derived based on the measurements in 2007. Three cases are presented in this study: spike detection, overloading vehicle detection, and snow disaster detection. The spike does not trigger a detection when the energy index is used. The simulated overloading events are all detected, with different confidence levels for different numbers of overloading vehicles. The snow disaster is detected on Jan 26, 2008 with more than 80% confidence.
In this study, a single index, the cable force, is used for the anomaly detection. In the future, a multi-index detection method would be promising for practical applications to ensure the structural and operational safety of bridges.
Footnotes
Acknowledgments
The authors thank the anonymous reviewers for their constructive comments that greatly improved the quality of this manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article was supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2242021k30034, the Natural Science Foundation of Jiangsu Province under Grant No. BK20181278, Transportation Science Research Project in Jiangsu under Grant No. 2019Z02, and Special Research Fund for Academicians of China Communications Group under Grant No. YSZX-03–2020-01-B.
