Bridge structural damage identification model based on data preprocessing and real-time monitoring data

Abstract

With rapid economic development and a rapid increase in the number of vehicles, bridge health monitoring has become increasingly important. Using information technology to analyze monitoring data and identify damage to bridge structures can help ensure bridge safety and thereby avoid traffic accidents. However, current data analysis and damage identification methods suffer from poor real-time performance and low accuracy. In this study, an improved support vector machine algorithm is developed for classifying real-time monitoring data, and a bridge structural damage identification model is proposed based on the improved support vector machine and data preprocessing. In comparison with other algorithms, the improved support vector machine algorithm achieved an accuracy of 97.4% and a precision of 95.7%, outperforming the comparison algorithms. A performance comparison between the proposed identification model and other models showed that the proposed model had a mean running time of 38.1 s, better than the comparison models. These results demonstrate that the improved support vector machine algorithm and the identification model are effective: they can improve the efficiency of bridge monitoring data analysis and the accuracy of bridge structural damage identification, providing a theoretical basis for bridge structural damage identification.

Keywords

PCA; SVM; PSO; bridge structure damage identification

Introduction

As a key component of the transportation network, bridges are exposed over long periods to environmental and operational actions such as temperature changes and vehicle loads, which lead to structural aging, material degradation, and surface cracking and in turn affect their safety and traffic capacity.¹ Damage to or collapse of a bridge can severely disrupt traffic and the economy and even threaten public safety, with profound social consequences. Identifying damage to bridge structures is therefore particularly important.² Nevertheless, as the volume and variety of bridge data samples grow, current data analysis and damage identification methods suffer from poor real-time performance and low accuracy.³ Although many researchers have studied this problem, the results remain unsatisfactory. For example, to address the limitations of existing methods for identifying damage to bridge rubber bearings and the lack of related research, Huang M.'s team proposed a method that combines prior knowledge of bearing damage localization with swordfish optimization. Experiments showed that the method could identify the severity of bearing damage, but its recognition accuracy was low.⁴ Daneshvar et al. proposed an optimized iterative regularization method to address the low accuracy caused by noise in current bridge structural damage identification (BSDI) methods. Numerical simulations verified that the method can identify bridge damage, but it does not address the large volume of bridge-related data or the poor real-time performance of recognition.⁵ Other studies have achieved better results. For example, Mousavi A A's team proposed a structural health monitoring and damage detection method based on complete ensemble empirical mode decomposition to address the low accuracy of bridge structural damage identification; experiments showed that the method was robust and more sensitive in damage identification and detection.⁶ In research on BSDI systems based on hybrid optimization, combining meta-heuristic algorithms with machine learning models has become one of the mainstream approaches to improving identification accuracy. For instance, Pham V T et al. proposed a hybrid model integrating Particle Swarm Optimization (PSO) with a gradient boosting classification algorithm for cable-stayed bridges and verified its effectiveness on an existing health monitoring system.⁷ In research on BSDI based on deep learning, Corbally R and Malekjafarian A proposed a new data-driven deep learning framework for bridge condition monitoring to improve the accuracy of identifying damage to bridge clamps and bearings; experimental results showed that the framework identified both damage category and damage severity with good accuracy.⁸

With rapid economic development, the numbers of cars and freight vehicles are increasing quickly, making real-time analysis of bridge health data and damage identification increasingly important. A method that analyzes data quickly and detects damage to bridge structures accurately and in real time is critical to the safety of people and property. Support Vector Machines (SVM) generalize well and are resistant to overfitting, and are therefore widely applied.⁹ However, SVM performance depends on parameters whose values are not known in advance, such as the choice of kernel function and the penalty factor. PSO is a meta-heuristic optimization algorithm with strong global search capability and adaptability, which can compensate for this shortcoming of SVM.¹⁰ Principal Component Analysis (PCA) is an unsupervised dimensionality reduction method that is simple to implement and is widely used in data analysis and data compression.¹¹ Related work exists in other domains. For example, Zheng Q's team proposed a multi-feature chatter detection system combining PSO and SVM to address the low accuracy of chatter recognition in machine tools; compared with other systems, it achieved higher recognition accuracy.¹² Similarly, to address poor real-time sales prediction, Zhang et al. proposed a prediction model based on PCA and a back-propagation neural network, which experiments showed to be robust.¹³

This review indicates that few methods are currently applied to the analysis of bridge health monitoring data and that the accuracy of bridge structural damage identification remains low. The study therefore combines the PSO and SVM algorithms to construct an improved SVM algorithm. The Random Forest (RF), PCA, and K-means algorithms are used to preprocess the bridge structure data. Finally, the improved SVM algorithm is applied to the preprocessed data to construct a BSDI model that combines PCA and SVM. The aim is to improve the real-time performance and accuracy of BSDI.

The innovation of this research lies in combining the PSO algorithm with SVM to optimize the key parameters of SVM, and in preprocessing bridge structure data collaboratively with RF, PCA, and K-means, on which basis a BSDI model combining PCA and SVM is constructed. This approach improves both the accuracy of damage identification and the generalization ability of the model, and strengthens real-time monitoring capability. The research thus provides a new means of bridge health monitoring. Experimental verification and performance comparison show that the proposed method outperforms the compared existing methods, providing a new theoretical and practical basis for the field of BSDI.

The contribution of this research lies in the significant improvement of the efficiency and reliability of damage identification by developing a new type of bridge health monitoring technology. This technology utilizes advanced data processing and machine learning algorithms to analyze bridge structure data, achieving rapid and accurate detection of damage. In addition, the research has provided a set of practical tools and methods for the field of bridge monitoring, which can be directly applied to actual bridge maintenance work, enhancing the practicality and operability of the monitoring system and providing a strong guarantee for the safe operation and long-term health of bridges.

Methods and materials

Construction of improved SVM algorithm for real-time monitoring data classification

In recent years, installing sensors on bridges and using machine learning algorithms for health monitoring has become widespread.¹⁴ However, as the volume and variety of bridge data samples grow, current data analysis and BSDI methods suffer from poor real-time performance and low accuracy.¹⁵ SVM generalizes well, resists overfitting, and is widely used in fields such as image recognition and text recognition. The goal of SVM is to find the optimal separating hyperplane, namely the one that maximizes the margin between classes.¹⁶ A schematic of this hyperplane is shown in Figure 1.

Figure 1.

Schematic diagram of the optimal hyperplane for SVM.

In Figure 1, $G$ is the optimal hyperplane $w \cdot x + ϖ = 0$, and $G_{1}$ and $G_{2}$ are the parallel planes closest to it, $w \cdot x + ϖ = 1$ and $w \cdot x + ϖ = -1$, which pass through the support vectors. Here $ϖ$ is the bias term of the hyperplane, $w$ is its normal vector, and $x$ is the sample feature vector. $Margin$ is twice the minimum distance from the support vectors to the hyperplane, i.e. $2 / \lVert w \rVert$. The support vectors are therefore the samples closest to the optimal hyperplane, and finding the optimal hyperplane can be cast as the convex quadratic programming problem in formula (1).

\begin{cases} \min_{w, ϖ, ψ} \frac{1}{2} w^{T} w + b \sum_{i = 1}^{n} ψ_{i} \\ \text{s.t.} \begin{cases} zd_{i} [w \cdot ϕ(x_{i}) + ϖ] - 1 + ψ_{i} \geq 0 \\ ψ_{i} \geq 0, \quad i = 1, 2, \dots, n \end{cases} \end{cases}

(1)

In formula (1), $n$ is the total number of samples, $zd_{i}$ is the class label of the $i$th sample, $x_{i}$ is the feature vector of the $i$th sample, $ϕ(\cdot)$ is the mapping into the kernel feature space, $ψ_{i}$ is the slack (relaxation) variable of the $i$th sample, and $b$ is the penalty factor. This problem can be solved with the Lagrange function in formula (2).

L_{(w, ϖ, ψ, la, β)} = \frac{1}{2} w^{T} w + b \sum_{i = 1}^{n} ψ_{i} - \sum_{i = 1}^{n} la_{i} \{ zd_{i} [w \cdot ϕ(x_{i}) + ϖ] - 1 + ψ_{i} \} - \sum_{i = 1}^{n} β_{i} ψ_{i}

(2)

In formula (2), $la_{i} \geq 0$ and $β_{i} \geq 0$ are Lagrange multipliers, and $L_{(w, ϖ, ψ, la, β)}$ is the Lagrange function. Setting the partial derivatives of $L$ with respect to $w$, $ϖ$, and $ψ_{i}$ to zero and substituting the results back into formula (2) yields the dual problem in formula (3).

\begin{cases} \max_{la} \sum_{i = 1}^{n} la_{i} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} la_{i} la_{j} zd_{i} zd_{j} k(x_{i}, x_{j}) \\ \text{s.t.} \begin{cases} \sum_{i = 1}^{n} la_{i} zd_{i} = 0 \\ 0 \leq la_{i} \leq b, \quad i = 1, 2, \dots, n \end{cases} \end{cases}

(3)

In formula (3), $n$ is the number of samples, $zd_{j}$ is the class label of the $j$th sample, $la_{j}$ is the Lagrange multiplier of the $j$th sample, and $k(x_{i}, x_{j}) = ϕ(x_{i}) \cdot ϕ(x_{j})$ is the kernel function. Finally, solving for $la_{i}$ gives the decision function $f(x)$ in formula (4).

f(x) = \mathrm{sign}\left( \sum_{i = 1}^{n} zd_{i} la_{i} k(x_{i}, x) + ϖ \right)

(4)
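As a sketch of how formula (4) is evaluated once the dual problem (3) has been solved, the function below computes the decision value with a radial basis function kernel. It assumes the RBF is parameterized as $k(x, y) = \exp(-g \lVert x - y \rVert^{2})$, and the support vectors, multipliers $la_{i}$, and bias $ϖ$ in the example are hypothetical toy values rather than trained ones; solving the quadratic program itself is out of scope here.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], g: float) -> float:
    """RBF kernel k(x, y) = exp(-g * ||x - y||^2) (assumed parameterization)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-g * sq_dist)


def svm_decision(x, support_vectors, labels, multipliers, bias, g):
    """Formula (4): f(x) = sign(sum_i zd_i * la_i * k(x_i, x) + bias).

    labels are the class labels zd_i in {-1, +1}; multipliers are la_i.
    A score of exactly 0 is mapped to +1 by convention in this sketch.
    """
    score = bias + sum(
        zd * la * rbf_kernel(sv, x, g)
        for sv, zd, la in zip(support_vectors, labels, multipliers)
    )
    return 1 if score >= 0 else -1


# Hypothetical toy support vectors, one per class.
support_vectors = [[1.0, 0.0], [-1.0, 0.0]]
labels = [1, -1]
multipliers = [1.0, 1.0]
print(svm_decision([0.9, 0.0], support_vectors, labels, multipliers, 0.0, 0.5))   # prints 1
print(svm_decision([-0.9, 0.0], support_vectors, labels, multipliers, 0.0, 0.5))  # prints -1
```

For multi-class labels such as the three bridge states used later, a binary decision function of this kind is usually combined through a one-vs-one or one-vs-rest scheme; the section does not state which scheme is used.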

However, SVM performance depends on parameters that must be chosen in advance, such as the kernel function and the penalty factor. These parameters control the complexity and width of the SVM decision boundary and therefore determine the performance of the SVM model. To help SVM select optimal parameters, the study uses the PSO algorithm. The PSO flow is shown in Figure 2.¹⁷

Figure 2.

PSO algorithm flow.

As shown in Figure 2, the PSO algorithm first initializes parameters such as the swarm size and the initial velocity and position of each particle. Next, the fitness value of each particle is calculated, and the best position found by each particle (the individual optimal solution) and the best position found by the whole swarm (the global optimal solution) are recorded. A new population is then generated by updating the velocity and position of every particle. The velocity update is given by formula (5).

v_{χd}(υ + 1) = u \, v_{χd}(υ) + δ_{1} \, \mathrm{rand}() \, (p_{χd} - o_{χd}(υ)) + δ_{2} \, \mathrm{rand}() \, (p_{gd} - o_{χd}(υ))

(5)

In formula (5), $χ$ and $d$ are the particle index and the dimension index, and $υ$ is the iteration number. $v_{χd}(υ)$ and $o_{χd}(υ)$ are the velocity and position of the $χ$th particle in the $d$th dimension at iteration $υ$, and $v_{χd}(υ + 1)$ is its velocity at iteration $υ + 1$. $p_{χd}$ is the best position found so far by the $χ$th particle (the individual optimal solution), and $p_{gd}$ is the best position found so far by the whole population (the global optimal solution). $δ_{1}$ and $δ_{2}$ are acceleration constants, $\mathrm{rand}()$ returns a uniform random number in $[0, 1]$, and $u$ is the inertia weight. The position update is given by formula (6).

o_{χ d} (υ + 1) = o_{χ d} (υ) + v_{χ d} (υ + 1)

(6)

In formula (6), $o_{χd}(υ + 1)$ is the position of the $χ$th particle in the $d$th dimension at iteration $υ + 1$. The fitness of each particle at its new position is then calculated and compared with the particle's individual optimal solution and the population's global optimal solution. If the new fitness is better, the stored optimum is replaced by the new position; otherwise it is left unchanged. If the termination condition is satisfied, the global optimal solution is output; otherwise, iteration continues until the condition is met. To balance global exploration and local refinement in the swarm, the study adopts a dynamic inertia weight, given in formula (7).

ω (t) = ω_{\max} - (ω_{\max} - ω_{\min}) \frac{t}{T_{\max}}

(7)

In formula (7), $ω_{\max}$ and $ω_{\min}$ are the initial (maximum) and final (minimum) inertia weights, $t$ is the current iteration, $T_{\max}$ is the maximum number of iterations, and $ω(t)$ is the current inertia weight, which takes the place of the fixed inertia weight $u$ in formula (5). Introducing this PSO algorithm into SVM yields the improved SVM algorithm, whose flow is shown in Figure 3.
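The update rules in formulas (5) to (7) can be sketched in a few lines of dependency-free Python. The defaults $ω_{\max} = 0.9$ and $ω_{\min} = 0.4$ are assumed values chosen for illustration, and `rng` may be any object with a `random()` method; the hypothetical `FixedRandom` helper pins the random factors so that one update step is reproducible.

```python
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Formula (7): linearly decreasing inertia weight omega(t)."""
    return w_max - (w_max - w_min) * t / t_max


def pso_step(position, velocity, pbest, gbest, w, c1, c2, rng=random):
    """One PSO update of a single particle.

    Velocity follows formula (5), with w playing the role of u and c1, c2 the
    acceleration constants delta_1, delta_2; position follows formula (6).
    """
    new_velocity, new_position = [], []
    for d in range(len(position)):
        r1, r2 = rng.random(), rng.random()  # fresh rand() draws per dimension
        v = (w * velocity[d]
             + c1 * r1 * (pbest[d] - position[d])
             + c2 * r2 * (gbest[d] - position[d]))
        new_velocity.append(v)
        new_position.append(position[d] + v)
    return new_position, new_velocity


class FixedRandom:
    """Hypothetical stand-in generator that always returns 0.5."""

    def random(self):
        return 0.5


w_half = inertia_weight(t=150, t_max=300)  # halfway through a 300-iteration run
x, v = pso_step([0.0], [1.0], pbest=[2.0], gbest=[4.0],
                w=0.5, c1=1.5, c2=1.5, rng=FixedRandom())
print(w_half, x, v)
```

With both random factors fixed at 0.5, the new velocity is $0.5 \cdot 1 + 1.5 \cdot 0.5 \cdot 2 + 1.5 \cdot 0.5 \cdot 4 = 5$, and the particle moves from 0 to 5.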

Figure 3.

Flow of improved SVM algorithm.

In Figure 3, $C$ is the penalty factor (denoted $b$ in formula (1)) and $g$ is the kernel function parameter. The improved SVM algorithm begins by initializing the particle swarm, where each particle represents a candidate pair of SVM parameters. The SVM is then trained with each particle's parameters, and each particle's fitness is calculated from the SVM's prediction accuracy; the individual optimal solution of each particle and the global optimal solution of the population are updated accordingly. The velocities and positions of the particles are then updated according to the PSO rules to search for better parameter combinations in subsequent iterations. This process repeats until the preset maximum number of iterations is reached or the termination condition is met, at which point the algorithm returns the optimal SVM parameters $C$ and $g$. Finally, the SVM is trained with these optimal parameters to obtain the final model.
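The loop in Figure 3 can be sketched with NumPy as a box-constrained PSO that maximizes a fitness over $(C, g)$. In the paper the fitness is the SVM's validation accuracy; to keep this example self-contained, `toy_fitness` is a hypothetical stand-in surface that peaks at $C = 20$, $g = 5$. The search box $[0.1, 100]$, swarm size 40, 300 iterations, and learning factors of 1.5 follow the settings reported in the Results section, while $ω_{\min} = 0.4$ and the velocity clamp of 20% of the search range are assumptions of this sketch.

```python
import numpy as np


def pso_maximize(fitness, bounds, num_particles=40, max_iter=300,
                 w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(position) over a box with PSO and a linearly decreasing
    inertia weight (formulas (5)-(7)). Returns (best_position, best_fitness)."""
    rng = np.random.default_rng(seed)
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    dim = len(bounds)
    v_max = 0.2 * (high - low)  # velocity clamp (assumption, not from the paper)

    pos = rng.uniform(low, high, size=(num_particles, dim))
    vel = np.zeros((num_particles, dim))
    fit = np.array([fitness(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    best = int(np.argmax(fit))
    gbest, gbest_fit = pos[best].copy(), fit[best]

    for t in range(max_iter):
        w = w_max - (w_max - w_min) * t / max_iter  # formula (7)
        r1 = rng.random((num_particles, dim))
        r2 = rng.random((num_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # formula (5)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, low, high)  # formula (6), kept inside the box
        fit = np.array([fitness(p) for p in pos])

        improved = fit > pbest_fit  # update individual optima
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        best = int(np.argmax(pbest_fit))  # update global optimum
        if pbest_fit[best] > gbest_fit:
            gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]
    return gbest, gbest_fit


def toy_fitness(params):
    """Hypothetical stand-in for the validation accuracy of an SVM with (C, g)."""
    C, g = params
    return 1.0 - ((C - 20.0) ** 2 + (g - 5.0) ** 2) / 1e4


best_params, best_fit = pso_maximize(toy_fitness, bounds=[(0.1, 100.0), (0.1, 100.0)])
print(best_params, best_fit)
```

With scikit-learn available, the stand-in could be replaced by something like `lambda p: SVC(C=p[0], gamma=p[1]).fit(X_train, y_train).score(X_val, y_val)`, where `X_train`, `y_train`, `X_val`, and `y_val` are hypothetical arrays, and mapping $g$ to `gamma` assumes the RBF form $\exp(-g \lVert x - y \rVert^{2})$.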

Design of bridge structural damage identification model based on improved SVM and data preprocessing

Although the improved SVM algorithm can identify bridge structural damage, bridge monitoring data still suffer from high noise, inconsistent dimensions, and excessive volume. An effective data processing method is therefore important for improving the efficiency of bridge monitoring. The study first divides the collected bridge data into two datasets: linear acceleration and gravitational acceleration. Timestamps are used as the index of the data samples, and duplicate records in both datasets are removed, keeping only the last record for each duplicated timestamp. Next, the RF algorithm is used to fill in missing values. RF is an ensemble learning algorithm that offers feature importance assessment and strong robustness and is widely used in natural language processing and image recognition. To fill missing values, RF builds multiple decision tree models, each predicting a missing feature from the remaining features.¹⁸ The PCA algorithm is then used to reduce the dimensionality of the data. The PCA algorithm flow is shown in Figure 4.
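The de-duplication rule described above (index by timestamp, keep the last record) can be sketched in plain Python. The `(timestamp, values)` record layout is a hypothetical choice for illustration; with pandas and a timestamp index, `df[~df.index.duplicated(keep="last")]` expresses the same rule.

```python
def dedupe_keep_last(records):
    """Drop rows with duplicate timestamps, keeping the last occurrence.

    records: iterable of (timestamp, values) pairs in acquisition order.
    Returns the surviving pairs sorted by timestamp.
    """
    latest = {}
    for timestamp, values in records:
        latest[timestamp] = values  # a later row overwrites an earlier one
    return sorted(latest.items(), key=lambda item: item[0])


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
print(dedupe_keep_last(rows))  # [(1, 'c'), (2, 'e'), (3, 'd')]
```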

Figure 4.

PCA algorithm flow.

As shown in Figure 4, PCA starts from $m$ input samples, each with $l$ features, arranged as the data matrix in formula (8).

H = \begin{bmatrix} h_{11} & \cdots & h_{1l} \\ \vdots & \ddots & \vdots \\ h_{m1} & \cdots & h_{ml} \end{bmatrix}

(8)

In formula (8), $H$ is the data matrix and $h_{ml}$ is the value of the $l$th feature of the $m$th sample. Next, all samples are standardized as in formula (9).

\tilde{h}_{ηι} = \frac{h_{ηι} - \bar{h}_{ι}}{s_{ι}}

(9)

In formula (9), $η$ and $ι$ are the sample index and the feature index, respectively. $\tilde{h}_{ηι}$ is the standardized value of the $ι$th feature of the $η$th sample, $\bar{h}_{ι}$ is the mean of the $ι$th feature, and $s_{ι}$ is the standard deviation of the $ι$th feature. Next, the covariance matrix $R$ is calculated as in formula (10).

R = \begin{bmatrix} r_{11} & \cdots & r_{1l} \\ \vdots & \ddots & \vdots \\ r_{l1} & \cdots & r_{ll} \end{bmatrix}

(10)

In formula (10), $r_{ηι}$ is the element in row $η$ and column $ι$ of the covariance matrix, computed as $r_{ηι} = \frac{1}{m - 1} \sum_{γ = 1}^{m} \tilde{h}_{γη} \tilde{h}_{γι}$, where $\tilde{h}_{γη}$ and $\tilde{h}_{γι}$ are the standardized values of the $η$th and $ι$th features of the $γ$th sample. Next, the eigenvalues $TZ$ and eigenvectors $ϑ$ of the covariance matrix are calculated; they satisfy $R ϑ = TZ \, ϑ$. The eigenvalues are sorted in descending order: the eigenvector of the largest eigenvalue defines the first principal component, the eigenvector of the second-largest eigenvalue defines the second, and so on. The principal components are expressed by formula (11).

\begin{cases} A_{1} = B_{11} \tilde{h}_{1} + B_{21} \tilde{h}_{2} + \cdots + B_{l1} \tilde{h}_{l} \\ \quad \vdots \\ A_{k} = B_{1k} \tilde{h}_{1} + B_{2k} \tilde{h}_{2} + \cdots + B_{lk} \tilde{h}_{l} \end{cases}

(11)

In formula (11), $\tilde{h}_{1}, \dots, \tilde{h}_{l}$ are the standardized features, $B_{1}, B_{2}, \dots, B_{k}$ are the eigenvectors corresponding to the sorted eigenvalues (with $B_{ιj}$ the $ι$th entry of $B_{j}$), and $A_{1}$ and $A_{k}$ are the first and $k$th principal components, respectively. A number $k$ of principal components smaller than the total number of features $l$ is retained, and the composite assessment values are calculated and ranked. Finally, the contribution rate of each eigenvalue is calculated with formula (12).

D_{ι} = \frac{TZ_{ι}}{\sum_{γ = 1}^{l} TZ_{γ}}

(12)

In formula (12), $D_{ι}$ is the contribution rate of the $ι$th eigenvalue, and the cumulative contribution rate of the retained principal components is the sum of their contribution rates. After dimensionality reduction, the data are clustered to support efficient identification of bridge structural damage. The study uses the K-means algorithm for this purpose. K-means is an unsupervised clustering algorithm that divides a dataset into K clusters so that data points in the same cluster are as similar as possible and data points in different clusters are as different as possible. It measures the difference between points by the distance between them, here the Euclidean distance in formula (13).

U(SJ, Cu) = \sqrt{\sum_{E = 1}^{V} (SJ_{E} - Cu_{E})^{2}}

(13)

In formula (13), $U(SJ, Cu)$ is the Euclidean distance between the data point $SJ$ and the cluster center $Cu$, $V$ is the number of dimensions of the data, and $E$ is the dimension index. The contour coefficient was then computed for K-means with different values of K, and clustering performance was best with K = 3. The contour coefficient (also called the silhouette coefficient) evaluates clustering performance by combining the "internal similarity" and "external separability" of each sample point. It is calculated as in formula (14).

F = \frac{Y - Z}{\max (Z, Y)}

(14)

In formula (14), $F$ is the contour coefficient, $Y$ is the mean distance from a sample point to all points in the nearest other cluster, and $Z$ is the mean distance from the sample point to all other points in its own cluster. Using the preprocessing steps above and PCA, the study processes and reduces the dimensionality of the structural health monitoring dataset of the Golden Gate Bridge in the United States and obtains the relationships between linear acceleration and gravitational acceleration along the horizontal axis $S$, longitudinal axis $J$, and vertical axis $Z$ of the data, as shown in Figure 5.
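Formulas (13) and (14) can be combined into a small K-selection sketch: a plain Lloyd-style K-means using the Euclidean distance of formula (13), and the mean contour (silhouette) coefficient of formula (14), evaluated for several K. The toy data of three well-separated point groups are hypothetical. The coefficient needs at least two clusters, so the sketch starts at K = 2; keeping an empty cluster's previous center and scoring singleton clusters as 0 are conventions chosen here, not details from the paper.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with Euclidean distance (formula (13))."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)  # an empty cluster keeps its previous center
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def mean_contour_coefficient(X, labels):
    """Average of F = (Y - Z) / max(Z, Y) over all points (formula (14)).

    Z: mean distance to the other points of the same cluster.
    Y: mean distance to the points of the nearest other cluster.
    Requires at least two non-empty clusters.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        # D[i, i] = 0, so dividing by (count - 1) averages over the other points.
        Z = D[i, same].sum() / (same.sum() - 1)
        Y = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(Z, Y)
        scores.append(0.0 if denom == 0 else (Y - Z) / denom)
    return float(np.mean(scores))


X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 0], [10, 1], [11, 0],
              [0, 10], [0, 11], [1, 10]], dtype=float)
scores = {k: mean_contour_coefficient(X, kmeans(X, k)[0]) for k in range(2, 6)}
print(scores)
print(max(scores, key=scores.get))
```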

Figure 5.

Relationship between linear acceleration and gravitational acceleration along the horizontal, longitudinal, and vertical axes.

In Figure 5, $φ S$ and $φ J$ are the linear accelerations along the horizontal and longitudinal axes, $φ Z$ is the linear acceleration along the vertical axis, $g φ S$ and $g φ J$ are the gravitational accelerations along the horizontal and longitudinal axes, and $g φ Z$ is the gravitational acceleration along the vertical axis. Figure 5(a) shows that the linear accelerations along the horizontal and vertical axes are both concentrated around the origin, with similar dispersion across intervals. Figure 5(b) shows that the linear acceleration along the vertical axis is less dispersed than along the horizontal axis. Figures 5(c) and (d) compare how uniformly the gravitational acceleration is distributed along the horizontal and vertical axes. The study combines these results with cluster analysis to label the dataset, dividing the bridge status into three categories. $Safe$ indicates that the gravitational acceleration in both directions is below the 25th percentile. $Normal$ indicates that the gravitational acceleration in both directions is between the 25th and 75th percentiles. $Damage$ indicates that at least one of the horizontal or vertical gravitational accelerations is above the 75th percentile. On this basis, the study combines the improved SVM algorithm with the PCA algorithm to construct a bridge structural damage identification model based on improved SVM and data preprocessing, shown in Figure 6.
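The three labeling rules above can be written as vectorized percentile thresholds. This sketch assumes the percentiles are computed separately for the horizontal and vertical gravitational accelerations over the whole record, since the text does not say over which window. Because the stated rules do not cover every combination (for example, one direction below the 25th percentile and the other between the 25th and 75th), such samples are marked `Unassigned` rather than guessed.

```python
import numpy as np


def label_bridge_state(g_horizontal, g_vertical):
    """Label samples Safe / Normal / Damage from percentile thresholds."""
    gh = np.asarray(g_horizontal, dtype=float)
    gv = np.asarray(g_vertical, dtype=float)
    h25, h75 = np.percentile(gh, [25, 75])
    v25, v75 = np.percentile(gv, [25, 75])

    # Safe: both directions below the 25th percentile.
    safe = (gh < h25) & (gv < v25)
    # Normal: both directions between the 25th and 75th percentiles.
    normal = (h25 <= gh) & (gh <= h75) & (v25 <= gv) & (gv <= v75)
    # Damage: at least one direction above the 75th percentile.
    damage = (gh > h75) | (gv > v75)

    labels = np.full(gh.shape, "Unassigned", dtype=object)
    labels[safe] = "Safe"
    labels[normal] = "Normal"
    labels[damage] = "Damage"
    return labels


g = np.arange(1, 9)  # hypothetical readings 1..8; P25 = 2.75, P75 = 6.25
print(label_bridge_state(g, g))
```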

Figure 6.

Bridge structural damage identification model based on improved SVM and data preprocessing.

As shown in Figure 6, the model first uses timestamps as the index of the data samples and removes duplicate records from the bridge dataset, keeping only the last record for each duplicated timestamp. Next, the RF algorithm fills in missing values. The PCA algorithm then reduces the dimensionality of the data, with its standardization step putting all features on a consistent scale. The dimensionality-reduced data are then clustered into three categories with the K-means algorithm. Finally, the improved SVM algorithm classifies the data, thereby achieving BSDI. The pseudo-code of the model is shown in Table 1.

Table 1.

Pseudo-code for bridge damage identification model.

Identification model
Algorithm: Enhanced SVM algorithm with PSO
Input: dataset D; missing-value threshold p_missing; PCA variance retention variance_retention; number of particles num_particles; maximum iterations max_iterations; initial inertia weight w_max; final inertia weight w_min; learning factors c1, c2; penalty factor range PF_range; kernel function parameter range KF_range
Output: optimized SVM model
Begin
1. Load dataset D
2. If the percentage of missing values in D > p_missing then
   a. Use RF to fill missing values in D
3. Split D into a training set (80%) and a test set (20%)
4. Further split the training set into training (90%) and validation (10%) subsets
5. Fit PCA on the training subset, retaining variance_retention of the variance
   a. Reduce dimensions to 87 (or the number implied by variance_retention)
   b. Apply the fitted PCA transformation to the validation and test sets
6. Initialize PSO
   a. Set particle positions randomly within PF_range and KF_range
   b. Set particle velocities to zero
   c. Evaluate the fitness (SVM accuracy) of each particle on the validation set
   d. Initialize the personal best (pbest) and global best (gbest) positions
7. For iteration = 1 to max_iterations do
   a. Compute inertia weight w by linear decrease from w_max to w_min
   b. For each particle do
      i. Update velocity and position using the PSO equations
      ii. Evaluate the new fitness (SVM accuracy) on the validation set
      iii. If the new fitness is better than pbest, update pbest
      iv. If the new fitness is better than gbest, update gbest
8. Train the SVM with the gbest parameters on the full training set
9. Test the optimized SVM model on the test set
10. Output the optimized SVM model and its performance metrics
End

Results

Performance analysis of algorithms

To analyze the effect of the improved SVM algorithm, comparative performance experiments were conducted using accuracy, F1 score, and Mean Squared Error (MSE) as metrics. Take 50% of the cross-validation average of all results. The comparison algorithms were CNN-LSTM, GA-BP, and PSO-LSTM. The parameter settings were as follows: the particle swarm size was 40, the search interval for the penalty factor and the kernel function parameter was [0.1, 100], the maximum number of iterations was 300, the learning factor was 1.5, the initial maximum inertia weight and the final minimum inertia weight were 0.9 respectively, and the kernel function was the radial basis function.

The dataset was the structural health monitoring dataset of the Golden Gate Bridge in the United States, available at https://www.kaggle.com/datasets/mrcity/golden-gate-accel-20180512, with a total of 23,749 data points. Of these, 20% formed the test set and the rest the training set, of which 10% was held out as a validation set to control early stopping of PSO. Data were collected using the Physical Sensor Toolbox suite on an Android V18.6 platform, which monitored real-time variations in the surrounding magnetic field and converted them into electrical signals for display and storage. The data collection point was positioned at the midpoint of the span, directly below the line connecting the two bridge towers, to capture critical structural health indicators. The sampling interval was 60 s, so magnetic field data were recorded once per minute; this rate balanced timeliness against storage requirements while still recording dynamic changes in the magnetic field.

During preprocessing, the dataset was indexed by time series to identify and remove duplicate entries, keeping the last record for each duplicated timestamp to ensure data consistency. Missing values, which accounted for 3.8% of the data, were imputed with the RF algorithm, which handles missing data effectively and improves the robustness of the dataset. PCA and K-means were then used to reduce dimensionality and cluster the data. PCA retained 95% of the cumulative variance, reducing the data to 87 dimensions, and also reduced noise by discarding less significant components. The K value of K-means was 3. The damage scenarios in this study included surface cracking, stiffness changes, and support failures.

The CNN-LSTM network consisted of two convolutional layers, each with 32 3 × 3 filters, followed by a ReLU activation function and a 2 × 2 max pooling layer; two stacked LSTM layers with 64 units each; a 128-unit fully connected layer followed by a ReLU activation function; and a softmax output layer for multi-class classification. It was trained with the Adam optimizer at a learning rate of 0.001, using categorical cross-entropy loss, a batch size of 32, and 50 epochs, with training stopped early if the loss did not improve for 10 epochs. The GA-BP algorithm used a population of 50 individuals, a mutation rate of 0.1, and a crossover rate of 0.8, and terminated after 100 generations or when validation performance had not improved for 20 generations. The PSO-LSTM algorithm used 30 particles, an inertia weight of 0.4, and learning factors of 0.2. All comparison algorithms used the same dataset and preprocessing as the proposed method. The experimental environment of this study is listed in Table 2.
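The PCA steps in formulas (9) to (12), together with the 95% cumulative-variance rule reported above, can be sketched with NumPy's symmetric eigensolver. This illustrates only the component-selection rule on small hypothetical matrices; it does not reproduce the 87-dimension result, and the guard for zero-variance features is an added assumption.

```python
import numpy as np


def pca_fit(H, variance_retention=0.95):
    """Standardize, eigendecompose the covariance matrix, and keep the fewest
    components whose cumulative contribution rate reaches variance_retention."""
    H = np.asarray(H, dtype=float)
    mean = H.mean(axis=0)
    std = H.std(axis=0, ddof=1)
    std[std == 0] = 1.0                     # guard for constant features (assumption)
    Z = (H - mean) / std                    # formula (9)
    R = Z.T @ Z / (H.shape[0] - 1)          # formula (10)
    eigvals, eigvecs = np.linalg.eigh(R)    # solves R v = TZ v
    order = np.argsort(eigvals)[::-1]       # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()  # formula (12)
    k = int(np.searchsorted(np.cumsum(contribution), variance_retention)) + 1
    k = min(k, len(eigvals))
    return mean, std, eigvecs[:, :k], contribution


def pca_transform(H, mean, std, components):
    """Project standardized data onto the retained components (formula (11))."""
    return ((np.asarray(H, dtype=float) - mean) / std) @ components


# Two perfectly correlated features: one component carries all the variance.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
H = np.column_stack([x, 2 * x])
mean, std, comps, contrib = pca_fit(H)
print(comps.shape, pca_transform(H, mean, std, comps).shape)
```

Using `eigh` rather than a general eigensolver exploits the symmetry of $R$, which guarantees real eigenvalues and orthonormal eigenvectors.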

Table 2.

Specific experimental environment of this study.

Item	Specification
CPU	Intel Core i9-14900KS
Main frequency	6.2 GHz
Memory	32 GB
Hard disk capacity	1 TB
Operating system	Windows 10 (64-bit)
MATLAB version	MATLAB R2022b
Data analysis software	SPSS 27.0

To verify the contribution of RF imputation to the improved SVM algorithm, KNN imputation and matrix completion were also applied to fill missing values, and the resulting performance was compared. The results are shown in Table 3.

Table 3.

Performance of the improved SVM algorithm with different missing-value imputation methods.

Imputation method	Accuracy	Precision	Recall
RF	97.40%	95.70%	98.20%
KNN imputation	90.16%	89.14%	90.27%
Matrix completion	86.87%	87.78%	89.52%

As Table 3 shows, with RF imputation the improved SVM algorithm achieved an accuracy, precision, and recall of 97.40%, 95.70%, and 98.20%, respectively, considerably higher than the 90.16%, 89.14%, and 90.27% obtained with KNN imputation and the 86.87%, 87.78%, and 89.52% obtained with matrix completion. These results suggest that RF fills missing values while preserving the information in the feature dimensions as far as possible, which improves the performance of the improved SVM algorithm. RF is therefore the better choice for missing-value imputation. To verify the choice of K = 3 for K-means, statistical tests and accuracy tests were conducted for K values from 1 to 5. The results are shown in Table 4.

Table 4.

Results of the K-value test.

K	Average silhouette coefficient	Average accuracy	p value
1	0	82.41 ± 0.21%	p < 0.001
2	0.32 ± 0.02	82.41 ± 0.19%	p < 0.001
3	0.57 ± 0.01	97.41 ± 0.24%	p < 0.001
4	0.48 ± 0.02	89.24 ± 0.13%	p < 0.001
5	0.39 ± 0.02	89.49 ± 0.24%	p < 0.001
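The average silhouette coefficient in Table 4 can be computed as shown below. For each point, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. This sketch takes the cluster labels as given, for example from a K-means run. As an assumed convention matching the 0 reported for K = 1, it returns 0 when there is only one cluster. Library functions such as scikit-learn's `silhouette_score` instead reject a single cluster.

```python
import math
from typing import List, Sequence


def mean_silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average silhouette coefficient over all points, using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; reported as 0 by assumption
    scores: List[float] = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(pts, [0, 0, 1, 1]))  # well-separated clusters score high
    print(mean_silhouette(pts, [0, 1, 0, 1]))  # a poor assignment scores lower
```

Values near 1 indicate compact, well-separated clusters, which is why the K with the highest average silhouette is preferred.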

As Table 4 shows, all p values were below 0.001, indicating that the results are statistically significant. As K increased from 1 to 3, the average silhouette coefficient rose from 0 to its maximum of 0.57 ± 0.01. It then fell to 0.48 ± 0.02 at K = 4 and 0.39 ± 0.02 at K = 5. The average accuracy followed the same pattern, rising from 82.41 ± 0.21% at K = 1 to 97.41 ± 0.24% at K = 3. It then dropped to 89.24 ± 0.13% at K = 4 and 89.49 ± 0.24% at K = 5. Both criteria therefore indicate that the K-means algorithm performs best at K = 3.

In the environment described above, accuracy and precision were first compared across all methods. The results are shown in Figure 7.

Figure 7.

The accuracy and precision of each algorithm.

As Figure 7(a) shows, the accuracy of the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM algorithms was 97.4%, 87.1%, 89.2%, and 83.8%, respectively, with the improved SVM algorithm the highest. As Figure 7(b) shows, the improved SVM algorithm achieved a precision of 95.7%. This was higher than the 88.2% of CNN-LSTM, the 81.3% of GA-BP, and the 78.9% of PSO-LSTM. The improved SVM algorithm therefore outperformed the comparison algorithms in both accuracy and precision. The recall and F1 scores of the improved SVM algorithm and the other algorithms are compared in Figure 8.

Figure 8.

Comparison results of recall rate and F1 score of each algorithm.

As Figure 8(a) shows, the improved SVM algorithm had the highest average F1 score, 96.2%. The corresponding values were 90.4% for CNN-LSTM, 87.3% for GA-BP, and 85.1% for PSO-LSTM. As Figure 8(b) shows, the average recall of the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM algorithms was 98.2%, 92.1%, 89.8%, and 90.3%, respectively, again highest for the improved SVM algorithm. The improved SVM algorithm therefore outperformed the comparison algorithms in both recall and F1 score. The mean squared error (MSE) and root mean squared error (RMSE) of each algorithm are compared in Figure 9.

Figure 9.

The MSE and RMSE comparison results of each algorithm.

As Figure 9(a) shows, the average MSE of the improved SVM algorithm was 1.2, lower than the 2.1 of CNN-LSTM, the 3.7 of GA-BP, and the 4.2 of PSO-LSTM. As Figure 9(b) shows, the average RMSE of the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM algorithms was 1.09, 1.44, 1.93, and 2.04, respectively, with the improved SVM algorithm again the lowest. Taken together, these findings show that the improved SVM algorithm outperforms the comparison algorithms in accuracy, MSE, and RMSE. To verify the reliability of these results, statistical tests were conducted for each experiment. The results are shown in Table 5.

Table 5.

Statistical test results.

Algorithm	Accuracy	Precision	F1	Recall	MSE	RMSE	p value
Improved SVM	97.4%	95.7%	96.2%	98.2%	1.2	1.09	p < 0.001
CNN-LSTM	87.1%	88.2%	90.4%	92.1%	2.1	1.44	p < 0.001
GA-BP	89.2%	81.3%	87.3%	89.8%	3.7	1.93	p < 0.001
PSO-LSTM	83.8%	78.9%	85.1%	90.3%	4.2	2.04	p < 0.001
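The metrics in Table 5 follow standard definitions, and a minimal sketch of how they can be computed from predictions is given below. This section does not state how precision, recall, and F1 were averaged over the three condition classes. It also does not say which quantities MSE and RMSE were computed on. Macro averaging and numeric targets are therefore assumptions here.

```python
import math
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over all classes."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need non-empty label sequences of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def mse_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean squared error and its square root."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"mse": mse, "rmse": math.sqrt(mse)}


if __name__ == "__main__":
    # Hypothetical labels: 0 = Safe, 1 = Normal, 2 = Damage.
    print(classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
    print(mse_rmse([0.0, 0.0], [1.0, 3.0]))
```

Under macro averaging, F1 is computed per class and then averaged. The averaged F1 therefore generally differs from the harmonic mean of the averaged precision and recall.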

As Table 5 shows, the p value was below 0.001 in every experiment. This indicates that the differences between the improved SVM algorithm and the comparison algorithms are statistically significant at the 0.1% level.

Performance analysis of the bridge damage identification model

After the performance of the improved SVM algorithm had been verified across multiple metrics, a comparative experiment was carried out on the bridge damage identification model. The comparison models were bridge damage identification models based on CNN-LSTM, GA-BP, and PSO-LSTM. The comparison metrics were recognition accuracy for each of the three bridge condition classes, running time, and the area under the ROC curve (AUC). The recognition accuracy of each model for the three bridge condition classes is shown in Figure 10.

Figure 10.

Identification accuracy of each model for the three bridge condition classes.

As Figure 10(a) shows, the improved SVM model achieved recognition accuracies of 97.8%, 98.3%, and 97.4% for the Safe, Normal, and Damage bridge conditions, respectively. These values were clearly higher than those of the CNN-LSTM model (88.3%, 79.9%, and 90.2%), the GA-BP model (84.7%, 86.2%, and 88.6%), and the PSO-LSTM model (79.8%, 82.1%, and 81.8%). In terms of recognition accuracy, the improved SVM model therefore outperformed the comparison models. The running times and ROC AUC values of each model are compared in Figure 11.

Figure 11.

Comparison of the ROC AUC values and running times of each model.
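The AUC values compared in Figure 11(a) have a simple rank interpretation. The AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below assumes a binary setting, for example Damage versus the other conditions. This section does not state how the three classes were handled, for instance by one-vs-rest averaging. The pairwise loop is meant for illustration rather than large datasets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (e.g. damaged), 0 otherwise.
    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```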

As Figure 11(a) shows, the improved SVM model had the highest AUC value, 0.982. The CNN-LSTM, GA-BP, and PSO-LSTM models reached 0.898, 0.863, and 0.825, respectively. As Figure 11(b) shows, the average running times of the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM models were 38.1 s, 139.1 s, 109.4 s, and 78.4 s, respectively, with the improved SVM model the shortest. The proposed model therefore performed best in both AUC and running time. The execution time was reduced from 138 s to 38 s. This was mainly owing to the high-performance Intel Core i9-14900KS CPU, the optimized MATLAB R2022b release, and the introduction of the PSO algorithm with dynamic inertia weight. The data preprocessing pipeline was also optimized, using RF imputation of missing values and PCA dimensionality reduction. Together, these measures reduced the computational burden of the algorithm.

To further verify the real-time performance of the model, additional tests were run in the original experimental environment without adding any hardware. That environment was an Intel Core i9-14900KS CPU with 32 GB of memory running MATLAB R2022b. A single-sample end-to-end latency of at most 200 ms was adopted as the real-time acceptance threshold, and short data segments of 1 s, 5 s, and 10 s were measured. The single-sample latency and real-time performance for each segment length are shown in Table 6.

Table 6.

Single-sample latency and real-time performance for different data lengths.

Data length	Number of samples	Total time	Single-sample latency
1 s	256	6.8 ms	0.26 ms
5 s	1280	33.9 ms	0.26 ms
10 s	2560	67.8 ms	0.26 ms

As Table 6 shows, the single-sample latency was 0.26 ms for every data length, far below the 200 ms threshold. These results indicate that the model is capable of real-time inference in the original hardware environment.
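The real-time check above can be reproduced with a simple timing harness, sketched here in Python. `predict_one` is a hypothetical placeholder for the trained model's single-sample inference, meaning preprocessing plus classification. Each sample is timed individually, and the worst case is compared with the 200 ms acceptance threshold. This is an illustrative sketch, not the MATLAB code used in the study.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LatencyReport:
    total_ms: float     # sum of per-sample latencies
    mean_ms: float      # average per-sample latency
    max_ms: float       # worst-case per-sample latency
    meets_budget: bool  # True if the worst case is within the budget


def measure_latency(
    predict_one: Callable[[Sequence[float]], int],
    samples: Sequence[Sequence[float]],
    budget_ms: float = 200.0,
) -> LatencyReport:
    """Time each single-sample prediction end to end and check it against a budget."""
    if not samples:
        raise ValueError("need at least one sample to time")
    per_sample = []
    for x in samples:
        start = time.perf_counter()
        predict_one(x)  # hypothetical model call: preprocessing + classification
        per_sample.append((time.perf_counter() - start) * 1000.0)
    total = sum(per_sample)
    worst = max(per_sample)
    return LatencyReport(total, total / len(per_sample), worst, worst <= budget_ms)


if __name__ == "__main__":
    # Hypothetical stand-in classifier: a fixed linear score thresholded at zero.
    def predict_one(x: Sequence[float]) -> int:
        return int(sum(0.5 * v for v in x) > 0.0)

    segment = [[0.01 * i] * 8 for i in range(256)]  # e.g. a 1 s segment of 256 samples
    print(measure_latency(predict_one, segment))
```

Timing each call separately captures per-sample overhead that a single batch timing would average away.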

Discussion

This study analyzed the performance of the improved SVM algorithm and of the BSDI model based on PCA and the improved SVM. The results showed that the improved SVM algorithm had clear advantages in accuracy, precision, and F1 score. In the accuracy comparison, the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM algorithms achieved 97.4%, 87.1%, 89.2%, and 83.8%, respectively, with the improved SVM algorithm the highest. This suggests that using the PSO algorithm to optimize the SVM parameters improved the performance of the SVM algorithm. This finding is similar to that obtained by Hang Y et al. in their research on combining the PSO algorithm with the SVM algorithm.¹⁹ In the precision comparison, the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM algorithms achieved 95.7%, 88.2%, 81.3%, and 78.9%, respectively, with the improved SVM algorithm again the highest. This indicates that combining the PSO algorithm with dynamic inertia weight and the SVM algorithm improved classification precision. The result is consistent with the improved SVM algorithm proposed by Yuming et al.²⁰ In the comparison of F1 score, recall, MSE, and RMSE, the improved SVM algorithm achieved an average F1 score of 96.2%, an average recall of 98.2%, an average MSE of 1.2, and an average RMSE of 1.09. All four values were superior to those of the comparison algorithms. In the comparison of recognition models, the improved SVM model achieved recognition accuracies of 97.8%, 98.3%, and 97.4% for the Safe, Normal, and Damage bridge conditions, respectively. These were clearly better than those of the comparison models, indicating that introducing the PSO algorithm improved the recognition accuracy and precision of the model.
This is consistent with the conclusion reached by Honghua X et al. in 2024.²¹ In the comparison of running time and ROC AUC, the improved SVM, CNN-LSTM, GA-BP, and PSO-LSTM models achieved AUC values of 0.982, 0.898, 0.863, and 0.825, respectively. Their average running times were 38.1 s, 139.1 s, 109.4 s, and 78.4 s, respectively, so the improved SVM model performed best on both measures. This suggests that using RF, K-means, and PCA for data preprocessing, combined with the PSO and SVM algorithms, improved the recognition efficiency and accuracy of the model. The result agrees with the conclusion drawn by Zhao et al. in their 2024 study.²² One limitation of this study is that the proposed model relies only on classical machine learning algorithms for BSDI, whereas deep learning has been widely applied to BSDI. Applying neural-network and deep learning methods, such as BP networks, to BSDI is therefore a direction for further research. In addition, although the improved SVM algorithm achieved the highest accuracy, it may have limitations of its own. The model may be sensitive to the choice of training data, and its generalization ability may differ across datasets of different scales. Future research should examine these factors to improve the robustness and applicability of the model.
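The dynamic inertia weight discussed in this section can be illustrated with a generic PSO minimizer. The linear decrease of w from 0.9 to 0.4 and the learning factors c1 = c2 = 1.5 are common textbook choices assumed for this sketch. The study's own schedule and settings are not restated here and may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    n_iter: int = 100,
    w_max: float = 0.9,
    w_min: float = 0.4,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over a box with PSO using a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for t in range(n_iter):
        # Dynamic inertia weight: large early (exploration), small late (exploitation).
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive term
                    + c2 * r2 * (gbest[d] - pos[i][d])  # social term
                )
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # a new swarm best is always a new personal best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    def sphere(x: Sequence[float]) -> float:
        return sum(v * v for v in x)  # minimum 0 at the origin

    print(pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)]))
```

To tune an SVM, `f` would map a candidate parameter pair such as (C, gamma), usually searched on a log scale, to a cross-validated error. That wrapper is not shown here.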

Conclusion

This study introduced an improved SVM algorithm for BSDI that outperformed the compared methods, achieving an accuracy of 97.4% and an AUC value of 0.982. Integrating PCA for dimensionality reduction with PSO optimization based on dynamic inertia weight improves both the efficiency and the accuracy of the algorithm. The main contribution of this research is to show how algorithmic enhancements can substantially improve the real-time analysis of bridge health monitoring data. This has practical implications for industry, as it enables early identification of structural damage, potentially reducing maintenance costs and improving bridge safety. With sensors deployed at key locations on bridge structures and cloud platforms used for data transmission, the proposed model could be integrated into existing infrastructure to provide continuous monitoring. Such proactive bridge health management can support more efficient maintenance strategies and extend the service life of bridges. In summary, the improved SVM algorithm and BSDI model represent an advance in structural health monitoring with both scientific and industrial implications. Future work will focus on validating the model under diverse conditions and exploring its integration with other monitoring technologies.

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Zhang, Liu, et al. Causes and statistical characteristics of bridge failures: a review. J Traffic Transport Eng 2022; 9(3): 388–406.

2. Figueiredo, Brownjohn. Three decades of statistical pattern recognition paradigm for SHM of bridges. Struct Health Monit 2022; 21(6): 3018–3054.

3. Hajializadeh. Deep learning-based indirect bridge damage identification system. Struct Health Monit 2023; 22(2): 897–912.

4. Huang, Ling, Sun, et al. Two-stage damage identification for bridge bearings based on sailfish optimization and element relative modal strain energy. Structural Engineering and Mechanics, An Int'l Journal 2023; 86(6): 715–730.

5. Daneshvar, Saffarian, Jahangir, et al. Damage identification of structural systems by modal strain energy and an optimization-based iterative regularization method. Eng Comput 2023; 39(3): 2067–2087.

6. Mousavi, Zhang, Masri, et al. Structural damage detection method based on the complete ensemble empirical mode decomposition with adaptive noise: a model steel truss bridge case study. Struct Health Monit 2022; 21(3): 887–912.

7. Pham, Thai, Kim. A novel procedure for cable damage identification of cable-stayed bridge using particle swarm optimization and machine learning. Struct Health Monit 2025; 24(2): 714–737.

8. Corbally, Malekjafarian. A deep-learning framework for classifying the type, location, and severity of bridge damage using drive-by measurements. Comput Aided Civ Infrastruct Eng 2024; 39(6): 852–871.

9. Kurani, Doshi, Vakharia, et al. A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann Data Sci 2023; 10(1): 183–208.

10. Zhou, Yang, Peng, et al. Performance evaluation of rockburst prediction based on PSO-SVM, HHO-SVM, and MFO-SVM hybrid models. Mining, Metallurgy & Exploration 2023; 40(2): 617–635.

11. Daffertshofer, Lamoth CJC, Meijer, et al. PCA in studying coordination and variability: a tutorial. Clin Biomech 2004; 19(4): 415–428.

12. Zheng, Chen, Jiao. Chatter detection in milling process based on the combination of wavelet packet transform and PSO-SVM. Int J Adv Manuf Technol 2022; 120(1): 1237–1251.

13. Zhang, Tian, Fan. Forecasting sales using online review and search engine data: a method based on PCA–DSFOA–BPNN. Int J Forecast 2022; 38(3): 1005–1024.

14. Gheisari, Hamidpour, Liu, et al. Data mining techniques for web mining: a survey. Artif Intell Appl 2023; 1(1): 3–10.

15. Karimi, Mirza. Damage identification in bridge structures: review of available methods and case studies. Aust J Struct Eng 2023; 24(2): 89–119.

16. Hoque, Billah, Debnath, et al. Heart disease prediction using SVM. Int J Sci Res Arch 2024; 11(2): 412–420.

17. Harrison, Dieu Nguimfack-Ndongmo, Alombah, et al. Robust nonlinear MPPT controller for PV energy systems using PSO-Based integral backstepping and artificial neural network techniques. Int J Dyn Control 2024; 12(5): 1598–1615.

18. Yan, Sun, et al. A real-time intelligent lithology identification method based on a dynamic felling strategy weighted random forest algorithm. Pet Sci 2024; 21(2): 1135–1148.

19. Zhang, Liu, et al. Comparison of LR, 5-CV SVM, GA SVM, and PSO SVM for landslide susceptibility assessment in Tibetan Plateau area, China. J Mt Sci 2023; 20(4): 979–995.

20. Yuming, Jiaohong, Zhenguo, et al. On combined PSO-SVM models in fault prediction of relay protection equipment. Circ Syst Signal Process 2023; 42(2): 875–891.

21. Honghua, Ziqiang, Yong, et al. Assessment of radial loosening state in transformer windings based on FBG signals and DECE-PCA-BHOSVM. J Electr Eng 2024; 19(2): 381–390.

22. Zhao, Wei, et al. Comparison of debris flow susceptibility assessment methods: support vector machine, particle swarm optimization, and feature selection techniques. J Mt Sci 2024; 21(2): 397–412.