Abstract
Bridge scour poses a significant risk to infrastructure safety, yet traditional underwater monitoring methods are often unreliable and impractical during critical flood events. To address these limitations, this study proposes a physics-guided feature fusion network (PG-FFN) for estimating scour depth from bridge pier acceleration data. The PG-FFN features a novel dual-branch architecture. A temporal feature branch uses a bidirectional long short-term memory (BiLSTM) network to extract latent patterns from raw vibration signals. Concurrently, a physical feature branch processes a set of engineered, physics-based descriptors using a multilayer perceptron. These distinct feature sets are integrated through a fusion module to produce a comprehensive representation of the structural response. To keep the model’s predictions consistent with fundamental physical principles, a composite loss function penalizes violations of a monotonicity constraint, while the non-negativity of the scour depth is explicitly guaranteed by the network architecture. As a proof of concept, the proposed model was trained and evaluated on a comprehensive synthetic data set generated from a calibrated numerical model. Results show that the PG-FFN achieved a normalized root mean square error (NRMSE) of 7.60% on the unseen test set, a 33% relative improvement over a standard baseline BiLSTM model, which yielded an NRMSE of 11.31%. An ablation study confirmed that the feature fusion mechanism was the primary contributor to this enhanced performance. Moreover, field validation on a real flood event demonstrated the model’s robustness in capturing scour trends under practical conditions. The findings demonstrate that integrating physical constraints and domain knowledge with deep learning provides a more accurate and physically consistent framework for vibration-based bridge scour monitoring.
Keywords
Introduction
Bridge scour, the erosion of sediment around bridge foundations by water flow, is widely recognized as a leading cause of bridge collapse worldwide, posing a critical threat to infrastructure safety ( 1 ). In the United States alone, scour-related failures account for approximately half of all bridge collapses ( 2 ), and similar trends are observed elsewhere ( 3 ). The risk is exacerbated during flood events, when rapidly flowing water can remove support from pier footings and abutments. Climate change and the increasing frequency of extreme floods further heighten the danger, as larger and more frequent flood discharges accelerate scour progression ( 4 ). Past analyses have documented hundreds of bridge failures from scour in recent decades ( 2 ), underscoring the urgent need for effective scour monitoring to prevent catastrophic collapses. Ensuring the safety of bridges against scour has therefore become a paramount concern for civil infrastructure managers worldwide.
Given the severity of the problem, various techniques have been developed for bridge scour monitoring. Traditional approaches often rely on underwater instrumentation such as sonar depth sounders, buried pressure or tilt sensors, scour rods, and fiber-optic probes to directly measure the riverbed level around foundations ( 5 – 7 ). These in-stream methods can provide point measurements of scour depth, but they suffer from practical limitations. Installation and maintenance of underwater sensors are difficult and costly, especially on existing bridges ( 8 ). For example, fixed sonar transducers and magnetic sliding collars have been used to track bed erosion, but debris impact and strong currents can produce faulty readings or destroy the equipment ( 9 ). Visual inspections, the most common scour assessment practice, are likewise limited, as inspectors cannot observe underwater erosion in real time and may only detect damage after the fact ( 10 ). Remote monitoring by video or lidar has been explored to overcome this. For instance, micro-camera systems have been developed to track bed level changes around model piers in laboratory settings ( 11 ). While image-based methods can capture scour evolution without interfering with flow, they are highly sensitive to lighting, turbidity, and camera alignment, which has hindered robust field deployment ( 11 ). In summary, many conventional scour sensing techniques entail high expense, poor survivability under flood conditions, significant maintenance demands, and data interruptions during the very events of interest.
To address these shortcomings, researchers have proposed indirect vibration-based monitoring approaches that infer scour progression from changes in a bridge’s structural response. The fundamental concept is rooted in the influence of soil-structure interaction on the global dynamic response. As scour erodes the sediment surrounding the piers, the effective unsupported length of the foundation elements increases. This geometric change leads to a reduction in the lateral stiffness of the pier-foundation system. Consequently, these physical alterations modify the bridge’s dynamic characteristics, resulting in measurable shifts such as decreased natural frequencies and distorted mode shapes ( 12 ). For example, the loss of soil support tends to decrease the natural frequency of the pier-foundation system as the effective cantilever length increases. Experiments have confirmed a monotonic decline in fundamental frequency with increasing scour depth, and other vibration metrics (e.g., mode shape curvatures or modal flexibility) also show systematic changes as scour develops ( 13 ). By tracking such features, one can potentially detect the onset and severity of scour without direct underwater measurements. The great advantage of vibration-based methods is that they use sensors (typically accelerometers) mounted on the bridge superstructure or pier above the waterline. This placement keeps the instruments safe from flood debris and eliminates the need for intrusive work in the river. Accelerometer-based monitoring is low-cost, relatively easy to retrofit on existing bridges, and capable of continuous real-time data collection during extreme events ( 14 ). These benefits have spurred growing interest in vibration analysis as a scour surveillance tool ( 8 ). Although accelerometer-based hardware is commercially available for general monitoring, current industrial solutions typically rely on threshold-based triggers rather than quantitative estimation. 
Similarly, most prior studies in this domain have focused on detecting scour via identified modal parameters or stiffness indices ( 15 ) rather than directly predicting scour depth from raw vibration signals. In particular, leveraging the rich information in the time-series acceleration data (beyond simple frequency shifts) for scour quantification is an area that has seen limited exploration to date. There is a clear need for data-driven frameworks that can ingest raw structural vibration measurements and output a reliable estimate of scour depth in real time.
Deep learning provides a promising avenue for extracting complex patterns from structural response data that are indicative of scour. In recent years, advanced sequence-learning models have been applied in structural health monitoring for tasks such as damage detection ( 16 ), anomaly identification ( 17 ), and condition forecasting ( 18 ). Recurrent neural networks, in particular, are well-suited to analyzing temporal sensor data and have shown success in capturing subtle changes in structural dynamics ( 19 ). In the context of bridge scour, several researchers have begun employing deep learning to improve scour predictions. For example, Hashem and Yousefpour ( 20 ) pioneered the application of long short-term memory (LSTM) sequence models for real-time scour forecasting, using historical monitoring records (water levels, flow velocities, and bed elevations) to predict future scour growth during flood events. Their studies demonstrated that data-driven LSTM models can learn the complex, nonlinear scour progression behavior without manual feature extraction, and were able to forecast scour depth changes up to a week in advance under varying hydrologic conditions. Other machine-learning approaches, such as support vector machines ( 21 ), decision trees ( 22 , 23 ), and shallow neural networks ( 24 , 25 ), have also outperformed traditional empirical formulas in maximum scour depth estimation. However, purely data-driven models often face generalization issues outside the range of training data ( 26 ). Black-box models might capture correlations present in the training set but can produce physically inconsistent predictions when extrapolated to new sites or flood scenarios. This limitation motivates the integration of domain knowledge or constraints into the learning process to guide the model toward physically plausible behavior.
Recently, physics-informed neural networks (PINNs) have emerged as an effective framework to imbue learning models with physical law constraints ( 27 ). In a PINN, the network is trained not only on data fidelity but also to satisfy governing equations or known analytical relationships by penalizing their residual in the loss function. This concept, introduced by Raissi et al. ( 28 ), provides a way to enforce that neural network outputs respect fundamental physics, thereby improving solution interpretability and realism. PINN approaches have been successfully applied in various civil and mechanical engineering problems, including fluid–structure interaction ( 29 ) and structural mechanics simulations ( 30 ). By coupling data with physics, these models can achieve better generalization and reliability, especially when observational data are sparse. For instance, Vahab et al. ( 31 ) showed that PINNs could accurately solve pile–soil interaction problems by embedding the governing differential equations of soil mechanics into the learning model, leading to results consistent with engineering theory even with limited data. Generally, the inclusion of physical constraints serves to regularize the model and reduce unphysical behavior. This yields predictions that are not only more accurate but also easier to interpret in light of expected system response ( 32 ). In the realm of bridge scour, such physics-guided strategies offer a compelling strategy to combine measured vibration data with analytical scour mechanics, harnessing the strengths of both data-driven and mechanics-based approaches.
Based on the identified limitations of existing scour monitoring approaches and the critical need for more robust, physically consistent predictive models, a physics-guided feature fusion network (PG-FFN) is proposed for the estimation of bridge pier scour depth from pier-mounted accelerometer sensors. This method is designed to leverage both the comprehensive information contained in high-frequency vibration data and domain knowledge from structural mechanics. The key advantage of PG-FFN lies in its dual-branch architecture, which fuses temporal features extracted from raw acceleration sequences with engineered physics-based features. This structure enables the model to capture complex vibration patterns while simultaneously maintaining consistency with established physical trends in scour progression. As a result, the PG-FFN offers improved prediction accuracy and enhanced generalization, while the incorporation of physics-guided regularization ensures that the outputs remain physically meaningful and interpretable, ultimately providing a more reliable foundation for real-time bridge scour assessment.
The remainder of this paper first describes the data acquisition and pre-processing procedures, then introduces the baseline BiLSTM network and the proposed physics-guided feature fusion network, along with relevant training strategies. The performance of both networks is evaluated and discussed, including an ablation study to quantify the contribution of the physics-guided feature branch and a field validation case study using data from a real flood event. The main findings and conclusions are summarized at the end.
Data Acquisition
Field Instrumentation
Field instrumentation was installed on a concrete bridge located in Georgia, USA, to capture pier vibrations and directly measure scour depth. MEMS (micro-electro-mechanical systems)-based triaxial accelerometers (Model: ADXL345) were enclosed in a weather-proof box, secured to the upstream face of the cap beam of the selected scour-critical pier. This specific location was chosen because the pier cap acts as the anti-node for the fundamental transverse bending mode, thereby ensuring maximum observability of the vibration response induced by stiffness changes at the foundation. Concurrently, an ultrasonic depth sensor was affixed to the submerged portion of the same pier, as illustrated in Figure 1. The data logger and battery pack were placed inside a sealed container fastened to the pier cap to protect the electronics during flood events. A right-handed orthogonal coordinate system was established at the accelerometer, with the bridge pier’s transverse axis (parallel to the bent cap, which is the primary direction of flow-induced vibration) assigned to x, the bridge longitudinal axis to y, and the vertical axis to z. Although the x-axis is aligned with the streamwise direction for this bridge, the sensor measures the structural response in the pier’s local coordinate system.

Figure 1. Field instrumentation set-up.
The accelerometer recorded three-axis acceleration at a sampling rate of 75 Hz. Concurrently, the ultrasonic depth sensor measured the distance between its transducer and the foundation interface, enabling scour depth to be inferred from changes in that distance. Data sets were obtained during non-flood periods as well as during several flood events, yielding representative vibration and hydrodynamic conditions. These field measurements were used to support finite element model calibration and synthetic data generation. Model training was carried out solely with a purely synthetic data set, which offered broad scour depth coverage, eliminated sensor noise, permitted controlled variation of governing parameters, and represented extreme events that were not captured during field monitoring.
Bridge Pier Numerical Modeling
Given the limited coverage and quantity of field measurements, a detailed three-dimensional structural model of the instrumented pier was developed using the commercial finite element software SAP2000 to systematically generate synthetic data sets that simulate river flow-induced structural responses. The model’s geometry and material properties were based on as-built drawings of the in-service bridge, a 1971 cast-in-place concrete T-beam structure. The bridge has an overall length of 121.92 m with a 14.63-m deck width, consisting of nine 12.19-m spans supported by reinforced concrete bents. Each bent comprises a 762 × 711 mm cap beam and six steel H-piles spaced at 2.90 m, with an exposed length of 5.44 m and a 5.49-m average embedment. To accurately capture the foundation stiffness, the 1:8 batter of the outer piles and the steel angle cross-bracing between piles in the submerged bents were explicitly modeled, as illustrated in Figure 2. Importantly, the superstructure is simply supported on the bents via plain neoprene elastomeric bearing pads, with deck joints at each support ensuring flexural discontinuity between spans. This simply supported configuration effectively isolates the substructure’s transverse vibration, making it highly sensitive to foundation stiffness changes induced by scour.

Figure 2. Configuration of the finite element model for the bridge pier (unit: m).
For future research on vibration-based monitoring, a multi-level modeling strategy is recommended to ensure structural fidelity. The explicit modeling of foundation components, including specific pile batter and bracing configurations, is essential because these components govern the baseline lateral stiffness. Soil-structure interaction can be captured by identifying a depth of fixity based on soil properties and embedment data, while the pier-deck connection can be idealized using boundary springs that represent the actual restraint provided by the elastomeric bearings. This integrated modeling approach, focusing on the transverse vibration mode of simply supported spans, ensures that the resulting dynamic features are sufficiently sensitive to foundation-level changes while minimizing interference from the superstructure’s global response.
Field-Based Model Calibration
To ensure the numerical model accurately reflects the field dynamic behavior, a rigorous calibration was performed against field test data. A brake test was conducted using a 2,041 kg pickup truck to induce a lateral impulse, identifying a fundamental transverse frequency of 2.68 Hz. To avoid the non-uniqueness problem often encountered in model updating, the calibration was restricted to a single parameter: the equivalent lateral stiffness of the superstructure. While the bent geometry and pile fixity were locked based on as-built drawings, the superstructure stiffness was iteratively adjusted until the model’s natural frequency matched the field-observed 2.68 Hz. The final calibrated stiffness contributions were determined to be approximately 62.17 kN/mm for the bent structure and 31.52 kN/mm for the superstructure boundary.
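The single-parameter calibration described above amounts to a one-dimensional root-finding problem. The sketch below illustrates the idea with bisection. It is not the SAP2000 procedure used in the study: the single-degree-of-freedom frequency function, the parallel addition of the two stiffness contributions, and the effective mass `MASS_KG` are hypothetical stand-ins chosen only for illustration.

```python
import math


def sdof_frequency(k_total, mass):
    """Natural frequency (Hz) of an idealized single-degree-of-freedom system."""
    return math.sqrt(k_total / mass) / (2.0 * math.pi)


def calibrate_stiffness(freq_fn, target_hz, lo, hi, tol=1e-6, max_iter=200):
    """Bisection on one stiffness parameter.

    Assumes freq_fn increases monotonically with stiffness and that
    [lo, hi] brackets the value producing target_hz.
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f = freq_fn(mid)
        if abs(f - target_hz) < tol:
            return mid
        if f < target_hz:
            lo = mid   # model too soft: raise the stiffness
        else:
            hi = mid   # model too stiff: lower the stiffness
    return 0.5 * (lo + hi)


K_BENT = 62.17e6    # N/m, bent stiffness reported in the text (62.17 kN/mm)
MASS_KG = 3.3e5     # hypothetical effective mass, NOT a value from the study
TARGET_HZ = 2.68    # field-identified fundamental transverse frequency


def model_frequency(k_super):
    # Stand-in for a finite element eigenvalue run; stiffnesses assumed in parallel
    return sdof_frequency(K_BENT + k_super, MASS_KG)


k_super = calibrate_stiffness(model_frequency, TARGET_HZ, 0.0, 1.0e9)
```

In practice `model_frequency` would wrap a modal analysis of the finite element model, and the same loop would apply unchanged.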
Synthetic Sample Generation
Using this calibrated model, the physical effects of scour were then simulated. This was achieved by systematically increasing the unsupported length of the piles, which effectively reduced the foundation’s fixity depth and lateral stiffness. Crucially, to ensure the physical validity of the scour progression, the superstructure’s boundary stiffness was kept constant throughout the simulation, while the complex geometric features such as pile batter and cross-bracing were explicitly retained. This variable isolation ensured that the simulated dynamic shifts were solely driven by the physical mechanics of soil support loss. A linear time-history analysis was conducted for each prescribed scour condition. The model was subjected to synthetic white-noise excitation designed to replicate the frequency content and amplitude characteristics of ambient vibrations observed in the field. The resulting acceleration response time series from these simulations provided the labeled data for subsequent analysis.
Building on this, the calibrated model was subsequently employed to create a comprehensive synthetic data set. A total of 100,000 samples, encompassing non-flood and flood scenarios, were generated by supplying 10-s windows of river flow axis acceleration and prescribed scour depth values to the model and recording the simulated response at 75 Hz. To ensure the model’s robustness across various severity levels, the prescribed scour depth was varied from 0 cm to 61 cm in 3.05-cm increments. This range was designed to encompass the magnitude of scour observed in the field while providing sufficient margin to capture more substantial foundation stiffness changes. The resulting data set yielded a uniform distribution of scour depths across the entire specified range.
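The sampling figures quoted above can be checked with a few lines of arithmetic. The sketch below derives the window length in samples and the prescribed scour depth levels from the stated values. The variable names are illustrative only.

```python
FS_HZ = 75           # sampling rate stated in the text
WINDOW_S = 10        # window duration stated in the text
STEP_CM = 3.05       # scour depth increment
MAX_DEPTH_CM = 61.0  # upper end of the prescribed range

# Number of acceleration samples in one window
samples_per_window = FS_HZ * WINDOW_S

# Number of discrete scour levels, including the zero-scour case
n_levels = int(round(MAX_DEPTH_CM / STEP_CM)) + 1

# Prescribed scour depths in cm
depths_cm = [i * STEP_CM for i in range(n_levels)]
```

Each window therefore holds 750 samples, and the 0 to 61 cm range in 3.05-cm steps corresponds to 21 discrete scour levels.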
To affirm the fidelity of the generated data, a validation was performed by comparing the frequency-domain characteristics of a synthetic acceleration signal with a corresponding field measurement. A representative sample was selected from the synthetic data set and compared against a field-measured signal under an identical scour depth condition. As shown in Figure 3, the fast Fourier transform (FFT) was applied to both time series. While the synthetic spectrum exhibits a sharper peak as a result of the idealized low damping (2%) in the model compared with the broader response of the actual structure, a high degree of correspondence was observed between the two spectra, particularly in the locations of dominant frequency peaks (2.68 Hz) and the overall energy distribution. This strong alignment confirms that the numerical simulations effectively replicate the key dynamic characteristics of the physical structure. The result validates the suitability of the synthetic data set for evaluating the proposed PG-FFN model.

Figure 3. Comparison of fast Fourier transform (FFT) spectra for a representative synthetic acceleration signal and a corresponding field-measured acceleration signal.
Data Pre-Processing
The full synthetic data set contained 100,000 samples and was randomly shuffled. An 80% portion was allocated to training, while 10% served as validation and the remaining 10% formed the test set. The separation guaranteed that hyperparameter tuning and early stopping relied only on validation performance, and the test set remained unseen until final evaluation.
All input signals were standardized through z-score normalization. The mean and standard deviation were computed exclusively from the training set to avoid information leakage. These statistics were then applied to the training, validation, and test sets so that all samples were expressed on a common scale, with the training data having zero mean and unit variance. The resulting normalized values provided consistent magnitudes across non-flood and flood scenarios and ensured that gradients of similar order were presented to the optimizer.
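The split and normalization steps can be sketched as follows. This is a minimal NumPy illustration, assuming a single global mean and standard deviation for the acceleration signal (the text does not state whether statistics were computed globally or per time step) and using a small random array in place of the real data set.

```python
import numpy as np


def split_indices(n, rng, frac_train=0.8, frac_val=0.1):
    """Shuffle indices and split them into train / validation / test."""
    idx = rng.permutation(n)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_zscore(train):
    """Statistics come from the training set only, to avoid leakage."""
    return train.mean(), train.std()


def apply_zscore(x, mean, std):
    return (x - mean) / std


rng = np.random.default_rng(0)
windows = rng.normal(loc=0.5, scale=2.0, size=(1000, 750))  # placeholder data

train_idx, val_idx, test_idx = split_indices(len(windows), rng)
mean, std = fit_zscore(windows[train_idx])

train = apply_zscore(windows[train_idx], mean, std)
val = apply_zscore(windows[val_idx], mean, std)    # same statistics reused
test = apply_zscore(windows[test_idx], mean, std)  # same statistics reused
```

Only the training partition ends up with exactly zero mean and unit variance; the validation and test partitions are merely expressed on the same scale.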
Method
For practical implementation in bridge health monitoring, a workflow for real-time scour depth prediction using the PG-FFN is proposed. River flow axis acceleration recorded over a 10-s window is first directed to the pre-processing module shown in Figure 4. Within this module, physically meaningful features are extracted and the raw acceleration trace is normalized by the z-score method to ensure scale consistency. Both the engineered features and the normalized signal are then provided to the PG-FFN prediction module, which outputs an estimated scour depth. The estimate is continuously compared with a user-specified depth limit. When the limit is exceeded, an alert is issued so that countermeasures such as field inspection or temporary traffic restrictions can be initiated. The procedure subsequently returns to data acquisition, forming a closed monitoring loop that tracks scour evolution in near real time.

Figure 4. Physics-guided feature fusion network (PG-FFN) scour prediction workflow.
The staged architecture addresses several complementary objectives. Normalization equalizes amplitude differences produced by varying hydrodynamic conditions and sensor gain drift, which permits fair comparison across non-flood and flood events. The inclusion of physics-motivated features injects domain knowledge related to vibration energy, frequency softening, and modal shifts, guiding learning toward physically plausible relationships. By fusing these engineered descriptors with the normalized time series, the PG-FFN leverages both global structural insight and local dynamic patterns to deliver robust predictions that remain stable in the presence of noise. The continuous alert mechanism transforms numerical outputs into actionable information that supports timely decision making for bridge safety.
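The closed monitoring loop can be sketched in a few lines. The example below is only an outline of the control flow: `toy_predictor` is a hypothetical stand-in for the pre-processing and PG-FFN prediction modules, and the depth limit is an arbitrary illustrative value.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def monitor(windows: Iterable[Sequence[float]],
            predict_depth: Callable[[Sequence[float]], float],
            depth_limit_cm: float) -> List[Tuple[int, float]]:
    """Return (window index, estimated depth) for every window above the limit."""
    alerts = []
    for index, window in enumerate(windows):
        depth_cm = predict_depth(window)   # pre-processing + model inference
        if depth_cm > depth_limit_cm:      # compare with the user-specified limit
            alerts.append((index, depth_cm))
    return alerts


def toy_predictor(window):
    # Hypothetical stand-in, NOT the PG-FFN: scales the largest amplitude
    return 100.0 * max(abs(v) for v in window)


alerts = monitor(
    [[0.01, -0.02], [0.30, -0.10], [0.05, 0.00]],
    toy_predictor,
    depth_limit_cm=20.0,
)
```

In a deployment, each alert would trigger the countermeasures mentioned above, and the loop would continue with the next 10-s window.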
Baseline BiLSTM Network
For comparative evaluation, a recurrent neural network baseline was established to serve as a conventional benchmark for sequence regression. Bidirectional long short-term memory (BiLSTM) networks are widely regarded as effective for capturing temporal dependencies in vibration signals, making them a suitable reference for this task. The configuration shown in Figure 5 is provided with a 10-s window of river flow axis acceleration that contains 750 scalar time steps. The signal is processed by a two-layer BiLSTM with an input size of 1 and a hidden size of 128 units for each direction, with a dropout rate of 0.2 inserted between the layers to prevent overfitting. The resulting hidden state sequence is transposed so that the feature dimension precedes the temporal dimension. An adaptive average pooling layer then condenses each feature channel to a single value. The pooled representation, a 256-dimensional vector, is passed to a fully connected (FC) layer with 256 inputs and one output, yielding a single scour depth estimate.

Figure 5. Baseline bidirectional long short-term memory (BiLSTM) network architecture.
Training was conducted with the mean squared error (MSE) loss function. Mini-batches of 256 samples were presented, and an initial learning rate of 8 × 10⁻⁴ followed a cosine annealing schedule that gradually reduced the step size during optimization. Ten epochs were completed, and convergence was monitored on the validation data set to ensure that generalization performance remained stable.
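For readers unfamiliar with cosine annealing, the schedule can be written in closed form. The sketch below assumes the learning rate decays to a minimum of zero and is updated once per epoch. Neither detail is stated in the text, so both are assumptions.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_initial, lr_min=0.0):
    """Learning rate after `step` of `total_steps` under cosine annealing."""
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_initial - lr_min) * (1.0 + cosine)


# Baseline settings from the text: initial rate 8e-4 over 10 epochs
schedule = [cosine_annealing_lr(epoch, 10, 8e-4) for epoch in range(11)]
```

The rate starts at its initial value, falls slowly at first, drops fastest around the midpoint (where it equals half the initial value), and flattens out again near the end of training.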
Physics-Guided Feature Fusion Network
To furnish the PG-FFN with interpretable inputs rooted in structural mechanics, 11 vibration descriptors were derived from each 10-s segment of river flow axis acceleration. Four quantities were calculated in the time domain: the root mean square (RMS) value, the peak-to-peak range, kurtosis, and the zero-crossing rate, which respectively characterize overall energy, extreme amplitudes, impulsiveness, and dominant oscillation count. Three spectral indicators were obtained from the FFT: the dominant spectral peak, the spectral centroid, and spectral entropy, which together quantify stiffness-related frequency content and energy dispersion. Time-frequency information was introduced through two Morlet wavelet energy measures covering 0 to 5 Hz and 5 to 12 Hz, while the Hilbert envelope mean captured average instantaneous amplitude. Finally, the first modal frequency shift relative to the scour-free baseline was included to provide an explicit measure of stiffness degradation.
These descriptors (Table 1) were selected because they respond sensitively and in complementary ways to scour-induced changes. As sediment is removed, effective pier stiffness is reduced and hydrodynamic loading increases, leading to higher vibration energy and larger peak-to-peak excursions. Reduced stiffness also lowers natural frequencies, which is reflected in a downward movement of the dominant spectral peak, a leftward shift of the spectral centroid, and a negative modal frequency shift. Increased turbulence and flow variability raise spectral entropy and elevate low-frequency wavelet energy, whereas local component responses concentrate energy in the higher wavelet band. The Hilbert envelope mean tracks the gradual growth of vibration amplitudes across the window. Collectively, these features provide a mechanistic link between acceleration signatures and scour depth, enabling the network to learn physically plausible relationships. Specifically, the inclusion of frequency-domain indicators ensures that the model remains robust to variations in excitation amplitude or angle, as these features are governed by structural stiffness rather than forcing functions.
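The 11 descriptors can be computed along the following lines. This is a minimal NumPy sketch, not the authors' implementation, and it makes three labeled assumptions: the feature order is arbitrary, the two Morlet wavelet energies are replaced by plain FFT band energies over the same frequency ranges, and the modal frequency shift is taken as the dominant spectral peak minus the 2.68 Hz baseline.

```python
import numpy as np

FS = 75.0          # sampling rate in Hz (stated in the text)
F_BASELINE = 2.68  # scour-free fundamental frequency in Hz (from the calibration)


def envelope(x):
    """Hilbert envelope via the FFT-based analytic signal."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def extract_features(x, fs=FS, f_baseline=F_BASELINE):
    """Return 11 descriptors for one acceleration window (order is an assumption)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    eps = 1e-12

    # Time-domain descriptors
    rms = np.sqrt(np.mean(x ** 2))
    peak_to_peak = x.max() - x.min()
    kurtosis = np.mean(xc ** 4) / (np.mean(xc ** 2) ** 2 + eps)
    zero_cross_rate = np.mean(xc[1:] * xc[:-1] < 0)

    # Spectral descriptors (DC bin dropped because the mean was removed)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)[1:]
    power = np.abs(np.fft.rfft(xc))[1:] ** 2
    p = power / (power.sum() + eps)
    dominant_peak = freqs[np.argmax(power)]
    centroid = np.sum(freqs * p)
    entropy = -np.sum(p * np.log(p + eps))

    # Band energies: FFT stand-in for the Morlet wavelet energies
    energy_0_5 = power[freqs <= 5.0].sum()
    energy_5_12 = power[(freqs > 5.0) & (freqs <= 12.0)].sum()

    envelope_mean = envelope(xc).mean()
    freq_shift = dominant_peak - f_baseline

    return np.array([rms, peak_to_peak, kurtosis, zero_cross_rate,
                     dominant_peak, centroid, entropy,
                     energy_0_5, energy_5_12, envelope_mean, freq_shift])


# Usage: a unit-amplitude 2 Hz sine sampled for 10 s at 75 Hz
t = np.arange(750) / FS
features = extract_features(np.sin(2.0 * np.pi * 2.0 * t + 0.3))
```

With a 10-s window the FFT resolution is 0.1 Hz, so frequency shifts smaller than that cannot be resolved by the raw spectral peak without interpolation or longer windows.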
Table 1. Engineered Physical Features
Building on the physics-guided descriptors introduced earlier, a dual-branch architecture was devised to couple raw vibration signatures with engineered knowledge, as illustrated in Figure 6. This structure allows the network to learn from both raw time-series signals and engineered physical descriptors simultaneously, capturing a more comprehensive representation of the system’s dynamics.

Figure 6. Physics-guided feature fusion network (PG-FFN) architecture.
The first path, the temporal feature branch, is responsible for extracting latent patterns directly from the raw vibration data. It takes a 10-s window of river flow acceleration as input and processes it with a two-layer BiLSTM network. Since the inference is performed on a fully buffered window, the bidirectional processing uses only available historical data, ensuring that the causality requirement for real-time monitoring is satisfied. Structurally, the BiLSTM has a hidden size of 128 for each direction and a dropout rate of 0.2 to mitigate overfitting. The output of the BiLSTM is then passed through a transpose layer and an adaptive average pooling layer, which condenses the temporal sequence into a fixed-size 256-dimensional feature vector that summarizes the dynamic characteristics of the input signal.
Concurrently, the second path, the physical feature branch, processes the set of 11 engineered physical features. This branch functions as a multilayer perceptron. It first feeds the 11 features into an FC layer that expands them into a 64-dimensional space, followed by a rectified linear unit (ReLU) activation function and a dropout layer with a rate of 0.2. A subsequent FC layer further transforms this representation into a 32-dimensional feature vector.
The outputs of the two branches are then concatenated in a feature fusion step, creating a combined 288-dimensional vector. This fused representation is passed through a squeeze-and-excitation (SE) block, which adaptively recalibrates the channel-wise feature responses to emphasize more informative features. Following another dropout layer, the refined vector is processed by a final regression head, consisting of two FC layers (mapping from 288 to 64, and then from 64 to 1) with a ReLU activation in between. Finally, a ReLU activation is applied to the output layer to strictly enforce the non-negativity of the predicted scour depth.
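The layer dimensions described above can be assembled into a compact model definition. The following is a sketch under stated assumptions rather than the authors' code: the text does not name a framework (PyTorch is used here), and the SE reduction ratio of 16, the sigmoid gate inside the SE block, and the 0.2 rate of the dropout layer before the regression head are all assumed.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation style gate over a feature vector."""

    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # channel-wise recalibration


class PGFFN(nn.Module):
    def __init__(self, n_features=11, hidden=128, dropout=0.2):
        super().__init__()
        # Temporal branch: two-layer BiLSTM on the scalar acceleration sequence
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=2,
                            batch_first=True, bidirectional=True, dropout=dropout)
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Physical branch: 11 -> 64 -> 32
        self.physics = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 32),
        )
        fused = 2 * hidden + 32  # 256 + 32 = 288
        self.se = SEBlock(fused)
        self.drop = nn.Dropout(dropout)  # rate assumed
        # Regression head: 288 -> 64 -> 1
        self.head = nn.Sequential(nn.Linear(fused, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, signal, features):
        # signal: (batch, time), features: (batch, 11)
        seq, _ = self.lstm(signal.unsqueeze(-1))               # (batch, time, 256)
        temporal = self.pool(seq.transpose(1, 2)).squeeze(-1)  # (batch, 256)
        physical = self.physics(features)                      # (batch, 32)
        fused = torch.cat([temporal, physical], dim=1)         # (batch, 288)
        fused = self.drop(self.se(fused))
        return torch.relu(self.head(fused)).squeeze(-1)        # non-negative depth


# Usage: two windows of 750 samples with 11 features each
torch.manual_seed(0)
model = PGFFN().eval()
with torch.no_grad():
    depth = model(torch.randn(2, 750), torch.randn(2, 11))
```

The final ReLU makes the non-negativity an architectural guarantee rather than a learned behavior, which matches the description in the text.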
Model Training and Optimization
To guide the dual-branch network toward solutions that respect basic physical principles, a composite loss function (Equation 1) was introduced that augments the conventional MSE with one regularization term. The MSE encourages agreement between predicted and true scour depths. The monotonicity term imposes the requirement that the estimated depth does not decrease when vibration energy rises by penalizing negative gradients of the prediction with respect to RMS.
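$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 + \lambda\,\frac{1}{N}\sum_{i=1}^{N}\max\left(0,\, -\frac{\partial \hat{y}_i}{\partial \mathrm{RMS}_i}\right) \tag{1}$$
where
ŷ is the predicted scour depth,
y is the true scour depth,
λ is the weight for monotonicity penalty,
RMS is the root mean square of the acceleration window, and
N is the number of samples in the mini-batch.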
Automatic differentiation was employed to evaluate the gradient term within each mini-batch. The penalty is activated only when the monotonicity constraint is violated, so optimization focuses mainly on data fitting while gently discouraging implausible behavior. The penalty weight λ was set to 0.3, which preserved prediction accuracy while promoting the desired physical behavior without exhaustive tuning.
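One way to realize such a loss with automatic differentiation is sketched below. It is an illustration under assumptions, not the authors' implementation: PyTorch is assumed as the framework, the penalty is taken as the batch mean of the negative part of the gradient, the gradient is evaluated with respect to the (normalized) RMS input feature, and the RMS feature is assumed to sit at column `rms_index = 0`.

```python
import torch


def composite_loss(model, signal, features, target, lam=0.3, rms_index=0):
    """MSE plus a hinge penalty on negative d(prediction)/d(RMS)."""
    features = features.clone().requires_grad_(True)
    pred = model(signal, features)
    mse = torch.mean((pred - target) ** 2)

    # Per-sample gradients: samples are independent, so the gradient of the
    # summed prediction gives each sample's own sensitivity to its inputs.
    grad = torch.autograd.grad(pred.sum(), features, create_graph=True)[0]

    # Penalize only where the prediction decreases as RMS increases
    penalty = torch.relu(-grad[:, rms_index]).mean()
    return mse + lam * penalty


# Usage with two toy models (stand-ins, NOT the PG-FFN)
signal = torch.zeros(4, 750)
features = torch.ones(4, 11)


def decreasing(s, f):
    return -2.0 * f[:, 0]   # violates monotonicity


def increasing(s, f):
    return 2.0 * f[:, 0]    # satisfies monotonicity


loss_bad = composite_loss(decreasing, signal, features, torch.full((4,), -2.0))
loss_ok = composite_loss(increasing, signal, features, torch.full((4,), 2.0))
```

Both toy models fit their targets exactly, so any remaining loss comes from the penalty alone: the decreasing model is charged λ times its gradient magnitude, while the increasing model incurs no penalty.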
Both the 10-s river flow axis acceleration windows and all 11 engineered physical features were standardized by z-score normalization using statistics computed from the training set. Training of the PG-FFN was then conducted with the composite physics-guided loss function. Mini-batches of 256 samples were processed to provide a balance between gradient variance and memory consumption. An initial learning rate of 7 × 10⁻⁴ was employed, allowing for rapid descent yet avoiding divergence. The learning rate was scheduled using a cosine annealing strategy, which gradually reduced the step size over the course of training. Optimization proceeded for 10 epochs, and performance on the validation set was monitored after each epoch to guide early stopping when necessary.
Results and Discussion
Convergence of Training and Validation Loss
The convergence behavior of both models was examined to assess their learning dynamics and stability. The evolution of the loss functions for the baseline BiLSTM and the proposed PG-FFN is presented in Figures 7 and 8, respectively.

Figure 7. Training and validation loss of baseline bidirectional long short-term memory (BiLSTM).

Figure 8. Training and validation loss of physics-guided feature fusion network (PG-FFN).
Figure 7 shows that the baseline BiLSTM converges effectively, with both the training and validation MSE loss decreasing sharply within the first four epochs before stabilizing around a value of 0.05. The close tracking between the two curves indicates that the model learned to generalize well from the training data without significant overfitting. This establishes a strong benchmark for a standard deep learning approach.
Similarly, the PG-FFN exhibits rapid convergence, as depicted in Figure 8. A notable distinction, however, is that its composite loss reaches a lower final value of approximately 0.02. Because the penalty terms are non-negative, the composite loss is an upper bound on its MSE term, so this value indicates that the PG-FFN reached a better fit than the baseline model (provided that both losses are computed on the same normalized target). The smooth and stable descent of the PG-FFN’s loss curves implies that the monotonicity penalty did not conflict with the MSE term but rather complemented it. By regularizing the solution space and penalizing physically implausible outputs, the composite loss function appears to have guided the optimizer toward a more robust and accurate minimum. This convergence behavior suggests that incorporating domain knowledge not only improves the final prediction but can also contribute to a more stable and efficient training process.
Feature Correlation and Adaptive Weighting
To address the potential multicollinearity among the engineered physical descriptors, a Spearman correlation analysis was conducted on the training set. Figure 9 illustrates the correlation heatmap of the 11 physical features (the feature IDs F1 to F11 correspond to the entries listed in Table 1).

Figure 9. Correlation heatmap of the 11 physical features.
As shown in Figure 9, strong correlations exist within specific sub-categories of the physical descriptors. For instance, the energy-related features in the time and time-frequency domains exhibit high positive collinearity, reflecting their collective sensitivity to scour-induced vibration amplification. As highlighted by Parisi et al. ( 33 ), highly correlated input features can obscure traditional static feature importance evaluations and often necessitate manual feature selection or clustering to prevent overfitting in conventional machine learning models.
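For reference, a Spearman correlation matrix of this kind can be computed as the Pearson correlation of rank-transformed features. The following is a minimal standard-library sketch, not the authors' implementation; the three toy feature columns are invented for illustration and are not entries from Table 1.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(columns):
    """Pairwise Spearman matrix for a list of feature columns."""
    return [[spearman(a, b) for b in columns] for a in columns]


# Toy columns: f2 rises monotonically with f1, f3 falls monotonically.
f1 = [1.0, 2.0, 3.0, 4.0, 5.0]
f2 = [1.0, 4.0, 9.0, 16.0, 25.0]
f3 = [5.0, 4.0, 3.0, 2.0, 1.0]
matrix = correlation_matrix([f1, f2, f3])
```

Because the statistic depends only on ranks, any monotonic but nonlinear relationship between two energy-type features still registers as a correlation close to 1.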
However, rather than relying on manual feature dropping or post-hoc static importance ranking, the proposed PG-FFN inherently mitigates the redundancy issue through its SE block. The SE block functions as an adaptive, channel-wise feature selector during the training process.
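The gating idea behind a squeeze-and-excitation (SE) block can be sketched for a plain feature vector, where the squeeze step is trivial and only the excitation path (bottleneck layer, ReLU, expansion layer, sigmoid) remains. The weights below are hand-picked for illustration rather than learned, and the placement of the block within the PG-FFN is not implied by this sketch.

```python
import math


def se_gate(features, w_reduce, w_expand):
    """Channel-wise gating in the style of a squeeze-and-excitation block.

    w_reduce: hidden x channels weights (bottleneck layer, ReLU).
    w_expand: channels x hidden weights (expansion layer, sigmoid).
    Returns the recalibrated features and the per-channel gates.
    """
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, features)))
        for row in w_reduce
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    recalibrated = [f * g for f, g in zip(features, gates)]
    return recalibrated, gates


# Illustrative (not learned) weights: three channels, one hidden unit.
features = [1.0, 1.0, 2.0]
w_reduce = [[1.0, 1.0, 0.0]]
w_expand = [[2.0], [-2.0], [0.0]]
recalibrated, gates = se_gate(features, w_reduce, w_expand)
```

In this toy case the first channel is passed almost unchanged, the second is strongly suppressed, and the third is halved, which mirrors how a learned gate could down-weight one member of a redundant feature pair.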
Comparison of Model Performance
To rigorously evaluate the proposed model, its predictive performance was benchmarked against a baseline BiLSTM and an ablated version of the network on the unseen test set. The comparison was based on three standard regression metrics: mean absolute error (MAE), root mean square error (RMSE), and normalized root mean square error (NRMSE). A summary of the evaluation results is presented in Table 2.
Table 2. Performance Comparison of Different Models on the Test Set
Note: MAE = mean absolute error; RMSE = root mean square error; NRMSE = normalized root mean square error; BiLSTM = bidirectional long short-term memory; PG-FFN = physics-guided feature fusion network.
The physical feature branch is removed.
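The three metrics can be written out as a short sketch. The normalization used for NRMSE is not stated in this section; the version below divides the RMSE by the range of the observed values, which is one common convention and should be read as an assumption. The sample values are invented for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range (assumed normalization)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))


# Toy scour depths in meters, not results from the study.
depth_true = [0.0, 0.5, 1.0, 1.5, 2.0]
depth_pred = [0.1, 0.4, 1.1, 1.4, 2.1]
metrics = {
    "MAE": mae(depth_true, depth_pred),
    "RMSE": rmse(depth_true, depth_pred),
    "NRMSE_percent": 100.0 * nrmse(depth_true, depth_pred),
}
```

If the study normalizes by the mean or by the maximum depth instead, only the denominator in `nrmse` changes.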
The results clearly indicate that the proposed PG-FFN significantly outperforms the baseline BiLSTM model across all evaluation metrics. Focusing on the primary metric, the PG-FFN achieved an NRMSE of 7.60%, which represents a 33% reduction in error compared with the 11.31% NRMSE of the baseline model. This substantial improvement highlights the effectiveness of the dual-branch architecture in capturing a more comprehensive and accurate representation of the underlying structural dynamics related to scour.
To further dissect the sources of this performance gain, an ablation study was conducted. In this study, the physical feature branch was removed from the PG-FFN, forcing the model to rely solely on the temporal features extracted from the raw acceleration signal by the BiLSTM branch. This ablated model yielded an NRMSE of 9.59%. While this is still an improvement over the baseline BiLSTM, it represents a considerable increase in error compared with the full PG-FFN.
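The relative changes implied by the three reported NRMSE values can be checked with a few lines of arithmetic. The helper name `relative_reduction` is illustrative; the three input numbers are the ones reported above.

```python
def relative_reduction(reference, value):
    """Percentage reduction of `value` relative to `reference`."""
    return 100.0 * (reference - value) / reference


baseline, ablated, full = 11.31, 9.59, 7.60  # NRMSE in percent

full_vs_baseline = relative_reduction(baseline, full)        # about 32.8
ablated_vs_baseline = relative_reduction(baseline, ablated)  # about 15.2
full_vs_ablated = relative_reduction(ablated, full)          # about 20.8

# Share of the total gain (in NRMSE points) that disappears when the
# physical feature branch is removed.
fusion_share = 100.0 * (ablated - full) / (baseline - full)
```

On these numbers, restoring the physical feature branch accounts for roughly 54% of the total NRMSE improvement over the baseline, with the remainder attributable to the other differences between the ablated network and the baseline BiLSTM.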
The comparative analysis reveals two key insights. First, the superior performance of the full PG-FFN over the baseline model demonstrates the value of integrating engineered, physics-based features. The baseline model, which processes only the raw time series, struggles to consistently extract the subtle, scour-indicative patterns from the complex vibration signals. In contrast, the PG-FFN’s physical feature branch provides explicit, interpretable information related to changes in structural stiffness and energy, effectively guiding the model toward a more accurate solution.
Second, instead of evaluating the static importance of individual, highly correlated features, which can be misleading in dual-branch networks, this macroscopic ablation study confirms the collective contribution of the physics-guided descriptors. The performance degradation observed after removing the physical feature branch confirms that the feature fusion mechanism is the primary driver of the model’s enhanced accuracy. The engineered features act as a strong inductive bias, incorporating essential domain knowledge that the temporal branch alone may not learn effectively. Guided by the SE block’s dynamic channel recalibration, the fusion of these distinct but complementary feature sets allows the PG-FFN to effectively model the relationship between structural vibration and scour depth.
Beyond the numerical improvement, the PG-FFN offers a physically interpretable pathway for scour monitoring. The model’s performance is largely driven by its ability to associate the engineered vibration features with specific scour-induced physical changes. For instance, the physical feature branch effectively identifies the correlation between the loss of soil support and the reduction in the pier’s effective stiffness, which is physically manifested as a downward shift in the fundamental frequency and an increase in vibration amplitude caused by the lengthened cantilever effect. Unlike the pure BiLSTM baseline, which must implicitly infer these complex relationships from raw signals, the PG-FFN is explicitly guided by these domain-specific descriptors. Furthermore, the physics-guided design keeps the predictions consistent with the physical reality of scour: the network architecture guarantees a non-negative depth estimate, and the composite loss function penalizes predictions that are inconsistent with the energy growth trends, thereby preventing the model from outputting numerically optimal but physically impossible predictions.
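The lengthened cantilever effect can be illustrated with a deliberately simplified idealization that is not the calibrated numerical model used in this study: a massless cantilever of free length L with a lumped tip mass, fixed at the bed level, for which the stiffness scales as 3EI/L³ and the fundamental frequency therefore scales as L^(−3/2). The free length used below is a hypothetical value, and soil flexibility and distributed mass are ignored.

```python
def frequency_ratio(free_length_m, scour_depth_m):
    """Ratio f(scoured) / f(unscoured) for an idealized cantilever.

    Assumes f is proportional to sqrt(k / m) with k = 3EI / L^3, so that
    f scales with L^(-3/2), and that scour simply adds to the free length.
    """
    if free_length_m <= 0 or scour_depth_m < 0:
        raise ValueError("free length must be positive and scour non-negative")
    return (free_length_m / (free_length_m + scour_depth_m)) ** 1.5


# Hypothetical 4 m free length with increasing scour depths.
ratios = [frequency_ratio(4.0, d) for d in (0.0, 0.1, 0.25, 0.5)]
```

Even in this crude model the ratio falls monotonically as scour deepens, which is the qualitative trend that the frequency-related descriptors are intended to capture.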
Field Validation
To further assess the model’s practical applicability and its ability to generalize to real-world conditions, a field validation was conducted using data captured during a flood event at the monitored bridge in Georgia in January 2024. Hydrological records from a regional gauging station indicate that the river stage rose abruptly from a base level of approximately 0.9 m to nearly 2.8 m during this event, creating the high-flow conditions necessary to trigger significant bed erosion. While the ultrasonic depth sensor provided ground truth measurements, data availability was constrained by the system’s power settings, resulting in a sampling interval of 10 min. A 3-h window representing the most active phase of the flood, specifically corresponding to the rising stage of the hydrograph, was selected for analysis, during which the scour depth was observed to increase from approximately 3 cm to 25 cm. For each 10-min timestamp, a corresponding 10-s segment of acceleration data was processed by the PG-FFN to generate a scour depth prediction. As illustrated in Figure 10, the model’s predictions exhibit a strong correlation with the measured data, accurately tracking the upward trend of the scour development. This robust performance in a real-world scenario highlights the physics-guided framework’s capability to mitigate domain shift, offering a strong basis for future real-time monitoring implementations.
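The pairing of each 10-min ground-truth timestamp with a 10-s acceleration segment can be sketched as an index calculation on the raw record. The accelerometer sampling rate is not given in this section, so the 100 Hz used below is purely hypothetical, as is the choice to start each window at the timestamp rather than center it.

```python
def window_bounds(timestamp_s, fs_hz, window_s=10.0):
    """Sample indices [start, stop) of the segment starting at timestamp_s."""
    start = int(round(timestamp_s * fs_hz))
    return start, start + int(round(window_s * fs_hz))


def field_windows(duration_s, interval_s, fs_hz, window_s=10.0):
    """One window per ground-truth timestamp that fits inside the record."""
    bounds = []
    t = 0
    while t + window_s <= duration_s:
        bounds.append(window_bounds(t, fs_hz, window_s))
        t += interval_s
    return bounds


# 3-h record, 10-min ground-truth interval, hypothetical 100 Hz sampling.
windows = field_windows(duration_s=3 * 3600, interval_s=600, fs_hz=100.0)
```

Each pair of indices selects the segment that would be standardized and passed to the model, so the number of predictions is set by the ultrasonic sensor interval rather than by the accelerometer.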

Figure 10. Comparison of field-measured and predicted scour depths during a real-world flood event.
Applicability and Limitations
The applicability of the PG-FFN approach depends on the structural design and configuration, as well as the sensitivity of the global response to localized foundation changes. For simply supported bridges, such as the T-beam structure in this study, the discontinuity at the bearings decouples the substructure’s transverse vibration from the overall superstructure, enhancing the observability of scour-induced stiffness loss. In contrast, for integral bridges or continuous-span structures, the strong global constraints may dilute the frequency shifts caused by local scour, potentially requiring more refined feature extraction to maintain accuracy. Furthermore, while the framed-bent configuration provides favorable lateral flexibility for monitoring, extremely stiff, large-diameter single-shaft piers might exhibit less pronounced modal changes under moderate scour levels. The proposed model also relies on a calibrated FE model for training data generation. Therefore, its deployment on complex bridge layouts requires site-specific calibration and sufficient ambient excitation to ensure high-fidelity predictions.
Practical Implementation and Economic Considerations
From an economic perspective, the PG-FFN offers a superior return on investment over traditional underwater sensors. Instruments such as sonar require expensive underwater installation and frequent maintenance as a result of debris or biofouling. Conversely, vibration-based monitoring uses pier cap–mounted accelerometers, eliminating submerged maintenance and enhancing sensor survivability during floods. This low-cost, durable configuration provides a sustainable solution for large-scale bridge network monitoring.
With regard to practice, the proposed framework effectively supplements existing inspection protocols, especially when visual assessments are impractical during peak floods. As detailed in the workflow (Figure 4), the automated alert system allows owners to set data-driven safety thresholds for triggering site inspections or bridge closure, supporting proactive infrastructure risk management.
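A threshold-based alert rule of the kind described could take the following form. This is a sketch only: the function name, the action labels, and the example threshold values are hypothetical, and actual thresholds would be set by the bridge owner from a site-specific assessment.

```python
def alert_level(predicted_depth_m, inspect_at_m, close_at_m):
    """Map a predicted scour depth to an action using owner-set thresholds."""
    if not 0 <= inspect_at_m < close_at_m:
        raise ValueError("thresholds must satisfy 0 <= inspect_at_m < close_at_m")
    if predicted_depth_m >= close_at_m:
        return "close_bridge"
    if predicted_depth_m >= inspect_at_m:
        return "inspect_site"
    return "normal"


# Hypothetical thresholds for illustration only.
actions = [
    alert_level(depth, inspect_at_m=0.5, close_at_m=1.0)
    for depth in (0.1, 0.6, 1.2)
]
```

Keeping the thresholds as explicit arguments makes the rule auditable and lets different bridges in a network use different limits with the same monitoring code.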
Conclusions
Bridge scour remains a critical threat to infrastructure safety, and the limitations of conventional underwater monitoring methods highlight an urgent need for robust, resilient, and cost-effective alternatives. To address this, a physics-guided feature fusion network was developed, its performance was benchmarked against a conventional recurrent model, and an ablation study was conducted to quantify the contribution of the physics-guided feature branch. Based on the comparative results, several conclusions are drawn:
1) The proposed PG-FFN demonstrated superior predictive accuracy, achieving an NRMSE of 7.60% on the unseen test set. This performance represents a 33% error reduction compared with the 11.31% NRMSE from the baseline BiLSTM model. This result underscores the effectiveness of fusing automatically learned features from raw signals with engineered, physics-based descriptors.
2) Based on the ablation study, the removal of the physical feature branch increased the NRMSE from 7.60% to 9.59%. This confirms that the fusion of engineered physical features with learned temporal patterns is critical for the enhanced performance.
3) The use of a physics-guided composite loss function, which penalizes violations of monotonicity, proved beneficial for model training and stability. The PG-FFN exhibited a stable training process and reached a lower final loss than the baseline BiLSTM.
4) The field validation study demonstrated the model’s robustness in a real-world scenario. The PG-FFN successfully captured the trend of scour depth progression during a flood event, confirming its effectiveness for practical monitoring applications.
Although the study yielded promising results, several limitations were identified. First, while field validation demonstrated the model’s potential, the current training phase relied on data from a calibrated numerical model. To further enhance robustness, future work should consider expanding the training dataset to include multiple bridges with diverse configurations and long-term field measurement data. Second, the model used only the transverse (river flow direction) acceleration component, neglecting potentially informative longitudinal and vertical responses. Incorporating multi-axis or multi-sensor vibration data could enrich the feature space and further improve prediction accuracy. Finally, while the SE block effectively manages feature redundancy, future research could also explore advanced static feature selection techniques to further streamline the physical inputs before training, thereby reducing computational overhead and enhancing explicit interpretability.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Jidong J. Yang, Penghao Deng; data collection: Tien Yee; data processing: Penghao Deng; analysis and interpretation of results: Penghao Deng, Jidong J. Yang; draft manuscript preparation: Penghao Deng, Metin Oguzmert; manuscript review and editing: Jidong J. Yang, Tien Yee, Metin Oguzmert. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The contents of this paper reflect the views of the authors, who are solely responsible for the facts and accuracy of the data, opinions, and conclusions presented herein. The contents may not reflect the views of the funding agency or other individuals.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work presented in this paper is part of a research project (RP 22-19) sponsored by the Georgia Department of Transportation, United States.
