Data-driven structural health monitoring for early corrosion detection in the oil and gas pipelines: enhancing sensitivity with unsupervised dimensionality reduction

Abstract

Corrosion presents significant threats to structural integrity within the oil and gas industry, increasing maintenance demands and associated costs. However, traditional inspection techniques often face limitations in detecting early-stage corrosion, particularly under challenging environmental and operational conditions. This study investigates structural health monitoring using low computational cost data-driven approaches, specifically principal component analysis, t-distributed stochastic neighbor embedding, and locally linear embedding to enable early and autonomous corrosion detection capabilities in a scalable manner for embedded systems. Corrosion was induced in a steel pipe using an ionic solution, with nearly 400 GB of guided wave signal data recorded across well-defined corrosion stages via an array of piezoelectric transducers. The results demonstrate that the proposed approach enables the detection of corrosion with a sensitivity of up to 0.39% reduction in the cross-sectional area. Furthermore, the results validate the feasibility of establishing autonomous damage detection thresholds, mitigating the need for periodic inspections. The methodology proved robust, effectively isolating environmental effects without the need for environmental and operational condition compensation techniques. Unsupervised dimensionality reduction effectively detects early structural changes, such as corrosion and transduction loss, with guided waves and optimized parameters, offering strong support for traditional inspection methods.

Keywords

SHM, data-driven algorithm, dimensionality reduction, guided waves, corrosion test, pipelines

Introduction

Corrosion is a major threat to structural integrity in the oil and gas sector, accounting for approximately 10% of pipeline and tubing failures due to wall thickness loss, which undermines safety margins.¹ Routine inspections are essential yet costly, especially for inaccessible areas. Guided wave (GW) structural health monitoring (SHM), which employs permanently installed sensors and remote data transmission, reduces inspection frequency, lowering costs while enhancing operational safety and asset durability.^2,3

However, environmental variations often alter GW signals, creating challenges for the reliable detection of gradual defects within monitored structures.^3–7 Although advanced signal processing and data-driven techniques have shown promise in filtering noise and improving the accuracy of damage detection, interpretation of GW data is still largely dependent on operator analysis, which can introduce subjectivity and variability. Limited standardization further complicates consistency across analyses, sometimes resulting in unnecessary maintenance actions or missed critical defects.⁸

Recent advancements in data-driven analysis, along with the reduced costs of sensors and increased computational resources, highlight the potential for enhancing SHM through machine learning applications.⁹ Machine learning is particularly suited for GW data analysis as it facilitates pattern recognition with enhanced resilience to noise, allowing for more reliable anomaly detection even in dynamic environments. Unsupervised algorithms are especially useful in continuous monitoring scenarios where variable environmental and operational conditions are present.^10–12

The large data volumes in SHM require dimensionality reduction for efficient storage and transmission. Embedded SHM systems face constraints in computation, memory, and energy, limiting the feasibility of complex machine learning algorithms.¹³ Even simpler models, such as support vector machines and gradient boosting machines, can be impractical in such environments.¹⁴ To address these challenges, SHM systems are increasingly designed for low-power embedded hardware.^15,16 Patil et al.¹⁵ developed a damage detection algorithm using the common source method, processing guided wave data on an FPGA-based system. Techniques such as principal component analysis and autoencoders enable efficient data transmission while preserving critical structural health information, reducing computational overhead.^17–20

The automation and optimization of processes—key principles of Industry 4.0—are highly relevant to the oil and gas sector. The proximity between processing techniques and sensors enhances response times and reduces costs, particularly when decisions are made locally.²¹ Real-time data monitoring enables more accurate assessments than traditional guided wave inspections, facilitating intelligent maintenance and minimizing unnecessary interventions. However, these systems often operate in harsh environments, such as refineries and offshore platforms, with severe hardware and connectivity constraints. Thus, scalable real-time defect detection requires computationally efficient algorithms that provide clear diagnostic insights.

This study demonstrates the effectiveness of lightweight algorithms for embedded SHM, enabling earlier corrosion detection than traditional guided wave methods. As noted by Batzolis,¹⁴ integrating machine learning into embedded systems necessitates fast, resource-efficient algorithms. To validate this approach, dimensionality reduction techniques were applied to 400 GB of guided wave data to detect corrosion-induced anomalies, reducing the data to a representation of less than 1 MB. The PCA transformation matrix, with dimensions of $2 \times 6000$ , also occupies less than 1 MB of memory, leveraging its compact orthogonal basis to project raw guided wave signals into a low-dimensional latent space. Detecting minimal wall thickness reductions without environmental and operational condition (EOC) compensation enhances early defect sensitivity and supports conventional inspections.
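As a rough check of the memory figure quoted above, the following sketch computes the footprint of a $2 \times 6000$ projection matrix. The 64-bit floating-point storage is an assumption; the numeric precision used on the embedded target is not stated here.

```python
import numpy as np

# Assumption: weights stored as 64-bit floats; 32-bit storage would halve this.
n_components, n_samples = 2, 6000
projection = np.zeros((n_components, n_samples), dtype=np.float64)

footprint_bytes = projection.nbytes        # 2 * 6000 * 8 bytes
footprint_kib = footprint_bytes / 1024

print(f"{footprint_bytes} bytes = {footprint_kib:.2f} KiB")
```

Under this assumption the matrix needs about 94 KiB, which is consistent with the stated bound of less than 1 MB.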

This work advances pipeline corrosion monitoring through the development of a high-fidelity dataset capturing real corrosion defects and their temporal progression over a 13-month experimental period, including associated environmental and operational conditions. A comparative analysis of reduced-dimension signals generated by computationally efficient linear and nonlinear algorithms demonstrates their superiority over traditional methods in cost, stability, and robustness to environmental variability, reducing risks of misinterpretation. Complementing this, a validated anomaly detection framework is introduced, optimized for computational efficiency and scalability to address hardware and connectivity constraints in embedded systems.

Dataset

Sample

For the inspections, a collar of piezoelectric transducers was developed. The collar is composed of two independent rings, each with $16$ pairs of polarized transducers designed for the generation of $T (0, 1)$ mode in the pipe, resulting in most of the acoustic energy being concentrated in this mode or, in cases of mode conversion, in flexural modes that share shear movements, such as $F (1, 2)$ and $F (2, 2)$ .

The two-ring configuration, as described by Oliveira,²² aims to optimize the directionality of the inspection, allowing for a detailed analysis of wave propagation and defect identification.

The transducers are made from modified lead zirconate titanate (PZT), model PIC255 from PI Ceramic, selected for their thermal and electromechanical properties, which favor operation in dynamic systems subject to large temperature variations.

The collar was built by brazing the transducers to a fiberglass plate coated with copper, a material chosen for its flexibility and ability to adjust to the curvature of the pipe, as can be seen in Figure 1.

Figure 1.

Flexible fiberglass transducer collar with a copper coating permanently coupled to the pipe.

The installation area was sanded to remove insulation, and the plate was fixed to the seam-welded steel pipe ( $8$ inches in diameter, schedule $40$ , and $5.3$ m in length) using Araldite® AV 138 epoxy adhesive, combined with HV 998 hardener.²³ The adhesive was cured under controlled temperature conditions by means of a heat belt wrapped around the region near the collar, which was set to approximately $50$ °C using a potentiometer. This heating process lasted for about $2$ h, ensuring adequate stability and coupling of the transducers.

In the tests conducted, the pipeline was exposed and not buried. If the pipeline were buried, the primary difference would be a reduction in the propagation range of guided waves.²⁴ This reduction occurs due to increased energy dissipation resulting from interactions with the surrounding medium, which attenuates the wave energy and restricts its effective transmission along the structure.

The sample was placed on two metallic rollers with a diameter of $9$ cm, with their centers positioned $36.5$ cm from the left edge and $60.5$ cm from the right edge. These rollers were mechanically locked to prevent any changes in the sample support conditions.

The positioning of the transducers and the corrosion site was determined after an analysis of wave propagation in a finite element model. In this model, a mesh of a pipe subjected to torsional force on its external surface was constructed, replicating the experimental conditions with the same geometry and frequency spectrum, simulating a shear piezoelectric crystal. The center of the collar was positioned $1.2$ m from the farthest edge of the defect (left edge) to avoid interference caused by edge reflections near the defect site.

For the excitation of the piezoelectric transducers, an Ormsby-type waveform was chosen, as expressed in Equation (1),²⁵ generated at a frequency of $3$ MHz, windowed with a Hanning function, and with a frequency range between $20$ kHz and $70$ kHz, as shown in Figure 2. The selection of this frequency range was based on the analysis of the dispersion curve of the waveguide (Figure 3), aiming to optimize the propagation of $T (0, 1)$ waves in the region of interest for defect detection. This frequency range is intended to operate in a region of the dispersion curve with minimal variation in group velocity, thereby minimizing the dispersion of the modes used for inspection ( $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ ).

f (t) = \left[ \frac{(π f_{4})^{2}}{π f_{4} - π f_{3}} \operatorname{sinc}^{2} (π f_{4} t) - \frac{(π f_{3})^{2}}{π f_{4} - π f_{3}} \operatorname{sinc}^{2} (π f_{3} t) \right] - \left[ \frac{(π f_{2})^{2}}{π f_{2} - π f_{1}} \operatorname{sinc}^{2} (π f_{2} t) - \frac{(π f_{1})^{2}}{π f_{2} - π f_{1}} \operatorname{sinc}^{2} (π f_{1} t) \right] .

(1)
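The Ormsby expression in Equation (1) can be evaluated numerically as in the sketch below. The corner frequencies $f_{1}$ to $f_{4}$ and the 1 ms burst length are hypothetical values chosen for illustration; only the $20$ kHz to $70$ kHz band and the $3$ MHz generation frequency come from the text. Note that `np.sinc(x)` is the normalized form $\sin (π x) / (π x)$ , so $\operatorname{sinc} (π f t)$ is written `np.sinc(f * t)`.

```python
import numpy as np

def ormsby(t, f1, f2, f3, f4):
    """Ormsby wavelet as written in Equation (1)."""
    def term(f):
        # (pi f)^2 * sinc^2(pi f t), with sinc(u) = sin(u) / u
        return (np.pi * f) ** 2 * np.sinc(f * t) ** 2
    high = (term(f4) - term(f3)) / (np.pi * f4 - np.pi * f3)
    low = (term(f2) - term(f1)) / (np.pi * f2 - np.pi * f1)
    return high - low

fs = 3e6                                   # generation frequency quoted in the text
n = 3001                                   # assumed ~1 ms burst; odd so t = 0 is sampled
t = (np.arange(n) - n // 2) / fs
# Hypothetical corner frequencies: only the 20-70 kHz band is given in the text.
f1, f2, f3, f4 = 20e3, 30e3, 60e3, 70e3
burst = ormsby(t, f1, f2, f3, f4) * np.hanning(n)   # Hanning-windowed wavelet
burst /= np.max(np.abs(burst))             # normalise to unit peak

spectrum = np.abs(np.fft.rfft(burst))
freqs = np.fft.rfftfreq(n, d=1 / fs)
peak_frequency = freqs[np.argmax(spectrum)]
```

With these assumed corners the spectrum is a trapezoid that ramps up between $f_{1}$ and $f_{2}$ , stays flat up to $f_{3}$ , and ramps down to zero at $f_{4}$ .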

Figure 2.

System excitation waveform—Ormsby in the frequency domain.

Figure 3.

Dispersion curve for an 8-inch diameter, schedule 40 steel pipe, illustrating the relationship between wave group velocity and frequency for the guided wave modes of interest.

Controlled corrosion test

A closed electrical circuit was established by connecting the pipe as the anode and a stainless steel electrode as the cathode to a DC power supply, as can be seen in Figure 4. In this setup, a synthetic seawater solution served as the electrolyte to enable electrochemical reactions at the metal-solution interface, while the application of a constant current from the DC power supply enhanced the ionic conductivity and actively promoted controlled corrosion in the pipe region.

Figure 4.

(1) Corrosion device with pumping system; (2) ionic solution; (3) direct current power supply used to force the corrosion through current control.

The corrosion device was printed in high-density ABS material, with a diameter of $130$ mm. It is equipped with a flat stainless steel plate and two pumps to circulate the synthetic seawater solution, which is intended to induce homogeneous corrosion in the designated region. The center of the circumference of the device was installed $2.3$ m from the transducer collar, and the corrosion was conducted over an area of approximately $144$ cm².

Data generation and acquisition

The signal generation was performed using custom equipment, with the USB-6366 OEM board from National Instruments as the central element. This multichannel acquisition board features maximum acquisition rates of $2$ MS/s and generation rates of $3.33$ MS/s, with two generation channels and eight simultaneous sampling reception channels, allowing for the control and monitoring of the transducers. A power amplifier is connected to the generation channel, providing the necessary power for exciting the emitters.

The multiplexing circuit interconnects the sensors with the excitation and acquisition subsystems. The acquisition board has eight digitization channels, which, through the multiplexing circuit, allows for the acquisition of a total of $32$ channels, digitizing eight sensors simultaneously. The actuators are individually excited with $100$ Vpp, and the readings are taken in groups of eight sensors at a time, with sufficient intervals to ensure the complete dissipation of the wave emitted by the previously excited actuator.

The signals were acquired at a rate of $2$ MHz. The signals captured by the sensors are subjected to a $55$ dB pre-amplification stage, preceded by a filtering stage to prevent aliasing.

The system is also equipped with four channels for temperature measurement, designed for reading sensors with a digital interface via the I2C protocol. Of these four channels, three monitor the temperature in the transducer-coupling region, contributing to thermal compensation procedures, and one monitors the relative humidity. Temperature is recorded before each data collection stage, with eight averages obtained per guided wave signal to reduce electromagnetic random noise.
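The benefit of averaging eight acquisitions can be illustrated with synthetic data. The sketch below assumes additive, uncorrelated Gaussian noise and a hypothetical $45$ kHz tone standing in for a guided wave echo; the noise level is arbitrary. Under these assumptions the residual noise falls by roughly $\sqrt{8} \approx 2.8$ .

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 2e6                                   # acquisition rate quoted in the text
t = np.arange(2000) / fs
clean = np.sin(2 * np.pi * 45e3 * t)       # hypothetical stand-in for an echo
n_averages = 8
noise_std = 0.5                            # assumed noise level, illustration only

# Eight noisy repetitions of the same signal, then their sample-wise mean.
shots = clean + rng.normal(0.0, noise_std, size=(n_averages, t.size))
averaged = shots.mean(axis=0)

single_shot_error = np.std(shots[0] - clean)
averaged_error = np.std(averaged - clean)
improvement = single_shot_error / averaged_error   # expected near sqrt(8)
```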

The electronics of the equipment allow for voltage and electric current measurement at each transducer of the collar, enabling the evaluation of the electrical impedance. These parameters are important for analyzing the current condition of the sensors, potentially indicating component ageing in the collar or even a failure in one of the transducers.

Guided wave acquisition routine

The procedure was carried out over a period of $13$ months, aiming to monitor the progression of corrosion in the pipeline under real operating conditions. Data collection was conducted at ambient temperature, allowing the capture of natural thermal variations over the course of the experiment. In total, approximately $400$ GB of guided wave data were collected throughout the experiment, enabling detailed monitoring of the behavior of the system under corrosion. The experiment was divided as described below:

Coupling of the corrosion device: Installation of a controlled electrolytic corrosion device, with a precisely applied current to induce corrosion in the pipeline over a predetermined period.

Removal of the device and cleaning: After the corrosion period, the device was removed, and the affected area was cleaned to eliminate corrosion by-products.

System integrity check: Verification of the integrity of the transducers and connections by analyzing impedance and capacitance, ensuring the reliability of the acquired data.

Corroded structure data collection: Acquisition of ultrasound and guided wave data from the corroded area, enabling the analysis of structural changes after each corrosion stage.

The experiment was conducted in iterative stages until a reduction of at least 50% of the pipe wall thickness was achieved, as summarized in Table 1. The abbreviation CSA refers to the cross-sectional area, representing the area of the cross-section of the pipe removed due to corrosion. In this study, CSA is an estimate based on measurements taken at 17 points, as illustrated in Figure 5.

Table 1.

GW signal data collected to monitor corrosion progression.

Corrosion (%)	Number of GW tests	CSA (%)
0	598	0.00
2	297	0.39
3	205	0.58
11	132	2.14
20	410	3.89
30	118	5.81
50	131	9.62

GW: guided waves; CSA: cross-sectional area.
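A quick consistency check on Table 1: dividing the CSA loss by the wall thickness loss gives a nearly constant ratio, which is what one would expect if the corroded patch covers a fixed fraction of the circumference. The sketch below only re-uses the tabulated values; it is not the estimation procedure based on the 17 thickness readings.

```python
# Values copied from Table 1 (the 0% baseline row is omitted).
wall_loss_pct = [2, 3, 11, 20, 30, 50]
csa_loss_pct = [0.39, 0.58, 2.14, 3.89, 5.81, 9.62]

ratios = [csa / wall for csa, wall in zip(csa_loss_pct, wall_loss_pct)]
for wall, csa, ratio in zip(wall_loss_pct, csa_loss_pct, ratios):
    print(f"wall loss {wall:>2d}%  CSA loss {csa:5.2f}%  ratio {ratio:.3f}")
```

All six ratios fall between 0.19 and 0.20, so each percentage point of wall thickness loss corresponds to roughly 0.19 percentage points of CSA loss in this experiment.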

Figure 5.

Positions of $17$ ultrasonic thickness measurements on the corroded area of the pipe section. Measurements taken with a GE thickness gauge to assess wall loss.

The cumulative thickness loss at each stage was monitored using a GE DM5E series ultrasonic thickness gauge. The corrosion area was characterized by measurements at $17$ distinct points, as illustrated in Figure 5.

The signals collected by the sensors were initially subjected to a fourth-order Butterworth band-pass filter, with cut-off frequencies at $20$ kHz and $70$ kHz. These frequencies were chosen to isolate the band of interest in the guided wave signals, ensuring a flat response within this range while attenuating low- and high-frequency noise. A Hanning window was then applied to the signals to smooth the edges and to prevent energy leakage in the frequency domain. This technique preserves the characteristics of the signal while attenuating unwanted components introduced by truncation of the original signal.
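The filtering and windowing steps described above can be sketched with SciPy as follows. Two choices here are assumptions rather than details from the text: the filter is applied forward and backward (`sosfiltfilt`) to obtain zero phase, and `order=4` is passed directly to `butter`, which for a band-pass design yields an eighth-order filter overall. The test tones are synthetic.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, fs, low=20e3, high=70e3, order=4):
    """Zero-phase Butterworth band-pass filter, 20-70 kHz by default."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

def preprocess(signal, fs):
    """Band-pass filter, then taper the record with a Hanning window."""
    filtered = bandpass(signal, fs)
    return filtered * np.hanning(filtered.size)

fs = 2e6                                   # acquisition rate quoted in the text
t = np.arange(20000) / fs
in_band = np.sin(2 * np.pi * 45e3 * t)     # inside the pass band
out_band = np.sin(2 * np.pi * 5e3 * t)     # below the lower cut-off

middle = slice(5000, 15000)                # ignore filter edge transients
gain_in = np.std(bandpass(in_band, fs)[middle]) / np.std(in_band[middle])
gain_out = np.std(bandpass(out_band, fs)[middle]) / np.std(out_band[middle])
```

The second-order-sections form is used because transfer-function coefficients can become numerically fragile when the pass band is narrow relative to the sampling rate, as it is here.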

The raw data are preprocessed to generate a tensor $Y$ with dimensions $n \times t \times m$ , where each row represents an acquisition, each column holds the signal amplitude at one of $t$ time steps, and the depth, comprising $m$ entries, corresponds to the different sensors. The full experimental schema is illustrated in Figure 6. After extracting the modes $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ , a new matrix $X$ with dimensions $n \times t$ is produced for each mode. Each mode-specific matrix $X$ is then used for dimensionality reduction as presented in this work.
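The reduction from the tensor $Y$ to a mode-specific matrix $X$ can be pictured with the sketch below. The extraction procedure is not detailed in this section, so the code uses a common circumferential-order weighting as an assumption: the ring sensors are taken to be equally spaced, and a mode of circumferential order $k$ ( $k = 0$ for $T (0, 1)$ , $k = 1$ for $F (1, 2)$ , $k = 2$ for $F (2, 2)$ ) is isolated by weighting sensor $j$ with $\cos (k θ_{j})$ . The function name `extract_mode` and the synthetic data are hypothetical.

```python
import numpy as np

def extract_mode(Y, order):
    """Collapse an (n, t, m) tensor to (n, t) by weighting the m ring sensors.

    Assumes m equally spaced sensors and a cos(order * theta) angular profile
    aligned with sensor 0 (illustrative; not stated in the source).
    """
    m = Y.shape[2]
    theta = 2 * np.pi * np.arange(m) / m
    weights = np.cos(order * theta)
    scale = 1.0 if order == 0 else 2.0     # undo the mean of cos^2 = 1/2
    return scale * (Y @ weights) / m

# Synthetic tensor: an axisymmetric part plus an order-1 part.
rng = np.random.default_rng(0)
n, t, m = 5, 100, 16
theta = 2 * np.pi * np.arange(m) / m
torsional = rng.normal(size=(n, t))
flexural = rng.normal(size=(n, t))
Y = torsional[:, :, None] + flexural[:, :, None] * np.cos(theta)

X_t01 = extract_mode(Y, order=0)           # recovers the axisymmetric part
X_f1 = extract_mode(Y, order=1)            # recovers the order-1 part
```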

Figure 6.

Experimental schema.

Classical SHM approach

Temperature compensation

Because environmental variations alter GW signals, non-destructive evaluation techniques employ baseline signals as reference points.²⁶ An initial set of baseline measurements was acquired prior to defect introduction; subsequently, the baseline data were subtracted from the interrogation measurements (obtained with the defect present) to isolate the defect-related signals.

To mitigate thermal variation effects on the acquired signals, a series of temperature compensation techniques was applied to the data for the first three modes in the torsional family ( $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ ). These specific modes were selected due to their sensitivity and relevance for pipeline inspection.

The variations in temperature and air relative humidity to which the system was exposed throughout the experiment are presented in Figure 7.

Figure 7.

Variation of environmental conditions throughout the experiment: (a) temperature histogram for data collected along all the corrosion steps and (b) air relative humidity histogram for data collected along all the corrosion steps.

Initially, the optimal baseline selection (OBS) technique was used to select the most suitable baseline signal for comparison with the data from each corrosion stage. This technique involved subtracting the baseline signal from the acquired signal at the damaged stage, followed by calculating the root mean square (RMS) value of the resulting difference. The baseline signal that resulted in the lowest RMS value was selected as the reference for the subsequent stages.
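The selection rule described above, which keeps the baseline with the lowest RMS residual, can be sketched as follows. The synthetic tone bursts, the stretch factors standing in for different baseline temperatures, and the function names are all hypothetical.

```python
import numpy as np

def optimal_baseline(current, baselines):
    """Return the index and residual of the baseline with the lowest RMS difference."""
    residuals = current[None, :] - baselines          # shape (n_baselines, t)
    rms = np.sqrt(np.mean(residuals ** 2, axis=1))
    best = int(np.argmin(rms))
    return best, residuals[best]

def burst(t, centre=1e-3, width=1e-4, f0=45e3):
    """Gaussian-windowed tone burst used as a synthetic echo."""
    return np.exp(-((t - centre) / width) ** 2) * np.sin(2 * np.pi * f0 * (t - centre))

fs = 2e6
t = np.arange(4000) / fs
stretch_factors = [0.998, 1.000, 1.002]    # stand-ins for three baseline temperatures
baselines = np.stack([burst(t / a) for a in stretch_factors])

defect_echo = 0.05 * burst(t, centre=1.5e-3)          # small added reflection
current = baselines[2] + defect_echo

best, residual = optimal_baseline(current, baselines)
```

In this synthetic case the third baseline is selected and the residual equals the added echo, because the other two baselines leave a much larger mismatch from the simulated temperature shift.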

After selecting the baseline, the baseline signal stretch (BSS) technique was applied. This method enhanced the signal-to-noise ratio and allowed the signal in the defect region to be highlighted, as shown in Figure 8. Note that there are two dead zones in the signal, where no reliable information can be obtained. The first zone covers the initial region, up to approximately 1.8 m, and is affected by crosstalk and the shadow caused by reflection from the left edge. The second zone is located at the end of the pipe and is caused by reflection and mode conversion at the right edge.
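A minimal sketch of the stretching step is given below, assuming a simple grid search over time-scaling factors with linear interpolation. The search range of ±1%, the synthetic burst, and the function names are assumptions for illustration; practical implementations often work in the frequency domain or refine the estimate iteratively.

```python
import numpy as np

def stretch(signal, t, factor):
    """Resample `signal` as if its time axis were scaled by `factor`."""
    return np.interp(t / factor, t, signal)

def baseline_signal_stretch(current, baseline, t, factors):
    """Grid-search the stretch factor that minimises the RMS residual."""
    rms = [np.sqrt(np.mean((current - stretch(baseline, t, f)) ** 2)) for f in factors]
    best = factors[int(np.argmin(rms))]
    return best, current - stretch(baseline, t, best)

fs = 2e6
t = np.arange(4000) / fs
# Synthetic baseline: Gaussian-windowed 45 kHz burst arriving at 1 ms.
baseline = np.exp(-((t - 1e-3) / 1e-4) ** 2) * np.sin(2 * np.pi * 45e3 * (t - 1e-3))
current = stretch(baseline, t, 1.002)      # simulated 0.2% temperature-induced stretch

factors = np.linspace(0.99, 1.01, 201)     # assumed search range
best_factor, residual = baseline_signal_stretch(current, baseline, t, factors)
residual_rms = np.sqrt(np.mean(residual ** 2))
```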

Figure 8.

A-scan after BSS and optimal baseline selection (OBS) for the pipeline.

Scientific machine learning in SHM applications

With the advancement of technology and computational power, data-driven methods have become increasingly relevant in SHM,^9,27 making the analysis of large volumes of data technically and financially feasible. The primary goal of these methods is to identify patterns, detect anomalies, and monitor the progression of damage as it occurs.²

Supervised machine learning is based on establishing mappable relationships between known input and output values, where each input is associated with a predetermined target value.^28,29 In SHM applications, this approach would require a large amount of data from various stages of structural damage to establish a model and associate specific conditions with the corresponding system responses. This is often unfeasible, as the actual state of the structure is frequently unknown, making it impossible to label or classify the data appropriately. Moreover, the model generated for a particular structure may not generalize to other structures, even if they are of similar types, due to intrinsic differences between them.^30–32

To overcome these limitations, unsupervised and hybrid deep learning methodologies have gained increasing attention. For example, a deep autoencoder combined with a one-class support vector machine has been shown to facilitate accurate damage detection utilizing only baseline acceleration data for training.³³ Likewise, a density peaks-based fast clustering algorithm has been proposed to effectively detect and localize structural damage without the need for labeled data.³⁴ Recent advancements have further broadened the applicability of unsupervised novelty detection approaches, including comprehensive comparative analyses evaluating the performance of diverse machine learning and deep learning methods for structural damage detection.³⁵ These approaches exploit normal data from undamaged conditions to develop statistical models, which are subsequently employed to identify anomalies in damaged structures, such as loosened bolts in steel bridges.

Unsupervised methods are fundamentally designed to autonomously infer the probability density characteristics from the data, extracting meaningful patterns and information from its underlying structure without the need for supervisory input or labels.³⁶ These methods offer a distinct advantage by requiring only the creation of a comprehensive database of the structure in its undamaged state during the initial phase. This reference database serves as a baseline for comparing future conditions that may indicate potential damage or anomalies. A key benefit of this approach is that data from damaged structures are not required during the initial data acquisition.^37,38 Furthermore, the elimination of the need for data labeling addresses a critical challenge in practical, real-world operational environments, where labeling can be infeasible.³⁸

Building on these advancements, a comprehensive review of deep learning-based SHM applications has underscored key innovations, including automated feature extraction, integration with digital twins, and computer vision techniques for detecting structural damage under diverse conditions.³⁹ These developments exemplify the transformative potential of deep learning in addressing traditional SHM challenges, while paving the way for more robust and scalable solutions.

Abbasi et al.¹⁹ investigated unsupervised machine learning techniques for the SHM of a carbon-fiber–reinforced polymer (CFRP) plate embedded with piezoelectric transducers to detect and localize damage under varying temperature conditions. The study modeled reversible damage using a 10 mm-thick aluminum disc placed at four distinct locations on the plate and evaluated the performance of four dimensionality reduction algorithms—principal component analysis (PCA), kernel PCA (KPCA), t-distributed stochastic neighbor embedding (t-SNE), and autoencoder (AE)—in distinguishing structural states. Two evaluation strategies were employed: score plots (utilizing latent dimensions) and damage index (DI) plots (based on Q-/T2-statistics).

The autoencoder demonstrated superior performance in the first strategy, successfully detecting and differentiating damage locations even under temperature fluctuations, whereas PCA, KPCA, and t-SNE achieved effective damage detection only within limited temperature ranges. For the DI plot strategy, all methods effectively identified damage across all temperature conditions, obviating the need for additional temperature compensation. The findings emphasize the exceptional capability of the autoencoder for damage localization and highlight the robustness of Q-/T2-statistics for temperature-invariant damage detection.

Curse of dimensionality

The curse of dimensionality refers to a set of phenomena that arise when working with data in high-dimensional spaces. This concept was initially described by Bellman⁴⁰ as an optimization problem in multidimensional spaces that demands a significant amount of computational memory due to the complexity involved in analyzing each additional dimension. When the number of dimensions is high, datasets become very sparse.⁴¹ This occurs because, as dimensions increase, the search space expands and the data are distributed more sparsely within it, with regions of high and low data density. This sparsity may prevent meaningful conclusions from being drawn from the data. Consequently, models developed using sparse data may lack robustness, leading to over-fitting and poor generalization to unseen conditions.⁴²

Moreover, the concentration of measure in high-dimensional spaces diminishes the effectiveness of traditional distance metrics, such as Euclidean distance.^43,44 This deterioration in discriminative power complicates the identification of proximity and similarity among data points, which is the objective of an anomaly detection system. When distances converge to similar values, it becomes challenging to distinguish between normal and abnormal behavior.
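The loss of contrast between distances can be reproduced with a small numerical experiment. The sketch below draws uniformly distributed random points, which is an assumption made purely for illustration and not a model of guided wave data, and reports how the spread between the nearest and farthest neighbor shrinks relative to the nearest distance as the dimension grows.

```python
import numpy as np

def relative_contrast(n_points, n_dims, rng):
    """(max - min) / min of Euclidean distances from one query to random points."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

rng = np.random.default_rng(0)
# 6000 matches the signal length implied by the 2 x 6000 PCA matrix.
contrasts = {d: relative_contrast(500, d, rng) for d in (2, 20, 200, 6000)}

for d, c in contrasts.items():
    print(f"d = {d:>4d}  relative contrast = {c:.3f}")
```

As the dimension increases, the farthest point is only marginally farther than the nearest one, which is why distance-based anomaly scores benefit from a prior reduction in dimension.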

Additionally, as the dimensionality of the data increases, the volume of noise introduced into the system also rises⁴⁵ and the data become less informative. This noise can obscure the underlying patterns within the data. As a result, it compromises the capacity to accurately identify structural issues without incurring false positives or negatives, further complicating the decision-making process when detecting damage in the structure.

Dimensionality reduction methods

Dimensionality reduction, also known as manifold learning,⁴⁶ is an area within machine learning and data-driven methods that seeks to extract relevant features from a high-dimensional dataset to better represent it or to separate distinct information. Dimensionality reduction methods serve as an exploratory analysis tool, and through them, patterns related to defects and anomalies in the underlying data structure can be found. The idea is to represent the data in a reduced dimension without losing important information, eliminating noise and redundancies.⁴⁷ Dimensionality reduction can be classified into three major categories: spectral dimensionality reduction, probabilistic dimensionality reduction, and neural network-based dimensionality reduction.⁴⁶

Spectral dimensionality reduction involves eigenvalue decomposition, having a geometric bias. This category includes PCA, Isomap, and Locally Linear Embedding (LLE), among other methods. Probabilistic dimensionality reduction, on the other hand, uses statistical models that assume the data in the high-dimensional space are governed by a smaller set of latent variables in a lower-dimensional space, representing the underlying structure of the data. According to the study of Ghojogh,⁴⁶ these methods are more robust, compared to spectral ones, in dealing with missing data.

Nonlinear dimensionality reduction methods address the task of recovering low-dimensional structures within a higher-dimensional structure and are divided into two categories: global and local.⁴⁸ Global analysis enables the identification of separation and transitions across different corrosion levels, thus reflecting the progression of the corrosion process. Local analysis, in turn, captures specific variations and anomalies within each corrosion level, aiming to detect emerging patterns in the data that may indicate potential structural integrity issues or malfunctions in the operational or transduction systems.

According to the study of De Silva and Tenenbaum,⁴⁹ the effectiveness of dimensionality reduction methods depends heavily on the inherent structure of the data. The primary goal, however, is to balance the trade-off between accurate anomaly detection, specifically early corrosion detection, and the computational resources required.

Three low computational cost and well-established methods were selected for analysis: one linear method, PCA, and two nonlinear methods, t-SNE and LLE. This selection aims to leverage the strengths of each approach. PCA is designed to handle linear relationships, identifying and extracting the principal components that best represent the patterns associated with defects. This approach seeks to minimize data complexity and to highlight aspects that are indicative of anomalies, thereby enhancing early corrosion detection.

On the other hand, t-SNE and LLE provide a more intuitive visualization of relationships within the data, allowing anomalies to be visually distinguished. t-SNE provides detailed data visualization despite its higher computational demands,⁵⁰ and LLE serves as a computationally economical alternative for capturing nonlinear patterns.⁵¹ The direct visualization of anomalies offered by these methods can reveal patterns that may not be evident in the original data representation. Training neural networks, however, demands significant computational resources. Although certain embedded electronics with advanced computational capabilities can support this task, the primary role of the embedded system in this context is to generate waveforms and ensure precise data acquisition. Consequently, computational resources must be allocated primarily to these critical functions, leaving minimal capacity for additional processes, such as training deep learning networks.

To optimize the performance of these methods, all parameter settings were determined heuristically, considering the unique characteristics and dimensionality transformation strategies of each approach.

Principal component analysis

PCA is a widely used statistical method for feature extraction, data compression, and preprocessing.⁵² The algorithm was initially proposed by Pearson⁵³ and Hotelling,⁵⁴ and its primary goal is to project a high-dimensional dataset in $R^{d}$ onto a reduced subspace $R^{p}$ , where $p \ll d$ .

The objective of PCA is to maximize the variance of the data projected onto a latent subspace of $p$ dimensions. To avoid distortions and ensure that the principal directions represent relative variations, the data need to be centered. Centering the data removes the mean, thereby eliminating interference caused by the mean shift.^46,51

The centered data matrix $X_{c} \in R^{n \times d}$ is obtained by subtracting the mean vector $μ \in R^{d}$ from each observation, as shown in Equation (2):

X_{c} = X - μ .

(2)

The covariance matrix $Σ \in R^{d \times d}$ is calculated using Equation (3):

Σ = \frac{1}{n - 1} X_{c}^{T} X_{c},

(3)

where $X_{c}^{T}$ is the transpose of the centered data matrix.

The next step is to project the data onto $p$ principal directions, that is, to find $p$ eigenvectors of the covariance matrix $Σ$ , associated with the $p$ largest eigenvalues $λ_{1}, λ_{2}, \dots, λ_{p}$ . The eigenvalue and eigenvector problem is expressed in Equation (4):

Σ U = U Λ,

(4)

where $U \in R^{d \times p}$ is the matrix whose columns are the eigenvectors $v_{1}, v_{2}, \dots, v_{p}$ , and $Λ = diag (λ_{1}, λ_{2}, \dots, λ_{p})$ is the diagonal matrix containing the $p$ largest eigenvalues $λ_{1}, λ_{2}, \dots, λ_{p}$ . The projection of the data onto the subspace generated by $U$ is given by Equation (5):

X_{proj} = X_{c} U,

(5)

where $X_{proj} \in R^{n \times p}$ contains the representation of the data in the $p$ -dimensional subspace.

To perform the data projection in multiple dimensions $p > 1$ , the problem becomes maximizing the variance projected simultaneously onto all $p$ principal directions, which is obtained by maximizing the trace of matrix $U^{T} Σ U$ :

\max_{U} \operatorname{tr} (U^{T} Σ U), \quad \text{such that} \quad U^{T} U = I,

(6)

where $I$ is the identity matrix.
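As an illustration, Equations (2) to (5) can be sketched in a few lines of NumPy. This is a minimal sketch and not the implementation used in this study: the function name `pca_project` and the synthetic data are assumptions introduced only for the example.

```python
import numpy as np


def pca_project(X, p):
    """Project the rows of X (n x d) onto the p leading principal directions."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                       # mean vector
    Xc = X - mu                               # centered data, Equation (2)
    n = X.shape[0]
    Sigma = (Xc.T @ Xc) / (n - 1)             # covariance matrix, Equation (3)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenpairs, ascending order, Equation (4)
    order = np.argsort(eigvals)[::-1][:p]     # indices of the p largest eigenvalues
    Lam = eigvals[order]
    U = eigvecs[:, order]                     # d x p matrix with orthonormal columns
    X_proj = Xc @ U                           # projection, Equation (5)
    return X_proj, U, Lam


# Synthetic stand-in for guided wave features (hypothetical data):
# 200 observations in 10 dimensions with decreasing variance per dimension.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)) * np.linspace(3.0, 0.1, 10)
X_proj, U, Lam = pca_project(X_demo, p=2)
```

In this sketch, the sample variance of each projected column equals the corresponding eigenvalue, which is the quantity maximized in Equation (6).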

In the study of Ma et al.,⁵⁵ a variant of PCA is employed for monitoring defects in pipes by adaptively selecting the appropriate number of components to reduce noise and eliminate redundancy in the data.

Figure 9 presents the PCA results, showing the first two principal components. The data projection is color-coded according to the degree of corrosion, ranging from 0% (base, intact pipe) to 50% of the average wall thickness loss of the pipe.

Figure 9.

Dimensionality reduction for each propagation mode, utilizing PCA with two components to illustrate well-defined corrosion stages. The data points are color-coded to represent varying degrees of corrosion, with colors transitioning from 0% (base, intact pipe) to 50% mean wall thickness loss. (a) $T (0, 1)$ . (b) $F (1, 2)$ . (c) $F (2, 2)$ .

Although PCA is a linear method, it still provides moderate manifold separation for the $F (2, 2)$ mode, which accounts for approximately 72% of the total variance.
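The share of total variance retained by the leading components can be computed directly from the singular values of the centered data. The sketch below is illustrative only: the function name `explained_variance_ratio` and the synthetic data are assumptions, and it does not reproduce the 72% figure reported above.

```python
import numpy as np


def explained_variance_ratio(X, p):
    """Fraction of total variance captured by the first p principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Singular values s of the centered data relate to the covariance
    # eigenvalues through lambda = s**2 / (n - 1); they are returned in
    # descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s**2 / (X.shape[0] - 1)
    return eigvals[:p].sum() / eigvals.sum()


# Hypothetical data with two dominant directions
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.5, 0.2])
ratio = explained_variance_ratio(X_demo, p=2)
```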

Locally linear embedding

LLE is a dimensionality reduction method that aims to preserve the local relationships between data points when mapping them from a high-dimensional space to a lower-dimensional space.

Although LLE is a nonlinear method, it assumes that each data point can be represented as a linear combination of its nearby neighbors.⁵⁶ The global coordination of these local linear models, introduced in the study of Roweis et al.,⁵⁷ integrates multiple local linear models into a single, smooth representation in a reduced global coordinate system. This is particularly useful in unsupervised learning, where the structure of the data is unknown a priori.

LLE identifies the $k$ nearest neighbors of $x_{i}$ based on Euclidean distance and then calculates a set of weights $W_{ij}$ that minimize the reconstruction cost function presented in Equation (7).

E (W) = \sum_{i} ‖ x_{i} - \sum_{j \in N (i)} W_{ij} x_{j} ‖^{2},

(7)

where $N (i)$ denotes the set of the $k$ nearest neighbors of $x_{i}$ , so that each point $x_{i}$ is reconstructed only from its neighbors and $W_{ij} = 0$ otherwise. The constraint $\sum_{j} W_{ij} = 1$ makes the weights invariant to translations of the data; invariance to rotations and rescalings follows from the form of the cost function itself.

After calculating the weights $W_{ij}$ , LLE seeks to find a set of lower-dimensional points $y_{i}$ that minimize the cost function in Equation (8), which is based on locally linear reconstruction errors.⁵⁶

Φ (Y) = \sum_{i} ‖ y_{i} - \sum_{j \in N (i)} W_{ij} y_{j} ‖^{2}

(8)

To find the low-dimensional configuration that best preserves the neighborhood relations from the original space, this cost minimization can be converted into an eigenvalue problem.⁵⁸ In this context, the matrix $M = {(I - W)}^{T} (I - W)$ is constructed, where $I$ is the identity matrix. The eigenvectors corresponding to the smallest non-trivial eigenvalues of this matrix represent the coordinates in the new lower-dimensional space, ensuring that the local structure of the original data is preserved.
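The two stages of LLE, weight estimation under Equation (7) and embedding through the eigenvectors of $M$ , can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the implementation used in this study: the function name `lle`, the regularization term `reg` (needed for the local system to be solvable when $k$ exceeds the input dimension), and the synthetic curve are all introduced only for illustration.

```python
import numpy as np


def lle(X, k, p, reg=1e-3):
    """Minimal locally linear embedding: returns coordinates Y and weights W."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]            # neighbors shifted so that x_i is the origin
        C = Z @ Z.T                           # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)    # regularization (assumption of this sketch)
        w = np.linalg.solve(C, np.ones(k))
        W[i, neighbors[i]] = w / w.sum()      # enforce sum_j W_ij = 1

    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    Y = eigvecs[:, 1:p + 1]                   # skip the first, (numerically) constant eigenvector
    return Y, W


# Hypothetical data: 120 points along a helix segment in three dimensions
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3.0, size=120))
X_demo = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
Y, W = lle(X_demo, k=8, p=1)
```

Because every row of $W$ sums to one, the constant vector lies in the null space of $I - W$ , which is why the first eigenvector of $M$ is discarded as trivial.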

In Figure 10, the results of the LLE are presented. It is evident that, under the configurations used, for discrete corrosion stages, the $T (0, 1)$ and $F (1, 2)$ modes are able to more clearly distinguish between different levels of corrosion. In contrast, for the $F (2, 2)$ mode, the distinction is primarily limited to identifying the presence or absence of a defect in the region, without providing any qualitative indication of different stages of corrosion, as evidenced by the lack of visually distinct clusters.

Figure 10.

Dimensionality reduction using LLE with 50 neighbors for all wave propagation modes. Data points are color-coded to indicate corrosion levels, ranging from 0% to 50% average wall thickness loss. (a) $T (0, 1)$ . (b) $F (1, 2)$ . (c) $F (2, 2)$ .

Although there is no clear explanation for the superior performance of the $F (2, 2)$ mode in machine learning algorithms, similar behavior has been observed in traditional inspection methods. In Praetzel et al.,⁵⁹ the $F (2, 2)$ mode outperformed the $T (0, 1)$ and $F (1, 2)$ modes in defect detection, likely due to its sensitivity to geometric variations and mode conversion in asymmetric structures, such as areas affected by localized corrosion.

By adjusting the hyper-parameter that defines the number of neighbors, which determines how many nearby points are used to reconstruct the local structure in the lower-dimensional projection, it is possible to enhance the resolution for this mode. However, such a modification might degrade the resolution for the other modes.

T-Distributed stochastic neighbor embedding

t-SNE is a dimensionality reduction technique that is particularly effective for visualizing high-dimensional data in a two-dimensional or three-dimensional space. The method is widely used for visualization and exploratory analysis, and it compares favorably with other methods such as Isomap and LLE, particularly in preserving the structure of clusters.⁶⁰

In addition to preserving local structure, t-SNE is reasonably effective at capturing the global structure of the data, representing broader relationships between data groups. As noted by Maaten and Hinton,⁶¹ the use of a Student's t-distribution to calculate point similarities in the low-dimensional space models larger distances more effectively, preventing points that do not belong to the same cluster in the high-dimensional space from being placed close together in the reduced dimension.

t-SNE optimizes the Kullback–Leibler (KL) divergence between the two similarity distributions, one in the high-dimensional space and the other in the reduced dimension, which allows the method to capture local relationships between points and maintain relative distances between similar points in the lower dimension.⁶²

Additionally, Kobak and Berens⁶³ show that using PCA on the data as a starting point for t-SNE can improve the balance between global and local relationships, and Agis and Pozo⁶⁴ use principal component reduction via PCA before applying t-SNE. Although PCA is a linear algorithm and may not fully capture complex nonlinear structures, it may still provide a useful initialization for t-SNE, which subsequently refines the data projection to preserve local relationships, but should be used with caution in order not to compromise the analysis.
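A PCA-based starting configuration for t-SNE can be sketched as below. The function name `pca_init` is hypothetical, and rescaling the first component to a standard deviation of $10^{-4}$ is a convention adopted in common t-SNE implementations that is assumed here; the text does not state which scaling, if any, was used in this study.

```python
import numpy as np


def pca_init(X, n_components=2, target_std=1e-4):
    """Initial low-dimensional coordinates taken from the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y0 = Xc @ Vt[:n_components].T
    # Rescale so that the first component has a small, fixed spread (assumed convention)
    return Y0 / Y0[:, 0].std() * target_std


# Hypothetical high-dimensional data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 20))
Y0 = pca_init(X_demo)
```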

According to Maaten and Hinton,⁶¹ the algorithm begins by modeling the similarities between data points in the original high-dimensional space using Gaussian probability distributions. For each pair of points $x_{i}$ and $x_{j}$ , the conditional probability $p_{j | i}$ , which expresses how likely $x_{i}$ is to pick $x_{j}$ as its neighbor, is given by Equation (9):

p_{j | i} = \frac{\exp (- ∥ x_{i} - x_{j} ∥^{2} / (2 σ_{i}^{2}))}{\sum_{k \neq i} \exp (- ∥ x_{i} - x_{k} ∥^{2} / (2 σ_{i}^{2}))},

(9)

where $σ_{i}$ is the adjustable standard deviation for point $x_{i}$ that controls the width of the local neighborhood.
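In the original t-SNE formulation, $σ_{i}$ is not set by hand: it is chosen for each point so that the perplexity of the conditional distribution, $2^{H (P_{i})}$ with $H$ the Shannon entropy in bits, matches a user-defined value. A minimal sketch of this calibration by bisection is given below; the function names, search bounds, and synthetic data are assumptions made for illustration.

```python
import numpy as np


def conditional_p(d2_row, sigma):
    """p_{j|i} for one point, given its squared distances to all other points."""
    logits = -d2_row / (2.0 * sigma**2)
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    return w / w.sum()


def perplexity(p):
    """Perplexity 2**H(p), with H the Shannon entropy in bits."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))


def sigma_for_perplexity(d2_row, target, lo=1e-4, hi=1e4, iters=60):
    """Bisection on sigma (geometric midpoint) until the perplexity matches target."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if perplexity(conditional_p(d2_row, mid)) > target:
            hi = mid                          # neighborhood too wide: reduce sigma
        else:
            lo = mid                          # neighborhood too narrow: increase sigma
    return np.sqrt(lo * hi)


# Hypothetical data: calibrate sigma for the first of 100 points
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
d2 = np.sum((X_demo[0] - X_demo[1:]) ** 2, axis=1)
sigma_0 = sigma_for_perplexity(d2, target=30.0)
p_0 = conditional_p(d2, sigma_0)
```

The bisection works because the perplexity grows monotonically with $σ_{i}$ , from 1 (a single dominant neighbor) to the number of remaining points (a uniform distribution).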

In the low-dimensional space, the similarities between points $y_{i}$ and $y_{j}$ are modeled using a Student’s t-distribution with one degree of freedom, to handle the tendency of points to cluster into a single dense mass. The similarity probabilities in the low-dimensional space are defined by Equation (10):

q_{ij} = \frac{{(1 + ∥ y_{i} - y_{j} ∥^{2})}^{- 1}}{\sum_{k \neq l} {(1 + ∥ y_{k} - y_{l} ∥^{2})}^{- 1}} .

(10)

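Then, t-SNE minimizes the KL divergence between the joint distributions $p_{ij}$ and $q_{ij}$ by gradient descent, where $p_{ij} = (p_{j | i} + p_{i | j}) / (2 n)$ is the symmetrized form of the conditional probabilities for $n$ data points, as shown in Equation (11):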

KL (P ∥ Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} .

(11)
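The quantities in Equations (9) to (11) can be evaluated for a given pair of configurations as sketched below. Two simplifications are assumed here and are not part of the method described above: a single shared $σ$ replaces the per-point $σ_{i}$ , and the joint probabilities are obtained by the symmetrization $p_{ij} = (p_{j | i} + p_{i | j}) / (2 n)$ of the original t-SNE formulation. The function names and data are hypothetical, and no optimization of the embedding is performed.

```python
import numpy as np


def pairwise_sq_dists(A):
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.maximum(d2, 0.0)                # clip small negative round-off


def high_dim_affinities(X, sigma):
    """Symmetrized p_ij built from Equation (9) with one shared sigma (assumption)."""
    n = X.shape[0]
    logits = -pairwise_sq_dists(X) / (2.0 * sigma**2)
    np.fill_diagonal(logits, -np.inf)         # p_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)
    P_cond = np.exp(logits)
    P_cond /= P_cond.sum(axis=1, keepdims=True)
    return (P_cond + P_cond.T) / (2.0 * n)


def low_dim_affinities(Y):
    """q_ij from Equation (10): Student's t kernel with one degree of freedom."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) from Equation (11), summed over pairs with p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Hypothetical data: 50 points in 8 dimensions and a random 2-D configuration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8))
Y_demo = rng.normal(size=(50, 2))
P = high_dim_affinities(X_demo, sigma=1.0)
Q = low_dim_affinities(Y_demo)
kl = kl_divergence(P, Q)
```

A full t-SNE run would update `Y_demo` iteratively to reduce `kl`; the sketch only evaluates the cost for a fixed configuration.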

The t-SNE method prioritizes the preservation of the local data structure as can be seen in Figure 11, where it is possible to identify the formation of well-defined clusters, with minimal overlap between the data points. However, it introduces distortion in the global structure, resulting in apparent nonlinearity in the spacing between clusters representing different levels of corrosion.

Figure 11.

Dimensionality reduction using t-SNE with perplexity adjusted to 30, for all wave propagation modes. Data points are color-coded to indicate corrosion levels, ranging from 0% to 50% average wall thickness loss. (a) $T (0, 1)$ . (b) $F (1, 2)$ . (c) $F (2, 2)$ .

t-SNE is effective in representing local structures but has limitations in representing global structures. This implies that inter-cluster distances may not have real significance, potentially leading to misleading interpretations.⁶⁵ As observed, the relative distance between clusters corresponding to 2% and 3% average thickness loss is greater than the distance between 30% and 50%.

By maintaining local relationships, the signals corresponding to the same corrosion level remain close in the reduced dimension, facilitating cluster identification. Preserving global relationships, in turn, ensures that the overall configuration of clusters in the high-dimensional space is maintained in the low-dimensional space, following a reasonably consistent order of corrosion progression.

Empirical observations indicate that when multiple stages of corrosion are present, local differences tend to be “squeezed” or “compressed.” Notably, there is a tendency for the manifold to become uniform, even when only baseline data are considered. This phenomenon can be influenced by the choice of perplexity or the number of neighbors, parameters that are inherently linked to the distances between data points. Given that t-SNE and LLE emphasize local relationships, they may highlight local properties inherent in the baseline data, such as specific anomalies within a given corrosion level that tend to become obscured when numerous corrosion stages are present. Consequently, if only defect-free data are included, there is a risk of misleading the inspector into perceiving these highlighted characteristics as defects.

SHM and parsimonious representation

Figure 12 illustrates the progression of various estimators used for corrosion damage assessment. The figure presents the traditional SHM approach (depicted by solid gray lines) alongside reduced-dimensional representations derived from PCA, LLE, and t-SNE, which are distinguished by line style, while blue, red, and black identify the $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ modes, respectively.

Figure 12.

Normalized estimators for corrosion damage. The solid gray lines represent the classical SHM approach. Among the dimensionality reduction methods, the solid line corresponds to the LLE method, the dashed line to PCA, and the dash-dotted line to t-SNE. The $T (0, 1)$ mode is depicted in blue curves, the $F (1, 2)$ mode in red curves, and the $F (2, 2)$ mode in black curves. The shaded region indicates the same damage level as presented in Figure 9 and Table 1. The horizontal lines represent the outlier detection thresholds: the dashed black line corresponds to the normalized distance of the classical SHM approach, and the dashed orange line corresponds to the reduced-dimension normalized distance.

To facilitate a meaningful comparison between the SHM data and the data-driven methodologies, all datasets were normalized to a standard normal distribution $N (0, 1)$ , followed by the subtraction of their minimum value so that all data points were non-negative. This shift was necessary for the proper application of the subsequent analysis techniques.

After normalization, a classical $χ^{2}$ outlier detection algorithm was applied.⁶⁶ The dashed lines in the figure indicate the 95% quantile thresholds, with the black dashed line corresponding to the normalized SHM distance and the orange dashed line representing the threshold for the reduced-dimensional representations.
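The normalization and thresholding steps can be sketched as follows. The exact form of the $χ^{2}$ test is not detailed here, so the sketch assumes one common variant: the squared standardized deviation from the baseline mean is compared with the 95% quantile of a $χ^{2}$ distribution with one degree of freedom (approximately 3.84). The function names, the use of the first tests as the baseline, and the synthetic series are all assumptions.

```python
import numpy as np

# 95% quantile of the chi-square distribution with 1 degree of freedom
CHI2_95_DOF1 = 3.841458820694124


def standardize_and_shift(x):
    """Scale to zero mean and unit variance, then subtract the minimum."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return z - z.min()


def chi2_outliers(x, n_baseline, threshold=CHI2_95_DOF1):
    """Flag tests whose squared standardized deviation from the baseline exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    base = x[:n_baseline]                     # assumed pristine tests
    z2 = ((x - base.mean()) / base.std(ddof=1)) ** 2
    return z2 > threshold


# Hypothetical estimator series: 100 baseline tests followed by a drifting signal
rng = np.random.default_rng(0)
baseline = rng.normal(1.0, 0.05, size=100)
damaged = 1.0 + np.linspace(0.0, 1.0, 50) + rng.normal(0.0, 0.05, size=50)
x_demo = np.concatenate([baseline, damaged])

shifted = standardize_and_shift(x_demo)
flags = chi2_outliers(shifted, n_baseline=100)
```

Because the test statistic is computed relative to the baseline mean and standard deviation, it is unaffected by the preceding scaling and shift; under a Gaussian assumption, about 5% of the baseline tests are expected to exceed the threshold by chance.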

It can be clearly observed that this outlier detection metric is not effective in identifying defects when analyzing the combined OBS + BSS data. However, when reduced-dimensional data are utilized, the algorithms demonstrate the capability to detect defects. In the worst-case scenario, defects as small as approximately 4% of the CSA were detected. Conversely, in the best-case scenario, using $F (2, 2)$ mode with t-SNE or PCA, defects were identified with a sensitivity as low as 0.39% CSA.

Additionally, it is evident that the normalized distance metric fluctuates during the initial data collections. This behavior is attributed to the influence of temperature, which impacts the stability of the system and the formation of the initial support manifold. This phenomenon will be addressed in detail in the subsequent section.

The estimator for the classical SHM approach is simply the maximum value in the defect region. This indicator reveals that, despite effective temperature compensation, a significant level of noise remains in the signal. A clear trend of increasing estimator levels can be observed as corrosion severity escalates; however, noise may interfere with accurate corrosion identification.

The estimator used for the dimensionality reduction methods is the running mean of the distances to the pristine data. Each point $x_{n}$ is calculated using Equation (12):

x_{n} = \frac{1}{n} \sum_{i = 1}^{n} ‖ D_{i} ‖,

(12)

where $D_{i}$ represents the Euclidean distance from point/test $i$ to the centroid of the pristine (non-corroded) pipeline data, and $x_{n}$ is the damage estimator for test $n$ . After calculating $x$ , it is normalized to a $N (0, 1)$ distribution. Following normalization, all dimensionality reduction algorithms yield similar results, regardless of the mode: $T (0, 1)$ (blue curves), $F (1, 2)$ (red curves), and $F (2, 2)$ (black curves). Because the estimator is a rolling average whose window grows with each test, the transitions between corrosion stages are smoothed, and the estimator rises gradually rather than abruptly.
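A minimal sketch of this estimator is given below, reading Equation (12) as a running mean over the distances of tests 1 to $n$ , consistent with the incrementally growing averaging window described above. The function name `damage_estimator`, the argument `n_pristine`, and the toy coordinates are assumptions introduced for illustration.

```python
import numpy as np


def damage_estimator(Y, n_pristine):
    """Running mean of the distances from each test to the pristine centroid."""
    Y = np.asarray(Y, dtype=float)
    centroid = Y[:n_pristine].mean(axis=0)    # centroid of the pristine tests
    D = np.linalg.norm(Y - centroid, axis=1)  # Euclidean distance of each test
    return np.cumsum(D) / np.arange(1, len(D) + 1)


# Toy reduced-dimension coordinates: two pristine tests, then two drifting tests
Y_demo = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
x = damage_estimator(Y_demo, n_pristine=2)
# Distances are 0, 0, 5, 10, so the running means are 0, 0, 5/3, 3.75
```

The toy values show the smoothing effect: the distance jumps from 0 to 5 at the third test, while the estimator rises only to 5/3.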

The employed method does not provide spatial localization of corrosion but instead serves as an estimator of the signal trend over time, enabling the identification of deviations that suggest the need for further investigation. This approach aligns with the foundational level of the framework proposed by Burgos et al.⁶⁷ (illustrated in Figure 1 of Burgos et al.’s study). Once damage is identified, specific localization methods can be employed, such as the Common Source Method described by Davies.⁶⁸

Given the continuous nature of the damage monitoring process, the presence of outliers does not compromise the ability to identify damage. Instead, it is more critical to observe consistency and discernible trends in signal behavior over time. This becomes particularly apparent when comparing the classical SHM data with the output of the dimensionality reduction methods, which is less noisy because these algorithms can filter out noisier components, for example through truncated singular value decomposition (SVD).⁶⁹

This approach is especially significant in embedded systems, where the local processing unit requires a well-defined threshold to detect anomalies. Establishing such thresholds ensures the system avoids generating false positives, thereby preventing unnecessary alerts for system operators.

Temperature influence

Figure 13 illustrates how data-driven algorithms can distinguish environmental effects from structural anomalies without requiring compensation algorithms. The clustering patterns highlight this differentiation. Component 1 shows a strong temperature influence, while the corrosion levels are separated radially, indicating the ability of the method to differentiate between temperature-driven variations and structural changes. Each dimensionality reduction method uniquely captures these influences, effectively isolating temperature and corrosion effects in the data.

Figure 13.

t-SNE for wave mode $T (0, 1)$ , highlighting temperature conditions and their impact on data clustering.

Although it is possible to separate the thermal effects from those caused by damage, temperature still has a significant impact on the data structure, similar to traditional inspection techniques. Qualitatively, it can be observed that extreme temperatures tend to make it more difficult to distinguish between damage levels. This difficulty in distinguishing corrosion levels at extreme temperatures may arise from the unequal distribution of data across different damage levels. While it is not possible to definitively ascertain the specific cause of this issue, studies¹⁹ indicate there is a temperature range where inter-cluster separation is more evident.

System change detection

To evaluate the capability of the methods in detecting system changes, two transduction channels were intentionally damaged during the last two phases of data acquisition. The integrity of the connections and transducers was verified through the analysis of individual capacitances throughout the data collection stages, as illustrated in Figure 14 for a temperature of approximately 25°C.

Figure 14.

Integrity check for 32 transduction channels at 25°C. Capacitance at 200 kHz with channels 12 and 16 damaged.

In this figure, the abrupt change in the capacitance of the damaged channels compared to the others is clearly visible. The measurement was performed at 200 kHz to enhance sensitivity to capacitive variations. These changes are greater than those caused by thermal fluctuations.

The deterioration of the transduction channels occurred after reaching 53% of average wall thickness loss in the pipeline. Initially, channel 14 presented a higher impedance compared to the others and a slightly lower capacitance. However, since this was the initial condition that persisted throughout the entire test, it was treated as part of the intact condition of the transduction channels for robustness evaluation.

The deterioration of channel 16 was caused by a cold solder joint at the electrical contact between the channel and the transducer, while the deterioration of channel 12 was caused by the removal of the electrical connection between the channel and the transducer.

The results indicate the capability of the t-SNE method (Figure 15) to detect changes in data structure associated with transduction loss. This is evidenced by the significant separation of clusters that occurs as the transducers degrade. The method effectively captures subtle structural changes, particularly in the context of the collar containing $16$ pairs of transducers. Notably, transduction loss is accentuated by the formation of distinct clusters corresponding to each lost sensor.

Figure 15.

Dimensionality reduction using t-SNE with a perplexity of 30, illustrating well-defined corrosion stages and the impact of damaged transduction channels. The data points are color-coded as follows: 0 indicates all channels intact, 1 represents one damaged channel, and 2 denotes two damaged channels. (a) $T (0, 1)$ . (b) $F (1, 2)$ . (c) $F (2, 2)$ .

Field application in a refinery

The system developed in this work was designed as a digital twin of an SHM system installed in an oil refinery. It served multiple purposes, including validating the methodology and results obtained from field tests. However, its primary objective was to evaluate the method under a new set of environmental conditions. Care was taken to produce two systems that incorporate identical materials and transduction technologies, both installed on pipelines of matching diameters. This ensures that only the environmental conditions differ between the two systems.

The refinery system was installed in December 2022 and operated until September 2024, during which approximately 41 GB of data were collected and transmitted via wireless communication to a centralized storage and analysis hub (Figure 16).

Figure 16.

System installed at the refinery. This system serves as a twin to the one described in this study and incorporates wireless communication for remote data transmission to a central storage and analysis unit.

The collected data were analyzed using classical and data-driven methods, with no anomaly indicators detected. Figure 17 shows the t-SNE of the T(0,1) mode, where clustering patterns remained consistent throughout the operational period, indicating the absence of damage.

Figure 17.

t-SNE visualization of the refinery data, using two components and color-coded by year of operation. The absence of any indicators of damage over the 2 years of monitoring is demonstrated by the clustering patterns, which remain consistent throughout the operational period.

The environment in a refinery is highly aggressive for equipment, with a significant presence of electrical and mechanical noise. Nevertheless, the installed system has demonstrated robustness, withstanding the adverse operational conditions of a refining unit without the need for frequent maintenance.

Figure 18 demonstrates that, following the methodology used for the laboratory system, no defects were identified in the pipe. Data from 2022 were adopted as the pristine state for the calculations. Initially, the same phenomenon observed in Figure 12 can be seen, where the system undergoes a natural transition between low and high temperatures. This behavior is attributed to the significant influence of temperature on the manifold,⁷⁰ as further illustrated in Figure 13. Subsequently, as the temperature cycles stabilize, the values also stabilize with minor fluctuations. After the first year, around test 270, small oscillating fluctuations are observed. However, these remain within the established threshold based on the $χ^{2}$ percentile test. This indicates the absence of corrosion, a conclusion corroborated by the refinery inspection team, which confirmed that no corrosion was present in this pipeline.

Figure 18.

The normalized reduced-dimension data from guided wave tests are represented by the solid blue line, while the orange horizontal line indicates the outlier detection threshold based on the $χ^{2}$ quantile.

Conclusion

An electrolytic corrosion test was conducted on an 8-inch diameter pipeline to evaluate its structural integrity and detect signs of corrosion. This method employs an electrolytic solution that, upon contact with the pipeline material, allows for the observation of the behavior of the metal under simulated corrosion conditions. A classical structural health monitoring analysis using guided waves, applied to the pipeline after the test, confirmed the presence of corrosion and demonstrated effectiveness in early-stage identification of such damage.

Various data reduction techniques (PCA, LLE, t-SNE) were applied to the guided wave modes $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ to streamline the analysis and enhance interpretability of the collected signals. These dimensionality reduction methods successfully highlighted variations in both defective sensors and corrosion-related defects, making it possible to distinguish subtle differences associated with sensor integrity and areas of material degradation. This approach proved crucial for identifying and isolating specific changes in the sensor signals, providing clear indicators of corrosion presence and progression.

By applying a historical average to the dimensionally reduced data, a smoother variation in the data was observed. Furthermore, both linear and nonlinear techniques produced comparable results across the modes $T (0, 1)$ , $F (1, 2)$ , and $F (2, 2)$ , indicating that either approach, regardless of complexity, can effectively capture essential patterns for these modes. Dimensionality reduction also aids significantly in the damage identification process, as it allows for clearer differentiation between standard and anomalous data patterns, thus enhancing the detection and early diagnosis of structural faults.

The physical twin, deployed in a refinery under real-field conditions, demonstrated remarkable stability over a 2-year period, with no significant variations detected. This consistent behavior serves as a robust baseline, suggesting that no damage has occurred within this time frame. The role of the physical twin is crucial in providing continuous, real-time data on the operational integrity of the structure in its actual environment. By replicating the system in a controlled yet realistic setting, the twin allows for ongoing monitoring, validating that the structure remains unaffected by external stressors or corrosive agents that could otherwise lead to degradation or failure.

This study demonstrates the feasibility of establishing a threshold for autonomous anomaly detection using an extensive corrosion dataset from oil pipelines, including laboratory and field tests. Results show that CSA losses as small as 0.39% can be detected in optimal conditions, with a worst-case detection limit of 4%, aligning with literature values. The key advantage is continuous monitoring, eliminating the need for on-site inspections and a human operator.

Additionally, the findings validate the scalability of computationally efficient dimensionality reduction algorithms for embedded systems with hardware and connectivity constraints. These algorithms enable early corrosion detection, supporting predictive maintenance strategies that optimize schedules, reduce unnecessary interventions, and improve structural asset management.

Footnotes

Acknowledgements

The authors would like to thank the Brazilian National Agency for Petroleum, Natural Gas and Biofuels (ANP) and Petrobras for providing the resources for this project. Prof. Dr. Thomas G.R. Clarke would like to acknowledge funding provided through the CAPES-PROEX Program of PPGE3M and through the productivity scholarship provided by CNPq.

Declaration of generative AI in scientific writing

During the preparation of this work the author(s) used ChatGPT in order to improve the readability and language of the manuscript. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Lúcio de Abreu Corrêa

References

1. Sharma, Tewari, Biswas, et al. A comprehensive study of techniques utilized for structural health monitoring of oil and gas pipelines. Struct Health Monit 2024; 23(3): 1816–1841.

2. Farrar, Worden. Structural health monitoring: a machine learning perspective. Chichester: John Wiley & Sons, 2012.

3. Rose JL. Ultrasonic guided waves in solid media. Cambridge: Cambridge University Press, 2014.

4. Moll, Kexel, Pötzsch, et al. Temperature affected guided wave propagation in a composite plate complementing the Open Guided Waves Platform. Sci Data 2019; 6(1): 1–9.

5. Attarian, Cegla, Cawley. Long-term stability of guided wave structural health monitoring using distributed adhesively bonded piezoelectric transducers. Struct Health Monit 2014; 13(3): 265–280.

6. Lanzara, Yoon, Kim, et al. Influence of interface degradation on the performance of piezoelectric actuators. J Intell Mater Syst Struct 2009; 20(14): 1699–1710.

7. Park, Farrar, Di Scalea, et al. Performance assessment and validation of piezoelectric active-sensors in structural health monitoring. Smart Mater Struct 2006; 15(6): 1673–1673.

8. Mariani, Heinlein, Cawley. Compensation for temperature-dependent phase and velocity of guided wave signals in baseline subtraction for structural health monitoring. Struct Health Monit 2020; 19(1): 26–47.

9. Azimi, Eslamlou, Pekcan. Data-driven structural health monitoring and damage detection through deep learning: state-of-the-art review. Sensors 2020; 20(10): 2778.

10. Bishop CM. Pattern recognition and machine learning. New York, NY: Springer, 2006.

11. Liu, Yoo, Xing, et al. Deep unsupervised domain adaptation: a review of recent advances and perspectives. APSIPA Trans Signal Inform Process 2022; 11(1): 25–25.

12. Hassan, Lee CG. Alleviating confirmation bias in perpetually dynamic environments: continuous unsupervised domain adaptation-based condition monitoring (CUDACoM). Eng Appl Artif Intell 2024; 137: 109057.

13. Lee. Trends of modern processors for AI acceleration. In: 2021 18th International SoC design conference (ISOCC), Jeju, South Korea, 2021. pp. 227–227.

14. Batzolis, Vrochidou, Papakostas GA. Machine learning in embedded systems: limitations, solutions and future challenges. In: 2023 IEEE 13th Annual computing and communication workshop and conference (CCWC), Las Vegas, 2023. pp. 0345–0350.

15. Patil, Banerjee, Tallur. Smart structural health monitoring (SHM) system for on-board localization of defects in pipes using torsional ultrasonic guided waves. Sci Rep 2024; 14: 24455.

16. Tusun, Metin, Tigrel, et al. Embedded machine learning system design for post-earthquake structural health assessment. In: 2024 32nd signal processing and communications applications conference (SIU), Mersin, Turkey, 2024. pp. 1–4. DOI: 10.1109/SIU61531.2024.10600859.

17. Severson, Ghosh. Unsupervised learning with contrastive latent variable models, https://arxiv.org/abs/1811.06094, 2018.

18. Rébillat, Mechbal. Damage localization in geometrically complex aeronautic structures using canonical polyadic decomposition of Lamb wave difference signal tensors. Struct Health Monit 2020; 19(1): 305–321.

19. Abbassi, Römgens, Tritschel, et al. Evaluation of machine learning techniques for structural health monitoring using ultrasonic guided waves under varying temperature conditions. Struct Health Monit 2022; 22(2): 1308–1325.

20. Schneider, Xhafa. Anomaly detection and complex event processing over IoT data streams. Amsterdam: Elsevier, 2022.

21. Wollschlaeger, Sauter, Jasperneite. The future of industrial communication: automation networks in the era of the internet of things and industry 4.0. IEEE Ind Electron Mag 2017; 11(1): 17–27.

22. Oliveira HTHd. Projeto de um colar de ondas guiadas para aplicação em tubulação enterrada. Dissertation, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, 2017.

23. Carbas RJC, Marques EAS, da Silva LFM, et al. Effect of cure temperature on the glass transition temperature and mechanical properties of epoxy adhesives. J Adhes 2014; 90(1): 104–119.

24. Jacques, de Oliveira, dos Santos RWF, et al. Design and in situ validation of a guided wave system for corrosion monitoring in coated buried steel pipes. J Nondestr Eval 2019; 38(3): 65.

25. Ryan. Ricker, Ormsby and Klander, Butterworth, a choice of wavelets. CSEG Recorder Mag 1994; 19(7): 8–9.

26. de Sá Rodrigues, Giannakeas, Khodaei, et al. Probability based damage detection on a composite fuselage panel based on large data set of guided wave signals. NDT E Int 2023; 139: 102924.

27. Kerschen, Golinval JC. Dimensionality reduction using linear and nonlinear transformation. In: Encyclopedia of Structural Health Monitoring, vol. 33. Chichester: John Wiley & Sons, Ltd, 2009, pp. 1–13.

28. Langer, Falsaperla, Hammer. Chapter 6 - a posteriori analyses: advantages and pitfalls of pattern recognition techniques. In: Langer, Falsaperla, Hammer (eds.) Advantages and pitfalls of pattern recognition, computational geophysics, vol. 3. Elsevier, 2020. pp. 237–259.

29. Jung. Machine learning: the basics (Machine Learning: Foundations, Methodologies, and Applications). 1st ed. Singapore: Springer, 2022.

30. Canonaco, Roveri, Alippi, et al. Corrosion prediction in oil and gas pipelines: a machine learning approach. In: 2020 IEEE international instrumentation and measurement technology conference (I2MTC). Dubrovnik, Croatia: IEEE, pp. 1–6.

31. Fang, Cheng, Gai, et al. Development of machine learning algorithms for predicting internal corrosion of crude oil and natural gas pipelines. Comput Chem Eng 2023; 177: 108358.

32. Eltouny, Gomaa, Liang. Unsupervised learning methods for data-driven vibration-based structural health monitoring: a review. Sensors 2023; 23(6): s23063290.

33. Wang, Cha YJ. Unsupervised deep learning approach using a deep auto-encoder with a one-class support vector machine to detect damage. Struct Health Monit 2021; 20(1): 406–425.

34. Cha, Wang. Unsupervised novelty detection–based structural damage localization using a density peaks-based fast clustering algorithm. Struct Health Monit 2018; 17(2): 313–324.

35. Wang, Cha YJ. Unsupervised machine and deep learning methods for structural damage detection: a comparative study. Eng Rep 2022; 7(1): e12551.

36. Hastie, Tibshirani, Friedman. The Elements of statistical learning. 2 ed. Springer Series in Statistics, New York, NY: Springer New York, 2009.

37. Dervilis, Shi, Worden, et al. Exploring environmental and operational variations in SHM data using heteroscedastic Gaussian processes. In: Pakzad, Juan (eds.) Dynamics of civil structures, Volume 2. Cham: Springer International Publishing, pp. 145–153.

38. Malekloo, Ozer, AlHamaydeh, et al. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights. Struct Health Monit 2022; 21(4): 1906–1955.

39.

Cha

Ali

Lewis

, et al. Deep learning-based structural health monitoring. Autom Constr 2024; 161: 105328.

40.

Bellman

RE.

Adaptive control processes. A Guided Tour. Princeton: Princeton University Press, 1961.

41.

Banks

Fienberg

SE.

Data mining, statistics. In: Meyers

(ed.) Encyclopedia of Physical Science and Technology, 3 ed. New York: Academic Press, 2003, pp. 247–261.

42.

Kok

Koronacki

Lopez

Mantaras

, et al. (eds.) Machine learning: ECML 2007: 18th European conference on machine learning, Warsaw, Poland, September 17-21, 2007; proceedings. Number 4701 in Lecture notes in computer science Lecture notes in artificial intelligence, Berlin Heidelberg: Springer, 2007.

43.

Lee

Verleysen

Nonlinear dimensionality reduction. 1 ed. Information Science and Statistics, New York, NY: Springer, 2007.

44.

Lespinats

Colange

Dutykh

Nonlinear dimensionality reduction techniques: a data structure preservation approach. Cham, Switzerland: Springer Nature Switzerland AG, 2022.

45.

Fernandez Pierna

Baeten

Dardenne

, et al. Spectroscopic imaging. Compr Chemom 2009; 4: 173–196.

46.

Ghojogh

Crowley

Karray

, et al. Elements of dimensionality reduction and manifold learning. Cham: Springer International Publishing, 2023.

47.

Garzon

Yang

Venugopal

, et al. Dimensionality reduction in data science. Cham, Switzerland: Springer, 2022.

48.

De Silva

Tenenbaum

. Global versus local methods in nonlinear dimensionality reduction. In: Becker

Thrun

Obermayer

(eds.) Advances in Neural Information Processing Systems, British Columbia, Canada, vol. 15, pp. 705–712. Cambridge, Massachusetts, USA: MIT Press, 2002.

49.

Gering

Linear and nonlinear data dimensionality reduction. Area Exam Report, Massachusetts Institute of Technology, CSAIL, 2002. https://people.csail.mit.edu/gering/areaexam/gering-areaexam02.pdf

50.

Zhou

Sharpee

TO.

Using global T-SNE to preserve intercluster data structure. Neural Comput 2022; 34(8): 1637–1651.

51.

Anowar

Sadaoui

Selim

Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev 2021; 40: 100378.

52.

Jolliffe

IT.

Principal component analysis. New York: Springer: Springer Series in Statistics, 2002.

53.

Pearson

LIII . On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philos Mag J Sci 1901; 2(11): 559–572.

54.

Hotelling

Analysis of a complex of statistical variables into principal components. J Educ Psychol 1933; 24(6): 417–441.

55.

Tang

, et al. High-sensitivity ultrasonic guided wave monitoring of pipe defects using adaptive principal component analysis. Sensors 2021; 21(19): 6640.

56.

Roweis

Saul

LK.

Nonlinear dimensionality reduction by locally linear embedding. Science 2000; 290(5500): 2323–2326.

57.

Roweis

Saul

Hinton

GE.

Global Coordination of Local Linear Models. In: Dietterich

Becker

Ghahramani

(eds.) Advances in Neural Information Processing Systems, British Columbia, Canada, vol. 14, pp. 889–896. Cambridge, Massachusetts: MIT Press, 2001.

58.

Saul

Roweis

ST.

Think globally, fit locally: unsupervised learning of nonlinear manifolds. Technical report, Technical Report MS CIS-02-18, University of Pennsylvania, 2003.

59.

Praetzel

Clarke

Schmidt

, et al. Monitoring the evolution of localized corrosion damage under composite repairs in pipes with guided waves. NDT E Int 2021; 122: 102477.

60.

Liu

, et al. Acoustic data-driven framework for structural defect reconstruction: a manifold learning perspective. Eng Comput 2024; 40(4): 2401–2424.

61.

Maaten

Lvd

Hinton

. Visualizing data using t-SNE. J Mach Learn Res 2008; 9(86): 2579–2605.

62.

Maaten

Lvd

. Learning a parametric embedding by preserving local structure. In: Proceedings of the twelfth international conference on artificial intelligence and statistics, Clearwater, Florida, pp. 384–391. ISSN: 1938-7228. PMLR.

63.

Kobak

Berens

The art of using t-SNE for single-cell transcriptomics. Nat Commun 2019; 10(1): 5416.

64.

Agis

Pozo

A frequency-based approach for the detection and classification of structural changes using t-SNE. Sensors 2019; 19(23): 5097.

65.

Wattenberg

Viégas

Johnson

How to use t-SNE effectively. Distill 2016. Epub ahead of print October 2016. DOI: 10.23915/distill.00002.

66.

Garrett

RG.

The chi-square plot: a tool for multivariate outlier recognition. J Geochem Exploration 1989; 32(1–3): 319–341.

67.

Tibaduiza Burgos

Gomez Vargas

Pedraza

, et al. Damage identification in structural health monitoring: a brief review from its implementation to the use of data-driven applications. Sensors 2020; 20(3): 733.

68.

Davies

JO.

Inspection of pipes using low frequency focused guided waves. PhD Thesis, Department of Mechanical Engineering, Imperial College London, 2008.

69.

Halko

Martinsson

Tropp

JA.

Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 2011; 53(2): 217–288.

70.

Abbassi

Römgens

Tritschel

, et al. Evaluation of machine learning techniques for structural health monitoring using ultrasonic guided waves under varying temperature conditions. Struct Health Monit 2023; 22(2): 1308–1325.