Abstract
Preventing and removing deposits in pipelines is of great importance for ensuring reliable pipeline operation. Selecting a suitable removal time based on the composition and mass of the deposits not only reduces cost but also improves efficiency. In this article, we develop a new non-destructive approach that uses the percussion method and voice recognition with a support vector machine to detect sandy deposits in a steel pipeline. In particular, as the mass of sandy deposits in the pipeline changes, the impact-induced sound signals change as well. A commonly used voice recognition feature, Mel-Frequency Cepstrum Coefficients, which represent the result of a cosine transform of the real logarithm of the short-term energy spectrum on a Mel-frequency scale, is adopted in this research and extracted from the recorded sound signals. A support vector machine model is employed to identify sandy deposits with different mass values by classifying energy summation and Mel-Frequency Cepstrum Coefficients, and the classification accuracies of the two features are compared. The experimental results demonstrate that Mel-Frequency Cepstrum Coefficients perform better in pipeline deposit detection and have great potential in acoustic recognition for structural health monitoring. In addition, the proposed Mel-Frequency Cepstrum Coefficients-based pipeline deposit monitoring model can estimate the deposits in the pipeline with high accuracy. Moreover, compared with current non-destructive deposit detection approaches, the percussion method is easy to implement. With the rapid development of artificial intelligence and acoustic recognition, the proposed method is expected to achieve higher accuracy and higher speed in the detection of pipeline deposits and has great application potential.
In addition, the proposed percussion method can enable robotic-based inspection for large-scale implementation.
Introduction
In recent years, pipeline monitoring has attracted much attention [1–4] because pipelines are widely used and important to a nation’s economy. Most monitoring work concerns impact detection [5], leakage monitoring [6–9], foreign intrusion detection [10–12], crack detection [13–15], and corrosion and erosion monitoring [16–18]. A much less studied topic is sand deposits, which gather at the bottom of the pipeline and seriously reduce its carrying capacity. Timely removal of deposits from pipelines is important. However, cleaning without knowing the deposit information, such as mass and composition, unnecessarily increases costs. Therefore, knowing the properties of the deposits helps in choosing a more cost-effective strategy for removing them from pipes [19].
The typical non-destructive inspection methods for deposits in pipelines include visual inspection, the ultrasonic-based method, the radiography approach, and the infrared thermography method. Each method has its own advantages and disadvantages. For instance, the radiography approach [20] can inspect pipes with complex geometries; however, its practical application is held back because it is harmful to human health. For visual inspection, the size of the pipeline (typically, visual inspection only works well for large pipes) and the applicable conditions (e.g. the pipe must be empty) are major obstacles. The ultrasonic-based method typically uses piezoceramic transducers to detect damage in various structures, including bolts [21–26], wind turbine blades [27,28], composite plates [29], fiber-reinforced-polymer (FRP)-reinforced concrete structures [30,31], subsea protective structures [32], pin-connected structures [33], and prestressed concrete structures [34], among others. Compared with the radiography approach and visual inspection, the ultrasonic-based method [35,36] and the infrared thermography method are more suitable for deposit detection; however, they rely on the operator’s experience to set up the test and interpret the results [37]. Thus, a more practical method is needed to detect deposits in pipelines at lower cost and with higher efficiency.
As the most common natural acoustic wave, audible sound is a superposition of harmonics with different frequencies below 20 kHz, and two sounds differ noticeably when the frequencies of their constituent harmonics differ. The sound wave therefore carries a great deal of information that can be used to identify a phenomenon of interest, and sound-based identification has been applied across multiple industries.
In the medical field, percussion has long been used to diagnose the health status of certain internal organs [38]. With the wide application of auscultation and percussion in modern medicine, sound shows great advantages in diagnosing some diseases, such as pleural effusion [39]. In animal husbandry, researchers often monitor the physiological conditions of cows [40,41], pigs [42,43], and sheep [44,45] by analyzing the sounds the animals make. In addition, sound-based diagnosis has also been widely applied in agriculture [46–50].
The structural integrity changes when damage or some other change occurs in a structure, and physical properties such as mass, stiffness, and damping change correspondingly. When a dynamic excitation is applied to the structure, its health state can be identified from the impact-induced signals, including sound. Among various test methods, the tapping method, which uses hammers, has attracted considerable attention because it is easy to operate when inspecting the integrity of various structures, including railways [51,52], aircraft [53], membrane structures [54], and tunnel linings [55]. However, most existing studies pay more attention to the impact response of the structure than to the impact-induced sound. Recently, the inspection of bolt looseness using sound signals generated by impacts has achieved good results [56], which inspired us to detect deposits in a pipeline by impacting the structure and recording the sound signals.
In this article, the feasibility of the percussion method for detecting sandy deposits in pipelines is investigated. Different deposit levels were produced by adding different masses of sand to a closed pipe. An impact hammer was used to excite the pipeline, the generated sound signals were recorded by a microphone, and the initial 0.1 s of each impact-induced sound signal was selected. The features extracted from the sound signals include energy summation and Mel-Frequency Cepstrum Coefficients (MFCCs). A support vector machine (SVM) model was employed to identify sandy deposits with different mass values. In addition, to select the more suitable feature for the pipeline deposit detection model, the classification accuracies of energy summation and MFCCs are compared. Furthermore, to investigate the performance of the proposed method in a noisy environment, the noise robustness of MFCCs is tested by injecting white Gaussian noise into the signals. The experimental results demonstrate that MFCCs are a more effective index for deposit detection than energy summation, which shows great potential in the field of structural sound recognition. In addition, the MFCCs-based percussion method performs well in environments with a high signal-to-noise ratio (SNR). Moreover, the proposed method is very easy to implement, and deposits in pipes can be readily detected when it is combined with machine learning. Furthermore, with the help of robotics that can carry out the impact in an automated fashion, the proposed percussion-based approach to detecting the sandy deposition level in pipelines has great potential for future implementation.
Theoretical background
SVM
SVM is a kind of learning machine based on the structural risk-minimization principle [57]. Research on SVMs can be traced back to the late 1970s [58], and the SVM has been successfully applied to classification and regression problems in non-destructive testing and structural health monitoring, including structural damage detection [59–62], dam safety prediction [63], vortex-induced vibration response prediction [64], pipeline scour monitoring [65], and impact detection and location [66].
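When the SVM is used as a classifier, the data are mapped to a high-dimensional feature space and an optimal hyperplane is sought that maximizes the margin between the two classes [67], as shown in Figure 1. If the samples in Figure 1 are represented by $\{(\mathbf{x}_i, y_i)\}$, $i = 1, 2, \ldots, n$, the separating hyperplane can be written as
$$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = 0 \tag{1}$$
where $f(\mathbf{x})$ is a separating hyperplane, $\mathbf{w}$ is the weight vector normal to the hyperplane, and $b$ is the bias.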

Support vectors and optimal hyperplane of linearly separable case.
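For the linearly separable case, the separating hyperplane must satisfy the following inequality
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, n \tag{2}$$
where the value of $y_i$ can be +1 or −1. The points lying on the two hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = \pm 1$ are the support vectors.
In Figure 1, the distance between the hyperplane and each dashed line is defined as the geometrical margin, which is equal to $1/\|\mathbf{w}\|$. Maximizing this margin is equivalent to the following optimization problem
$$\min_{\mathbf{w}, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, n \tag{3}$$
For the aforementioned optimization problem, the optimal solution of the original problem can be obtained by solving the dual problem. By adding a Lagrange multiplier $\alpha_i$ ($\alpha_i \geq 0$) to each constraint, the Lagrange function can be obtained as follows
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] \tag{4}$$
To obtain the minimum value of $L$, partially differentiating with respect to $\mathbf{w}$ and $b$ and setting the derivatives to zero gives
$$\frac{\partial L}{\partial \mathbf{w}} = 0 \ \Rightarrow \ \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i \tag{5}$$
$$\frac{\partial L}{\partial b} = 0 \ \Rightarrow \ \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{6}$$
Substituting the above results into Equation (4) gives
$$L(\boldsymbol{\alpha}) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \tag{7}$$
The Lagrange function of Equation (7) contains only one variable $\boldsymbol{\alpha}$, and the optimal solution $\boldsymbol{\alpha}^*$ can be obtained by maximizing Equation (7) subject to $\alpha_i \geq 0$ and Equation (6).
After obtaining $\boldsymbol{\alpha}^*$, the optimal $\mathbf{w}^*$ follows from Equation (5) and $b^*$ from the support vectors, and the decision function can be written as
$$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i (\mathbf{x}_i \cdot \mathbf{x}) + b^* \right) \tag{8}$$
where sgn(·) is the sign function.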
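To make the linear case concrete, the following minimal Python sketch evaluates the standard hard-margin decision function, sgn(Σ α_i y_i (x_i · x) + b), on a hypothetical two-point training set. The data, the multiplier values, and the function names are illustrative assumptions and are not taken from the experiments in this article.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weight_vector(alphas, labels, points):
    """w = sum_i alpha_i * y_i * x_i (stationarity condition of the Lagrangian)."""
    dim = len(points[0])
    return [sum(a * y * x[k] for a, y, x in zip(alphas, labels, points))
            for k in range(dim)]

def decide(x, alphas, labels, points, b):
    """Linear SVM decision: sign of sum_i alpha_i * y_i * (x_i . x) + b."""
    score = sum(a * y * dot(p, x) for a, y, p in zip(alphas, labels, points)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy problem: one support vector per class.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]  # satisfies sum_i alpha_i * y_i = 0 for this toy set
b = 0.0

w = weight_vector(alphas, labels, points)
margin = 1.0 / math.sqrt(dot(w, w))  # geometrical margin 1/||w||
print(w, margin, decide((2.0, 0.5), alphas, labels, points, b))
```

For this toy set both training points lie exactly on the margin hyperplanes, so the geometrical margin equals their distance from the separating line.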
For the case of nonlinear separability, the datasets cannot be divided into two categories by linear decision functions alone. Therefore, to find a hyperplane with minimal error, a slack variable $\xi_i$ and a penalty parameter $C$ are introduced [68] as follows
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0$$
Since the hyperplane cannot be determined by linear equations, the datasets are projected into a high-dimensional feature space by a nonlinear mapping function $\Phi$. However, the dimension of the mapped space increases explosively with the dimension of the original space, which makes the calculation very difficult. For that reason, kernel functions $K$ were introduced to avoid direct computation in the high-dimensional space, as shown in Figure 2, and the following equation can be obtained
$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$$

A nonlinearly separable case.
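The idea that a kernel avoids explicit computation in the high-dimensional space can be checked numerically. The sketch below is only an illustration under assumed choices: it uses a homogeneous degree-2 polynomial kernel in two dimensions (not necessarily the kernel used in this article) and a hypothetical explicit feature map `phi`, and confirms that both routes give the same inner product.

```python
import math

def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated directly in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

# Hypothetical sample points.
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # inner product in feature space
implicit = poly2_kernel(x, z)                          # same value, no mapping needed
print(explicit, implicit)
```

The kernel evaluation needs only one inner product in the original space, whereas the explicit route first builds the mapped vectors.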
Using a method similar to that for the linear decision function above, we can obtain the nonlinear decision function as follows
$$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i K(\mathbf{x}_i, \mathbf{x}) + b^* \right)$$
The kernel functions used in the nonlinearly separable case of SVM include the polynomial kernel, the Gaussian radial basis function (RBF), and the sigmoidal kernel function (SKF) [69]. The cross-validation (CV) method is commonly used for kernel selection [70]. Table 1 shows the commonly used kernel functions and the parameters that influence classification performance. Improper parameter selection may cause over-fitting or under-fitting, which seriously affects the classification accuracy. Therefore, it is essential to optimize the parameters before the model is trained.
The commonly used kernel functions and their parameters.
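As a hedged illustration of how a kernel parameter enters the classifier, the sketch below evaluates a nonlinear decision function with the Gaussian RBF kernel in its usual form K(x, z) = exp(−γ‖x − z‖²). The support vectors, multipliers, bias, and γ value are hypothetical and chosen by hand for illustration; they are not the solution of any dual problem from this study.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def rbf_decide(x, support_vectors, labels, alphas, b, gamma):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1

# Hypothetical model: one support vector per class, hand-picked multipliers.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [1, -1]
alphas = [1.0, 1.0]
b = 0.0
gamma = 2.0 ** -1  # larger gamma makes each support vector's influence more local

print(rbf_decide((0.2, 0.1), support_vectors, labels, alphas, b, gamma))
print(rbf_decide((1.9, 2.2), support_vectors, labels, alphas, b, gamma))
```

A query point is assigned to the class of the support vector that dominates the kernel-weighted sum, which is why a poorly chosen γ can lead to over-fitting or under-fitting.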
MFCCs
As an effective acoustic feature, MFCCs, which represent the result of a cosine transform of the real logarithm of the short-term energy spectrum on a Mel-frequency scale [71], have been widely applied in audio recognition because of their good performance [72–74]. The relationship between the frequency in the Mel scale and the frequency in the Hertz scale is as follows
$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
where $f$ is the frequency in Hz.
However, the application of MFCCs in civil engineering is not common; existing studies include concrete defect detection [75] and delamination detection of concrete bridge decks [76,77]. Since human perception of sound frequency is nonlinear (the human ear perceives frequencies below 1 kHz on an approximately linear scale and frequencies above 1 kHz on a logarithmic scale), MFCCs take this characteristic into consideration during feature extraction and constitute a good representation of the dominant features in acoustic information [78].
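The Hertz-to-Mel mapping can be sketched in a few lines. The code below assumes the widely used form Mel(f) = 2595 log10(1 + f/700) together with its inverse; the function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping from the Mel scale back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for f in (10.0, 1000.0, 20000.0):
    print(f, round(hz_to_mel(f), 1))
```

With this form, 1 kHz maps to approximately 1000 Mel, and equal steps on the Mel axis correspond to progressively wider steps in Hz above that point.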
The feature extraction through MFCCs involves five main steps: preprocessing (pre-emphasis and windowing), fast Fourier transform (FFT), Mel-frequency filterbank, logarithmic operation, and discrete cosine transform (DCT), as shown in Figure 3. The entire process can be expressed by the following equation
$$M(n) = \mathrm{DCT}\left\{ \log\left[ \mathrm{Mel}\left( |\mathrm{FFT}(X(n))|^2 \right) + \varepsilon \right] \right\}$$
where $X(n)$ is the signal at frame $n$ after preprocessing, $M(n)$ is the MFCCs at frame $n$, Mel is the Mel-frequency filterbank, and $\varepsilon$ is a small constant that prevents the occurrence of log 0. It should be noted that if $\mathrm{Mel}(|\mathrm{FFT}(X(n))|^2)$ is greater than 0, $\varepsilon$ is not needed in the equation. In this article, since $\mathrm{Mel}(|\mathrm{FFT}(X(n))|^2)$ was always greater than 0, $\varepsilon$ was not added in the processing.

The flowchart of MFCCs.
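The filterbank and logarithm steps of the flowchart can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in this study: the triangular filters are spaced uniformly on the Mel scale using the common 2595 log10(1 + f/700) mapping, the FFT size is assumed equal to a 2048-sample frame, and the flat power spectrum is a placeholder input.

```python
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_low, f_high):
    """Return n_filters triangular filters, each a list of weights over the
    n_fft // 2 + 1 one-sided FFT bins."""
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    edges = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for f in bin_freqs:
            if left < f <= centre:
                weights.append((f - left) / (centre - left))    # rising edge
            elif centre < f < right:
                weights.append((right - f) / (right - centre))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def log_mel_energies(power_spectrum, bank, eps=0.0):
    """Logarithm of the filterbank energies; eps guards against log(0)."""
    return [math.log(sum(w * p for w, p in zip(weights, power_spectrum)) + eps)
            for weights in bank]

# 24 filters between 10 Hz and 20 kHz at a 200 kHz sampling rate.
bank = mel_filterbank(24, 2048, 200000.0, 10.0, 20000.0)
flat_spectrum = [1.0] * (2048 // 2 + 1)  # placeholder power spectrum
log_e = log_mel_energies(flat_spectrum, bank)
print(len(log_e))
```

With `eps=0.0` the logarithm raises an error if any filter output is exactly zero, which mirrors the remark that ε is only needed when the filterbank energy can vanish.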
The preprocessing consists of pre-emphasis and windowing, and aims to obtain short-term stationary signals for the subsequent operations. Pre-emphasis filters the sound signal with a high-pass filter, whose transfer function is given below, to amplify the high-frequency components
$$H(z) = 1 - \mu z^{-1}$$
where the value of $\mu$ is usually taken as 0.9375 [79]. Subsequently, the pre-emphasized signal is divided into short frames. Meanwhile, to ensure the stationarity of the signal, adjacent frames are made to overlap. Then, a Hamming window is applied to each frame to minimize the leakage effect and keep the continuity of the frame. After preprocessing, the FFT is performed on each frame to transform it from the time domain into the frequency domain and to obtain the energy of the signal. Then, to convert the frequency from the Hertz scale to the Mel scale, the energy spectrum is passed through the Mel-frequency filterbank. Finally, the DCT is applied to the logarithm of the Mel energy, and the result is called the MFCCs [80]. In addition, the first scalar of the MFCCs (the 0th coefficient) is usually ignored since it is too sensitive to the amplitude of the signals [81,82].
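The preprocessing steps can be sketched as follows. The code assumes the usual first-order pre-emphasis filter y[n] = x[n] − μ x[n−1] with μ = 0.9375 and the standard Hamming window; the function names and the handling of the first sample and of trailing samples are illustrative choices.

```python
import math

def pre_emphasis(signal, mu=0.9375):
    """y[n] = x[n] - mu * x[n-1]; the first sample is passed through unchanged."""
    return [signal[0]] + [signal[n] - mu * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Standard Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(signal, frame_len, step):
    """Split the signal into overlapping frames and apply a Hamming window.
    Trailing samples that do not fill a whole frame are dropped."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

emphasized = pre_emphasis([1.0, 1.0, 1.0])
frames = frame_signal([0.0] * 20000, 2048, 1024)  # 0.1 s at 200 kHz, 50% overlap
print(emphasized, len(frames))
```

A constant input is strongly attenuated by the pre-emphasis filter, which shows its high-pass character, and a step of half the frame length gives the 50% overlap between adjacent frames.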
In the subsequent processing and classification, the first half of the MFCC orders is usually taken as the feature vector in the literature. From this point of view, the DCT plays two key roles: (1) it converts the features from the frequency domain back to a time-like (cepstral) domain and (2) it reduces the dimensionality of the feature by eliminating the high-frequency part of the signal energy.
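The final DCT step and the truncation to the lower-order coefficients can be sketched as below. The code uses an unnormalised DCT-II, drops the 0th coefficient, and keeps the next 12; whether the original processing used exactly this normalisation and this index range is an assumption, and the constant log-energy vector is a placeholder input.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N)."""
    n_vals = len(values)
    return [sum(v * math.cos(math.pi * k * (n + 0.5) / n_vals)
                for n, v in enumerate(values))
            for k in range(n_vals)]

def mfcc_from_log_mel(log_mel, n_keep=12):
    """Drop the amplitude-sensitive 0th coefficient and keep the next n_keep."""
    coeffs = dct_ii(log_mel)
    return coeffs[1:n_keep + 1]

log_mel = [2.0] * 24  # placeholder: 24 identical log filterbank energies
coeffs = dct_ii(log_mel)
mfcc = mfcc_from_log_mel(log_mel)
print(len(mfcc), round(coeffs[0], 6))
```

For a constant input all the energy lands in the 0th coefficient, which illustrates why that coefficient mainly tracks overall signal amplitude and is commonly discarded.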
The principle of percussion-based pipeline deposit detection
The main devices used in this research are a laptop, a microphone, a microphone power module, a multifunction I/O device, and an impact hammer, as shown in Figure 4. Percussion was performed on the pipeline to assess the mass of deposits in the pipeline with the help of SVM-based machine learning.

Schematic of the experimental setup.
The flowchart of the proposed method is shown in Figure 5. After the sound signals are recorded with the microphone, the procedure is divided into two processes: training and testing. The training process consists of feature extraction using MFCCs as well as energy summation, followed by model establishment using the SVM. In the testing process, the extracted features are matched against the established model. It should be noted that two different models, one based on MFCCs and one based on energy summation, are established and compared in this article.

Schematic of the detection principle.
Experimental setup and procedure
The specimen used in this experiment was a 1-m-long stainless steel pipeline with an outer diameter of 64 mm, a wall thickness of 3.5 mm, and a maximum water capacity of 2.5 kg. The impact position was located on the top surface at the mid-span. The number of impact-induced sound signals in the first and second experiments is shown in Table 2. In the first experiment, the impact hammer was used to strike the pipe 100 times under each condition. In the second experiment, the pipe was struck about 285 times under each condition. In total, the pipeline was impacted 1939 times. Before tapping, to prevent uneven distribution of the sand and to let the sand settle fully at the bottom of the pipeline, the pipe was shaken and then kept stationary for 30 min. A G.R.A.S 40PP microphone (A-weighting) was placed at a fixed location (about 5 cm away from the tapping position) to record the impact-induced sound. The experimental setup is shown in Figure 6. Since the microphone used in this experiment can acquire sound with frequencies between 10 Hz and 20 kHz, the sampling rate was set to 200 kHz.
The number of impact-induced sound signals in the first and second experiments.

Experimental setup.
Sand with a particle size between 2 and 2.5 mm was added to the pipeline in masses of 0.01, 0.02, 0.03, and 0.04 kg to simulate different masses of sandy deposits. The deposits in the pipeline were quantified by the sand–water mass ratio, which was calculated by dividing the mass of sandy deposits by the mass of water. According to the sand–water mass ratio, the experiment is divided into five conditions, as shown in Table 3. The experiment was carried out in a quiet laboratory with no extraneous noise, and the received signals were filtered with a band-pass filter that matched the frequency range of the microphone.
The conditions in the experiment.
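The five sand–water mass ratios follow directly from the sand masses and the water mass. The short sketch below assumes that the pipe holds its stated maximum of 2.5 kg of water in every condition and that the sand-free pipe is the first condition; the ordering of the conditions is an assumption.

```python
WATER_MASS_KG = 2.5  # assumed: pipe filled to its maximum water capacity
sand_masses_kg = [0.0, 0.01, 0.02, 0.03, 0.04]

# Sand-water mass ratio in percent for each condition.
ratios_percent = [round(100.0 * m / WATER_MASS_KG, 1) for m in sand_masses_kg]

for condition, ratio in enumerate(ratios_percent, start=1):
    print(f"Condition {condition}: {ratio}%")
```

Under these assumptions the ratios are 0%, 0.4%, 0.8%, 1.2%, and 1.6%, which matches the ratio values quoted in the results.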
The SVM, specifically the LIBSVM toolbox [83], was used in this article to classify the masses of sandy deposits in the pipeline. The RBF was used as the kernel function, and an exhaustive grid search over exponentially increasing values of c and γ was performed with a v-fold CV strategy to obtain the optimal c and γ [84].
In this article, two different features (energy summation [56] and MFCCs), extracted from the selected signals, were used in classification and compared with each other. Sound signals from the first experiment were used for the comparison. For each condition, 50 sets were randomly taken as the training sets, and the other 50 sets were used as the testing sets. After the comparison, the better feature was used to build a pipeline deposit detection model. Sound signals from both the first and second experiments were used for model building. The training sets of the new model contain 50 signals from the first experiment and 100 signals from the second experiment. The 50 signals from the first experiment are randomly selected, and the 100 signals from the second experiment are randomly selected from its first 200 signals. The testing sets contain the remaining signals from both experiments.
In the noise robustness tests, MFCCs were tested under white Gaussian noise, which was added to the signals at five SNR levels (30, 35, 40, 45, and 50 dB). Sound signals from the first experiment were again used. Because the properties of the noise cannot be known in field implementation, in this test 50 sets of MFCCs without noise were randomly taken as training sets, and the remaining 50 sets with different noise levels were used as the verification sets. Furthermore, to reduce the influence of chance, the SVM classification in the comparison and in the noise robustness tests was repeated 50 times. Before the SVM classification, five exhaustive grid searches were run to obtain the optimal c and γ for the SVM.
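The grid search with v-fold CV can be outlined as below. This is a generic sketch rather than the LIBSVM procedure itself: `v_fold_indices` and `grid_search` are hypothetical helpers, the exponent ranges are assumed, and `toy_cv_accuracy` is a stand-in scorer with an arbitrarily placed peak. A real scorer would train an RBF SVM on v − 1 folds and test it on the held-out fold.

```python
import math
import random

def v_fold_indices(n_samples, v, seed=0):
    """Shuffle the sample indices and deal them into v nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::v] for i in range(v)]

def grid_search(cv_accuracy, c_exponents, gamma_exponents):
    """Try every (c, gamma) = (2**i, 2**j) and keep the pair with the best
    cross-validation accuracy returned by the user-supplied cv_accuracy."""
    best = (None, None, -1.0)
    for i in c_exponents:
        for j in gamma_exponents:
            c, gamma = 2.0 ** i, 2.0 ** j
            acc = cv_accuracy(c, gamma)
            if acc > best[2]:
                best = (c, gamma, acc)
    return best

def toy_cv_accuracy(c, gamma):
    """Stand-in for a real CV run; peaks at c = 2**2, gamma = 2**-3 by design."""
    return 1.0 / (1.0 + abs(math.log2(c) - 2.0) + abs(math.log2(gamma) + 3.0))

folds = v_fold_indices(50, 5)
best_c, best_gamma, best_acc = grid_search(toy_cv_accuracy,
                                           range(-5, 16), range(-15, 4))
print(best_c, best_gamma, best_acc)
```

Searching over exponents rather than raw values keeps the grid small while still covering several orders of magnitude for both parameters.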
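The data split can be sketched as follows. The helper `random_split` is hypothetical, and the totals assume that every one of the five conditions contributes 50 + 100 training signals to the final model, with the 1939 recorded signals as the overall pool.

```python
import random

def random_split(n_signals, n_train, seed=0):
    """Randomly pick n_train indices for training; the rest are for testing."""
    indices = list(range(n_signals))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# Feature comparison: 100 signals per condition, split 50/50.
train_idx, test_idx = random_split(100, 50)

# Final model: 50 (first experiment) + 100 (second experiment) per condition.
N_CONDITIONS = 5
TOTAL_SIGNALS = 1939
train_total = N_CONDITIONS * (50 + 100)
test_total = TOTAL_SIGNALS - train_total
print(len(train_idx), len(test_idx), train_total, test_total)
```

Under these assumptions the final model is trained on 750 signals and tested on the remaining 1189.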
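Injecting white Gaussian noise at a prescribed SNR can be sketched as below. The noise variance is set from the measured signal power using SNR(dB) = 10 log10(P_signal / P_noise); the 2.5 kHz sine is a hypothetical stand-in for a recorded impact sound, and the function names are illustrative.

```python
import math
import random

def add_white_gaussian_noise(signal, snr_db, seed=0):
    """Add zero-mean white Gaussian noise so that the result has the target SNR."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """SNR actually realised, computed from the clean and noisy signals."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)

fs = 200_000  # sampling rate in Hz
clean = [math.sin(2.0 * math.pi * 2500.0 * n / fs) for n in range(20_000)]
noisy = add_white_gaussian_noise(clean, snr_db=40.0)
print(round(measured_snr_db(clean, noisy), 2))
```

Because the noise is random, the realised SNR only approximates the target, but for a 20000-sample signal the deviation is a small fraction of a decibel.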
Experimental results and discussion
PSD analysis
Sound signals recorded in the first experiment were used for the power spectral density (PSD) analysis. Figure 7 shows one of the impact-induced sound signals recorded by the microphone. The first 0.1 s of the sound signal is taken as the selected signal (shown in Figure 7), and there are 100 selected signals for each condition, that is, for each sand–water mass ratio. Owing to the inherent randomness of the manually controlled impact, the magnitudes of the peaks are not uniform. Therefore, it is vital to extract suitable features for the subsequent classification.

One of the sound signals recorded by the microphone.
The PSD of the selected signal in each condition was computed for frequency-domain analysis. As the energy of the selected signals was mainly concentrated between 2000 and 3000 Hz, Figure 8 shows the PSD in this range. However, since the fundamental frequency is almost the same in all five conditions, it is hard to classify the masses of sandy deposits in the pipeline using the fundamental frequency alone.

The PSD of selected signals: (a) Condition 1; (b) Condition 2; (c) Condition 3; (d) Condition 4; and (e) Condition 5.
The comparison between energy summation and MFCCs
In this section, the performance of energy summation and MFCCs in pipeline deposit detection is compared. It should be noted that sound signals recorded in the first experiment were used in this section.
The classification using energy summation
In classification using energy summation, the features were extracted by summing the energy in different frequency bands of the selected signal’s PSD. Four frequency bands were selected: 2000–2250 Hz, 2250–2500 Hz, 2500–2750 Hz, and 2750–3000 Hz. Figure 9 presents three-dimensional (3D) scatterplots for the different sand–water mass ratios, using the bands 2000–2250 Hz, 2250–2500 Hz, and 2500–2750 Hz in Figure 9(a) and the bands 2250–2500 Hz, 2500–2750 Hz, and 2750–3000 Hz in Figure 9(b). The 1.6% condition is easy to distinguish by observation alone. However, it is hard to differentiate the other conditions from each other. In addition, it should be noted that the 0% and 0.8% conditions partly overlap. SVM-based machine learning was therefore used to learn the nonlinear relationship between the energy summation and the deposit mass in the pipeline.

3D scatterplots for different masses of sandy deposits with different frequency bands: (a) 2000–2250 Hz, 2250–2500 Hz, 2500–2750 Hz and (b) 2250–2500 Hz, 2500–2750 Hz, 2750–3000 Hz.
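The energy summation feature can be sketched as a sum of PSD values within each of the four bands. In the sketch below the half-open band edges, the 10 Hz frequency grid, and the synthetic PSD with a single peak at 2600 Hz are all assumptions made for illustration; they are not measured data.

```python
BANDS_HZ = [(2000, 2250), (2250, 2500), (2500, 2750), (2750, 3000)]

def band_energy_features(freqs_hz, psd, bands=BANDS_HZ):
    """Sum the PSD values whose frequency falls in each band [lo, hi)."""
    return [sum(p for f, p in zip(freqs_hz, psd) if lo <= f < hi)
            for lo, hi in bands]

# Synthetic PSD on a 10 Hz grid: flat floor with one peak at 2600 Hz.
freqs = [10.0 * k for k in range(501)]  # 0 to 5000 Hz
psd = [100.0 if f == 2600.0 else 1.0 for f in freqs]

features = band_energy_features(freqs, psd)
print(features)
```

Each signal is thus reduced to a four-element feature vector, three components of which can be plotted at a time in a 3D scatterplot.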
Figure 10 shows the results of the exhaustive grid search using energy summation. Although a small amount of fluctuation remains after the CV accuracy becomes stable, this slight fluctuation has little effect on the classification accuracy. Consequently, the values of c and γ used in the SVM classification are selected as $2^{6}$ and $2^{-27}$, respectively.

The results of exhaustive grid search using energy summation.
Table 4 shows the results of the 50 SVM classification runs, and one of them is shown in detail in Table 5. The average accuracy of energy summation is 84.168%, which illustrates that energy summation can be used as a feature in pipeline deposit detection. However, it should be noted that the classification between the sand–water mass ratios of 0% and 0.8% is very poor, as shown in Table 5, which is in line with the overlap observed in Figure 9. From this point of view, the energy summation does not extract the most appropriate features for separating 0% and 0.8%.
The results of SVM classification using energy summation.
The SVM classification results of No. 2.
The classification using MFCCs
MFCCs used in classification were calculated following the steps in section “MFCCs.” For the convenience of the FFT and subsequent calculation, the frame length and step are taken as 2048 and 1024 sampling points, respectively. Using the following equation, a selected signal is divided into 18 frames
$$f_n = \left\lfloor \frac{x_n - f_l}{s} \right\rfloor + 1$$
where $f_n$ is the number of frames, $f_l$ is the frame length, $s$ is the step, and $x_n$ is the length of the signal. Furthermore, the Mel-frequency filterbank used in the following step is a 24th-order triangular band-pass filterbank, which ranges from 10 Hz to 20 kHz. Moreover, half of the MFCC orders (12th order) were taken as the feature vector in the SVM classification.
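The frame count can be verified with a short calculation. The sketch assumes the usual rule that a frame is kept only if it fits entirely inside the signal, so the count is floor((x_n − f_l)/s) + 1, and it takes the 0.1 s selected signal at 200 kHz as 20000 samples.

```python
def frame_count(signal_len, frame_len, step):
    """Number of complete frames that fit in the signal."""
    if signal_len < frame_len:
        return 0
    return (signal_len - frame_len) // step + 1

fs_hz = 200_000
signal_len = fs_hz // 10  # 0.1 s of signal, i.e. 20000 samples
frames = frame_count(signal_len, frame_len=2048, step=1024)
print(signal_len, frames)
```

With these values the last complete frame starts at sample 17408, and the remaining samples after sample 19455 do not fill another frame.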
Figure 11 shows the results of the exhaustive grid search using MFCCs. Figure 11 clearly shows that the SVM achieves the best CV accuracy when γ is smaller than $2^{-10}$. Under these circumstances, in theory, the classification accuracy can reach 100% after training with proper training sets. Therefore, the values of c and γ are taken as 1 and $2^{-11}$, respectively. Table 6 shows the accuracy of the SVM classification using MFCCs; the average accuracy is nearly 100%. Compared with classification using energy summation, the accuracy with MFCCs is much higher, which illustrates that MFCCs as an acoustic feature have great potential in structural health monitoring by the percussion approach.

The results of exhaustive grid search.
The accuracy of SVM classification using MFCCs.
Robustness of MFCCs to noise
Sound signals recorded in the first experiment were used to investigate the robustness of MFCCs to noise. In field implementation, the major factor influencing the accuracy of the MFCCs-based percussion method is environmental noise. Thus, it is important to study the robustness of MFCCs to noise. To simulate noise in field implementation, wide-bandwidth white Gaussian noise is used in this section. The results of the noise robustness tests are shown in Figure 12. When the SNR is larger than 45 dB, the mass of deposits in the pipeline can be detected with high accuracy using MFCCs. However, when the SNR is lower than 40 dB, the deposit detection is not accurate enough. The results illustrate that the current MFCCs-based percussion method is suitable for deposit detection in environments with high SNR. Furthermore, since the sound signals produced by tapping structures are not as complicated as speech signals, some improvements to MFCCs can be made to enhance their performance in low-SNR environments.

The accuracy of deposit detection under a noise environment.
The MFCCs-based pipeline deposits detection model
In section “The classification using MFCCs,” the accuracy of pipeline deposit detection is nearly 100%; however, this model is not perfect. Because of the small number of datasets, the generalization ability of the model is modest. Since MFCCs perform better than energy summation in pipeline deposit detection, MFCCs are used as the feature to build a pipeline deposit detection model. The impact-induced sound signals recorded in both the first and second experiments are used in this section.
The training sets of the new model contain 50 signals from the first experiment and 100 signals from the second experiment. The 50 signals from the first experiment are randomly selected, and the 100 signals from the second experiment are randomly selected from its first 200 signals. The testing sets contain the remaining signals from both experiments. In addition, the MFCCs of each signal are calculated according to the method in sections “MFCCs” and “The classification using MFCCs.” After 5-time CV, the values of c and γ are taken as 2 and $2^{-13}$, respectively.
The classification accuracy of the MFCCs-based pipeline deposit detection model is shown in Table 7. When the sand–water mass ratio is 1.2%, the classification accuracy is relatively low. Nevertheless, in the other conditions, the classification accuracies are all higher than 85%. In addition, the overall classification accuracy of the proposed model is 91.59%, which illustrates that the MFCCs-based pipeline deposit detection model can detect the deposits accurately.
The classification accuracy of the pipeline deposits detection model.
Discussion
The experimental results show high prediction accuracy for pipeline deposit monitoring with a minimum sand–water mass ratio of 0.4%; that is, the current resolution of this method is 0.4%. Although the overall accuracy of the proposed model is high, achieving a finer resolution for sandy deposit detection requires further research.
In addition, in this article, we used sand with a particle size between 2 and 2.5 mm, which means that the sand is unlikely to adhere to the interior of the pipe. In practice, however, sand with a small particle size is quite likely to adhere to the interior of the pipe. In the future, more experiments should be carried out using sand with a small particle size to investigate how deposits that adhere to the interior of the pipe influence the detection accuracy.
Furthermore, a finite-length, closed pipeline was used in this article, and sandy deposits within a 1 m length can be detected well using the proposed method. However, pipelines used in transportation (such as petroleum pipelines) can be approximated as infinitely long; therefore, corresponding experiments should be carried out to determine the effective detection length of the proposed percussion method.
In addition, although MFCCs perform well under white Gaussian noise when the SNR is greater than 45 dB, the influence of environmental noise, such as vehicle noise and construction site noise, has not been investigated in this research. In reality, environmental noise is quite different from white Gaussian noise. The sound feature extraction algorithm should also be improved to increase the detection accuracy at lower SNR. In the authors’ future work, laboratory and numerical studies will be performed to explore the feasibility of detecting the properties of pipeline deposits (e.g. composition) using the percussion method. Moreover, cases of low SNR will be considered in our future work.
Conclusion and future work
This article presents exploratory research on the feasibility of the percussion method for detecting sandy deposits in pipelines. To classify the masses of deposits more accurately, MFCCs, which extract transform-domain features of the selected signals, were used as the main feature in the SVM classification. Meanwhile, energy summation was also used as a feature for classification and compared with MFCCs. In addition, to investigate the performance of MFCCs in a noisy environment, noise robustness tests were conducted. The results show that the SVM models built on MFCCs and on energy summation can both accurately detect the different masses of deposits. However, the MFCCs-based classification model shows higher accuracy than the energy summation-based model, which demonstrates that MFCCs are more suitable as the feature in pipeline deposit detection and are robust to noise in high-SNR environments. In addition, the proposed MFCCs-based pipeline deposit detection model can detect sandy deposits with an accuracy higher than 90%. Furthermore, compared with the commonly used methods in pipeline deposit detection, the percussion method is very easy to implement, and when combined with machine learning it allows deposits in pipelines to be determined by untrained operators. In the future, a “tapping-and-detecting” robot can be developed using a knocking device and a microphone to automate the percussion and signal processing. Such a robot-assisted approach could automatically determine the masses of deposits in a pipeline, which has great potential in field implementation. Moreover, our future work will consider acquired sound signals of low SNR as well as sand with much smaller particle sizes.
Funding
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. This research was partially supported by the Major State Basic Development Program of China (973 Program, Grant Number: 2015CB057704), the General Project of the Natural Science Foundation of China (Grant Number: 51478080), and the General Project of Natural Science Foundation of Jiangsu Province of China (Grant Number: BK20181198).
