An adaptive weighting multimodal fusion classification system for steel plate surface defect

Abstract

In steel surface inspection, an accurate defect identification method is needed to evaluate the impact of defects on structural performance and system maintenance. The recognition accuracy of traditional methods based on handcrafted features is limited, but system performance can be improved by fusing features extracted by different methods. Therefore, this research uses a pre-trained convolutional neural network (CNN) combined with transfer learning to extract effective abstract features, and carries out adaptive weighting multimodal fusion of the abstract features and three handcrafted feature sets at the decision level; that is, it proposes an adaptive weighting multimodal fusion classification system. The system uses handcrafted features as a supplement to abstract features, and classifies steel surface defects in completely different feature representation spaces. Based on the NEU steel plate surface defect benchmark database, the classification results of the feature sets before and after fusion are compared and analyzed. The experimental results show that the classification accuracy of the fusion system is at least 3.4% higher than that before fusion, and the final accuracy is 99.0%, which demonstrates the effectiveness of the proposed system.

Keywords

CNN-based features, feature extraction, steel plate surface defect, decision-level fusion

1 Introduction

In recent years, industrialization has advanced steadily around the world, and steel has been widely used in aviation, automobile, machine tool and other industries. It is a vital raw material in industrial production, and demand for it is increasing year by year. In July 2022, the crude steel output of the 64 countries included in the statistics of the World Iron and Steel Association was 149.3 million tons [1]. With such large demand, users' requirements for steel quality are also rising. However, a series of quality problems, such as cracks, scabs and holes, appear on the surface of strip steel due to various factors. Moreover, more than 60% of strip steel quality problems are caused by surface defects [2]. Therefore, surface defects are an important factor affecting the quality of steel plate and strip. With the progress of optical instrument technology, the early detection and classification of steel surface defects based on intelligent visual recognition has become the key to preventing mass production of low-quality steel, and is a research hotspot worldwide. Intelligent visual recognition methods mainly fall into two groups: feature-based methods and deep learning-based methods.

Deep learning-based methods have strong representation ability and can realize end-to-end detection. Therefore, many researchers have applied them to intelligent steel surface defect recognition. Zou et al. (2019) [3] proposed an end-to-end trainable deep convolutional neural network for automatic crack detection. He et al. (2020) [4] proposed a new defect detection system based on deep learning. The system uses a multi-level feature fusion network (MFN) to combine multi-level features into one feature containing more defect location details, so as to achieve strong classification capability. Feng et al. (2021) [5] proposed a RepVGG algorithm, combined it with a spatial attention (SA) mechanism, and verified its effect on the X-SDD dataset. The test results show that this algorithm achieves good defect recognition. Deep learning can process raw data well, and can extract salient features from large data sets and classify them. However, it is purely data-driven and requires a large amount of labeled data for network training. Therefore, the recognition performance of deep learning-based methods on small data sets is lower than that of feature-based methods [6].

The feature-based method mainly extracts the feature information of the image, and then constructs the corresponding classifier to classify the surface defects. Many studies have analyzed the visual features of surface defect images, such as geometric shape, gray scale and statistical texture features [7–9]. Because fusing features from different sources can achieve higher classification accuracy and a certain robustness to interference, some authors have integrated multi-source feature sets to improve system stability. Zhao et al. (2011) [10] extracted gray level features, invariant moment features and texture features, and introduced fuzzy functions into the support vector machine (FSVM) for strip surface defect detection. Wang et al. (2015) [11] proposed a method based on multi-feature fusion and an improved random forest (the OMFF-RF algorithm) to identify distributed defects. This method extracts the histogram of oriented gradients (HOG) feature set and the gray level co-occurrence matrix (GLCM) feature set, and uses the OMFF-RF algorithm to solve the problem that the dimension difference between the two feature sets leads to low precision in serial fusion. Xu et al. (2021) [12] proposed a surface defect recognition method for lithium battery electrodes based on multi-feature fusion. The main idea is to fuse texture features, edge features and HOG features in series at the feature level, and then input them into a particle swarm optimization support vector machine (PSO-SVM) to automatically recognize and classify defect images, with an average recognition rate of 98.3%. In the above research, feature fusion is realized by integrating features from different handcrafted descriptors. However, these handcrafted features have some limitations in high-precision defect detection.

Recently, the rise of deep learning has led many scholars to use convolutional neural networks to extract image features, with promising initial results [13–16]. Gamma Kosala et al. (2017) [17] used a CNN as a trainable feature extractor, with an SVM replacing the fully connected layer as the recognizer for license plate detection, and the best structure achieved an accuracy of 93%. Gayathri S et al. (2020) [18] proposed a CNN architecture to extract features from retinal fundus images, and input the extracted features into the classifier for two-class and multi-class classification. This method can minimize computational complexity and provide better classification performance. Yang et al. (2021) [14] used transfer learning to obtain multi-scale features of welding images, proposed a welding defect recognition algorithm based on multi-feature fusion, and realized accurate defect detection based on X-ray images. As a feature extractor, a CNN can obtain high-level abstract features, and performs well in visual recognition tasks [19–21]. Some experimental results show that deep learning features achieve more competitive performance in signal recognition only when the deep learning model and handcrafted input features are used together [22, 23]. Therefore, for image recognition, this paper uses automatically learned CNN features and handcrafted features for multimodal fusion to improve recognition accuracy.

To address the above issues, we propose a new adaptive weighting multimodal fusion classification system, which combines CNN-based abstract features with handcrafted features. The handcrafted features are composed of three subsets: the global texture statistical feature set based on GLCM, which represents the gray-level spatial distribution pattern; the image edge feature set based on the SOBEL operator; and the local texture feature set based on HOG. The system classifies the surface defects of steel plates in three steps. Firstly, principal component analysis (PCA) is used to reduce the dimensions of the SOBEL feature subset and the HOG feature subset, and each reduced subset is input into its own SVM classifier, while the GLCM feature subset is input directly into an SVM classifier. Then, the features extracted by the CNN through automatic learning are fed into an SVM classifier. Finally, an adaptive weighting averaging fusion decision method is proposed. This method combines the CNN features and the handcrafted features, and realizes decision-level fusion of defect image features in completely different representation spaces; that is, the individual decisions of four different classifiers are fused by adaptive weighted averaging to reach a unified decision on the class of the input image. The results on the NEU surface defect benchmark database show that the proposed system achieves higher classification accuracy than some advanced surface defect classification methods. The main contributions of this study can be summarized as follows:

Novel multimodal fusion of handcrafted and CNN-based features.

An adaptive weighting average method is proposed for decision level fusion.

Compared with other advanced methods, the adaptive weighting multimodal classification system achieves better classification results.

The rest of this paper is organized as follows: Section 2 introduces the overall structure of the proposed adaptive weighting multimodal fusion classification system in detail. Section 3 gives a brief introduction to handcrafted feature extraction and deep learning feature extraction methods. Section 4 describes the implementation process of the adaptive weighting averaging fusion decision method. The data set introduction and related evaluation experiments are presented in Section 5, and the final conclusions are presented in Section 6.

2 Proposed adaptive weighting multimodal fusion classification system

In this study, we propose a new adaptive weighting multimodal fusion classification system for steel plate surface defects, which allows us to extract the most effective features from multiple fields. The overall schematic diagram of the system is shown in Fig. 1. This system mainly consists of three steps.

Fig. 1

Structure of the proposed system.

Firstly, it uses handcrafted features extracted by well-known algorithms to classify defects separately. The handcrafted feature sets include global texture features, edge features and local texture features. Because the SOBEL edge feature set and the HOG-based local texture feature set have large dimensions, they suffer from the curse of dimensionality if the dimension is not reduced. Therefore, PCA is used to reduce the dimensions of these two feature sets before classification to improve computing efficiency and classification performance. Then, the reduced edge features and the reduced local texture features are each input into an SVM classifier, and the global texture features are input directly into an SVM classifier.

Secondly, we use a CNN as an automatic feature extractor. VGG16, introduced in 2014 by Simonyan and Zisserman of the Visual Geometry Group at the University of Oxford, has 13 convolutional layers and 3 fully connected layers [24]. The convolutional layers are used for feature extraction, and the fully connected layers are used for classification. To take advantage of transfer learning, we use the pre-trained VGG16 model to extract the surface defect features of the steel plate, and input the learned features into an SVM for classification.

Previous studies on multimodal classification systems have shown that multi-feature fusion can improve classification performance [25]. Considering the large intra class similarity and inter class differences between different types of surface defects, we fuse handcrafted feature descriptors and global learning CNN feature descriptors at the decision-making level, and use a new adaptive weighting averaging fusion method to achieve decision level fusion and predict classification results. This method is described in detail in Section 4.

3 Feature extraction methods

Feature extraction is a key step before realizing the classification of steel plate surface defects. Its main job is to extract feature sets that can reflect image properties and achieve high performance. Therefore, considering the factors such as computational efficiency, feature space size and robustness, this paper uses effective local and global feature extractors that have been successfully applied to the classification of steel plate surface defects. Basically, these feature extraction methods can be divided into two subsets: handcrafted-feature descriptors and learned-feature descriptors.

3.1 Handcrafted feature descriptors

This research starts from the direction of multi features and selects the GLCM, SOBEL algorithm and HOG to extract the global texture features, edge features and local texture features of the image from the surface defect area image of the steel plate.

GLCM is a joint probability matrix for statistical image analysis to describe the spatial correlation between pixels in a textured image and is very sensitive to texture variations across the image [26, 27]. In this paper, the appearance characteristics of surface defects of steel plate images and the properties of related parameters of GLCM are comprehensively considered, and on the basis of GLCM, four statistical attributes, contrast, entropy, inverse variance and energy, are calculated to quantitatively describe the texture features of defect recognition. In order to reduce the calculation amount of the matrix, when constructing the GLCM, four moving directions of 0°, 45°, 90°, and 135° are used for calculation.
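The GLCM statistics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the pixel distance of 1, the non-symmetric counting, and the formula used for "inverse variance" (the sum of P(i, j) / (i − j)^2 over i ≠ j) are all assumptions, and the input image is assumed to be already quantized to `levels` gray levels.

```python
import math

# Pixel offsets (dx, dy) for the four directions 0°, 45°, 90° and 135°,
# assuming a distance of one pixel (the distance is not stated in the text).
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix of a quantized image for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]


def glcm_statistics(p):
    """Contrast, entropy, energy and inverse variance of a normalized GLCM."""
    contrast = entropy = energy = inverse_variance = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            if v > 0:
                entropy -= v * math.log(v)
            if i != j:
                inverse_variance += v / (i - j) ** 2
    return {
        "contrast": contrast,
        "entropy": entropy,
        "energy": energy,
        "inverse_variance": inverse_variance,
    }


def glcm_features(image, levels):
    """Concatenate the four statistics over the four directions (16 values)."""
    features = []
    for angle in sorted(OFFSETS):
        dx, dy = OFFSETS[angle]
        stats = glcm_statistics(glcm(image, dx, dy, levels))
        features.extend(stats[k] for k in sorted(stats))
    return features
```

Under these assumptions each image yields 4 statistics × 4 directions = 16 values, which is small enough to be passed to a classifier without dimensionality reduction.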

The SOBEL algorithm is a discrete difference operator based on the first derivative, which is used to calculate the gray level approximation of the image brightness function. It calculates the gray weighted difference at the upper, lower, left and right adjacent points of the pixel, and reaches the extreme value at the edge. SOBEL algorithm can provide accurate edge contour information, and has a smoothing effect on noise. It is a common edge detection method [28]. Because the edge shapes of surface defects of steel plate are different, the edge features of defects can represent defect information. Therefore, considering the accuracy and efficiency of the edge operator, the classical edge detection operator SOBEL is selected to extract the edge features of defects.
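As a rough illustration of the operator described above, the following sketch computes the Sobel gradient magnitude of a grayscale image stored as a list of rows. The function name and the choice to leave border pixels at zero are assumptions made for this example; the paper does not describe its border handling.

```python
import math

# Standard 3 x 3 Sobel kernels for the horizontal and vertical derivatives.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude map; border pixels are left at 0 (an assumption)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = image[r + i - 1][c + j - 1]
                    gx += KX[i][j] * v
                    gy += KY[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out


def flatten(feature_map):
    """Flatten a 2-D map row by row into a 1-D feature vector."""
    return [v for row in feature_map for v in row]
```

If the magnitude map of a 200 × 200 image is flattened in this way, the result has 40000 entries, which would be consistent with the SOBEL feature dimension reported in Section 5.3; whether the authors build the feature vector exactly like this is not stated.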

The HOG descriptor divides the image sample into several small connected regions (cells), collects the histogram of gradient or edge directions of all pixels in each cell, and finally combines these histograms into a multi-dimensional feature descriptor [29]. Since this method operates on local grid cells of the image, it maintains good invariance to geometric and optical deformations of the image, which only appear over larger spatial regions. Therefore, this paper uses HOG features to describe the local features of defects.
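The per-cell step of HOG can be sketched as follows. This is a simplified illustration under stated assumptions: centered differences for the gradient, 9 unsigned orientation bins over 0° to 180°, hard assignment to the nearest lower bin (no interpolation), and no block normalization. The function name and all parameter values are hypothetical, since the paper does not list its HOG settings.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Only interior pixels of the cell are used, so that the centered
    differences never read outside the cell.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / bin_width), bins - 1)
            hist[index] += magnitude
    return hist
```

A full descriptor would concatenate such histograms over all cells, usually after normalizing them within overlapping blocks, which is why the resulting feature vector becomes high-dimensional.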

3.2 Learned feature descriptors

Transfer learning [22] is the ability to transfer learned information from the source domain to the target domain. It provides a good detection scheme for small-scale datasets by training models on large-scale image datasets. In order to obtain strong features from surface defect images of steel plates, this paper combines transfer learning and uses a pre-trained CNN network as a feature extractor for defect images. The learned features can be directly input into a classifier like SVM to predict output class labels.

The VGG16 network is a typical CNN model. It has been reported that VGG16 has the highest recognition rate on NEU compared with other deep learning networks [30]. Therefore, VGG16 is selected as the feature extraction framework. For the image recognition of steel surface defects, a pre-trained VGG16 network based on NEU is proposed. Figure 2 shows the fine-tuning structure of the VGG16 network for the feature expression of steel surface defect images. The first and second convolutional layers each consist of 64 kernel filters of size 3 × 3. The dimensions of the input image change to 200 × 200 × 64 as it passes through the first and second convolutional layers. The output is then passed to a max pooling layer with a stride of 2. The third and fourth convolutional layers each consist of 128 kernel filters of size 3 × 3. After these two layers, a max pooling layer with a stride of 2 is applied, and the output is reduced to 50 × 50 × 128. The fifth, sixth and seventh layers are convolutional layers with a kernel size of 3 × 3 and 256 feature maps each, followed by a max pooling layer with a stride of 2. The 8th to 13th layers form two groups of convolutional layers with a kernel size of 3 × 3 and 512 kernel filters each, and each group is followed by a max pooling layer with a stride of 2. Finally, the fully connected layers are replaced by an SVM, a classifier suited to small data sets, for classification.
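The feature map sizes quoted above can be checked with a short shape trace. The sketch assumes that every 3 × 3 convolution preserves the spatial size (so-called "same" padding) and that every pooling layer halves it with floor rounding; the function name is hypothetical, and the trace says nothing about how the authors flatten or pool the final output before the SVM.

```python
# (number of convolutional layers, number of filters) for the five VGG16 blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_feature_shapes(height=200, width=200):
    """Return (stage, height, width, channels) after each conv group and pool."""
    shapes = []
    for _, channels in VGG16_BLOCKS:
        # 3 x 3 convolutions with 'same' padding keep height and width.
        shapes.append(("conv", height, width, channels))
        # 2 x 2 max pooling with stride 2 halves height and width (floor).
        height, width = height // 2, width // 2
        shapes.append(("pool", height, width, channels))
    return shapes


if __name__ == "__main__":
    for stage, h, w, ch in vgg16_feature_shapes():
        print(f"{stage}: {h} x {w} x {ch}")
```

Under these assumptions a 200 × 200 input gives 200 × 200 × 64 after the first block's convolutions and 50 × 50 × 128 after the second pooling layer, matching the text, and ends at 6 × 6 × 512 after the fifth pooling layer.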

Fig. 2

Transfer learning by VGG16.

4 Adaptive weighting averaging fusion decision

After training four classifiers with different feature sets, the predicted categories and the posterior probability vectors of the different classifiers for sample x_i can be obtained by using the posterior probability prediction function. The posterior probability vectors of the different classifiers form the matrix shown in (1). $P = \begin{bmatrix} P_{11}(x_{i}) & P_{12}(x_{i}) & \dots & P_{1m}(x_{i}) \\ P_{21}(x_{i}) & P_{22}(x_{i}) & \dots & P_{2m}(x_{i}) \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1}(x_{i}) & P_{n2}(x_{i}) & \dots & P_{nm}(x_{i}) \end{bmatrix}_{n \times m}$ (1) where n represents the number of classifiers and m represents the number of steel plate surface defect categories.

For sample x_i, the posterior probabilities over all defect categories sum to 1. The more evenly the posterior probability is spread over the defect categories, the less reliable the judgment. Conversely, if the posterior probability is concentrated on a certain category, the reliability of the determined defect category is higher. Therefore, the standard deviation shown in (2) is used to measure the classification uncertainty of sample x_i for the different classifiers. $S_{i}(x_{i}) = \sqrt{\frac{\sum_{j=1}^{m} (P_{ij} - \bar{P})^{2}}{m - 1}}, \quad i = 1, 2, \dots, n$ (2) where P_ij represents the posterior probability that the i^th classifier assigns sample x_i to the j^th category, and $\bar{P}$ represents the average value of the posterior probabilities of the i^th classifier.

Clearly, a smaller standard deviation means that the posterior probabilities of the classifier are more evenly spread. This indicates that the classifier has worse classification ability for sample x_i, so its adaptive weight should be smaller. Therefore, we calculate the adaptive weights of the different classifiers as shown in (3). $w_{i} = \frac{\exp(S_{i}(x_{i}))}{\sum_{j=1}^{n} \exp(S_{j}(x_{i}))}$ (3)

After calculating the adaptive weight of each classifier for the sample x_i, we obtain a new probability matrix P_new by multiplying the i^th row of the posterior probability matrix P, that is, the output probabilities of the i^th classifier, by the corresponding weight w_i. Each column of P_new is then summed and divided by n to get the final posterior probability value of each category, and the category with the highest value is assigned to the sample x_i. The complete adaptive weighting averaging fusion decision method for defect classification is shown in Algorithm 1.

Algorithm 1 Adaptive weighting averaging fusion decision

Input: $\tilde{P} = [P_{1}, P_{2}, \dots, P_{t}]$ – the probability matrix composed of the posterior probabilities of the multiple classifiers for t samples

Output: $\hat{R} = [R_{1}, R_{2}, \dots, R_{t}]$ – the predicted category matrix for t samples

1. For i = 1 to the number of samples t do

2. For k = 1 to the number of classifiers n do

3. For j = 1 to the number of categories m do

4. Calculate $\bar{P}$ – the average probability of each classifier

5. Calculate S_k – the standard deviation of each classifier probability vector

6. End for

7. Calculate W_k – the adaptive weight of different classifiers

8. Get P_new – the posterior probability of each classifier multiplied by the corresponding adaptive weight W_k

9. Get P_result – the final posterior probability value of each category for the sample x_i

10. End for

11. Get R_i – the category of the sample x_i

12. End for

13. Get $\hat{R}$ – the final category of all steel plate surface defect samples
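Algorithm 1 can be written compactly in Python. The sketch below follows Eqs. (2) and (3) and the column averaging described in the text; the function names and the list-of-lists data layout are assumptions made for this example, and the classifiers that produce the posterior probabilities are outside its scope.

```python
import math


def adaptive_weighted_fusion(prob_rows):
    """Fuse the posterior vectors of n classifiers for one sample.

    prob_rows[i][j] is the posterior probability that classifier i
    assigns the sample to category j. Returns (label, fused, weights).
    """
    n = len(prob_rows)
    m = len(prob_rows[0])

    # Eq. (2): sample standard deviation of each classifier's posterior vector.
    stds = []
    for row in prob_rows:
        mean = sum(row) / m
        stds.append(math.sqrt(sum((p - mean) ** 2 for p in row) / (m - 1)))

    # Eq. (3): softmax over the standard deviations gives the adaptive weights.
    exps = [math.exp(s) for s in stds]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weight each row, sum every column and divide by n.
    fused = [
        sum(weights[i] * prob_rows[i][j] for i in range(n)) / n
        for j in range(m)
    ]
    label = max(range(m), key=fused.__getitem__)
    return label, fused, weights


def fuse_all(samples):
    """Apply the fusion to t samples and return the predicted labels."""
    return [adaptive_weighted_fusion(rows)[0] for rows in samples]
```

Because the weights in Eq. (3) sum to 1, dividing by n rescales all categories by the same factor and does not change which category has the highest fused value.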

5 Experiments

5.1 Experimental configuration

In this study, the NEU steel surface defect database created by Song et al. of Northeastern University is used as the experimental data set [31]. As shown in Fig. 3, there are six types of defects: crazing, inclusion, patches, pitted surface, rolled-in scale and scratches. There are 300 samples for each defect category, giving a total of 1800 samples. The image resolution is 200 × 200 pixels, with gray values from 0 to 255. For our experiments, the NEU database is divided into training and testing sets with a ratio of 7:3. The training and testing samples are summarized in Table 1.

Fig. 3

Samples of NEU steel plate defect images. (a) Crazing (b) Inclusion (c) Patches (d) Pitted surface (e) Rolled-in scale (f) Scratches.

Table 1

Summary of the training and testing samples

Index	Types	Total samples	Training set	Testing set
1	Crazing (Cr)	300	210	90
2	Inclusion (In)	300	210	90
3	Patches (Pa)	300	210	90
4	Pitted Surface (PS)	300	210	90
5	Rolled-in Scale (RS)	300	210	90
6	Scratches (Sc)	300	210	90
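The per-class 210/90 split in Table 1 corresponds to a stratified 7:3 partition. A minimal sketch of such a split is given below; the random shuffling, the seed and the function name are assumptions, since the paper does not say how the samples of each class were assigned to the two sets.

```python
import random


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Split sample indices so that each class keeps the same train ratio."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test
```

With 300 samples in each of the six classes this gives 1260 training and 540 testing samples in total.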

The proposed system is implemented on a PC with a quad-core Intel Core i3 processor and no GPU. Its clock speed is 3.60 GHz, its RAM capacity is 8 GB, and the operating system is Windows 10. Python is the development environment and the Keras library is used as the learning framework. To compare the results before and after feature fusion fairly, exactly the same hyperparameters are used to train the network: Adam as the optimizer, ReLU as the activation function, an initial learning rate of 0.001, a batch size of 32, and 100 epochs. To verify the effectiveness and superiority of the proposed system, a comprehensive experimental analysis and comparison are conducted on the NEU data set. The experimental details, evaluation indicators, fusion results of different features, and comparison with other advanced methods are described in the following subsections.

5.2 Metrics

In order to quantitatively evaluate and analyze the advantages of the proposed system, we concentrate on those indicators used for classification problem. Therefore, confusion matrix and six performance parameters including Accuracy, Recall, Precision, Macro-recall, Macro-precision, and Macro-F1, were used as indexes to evaluate the classification results.

The confusion matrix has two dimensions: the actual class and the predicted class. The actual class is given by the true steel plate defect label, and the predicted class is produced by the algorithm. From the confusion matrix, four quantities are obtained for each category: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). Based on these quantities, Accuracy, Recall and Precision are calculated as shown in (4)–(6), and Macro-recall, Macro-precision and Macro-F1 as shown in (7)–(9), where N is the number of categories, and P and R are abbreviations of Precision and Recall respectively. $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (4) $Recall = \frac{TP}{TP + FN}$ (5) $Precision = \frac{TP}{TP + FP}$ (6) $Macro\text{-}Recall = \frac{1}{N} \sum_{k=0}^{N-1} \frac{TP_{k}}{TP_{k} + FN_{k}}$ (7) $Macro\text{-}Precision = \frac{1}{N} \sum_{k=0}^{N-1} \frac{TP_{k}}{TP_{k} + FP_{k}}$ (8) $Macro\text{-}F1 = \frac{1}{N} \sum_{k=0}^{N-1} \frac{2 P_{k} R_{k}}{P_{k} + R_{k}}$ (9)
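Equations (5) to (9) can be evaluated directly from a multi-class confusion matrix, as the following sketch shows. The function name and the convention that rows are actual classes and columns are predicted classes are assumptions; the accuracy is computed in its overall multi-class form (correct predictions divided by all predictions), which is an interpretation of Eq. (4) for more than two classes.

```python
def macro_metrics(cm):
    """Per-class and macro metrics from a confusion matrix.

    cm[a][p] counts samples of actual class a predicted as class p.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    recalls, precisions, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as others
        fp = sum(cm[r][k] for r in range(n)) - tp  # others predicted as class k
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "recall": recalls,
        "precision": precisions,
        "macro_recall": sum(recalls) / n,
        "macro_precision": sum(precisions) / n,
        "macro_f1": sum(f1s) / n,
    }
```

Note that Macro-F1 as defined in Eq. (9) averages the per-class F1 values; it is generally not equal to the F1 computed from Macro-precision and Macro-recall.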

5.3 Principal Component Analysis (PCA)

In single feature set prediction, the HOG and SOBEL feature sets have 5000 and 40000 dimensions respectively. Such high dimensionality causes the curse of dimensionality, which affects both computational efficiency and classification performance. Therefore, this experiment uses the widely used PCA dimensionality reduction method to remove the redundancy in the data. The basic principle of PCA is to project data from a high-dimensional space into a low-dimensional space through a linear mapping that maximizes the variance of the data in the low-dimensional space. This effectively reduces the dimension while preserving the relationships between the original data points. Based on this principle, this study uses PCA to reduce the dimensions of the HOG and SOBEL feature sets separately, and the accuracy at each reduced dimension is shown in Fig. 4.

Fig. 4

PCA dimension reduction.

As shown in Fig. 4 (a), after PCA dimensionality reduction of the SOBEL feature set, the accuracy is highest when the dimension is reduced to 22, 3.99% higher than without dimensionality reduction. As shown in Fig. 4 (b), after PCA dimensionality reduction of the HOG feature set, the accuracy is highest when the dimension is reduced to 11, 13.11% higher than without dimensionality reduction. The two line charts show that, as the dimension is reduced, the accuracy goes through two stages: a slow rise followed by a rapid decline. The rise is due to the removal of redundant information, and the decline is due to the loss of useful information. Based on these results, the SOBEL and HOG feature sets are reduced to 22 and 11 dimensions respectively.
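The projection step of PCA can be sketched with NumPy as follows. This is an illustrative implementation based on the singular value decomposition of the centered data, not the authors' code; the function name is hypothetical, and in the setting described here `n_components` would be 22 for the SOBEL features and 11 for the HOG features.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the projected data, the component directions, the feature
    mean, and the fraction of variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return centered @ components.T, components, mean, explained_ratio
```

To avoid leaking test information, the mean and the components would normally be estimated on the training set only and then applied to the test set as `(X_test - mean) @ components.T`.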

5.4 Performance of proposed system

In order to compare and analyze the fusion classification results of the proposed method and other fusion methods for artificial features and features obtained by training convolutional neural networks, support vector machine classifier based on radial basis function kernel is used to uniformly predict the output class labels in this study. These SVM classifiers all use 5-fold cross validation to ensure the reliability of the experimental results.

The comparison experiment includes single feature set prediction, feature-level fusion prediction, and decision-level fusion prediction with two groups, simple probability matrix averaging and adaptive weighting probability matrix averaging; these are labeled Case 1 to Case 4 respectively. The single feature sets are the GLCM texture feature set, the HOG feature set, the SOBEL edge feature set and the VGG16 learned feature set. Because the HOG and SOBEL feature sets have 5000 and 40000 dimensions respectively, they are reduced with PCA to 11 and 22 dimensions before single feature set prediction. Feature-level fusion prediction includes the serial fusion of handcrafted features (GLCM+HOG+SOBEL) and the serial fusion of handcrafted features and VGG16 learned features (GLCM+HOG+SOBEL+VGG16). The decision-level fusion methods are simple probability matrix averaging and the proposed adaptive weighting averaging decision method. The simple probability matrix averaging cases are the handcrafted feature set group (GLCM * HOG * SOBEL) and the handcrafted plus VGG16 learned feature set group (GLCM * HOG * SOBEL * VGG16). The main idea of simple probability matrix averaging is to add up the posterior probabilities that the classifiers output for sample x_i and divide the sum by the number of classifiers, which gives the final probability matrix. The adaptive weighted averaging cases are the handcrafted feature set group (GLCM&HOG&SOBEL) and the handcrafted plus VGG16 learned feature set group (GLCM&HOG&SOBEL&VGG16). The comparison of defect recognition results between the single-feature mode and the multi-feature mode is shown in Tables 2, 3 and Fig. 5.

Fig. 5

Confusion matrices of the different experimental groups with the preferred classifier. (a) GLCM features. (b) SOBEL features. (c) HOG features. (d) VGG16 transfer learning features. (e) Adaptive weighting decision fusion.

Table 2

The experimental result - Accuracy, Macro-recall, Macro-precision and Macro-F1

Case	System	Accuracy	Macro-recall	Macro-precision	Macro-F1
1	SOBEL	73.5%	72.7%	76.8%	71.2%
	GLCM	80.6%	78.6%	83.3%	79.5%
	HOG	86.7%	83.2%	86.7%	86.4%
	VGG16	95.6%	95.4%	95.8%	95.2%
2	SOBEL+HOG+GLCM	73.5%	70.2%	76.8%	71.9%
	SOBEL+HOG+GLCM+VGG16	80.4%	79.8%	80.2%	78.5%
3	SOBEL*HOG*GLCM	93.1%	85.7%	90.5%	88.3%
	SOBEL*HOG*GLCM*VGG16	97.8%	95.1%	96.3%	95.8%
4	SOBEL&HOG&GLCM	95.3%	90.4%	91.6%	90.0%
	SOBEL&HOG&GLCM&VGG16(ours)	99.0%	98.7%	97.2%	97.5%

The experimental results in Table 2 show that, among the handcrafted feature sets, the edge features extracted by the Sobel operator have the lowest recognition accuracy at 73.5%, the global texture features extracted by GLCM reach 80.6%, and the local texture features extracted by HOG have the highest recognition accuracy at 86.7%. The local texture features therefore contribute the most to the identification of steel plate surface defects. Compared with handcrafted features, the VGG16 convolutional neural network with transfer learning has better recognition performance on steel plate images, up to 95.6%, which indicates that deep learning has stronger feature expression ability and can obtain more effective image features.

To exploit both handcrafted and learned features, two fusion methods are compared: feature-level fusion and decision-level fusion. As Table 2 shows, simple feature-level fusion does not improve the classification accuracy. The feature sets differ greatly in dimension, so the variables of the high-dimensional sets are far more likely to be selected; the fusion becomes extremely unbalanced and the low-dimensional feature sets lose their influence. Directly integrating these feature sets therefore cannot give satisfactory classification performance.

With decision-level fusion, simple averaging of the probability matrices of the handcrafted feature sets raises the classification accuracy by at least 6.4 percentage points over any single handcrafted feature set, and adaptive weighted averaging raises it by at least 8.6 percentage points. After the VGG16 deep learning feature set is added to the decision-level fusion, simple averaging and adaptive weighted averaging reach 97.8% and 99.0%, respectively, which is 4.7 and 3.7 percentage points higher than decision-level fusion of the handcrafted feature sets alone. This shows that, when handcrafted and deep learning features are used together, the handcrafted features supplement the learned features. According to Table 2, the proposed adaptive weighting multimodal fusion classification system achieves the highest classification accuracy, 99.0%, which indicates that the system has practical engineering value for strip defect classification. The system also achieves the best Macro Recall, Macro Precision and Macro-F1, which suggests that it retains an advantage when processing unbalanced hot rolled strip defect images.
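
The difference between simple averaging and weighted averaging of probability matrices can be illustrated with a minimal sketch. It is not the implementation used in this study: the weighting rule shown here (weights proportional to each classifier's accuracy on held-out data) is an assumption made for illustration, the function names are hypothetical, and the probability values are toy numbers. Only the four accuracy figures are taken from the single-feature results reported above.

```python
import numpy as np


def average_fusion(prob_mats):
    """Unweighted decision-level fusion: mean of the class-probability matrices."""
    return np.mean(np.stack(prob_mats, axis=0), axis=0)


def weighted_fusion(prob_mats, accuracies):
    """Weighted decision-level fusion.

    ASSUMPTION: each classifier's weight is its accuracy divided by the sum of
    all accuracies. The weighting rule of the proposed system may differ.
    """
    weights = np.asarray(accuracies, dtype=float)
    weights = weights / weights.sum()              # normalise so the weights sum to 1
    stacked = np.stack(prob_mats, axis=0)          # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)


# Toy probability matrices (2 samples, 2 classes) for four hypothetical classifiers.
p_sobel = np.array([[0.6, 0.4], [0.7, 0.3]])
p_glcm = np.array([[0.5, 0.5], [0.6, 0.4]])
p_hog = np.array([[0.4, 0.6], [0.2, 0.8]])
p_vgg16 = np.array([[0.1, 0.9], [0.3, 0.7]])
probs = [p_sobel, p_glcm, p_hog, p_vgg16]

# Single-feature accuracies reported in the text, used here only as example weights.
accuracies = [0.735, 0.806, 0.867, 0.956]

avg_fused = average_fusion(probs)
wtd_fused = weighted_fusion(probs, accuracies)
avg_pred = avg_fused.argmax(axis=1)
wtd_pred = wtd_fused.argmax(axis=1)
```

Because the weights are normalised, each fused row is still a probability distribution, and the predicted class is simply the column with the largest fused probability. More accurate classifiers pull the fused probabilities toward their own output, which is the intuition behind preferring a weighted average over a plain one.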

The recall and precision of the proposed system are shown in Table 3. Table 3 shows that our system classifies every category in the dataset well. Even for pitted surface and scratches, which are difficult to identify, the recall reaches 100% and the precision exceeds 96%. One possible reason is that the characteristics of these two defects are more distinct in the fused representation and differ clearly from those of the other defects. To show the classification accuracy of each defect category more intuitively and to compare the proposed system with the single-feature systems, the results are displayed as confusion matrices in Fig. 5.

Table 3

The experimental result - Recall and Precision (percentage, %)

Case	System	Cr		In		Pa		PS		RS		Sc
		R	P	R	P	R	P	R	P	R	P	R	P
1	SOBEL	98.9	67.9	93.3	65.1	63.3	100	46.7	60	96.7	79.1	42.2	88.4
	GLCM	94.4	71.4	92.2	88.3	82.2	87.1	41.1	100	77.8	65.4	95.6	87.8
	HOG	96.7	79.1	82.2	88.1	68.9	79.5	74.4	84.8	100	100	97.8	88.9
	VGG16	100	100	85.6	100	100	98.9	97.8	84.6	100	100	99.3	91.8
2	SOBEL+HOG+GLCM	98.9	67.9	93.3	65.1	63.3	100	46.7	60.0	96.7	79.1	42.2	88.4
	SOBEL+HOG+GLCM+VGG16	84.4	77.3	94.4	96.4	73.3	100	46.6	67.5	96.7	83.5	86.7	89.5
3	SOBELHOGGLCM	97.8	83.8	93.3	91.3	90.0	97.6	77.8	97.2	100	96.7	100	94.7
	SOBELHOGGLCM*VGG16	100	98.9	94.4	100	92.2	98.9	100	85.4	92.2	100	100	94.7
4	SOBEL&HOG&GLCM	98.6	97.6	96.0	95.4	92.2	98.3	84.8	89.4	100	98.8	100	96.4
	SOBEL&HOG&GLCM&VGG16(ours)	100	96.7	97.3	95.5	94.2	100	100	96.5	100	100	100	96.8
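
The per-class recall and precision in Table 3, and macro-averaged scores of the kind discussed above, can all be derived from a confusion matrix. The following is a minimal sketch on a toy three-class matrix, not data from this study. It assumes the common convention that Macro-F1 is the unweighted mean of the per-class F1 scores; the function names are hypothetical.

```python
import numpy as np


def per_class_recall_precision(cm):
    """cm[i, j] counts samples of true class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm).copy()
    support = cm.sum(axis=1)    # number of true samples per class
    predicted = cm.sum(axis=0)  # number of predictions per class
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    return recall, precision


def macro_scores(cm):
    """Accuracy plus macro-averaged recall, precision and F1.

    ASSUMPTION: Macro-F1 is the mean of per-class F1 scores.
    """
    cm = np.asarray(cm, dtype=float)
    recall, precision = per_class_recall_precision(cm)
    denom = recall + precision
    f1 = np.divide(2 * recall * precision, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    return {
        "accuracy": float(np.trace(cm) / cm.sum()),
        "macro_recall": float(recall.mean()),
        "macro_precision": float(precision.mean()),
        "macro_f1": float(f1.mean()),
    }


# Toy confusion matrix: 3 classes, 10 samples each (illustrative values only).
toy_cm = [[9, 1, 0],
          [0, 10, 0],
          [0, 2, 8]]

recall, precision = per_class_recall_precision(toy_cm)
scores = macro_scores(toy_cm)
```

Macro averaging gives every class the same weight regardless of how many samples it has, which is why these scores are informative when the classes are unbalanced.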

5.5 Time analysis

This section reports and discusses the running time of the 10 systems from the previous section, broken down into data loading time, model loading time, feature extraction time and recognition time. All tests were run on a PC with an Intel i3-9100F CPU and 8 GB of memory. The running times of each system are shown in Fig. 6.

Fig. 6

The average running times of different models.

Fig. 6 shows that the proposed adaptive multimodal classification system spends a long time on feature extraction and has a high total running time. The HOG single-feature system needs 3.68 s for feature extraction and 1.23 s for recognition, whereas our model needs 32.2 s and 1.81 s, respectively, so feature extraction takes nearly nine times longer. The proposed system therefore achieves better classification accuracy at the price of a higher computational cost. In future research, we will try to reduce the computational complexity of the model in order to lower its deployment cost.
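
A per-stage timing breakdown of this kind can be collected with a simple wall-clock harness. The sketch below is one possible way to do it and is not the measurement code used in this study. The stage functions are hypothetical placeholders that stand in for data loading, feature extraction and recognition.

```python
import time


def time_stages(stages):
    """Run (name, function) stages in order and record wall-clock seconds.

    Each function receives the output of the previous stage
    (None for the first stage).
    """
    timings = {}
    result = None
    for name, fn in stages:
        start = time.perf_counter()
        result = fn(result)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return result, timings


# Hypothetical placeholder stages; a real pipeline would load images,
# extract features and run the classifier here.
stages = [
    ("data_loading", lambda _: list(range(1000))),
    ("feature_extraction", lambda data: [x * x for x in data]),
    ("recognition", lambda feats: sum(feats) % 6),
]

label, timings = time_stages(stages)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Averaging the timings over several runs, as the caption of Fig. 6 suggests, reduces the influence of caching and background load on any single measurement.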

5.6 Comparisons with the other methods

To demonstrate the advantages of the proposed system, its classification performance is compared with several recent classification methods for steel plate surface defects: the OMFF-RF method, which fuses the HOG and GLCM feature sets at the feature level [11]; the AlexNet multi-scale fusion method [14]; the feature-level fusion of texture, edge and HOG features [12]; and multi-level fusion with ResNet50 as the feature extractor [32]. The results are shown in Table 4. All algorithms use the same experimental settings. Table 4 compares the robustness and effectiveness of the methods in terms of accuracy, recall (R) and precision (P).

Table 4
Performance of the proposed system (percentage, %)

Case	Methods	Accuracy	Cr		In		Pa		PS		RS		Sc
			R	P	R	P	R	P	R	P	R	P	R	P
1	Wang et al. [11]	93.5	100	93.4	80.0	88.5	100	94.5	87.8	91.4	91.1	95.4	100	96.9
2	Yang et al. [14]	97.1	100	95.6	92.0	94.4	100	99.6	94.8	95.3	96.6	97.4	100	97.8
3	Xu et al. [12]	95.4	100	93.8	84.4	90.4	100	96.5	96.4	94.2	100	93.5	99.0	100
4	Liu et al. [32]	97.2	100	96.4	93.5	96.5	100	98.7	100	95.7	95.7	97.6	99.2	99.6
5	Proposed system	99.0	100	98.7	97.3	99.5	94.2	100	100	96.5	96.8	100	100	96.8

The experimental results show that the proposed multimodal fusion system, based on handcrafted features and VGG16 transfer learning features, classifies every category in the data set with high accuracy, with a final classification accuracy of 99.0%. In cases 1 and 3, where only handcrafted features are fused, the classification accuracy for inclusion and pitted surface is not very high. This may be because handcrafted features cannot effectively capture the abstract characteristics of these two defect types. In cases 2 and 4, a deep neural network with multi-scale feature fusion extracts the abstract features of the various defects more effectively and improves the recognition accuracy. In this study, the multimodal fusion of handcrafted features and CNN-based learned features achieves the best results. This shows that, when the two are used together, handcrafted features complement the learned features and further improve recognition accuracy. The comparison therefore reflects the strong performance of the system and indicates its potential for recognizing steel plate surface defect images.

6 Conclusions

In this work, an adaptive weighting multimodal fusion system is proposed for recognizing defects in steel plate surface images. The system combines simple handcrafted features with CNN-based learned features and develops an effective adaptive weighting decision-level fusion method. At the first level of information fusion, three different handcrafted features are each used to classify the steel plate surface defect images separately. At the second level, a deep network with transfer learning serves as the feature extractor, and the learned CNN features are fed to an SVM for classification. Finally, an adaptive weighted averaging decision fusion method aggregates the classification results of the four feature sets. The aggregated results show that, when the features learned by the CNN are combined with domain-specific handcrafted features, the handcrafted features complement the CNN features. Compared with existing classification methods, the proposed method improves classification accuracy on the NEU database, reaching 99.0%.

However, much work remains for future study. For example, the types and number of steel surface defect samples are limited, and the system’s ability to identify other types of defects has not been considered. In addition, future work will aim to reduce the computational complexity of the system and to further verify the effectiveness of the proposed algorithm on more data sets.

References

1. World Steel Association, July 2022 crude steel production, Brussels, Belgium (2022.7).

2. , Xu, Zhou, and Zhou D.D., Surface defect classification of steels with a new semi-supervised learning method, Optics and Lasers in Engineering 117 (2019), 40–48.

3. Zou, Zhang, Li Q.Q., Qi X.B., Wang, and Wang, DeepCrack: Learning Hierarchical Convolutional Features for Crack Detection, IEEE Transactions on Image Processing 28(3) (2019), 1498–1512.

4. , Song K.C., Meng Q.G., and Yan Y.H., An end-to-end steel surface defect detection approach via fusing multiple hierarchical features, IEEE Transactions on Instrumentation and Measurement 69(4) (2020), 1493–1504.

5. Feng X.L., Gao X.W., and Luo, X-SDD: A new benchmark for hot rolled steel strip surface defects detection, Symmetry-Basel 13(4) (2021).

6. Jain, Seth, Paruthi, Soni, and Kumar, Synthetic data augmentation for surface defect detection and classification using deep learning, Journal of Intelligent Manufacturing 33(4) (2022), 1007–1020.

7. Chen Y.J., Chen, Liu X.M., Ding, and Zhang, Real-time steel inspection system based on support vector machine and multiple kernel learning, 6th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Shanghai, P.R. China, 2011, pp. 185–190.

8. Tang, Kong J.Y., Wang X.D., and Chen, Surface inspection system of steel strip based on machine vision, 1st International Workshop on Database Technology and Applications, Wuhan, P.R. China, 2009, pp. 359–362.

9. Yazdchi, Yazdi, and Mahyari A.G., Steel surface defect detection using Texture segmentation based on multifractal dimension, International Conference on Digital Image Processing, Bangkok, Thailand, 2009, pp. 346–+.

10. Jie, Yang, and Ge, The cold rolling strip surface defect on-line inspection system based on machine vision, Second Pacific-Asia Conference on Circuits, Communications and System, 2010.

11. Wang Y.L., Xia H.B., Yuan X.F., Li, and Sun, Distributed defect recognition on steel surfaces using an improved random forest algorithm with optimal multi-feature-set fusion, Multimedia Tools and Applications 77(13) (2018), 16741–16770.

12. C.L., Li L.S., Li J.W., and Wen C.A.B., Surface defects detection and identification of lithium battery pole piece based on multi-feature fusion and PSO-SVM, IEEE Access 9 (2021), 8523–85239.

13. Wang X.H., Gao L.L., Song J.K., and Shen H.T., Beyond frame-level CNN: saliency-aware 3-D CNN with LSTM for video action recognition, IEEE Signal Processing Letters 24(4) (2017), 510–514.

14. Yang, Fan J.F., Huo B.Y., and Liu Y.H., Inspection of welding defect based on multi-feature fusion and a convolutional network, Journal of Nondestructive Evaluation 40(4) (2021).

15. Tun N.L., Gavrilov, Tun N.M., Trieu, and Aung, Remote sensing data classification using a hybrid pre-trained VGG16 CNN-SVM Classifier, IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), Saint Petersburg Electrotechn Univ, Russia, 2021, pp. 2171–2175.

16. G.H., Yu L.S., Yuan H.T., Xiao W.B., and He Y.S., A vision-based method for lap weld defects monitoring of galvanized steel sheets using convolutional neural network, Journal of Manufacturing Processes 64 (2021), 130–139.

17. Kosala, Harjoko, and Hartati, License Plate Detection Based on Convolutional Neural Network - Support Vector Machine (CNN-SVM), International Conference on Video and Image Processing (ICVIP), Singapore, 2017, pp. 1–5.

18. Gayathri, Gopi V.P., and Palanisamy, A lightweight CNN for diabetic retinopathy classification from fundus images, Biomedical Signal Processing and Control 62 (2020).

19. Wang Z.M., He-Na L.I., Zhang, and Xia H.J.C.E., Fusing convolutional neural network and support vector machine for expression recognition, Computer Engineering and Design (2019).

20. Liu S.R., Tang X.Y., and Wang, Facial expression recognition based on sobel operator and improved CNN-SVM, 3rd IEEE International Conference on Information Communication and Signal Processing (ICICSP), Electr Network, 2020, pp. 236–240.

21. Pan X.Z., Fusing HOG and convolutional neural network spatial-temporal features for video-based facial expression recognition, IET Image Processing 14(1) (2020), 176–182.

22. Ranipa, Zhu W.P., and Swamy M.N.S., Multimodal CNN Fusion Architecture With Multi-features for Heart Sound Classification, IEEE International Symposium on Circuits and Systems (IEEE ISCAS), Daegu, South Korea, 2021.

23. Abdoli, Cardinal, and Koerich A.L., End-to-end environmental sound classification using a 1D convolutional neural network, Expert Systems with Applications 136 (2019), 252–263.

24. Krizhevsky, Sutskever, and GE, Imagenet classification with deep convolutional neural networks, Adv Neural Inf Process Syst 25 (2012), 1097–1105.

25. Golrizkhatami and Acan, ECG classification using three-level fusion of different feature descriptors, Expert Systems with Applications 114 (2018), 54–64.

26. Pour F.T., Saberi, Rezaei, and Ershad S.F., Texture classification approach based on combination of random threshold vector technique and co-occurrence matrixes, International Conference on Computer Science and Network Technology (ICCSNT), Harbin Normal Univ, Harbin, P.R. China, 2011, pp. 2303–2306.

27. Chaudhari and Kulkarni, Cerebral edema segmentation using textural feature, Biocybernetics and Biomedical Engineering 39(3) (2019), 599–612.

28. Mathur, Mathur, and Mathur, A novel approach to improve sobel edge detector, 6th International Conference on Advances in Computing and Communications (ICACC), Rajagiri Sch Engn & Technol, Kochi, India, 2016, pp. 431–438.

29. Zhang Y.J., Zou Y.J., Fan H.S., Liu W.J., and Cui Z.W., Pedestrian detection based on I-HOG feature, International Symposium on Artificial Intelligence and Robotics, Fukuoka, Japan, 2021.

30. Tian, Zhang Q.C., Li, and Wang Z.D., Feature fusion-based preprocessing for steel plate surface defect recognition, Mathematical Biosciences and Engineering 17(5) (2020), 5672–5685.

31. Song K.C. and Yan Y.H., A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects, Applied Surface Science 285 (2013), 858–864.

32. Liu and Gao, Surface defect detection method of hot rolling strip based on improved SSD model, 26th International Conference on Database Systems for Advanced Applications (DASFAA), Electr Network, 2021, pp. 209–222.