Abstract
Desulphurization is essential in the steelmaking process for high-quality steel production, and sulphide capacity has proven to be an effective index for evaluating the desulphurization ability of molten slag or flux. Several analytical or empirical models have been proposed to calculate the sulphide capacity. However, these models usually generalize poorly when new variables or data are introduced, which limits their practical application. In this work, experimental data were collected from the literature and a regularized extreme learning machine (RELM) model was established to predict the sulphide capacity of the CaO–SiO2–MgO–Al2O3 slag system. The results demonstrated that the proposed model is robust for the prediction of sulphide capacity under different conditions. The coefficient of determination (R²), correlation coefficient (r), and root-mean-square error (RMSE) of the optimal model reached 0.9763, 0.9881, and 0.113, respectively, outperforming the reported models.
Introduction
The demand for sulphur control in molten steel is becoming increasingly stringent in the production of high-quality steel. A much more reliable slagging regime is also required in the desulphurization process because of deteriorating ore quality [1–4]. The sulphide capacity (Cs), proposed by Richardson et al. [5], is a well-recognized index for evaluating the desulphurization ability of slag in steelmaking, and quantifying Cs under various conditions, such as temperature and basicity, has attracted great attention.
There are currently three strategies to calculate or predict Cs in the literature. The first is to develop empirical models that correlate Cs with other parameters, including slag composition, optical basicity, and processing temperature [6]. Typical models were proposed by Sosinsky et al. [7], Young et al. [8] and Zhang et al. [9]. The results of these models agree well with experimental results, but only within a limited range of composition and temperature. Once these parameters fall outside that range, the calculated values deviate strongly from the measurements and the models fail in practical application. The second strategy, based on the principle of Short Range Order (SRO), has been incorporated into commercial software packages such as FactSage© and ThermoSlag to calculate Cs [10–12] over a large range of conditions. FactSage© is a multi-module software for thermodynamic simulation, which is well suited to calculating thermodynamic data and presenting them as diagrams, such as phase diagrams, phase equilibria, E-pH diagrams, and heat balances. However, the calculation process for Cs is complicated [1]. ThermoSlag is software developed on the basis of the KTH model for predicting the thermodynamic and thermophysical properties of slag. However, too many parameters need to be optimized in the KTH model [9]. The third strategy uses intelligent algorithms to establish the nonlinear correlation between Cs and other parameters, and it has gained growing interest in metallurgical research because of its simple operation and high precision. Recently, Derin et al. [13] predicted the sulphide capacity of binary and multi-component melts at various temperatures using a neural network approach, and the calculated results matched the experimental results well. Ma et al. [1] used an Artificial Neural Network (ANN) to predict the sulphide capacity and achieved promising performance compared with the known empirical models.
Another widely used algorithm is the extreme learning machine (ELM), which was proposed for training single hidden layer feedforward neural networks (SLFNs). The ELM has two main advantages [14,15]: (1) The connection weights between the input and hidden layers and the thresholds of the hidden layer are randomly generated, and the unique optimal solution is obtained quickly once the number of hidden-layer neurons is set. This avoids the repeated adjustment of connection weights and thresholds required in a conventional neural network. (2) The connection weight (β) between the hidden layer and the output layer is determined by solving a system of equations. This can give better generalization ability and higher calculation efficiency than iteratively trained networks such as the ANN. For instance, Chen et al. [16] applied the ELM model to predict the quality of continuous-casting billets and showed that its prediction accuracy was higher than that of the back propagation (BP) neural network model and the BP neural network model improved by a genetic algorithm. Guan et al. [17] combined ELM with an image processing algorithm for image classification, which proved superior to the conventional intelligent classification method in terms of label consistency, detail preservation, and computational speed. Zou et al. [18] applied the regularized extreme learning machine (RELM) model to predict the carbon segregation index of continuous-casting billets and showed that its prediction accuracy was higher than that of the multiple linear regression model and the ELM model. However, to the best of our knowledge, few efforts have been made to use the ELM algorithm for the prediction of the Cs value.
In this work, a modified ELM was developed to predict Cs in a widely used desulphurization slag system under different conditions. The correlation between the input and output variables was established from experimental data collected from previous studies. The RELM model was used to predict Cs in the CaO–SiO2–MgO–Al2O3 slag system, and its performance was evaluated with different statistical evaluation indexes. The aim is to build a more reliable and efficient mathematical model for calculating Cs so as to promote high-efficiency desulphurization in the steelmaking process.
Cs calculation using RELM model
Analysis of database and data
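The principal equilibrium reaction in the process of desulphurization between slag and gas is expressed as follows:
\[ \tfrac{1}{2}\,\mathrm{S_2(g)} + (\mathrm{O^{2-}}) = \tfrac{1}{2}\,\mathrm{O_2(g)} + (\mathrm{S^{2-}}) \tag{1} \]
The equilibrium constant (K) of Reaction (1) is given by:
\[ K = \frac{a_{\mathrm{S^{2-}}}}{a_{\mathrm{O^{2-}}}} \left( \frac{p_{\mathrm{O_2}}}{p_{\mathrm{S_2}}} \right)^{1/2} \tag{2} \]
The sulphide capacity (Cs) is a function of temperature and slag composition, and is defined by Richardson et al. [5] in Equation (3):
\[ C_{\mathrm{S}} = (\text{wt-\%S}) \left( \frac{p_{\mathrm{O_2}}}{p_{\mathrm{S_2}}} \right)^{1/2} = \frac{K\, a_{\mathrm{O^{2-}}}}{f_{\mathrm{S^{2-}}}} \tag{3} \]
where a denotes the activity of an ion in the slag, p the partial pressure of a gas species, and f the activity coefficient of the sulphide ion in the slag.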
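As a numerical illustration of this definition, Cs = (wt-%S)(pO2/pS2)^(1/2), the short Python sketch below evaluates Cs and Log Cs for a single gas–slag equilibrium. The function name and the input values are hypothetical and are not taken from the collected data set.

```python
import math


def sulphide_capacity(wt_pct_s: float, p_o2: float, p_s2: float) -> float:
    """Return Cs = (wt-%S) * sqrt(p_O2 / p_S2).

    Both partial pressures must be given in the same unit.
    """
    if wt_pct_s <= 0 or p_o2 <= 0 or p_s2 <= 0:
        raise ValueError("all inputs must be positive")
    return wt_pct_s * math.sqrt(p_o2 / p_s2)


# Hypothetical illustrative values (not from the paper's data set).
cs = sulphide_capacity(wt_pct_s=0.05, p_o2=1e-9, p_s2=1e-3)
log_cs = math.log10(cs)  # the quantity used as the model output
```

Working with Log Cs rather than Cs keeps the output on a scale where values spanning several orders of magnitude can be compared directly.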
Experimental data used for the RELM prediction of Cs.
A scatter diagram and the Pearson correlation coefficient were used to determine the relationship between these parameters. The scatter matrix of the data sets was plotted using the pandas and Matplotlib modules in Python, as shown in Figure 1. The histograms on the diagonal of Figure 1 show the distribution of each single variable, and the scatter plots in the upper and lower triangles show the relationship between pairs of variables. For instance, the left-most graph in the bottom row of Figure 1 shows the relationship between the temperature and the logarithm of Cs (Log Cs). The temperature ranged from 1313 to 1928 K. The contents of CaO, SiO2, MgO, and Al2O3 ranged over 6.77–68.02, 0–84.50, 0–23.10, and 7–61.72 wt-%, respectively. The sulphur content in the slag ranged from 2.94 × 10⁻⁴ to 2.70 wt-%.
Scatter plot matrix visualization of (wt-%CaO), (wt-%SiO2), (wt-%MgO), (wt-%Al2O3), (wt-%S)slag, temperature, and Log Cs.

Figure 2 shows the correlation between the input variables and the output variable, analysed using the NumPy and Matplotlib modules in Python. In Figure 2, the radial distance represents the value of the correlation coefficient between Log Cs and temperature, (wt-%CaO), (wt-%SiO2), (wt-%MgO), (wt-%Al2O3), and (wt-%S)slag.
Correlation analysis results between input variables and output variables.

In the bottom right of Figure 2, the Pearson correlation coefficient between Cs and (wt-%MgO) was 0.08, indicating that Cs is only weakly correlated with (wt-%MgO) in this CaO–SiO2–MgO–Al2O3 slag system. Subsequently, the significance of the correlation coefficient was tested using Student’s t-test, as shown in Eq. (5). A P-value below 0.05 means that the correlation between the variables is significant, a P-value below 0.01 means that it is very significant, and a P-value above 0.05 means that it is not significant [30].
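The statistic behind such a significance test can be sketched in a few lines of Python. The form t = r·sqrt(n − 2)/sqrt(1 − r²), with n − 2 degrees of freedom, is the standard test for a Pearson coefficient and is assumed here to correspond to Eq. (5). The function name is hypothetical, and using all 298 data groups as the sample size is also an assumption.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for a Pearson coefficient r estimated from n pairs.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees
    of freedom under the null hypothesis of zero correlation.
    """
    if n < 3:
        raise ValueError("need at least three samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# r = 0.08 is the MgO coefficient quoted in the text; n = 298 is assumed.
t_mgo = correlation_t_statistic(r=0.08, n=298)

# The two-sided P-value then follows from the t distribution with n - 2
# degrees of freedom, e.g. 2 * scipy.stats.t.sf(abs(t_mgo), 298 - 2)
# if SciPy is available (not needed to run this sketch).
```

Under these assumptions the statistic is about 1.38, which lies below the two-sided 5% critical value of roughly 1.97 for this many degrees of freedom.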
The calculation results of P-value between Log Cs and input parameters.
In a conventional neural network, the connection weight (β) between the hidden layer and the output layer is determined through multiple iterations, which makes the network converge slowly and fall easily into local minima. Therefore, Huang et al. [15,31] proposed a novel neural network, the ELM, which computes much faster than the traditional BP neural network.
Figure 3 shows the structure of an ELM network with three layers: the input layer, the hidden layer and the output layer. Normally, the ELM model is constructed on the least-squares loss function. Only the empirical risk is minimized, without consideration of the structural risk, which may lead to over-fitting. To establish a model that generalizes well, both the empirical risk and the structural risk should be considered. Therefore, to further improve the generalization ability of the conventional ELM, a regularization coefficient that adjusts the proportion of empirical risk and structural risk was introduced to establish the RELM model [32].
Structure diagram of ELM networks.
The output function of the ELM model can be expressed by Equation (6):
The objective function of the ELM network is shown in Equation (7):
Minimizing the objective function gives wi, βi, and bi. Introducing the regularization coefficient can significantly enhance the generalization ability of the ELM approach by adjusting the proportion of empirical risk and structural risk. The RELM model can, therefore, be summarized in the following four steps:
(1) The objective function of the RELM model is shown in Equations (8) and (9):
(2) The Lagrangian function is constructed as shown in Equation (10):
(3) The
(4) The prediction model of the RELM obtained when combining Equation (11) and Equation (6) can be expressed as Equation (12).
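A minimal NumPy sketch of these steps is given below. It assumes the common RELM formulation in which the objective is ½‖β‖² + (λ/2)‖Hβ − T‖², which leads to the closed-form output weights β = (HᵀH + I/λ)⁻¹HᵀT. The exact placement of λ in Equations (8)–(11) may differ from this convention. The class name `RELM`, the hidden-layer size, the uniform range of the random weights, and the synthetic demonstration data are all hypothetical.

```python
import numpy as np

# Activation functions named in the text (Sin, Sig, Hardlim).
ACTIVATIONS = {
    "sin": np.sin,
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "hardlim": lambda z: (z >= 0).astype(float),
}


class RELM:
    """Regularized ELM for regression with a single hidden layer."""

    def __init__(self, n_hidden=50, lam=0.8, activation="sin", seed=0):
        self.n_hidden = n_hidden
        self.lam = lam  # regularization coefficient (assumed convention)
        self.g = ACTIVATIONS[activation]
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H = g(XW + b).
        return self.g(X @ self.W + self.b)

    def fit(self, X, t):
        X = np.asarray(X, dtype=float)
        t = np.asarray(t, dtype=float)
        # Input weights and thresholds are drawn once and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Closed-form output weights: beta = (H^T H + I/lam)^-1 H^T t.
        A = H.T @ H + np.eye(self.n_hidden) / self.lam
        self.beta = np.linalg.solve(A, H.T @ t)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


# Synthetic demonstration data (not the sulphide capacity data set).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(200, 4))
t_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2] * X_demo[:, 3]

model = RELM(n_hidden=50, lam=0.8, activation="sin").fit(X_demo, t_demo)
pred_demo = model.predict(X_demo)
```

Because the input weights are fixed after random initialization, training reduces to a single linear solve, which is the source of the speed advantage over iteratively trained networks.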
In this study, the 298 groups of experimental data were divided into two sets: 198 groups were used to train the model and the remaining 100 groups were used to test it. The flowchart of the entire process is illustrated in Figure 4. The Cs values were predicted with different input variables and activation functions. As shown in Table 3, Models 1 to 3 use input variables that do not contain (wt-%MgO), whereas Models 4 to 6 use input variables that contain (wt-%MgO); within each group the models differ in activation function. The activation functions considered in the RELM model are Sinusoidal (Sin), Sigmoid (Sig), and Hardlim. The regularization coefficient (λ) was determined according to the hit rate, counting a prediction as a hit when the error between the calculated and experimental values of Log Cs is within ±0.3.
A flowchart of using RELM model to process the data.
Definition of RELM models 1–6 according to the input variables and activation functions.
Figure 5 shows the hit rate as a function of the regularization coefficient for the different activation functions, where a hit means that the error between the experimental Log Cs and the Log Cs predicted by the RELM model is within ±0.3. The hit rate of the models reaches its maximum when the regularization coefficient is 0.8.
Regularization coefficient vs. hit rate under conditions of Sinusoidal (Sin), Sigmoid (Sig), and Hardlim.
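The hit-rate criterion and the selection of the regularization coefficient can be sketched as follows. The function names are hypothetical, and the Log Cs values and the hit rates per coefficient are made-up illustrative numbers, not results from this study.

```python
def hit_rate(y_true, y_pred, tol=0.3):
    """Fraction of predictions whose absolute error is within tol."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tol for a, b in zip(y_true, y_pred))
    return hits / len(y_true)


def best_coefficient(hit_rate_by_lambda):
    """Return the coefficient with the highest hit rate.

    Ties are resolved in favour of the smallest coefficient.
    """
    return max(sorted(hit_rate_by_lambda), key=hit_rate_by_lambda.get)


# Made-up Log Cs values: errors are 0.2, 0.4, 0.05 and 0.25.
log_cs_exp = [-3.2, -2.8, -4.1, -3.5]
log_cs_pred = [-3.0, -2.4, -4.15, -3.75]
rate = hit_rate(log_cs_exp, log_cs_pred)  # three of four within +/-0.3

# Made-up hit rates for three candidate coefficients.
chosen = best_coefficient({0.2: 0.90, 0.8: 0.99, 1.6: 0.95})
```

In practice each candidate coefficient would be scored by fitting the model and computing the hit rate on held-out data, so that the selection reflects generalization rather than training fit.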
The performance of the RELM models was evaluated with different statistical evaluation indexes, including the coefficient of determination (R²), mean square error (MSE), and root-mean-square error (RMSE). The mathematical definitions of these indexes are given in Equations (13)–(15):
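For readers who wish to reproduce such indexes, a short Python sketch is given here. It assumes the usual definitions, R² = 1 − SS_res/SS_tot, MSE = SS_res/n and RMSE = sqrt(MSE), which may differ in detail from the forms used in Equations (13)–(15). The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE and RMSE under the usual textbook definitions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    mse = ss_res / n
    return {"R2": 1.0 - ss_res / ss_tot, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up values for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

RMSE is expressed in the same unit as the predicted quantity (here Log Cs), which makes it directly comparable with the ±0.3 tolerance used for the hit rate.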
The effect of MgO and activation function on Cs
Results of various models.
The prediction results in Table 4 also indicate that Model 1 gives the best prediction when (wt-%MgO) is not included and the activation function is ‘Sin’, and that Model 4 gives the best prediction when (wt-%MgO) is included and the activation function is ‘Sin’. The predicted and experimental values are compared in Figure 6, where the dashed lines indicate the ideal line of Cs (prediction value = experimental value). With the error between the experimental and predicted Log Cs within ±0.3, the hit rates of Models 1 to 3, which exclude (wt-%MgO), are 99%, 93%, and 88%, respectively, and those of Models 4 to 6, which include (wt-%MgO), are 99%, 93%, and 90%, respectively. In summary, Model 1 and Model 4 are closer to the ideal line than the other models, and the sinusoidal activation function performs best in the RELM models.
Comparison between experimental and calculated Cs values obtained by (a) Model 1, (b) Model 2, (c) Model 3, (d) Model 4, (e) Model 5, and (f) Model 6.
To verify the accuracy of the RELM models, their results were compared with those of the known empirical models and an intelligent model [1, 7–11]. The prediction results of the available models are shown in Table 5. Zhang et al.’s model [9] gives the best prediction among the known empirical models, with an R² of 0.9094 and an RMSE of 0.1935. For the intelligent model, the R² and RMSE of the ANN model [1] are 0.9378 and 0.1860, respectively. In the present study, the R² and RMSE of RELM Model 4 are 0.9763 and 0.1143, respectively. These results show that RELM Model 4 outperforms the known empirical models and the intelligent model in both R² and RMSE. In the next section, RELM Model 4 is compared in detail with the best empirical model and the intelligent model, as shown in Figure 7.
Comparison between calculated and experimental sulphide capacities of (a) RELM Model 4 and Zhang et al.’s Model; (b) RELM Model 4 and ANN Model.
Comparison of the previous models.
The sulphide capacities calculated by Zhang et al.’s model and RELM Model 4 are compared with the experimental values in Figure 7(a). It is evident that RELM Model 4 has higher prediction accuracy than Zhang et al.’s model when the sulphide capacity is in the high range. The reasons are as follows: (1) The RELM models have a large number of neurons and therefore a strong ability to process the input variables, so they give a more accurate output when the input variables are close to the training samples. (2) A traditional regression model is difficult to apply when the relationship between the input and output variables is not directly linear, whereas RELM models have a strong nonlinear approximation ability, which makes the prediction more accurate. (3) In a traditional linear regression model, the modelling process is limited by the original features of the data (input variables). In the RELM models, the hidden-layer features produced by the network are used as new features instead of the original ones, which acts like a data pretreatment step and thus gives better predictions than the traditional linear regression model.
In Figure 7(b), the hit rates of RELM Model 4 and the ANN model are 99% and 92%, respectively, when the error is within ±0.3. The results show that the RELM model generalizes better than the ANN model. Although the RELM models have high prediction accuracy in this study, some error remains. The contributing reasons could be as follows: (1) Ladle furnace (LF) refining is a complicated physical and chemical process, with a complex nonlinear relationship between the components, the temperature and the sulphide capacity, and this process cannot be described exactly by the RELM models. (2) The original experimental data inevitably contain errors of measurement and component analysis, which have a negative effect on the establishment of the RELM model and on the prediction results.
The Cs is a well-recognized index for evaluating the desulphurization ability of slag in the steelmaking process. Compared with the previous models, the current RELM models offer easy operation, fast calculation and high prediction accuracy. Thus, they can be used to predict Cs in real time for different compositions and conditions, which is significant for quantifying the desulphurization ability of slag and optimizing the slag system. Meanwhile, intelligent iron and steel manufacturing has been drawing increasing attention in recent years. The calculation of Cs can be used to build a mathematical model for desulphurization in steelmaking practice. Specifically, once the Cs calculation model is obtained, a desulphurization model can be established from the relationship between Cs, the sulphur partition ratio, the mechanism of desulphurization, and the conservation of matter. Such a desulphurization model could realize the calculation and automatic charging of slagging agents (e.g. lime) in the LF process. This will be detailed in our future work. In addition, the RELM model can also be applied to other multivariate, nonlinear and strongly coupled problems in steelmaking practice, such as predicting the molten steel temperature and the quality of continuous-casting billets [16,33].
In the present study, the RELM intelligent algorithm was applied to predict the sulphide capacity of the CaO–SiO2–MgO–Al2O3 slag system under various conditions, and the following conclusions can be drawn. The Pearson correlation coefficient between Cs and (wt-%MgO) indicated that Cs is weakly correlated with (wt-%MgO). Meanwhile, the computation speed and the RMSE of the RELM models changed little with or without (wt-%MgO). Thus, (wt-%MgO) had relatively little effect on the calculation of Cs by the RELM models. The sinusoidal activation function performed best in the RELM models. The sulphide capacity values of RELM Model 1 and RELM Model 4 were in good agreement with the experimental results: when the error between the predicted and experimental values was within ±0.3, the hit rates of both Model 1 and Model 4 were 99%. RELM Model 4 tends to be more effective when the sulphide capacity is in the high range. The hit rates of RELM Model 4 and the ANN model were 99% and 92%, respectively, when the error was within ±0.3, so the RELM model demonstrated better robustness and higher prediction accuracy. The RELM model is a feasible way to calculate the Cs of the CaO–SiO2–MgO–Al2O3 slag system, with the advantages of easy operation, fast calculation and high prediction accuracy. Meanwhile, based on the Cs values calculated by the RELM, the desulphurization ability of slag can be quantified and optimized, which lays a good foundation for the development of a mathematical desulphurization model and is of great benefit to the intelligent control of the steelmaking process.
Footnotes
Disclosure statement
No potential conflict of interest was reported by the author(s).
