A novel modeling approach and its application in polymer quality index prediction

Abstract

Accurate and reliable quality index prediction is indispensable for quality control in industrial propylene polymerization (PP) processes. This paper presents a novel modeling approach for quality index prediction based on an optimal fuzzy wavelet neural network (FWNN) with an improved gravitational search algorithm (IGSA), in which the constant or linear function of the inputs in the conclusion part of the traditional TSK fuzzy model is replaced with a wavelet neural network (WNN). An online learning algorithm for the FWNN model is then derived using the gradient descent algorithm, and an IGSA is proposed to adapt the learning rates of the FWNN online. The proposed soft sensor is evaluated with data from a real industrial PP plant, and the results of the WNN, FWNN and IGSA-FWNN models are compared. The results show that the proposed model performs well in predicting the melt index, a practical industrial quality index.

Keywords

Soft sensor, quality index prediction, fuzzy wavelet neural network, improved gravitational search algorithm, propylene polymerization

Introduction

The polymerization process is one of the most important processes in the petrochemical industry. The advanced monitoring and control of polymerization processes, in particular of the quality index of polymer products, is of major strategic importance to the polymer manufacturing industry. The melt index (MI) of polypropylene, which determines the grade of the product, is one of the most significant quality indexes in practical propylene polymerization (PP) processes, so a useful MI prediction model is of considerable value. However, the MI of polypropylene is usually sampled online and then measured offline with an analytical procedure in the laboratory, which is both costly and time consuming (1.5–2 h). An online MI estimation model is therefore valuable, not only as an online sensor but also as a forecasting system. It is usual practice to estimate MI values from measurable key variables such as temperature, pressure, the feed rate of each reactant, and so forth.

Many forecasting approaches have been proposed to increase MI forecasting accuracy. Statistical models, also known as time-series-based models, employ only historical data. They build the relationship between the inherent characteristics of a system and the measured data from historical input data. The artificial neural network (ANN) is the most popular method because of its excellent performance in approximating continuous nonlinear systems (Ding et al., 2015a; Graves et al., 2016; He, 2016; Kim and Lim, 2015; Liu and Wang, 2013; Meng and Chen, 2013), and it has been used in many studies on MI forecasting. Shi et al. (2006) proposed a novel soft sensor model combining principal component analysis, radial basis function networks and multi-scale analysis that provides promising prediction reliability and accuracy. Zhang et al. (2006) presented a bootstrap aggregated neural network (NN) model, which could overcome the problems of a single NN, such as over-fitting and the lack of generalization capability. Li et al. (2012) studied MI prediction with an adaptively aggregated RBF NN trained with a novel ACO algorithm. Many other artificial intelligence methods have also been proposed (Ding et al., 2015b, 2015c; Guo et al., 2015; Shi et al., 2014; Zhou and Luo, 2015). Han et al. (2005) employed three approaches, namely support vector machines (SVM), partial least squares (PLS) and ANN, for MI estimation of SAN (styrene-acrylonitrile) and PP processes. Ahmed et al. (2013) proposed two parameter update schemes that employ recursive update of PLS model parameters for MI prediction in high density polyethylene processes. Jiang et al. (2012) developed a new MI prediction method by introducing the relevance vector machine (RVM). Zhang and Liu (2016) presented a real-time soft sensor based on an optimized least squares support vector machine (LSSVM) and an online correcting strategy (OCS). In spite of these improvements, MI forecasting still suffers from high errors.

Fuzzy logic systems, first introduced by Zadeh (1973, 1974) and then applied by Kickert and Mamdani (1978), have been a powerful tool for modeling nonlinear processes (Affonso et al., 2015; Melo and Watada, 2016; Mohammed and Lim, 2015). To improve the learning abilities of fuzzy logic-based systems, fuzzy neural networks (FNNs) have been proposed, which combine the capability of fuzzy reasoning in handling uncertain information with the capability of ANNs in learning from processes. FNNs are well suited to modeling nonlinear processes because they can approximate nonlinear and uncertain systems without requiring mathematical models (Amjady, 2006; Pratama et al., 2016; Tang et al., 2017). Furthermore, because the wavelet transform can reveal the properties of a function in localized regions, wavelet neural networks (WNNs) have been developed by combining wavelets with NNs (Chen et al., 2006; Zhao et al., 2014). The WNN converges quickly and gives high precision with a reduced network size because of the time-frequency localization properties of wavelets. Combining the self-learning ability of neural networks, the capacity of fuzzy logic to handle uncertainty and the strength of wavelet transforms in analyzing local details, the fuzzy wavelet neural network (FWNN) has been proposed (Abiyev et al., 2013; Solgi and Ganjefar, 2018; Yilmaz and Oysal, 2010). This combination yields a system with fast learning capability that can describe a nonlinear system characterized by uncertainties (Abiyev and Kaynak, 2008).

The gravitational search algorithm (GSA) is a novel heuristic optimization method proposed by Rashedi et al. (2009). Its attractive features include ease of implementation and the fact that no gradient information is required. However, like other heuristic algorithms, it takes more calculation time than the traditional gradient descent algorithm to achieve a similar accuracy. The global search ability of GSA is strong, but its local search capability is insufficient. In the later stages of iteration, owing to the appearance of heavier masses, the GSA slows down and requires more time to reach the optimal solution (Sombra et al., 2013). By introducing global memory and group communication, an improved gravitational search algorithm (IGSA) is proposed in our work.

In this paper, a novel fuzzy wavelet neural network (FWNN) is proposed, which combines the concepts of fuzzy logic with the WNN. The model replaces the constant or linear function of the inputs in the conclusion part of the traditional TSK fuzzy model with a WNN. Each fuzzy rule corresponds to a WNN consisting of several wavelets with adjustable translation and dilation parameters. To optimize the FWNN, this paper combines the IGSA with the gradient descent algorithm, because the IGSA alone needs more calculation time to reach a relatively good accuracy. First, the structure learning algorithm of the FWNN model is derived using the gradient descent algorithm, which gives the relationship between all the parameters and their learning rates. In addition, an IGSA is proposed to adapt the learning rates online and thus further improve the online learning capability of the FWNN. In this way the optimal MI prediction model, IGSA-FWNN, is obtained. Finally, the proposed model is evaluated with data from a real industrial PP plant.

The rest of this paper is organized as follows. Section 2 introduces the architecture of the proposed FWNN, the structure learning algorithm and the learning rates adjustment with IGSA. Section 3 presents the simulation results and performance analysis of the proposed model, and the conclusion is given in Section 4.

Methods and prediction

WNN

The WNN, which incorporates wavelet functions into a neural network, was proposed by Zhang and Benveniste (1992). It is built on the BP neural network structure, with the wavelet basis function as the transfer function of the hidden layer nodes. Error back-propagation is employed to optimize the initial parameters of the wavelet functions of the network. The basic wavelet theory is as follows.

For any function $ψ (x) \in L^{2} (R)$ , if it satisfies the admissibility condition (Daubechies, 1992)

C_{ψ} = \int_{0}^{+ \infty} \frac{| \hat{ψ} (ω) |^{2}}{ω} d ω < + \infty

(1)

then $ψ (x)$ is called a mother wavelet function, where $\hat{ψ} (ω)$ is the Fourier transform of $ψ (x)$ . A two-parameter family of wavelets can be generated by translating and dilating this mother wavelet

ψ_{a, b} (x) = | a |^{- 1 / 2} ψ (\frac{x - b}{a}), (a, b \in R, a \neq 0)

(2)

where a is the dilation parameter, and b is the translation parameter. They can be used to control the magnitude and position of $ψ (x)$ .

Grossmann and Morlet (1984) proved that any function $f (x)$ in $L^{2} (R)$ can be represented by equation (3)

f (x) = C_{ψ}^{- 1} \int \int Wf (a, b) | a |^{- 1 / 2} ψ (\frac{x - b}{a}) \frac{1}{a^{2}} dadb

(3)

where $Wf (a, b)$ is the continuous wavelet transform of $f (x)$ and is given by equation (4)

Wf (a, b) = | a |^{- 1 / 2} \int_{- \infty}^{+ \infty} ψ (\frac{x - b}{a}) f (x) dx

(4)

The network topology structure of the WNN is shown in Figure 1. As shown in Figure 1, $x = (x_{1}, x_{2}, \dots, x_{n})^{T}$ is the input vector of WNN, $y = (y_{1}, y_{2}, \dots, y_{m})^{T}$ denotes the network output of WNN. For the modeling of multivariable processes, multi-dimensional wavelets must be defined. In the present work, multi-dimensional wavelets are defined as equation (5)

Ψ_{i} (x) = Π_{j = 1}^{n} ψ (\frac{x_{j} - b_{ij}}{a_{ij}}), i = 1, 2, \dots, M

(5)

where $b_{i} = (b_{ij})$ and $a_{i} = (a_{ij})$ are the translation and dilation vectors, respectively. The network output $y = (y_{1}, y_{2}, \dots, y_{m})^{T}$ is calculated by equation (6)

y_{k} = \sum_{i = 1}^{M} ω_{ik} Ψ_{i}, k = 1, 2, \dots, m

(6)

where $ω_{ik}$ are the connection weights between the hidden layer and the output layer, M represents the number of hidden layer nodes, and m is the number of output layer nodes.

Figure 1.

Structure of the WNN.

Several mother wavelet functions have been proposed in wavelet theory, each suited to particular applications. In this work, the Morlet wavelet is employed because of its directional selectivity in detecting oriented features, its fine tuning to specific frequencies, and its good localization in time and frequency (Nason, 2008). It is a sinusoidal signal modulated by a Gaussian envelope, and is characterized by a narrow frequency response, which offers a higher spectral resolution than the Mexican hat wavelet. This wavelet is also particularly useful for filtering out the background noise of images. The Morlet wavelet is described as follows

ψ_{a, b} (t) = e^{- t^{2} / 2} \cos (1.75 t)

(7)

where $t = \frac{x - b}{a}$ .
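Equations (5)-(7) can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names `morlet` and `wnn_forward` and the array layout (one row of `a` and `b` per hidden node) are choices made here.

```python
import numpy as np

def morlet(t):
    """Morlet mother wavelet of equation (7): exp(-t^2 / 2) * cos(1.75 t)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t**2 / 2.0) * np.cos(1.75 * t)

def wnn_forward(x, a, b, w):
    """WNN output following equations (5) and (6).

    x: input vector, shape (n,)
    a, b: dilation and translation parameters, shape (M, n)  (assumed layout)
    w: hidden-to-output weights, shape (M, m)
    """
    z = (x[None, :] - b) / a          # scaled inputs (x_j - b_ij) / a_ij, shape (M, n)
    Psi = np.prod(morlet(z), axis=1)  # multi-dimensional wavelets, eq. (5), shape (M,)
    return Psi @ w                    # outputs y_k, eq. (6), shape (m,)
```

When every translation vector equals the input, each wavelet evaluates to 1 and the output reduces to the column sums of `w`, which is a convenient sanity check.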

Architecture of the proposed FWNN

FWNN is illustrated as a five-layer network structure in Figure 2, including the input layer, membership layer, rule layer, wavelet and consequent layer, and output layer. The fuzzy rules of the proposed FWNN are of the following form

\begin{matrix} R_{k} : \text{IF } x_{1} \text{ is } A_{k 1} \text{ and } x_{2} \text{ is } A_{k 2} \text{ and } \dots \text{ and } x_{n} \text{ is } A_{kn}, \\ \text{THEN } ψ_{k} \text{ is } \sum_{i} ω_{ik} ϕ_{ik} (x_{i}) . \end{matrix}

(8)

where $x_{1}, x_{2}, \dots, x_{n}$ are the input variables, $ψ_{1}, ψ_{2}, \dots, ψ_{M}$ are the output variables, $A_{kj}$ is the fuzzy set of the jth input in the kth rule, described by a Gaussian membership function, and $ω_{ik}$ are the connecting weights.

Figure 2.

Network structure of the proposed FWNN.

Furthermore, the relationship function between input and output in each layer is described as follows:

Layer 1 (Input layer): Each node in this layer represents one input variable and directly transmits the input values $\vec{x} = [x_{1}, \dots, x_{n}]$ to the next layer, where n is the number of input variables.

Layer 2 (Membership layer): Each output from layer 1 serves as the input of a membership function. The values of the corresponding membership function are calculated through the following Gaussian function

{net}_{j}^{2} = - {(\frac{x_{j} - m_{j}}{σ_{j}})}^{2}

(9)

y_{j}^{2} = \exp ({net}_{j}^{2}), j = 1, 2, \dots, M \cdot n

(10)

where $m_{j}$ denotes the center and $σ_{j}$ denotes the standard deviation of the membership function, and M is the number of rules.

Layer 3 (Rule layer): Each node in the rule layer represents the precondition part of one fuzzy logic rule. It is denoted by $Π$ and multiplies the incoming signals from layer 2. For the kth rule node

y_{k}^{3} = \underset{j}{Π} ω_{jk}^{3} y_{j}^{2}, k = 1, 2, \dots, M

(11)

where $ω_{jk}^{3}$ denote the connective weights between the membership layer and the rule layer and they are set to be 1.

Layer 4 (Wavelet and consequent layer): This layer comprises the wavelet layer and the consequent layer. The wavelet layer consists of M WNNs, each corresponding to the consequent part of one fuzzy rule, and accepts the variables $x_{1}, x_{2}, \dots, x_{n}$ as input signals. The output of the kth WNN, $ψ_{k}$ , is expressed as follows

ψ_{k} = \sum_{i} ω_{ik} ϕ_{ik} (x_{i})

(12)

ϕ_{ik} (x_{i}) = \underset{i}{Π} φ (\frac{x_{i} - b_{ik}}{a_{ik}})

(13)

Moreover, the nodes in the consequent layer multiply the incoming signals and can be represented as follows

y_{k}^{4} = \underset{k}{Π} ψ_{k} ω_{k}^{4} y_{k}^{3}

(14)

where $ω_{k}^{4}$ denote the connective weights between the rule layer and the consequent layer and they are set to be 1.

Layer 5 (Output layer): The node in this layer performs the defuzzification method to get the final output. Here, a weighted sum is used as the defuzzification function. The final network output can then be calculated according to the following function

\hat{y} = \sum_{k} ω_{k}^{5} y_{k}^{4}

(15)
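The five layers can be traced in a few lines of NumPy. The sketch below is a minimal illustration, not the authors' code, and rests on several assumptions made here: every rule k has its own Gaussian center and width for each input (stored as arrays of shape (M, n), matching the M·n membership nodes of equation (10)); each $ϕ_{ik}$ is read as a one-dimensional Morlet wavelet of input $x_{i}$; the array `w[k, i]` holds $ω_{ik}$; and the fixed weights $ω_{jk}^{3}$ and $ω_{k}^{4}$ are 1 as stated in the text. The function name `fwnn_forward` is hypothetical.

```python
import numpy as np

def morlet(t):
    """Morlet wavelet of equation (7)."""
    return np.exp(-t**2 / 2.0) * np.cos(1.75 * t)

def fwnn_forward(x, m, sigma, a, b, w, w5):
    """One forward pass through the five FWNN layers.

    x: inputs, shape (n,)
    m, sigma: Gaussian centers and widths, shape (M, n)   (assumed layout)
    a, b: wavelet dilations and translations, shape (M, n)
    w: wavelet-layer weights, w[k, i] = omega_ik, shape (M, n)
    w5: output weights omega_k^5, shape (M,)
    """
    net2 = -((x - m) / sigma) ** 2     # eq. (9)
    y2 = np.exp(net2)                  # eq. (10): membership values
    y3 = np.prod(y2, axis=1)           # eq. (11): rule firing strengths, omega^3 = 1
    phi = morlet((x - b) / a)          # wavelet activations, shape (M, n)
    psi = np.sum(w * phi, axis=1)      # eq. (12): consequent of each rule
    y4 = psi * y3                      # eq. (14) with omega^4 = 1
    y_hat = float(np.dot(w5, y4))      # eq. (15): weighted-sum defuzzification
    return y_hat, (y2, y3, phi, psi, y4)
```

Returning the intermediate layer outputs alongside the prediction keeps them available for the back-propagation step that uses them.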

Structure learning algorithm of FWNN

The parameter vector of the proposed FWNN to be updated is $Θ = (m_{j}, σ_{j}, b_{ik}, a_{ik}, ω_{ik}, ω_{k}^{5})$ . It consists of the center parameters $m_{j}$ and standard deviation parameters $σ_{j}$ of the membership functions in layer 2; the translation parameters $b_{ik}$ , dilation parameters $a_{ik}$ and weight parameters $ω_{ik}$ of the wavelet layer in layer 4; and the connective weights $ω_{k}^{5}$ in layer 5. In this section, the structure learning algorithm of the FWNN model is derived using the gradient descent algorithm, which gives the relationship between all the parameters and their learning rates.

In the gradient descent algorithm, the parameters $Θ = (m_{j}, σ_{j}, b_{ik}, a_{ik}, ω_{ik}, ω_{k}^{5})$ are adjusted in the opposite direction of the gradient of the objective function defined by:

E (Θ, x, \hat{y}) = \frac{1}{2} (\hat{y} - y)^{2}

(16)

where $\hat{y}$ and y denote the predicted result and the measured value, respectively.

The update rules for the parameters in the FWNN are described as follows

Θ (t + 1) = Θ (t) + Δ Θ

(17)

\begin{matrix} Δ Θ = (- η_{m} \frac{\partial E}{\partial m_{j}}, - η_{σ} \frac{\partial E}{\partial σ_{j}}, - η_{b} \frac{\partial E}{\partial b_{ik}}, \\ - η_{a} \frac{\partial E}{\partial a_{ik}}, - η_{ω 1} \frac{\partial E}{\partial ω_{ik}}, - η_{ω 2} \frac{\partial E}{\partial ω_{k}^{5}}) \end{matrix}

(18)

where $η = (η_{m}, η_{σ}, η_{b}, η_{a}, η_{ω 1}, η_{ω 2})$ are the learning rates. The values of the derivatives in equation (18) can be calculated with the back-propagation (BP) algorithm using equations (19)–(28).

Layer 5: The error term to be propagated in this layer is calculated as follows

δ_{o}^{5} = - \frac{\partial E}{\partial \hat{y}} = - (\hat{y} - y)

(19)

Accordingly, the increment of the connective weight $ω_{k}^{5}$ is calculated by

Δ ω_{k}^{5} = - η_{ω 2} \frac{\partial E}{\partial ω_{k}^{5}} = - η_{ω 2} \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial ω_{k}^{5}} = η_{ω 2} δ_{o}^{5} y_{k}^{4}

(20)

Layer 4: The error term to be propagated in this layer is calculated as follows

δ_{k}^{4} = - \frac{\partial E}{\partial y_{k}^{4}} = - \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial y_{k}^{4}} = δ_{o}^{5} ω_{k}^{5}

(21)

The increment of the weight parameters $ω_{ik}$ in the wavelet layer is calculated by

Δ ω_{ik} = - η_{ω 1} \frac{\partial E}{\partial ω_{ik}} = - η_{ω 1} \frac{\partial E}{\partial y_{k}^{4}} \frac{\partial y_{k}^{4}}{\partial ψ_{k}} \frac{\partial ψ_{k}}{\partial ω_{ik}} = η_{ω 1} δ_{k}^{4} y_{k}^{3} ϕ_{ik}

(22)

The increment of dilation parameters $a_{ik}$ is calculated by

\begin{matrix} Δ a_{ik} = - η_{a} \frac{\partial E}{\partial a_{ik}} = - η_{a} \frac{\partial E}{\partial y_{k}^{4}} \frac{\partial y_{k}^{4}}{\partial ψ_{k}} \frac{\partial ψ_{k}}{\partial ϕ_{ik}} \frac{\partial ϕ_{ik}}{\partial a_{ik}} \\ = η_{a} δ_{k}^{4} y_{k}^{3} ω_{ik} \frac{- (x_{i} - b_{ik})}{{(a_{ik})}^{2}} \end{matrix}

(23)

The increment of translation parameters $b_{ik}$ is calculated by

Δ b_{ik} = - η_{b} \frac{\partial E}{\partial b_{ik}} = - η_{b} \frac{\partial E}{\partial y_{k}^{4}} \frac{\partial y_{k}^{4}}{\partial ψ_{k}} \frac{\partial ψ_{k}}{\partial ϕ_{ik}} \frac{\partial ϕ_{ik}}{\partial b_{ik}} = η_{b} δ_{k}^{4} y_{k}^{3} ω_{ik} \frac{- 1}{a_{ik}}

(24)

Layer 3: The error term to be propagated in this layer is calculated by

δ_{k}^{3} = - \frac{\partial E}{\partial y_{k}^{3}} = - \frac{\partial E}{\partial y_{k}^{4}} \frac{\partial y_{k}^{4}}{\partial y_{k}^{3}} = δ_{k}^{4} ψ_{k} ω_{k}^{4}

(25)

Layer 2: The error term in this layer is calculated by

δ_{j}^{2} = - \frac{\partial E}{\partial {net}_{j}^{2}} = - \frac{\partial E}{\partial y_{k}^{3}} \frac{\partial y_{k}^{3}}{\partial y_{j}^{2}} \frac{\partial y_{j}^{2}}{\partial {net}_{j}^{2}} = \sum_{k} δ_{k}^{3} y_{k}^{3}

(26)

Accordingly, the increment of the center parameters $m_{j}$ is calculated by

Δ m_{j} = - η_{m} \frac{\partial E}{\partial m_{j}} = - η_{m} \frac{\partial E}{\partial {net}_{j}^{2}} \frac{\partial {net}_{j}^{2}}{\partial m_{j}} = η_{m} δ_{j}^{2} \frac{2 (x_{j} - m_{j})}{{(σ_{j})}^{2}}

(27)

The increment of the standard deviation parameters $σ_{j}$ is calculated by

Δ σ_{j} = - η_{σ} \frac{\partial E}{\partial σ_{j}} = - η_{σ} \frac{\partial E}{\partial {net}_{j}^{2}} \frac{\partial {net}_{j}^{2}}{\partial σ_{j}} = η_{σ} δ_{j}^{2} \frac{2 {(x_{j} - m_{j})}^{2}}{{(σ_{j})}^{3}}

(28)
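As a partial check of the update rules, the weight increments of equations (19)-(22) can be coded directly and compared with a finite-difference gradient of the objective in equation (16). The sketch below covers only the two sets of weights, $ω_{k}^{5}$ and $ω_{ik}$; it assumes the same simplified forward pass as above (per-rule Gaussian parameters of shape (M, n), one-dimensional Morlet wavelets, $ω^{3} = ω^{4} = 1$), and the dictionary `p` and all function names are hypothetical.

```python
import numpy as np

def morlet(t):
    """Morlet wavelet of equation (7)."""
    return np.exp(-t**2 / 2.0) * np.cos(1.75 * t)

def forward(x, p):
    """Compact forward pass; p is a dict of parameter arrays (assumed layout)."""
    y3 = np.prod(np.exp(-((x - p["m"]) / p["sigma"]) ** 2), axis=1)  # eqs. (9)-(11)
    phi = morlet((x - p["b"]) / p["a"])                              # wavelet activations
    y4 = np.sum(p["w"] * phi, axis=1) * y3                           # eqs. (12), (14)
    return float(p["w5"] @ y4), y3, phi, y4                          # eq. (15)

def loss(x, y, p):
    """Objective function of equation (16)."""
    return 0.5 * (forward(x, p)[0] - y) ** 2

def weight_increments(x, y, p, eta_w1, eta_w2):
    """Increments of omega_k^5 and omega_ik from equations (19)-(22)."""
    y_hat, y3, phi, y4 = forward(x, p)
    delta5 = -(y_hat - y)                          # eq. (19)
    d_w5 = eta_w2 * delta5 * y4                    # eq. (20)
    delta4 = delta5 * p["w5"]                      # eq. (21)
    d_w = eta_w1 * (delta4 * y3)[:, None] * phi    # eq. (22)
    return d_w5, d_w
```

With both learning rates set to 1, each increment should equal the negative partial derivative of `loss` with respect to the corresponding weight, which can be verified by perturbing one weight at a time.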

Learning rates adjustment with IGSA

To find the optimal values for the parameters of the FWNN, an IGSA is proposed to adapt the learning rates $η = (η_{m}, η_{σ}, η_{b}, η_{a}, η_{ω 1}, η_{ω 2})$ online.

The GSA is a novel heuristic optimization method proposed by Rashedi et al. (2009). In GSA, masses (agents) attract each other through gravitational attraction. The position of a mass corresponds to a solution of the problem, and the gravitational forces and inertias are determined using a fitness function (Mirjalili et al., 2012; Li and Zhou, 2011). Over time, the masses are expected to be attracted by the heaviest mass, which represents the optimum solution in the search space. The algorithm shares optimization information through the gravitational interactions between agents. In the basic GSA algorithm, the velocity and position of agents are updated as follows

v (t + 1) = rand \times v (t) + a (t)

(29)

x (t + 1) = x (t) + v (t + 1)

(30)

where t is the iteration number, v denotes the velocity of the agent, a denotes the acceleration of the agent and x denotes the position of the agent.
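The basic update of equations (29) and (30) amounts to two array operations. In the sketch below the function name `gsa_step` is hypothetical, and drawing a separate random factor for every agent and dimension is an assumption; the text only specifies a random factor.

```python
import numpy as np

def gsa_step(x, v, acc, rng):
    """Basic GSA update, equations (29)-(30).

    x, v, acc: positions, velocities and accelerations, shape (N, D)
    rng: a numpy random Generator supplying the uniform random factor
    """
    rand = rng.random(x.shape)   # uniform in [0, 1), one draw per agent and dimension (assumed)
    v_new = rand * v + acc       # eq. (29)
    x_new = x + v_new            # eq. (30)
    return x_new, v_new
```

Because the new position depends only on the current position, velocity and acceleration, no information about earlier good positions is carried forward in this basic form.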

The global search ability of GSA is strong, but its local search capability is insufficient. In the later stages of iteration, owing to the appearance of heavier masses, the GSA slows down and requires more time to reach the optimal solution. Moreover, the GSA considers only the influence of the current position when updating the position of the agents, without taking the agents’ memory into account. To help agents that are near the optimum solution and moving very slowly exploit the global best, gbest, the best position found so far by any agent, is introduced. Each agent can observe the best solution (gbest) and tends toward it. Thus, by introducing global memory and group communication, an IGSA is proposed.

The flow chart of the IGSA algorithm for learning rates adjustment is shown in Figure 3 and the procedure is described as follows

(1) Initialization: The IGSA algorithm randomly generates the initial population. The position of each agent (individual) that is a candidate solution for the learning rates $η = (η_{m}, η_{σ}, η_{b}, η_{a}, η_{ω 1}, η_{ω 2})$ is defined as follows

X_{i} = (x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{d}, \dots, x_{i}^{D}), i = 1, 2, \dots, N

(31)

where N is the population size, and D is the dimension of the problem.

Figure 3.

The flow chart of the IGSA algorithm with learning rates adjustment.

(2) Fitness function: For each agent, the fitness value should be evaluated. In fact, the training process of the proposed FWNN using the IGSA algorithm is a minimization process of the error between the desired output and the actual one. The root mean square error (RMSE) is chosen as the fitness function, which is given by

Fitness = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(32)

where $y_{i}$ and ${\hat{y}}_{i}$ denote the measured value and predicted result, respectively, and n is the number of samples. The best solution (gbest) for the population is then determined by comparing the fitness values.

(3) Agents’ masses: Agents’ masses are defined using fitness evaluation. The masses of all agents are updated using the following equations

M_{ai} = M_{pi} = M_{ii} = M_{i}, i = 1, 2, \dots, N

(33)

m_{i} (t) = \frac{fit_{i} (t) - worst (t)}{best (t) - worst (t)}

(34)

M_{i} (t) = \frac{m_{i} (t)}{\sum_{j = 1}^{N} m_{j} (t)}

(35)

where $fit_{i} (t)$ represents the fitness value of the agent i at time t, and $best (t)$ and $worst (t)$ are defined as follows (for a minimization problem)

best (t) = \min_{j \in \{1, \dots, N\}} fit_{j} (t)

(36)

worst (t) = \max_{j \in \{1, \dots, N\}} fit_{j} (t)

(37)

(4) Total force: The total force that acts on agent i is calculated by

F_{i}^{d} (t) = \sum_{j \in Kbest, j \neq i}^{N} rand_{j} F_{ij}^{d} (t)

(38)

where $rand_{j}$ is a random number in the interval [0,1], and Kbest is the set of the first K agents with the best fitness values and the biggest masses. At a specific time t, the force acting on agent i from agent j is defined as follows

F_{ij}^{d} (t) = G (t) \frac{M_{pi} (t) M_{aj} (t)}{R_{ij} (t) + ε} (x_{j}^{d} (t) - x_{i}^{d} (t))

(39)

The gravitational constant $G (t)$ and the Euclidean distance between two agents i and j are calculated as follows

G (t) = G_{0} \exp (- α \cdot iter / iter_{max})

(40)

R_{ij} (t) = ‖ X_{i} (t), X_{j} (t) ‖_{2}

(41)

where $ε$ is a small constant, $α$ is the descending coefficient, $G_{0}$ is the initial gravitational constant, iter is the current iteration, and $iter_{max}$ is the maximum number of iterations.

(5) Acceleration: By the law of motion, the acceleration of the agent i with the inertial mass $M_{ii}$ is given by

a_{i}^{d} (t) = \frac{F_{i}^{d} (t)}{M_{ii} (t)}

(42)

(6) Update velocity and position: In the IGSA algorithm, the velocity and position of agents are updated as follows

v_{i}^{d} (t + 1) = ω v_{i}^{d} (t) + c_{1} r_{i 1} a_{i}^{d} (t) + c_{2} r_{i 2} (gbest^{d} - x_{i}^{d} (t))

(43)

x_{i}^{d} (t + 1) = x_{i}^{d} (t) + v_{i}^{d} (t + 1)

(44)

where $ω = 1 - iter / iter_{max}$ is the inertial factor, $c_{1}$ and $c_{2}$ are two acceleration coefficients, and $r_{i 1}$ and $r_{i 2}$ are two random variables in the range [0,1]. Whenever the position of an agent goes beyond its lower or upper bound, the agent takes the value of the corresponding bound.

The new candidate solution is then taken as the learning rates of the FWNN model to obtain new prediction results, and gbest is updated according to the new fitness.

(7) Stopping rule: Repeat steps 2 to 6 until the best fitness value is achieved or a preset number of generations is reached. The solution gbest with the best fitness value is chosen as the best learning rates of the FWNN model. Using the learning rates tuned online by the IGSA algorithm, the final optimal MI prediction model is established.
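Steps (3) to (6) of the procedure can be condensed into one function that advances the whole population by one iteration. This is a hedged sketch rather than the authors' implementation. The function name `igsa_step` is hypothetical. The size K of Kbest is left as a parameter because no schedule for it is given. Equal fitness values are handled with uniform masses, which is an assumption. The acceleration is computed with the agent's own mass already cancelled from equations (39) and (42), which is equivalent for nonzero masses and avoids dividing by the zero mass of the worst agent. The defaults for `G0`, `alpha`, `c1` and `c2` are the values reported in the experiments, and the value of `eps` is an arbitrary small constant.

```python
import numpy as np

def igsa_step(X, V, fit, gbest, it, it_max, lo, hi, rng,
              G0=100.0, alpha=20.0, c1=2.0, c2=2.0, K=None, eps=1e-12):
    """One IGSA iteration over candidate learning-rate vectors.

    X, V: positions and velocities, shape (N, D)
    fit: fitness (RMSE, minimized) of each agent, shape (N,)
    gbest: best position found so far, shape (D,)
    lo, hi: lower and upper position bounds (scalars or shape (D,))
    """
    N, D = X.shape
    K = N if K is None else K
    best, worst = fit.min(), fit.max()             # eqs. (36)-(37)
    if best == worst:                              # degenerate case (assumed handling)
        M = np.full(N, 1.0 / N)
    else:
        m = (fit - worst) / (best - worst)         # eq. (34)
        M = m / m.sum()                            # eq. (35)
    G = G0 * np.exp(-alpha * it / it_max)          # eq. (40)
    kbest = np.argsort(fit)[:K]                    # the K fittest agents
    acc = np.zeros((N, D))
    for i in range(N):
        for j in kbest:
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])        # eq. (41)
            # eqs. (38), (39), (42) combined: F_ij / M_i with M_i cancelled
            acc[i] += rng.random() * G * M[j] / (R + eps) * (X[j] - X[i])
    w = 1.0 - it / it_max                          # inertial factor
    r1 = rng.random((N, 1))                        # r_i1, one draw per agent
    r2 = rng.random((N, 1))                        # r_i2, one draw per agent
    V_new = w * V + c1 * r1 * acc + c2 * r2 * (gbest - X)   # eq. (43)
    X_new = np.clip(X + V_new, lo, hi)             # eq. (44) with bound handling
    return X_new, V_new
```

In a full run, each row of `X_new` would be used as the learning rates of the FWNN, the resulting RMSE would become the new `fit`, and `gbest` would be replaced whenever a lower RMSE is found.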

Results and discussion

The proposed soft sensor based on FWNN with improved gravitational search algorithm (IGSA-FWNN) is applied to the prediction of quality index in the real industrial PP processes currently being operated for commercial purposes in China. The Hypol technology is used in the process, which is one of the most widespread commercial methods for producing polypropylene. Figure 4 illustrates a highly simplified schematic diagram of the process. The process consists of a chain of reactors in series: two continuous stirred tank reactors (CSTR) and two fluidized-bed reactors (FBR). The feed to the reactor is comprised of propylene, hydrogen and the Ziegler–Natta catalyst. In the first two reactors, the polymerization reaction takes place in a liquid phase, and in the third and fourth reactors, the reaction is completed in vapor phase to produce the powdered polymer products. The quality index, MI, which depends on the catalyst properties, reactant composition, reactor temperature and so on, can determine different brands of products and different grades of product quality. In this application, a commercial brand named F401 with MI values between 1.4 and 3.2 is considered. It should be noted that the current MI data are from the same grade of polypropylene production, not from the grade transition for the multistage productions in polypropylene manufacturing process. The former is a slowly varying dynamic process; the latter is dynamic process with intense disturbance.

Figure 4.

General scheme of PP.

To develop a prediction model for the MI, nine process variables (t, p, l, a, f1, f2, f3, f4, f5) that strongly influence the process have been chosen, where t, p, l and a stand for the process temperature, pressure, liquid level and percentage of hydrogen in the vapor phase in the first CSTR, respectively; f1-f3 are the flow rates of three streams of propylene into the reactor; and f4 and f5 are the flow rates of the catalyst and aid catalyst, respectively. The data used for the prediction model were acquired from the historical log of the distributed control system (DCS) in a real PP plant. The sampling time of MI is taken to be 2 hours. The data are filtered to discard abnormal situations and to improve the quality of the prediction results, and the variables are normalized with respect to their mean and standard deviation. Finally, 50 data sets are used for training, 20 for testing, and the remaining 15 for generalization.
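The normalization and data split can be sketched as follows. The array `X` is random placeholder data standing in for the plant records, which are not reproduced here. Computing the mean and standard deviation from the training portion only, and splitting the samples in their stored order, are assumptions, since the text does not state which samples supply the statistics or how they are assigned. The function names are hypothetical.

```python
import numpy as np

def zscore_fit(X_train):
    """Column-wise mean and sample standard deviation of the training inputs."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)

def zscore_apply(X, mean, std):
    """Normalize each variable with the supplied mean and standard deviation."""
    return (X - mean) / std

# Placeholder data only: 85 samples of the nine process variables.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(85, 9))

# 50 / 20 / 15 split for training, testing and generalization (order assumed).
X_train, X_test, X_gen = X[:50], X[50:70], X[70:]

mean, std = zscore_fit(X_train)
X_train_n = zscore_apply(X_train, mean, std)
X_test_n = zscore_apply(X_test, mean, std)
X_gen_n = zscore_apply(X_gen, mean, std)
```

Reusing the training statistics for the other two subsets keeps them on the same scale as the data the model was fitted to.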

The performance of the proposed model is evaluated with several statistical measures: the mean absolute error (MAE), the mean relative error (MRE), the root mean square error (RMSE), Theil’s inequality coefficient (TIC) and the standard deviation of the absolute error (STD). They are defined as follows

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(45)

MRE = \frac{1}{n} \sum_{i = 1}^{n} \frac{| y_{i} - {\hat{y}}_{i} |}{y_{i}} \times 100 \%

(46)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(47)

TIC = \frac{\sqrt{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}}{\sqrt{\sum_{i = 1}^{n} y_{i}^{2}} + \sqrt{\sum_{i = 1}^{n} {\hat{y}}_{i}^{2}}}

(48)

STD = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(e_{i} - \bar{e})}^{2}}

(49)

where $\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$ , $e_{i} = y_{i} - {\hat{y}}_{i}$ , $\bar{e} = \frac{1}{n} \sum_{i = 1}^{n} e_{i}$ , and $y_{i}$ and ${\hat{y}}_{i}$ denote the measured value and predicted result, respectively. The MAE, MRE and RMSE measure the prediction accuracy: the smaller their values, the more accurate the prediction model. The STD indicates the stability of the prediction model: the smaller its value, the more stable the model. The TIC measures the level of agreement between the proposed model and the studied process: the closer its value is to zero, the better the agreement.
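The five measures of equations (45)-(49) translate directly into NumPy. The function name `prediction_metrics` and the dictionary return type are choices made for this sketch.

```python
import numpy as np

def prediction_metrics(y, y_hat):
    """MAE, MRE (%), RMSE, TIC and STD as defined in equations (45)-(49)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    e = y - y_hat                                        # prediction errors e_i
    return {
        "MAE": np.mean(np.abs(e)),                       # eq. (45)
        "MRE": np.mean(np.abs(e) / y) * 100.0,           # eq. (46), in percent
        "RMSE": np.sqrt(np.mean(e**2)),                  # eq. (47)
        "TIC": np.sqrt(np.sum(e**2))
               / (np.sqrt(np.sum(y**2)) + np.sqrt(np.sum(y_hat**2))),  # eq. (48)
        "STD": np.std(e, ddof=1),                        # eq. (49), n - 1 in the denominator
    }
```

The MRE divides by the measured values, so it is only meaningful when they are nonzero, as is the case for MI values.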

In this case, the IGSA-FWNN is employed to construct the quality index prediction model. For comparison, the WNN model and the FWNN model with fixed learning rates are also considered. The learning rates of the WNN model are set to $η_{ω 1} = 0.01$ , $η_{b} = 0.015$ and $η_{a} = 0.015$ , while those of the FWNN model are set to $η_{m} = 0.02$ , $η_{σ} = 0.01$ , $η_{b} = 0.015$ , $η_{a} = 0.015$ , $η_{ω 1} = 0.01$ and $η_{ω 2} = 0.01$ . The initial parameters of the IGSA-FWNN model are configured as recommended by Jiang et al. (2014): the maximum iteration number $iter_{max} = 100$ , the population size $N = 50$ , the initial gravitational constant $G_{0} = 100$ , the descending coefficient $α = 20$ , and the acceleration coefficients $c_{1} = c_{2} = 2$ . The final learning rates of the IGSA-FWNN model adapted by the IGSA algorithm are $η_{m} = 0.002$ , $η_{σ} = 0.008$ , $η_{b} = 0.012$ , $η_{a} = 0.012$ , $η_{ω 1} = 0.02$ and $η_{ω 2} = 0.008$ . Because the learning rates directly affect the forecasting ability of the models, the prediction results of the different models reflect the benefit of adapting the learning rates. These prediction results are presented below.

Table 1 lists the prediction results of the different models on the testing dataset; the IGSA-FWNN model performs better than the other models. Specifically, the WNN model gives an MAE of 0.0404, MRE of 1.56%, RMSE of 0.0722, TIC of 0.0144 and STD of 0.0716. The FWNN model gives an MAE of 0.0296, MRE of 1.15%, RMSE of 0.0446, TIC of 0.0088 and STD of 0.0455, so it performs better than the WNN model. The IGSA-FWNN model shows even better results, with MAE, MRE, RMSE, TIC and STD of 0.0179, 0.71%, 0.0233, 0.0046 and 0.0237, respectively. Compared with the WNN model, the proposed model reduces the MAE, MRE, RMSE, TIC and STD by 55.7%, 54.5%, 67.7%, 68.1% and 66.9%, respectively. Compared with the FWNN model with fixed learning rates, the proposed model reduces the MRE by 38.3%, from 1.15% to 0.71%.

Table 1.

Performance of the prediction models on the testing dataset of brand F401.

Model	MAE	MRE(%)	RMSE	TIC	STD
WNN	0.0404	1.56	0.0722	0.0144	0.0716
FWNN	0.0296	1.15	0.0446	0.0088	0.0455
IGSA-FWNN	0.0179	0.71	0.0233	0.0046	0.0237
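The percentage decreases quoted for the testing dataset follow from the entries of Table 1 as (baseline − proposed) / baseline × 100. The short script below recomputes them; the dictionary `table1` simply transcribes the table, and the helper name `percent_decrease` is chosen here.

```python
# Values transcribed from Table 1 (testing dataset, brand F401).
table1 = {
    "WNN":       {"MAE": 0.0404, "MRE": 1.56, "RMSE": 0.0722, "TIC": 0.0144, "STD": 0.0716},
    "FWNN":      {"MAE": 0.0296, "MRE": 1.15, "RMSE": 0.0446, "TIC": 0.0088, "STD": 0.0455},
    "IGSA-FWNN": {"MAE": 0.0179, "MRE": 0.71, "RMSE": 0.0233, "TIC": 0.0046, "STD": 0.0237},
}

def percent_decrease(baseline, proposed):
    """Relative reduction of an error measure, in percent."""
    return (baseline - proposed) / baseline * 100.0

# Reduction of every measure relative to the WNN model.
vs_wnn = {
    name: round(percent_decrease(table1["WNN"][name], table1["IGSA-FWNN"][name]), 1)
    for name in table1["WNN"]
}
# Reduction of the MRE relative to the FWNN model with fixed learning rates.
mre_vs_fwnn = round(percent_decrease(table1["FWNN"]["MRE"], table1["IGSA-FWNN"]["MRE"]), 1)

print(vs_wnn)
print(mre_vs_fwnn)
```

The same helper applied to the MRE column of the generalization results (2.14% for WNN and 0.93% for IGSA-FWNN) reproduces the 56.5% reduction reported for that dataset.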

Figure 5 gives a visual comparison between the analysis values and the predicted values on the testing dataset, showing how much better the IGSA-FWNN model performs than the other models. The curve marked with crosses is the real MI value obtained from laboratory analysis, while the results predicted by the IGSA-FWNN model are depicted by the curve marked with circles. The results of the IGSA-FWNN model approximate the true MI value more closely for most of the data.

Figure 5.

Performance of the proposed models on the testing dataset of brand F401.

A good inferential model should have good generalization ability in addition to good fitting accuracy. Table 2 shows that the proposed model has the best generalization ability among the WNN, FWNN and IGSA-FWNN models. Compared with the WNN model, the proposed model reduces the MRE by 56.5%, from 2.14% to 0.93%. The MAE, RMSE, TIC and STD are reduced as well. Figure 6 shows how the proposed model performs on the generalization dataset. The results support the conclusion that the IGSA-FWNN model performs well not only on the testing dataset but also on the generalization dataset. In other words, applying the proposed method to the testing and generalization datasets obtained from the practical industrial polymerization plant demonstrates its effectiveness and reliability.

Table 2.

Performance of the prediction models on the generalization dataset of brand F401.

Model	MAE	MRE(%)	RMSE	TIC	STD
WNN	0.0512	2.14	0.0709	0.0144	0.0667
FWNN	0.0354	1.43	0.0431	0.0087	0.0333
IGSA-FWNN	0.0228	0.93	0.0269	0.0055	0.0215

Figure 6.

Performance of the proposed models on the generalization dataset of brand F401.
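For readers who wish to compute the five error measures used in Tables 1 and 2 on their own data, the sketch below may be helpful. It assumes the commonly used definitions of MAE, MRE, RMSE, Theil's inequality coefficient (TIC) and the standard deviation of the prediction error (STD); these definitions are not restated in this section, so the exact normalization, in particular the N − 1 divisor in STD, is an assumption. The function name `error_metrics` is illustrative.

```python
import math
from statistics import stdev


def error_metrics(y_true, y_pred):
    """Return MAE, MRE (%), RMSE, TIC and STD for two equal-length sequences.

    Assumed definitions (e = y - y_hat, N samples):
      MAE  = mean(|e|)
      MRE  = 100 * mean(|e| / |y|)
      RMSE = sqrt(mean(e^2))
      TIC  = sqrt(sum(e^2)) / (sqrt(sum(y^2)) + sqrt(sum(y_hat^2)))
      STD  = sample standard deviation of e (divisor N - 1)
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    errors = [y - p for y, p in zip(y_true, y_pred)]
    n = len(errors)
    sse = sum(e * e for e in errors)  # sum of squared errors
    norm_true = math.sqrt(sum(y * y for y in y_true))
    norm_pred = math.sqrt(sum(p * p for p in y_pred))
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MRE": 100.0 * sum(abs(e) / abs(y) for e, y in zip(errors, y_true)) / n,
        "RMSE": math.sqrt(sse / n),
        "TIC": math.sqrt(sse) / (norm_true + norm_pred),
        "STD": stdev(errors),
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not plant data.
    print(error_metrics([2.0, 4.0], [1.0, 5.0]))
```

The MRE term divides by the measured value, which is unproblematic for MI because it is strictly positive; for quantities that can be zero a different relative measure would be needed.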

Table 3 compares the proposed IGSA-FWNN model with other models reported in the open literature (Ahmed et al., 2013; Cao et al., 1999; Kim and Yeo, 2010; Shi and Liu, 2006; Xu and Liu, 2014). Note that the research data used in Shi and Liu (2006) and Xu and Liu (2014) are the same as those in this paper, while Ahmed et al. (2013), Kim and Yeo (2010) and Cao et al. (1999) use different datasets, so their results are given for reference only. With the same research data, our work improves the prediction precision from an MRE of 3.27% in Shi and Liu (2006) and 0.83% in Xu and Liu (2014) to 0.71%. These results indicate that the IGSA-FWNN model could be a powerful tool for online quality index prediction in the PP process.

Table 3.

Comparison between the current work and the published literature.

Literature	Method	MAE	MRE(%)	RMSE	TIC	STD
Cao et al. (1999)	Adaptive RBF	0.10	-	0.62	-	-
Shi and Liu (2006)	WLS-SVM	0.0754	3.27	-	0.0223	0.1055
Kim and Yeo (2010)	Mechanism Model	-	-	0.46	-	-
Ahmed et al. (2013)	New RPLS	-	-	0.1466	-	-
Xu and Liu (2014)	FF-D-FNN	0.0210	0.83	0.0289	0.0058	-
This paper	IGSA-FWNN	0.0179	0.71	0.0233	0.0046	0.0237

The above calculations were performed on a personal computer with an Intel(R) Core(TM) i5-4210M CPU at 2.60 GHz. The computation times of the WNN, FWNN and IGSA-FWNN models are 1.26 s, 2.73 s and 4.92 s, respectively, increasing with the complexity of the model. Owing to the adoption of the IGSA, the computation time of the proposed IGSA-FWNN (4.92 s) is roughly 1.8 times that of the FWNN (2.73 s). Since the sampling interval of the industrial MI measurement is about 2 hours, this computation time is negligible, and the proposed method is suitable for use as an online soft sensor for MI prediction. The computational complexity of the IGSA algorithm is O(N), where N is the size of the IGSA population.
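The claim that the computation time is compatible with online use can be checked by comparing each model's reported time with the sampling interval. The sketch below does this arithmetic; it assumes a sampling interval of exactly 2 hours (the text says "about 2 hours"), and the names `SAMPLING_INTERVAL_S`, `COMPUTATION_TIME_S`, `budget_fraction` and `slowdown` are illustrative.

```python
# Sampling interval of the laboratory MI measurement, assumed to be exactly 2 h.
SAMPLING_INTERVAL_S = 2 * 3600

# Computation times reported in the text, in seconds.
COMPUTATION_TIME_S = {"WNN": 1.26, "FWNN": 2.73, "IGSA-FWNN": 4.92}


def budget_fraction(model: str) -> float:
    """Fraction of one sampling interval consumed by the model's computation."""
    return COMPUTATION_TIME_S[model] / SAMPLING_INTERVAL_S


def slowdown(model: str, baseline: str) -> float:
    """How many times longer `model` takes than `baseline`."""
    return COMPUTATION_TIME_S[model] / COMPUTATION_TIME_S[baseline]


if __name__ == "__main__":
    for name in COMPUTATION_TIME_S:
        print(f"{name}: {100 * budget_fraction(name):.3f}% of the sampling interval")
    print(f"IGSA-FWNN vs FWNN: {slowdown('IGSA-FWNN', 'FWNN'):.2f}x")
```

Under these assumptions even the slowest model uses less than 0.1% of the time available between two MI samples.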

To verify the generality of the proposed IGSA-FWNN approach, another dataset, of brand F400 from a real PP plant, is used to train and test the proposed model. The dataset is divided into 50 pairs for training and 30 pairs for testing. The prediction results of the different models on this second dataset are shown in Table 4 and Figure 7. The IGSA-FWNN model again outperforms the other models, reducing the MRE by 44.4%, from 2.16% to 1.20%, compared with the WNN model. The MAE, RMSE, TIC and STD are reduced as well. Figure 7 gives a clearer contrast between the models: as can be seen, the proposed IGSA-FWNN model exhibits smaller estimation errors than the other models. The experimental results show that the proposed IGSA-FWNN model has better MI prediction accuracy and generalization ability.

Table 4.

Performance of the prediction models on another testing dataset of brand F400.

Model	MAE	MRE(%)	RMSE	TIC	STD
WNN	0.0551	2.16	0.0679	0.0131	0.0437
FWNN	0.0410	1.62	0.0533	0.0103	0.0346
IGSA-FWNN	0.0307	1.20	0.0403	0.0078	0.0265

Figure 7.

Performance on another testing dataset of brand F400.
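The division of the brand F400 data into 50 training pairs and 30 testing pairs can be sketched as follows. The text does not say how the pairs were assigned, so this example assumes that the pairs are kept in their recorded order, with the first block used for training and the following block for testing; the function name `split_pairs` and the dummy data are hypothetical.

```python
def split_pairs(pairs, n_train=50, n_test=30):
    """Split (input, output) pairs into a leading training block and the
    testing block that follows it, preserving the original order."""
    if len(pairs) < n_train + n_test:
        raise ValueError("not enough pairs for the requested split")
    train = pairs[:n_train]
    test = pairs[n_train:n_train + n_test]
    return train, test


if __name__ == "__main__":
    # Dummy pairs standing in for (process variables, MI) samples.
    dummy = [((float(i),), float(i)) for i in range(80)]
    train, test = split_pairs(dummy)
    print(len(train), len(test))
```

An order-preserving split avoids testing on samples recorded before the training samples, which is one reasonable choice for process data; a random split would be an equally simple alternative.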

Conclusion

This paper presents a novel forecasting approach for quality index prediction in the PP process that integrates the FWNN with the improved gravitational search algorithm (IGSA-FWNN). First, the FWNN model, which combines the concepts of fuzzy logic with the WNN, is developed. Second, the online learning algorithm of the FWNN model is derived using the gradient descent algorithm. Third, an IGSA algorithm is proposed to adapt the learning rates of the FWNN online. Finally, the proposed method is applied to predict the MI in a real industrial PP plant, and its forecasting results are compared with those of the WNN and FWNN models. The comparative tests show that the proposed soft sensor outperforms the other models, and its application to the testing and generalization data shows good performance and generalization ability. The soft sensor predicts the MI with an MRE of 0.71% on the testing dataset, which is more accurate than both the FWNN model (MRE of 1.15%) and the WNN model (MRE of 1.56%). The proposed soft sensor therefore shows promising potential for practical use.

Footnotes

Declaration of conflicting interest

The authors declare that there is no conflict of interest.

Funding

This work is supported by the Zhejiang Province Natural Science Foundation (LY16B040003, LY18D060002), the National Natural Science Foundation of China (Grant Nos. 61603336, 61590921), the Shanghai Aerospace Science and Technology Innovation Fund (E81502) and the Aerospace Science and Technology Innovation Fund of China Aerospace Science and Technology Corporation (E81601). Their support is hereby acknowledged.

References

Abiyev, Kaynak (2008) Fuzzy wavelet neural networks for identification and control of dynamic plants: A novel structure and a comparative study. IEEE Transactions on Industrial Electronics 55(8): 3133–3140.

Abiyev, Kaynak, Kayacan (2013) A type-2 fuzzy wavelet neural network for system identification and control. Journal of the Franklin Institute 350(7): 1658–1685.

Affonso, Sassi, Barreiros (2015) Biological image classification using rough-fuzzy artificial neural network. Expert Systems with Applications 42(24): 9482–9488.

Ahmed, Kim, Yeo (2013) Statistical data modeling based on partial least squares: Application to melt index predictions in high density polyethylene processes to achieve energy-saving operation. Korean Journal of Chemical Engineering 30(1): 11–19.

Amjady (2006) Day-ahead price forecasting of electricity markets by a new fuzzy neural network. IEEE Transactions on Power Systems 21(2): 887–896.

Cao, Guizeng, Bowen (1999) Prediction of polypropylene melt index based on robust and adaptive RBF networks. Control and Decision 14(4): 339–343.

Chen, Yang, Dong (2006) Time-series prediction using a local linear wavelet neural network. Neurocomputing 69(4–6): 449–465.

Daubechies (1992) Ten lectures on wavelets. Philadelphia, PA: SIAM.

Ding, Meng, Wang (2015a) The model equivalence based parameter estimation methods for Box-Jenkins systems. Journal of the Franklin Institute 352(12): 5473–5485.

Ding, Wang, Chen, Xiao (2015b) Recursive least squares parameter estimation for a class of output nonlinear systems based on the model decomposition. Circuits Systems & Signal Processing 35(9): 1–16.

Ding, Wang, Ding (2015c) Recursive least squares parameter identification algorithms for systems with colored noise using the filtering technique and the auxiliary model. Digital Signal Processing 37(1): 100–108.

Graves, Wayne, Reynolds, et al. (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626): 471–476.

Grossmann, Morlet (1984) Decomposition of Hardy function into square integrable wavelets of constant shape. SIAM Journal on Mathematical Analysis 15(4): 723–736.

Guo, Zhou, Luo (2015) Kinetic insight into electrochemically mediated ATRP gained through modeling. AIChE Journal 61(12): 4347–4357.

Han, Chung (2005) Melt index modeling with support vector machines, partial least squares, and artificial neural networks. Journal of Applied Polymer Science 95(4): 967–974.

(2016) Adaptive neural network control of an uncertain robot with full-state constraints. IEEE Transactions on Cybernetics 46(3): 620–629.

Jiang, Xiao, Liu (2012) Prediction of the melt index based on the relevance vector machine with modified particle swarm optimization. Chemical Engineering & Technology 35(5): 819–826.

Jiang, Shen (2014) A novel hybrid particle swarm optimization and gravitational search algorithm for solving economic emission load dispatch problems with various practical constraints. Electrical Power and Energy Systems 55(2): 628–644.

Kickert WJM, Mamdani (1978) Analysis of a fuzzy logic controller. Fuzzy Sets & Systems 1(1): 29–44.

Kim, Lim (2015) Fast sparsely synchronized brain rhythms in a scale-free neural network. Physical Review E 92(2): 022717. DOI: 10.1103/PhysRevE.92.022717

Kim, Yeo (2010) Development of polyethylene melt index inferential model. Korean Journal of Chemical Engineering 27(6): 1669–1674.

Zhou (2011) Parameters identification of hydraulic turbine governing system using improved gravitational search algorithm. Energy Conversion & Management 52(1): 374–381.

Liu, Jiang, Xiao (2012) Melt index prediction by adaptively aggregated RBF neural networks trained with novel ACO algorithm. Journal of Applied Polymer Science 125(2): 943–951.

Liu, Wang (2013) A one-layer projection neural network for nonsmooth optimization subject to linear equalities and bound constraints. IEEE Transactions on Neural Networks and Learning Systems 24(5): 812–824.

Melo, Watada (2016) Gaussian-PSO with fuzzy reasoning based on structural learning for training a neural network. Neurocomputing 172(1): 405–412.

Meng, Chen (2013) Event based agreement protocols for multi-agent networks. Automatica 49(7): 2125–2132.

Mirjalili, Hashim SZM, Sardroudi (2012) Training feedforward neural networks using hybrid particle swarm optimization and gravitational search algorithm. Applied Mathematics and Computation 218(22): 11125–11137.

Mohammed, Lim (2015) An enhanced fuzzy min-max neural network for pattern classification. IEEE Transactions on Neural Networks & Learning Systems 26(3): 417–429.

Nason (2008) Wavelet Methods in Statistics with R. New York: Springer.

Pratama, Anavatti, et al. (2016) An incremental meta-cognitive-based scaffolding fuzzy neural network. Neurocomputing 171(C): 89–105.

Rashedi, Nezamabadi-Pour, Saryazdi (2009) GSA: A gravitational search algorithm. Information Sciences 179(13): 2232–2248.

Shi, Chen, Shi (2014) An event-triggered approach to state estimation with multiple point- and set-valued measurements. Automatica 50(6): 1641–1648.

Shi, Liu (2006) Melt index prediction by weighted least squares support vector machines. Journal of Applied Polymer Science 101(1): 285–289.

Shi, Liu, Sun (2006) Melt index prediction by neural networks based on independent component analysis and multi-scale analysis. Neurocomputing 70(1): 280–287.

Solgi, Ganjefar (2018) Variable structure fuzzy wavelet neural network controller for complex nonlinear systems. Applied Soft Computing 64(3): 674–685.

Sombra, Valdez, Melin, Castillo (2013) A new gravitational search algorithm using fuzzy logic to parameter adaptation. In: IEEE Congress on Evolutionary Computation (CEC), Cancun, Mexico, 20–23 June 2013, pp. 1068–1074.

Tang, Liu, Zou, et al. (2017) An improved fuzzy neural network for traffic speed prediction considering periodic characteristic. IEEE Transactions on Intelligent Transportation Systems 18(9): 2340–2350.

Xu, Liu (2014) Melt index prediction by fuzzy functions with dynamic fuzzy neural networks. Neurocomputing 142(1): 291–298.

Yilmaz, Oysal (2010) Fuzzy wavelet neural network models for prediction and identification of dynamical systems. IEEE Transactions on Neural Networks 21(10): 1599–1609.

Zadeh (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems Man & Cybernetics SMC-3(1): 28–44.

Zadeh (1974) The concept of a linguistic variable and its application to approximate reasoning. Information Sciences 8(4): 301–357.

Zhang, Jin (2006) Inferential estimation of polymer melt index using sequentially trained bootstrap aggregated neural networks. Chemical Engineering & Technology 29(4): 442–448.

Zhang, Liu (2016) A real-time model based on optimized least squares support vector machine for industrial polypropylene melt index prediction. Journal of Chemometrics 30(6): 324–331.

Zhang, Benveniste (1992) Wavelet networks. IEEE Transactions on Neural Networks 3(6): 889–898.

Zhao, Gao, et al. (2014) Identification of nonlinear dynamic system using a novel recurrent wavelet neural network based on the pipelined architecture. IEEE Transactions on Industrial Electronics 61(8): 4171–4182.

Zhou, Luo (2015) An old kinetic method for a new polymerization mechanism: Toward photochemically mediated ATRP. AIChE Journal 61(6): 1947–1958.