Feature extraction and fault detection scheme via improved locality preserving projection and SVDD

Abstract

Manifold learning is widely adopted for the fault detection of industrial processes. However, the quality of low-dimensional embedding coordinates can be adversely affected by an ill-constructed graph Laplacian. An improved locality preserving projection (ILPP) scheme is proposed. ILPP is built on a geometrically inspired Laplacian, and the Riemannian metric is used to find a suitable bandwidth parameter. The proposed approach combines the advantages of ILPP in preserving manifold data structures and those of support vector data description (SVDD) in handling complex process data distributions. Case studies on helix data, a hot strip mill process, and the Pensim benchmark process demonstrate the utility and feasibility of the proposed approach. The average fault detection rate of the proposed ILPP is 99%, which is higher than that of locality preserving projection (LPP; 87.8%), local tangent space alignment (LTSA; 74.9%), and principal component analysis (PCA; 90.6%).

Keywords

Dimensionality reduction, manifold learning, process monitoring, metric learning, fault detection

Introduction

Progress in industrial processes has resulted in an increased demand for process safety and reliability. Process monitoring can ensure safe operation of the plant through early fault detection and management. With the advancement in software and hardware technologies, large amounts of data are collected for multivariate statistical process monitoring (MSPM) (Chong et al., 2020; Guo et al., 2019). MSPM methods aim to seek the intrinsic relationship among variables. Dimensionality reduction has become a fundamental premise for most data-driven approaches due to the intrinsic features of process data, namely non-linearity, high dimensionality, and complicated connections among process variables (Yin and Yan, 2021). Dimensionality reduction aims to condense high-dimensional data into a relatively small set of variables while retaining the original data structure (Lu et al., 2021).

There are two major research directions based on the data structure. The first direction considers a single structure of the entire process data, that is, either global or local. Principal component analysis (PCA), partial least squares (PLS), and independent component analysis (ICA) capture the global structure of the data (Basha et al., 2020). PCA is a widely adopted technique in actual industrial processes due to its concise derivation and easy implementation. However, PCA only captures the global structure of the original dataset by preserving variance information, whereas industrial process data are often time series that contain local geometric structures (Bounoua et al., 2020). Significant information is therefore lost, resulting in unsatisfactory monitoring results. This limitation has prompted the construction of more sophisticated non-linear methods within the framework of manifold learning.

Manifold learning offers a different viewpoint based on the interpretation of data lying on a manifold embedded in a much higher dimensional space (Chen et al., 2019). The main idea of manifold learning is to seek the local neighborhood structure such that the geometric features and characteristics of the manifold data are retained in a low-dimensional subspace (Orzechowski et al., 2020). These techniques are often referred to as spectral embedding methods (Ayesha et al., 2020). They include, among others, isometric feature mapping (Isomap) (Tenenbaum et al., 2000), local tangent space alignment (LTSA) (Dong et al., 2020), Laplacian eigenmaps (LEs) (Wang, 2012), local linear embedding (LLE) (Li et al., 2019), locality preserving projection (LPP) (He and Niyogi, 2004), and neighborhood preserving embedding (NPE) (He et al., 2005). Among these, LPP and NPE are widely used for data-driven process monitoring (He et al., 2018) because they generate an explicit linear map that can be obtained as easily as in PCA. Several variations of classical LPP and NPE have been proposed (Guo et al., 2022; Yao et al., 2022), such as improved local entropy locality preserving projections (ILELPP; Guo et al., 2019), which forms the local entropy LPP. The method removes the non-Gaussian characteristics using entropy LPP and showed improved process monitoring.

The second direction combines PCA with LPP or NPE to use both global and local information. Following this idea, Zhang et al. (2011) proposed global–local structure analysis (GLSA), combining PCA with LPP for fault detection. Yu (2012) proposed local and global PCA (LGPCA) by considering both the local and global information. Several new methods have since been formulated along these lines (He and Xu, 2016; Luo et al., 2016; Tong and Yan, 2014; Zhan et al., 2019). The major difference among these methods lies in the way they integrate the objective functions of PCA and LPP. Similarly, Chen et al. (2019), Cui et al. (2021), Miao et al. (2015), and Tan et al. (2019) combine PCA with NPE. These new manifold learning methods showed significant improvement in fault detection performance.

Despite all the advances mentioned above, there are still some gaps in these methods (Orzechowski et al., 2020; Perraul-Joncas and Meila, 2017; Shah et al., 2022). The most important criticism is that manifold learning methods fail to reveal the underlying geometric structure in many cases (Perraul-Joncas and Meila, 2017). Our motivation focuses on the geometrical aspects of process data through manifold learning. Almost all manifold learning methods construct a graph Laplacian $L$ as their preliminary step. Empirically, the graph Laplacian $L$ stores the intrinsic geometry of a manifold. The construction of $L$ requires a bandwidth parameter $ϵ$ , which defines the size of a local neighborhood. However, there seems to be no consistent procedure for computing $ϵ$ . The most common approach is based on cross-validation in supervised problems. Unfortunately, in an unsupervised problem, there is no clear way to apply cross-validation (McQueen et al., 2017). Consequently, manifold learning methods and extensions defined based on $L$ suffer from this shortcoming. It is argued that an ill-constructed $L$ leads to degraded process monitoring performance.

The significance of the proposed method is twofold. First, it provides a framework to construct a geometrically inspired Laplacian, which is critical in all manifold learning settings. Second, support vector data description (SVDD) is adopted, as it can handle more complex process data distributions than traditional statistical tools. The diagram of the improved locality preserving projection (ILPP) method is shown in Figure 1.

Figure 1.

Diagram of the ILPP method.

The contributions and novelties are listed as follows:

To overcome the problem of geometric distortion in the existing manifold learning literature, this paper puts forward an ILPP-based manifold learning method using the Riemannian metric.

A novel qualitative metric for comparing the obtained embeddings with desired embeddings is discussed. In particular, when the Riemannian metric converges to the unit matrix, the geometric distortion is minimized.

The concept behind ILPP is general and has the potential to be extended to Laplacian-based manifold learning methods; however, this work is limited to the LPP.

In addition to fault detection of the hot strip mill process (HSMP) and the Pensim benchmark process, ILPP is applied for feature extraction of the Helix data. Experimental results are validated using the three different quality indices.

The remainder of the paper is structured as follows. The “Preliminaries” section provides preliminary information on the core ideas of the Riemannian manifold and the graph Laplacian. Details of the proposed framework are discussed in the “Algorithm formulation” section, which covers LPP, Riemannian metric estimation, computation of the bandwidth parameter $ϵ$ for the graph Laplacian, and the evaluation criteria. The “Process monitoring model based on the ILPP” section establishes process monitoring procedures based on the proposed ILPP method. Case studies related to feature extraction of the helix data and fault detection of two different industrial processes are discussed in the “Experimental verification” section. Finally, the “Conclusion” section provides the conclusions.

Preliminaries

The central theme of manifold learning relies on the manifold assumption, which states that data exist on a low-dimensional manifold embedded in a higher dimensional Euclidean space. The Riemannian metric embodies the geometry of the underlying manifold. The related concepts used in this work are defined below.

Riemannian manifold

Formally, consider a Riemannian manifold $M$ of dimension $d$ lying in a higher dimensional ambient space, that is, $M \subset R^{D}$ . The Riemannian metric is a symmetric positive definite metric that embodies the geometry of the Riemannian manifold and defines an inner product on the tangent space $T_{p} M$ at each point $p$ ∈ $M$ . The inner product is used to compute geometrical quantities such as angles, lengths, and volumes. Thus, the Riemannian manifold $(M, g)$ has the Riemannian metric $g$ attached to it. The Riemannian metric $g$ is typically not available for an unknown manifold. Considering this, Perraul-Joncas and Meila (2017) put forward an approach to compute the Riemannian metric from a given graph Laplacian $L$ . Further details on Riemannian geometry can be found in Lee (2018).

Graph Laplacian

The graph Laplacian $L$ has become increasingly popular in machine learning because of its wide usage in semi-supervised learning, spectral clustering, and dimensionality reduction problems. The graph Laplacian $L$ uses local neighborhood graphs to model low-dimensional data lying on a manifold $M$ .

Mathematically, consider $n$ data points $X_{n} = \{x_{i}, i = 1, 2, \dots, n\}$ sampled from $M$ , where $M$ is a Riemannian manifold lying in $D$ -dimensional space, that is, $M \subset R^{D}$ (Calder and Slepčev, 2019). Construct a weighted graph $G (V, E)$ that connects nearby points, with weight matrix $W = [w_{ij}]_{i, j = 1 : n}$ . The weight $w_{ij}$ between $x_{i}$ and $x_{j}$ is given by the heat kernel in equation (1)

W_{ij} = \exp \left( - \frac{∥ x_{i} - x_{j} ∥_{2}^{2}}{ϵ^{2}} \right)

(1)

where $ϵ$ denotes the user-provided bandwidth parameter. There have been few attempts to derive the optimal rate for $ϵ$ . The most relevant is that of Singer (2006), which gives, for sample size $n$ and manifold intrinsic dimension $d$ ,

ϵ = \frac{C (M)}{n^{(\frac{1}{3 + d / 2})}}

(2)

The difficulty with computing $ϵ$ from equation (2) is that $C (M)$ is a constant that depends on the geometry of the manifold $M$ , which is rarely known in advance. The method of Singer (2006) is therefore impractical for computing $ϵ$ . In contrast, Joncas et al. (2017) proposed a framework to approximate $ϵ$ using the Riemannian metric $g$ . We extend this work to develop an improved LPP.

Following equation (1), the similarity matrix $W$ is obtained, together with the diagonal degree matrix $D$ defined by $D_{ii} = \sum_{j = 1}^{n} W_{ij}$ . Then, the graph Laplacian $L$ is formed as follows

L = D^{- 1} W - I

(3)

where $I$ represents $n \times n$ identity matrix.
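The construction in equations (1) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: it assumes the usual decaying form of the heat kernel, $\exp(-∥x_{i} - x_{j}∥^{2} / ϵ^{2})$, a dense (fully connected) graph without nearest-neighbor truncation, and samples stored as rows; the function name `heat_kernel_laplacian` is hypothetical.

```python
import numpy as np

def heat_kernel_laplacian(X, eps):
    """Return (W, D, L) for samples stored as rows of X, using bandwidth eps."""
    # Pairwise squared Euclidean distances ||x_i - x_j||^2
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)        # clip tiny negatives from round-off

    W = np.exp(-sq_dists / eps**2)              # heat kernel weights, equation (1)
    deg = W.sum(axis=1)                         # D_ii = sum_j W_ij
    D = np.diag(deg)
    L = W / deg[:, None] - np.eye(X.shape[0])   # L = D^{-1} W - I, equation (3)
    return W, D, L
```

Because $D^{-1} W$ is row-stochastic, every row of the resulting $L$ sums to zero, which is a quick sanity check on any implementation.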

Algorithm formulation

LPP

LPP is a linear dimensionality reduction method that uses data points to generate an adjacency map (He and Niyogi, 2004). Then, the graph Laplacian $L$ projects the higher dimensional data to a lower dimensional Euclidean space. This projection matrix typically preserves the local information of manifold data. LPP seeks a projection matrix $P$ , which projects a set of high-dimensional data $X = {x_{1}, x_{2}, . . ., x_{n}}$ ∈ $R^{n \times D}$ to a lower dimensional representation $Y = {y_{1}, y_{2}, . . ., y_{n}}$ ∈ $R^{n \times d}$ and $d << D$ (Lu et al., 2018). Then, the algorithm for the LPP can be formed as follows:

Construct an adjacency graph: Let $G (V, E)$ denote a graph with $n$ nodes. Then, put an edge between nodes $i$ and $j$ if $x_{i}$ and $x_{j}$ are close. Adjacency graph is formed from the k nearest neighbors, that is, the nodes $i$ and $j$ are connected together, if node $i$ is in the k nearest neighbors of node $j$ or node $j$ is in the k nearest neighbors of node $i$ .

Calculate weighted graph Laplacian: Construct a weighted graph Laplacian as described in the “Graph Laplacian” section. The most important parameter is the bandwidth $ϵ$ . The procedure to compute $ϵ$ is discussed in the “Computing bandwidth for Laplacian” section.

Compute projection matrix: Projection matrix $P \in R^{D \times d}$ is constructed by solving the generalized eigenvalue problem.

XL X^{T} P = λ XD X^{T} P

(4)

The eigenvalue decomposition of equation (4) generates the eigenvectors $P = [p_{1}, p_{2}, \dots, p_{d}]$ arranged in ascending order of their eigenvalues $λ_{1} < λ_{2} < \dots < λ_{d}$ . Given the data matrix $X$ ∈ $R^{n \times D}$ , the low-dimensional embeddings $Y$ ∈ $R^{n \times d}$ $(d \ll D)$ are formed by equation (5)

Y = P^{T} X

(5)

The rest of the LPP method is summarized in Algorithm 1, as adapted from He and Niyogi (2004).
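The projection step can be illustrated with the following sketch, which should be read as one possible realization under stated assumptions rather than the exact procedure used here. It stores samples as rows, so the eigenproblem of equation (4) appears as $X^{T} L X p = λ X^{T} D X p$ and the embedding as $Y = X P$ . It also uses the symmetric Laplacian $L = D - W$ of He and Niyogi (2004) instead of the normalized form in equation (3), so that a symmetric eigen-solver applies. The function name `lpp` and the small `ridge` regularizer are hypothetical additions.

```python
import numpy as np

def lpp(X, W, d, ridge=1e-9):
    """Sketch of LPP for samples stored as rows of X (n x D_amb).

    Returns the projection matrix P (D_amb x d), the embedding Y = X P (n x d),
    and the d smallest generalized eigenvalues.
    """
    deg = W.sum(axis=1)
    D = np.diag(deg)
    L = D - W                                   # symmetric Laplacian (assumption, see lead-in)
    A = X.T @ L @ X                             # left-hand matrix of the eigenproblem
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # right-hand matrix, lightly regularized

    # Reduce A p = lambda B p to a standard symmetric problem via Cholesky B = C C^T
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = 0.5 * (M + M.T)                         # enforce symmetry against round-off
    eigvals, V = np.linalg.eigh(M)              # eigenvalues in ascending order
    P = C_inv.T @ V[:, :d]                      # eigenvectors of the d smallest eigenvalues
    return P, X @ P, eigvals[:d]
```

With this reduction the returned eigenvectors satisfy $P^{T} (X^{T} D X) P = I$ , the normalization commonly imposed in LPP.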

Computing the Riemannian metric

A Riemannian manifold $(M, g)$ is a smooth manifold $M$ with a Riemannian metric $g$ defined on the tangent space $T_{p} M$ of the manifold at each point $p$ . The Laplace–Beltrami operator $Δ_{M}$ is essential in obtaining the Riemannian metric $g$ for a given manifold $M$ . As given in equation (6), the Laplace–Beltrami operator $Δ_{M}$ is directly related to the Riemannian metric $g$ (Lee, 2018)

Δ_{M} = \frac{1}{\sqrt{\det (g)}} \sum_{i, j = 1}^{d} \frac{\partial}{\partial x^{i}} \left( g^{ij} \sqrt{\det (g)} \frac{\partial}{\partial x^{j}} \right)

(6)

Following the work of Perraul-Joncas and Meila (2017), Riemannian metric $g$ can be computed by inverting equation (6) for a given Laplace–Beltrami operator $Δ_{M}$ . Then, the Riemannian metric is defined in equation (7)

H (p)^{ij} = \frac{1}{2} Δ_{M} \left( x^{i} - x^{i} (p) \right) \left( x^{j} - x^{j} (p) \right) \big|_{x = x (p)}

(7)

A well-known result states that the graph Laplacian $L$ converges to the Laplace–Beltrami operator $Δ_{M}$ if the data are sampled from a Riemannian manifold $M$ (He and Niyogi, 2004). In other words, the graph Laplacian $L$ can replace the Laplace–Beltrami operator $Δ_{M}$ in equation (7).

The Laplacian $L$ ∈ $R^{n \times n}$ is constructed from the weighted neighborhood graph as given in equation (3) in the “Graph Laplacian” section. In fact, constructing the graph Laplacian is central to several manifold learning methods.

Given the input data matrix $X$ ∈ $R^{n \times D}$ to the LPP method, the low-dimensional embeddings $Y$ ∈ $R^{n \times d}$ are obtained. Following Perraul-Joncas and Meila (2017), equation (7) can be written in terms of the low-dimensional embeddings $Y$ and the graph Laplacian $L$ (which replaces $Δ_{M}$ ), as given in equation (8) (see Perraul-Joncas and Meila, 2017 for more details)

H^{ij} = \frac{1}{2} [L (Y^{i} . Y^{j}) - Y^{i} . (L Y^{j}) - Y^{j} . (L Y^{i})]

(8)

In matrix notation, the Jacobian is $J = df_{p}$ and the Riemannian metric is $G = g$ . $H$ represents the discrete variant of the push-forward Riemannian metric. It is stored as an $n \times d \times d$ array, with one $d \times d$ matrix calculated at each of the $n$ data points, and the indices $i$ and $j$ range from 1 to $d$ . The rest of the Riemannian metric computation procedure is summarized in Algorithm 2, as adapted from Perraul-Joncas and Meila (2017). The algorithm takes the graph Laplacian and the low-dimensional embedding coordinates as input.

Algorithm 1. LPP $(X, d, L, D)$ .
Input: Data matrix $X$ , number of dimensions $d$ , Laplacian $L$ , degree matrix $D$
Compute $P_{1:d}$ using equation (4): $[P, λ] \leftarrow eig (XL X^{T} P = λ XD X^{T} P)$
$Y \leftarrow P_{1:d}^{T} X$
Return $Y$

Algorithm 2. Riemannian metric computation $(Y, L)$ (Perraul-Joncas and Meila, 2017).
Input: Laplacian $L$ , low-dimensional embeddings $Y$ from the LPP
$H^{ij} \leftarrow \frac{1}{2} [L (Y^{i} . Y^{j}) - Y^{i} . (L Y^{j}) - Y^{j} . (L Y^{i})]$
Return $H$

Up to this point, the method to compute the Riemannian metric using the graph Laplacian and the low-dimensional embeddings has been discussed.
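Equation (8) translates almost directly into array code. The sketch below is illustrative only; it assumes that the dot in equation (8) denotes the elementwise product of embedding coordinates, and the function name `riemannian_metric` is hypothetical.

```python
import numpy as np

def riemannian_metric(Y, L):
    """Evaluate equation (8); returns H with shape (n, d, d)."""
    n, d = Y.shape
    LY = L @ Y                                  # column i holds L Y^i
    H = np.empty((n, d, d))
    for i in range(d):
        for j in range(i, d):
            term = L @ (Y[:, i] * Y[:, j])      # L (Y^i . Y^j), elementwise product
            h_ij = 0.5 * (term - Y[:, i] * LY[:, j] - Y[:, j] * LY[:, i])
            H[:, i, j] = h_ij
            H[:, j, i] = h_ij                   # H is symmetric in (i, j)
    return H
```

When $L = D^{-1} W - I$ as in equation (3), expanding the three terms shows that the matrix at point $p$ equals $\frac{1}{2} \sum_{q} (D^{-1} W)_{pq} (Y_{q} - Y_{p}) (Y_{q} - Y_{p})^{T}$ , a weighted local covariance of the embedding, so each $H (p)$ is symmetric positive semi-definite.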

Computing bandwidth for Laplacian

The Riemannian metric is a powerful tool because it embodies the manifold geometry. The Laplacian $L$ stores this geometry in the form of a local neighborhood graph, and the quality of $L$ depends strongly on the bandwidth parameter $ϵ$ . The Riemannian metric converges to the unit matrix $I$ if the geometry is preserved in the low-dimensional space (McQueen et al., 2017). Based on this fact, the distortion measure $Q$ is defined to tune the bandwidth parameter $ϵ$ by computing the deviation of the Riemannian metric from the unit matrix. Mathematically, the distortion at a point is expressed as $∥ H - I ∥$ . The closer this expression is to 0, the better the value of $ϵ$ (Joncas et al., 2017)

Q = \frac{1}{n} \sum_{i = 1}^{n} ∥ H (x_{i}) - I_{d} ∥

(9)

A good value for the bandwidth parameter $ϵ$ of the graph Laplacian $L$ can be found by minimizing the distortion measure of equation (9), as given below.

ϵ = \arg \min_{ϵ} Q .

(10)

Algorithm 3 provides a framework to compute the bandwidth parameter $ϵ$ for the Laplacian $L$ , motivated by Joncas et al. (2017). The Laplacian $L$ is used as a preliminary step for constructing LPP and consequently helps in generating geometrically inspired low-dimensional embedding coordinates $Y$ . The idea is to compute the distortion $Q$ for a set of $ϵ$ values using equation (9) and then choose the $ϵ$ for which $Q$ is minimized, following equation (10). Finally, the chosen $ϵ$ value is used to generate $L$ for LPP, which helps recover the original data geometry. The diagram for the proposed ILPP method is shown in Figure 1.

Algorithm 3. Algorithm for the ILPP method $(X, ϵ, d)$ .
Input: Data matrix $X \in R^{n \times D}$ , number of dimensions $d$ , candidate set $ϵ = \{ϵ_{1}, ϵ_{2}, \dots, ϵ_{N}\}$
Initialize $ϵ, Q$
for each $ϵ_{i}$ do
    Compute weight matrix $W$ using equation (1)
    $L \leftarrow D^{- 1} W - I$
    $Y \leftarrow$ Locality preserving projection $(X, d, L, D)$
    $H \leftarrow$ Riemannian metric computation $(Y, L)$
    Calculate distortion $Q_{i}$ using equation (9)
    Save each $Q_{i}$
end for
$ϵ \leftarrow \arg \min_{ϵ} Q$
Update equation (2) with $ϵ$ value.
Recompute $Y$ with updated $ϵ$ .
Return $Y$
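The search loop of Algorithm 3 can be sketched as a grid search over candidate bandwidths. The code below is a simplified illustration under assumptions: the norm in equation (9) is taken to be the Frobenius norm, and `metric_for_eps` is a hypothetical callback that, for a given $ϵ$ , would build $L$ , run LPP, and return the $n \times d \times d$ array $H$ of equation (8).

```python
import numpy as np

def distortion(H):
    """Equation (9): mean deviation of H(x_i) from the identity (Frobenius norm assumed)."""
    d = H.shape[1]
    return float(np.mean(np.linalg.norm(H - np.eye(d), axis=(1, 2))))

def select_bandwidth(eps_grid, metric_for_eps):
    """Return (best_eps, Q_values) by evaluating the distortion on a grid of bandwidths."""
    Q = np.array([distortion(metric_for_eps(eps)) for eps in eps_grid])
    return float(eps_grid[int(np.argmin(Q))]), Q

# Toy usage: a made-up metric that equals the identity only at eps = 2
toy_metric = lambda eps: np.tile((eps / 2.0) * np.eye(2), (5, 1, 1))
best_eps, Q = select_bandwidth(np.linspace(0.5, 4.0, 8), toy_metric)
```

Keeping the metric computation behind a callback separates the search logic from the choice of embedding method, which is consistent with the remark that the idea extends to other Laplacian-based methods.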

Quality indices

In this work, we consider three different evaluation metrics to justify the integrity of proposed ILPP approach.

Distortion: Distortion $Q$ is explained in the “Computing bandwidth for Laplacian” section. It relies on the fact that the Riemannian metric embodies the geometry of any Riemannian manifold, so $H$ can be used to select a suitable bandwidth parameter $ϵ$ . We experimented with 1000 values of $ϵ$ between 0 and 10 and computed the corresponding distortion $Q$ for each $ϵ$ using equation (9). The value of $ϵ$ for which the distortion $Q$ is minimum is then chosen. The lower the $Q$ , the better the quality of the embedding space.

Trustworthiness and continuity: These rank-based metrics (Lee and Verleysen, 2009) are utilized to assess the effectiveness of the manifold learning methods. The distances from point $i$ to its $k$ closest neighbors in the high-dimensional space are converted to rank order, and the extent to which each rank changes in the low-dimensional space is measured. Let $R_{X_{i, j}}$ denote the rank of the distance between samples $i$ and $j$ in the higher dimensional space $X$ for $n$ samples. Similarly, let $R_{Y_{i, j}}$ denote the rank of the distance between samples $i$ and $j$ in the lower dimensional space $Y$ . An embedding is trustworthy if the $k$ nearest neighbors of sample $i$ in the lower dimensional space are also close to it in the higher dimensional space.

Continuity $C$ is the counterpart of trustworthiness and measures the degree to which the actual data geometry is preserved: the $k$ nearest neighbors of sample $i$ in the higher dimensional space should remain close to it in the lower dimensional space $Y$ , so its penalty is based on the ranks $R_{Y_{i, j}}$ . Trustworthiness and continuity are written mathematically as shown in equation (11)

\begin{matrix} T_{k} = 1 - \frac{2}{nk (2 n - 3 k - 1)} \sum_{i = 1}^{n} \sum_{X} (R_{X_{i, j}} - k) \\ C_{k} = 1 - \frac{2}{nk (2 n - 3 k - 1)} \sum_{i = 1}^{n} \sum_{Y} (R_{Y_{i, j}} - k) \end{matrix}

(11)

The $T$ and $C$ scores range between 0 and 1, and these scores are closer to 1 in ideal scenario of geometric preservation. The three evaluation criteria are calculated and demonstrated in the “Experimental verification” section for each dataset.
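A compact way to evaluate equation (11) is sketched below. It is a minimal illustration that assumes the standard index sets: for $T_{k}$ the inner sum runs over samples that are among the $k$ nearest neighbors of $i$ in $Y$ but not in $X$ , and for $C_{k}$ the roles of $X$ and $Y$ are swapped. It also assumes distinct samples so that ranks are unambiguous. The helper names are hypothetical.

```python
import numpy as np

def _ranks(Z):
    """ranks[i, j] = rank of sample j by distance from sample i (self has rank 0)."""
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(Z.shape[0])[:, None]
    ranks[rows, order] = np.arange(Z.shape[0])[None, :]
    return ranks

def trustworthiness(X, Y, k):
    """T_k of equation (11) for original data X and embedding Y (rows are samples)."""
    n = X.shape[0]
    r_x, r_y = _ranks(X), _ranks(Y)
    # Neighbors in the embedding that were not neighbors in the original space
    intruders = (r_y >= 1) & (r_y <= k) & (r_x > k)
    penalty = np.sum((r_x - k)[intruders])
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty

def continuity(X, Y, k):
    """C_k of equation (11): the same computation with the roles of X and Y swapped."""
    return trustworthiness(Y, X, k)
```

An embedding that preserves every neighborhood exactly, such as a copy of the original data, scores 1 on both indices.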

Process monitoring model based on the ILPP

The idea of the SVDD is based on the one-class classification (Yuan et al., 2020). It seeks to find the lowest volume hyper-sphere in a high-dimensional ambient space that encompasses the majority of the fault-free data. However, during the faulty event, the fault data reside outside the hyper-sphere (Huang and Yan, 2016).

In industrial processes, the relationships among process variables are quite complex. Some variables are linearly related, others are non-linearly related, and some are independent of or uncorrelated with each other. In comparison to the classical statistical tools, SVDD can handle more complex data distributions and relationships among different process variables (Li et al., 2021).

Mathematically, consider training samples $y_{i} \in R^{d} (i = 1, 2, \dots, n)$ . SVDD maps the training samples $y_{i}$ to a high-dimensional feature space $f$ using a non-linear mapping function $ϕ : y \to f$ . In the feature space, SVDD determines the smallest sphere of radius $R > 0$ that encloses the normal-class samples, as follows

\min \left( R^{2} + C \sum_{i = 1}^{n} Ψ_{i} \right), \quad \text{s.t.} \; ∥ ϕ (y_{i}) - a ∥^{2} \leq R^{2} + Ψ_{i}, \; Ψ_{i} \geq 0

(12)

where $a$ represents the center of the hyper-sphere and $R$ its radius. $Ψ_{i}$ is the relaxation (slack) factor, and $C$ controls the trade-off between the volume of the hyper-sphere and the number of samples left outside it. The dual form of the optimization problem (Tax and Duin, 2004) can be obtained as follows

\begin{matrix} \max_{α_{i}} \sum_{i = 1}^{n} α_{i} K (y_{i}, y_{i}) - \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} K (y_{i}, y_{j}) \\ \text{s.t.} \; 0 \leq α_{i} \leq C, \; \sum_{i = 1}^{n} α_{i} = 1 \end{matrix}

(13)

where $α_{i}$ is a Lagrange multiplier and $K (y_{i}, y_{j}) = \langle ϕ (y_{i}), ϕ (y_{j}) \rangle$ is a kernel function that computes the inner product in the feature space. Several choices of kernel function are available, as listed below:

Linear: $y_{i}^{T} y_{j}$ ;

Polynomial: $(y_{i}^{T} y_{j} + c)^{b}$ ;

RBF: $\exp \left( - \frac{∥ y_{i} - y_{j} ∥^{2}}{δ^{2}} \right)$ ;

Gaussian: $\exp \left( - \frac{∥ y_{i} - y_{j} ∥^{2}}{σ} \right)$ ;

Sigmoid: $\tanh (k y_{i}^{T} y_{j} + ν)$ .

where $c, k$ , and $ν$ are constants, $b$ is the degree of polynomial, δ and σ are the widths of radial basis function (RBF) and Gaussian kernels, respectively.

Solving equation (13) yields the multipliers $α_{i}$ ; the training samples with $α_{i} > 0$ are the support vectors (SVs). Then, the radius of the hyper-sphere is calculated (Dong et al., 2020) as follows

R = \sqrt{1 - 2 \sum_{i = 1}^{n} α_{i} K (y_{s}, y_{i}) + \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} K (y_{i}, y_{j})}

(14)

where $y_{s}$ denotes one of the SVs.

Consider $z$ as a test sample. Then, the distance between the testing sample $z$ and the center of hyper-sphere is as follows

dis (z) = \sqrt{1 - 2 \sum_{i = 1}^{n} α_{i} K (z, y_{i}) + \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} K (y_{i}, y_{j})}

(15)

The process monitoring model is developed based on equation (15). The monitoring statistic $dis$ is the distance between a testing sample and the center of the hyper-sphere. Then, the fault detection logic is constructed as follows

\begin{matrix} Normal sample : dis (z) \leq R \\ Faulty sample : dis (z) > R \end{matrix}

(16)
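The SVDD model of equations (13) to (16) can be prototyped without a dedicated solver. The sketch below makes several simplifying assumptions that are not taken from this work: it uses the Gaussian kernel $\exp (- ∥ y_{i} - y_{j} ∥^{2} / σ)$ , fixes $C = 1$ so that only the simplex constraint on $α$ is active, writes the dual as the minimization of $α^{T} K α - \sum_{i} α_{i} K (y_{i}, y_{i})$ , and approximates its solution with Frank-Wolfe iterations. The radius follows equation (14) with $y_{s}$ taken as the sample with the largest $α_{i}$ . All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma) for all row pairs of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma)

def fit_svdd(Y, sigma, n_iter=2000):
    """Approximate the SVDD dual with C = 1 by Frank-Wolfe on the probability simplex."""
    n = Y.shape[0]
    K = gaussian_kernel(Y, Y, sigma)
    alpha = np.full(n, 1.0 / n)                 # feasible start: sum(alpha) = 1
    for t in range(n_iter):
        grad = 2.0 * K @ alpha - np.diag(K)     # gradient of alpha^T K alpha - sum_i alpha_i K_ii
        j = int(np.argmin(grad))                # best vertex of the simplex
        step = 2.0 / (t + 2.0)
        alpha *= 1.0 - step                     # convex combination keeps alpha feasible
        alpha[j] += step
    const = float(alpha @ K @ alpha)            # double sum shared by equations (14) and (15)
    s = int(np.argmax(alpha))                   # index of one support vector
    radius = float(np.sqrt(max(1.0 - 2.0 * float(K[s] @ alpha) + const, 0.0)))
    return alpha, const, radius

def svdd_distance(Z, Y, alpha, const, sigma):
    """Equation (15): distance of each row of Z from the hyper-sphere center."""
    cross = gaussian_kernel(Z, Y, sigma) @ alpha
    return np.sqrt(np.maximum(1.0 - 2.0 * cross + const, 0.0))

def is_faulty(Z, Y, alpha, const, radius, sigma):
    """Equation (16): flag samples whose distance exceeds the radius."""
    return svdd_distance(Z, Y, alpha, const, sigma) > radius
```

A production implementation would normally use a quadratic programming solver and tune $C$ and the kernel width; the iterative scheme here is only meant to make the roles of $α$ , $R$ , and $dis$ concrete.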

In summary, low-dimensional embedding coordinates $Y$ are obtained from the ILPP, following the procedure mentioned in Algorithm 3. Then, SVDD is modeled over it for fault detection. The monitoring procedure is illustrated in the flowchart of Figure 2.

Figure 2.

Flow chart of the ILPP method.

Monitoring procedures

Monitoring procedure is described as follows:

Offline modeling:

Step 1: Normalize the training data $X_{train}$ to zero mean and unit variance.

Step 2: Choose the bandwidth $ϵ$ parameter based on Algorithm 3.

Step 3: Generate low-dimensional embedding coordinates $Y_{train}$ using Algorithm 1.

Step 4: Setup monitoring model based on SVDD and calculate radius $R$ of hyper-sphere using equation (14).

Step 5: Define monitoring statistics $dis$ and set $R$ limit using equation (16).

Online monitoring:

Step 1: Normalize testing data $X_{test}$ based on the mean and variance of training data.

Step 2: Compute testing sample embedding coordinates $Y_{test}$ based on projection matrix $P$ of the original model.

Step 3: Calculate $dis$ for the testing sample in the feature space using equation (15).

Step 4: Compare the result against the $R$ limit based on equation (16). A fault is detected if the limit is violated.
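The data handling in the offline and online stages can be sketched as follows. This is an illustrative outline only: samples are assumed to be stored as rows, the projection matrix `P` is assumed to come from the offline ILPP model, and the helper names are hypothetical. The key point is that testing data reuse the training mean and standard deviation and the training projection.

```python
import numpy as np

def fit_scaler(X_train):
    """Offline Step 1: per-variable mean and standard deviation of the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std = np.where(std == 0.0, 1.0, std)        # guard against constant variables
    return mean, std

def apply_scaler(X, mean, std):
    """Online Step 1: normalize new data with the training statistics."""
    return (X - mean) / std

def project(X_scaled, P):
    """Online Step 2: embed normalized samples with the offline projection matrix P."""
    return X_scaled @ P

def raise_alarms(dis_values, radius):
    """Online Step 4: a fault is flagged where the statistic exceeds the R limit."""
    return np.asarray(dis_values) > radius
```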

Experimental verification

The experiments consider three case studies: feature extraction of synthetic helix data, fault detection of a real HSMP, and fault detection of the Pensim benchmark process. The evaluation criteria are those defined in the “Quality indices” section.

Case study 1—helix synthetic data

Consider a toroidal helix dataset, a low-dimensional manifold embedded in a higher dimensional ambient space with $D = 3$ . The helix dataset, with sample size $n = 2000$ and number of neighbors $K = 10$ , is projected into $d = 2$ dimensions using the four dimensionality reduction methods. The first step is to search for a suitable bandwidth parameter $ϵ$ using Algorithm 3. The lowest distortion $(Q = 0.22)$ is achieved at an $ϵ$ value of 2.1. Therefore, bandwidth $ϵ = 2.1$ is chosen to construct the ILPP. The distortion plot is shown in Figure 3. Second, score plots based on trustworthiness $T$ and continuity $C$ are shown in Figure 4. The high quality of the ILPP embedding is confirmed, as the scores are close to 1 for both $T$ and $C$ at $k = 10$ neighbors. For the other methods, $T$ and $C$ are comparatively poor.

Figure 3.

Distortion versus bandwidth parameter for the helix data.

Figure 4.

Trustworthiness and continuity against four different methods for the helix data: (a) trustworthiness and (b) continuity.

The low-dimensional projections based on the ILPP, LPP, LTSA, and PCA are shown in Figure 5. Data points are colored differently in each subplot based on their locations. The projected samples are not separated by LPP, LTSA, or PCA. ILPP, on the other hand, successfully separates the samples. More importantly, the ILPP results closely resemble the desired projections.

Figure 5.

The 2D projections of 3D helix data using four different methods.

Case study 2—HSMP

The HSMP is a fully automated steel manufacturing process (Zhang et al., 2019). Real-time monitoring of this process is therefore required to ensure the highest efficiency and consistent production. The schematic of the HSMP is illustrated in Figure 6.

Figure 6.

The hot strip mill process schematic.

The finishing mill process (FMP) is an essential part of the HSMP because it guarantees the continuous operation, consistency, and accuracy of the end product. A typical FMP consists of seven stands. Each stand is made up of two working rolls in the center and two supporting rolls on the outer sides. Furthermore, a hydraulic system is adopted in each stand to generate the rolling and bending forces needed to achieve the optimum strip thickness. The FMP dataset comprises 20 variables, which are listed in Table 1.

Table 1.

Process variables in the finishing mill process.

Variable #	Description	Unit
1–7	Average roll gap	mm
8–14	Rolling force	MN
15–20	Bending force	MN

This case study utilizes the FMP to illustrate the efficacy of the proposed ILPP approach. The finished strip thickness is the most critical consideration in determining the product quality. For modeling and monitoring purposes, two datasets of 3500 samples each, one from normal operation and one from faulty operation, are collected at a sampling interval of 10 ms. The gap control loop fault in the $F_{4}$ stand is investigated; the fault is present in the testing data for 12 seconds, from 10 to 22 seconds.

The distribution of first 15 process variables in FMP dataset is shown in Figure 7. The results verified that the variables follow non-Gaussian distribution. SVDD model is adopted considering the non-Gaussian distributions present in that dataset. The model is constructed on low-dimensional embeddings space based on the “Process monitoring model based on the ILPP” section and Gaussian kernel K is adopted as well.

K = \exp \left( - \frac{∥ y_{i} - y_{j} ∥^{2}}{σ} \right), \quad \text{with width } σ = 0.1

Figure 7.

The distribution of process variables in the finishing mill process.

LPP, LTSA, and PCA are chosen as comparison methods. All of these methods are implemented in a Python environment with their standard parameters, and the Gaussian kernel is chosen to develop the SVDD model, identical to the one adopted for the proposed ILPP.

Before proceeding, it is essential to determine the intrinsic dimension (Einbeck et al., 2020) of the FMP data. Instead of the cumulative percentage variance (which is commonly used for PCA), dimensionality from angle and norm concentration (DANCO; Ceruti et al., 2014) is utilized to compute a suitable number of dimensions for the lower dimensional space. For the FMP, the number of dimensions is set to 3 for the improved LPP method. As in the “Case study 1—helix synthetic data” section, the lowest distortion is achieved at an $ϵ$ value of 4.23. Therefore, bandwidth $ϵ = 4.23$ is chosen to construct the ILPP model.

Fault of gap control loop

This fault has a direct impact on the finished thickness of the strip, which jumps to a higher value as shown in Figure 8. For a fair comparison, the results of all methods are based on the SVDD statistic. The fault detection rate (FDR) and false alarm rate (FAR) are defined as follows

FDR = \frac{\text{No. of detected fault samples}}{\text{Total no. of fault samples}} \times 100 \%

FAR = \frac{\text{No. of normal samples identified as fault}}{\text{Total no. of normal samples}} \times 100 \%
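Both rates follow directly from the per-sample alarm decisions and the known fault labels. The short sketch below illustrates the two definitions on a made-up example; the function name `fdr_far` and the toy arrays are hypothetical.

```python
import numpy as np

def fdr_far(alarms, is_fault):
    """Return (FDR, FAR) in percent from boolean alarm flags and true fault labels."""
    alarms = np.asarray(alarms, dtype=bool)
    is_fault = np.asarray(is_fault, dtype=bool)
    fdr = 100.0 * np.sum(alarms & is_fault) / np.sum(is_fault)     # detected faults / all faults
    far = 100.0 * np.sum(alarms & ~is_fault) / np.sum(~is_fault)   # false alarms / all normal
    return float(fdr), float(far)

# Toy example: 4 normal samples followed by 4 faulty samples
labels = [0, 0, 0, 0, 1, 1, 1, 1]
alarms = [0, 1, 0, 0, 1, 1, 1, 0]
print(fdr_far(alarms, labels))  # (75.0, 25.0)
```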

Figure 8.

The strip thickness during healthy and faulty conditions.

Process monitoring statistics from the four dimensionality reduction methods, namely ILPP, LPP, LTSA, and PCA, are illustrated in Figure 9. The proposed ILPP approach detects nearly all the fault samples, with the highest FDR of about 99%. In addition, its FAR is satisfactory at about 0.17%. The other methods produce comparatively poor results. The FDR and FAR values are listed in Table 2.

Figure 9.

Monitoring charts for the fault of gap control loop: (a) ILPP, (b) LPP, (c) LTSA, and (d) PCA.

Table 2.

Fault detection rate/false alarm rate of the finishing mill process (%).

Methods	FDR (%)	FAR (%)
ILPP	99.2	0.17
LPP	85.4	4.60
LTSA	82.1	2.62
PCA	83.3	2.48

ILPP: improved locality preserving projection; LPP: locality preserving projection; LTSA: local tangent space alignment; PCA: principal component analysis.

Case study 3—Pensim benchmark process

Pensim is a widely adopted non-linear chemical process for penicillin manufacturing. The schematic of the Pensim benchmark process is shown in Figure 10, and the process has often been applied to fault detection (Peng et al., 2020). The process consists of two main operating regions: (a) pre-culture and (b) batch feeding. The first region takes 50 hours and the second takes 350 hours, so the total simulation time for the entire penicillin fermentation process is 400 hours.

Figure 10.

The Pensim benchmark process schematic.

The Jarque–Bera (JB) test (Thadewald and Büning, 2007), as implemented in Python, is used to verify the non-Gaussian distribution of the Pensim process variables. The null hypothesis of the test is that the variable follows a Gaussian distribution. The test returns two values: the JB statistic and the corresponding p-value. The p-value is the probability of obtaining a JB statistic at least as large as the observed one if the Gaussian hypothesis holds; if the p-value is below 0.05, the hypothesis of a Gaussian distribution is rejected. In Table 3, the JB column reports this decision as a binary value: 0 means that the Gaussian hypothesis is not rejected, and 1 means that it is rejected. All the variables are non-Gaussian except the substrate feed temperature, whose p-value of 0.052 lies above the 0.05 level. Since the penicillin fermentation process has non-Gaussian characteristics (Peng and RuiWei, 2021), SVDD-based fault detection is adopted in this work.

Table 3.

JB test result of process variables in the Pensim benchmark process.

Variable #	Description	Unit	Variable type	JB statistic	Corresponding p-value
1	Aeration rate	L/h	Input	1	0.007
2	Agitator power	W	Input	1	0.002
3	Substrate feed rate	L/h	Input	1	0.001
4	Substrate feed temperature	K	Input	0	0.052
5	Substrate concentration	g/L	Internal	1	0.001
6	Dissolved oxygen	mmol/L	Internal	1	0.001
7	Biomass concentration	g/L	Internal	1	0.001
8	Penicillin concentration	g/L	Internal	1	0.001
9	Culture volume	L	Internal	1	0.001
10	CO₂	mmol/L	Internal	1	0.001
13	Generated heat	kJ	Internal	1	0.001
11	pH	–	Controlled	1	0.001
12	Temperature	K	Controlled	1	0.001
14	Base water flow rate	mL/h	Manipulated	1	0.001
15	Acid water flow rate	mL/h	Manipulated	1	0.001
16	Cold water flow rate	L/h	Manipulated	1	0.001
17	Hot water flow rate	L/h	Manipulated	1	0.001
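The decision rule behind Table 3 can be sketched in plain Python. This is a minimal sketch, not the code used in the study. It assumes the asymptotic chi-square distribution with two degrees of freedom for the JB statistic, a 0.05 significance level, and that the 0/1 values in Table 3 are the reject/do-not-reject decision. The function name `jarque_bera` and the two synthetic samples are illustrative.

```python
import math
from statistics import NormalDist


def jarque_bera(x, alpha=0.05):
    """Return (h, jb, p); h = 1 rejects the Gaussian hypothesis at level alpha."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three samples")
    mean = sum(x) / n
    # Central moments with the biased (1/n) normalisation.
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of
    # freedom is exp(-x / 2) (asymptotic approximation for the JB statistic).
    p = math.exp(-jb / 2.0)
    h = 1 if p < alpha else 0
    return h, jb, p


# Two deterministic synthetic samples (hypothetical data, not Pensim data):
# evenly spaced quantiles of a standard normal and of an exponential law.
n = 500
gauss_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

h_gauss, jb_gauss, p_gauss = jarque_bera(gauss_like)
h_skewed, jb_skewed, p_skewed = jarque_bera(skewed)
print("Gaussian-like sample:", h_gauss, round(p_gauss, 3))
print("Skewed sample:", h_skewed, p_skewed)
```

The Gaussian-like sample is not rejected (h = 0), whereas the strongly skewed sample is rejected (h = 1), mirroring the two kinds of rows in Table 3.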

The process simulation is performed using the Pensim V2.0 software. It contains 17 variables, as listed in Table 3. Two datasets are collected, one under normal and one under fault conditions; each set covers 400 hours with a sampling time of 0.1 hour. Two fault conditions are considered in the simulation example: a ramp fault with an amplitude of 0.1 is added to variable 1 (fault 1) and to variable 2 (fault 2), in each case for a total duration of 150 hours, from 150 to 300 hours. The faults are described in Table 4.

Table 4.

Faults in the Pensim benchmark process.

Fault #	Description	Fault type	Amplitude	Duration (hour)
1	Reduction in aeration rate	Ramp	0.1	150–300
2	Reduction in agitator power	Ramp	0.1	150–300
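The fault scenario in Table 4 fixes the size of the dataset and of the fault window, which can be sketched as follows. The sample counts follow from the stated 400 hours and 0.1 hour sampling time. Reading the amplitude of 0.1 as a slope per hour is an assumption made here for illustration only, since the source does not state the unit, and the function name `ramp_fault` is hypothetical.

```python
SAMPLES_PER_HOUR = 10        # sampling time of 0.1 hour
TOTAL_HOURS = 400
FAULT_START_HOUR = 150
FAULT_END_HOUR = 300
RAMP_SLOPE_PER_HOUR = 0.1    # ASSUMPTION: amplitude 0.1 read as slope per hour


def ramp_fault(slope_per_hour=RAMP_SLOPE_PER_HOUR):
    """Return (offset, is_fault) for one faulty variable.

    offset[k]   -- deviation added to the simulated variable at sample k
                   (negative, because both faults are reductions)
    is_fault[k] -- True inside the fault window
    """
    n_samples = TOTAL_HOURS * SAMPLES_PER_HOUR
    start = FAULT_START_HOUR * SAMPLES_PER_HOUR
    end = FAULT_END_HOUR * SAMPLES_PER_HOUR
    offset = [0.0] * n_samples
    is_fault = [start <= k < end for k in range(n_samples)]
    for k in range(start, end):
        hours_since_onset = (k - start) / SAMPLES_PER_HOUR
        offset[k] = -slope_per_hour * hours_since_onset
    return offset, is_fault


offset, is_fault = ramp_fault()
print(len(offset), "samples in total,", sum(is_fault), "inside the fault window")
```

Under these settings each dataset holds 4000 samples, of which 1500 fall inside the fault window; the `is_fault` labels are the ground truth against which FDR and FAR are computed.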

ILPP is adopted to develop the fault detection model. The SVDD model is then constructed following the “Process monitoring model based on the ILPP” section, with a Gaussian kernel. The fault detection results of the proposed ILPP are compared with those of LPP, LTSA, and PCA.

Fault 1 is a ramp disturbance that causes a reduction in aeration rate. Figure 11 shows the monitoring charts of all the methods. The proposed ILPP successfully identifies the fault with the highest FDR, 99.9%. With LPP, the fault is first detected at the 170th hour, so its FDR is only 85.6%. LTSA gives the lowest FDR of the four methods, 77.2%. Finally, PCA reaches an FDR of 89.5%, the closest to that of the proposed ILPP method; however, its FAR of 3.23% is the highest among all the methods. The monitoring results of the four dimensionality reduction methods are listed in Table 5.

Figure 11.

Monitoring charts for the Pensim benchmark process—fault 1: (a) ILPP, (b) LPP, (c) LTSA, and (d) PCA.

Table 5.

Fault detection rate/false alarm rate of the Pensim benchmark process (%).

Fault indices	Monitoring results (%)
	ILPP		LPP		LTSA		PCA
	FDR	FAR	FDR	FAR	FDR	FAR	FDR	FAR
Fault 1	99.9	0.00	85.6	0.80	77.2	1.31	89.5	3.23
Fault 2	100	0.00	90.0	0.00	72.7	0.00	91.7	2.54
Average	99.9	0.00	87.8	0.40	74.9	0.65	90.6	2.88

ILPP: improved locality preserving projection; LPP: locality preserving projection; LTSA: local tangent space alignment; PCA: principal component analysis; FDR: fault detection rate; FAR: false alarm rate.

Like fault 1, fault 2 is a ramp disturbance, in this case causing a reduction in agitator power. The monitoring charts for all the methods are shown in Figure 12. The proposed ILPP identifies the fault with an FDR of 100%. With LPP, the fault is first detected at the 160th hour, so its FDR is 90.0%. LTSA gives the lowest FDR of the four methods, 72.7%. Finally, PCA reaches an FDR of 91.7%, the closest to that of the proposed ILPP method; however, the FAR produced by PCA, 2.54%, is the highest of all the methods. The monitoring results of the four methods are also listed in Table 5.

Figure 12.

Monitoring charts for the Pensim benchmark process—fault 2: (a) ILPP, (b) LPP, (c) LTSA, and (d) PCA.
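The link between the detection delay of LPP and its FDR can be checked with a short calculation. The sketch below assumes that the fault window runs from 150 to 300 hours with uniformly spaced samples and that no fault sample raises an alarm before the first detection; under these assumptions the delay gives an upper bound on the FDR. The function name `fdr_upper_bound` is illustrative.

```python
FAULT_START_HOUR = 150
FAULT_END_HOUR = 300


def fdr_upper_bound(first_detection_hour):
    """Largest FDR (%) possible when the first alarm occurs at the given hour."""
    window = FAULT_END_HOUR - FAULT_START_HOUR
    return 100.0 * (FAULT_END_HOUR - first_detection_hour) / window


# First-detection times and FDRs reported for LPP in the text and Table 5.
lpp_results = {"fault 1": (170, 85.6), "fault 2": (160, 90.0)}
bounds = {}
for fault, (detection_hour, reported_fdr) in lpp_results.items():
    bounds[fault] = fdr_upper_bound(detection_hour)
    print(f"{fault}: bound {bounds[fault]:.1f}%, reported {reported_fdr:.1f}%")
```

The bounds are about 86.7% for fault 1 and 93.3% for fault 2. The reported LPP values of 85.6% and 90.0% lie just below them, which is consistent with a late first detection followed by a small number of missed samples.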

In summary, ILPP demonstrates improved fault detection. After ILPP, PCA produces the next-best average FDR (90.6%), although its average FAR (2.88%) is the highest of the four methods. The average scores over both faults for all four methods are also listed in Table 5. ILPP produces the best averages, with an FDR of 99.9% and an FAR of 0.00%.
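The averages and the ranking of the methods can be reproduced from the per-fault entries of Table 5. The snippet below is a small consistency check, assuming only the values printed in the table; the dictionary layout and variable names are illustrative.

```python
# Per-fault results from Table 5 as (fault 1, fault 2), in percent.
TABLE5 = {
    "ILPP": {"FDR": (99.9, 100.0), "FAR": (0.00, 0.00)},
    "LPP":  {"FDR": (85.6, 90.0),  "FAR": (0.80, 0.00)},
    "LTSA": {"FDR": (77.2, 72.7),  "FAR": (1.31, 0.00)},
    "PCA":  {"FDR": (89.5, 91.7),  "FAR": (3.23, 2.54)},
}


def mean(values):
    return sum(values) / len(values)


averages = {
    method: {rate: mean(values) for rate, values in rates.items()}
    for method, rates in TABLE5.items()
}

# Methods ordered from highest to lowest average FDR.
by_fdr = sorted(averages, key=lambda m: averages[m]["FDR"], reverse=True)
# Method with the highest average FAR.
highest_far = max(averages, key=lambda m: averages[m]["FAR"])

for method in by_fdr:
    avg = averages[method]
    print(f"{method}: FDR {avg['FDR']:.2f}%, FAR {avg['FAR']:.3f}%")
print("Highest average FAR:", highest_far)
```

The unrounded means (for example 99.95% for the ILPP FDR and 2.885% for the PCA FAR) agree with the "Average" row of Table 5 to within the last printed digit. The ordering by average FDR is ILPP, PCA, LPP, LTSA, while PCA has the highest average FAR.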

A significant advantage of the proposed ILPP method based on the Riemannian metric is that the approach can be extended to other manifold learning methods that use a graph Laplacian as their building block: the geometrically inspired bandwidth parameter $ϵ$ is computed for the Laplacian, and a process monitoring method is then developed using SVDD or other related statistical tools.

Conclusion

This paper proposed a fault detection approach that departs from the existing manifold learning literature in process monitoring. It is motivated by the geometrical aspects of the data, which are typically ignored or poorly understood. Three points are worth noting. First, ILPP is developed using a geometrically inspired graph Laplacian, and the bandwidth parameter of the Laplacian is computed through the Riemannian metric. Second, the approach can be applied to a wide variety of manifold learning methods that rely on the Laplacian as their preliminary step. Third, ILPP is combined with SVDD to handle complex process data distributions. The effectiveness of ILPP is demonstrated through feature extraction on the helix data and fault detection on the HSMP and the Pensim benchmark process. The quantitative results in Tables 2 and 5 demonstrate the method’s superiority over LPP, LTSA, and PCA. The limitations of ILPP are its higher computational complexity, $O(n^{3})$ compared with $O(dn^{2})$ for PCA, and its ability to recover only the local geometry of the data. Future research will address the computational cost and the global structure preservation of the method.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Muhammad Zohaib Hassan Shah

Zahoor Ahmed

References

Ayesha, Hanif, Talib (2020) Overview and comparative study of dimensionality reduction techniques for high dimensional data. Information Fusion 59: 44–58.

Basha, Ziyan Sheriff, Kravaris, et al. (2020) Multiclass data classification using fault detection-based techniques. Computers and Chemical Engineering 136: 106786.

Bounoua, Benkara, Kouadri, et al. (2020) Online monitoring scheme using principal component analysis through Kullback-Leibler divergence analysis technique for fault detection. Transactions of the Institute of Measurement and Control 42(6): 1225–1238.

Calder, Slepčev (2020) Properly-weighted graph Laplacian for semi-supervised learning. Applied Mathematics & Optimization 82: 1111–1159.

Ceruti, Bassis, Rozza, et al. (2014) Danco: An intrinsic dimensionality estimator exploiting angle and norm concentration. Pattern Recognition 47(8): 2569–2581.

Chen, Tong, Lan, et al. (2019) Dynamic process monitoring based on orthogonal dynamic inner neighborhood preserving embedding model. Chemometrics and Intelligent Laboratory Systems 193: 103812.

Chong, Huang, Mukherjee, et al. (2020) Performance comparisons of distribution-free Shewhart-type Lepage and Cucconi schemes in monitoring complex process distributions. Transactions of the Institute of Measurement and Control 42(14): 2787–2811.

Cui, Wang, Yang (2022) Nonparametric manifold learning approach for improved process monitoring. The Canadian Journal of Chemical Engineering 100: 67–89.

Dong, Zhang, Peng (2020) A novel industrial process monitoring method based on improved local tangent space alignment algorithm. Neurocomputing 405: 114–125.

Einbeck, Kalantan, Kruger (2020) Practical considerations on nonparametric methods for estimating intrinsic dimensions of nonlinear data structures. International Journal of Pattern Recognition and Artificial Intelligence 34(09): 2058010.

Guo, Wang (2019) Fault detection based on improved local entropy locality preserving projections in multimodal processes. Journal of Chemometrics 33(5): e3116.

Guo, Liu, Tan, et al. (2022) A multimode process monitoring strategy via improved variational inference Gaussian mixture model based on locality preserving projections. Transactions of the Institute of Measurement and Control 44: 1732–1743.

(2016) A novel process monitoring and fault detection approach based on statistics locality preserving projections. Journal of Process Control 37: 46–57.

Wang, Fan SKS (2018) Nonlinear fault detection of batch processes based on functional kernel locality preserving projections. Chemometrics and Intelligent Laboratory Systems 183: 79–89.

Niyogi (2004) Locality preserving projections. In: Kearns, Solla, Cohn (eds) Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, pp. 153–160.

Cai, Yan, et al. (2005) Neighborhood preserving embedding. In: Proceedings of the IEEE international conference on computer vision, Beijing, China, 17–21 October, Vol. II, pp. 1208–1213. New York: IEEE.

Huang, Yan (2016) Related and independent variable fault detection based on KPCA and SVDD. Journal of Process Control 39: 88–99.

Joncas, Meila, McQueen (2017) Improved graph Laplacian via geometric self-consistency. In: Guyon, Luxburg, Bengio, et al. (eds) Advances in Neural Information Processing Systems. Curran Associates, Inc. Available at: https://arxiv.org/abs/1406.0118

Lee, Verleysen (2009) Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7): 1431–1443.

Lee (2018) Riemannian Manifolds: An Introduction to Curvature (2nd edn). New York: Springer.

Zhang (2019) A survey on Laplacian Eigenmaps based manifold learning methods. Neurocomputing 335: 336–351.

Zhou, Shi, et al. (2021) Dynamic non-Gaussian hybrid serial modeling for industrial process monitoring. Chemometrics and Intelligent Laboratory Systems 216: 104371.

Zhang, Lan, et al. (2021) Fault detection and isolation for discrete-time Markovian jump systems with generally bounded transition probabilities: A zonotope-based method. Transactions of the Institute of Measurement and Control 43: 2948–2959.

Jiang, Gopaluni, et al. (2018) Locality preserving discriminative canonical variate analysis for fault diagnosis. Computers and Chemical Engineering 117: 309–319.

Luo, Bao, Mao, et al. (2016) Nonlocal and local structure preserving projection and its application to fault detection. Chemometrics and Intelligent Laboratory Systems 157: 177–188.

McQueen, Meila, Perrault-Joncas (2017) Nearly isometric embedding by relaxation. In: NIPS’16, pp. 2639–2647. Red Hook, NY: Curran Associates Inc. DOI:10.5555/3157382.3157393.

Miao, Song, et al. (2015) Nonlocal structure constrained neighborhood preserving embedding model and its application for fault detection. Chemometrics and Intelligent Laboratory Systems 142: 184–196.

Orzechowski, Magiera, Moore (2020) Benchmarking manifold learning methods on a large collection of datasets. In: Hu, Lourenço, Medvet (eds) Genetic Programming. Cham: Springer International Publishing, pp. 135–150.

Peng, RuiWei (2021) Process monitoring of batch process based on overcomplete broad learning network. Engineering Applications of Artificial Intelligence 99: 104139.

Peng, Chunhao, Qiankun (2020) Fault diagnosis of microbial pharmaceutical fermentation process with non-Gaussian and nonlinear coexistence. Chemometrics and Intelligent Laboratory Systems 199: 103931.

Perrault-Joncas, Meila (2017) Metric learning and manifolds: Preserving the intrinsic geometry. Metric Learning and Manifolds. Available at: https://sites.stat.washington.edu/mmp/geometry/reading-group17/html/samKslidesRMetric.pdf

Shah MZH, Ahmed (2022) Modified LPP based on Riemannian metric for feature extraction and fault detection. Measurement 193: 110923.

Singer (2006) From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis 21(1): 128–134.

Tan, Miao, et al. (2019) Online process monitoring and fault-detection approach based on adaptive neighborhood preserving embedding. Measurement and Control 52(5–6): 387–398.

Tax, Duin (2004) Support vector data description. Machine Learning 54(1): 45–66.

Tenenbaum, Silva, Langford (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500): 2319–2323.

Thadewald, Büning (2007) Jarque–Bera test and its competitors for testing normality: A power comparison. Journal of Applied Statistics 34(1): 87–105.

Tong, Yan (2014) Statistical process monitoring based on a multi-manifold projection algorithm. Chemometrics and Intelligent Laboratory Systems 130: 20–28.

Wang (2012) Laplacian Eigenmaps. Berlin; Heidelberg: Springer Berlin Heidelberg, pp. 235–247.

Yao, Zhao, et al. (2022) Batch process monitoring based on global enhanced multiple neighborhoods preserving embedding. Transactions of the Institute of Measurement and Control 44: 620–633. DOI:10.1177/01423312211044742

Yin, Yan (2021) Stacked sparse autoencoders that preserve the local and global feature structures for fault detection. Transactions of the Institute of Measurement and Control 43(16): 3555–3565.

(2012) Local and global principal component analysis for process monitoring. Journal of Process Control 22(7): 1358–1373.

Yuan, Mao, Wang (2020) A pruned support vector data description-based outlier detection method: Applied to robust process monitoring. Transactions of the Institute of Measurement and Control 42(11): 2113–2126.

Zhan, Yang (2019) Improved process monitoring based on global–local manifold analysis and statistical local approach for industrial process. Journal of Process Control 75: 107–119.

Zhang, Peng, Dong (2019) A P-t-SNE and MMEMPM based quality-related process monitoring method for a variety of hot rolling processes. Control Engineering Practice 89: 1–11.

Zhang, Song, et al. (2011) Global-local structure analysis model and its application for fault detection and identification. Industrial and Engineering Chemistry Research 50(11): 6837–6848.