A Bayesian network dealing with measurements and residuals for system monitoring

Abstract

The purpose of this paper is to present an original method for system monitoring with Bayesian networks. Our proposal is to combine a data-driven method and a model-based method within a common tool. The two methods are first modeled as Bayesian networks (conditional Gaussian networks), and then combined to evaluate the system state. In the proposed framework, residuals and measurements coexist within a single probabilistic model. This approach is tested on a simulation of a water heater process under various circumstances and shows better results than either method used alone.

Keywords

CGN; data-driven methods; model-based methods; classification; incidence matrix; Bayesian networks; water heater system

Introduction

The goal of businesses and industries to optimize gain/loss and the growing demand for increased product quality have immensely contributed to the development and the imperative use of monitoring methods (fault detection and isolation (FDI); fault detection and diagnosis (FDD)) among other means of operating safety. These methods are used to describe and explain, at each instant, the system state. They try to detect early faulty situations and provide their diagnosis when they occur.

Many FDD and FDI methods have emerged in recent years (Chiang et al., 2001; Ding, 2011, 2008; Isermann, 2006; Qin, 2006). Among them, we can distinguish model-based methods and data-driven methods. Model-based methods (Ding, 2008; Frank et al., 2000; Gertler, 1991; Isermann, 2006; Patton et al., 1995) use prior knowledge explaining the system dynamics. This knowledge corresponds to a specific set of physical equations (e.g. a quantitative model) representing the dependencies that may exist between system variables. It is used to generate residuals (i.e. the differences between the measurements taken on the system and their estimates). Once residuals are generated, they are immediately evaluated in order to describe the system operating state. In contrast, data-driven methods (Qin, 2006; Venkatasubramanian et al., 2003; Yin et al., 2012) use only system measurements (e.g. from sensors) taken at different times (historical data).

However, each of these monitoring methods has its advantages and drawbacks (Yew and Rajagopalan, 2010). Most of them depend strongly on the nature and domain of the system, and on the quality, quantity and type of information available. Thus, many researchers (Chiang et al., 2001; Ding et al., 2009; Venkatasubramanian et al., 2003) suggest that a single framework using these two kinds of methods would allow a better description of the system state. Indeed, it seems clear that the capability of data-driven methods to manage a large amount of data, associated with the ability of model-based methods to describe the system dynamic behavior accurately and provide physical understanding, could improve the system monitoring task. In the FDI–FDD literature, the major contributions generally focus on the development or improvement of one of the two types of methods and, as far as we know, this research field is still largely unexplored, even if some work describes attempts to associate these two types of methods (Ghosh et al., 2011; Luo et al., 2010; Schubert et al., 2011).

A unified scheme that integrates model-based and data-driven methods is proposed in Schubert et al. (2011). The authors combine subspace model identification (SMI) and univariate/multivariate statistical control methods with an input reconstruction method and banks of unknown input observers (UIO). Luo et al. (2010) proposed a hybrid approach using parity equations and a nonlinear observer to generate residuals. Once the residuals are generated, statistical tests are used to detect and isolate, with the aid of a support vector machine (SVM), the different antilock braking system (ABS) faults. In Ghosh et al. (2011), an approach similar to those combining multiple classifiers (mixture of experts) in pattern recognition is used. Several decision fusion strategies (utility-based and evidence-based) applied to numerous fault detection and identification methods have been studied. For example, to monitor a laboratory distillation column, they use four heterogeneous methods: a model-based method (an extended Kalman filter) and three data-driven methods (SOM (self-organized map), ANN (artificial neural network) and PCA (principal component analysis)), where the output of each method corresponds to an assignment to a fault class.

The proposals mentioned above, although attractive for combining data-driven and model-based methods, do not address a particular problem: the lack of data (i.e. some data are missing or insufficiently represented) and/or an approximate system model (i.e. no accurate model is available). Moreover, instead of combining methods of different natures and using different FDI–FDD tools to improve the decision making, it could be interesting to use a single tool associating and integrating data-driven and model-based methods. An attractive approach is the use of Bayesian networks (BNs). They offer a probabilistic/statistical framework able to reason under uncertain knowledge and to integrate information from different sources in a natural way. In this paper, we propose a probabilistic FDI–FDD framework based on conditional Gaussian networks (CGNs), a particular class of BNs, able to use simultaneously the different information available on the system.

The paper is structured as follows: in Section 2 we introduce BNs, followed by a short description of some data-driven and model-based methods and their links to BNs in Section 3; Section 4 describes the proposed monitoring methodology; the results of the proposed method tested on a water heater process under different conditions are outlined in Section 5; finally, the conclusions are given in the last section.

Bayesian networks

Definition

A BN is a probabilistic directed acyclic graph (Jensen, 1996) and it can be defined formally as:

a directed acyclic graph G, G=(V, E), where V is the set of G nodes, and E is the set of G arcs;

a finite probabilistic space (Ω, $ℤ$ , p), with Ω a non-empty space, $ℤ$ a collection of Ω subspaces and p a probability measure (we use the same notation for both probability distributions and probability density functions, the meaning will be clear from the context) on $ℤ$ with p(Ω) = 1;

a set of random variables associated with the nodes of the graph G and defined on (Ω, $ℤ$ , p), such that

\begin{matrix} p (V_{1}, V_{2}, \dots, V_{r}) = \prod_{i = 1}^{r} p (V_{i} | Pa (V_{i})) \end{matrix}

(1)

where Pa(V_i) is the set of parent nodes of V_i (child of each node in Pa(V_i)) in G; given new information (evidence) about one or more of these nodes, the network needs to be updated by means of calculations named inference.

Conditional Gaussian networks

CGNs are a particular form of Bayesian networks. Each node in the network represents a random variable that can be discrete or Gaussian (univariate/multivariate). Arcs may exist between nodes of the same or of different natures (discrete or Gaussian). However, to ensure that exact inference is available (see Lauritzen, 2001), discrete variables are not allowed to have continuous parents. So, each discrete node, given its parents, follows a multinomial distribution, generally written as a conditional probability table (CPT).

Consider a discrete node Y with only two values y_j, j = 1, 2, and with two parents Pa(Y) = (X, Z), each of which can take, for example, k values. The CPT of Y is shown in Table 1, where $θ_{y_{j} | Pa (Y)_{i}}$ is the probability of y_j given the i^th configuration of its parents Pa(Y)_i.

Table 1.

The CPT of Y given its parent nodes.

		Y
X	Z	y ₁	y ₂
x ₁	z ₁	$θ_{y_{1} \| Pa (Y)_{i = 1}}$	$θ_{y_{2} \| Pa (Y)_{i = 1}}$
…	…	…	…
x ₁	z_k	$θ_{y_{1} \| Pa (Y)_{i = 1 \times k}}$	$θ_{y_{2} \| Pa (Y)_{i = 1 \times k}}$
…	…	…	…
x_k	z_k	$θ_{y_{1} \| Pa (Y)_{i = k \times k}}$	$θ_{y_{2} \| Pa (Y)_{i = k \times k}}$

Gaussian nodes, unlike discrete nodes, are allowed to have Gaussian nodes as parents. Thus, we can distinguish two types of Gaussian nodes. The first is the linear Gaussian node Y, a Gaussian node with only Gaussian parents Z₁,…,Z_c. Its conditional distribution is written as $p (Y | Z_{1} = z_{1}, \dots, Z_{c} = z_{c}) = N (μ_{Y} + W_{1} z_{1} + \dots + W_{c} z_{c}; Σ_{Y})$ , where μ_Y is the parameter governing the mean of Y, Σ_Y is its covariance matrix and W₁,…,W_c are the regression coefficients.

The second type is a Gaussian node with only discrete parents. Consider a Gaussian node Y with two discrete parents Pa(Y) = (X, Z), each of which can take k values, X = x₁,…,x_k and Z = z₁,…,z_k. Its conditional distribution is represented by the CPT shown in Table 2, where $μ_{Pa (Y)_{i}}$ is the mean of Y given the i^th configuration of its parents and $Σ_{Pa (Y)_{i}}$ is the corresponding covariance matrix.

Table 2.

The CPT of Y given its discrete parent nodes.

X	Z	Y
x ₁	z ₁	$Y ~ N (μ_{Pa (Y)_{i = 1 \times 1}}, Σ_{Pa (Y)_{i = 1 \times 1}})$
…	…	…
x ₁	z_k	$Y ~ N (μ_{Pa (Y)_{i = 1 \times k}}, Σ_{Pa (Y)_{i = 1 \times k}})$
…	…	…
x_k	z_k	$Y ~ N (μ_{Pa (Y)_{i = k \times k}}, Σ_{Pa (Y)_{i = k \times k}})$

Several CGNs can be used to solve classification problems. One of them is the naive Bayesian network (NBN; see Figure 1). It assumes that the variables, which are child nodes of the discrete root decision node, are conditionally independent (the Gaussian nodes in the graph (circle form) are not connected). Another network is the condensed semi-naive Bayesian network (CSNBN; see Figure 1). It provides a simple structure that takes into account the correlation that may exist within a group of variables (i.e. a single joint variable X = [x₁,…,x_m]^T instead of m independent variables x₁, …, x_m).

Figure 1.

Bayesian network classifiers: (a) NBN and (b) CSNBN.

Fault detection and diagnosis

Fault detection and diagnosis are two phases that are usually associated. The first seeks to confirm at any moment whether the system is still in control (IC). Once a change is confirmed (the system is out of control (OC)), diagnosis procedures try to explain it, in order to make a decision about the system future (adjusting settings, maintenance, closure, etc.). The fault diagnosis phase can be defined in different ways (Chiang et al., 2001) depending on the type of the employed method and its description level.

Data-driven methods

FDD methods using system data (temporal or not) are called data-driven methods. Their effectiveness depends heavily on the quality and quantity of the data used. In recent decades, several data-driven methods have been developed or enhanced. They generally rely on rigorous statistical treatment of the collected data. Almost all of them take into account the correlation that may exist between the system variables and use multivariate statistical tests to detect a change in the system. Among multivariate statistical tests, the most common in the literature are based on the T² statistic (see (2), where x is an observation of a multivariate variable X (the set of system variables or a transformation of them) of dimension m, assumed to follow a multivariate normal distribution with mean μ_X and covariance matrix Σ_X):

T^{2} = (x - μ_{X})^{T} Σ_{X}^{- 1} (x - μ_{X})

(2)

Once this statistic is calculated, the obtained scalar is analyzed with respect to a given false alarm rate. This is done by checking whether the scalar lies in an interval (representing the system normal operation) bounded by an upper control limit CL. Very often, CL is chosen equal to $χ_{α, p}^{2}$ , the upper quantile at significance level α of the chi-squared distribution with p degrees of freedom, where p = m is the dimension of X. If this limit is exceeded, the system is declared out of control.
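The test built on (2) can be sketched in a few lines. The code below is only an illustration under stated assumptions: it is restricted to m = 2 so that the covariance inverse can be written out by hand and so that the chi-squared limit has the closed form −2 ln(α), which holds for two degrees of freedom only. The function names and the numerical values are hypothetical and do not come from the water heater study.

```python
import math


def t2_statistic(x, mean, cov):
    """T^2 = (x - mean)^T cov^{-1} (x - mean) for a 2-D observation."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form written with the explicit inverse of a 2x2 matrix.
    return (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det


def control_limit_2dof(alpha):
    """Upper alpha-quantile of the chi-squared law with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def is_out_of_control(x, mean, cov, alpha=0.01):
    """Declare OC when T^2 exceeds the control limit."""
    return t2_statistic(x, mean, cov) > control_limit_2dof(alpha)


# Hypothetical in-control parameters and two observations.
mean = (0.0, 0.0)
cov = ((2.0, 1.0), (1.0, 2.0))
print(is_out_of_control((1.0, 1.0), mean, cov))   # small deviation
print(is_out_of_control((6.0, -6.0), mean, cov))  # large deviation
```

For other dimensions, the control limit would have to come from a chi-squared quantile routine or table rather than from this closed form.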

The T² statistic can be modeled under a binary CSNBN classifier discriminating between the classes IC and OC. This BN (see Figure 2) consists of a multivariate Gaussian node X, representing the system variables, child of a discrete root node D representing the states of the system (IC and OC).

Figure 2.

The T² statistic under a CGN.

The nominal operating class IC is characterized by a mean μ_IC and a covariance matrix Σ_IC, whereas the alternative class OC is assumed to express more variability through a coefficient c_α% different from 1. This coefficient is the root of the following equation

1 - c_{α %} + \frac{m c_{α %}}{CL} \ln (c_{α %}) = 0

(3)

where m is the number of the system variables.

The derivation of c_α% is given in Verron et al. (2010). In Table 3, we give some values of the coefficient c_α% for a given significance level α, which tunes the control limit CL of the T² statistic.

Table 3.

Values of c_α% for different α values.

α	0.1	0.05	0.02	0.01	0.005	0.0027
c_α%	11.93	42.57	218.58	754.54	2634.50	8092.56
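Equation (3) has no closed-form solution, but its root above 1 is easy to locate numerically. The following sketch uses plain bisection and assumes m = 1, for which the chi-squared control limit equals the square of a standard normal quantile; under that assumption the computed roots should land close to the first entries of Table 3 (about 11.93 at α = 0.1 and 42.57 at α = 0.05). The function names and the bracketing strategy are choices made for this illustration, not taken from Verron et al. (2010).

```python
import math
from statistics import NormalDist


def control_limit_1dof(alpha):
    """Upper alpha-quantile of the chi-squared law with 1 degree of freedom."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def variance_coefficient(control_limit, m=1, tol=1e-10):
    """Root c > 1 of 1 - c + (m * c / CL) * ln(c) = 0, found by bisection."""
    if control_limit <= m:
        raise ValueError("a root above 1 requires CL > m")

    def f(c):
        return 1.0 - c + (m * c / control_limit) * math.log(c)

    # f is minimal (and negative) at exp(CL/m - 1), then increases without bound.
    lo = math.exp(control_limit / m - 1.0)
    hi = 2.0 * lo
    while f(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in (0.1, 0.05, 0.02):
    c = variance_coefficient(control_limit_1dof(alpha))
    print(f"alpha = {alpha}: c = {c:.2f}")
```

Note that c = 1 always satisfies (3) trivially; the bracket starts beyond the minimum of the function so that the bisection converges to the non-trivial root.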

The CPTs of the two nodes D and X are given in Tables 4 and 5.

Table 4.

The node D CPT

D
IC	OC
p(IC)	p(OC)

Table 5.

The node X CPT.

D	X
IC	$X ~ N (μ_{IC}, Σ_{IC})$
OC	$X ~ N (μ_{OC}, c_{α %} \times Σ_{OC})$

For a given observation x of X, the posterior probability of each state is calculated and the state with the greater posterior probability is selected; this reproduces the decision made by the T² statistic.
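A small numerical check may help to see why the two-state network reproduces the T² decision. The sketch below is a univariate illustration under explicit assumptions: equal priors, the OC class sharing the mean and base variance of the IC class, and the coefficient c = 218.58 read from Table 3 for α = 0.02. With these choices the posterior of OC crosses 0.5 when T² is close to the chi-squared limit for one degree of freedom (about 5.41). The function names are hypothetical.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def classify_ic_oc(x, mean, var, c, prior_ic=0.5):
    """Return (decision, posterior of OC) for the network D -> X."""
    log_ic = math.log(prior_ic) + log_normal_pdf(x, mean, var)
    # Assumption: OC keeps the IC mean and inflates the IC variance by c.
    log_oc = math.log(1.0 - prior_ic) + log_normal_pdf(x, mean, c * var)
    post_oc = 1.0 / (1.0 + math.exp(log_ic - log_oc))
    return ("OC" if post_oc > 0.5 else "IC"), post_oc


C_2_PERCENT = 218.58  # Table 3, alpha = 0.02

# With mean 0 and variance 1, T^2 is simply x squared.
for x in (2.2, 2.4):
    decision, post_oc = classify_ic_oc(x, 0.0, 1.0, C_2_PERCENT)
    print(f"x = {x}, T2 = {x * x:.2f}, decision = {decision}, p(OC|x) = {post_oc:.3f}")
```

The two test points sit just below and just above the control limit, so the decision switches from IC to OC between them.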

Model-based methods

When an analytical representation of the system is available, model-based methods can be used. These methods calculate the differences between the measurements taken on the system and their estimates. These differences are called residuals r₁,…,r_n, with n the number of residuals. The quality of the residuals (their sensitivity to faults) strongly depends on their generator. Classically, we can distinguish three types of generators (Chiang et al., 2001; Isermann, 2006): observers, parameter estimation and parity space.

The residuals, once generated, are immediately evaluated, very often independently of each other. The results of their evaluation are named symptoms u₁,…,u_n, where u_i = 0 if the residual i is in control, and u_i = 1 otherwise. To detect a fault, it is sufficient that one residual be non-null. However, residuals are often sensitive not only to faults, but also to system noise, disturbances and modeling errors. Thus, residuals may differ from zero even during nominal operation, which is why each residual r_i, i ∈ {1,…,n}, is generally considered to be statistically null (i.e. a residual r_i is assumed to follow a normal distribution with mean $μ_{r_{i}}$ and variance $σ_{r_{i}}^{2}$ during nominal operation). So, in order to monitor each residual, univariate binary statistical tests (e.g. the T² statistic in the univariate case) can be used. They discriminate between the null hypothesis H₀ (corresponding to the residual distribution when the system is in control) and the alternative hypothesis H₁ (faults).

These tests can also be used to isolate a fault once it is detected. In this paper, we are interested in the well-known structured residuals approach (Gertler and Singer, 1990), where the residuals are generated so that they are sensitive only to certain faults. In order to make a decision, symptoms are compared (e.g. by a logic test) to each column of the incidence matrix (see Table 6). This matrix, for a given system, reflects the sensitivity (b_ij = 1) or robustness (b_ij = 0) of its residuals to each of its faults F_j ∈ {F₁, F₂, …, F_k}.

Table 6.

Incidence matrix example.

	IC	F ₁	F ₂	…	F _k
u ₁	0	b_{1,1}	b_{1,2}	…	b_{1,k}
u ₂	0	b_{2,1}	b_{2,2}	…	b_{2,k}
⋮	⋮	⋮	⋮	⋮	⋮
u _n	0	b_{n,1}	b_{n,2}	…	b_{n,k}

The incidence matrix can be modeled as a directed acyclic graph with discrete nodes able to make a decision given the observed symptoms, where faults are considered as the causes of the residual deviations. Better yet, using a CGN such as that given in Figure 3, we are able to model jointly the residuals evaluation (based on the T² statistic) and the decision making (diagnosis based on the incidence matrix), as proposed by Verron et al. (2009).

Figure 3.

An incidence matrix on a CGN.

Proposed method

In this section, a data-driven method and a model-based method are both expressed within the same tool, a CGN, and then associated to make better decisions.

CGN for data-driven monitoring

As a data-driven method, we propose to use a CGN, more precisely a CSNBN classifier, to discriminate between faults and the state IC (fault-free state H₀). The proposed network (see Figure 4) consists of a discrete node S_m with k + 1 states and a node X joining the m system variables. For each state $S_{m_{j}} \in (IC, F_{1}, F_{2}, \dots, F_{k})$ , j = 1,…,k + 1, X follows a multivariate normal distribution with a mean $μ_{S_{m_{j}}}$ and a variance–covariance matrix $Σ_{S_{m_{j}}}$ . The states of the node S_m are assumed to be equally probable a priori. This is justified by the fact that the available fault data are collected after implementing each method, so the prior probabilities cannot be defined exactly.

Figure 4.

CGN for data-driven monitoring.

Once the network is updated given a new observation, a decision is made and the state with the maximum posterior probability is selected. Following this rule, the proposed network implements a discriminant analysis:

\begin{matrix} δ : x \in S_{m_{j^{*}}}, \text{ if } j^{*} = \underset{j = 1, \dots, k + 1}{\operatorname{argmax}} p (S_{m_{j}} | X = x) \\ = \underset{j = 1, \dots, k + 1}{\operatorname{argmax}} \frac{p (S_{m_{j}}) p (X = x | S_{m_{j}})}{p (X = x)} \end{matrix}

(4)

where $p (S_{m_{j}} | X = x)$ is the posterior probability of the class $S_{m_{j}}$ given x, p(X) is the density function of X, $p (X | S_{m_{j}})$ is the likelihood and $p (S_{m_{j}})$ is the prior probability of $S_{m_{j}}$ .
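Rule (4) amounts to evaluating one Gaussian likelihood per state and keeping the state with the largest posterior. The sketch below illustrates this for m = 2 with full covariance matrices, so that the determinant and inverse can be written explicitly; the class parameters, state names and helper names are hypothetical and chosen only to make the example self-contained.

```python
import math


def log_gaussian_2d(x, mean, cov):
    """Log-density of a bivariate normal with full covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx^T cov^{-1} dx using the explicit 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


def map_state(x, classes, priors=None):
    """Return (most probable state, posterior over all states)."""
    states = list(classes)
    if priors is None:
        # Equally probable states, as assumed for the node S_m.
        priors = {s: 1.0 / len(states) for s in states}
    log_joint = {
        s: math.log(priors[s]) + log_gaussian_2d(x, *classes[s]) for s in states
    }
    top = max(log_joint.values())  # subtracted for numerical stability
    weights = {s: math.exp(v - top) for s, v in log_joint.items()}
    total = sum(weights.values())
    posterior = {s: w / total for s, w in weights.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical (mean, covariance) pairs for three states.
shared_cov = ((1.0, 0.0), (0.0, 0.0004))
classes = {
    "IC": ((50.0, 0.6), shared_cov),
    "F1": ((55.0, 0.6), shared_cov),
    "F2": ((50.0, 0.7), shared_cov),
}
state, posterior = map_state((54.2, 0.61), classes)
print(state, {s: round(p, 4) for s, p in posterior.items()})
```

The denominator p(X = x) of (4) is the same for every state, which is why the code only normalizes at the end.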

Using the proposed data-driven network, we are able to decide at each instant as to which operating state the system belongs. The probabilities tables corresponding to each node are shown in Tables 7 and 8.

Table 7.

The node S_m CPT.

S _m
IC	F ₁	…	F_k
p(IC)	p(F₁)	…	p(F_k)

Table 8.

The node X CPT.

S _m	X
IC	$X ~ N (μ_{IC}, Σ_{IC})$
F ₁	$X ~ N (μ_{F_{1}}, Σ_{F_{1}})$
…	…
F_k	$X ~ N (μ_{F_{k}}, Σ_{F_{k}})$

CGN for model-based monitoring

In order to assist the CGN given above, another diagnosis strategy is considered. Instead of using the available system data, it is based on the available system model, more exactly on the generated residuals and the defined incidence matrix. The CGN proposed as a model-based method is made of a discrete node F_j (representing the fault j), with two states (presence (yes) and absence (no)), and continuous nodes (the residuals sensitive to F_j), each representing a univariate Gaussian variable. Each arc connecting these nodes corresponds to a link b_ij = 1 between faults and symptoms in the system incidence matrix. This network evaluates the outputs of a suitable residual generator, for a given significance level, and gives at each instant the occurrence probability of each fault (state value yes of each node F_j). These probabilities are compared and the fault F_j with the maximum posterior probability is selected.

However, as the normal operating state IC is associated with a null characteristic vector [0,…,0]^T in the incidence matrix (see Table 6), the node IC is not linked to any other node in the CGN. Consequently, the system is implicitly declared in control (fault-free case) if and only if no fault is detected. Note that in this paper we only consider single faults; multiple faults are not treated. As we shall extend this method, we need to express explicitly the probabilities of belonging to one of the k faults (when the system is out of control (OC)) and the probability that the system is in control (IC). To do so, we propose to introduce (see Figure 5) a new discrete parent node S_r with k + 1 states (IC, F₁, …, F_k), which connects the k fault nodes.

Figure 5.

CGN for model-based monitoring.

The CPTs of the node S_r and its child nodes F_j, j ∈ {1,…,k}, are given in Tables 9 and 10. The CPT of each node F_j is set following these intuitive rules: (1) if S_r = F_j, then it is certain (p(F_j = Yes|S_r = F_j) = 1) that the observation belongs to the fault F_j; (2) if S_r = IC, then it is certain (p(F_j = Yes|S_r = IC) = 0) that the observation does not belong to the fault F_j; (3) if $S_{r} = F_{o_{(o \neq j)}}$ , then no knowledge is gained about the membership of the observation to the fault F_j, and we fix $p (F_{j} = Yes | S_{r} = F_{o_{(o \neq j)}}) = p (F_{j} = No | S_{r} = F_{o_{(o \neq j)}}) = 0.5$ ; this value can be tuned to favor some faults.

Table 9.

CPT of the node S_r.

S _r
IC	F ₁	…	F_k
p(IC)	p(F₁)	…	p(F_k)

Table 10.

CPT of the nodes F_j.

F _j
S _r	Yes	No
IC	0	1
F ₁	0.5	0.5
⋮	⋮	⋮
F_j	1	0
⋮	⋮	⋮
F_k	0.5	0.5

Probabilistic framework for system monitoring

In order to make a decision about the system state and use the maximum amount of information available on the system, we suggest a probabilistic framework handling the system observed variables and generated residuals. To do so, a new discrete node S_r&m is added (see Figure 6). It represents a discrete variable that has the same values as the discrete nodes S_r and S_m.

Figure 6.

The proposed network.

The CPT of the node S_r&m is given in Table 11.

Table 11.

The node S_r&m CPT.

S _r & m
IC	F ₁	…	F_k
p(IC)	p(F₁)	…	p(F_k)

The posterior probability of each state j of the node S_r&m in Figure 6, given new virtual evidence (ve), $e_{F} = \{e_{F_{1}}, \dots, e_{F_{k}}\}$ and $e_{S_{m}}$ , which are respectively the state posterior probabilities of the nodes F = {F₁,…,F_k} and S_m for given residuals and observations of the system variables, can be inferred as follows (for more details about virtual evidence in BNs, see Bilmes (2004) and Pearl (1988)):

\begin{array}{l} p (S_{r \& m_{j}} | ve) = \sum_{S_{m}} \sum_{S_{r}} \sum_{F} p (S_{r \& m_{j}}, S_{r}, S_{m}, F) \frac{p (S_{m}, F | ve)}{p (S_{m}, F)} \\ = \frac{p (S_{r \& m_{j}}) \sum_{S_{m}} p (S_{m} | S_{r \& m_{j}}) e_{S_{m}} \sum_{S_{r}} p (S_{r} | S_{r \& m_{j}}) \sum_{F} p (F | S_{r}) e_{F}}{\sum_{S_{m}} \sum_{F} p (S_{m}, F) e_{S_{m}} e_{F}} \end{array}

(5)

Thus, the probabilities are combined under the assumption that the model-based method and the data-driven method are conditionally independent given the added node. We can see that the proposed network structure corresponds to an NBN with a root node S_r&m.
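To make the combination in (5) concrete, the sketch below evaluates its numerator state by state and then normalizes. It relies on several simplifying assumptions that should be read as such: the nodes S_r and S_m copy the state of S_r&m deterministically, the fault nodes are conditionally independent given S_r, their CPTs follow the three rules of Table 10, and each virtual evidence is summarized by a single probability. The numerical evidence values are invented for the illustration, and the function names are hypothetical.

```python
def p_fault_yes(fault, state):
    """p(F_j = Yes | S_r = state), following the rules of Table 10."""
    if state == "IC":
        return 0.0
    if state == fault:
        return 1.0
    return 0.5


def fuse(e_sm, e_f, prior=None):
    """Posterior of S_r&m given the virtual evidence of both methods.

    e_sm: dict state -> posterior delivered by the data-driven network.
    e_f:  dict fault -> posterior p(F_j = Yes) from the model-based network.
    """
    states = list(e_sm)
    if prior is None:
        prior = {s: 1.0 / len(states) for s in states}
    scores = {}
    for s in states:
        score = prior[s] * e_sm[s]
        for fault, p_yes_obs in e_f.items():
            p_yes = p_fault_yes(fault, s)
            # Sum over the two states (Yes, No) of the fault node.
            score *= p_yes * p_yes_obs + (1.0 - p_yes) * (1.0 - p_yes_obs)
        scores[s] = score
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("the evidence is incompatible with every state")
    return {s: v / total for s, v in scores.items()}


# Invented outputs of the two networks for one time instant.
e_sm = {"IC": 0.4, "T": 0.1, "H": 0.4, "Q0": 0.1}
e_f = {"T": 0.05, "H": 0.9, "Q0": 0.3}
posterior = fuse(e_sm, e_f)
decision = max(posterior, key=posterior.get)
print(decision, {s: round(p, 3) for s, p in posterior.items()})
```

In this invented case the data-driven network hesitates between IC and H, and the residual-based evidence tips the decision toward H.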

Once the networks are built, our probabilistic framework is ready to be used for the monitoring task. The inputs (the set of evidence) of the proposed method correspond first to the calculated residuals and the observed system variables, and second to the outputs of both methods. In this paper, these inputs are propagated following the junction tree principle (Cowell et al., 2007). Once this is done, the node S_r&m indicates, for each of its k + 1 states, the probability of its occurrence.

Among other criteria, we choose to use the maximum posterior probability. So, at each instant, the state with the highest posterior probability is selected as

δ : ve \in S_{r \& m_{j^{*}}}, \text{ if } j^{*} = \underset{j = 1, \dots, k + 1}{\operatorname{argmax}} p (S_{r \& m_{j}} | ve)

The proposed framework, thanks to its probabilistic nature, can be an effective way to assist experts in making a decision about the operating state of the monitored system. For example, this can be done by ranking the different system states according to their posterior probability of occurrence.

Application

System description

To illustrate the interest of our approach, we use a water heater process (Verron et al., 2009; Weber et al., 2008). It consists of a tank (see Figure 7) equipped with two resistors R₁ and R₂. The inputs are the water flow rate Q_i, the electric power for heating P and the water temperature T_i. The outputs are the water flow rate Q₀ and the temperature T regulated around an operating point.

Figure 7.

Heating water system.

The thermal system objective is to assure a constant water flow rate at a given temperature. Using the following hydraulic and thermal equations describing the system:

\left\{ \begin{matrix} S \frac{dH (t)}{dt} = Q_{i} (t) - Q_{0} (t) \\ \frac{dT (t)}{dt} = \frac{P (t)}{ρ CSH (t)} - \frac{(T (t) - T_{i}) \times Q_{i} (t)}{SH (t)} \end{matrix} \right.

(6)

where ρC is a constant thermal variable, S represents the section and T_i is assumed constant and equal to 20°C, a discrete system state space representation around an operating condition (H_op = 0.6 m, T_op = 50°C) is determined as follows:

\left\{ \begin{matrix} x (k + 1) = A_{d} x (k) + B_{d} u (k) \\ y (k) = x (k) \end{matrix} \right.

(7)

where the output vector y(k) is equal to [T(k) H(k)]^T and the input vector u(k) defines [Q_i(k) P(k)]^T. The sampling period is fixed to 360 seconds in order to respect the closed-loop time constants.

In this analysis, only sensor and component faults are considered: H water level sensor, T output temperature sensor, Q₀ output flow rate sensor. Because the matrix A_d is diagonal, structured residuals can be generated directly with a conventional observer: each residual is sensitive to one fault. Then, based on the state space representation, a Luenberger observer is used as the model-based residual generator, producing the residuals vector $y (k) - \hat{y} (k)$ such that

\left\{ \begin{matrix} \hat{x} (k + 1) = A_{d} \hat{x} (k) + B_{d} u (k) + K (y (k) - \hat{y} (k)) \\ \hat{y} (k) = \hat{x} (k) \end{matrix} \right.

(8)

Using this observer, for each instant k, a residuals vector $[r_{1} (k), r_{2} (k)]^{T} = [T (k) - \hat{T} (k), H (k) - \hat{H} (k)]^{T}$ is generated in order to detect a fault occurrence on the H level sensor or the T temperature sensor. Moreover, according to the physical equation between the output flow rate Q₀ and the liquid level H (determined by using Torricelli's law: $Q_{0} (k) = η \sqrt{H (k)}$ ), another residual $r_{3} (k) = Q_{0} (k) - {\hat{Q}}_{0} (k)$ is calculated.
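The residual generation described here can be illustrated with a toy simulation. The sketch below is not the simulated water heater: the matrices A_d, B_d, the gain K and the coefficient η are not given in the text, so every numerical value is a placeholder chosen for the example, and the state is expressed as a deviation from the operating point. It only shows the mechanics of the observer update, of the residual y(k) − ŷ(k), and of the flow residual built from Torricelli's law.

```python
import math

# Placeholder values: the paper does not list A_d, B_d, K or eta.
A_D = ((0.8, 0.0), (0.0, 0.9))   # diagonal, as stated for A_d
B_D = ((0.0, 0.5), (0.2, 0.0))
K = ((0.5, 0.0), (0.0, 0.5))     # diagonal observer gain (assumed)
ETA = 1.0                        # Torricelli coefficient (assumed)


def mat_vec(matrix, vector):
    """Multiply a small matrix (tuple of rows) by a vector."""
    return tuple(sum(a * v for a, v in zip(row, vector)) for row in matrix)


def observer_step(x_hat, u, y):
    """One observer step: returns (x_hat(k+1), y(k) - y_hat(k))."""
    residual = tuple(yi - xi for yi, xi in zip(y, x_hat))  # y_hat(k) = x_hat(k)
    x_next = tuple(
        a + b + c
        for a, b, c in zip(mat_vec(A_D, x_hat), mat_vec(B_D, u), mat_vec(K, residual))
    )
    return x_next, residual


def flow_residual(q0_measured, h_measured):
    """r3(k) = Q0(k) - eta * sqrt(H(k)), from Torricelli's law."""
    return q0_measured - ETA * math.sqrt(h_measured)


def simulate(n_steps, fault_at, bias, u=(0.1, 0.2)):
    """Noise-free plant with an additive bias on the T sensor from step fault_at."""
    x = (0.0, 0.0)       # true state [T, H]
    x_hat = (0.0, 0.0)   # observer state
    residuals = []
    for k in range(n_steps):
        y = (x[0] + (bias if k >= fault_at else 0.0), x[1])
        x_hat, r = observer_step(x_hat, u, y)
        residuals.append(r)
        x = tuple(a + b for a, b in zip(mat_vec(A_D, x), mat_vec(B_D, u)))
    return residuals


residuals = simulate(n_steps=10, fault_at=5, bias=2.0)
for k, (r1, r2) in enumerate(residuals):
    print(k, round(r1, 4), round(r2, 4))
```

With diagonal A_d and K, the bias on the T sensor only shows up in r₁ while r₂ stays at zero, which is the structured behavior the text relies on.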

Based on the evaluation of r(k) = [r₁(k) r₂(k) r₃(k)]^T, the associated fault incidence matrix (the link between symptoms (u₁, u₂, u₃) and faults (T is a fault in T, H is a fault in H, Q₀ is a fault in Q₀)) is defined in Table 12.

Table 12.

Heating water system incidence matrix.

	T	H	Q ₀
u ₁	1	0	0
u ₂	0	1	0
u ₃	0	1	1
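The logic test against the columns of Table 12 can be written as a simple signature lookup. In the sketch below the fault signatures are copied from Table 12 and the IC signature is the null vector, as in Table 6; the function name and the handling of unmatched symptom vectors (an empty list) are choices made for this illustration.

```python
# Columns of Table 12, read as (u1, u2, u3) signatures.
INCIDENCE = {
    "IC": (0, 0, 0),
    "T": (1, 0, 0),
    "H": (0, 1, 1),
    "Q0": (0, 0, 1),
}


def diagnose(symptoms):
    """Return the states whose signature equals the observed symptom vector."""
    observed = tuple(symptoms)
    return [state for state, signature in INCIDENCE.items() if signature == observed]


print(diagnose((0, 0, 0)))  # no symptom raised
print(diagnose((0, 1, 1)))  # u2 and u3 raised
print(diagnose((1, 1, 0)))  # matches no single-fault column
```

A symptom vector that matches no column is left undecided here, which is one of the situations where a probabilistic evaluation of the residuals is more informative than the binary test.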

The framework construction

First, we have to define the two methods for the water heater process. The structures of their corresponding networks are completely different. The one representing the data-driven method corresponds to a linear discriminant analysis (DA). Its structure is relatively simple (see Figure 8): it links a discrete node representing the different states of the system to a multivariate Gaussian node representing the system variables (m = 3).

Figure 8.

Data-driven monitoring.

Concerning the one representing the model-based method, it is based on the system incidence matrix (as shown in Figure 9). The different CPTs corresponding to the two graphs are illustrated in Tables 13 and 14.

Figure 9.

Model-based monitoring.

Table 13.

The node X CPT.

S _m	X
IC	$X ~ N (μ_{IC}, Σ_{IC})$
T	$X ~ N (μ_{T}, Σ_{T})$
H	$X ~ N (μ_{H}, Σ_{H})$
Q ₀	$X ~ N (μ_{Q_{0}}, Σ_{Q_{0}})$

Table 14.

(a) The nodes r₁, r₂ CPT; (b) the node r₃ CPT.

(a)
T or H	r_i, i = 1,2
OC	$r_{i} ~ N (μ_{r_{i}}, c_{2 %} \times σ_{r_{i}}^{2})$
IC	$r_{i} ~ N (μ_{r_{i}}, σ_{r_{i}}^{2})$
(b)
H	Q ₀	r ₃
OC	OC	$r_{3} ~ N (μ_{r_{3}}, c_{1 %} \times σ_{r_{3}}^{2})$
IC	OC	$r_{3} ~ N (μ_{r_{3}}, c_{2 %} \times σ_{r_{3}}^{2})$
OC	IC	$r_{3} ~ N (μ_{r_{3}}, c_{2 %} \times σ_{r_{3}}^{2})$
IC	IC	$r_{3} ~ N (μ_{r_{3}}, σ_{r_{3}}^{2})$

Regarding the CGN presented in Figure 9, the conditional probability tables (see Table 14) corresponding to its continuous nodes are defined so that the residual linked to only one fault (see Table 14(b)) is assigned a 2% risk of the first kind (α = 2%). For those connected to more than one fault node, we adopt α = 1%. This choice is made by assuming that a single fault may have a higher false alarm rate than multiple faults. But, since we will not treat multiple faults here, it does not matter if we choose the same α in both cases. The c_α% values for different α are given in Table 3.

Once the structure of both networks is established, new discrete nodes S_r&m, S_r are added as shown in Figure 10. The added nodes represent discrete variables, each with four states representing the state IC and the three faults T, H and Q₀, as node S_m. These nodes contribute to the fusion of the non-deterministic outputs made by both networks (methods).

Figure 10.

Proposed framework.

The prior probabilities of the states of the node S_r&m are defined to be equal in order to allow an unbiased fusion of probabilities. The conditional probability tables of each of these nodes are given in Tables 15 and 16.

Table 15.

The node S_r&m CPT.

S _r&m
IC	T	H	Q ₀
$\frac{1}{k + 1}$	$\frac{1}{k + 1}$	$\frac{1}{k + 1}$	$\frac{1}{k + 1}$

Table 16.

The CPT of the nodes S_r and S_m.

S_r & S_m
S _r&m	IC	T	H	Q ₀
IC	1	0	0	0
T	0	1	0	0
H	0	0	1	0
Q ₀	0	0	0	1

Simulations and results

Hereinafter, the proposed method is tested under different assumptions. The value of combining the two methods is mainly to increase the decision reliability. Also, the proposed probabilistic method could be useful when one or both of the methods are not very efficient (missed detection of an abnormal operating state, fault misdiagnosis). So, we propose to compare our method with the two other networks used alone, taking into account an accurate model (M+) or a less accurate model (M−), and a complete data set of suitable size (D+) or an incomplete data set (lack of fault data, few data available, D−). The resulting four hypotheses are described in Table 17. The system is simulated according to the scenarios described in Table 18.

Table 17.

Hypothesis matrix.

Hypothesis	Model	Data
H_I	M ⁺	D ⁺
H_II	M ⁺	D ⁻
H_III	M ⁻	D ⁺
H_IV	M ⁻	D ⁻

Table 18.

Simulated scenarios.

Period	1–30	31–60	61–90	91–120
Cases	In control	Fault T	Fault H	Fault Q₀

Every simulation was performed using Matlab/Simulink and BNT (Bayes Net Toolbox). The results obtained by testing our methods under the four hypotheses are presented in Tables 19, 20, 21 and 22, each of which contains confusion matrices. These tables show how the different faults are discriminated: each column of these matrices represents the instances in a predicted class and each row the instances in an actual class. Moreover, Figure 11 compares the results of the three methods (DD, data-driven; MB, model-based; BNF, Bayesian networks framework) under each hypothesis.

Table 19.

Confusion matrix on the first hypothesis.

(a)
DD decision
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	1	29	0	0	30
H	1	0	29	0	30
Q ₀	0	0	0	30	30
Total	32	29	29	30	120
(b)
MB decision
State	IC	T	H	Q ₀	Total
IC	29	1	0	0	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	29	31	30	30	120
(c)
BNF
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	30	30	30	30	120

Table 20.

Confusion matrix on the second hypothesis.

(a)
DD decision
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	1	29	0	0	30
H	9	21	0	0	30
Q ₀	0	0	0	30	30
Total	40	50	0	30	120
(b)
MB decision
State	IC	T	H	Q ₀	Total
IC	29	1	0	0	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	29	31	30	30	120
(c)
BNF
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	0	29	1	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	30	29	31	30	120

Table 21.

Confusion matrix on the third hypothesis.

(a)
DD decision
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	1	29	0	0	30
H	1	0	29	0	30
Q ₀	0	0	0	30	30
Total	32	29	29	30	120
(b)
MB decision
State	IC	T	H	Q ₀	Total
IC	24	5	0	1	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	6	1	23	30
Total	24	41	31	24	120
(c)
BNF
State	IC	T	H	Q ₀	Total
IC	30	0	0	0	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	30	30	30	30	120

Table 22.

Confusion matrix on the fourth hypothesis.

(a)
DD decision
State	IC	T	H	Q ₀	Total
IC	29	0	0	1	30
T	1	29	0	0	30
H	30	0	0	0	30
Q ₀	0	0	0	30	30
Total	60	29	0	31	120
(b)
MB decision
State	IC	T	H	Q ₀	Total
IC	20	9	0	1	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	9	1	20	30
Total	20	48	31	21	120
(c)
BNF
State	IC	T	H	Q ₀	Total
IC	28	0	1	1	30
T	0	30	0	0	30
H	0	0	30	0	30
Q ₀	0	0	0	30	30
Total	28	30	31	31	120

Figure 11.

Results synthesis.

Hypothesis I

In hypothesis I (H_I), the three methods are used under standard conditions: an available data set covering the different faults is used to estimate the needed parameters, and a model assumed to be accurate is provided to generate the residuals. Note that, for all four hypotheses, we consider that enough data are available when the system is in control. Under these conditions, the three methods are all accurate. Whereas the data-driven method makes two wrong decisions (error rate: 2/120) and the model-based method makes just one (error rate: 1/120), our method does not make any mistake. These results are deduced from Table 19.

Hypothesis II

Regarding hypothesis II (H_II), we consider that no data are available about the fault H and that only a few data (10 samples) are available about the other faults (T and Q₀) to estimate the CPT of the node X. Here, since the fault H is assumed to be unknown, the node S_m is constructed with only three states (IC, T, Q₀), that is, with k = 2 known faults. Thus, the data-driven method outputs (the posterior probabilities of the three states obtained by inference) are adjusted before being integrated in our Bayesian network. First, we add the fault state H to the other states, with p_H the probability of its occurrence. Second, we normalize all of the probabilities by multiplying them by 1/(1 + p_H), where p_H is equal to 1/(k + 1). Under the same conditions, the data-driven method used alone makes many more wrong decisions (error rate 31/120, counting the off-diagonal entries of Table 20(a)) than the proposed method (error rate 1/120), as shown in Table 20. This can be explained by the use of the model-based method, which does not depend on the collected fault data.
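The adjustment of the data-driven outputs described above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab/BNT code: the function name and the numerical posteriors are hypothetical, and k is read as the number of known faults (the states other than IC).

```python
def add_unknown_fault(posteriors, unknown="H"):
    """Add an unseen fault state to the DD posteriors and renormalize.

    posteriors: dict mapping the known states (IC plus k known faults)
    to probabilities that sum to 1.
    """
    k = len(posteriors) - 1            # known faults, IC excluded (assumption)
    p_unknown = 1.0 / (k + 1)          # p_H = 1/(k + 1)
    scale = 1.0 / (1.0 + p_unknown)    # normalizing factor 1/(1 + p_H)
    adjusted = {state: p * scale for state, p in posteriors.items()}
    adjusted[unknown] = p_unknown * scale
    return adjusted


# Illustrative posteriors over (IC, T, Q0); these values are not from the paper.
dd_output = {"IC": 0.7, "T": 0.2, "Q0": 0.1}
adjusted = add_unknown_fault(dd_output)
print(adjusted)
print(sum(adjusted.values()))
```

With k = 2, p_H = 1/3 and the scaling factor is 3/4, so the unknown fault H receives a probability of 1/4 and the three original posteriors keep their relative proportions.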

Hypothesis III

In hypothesis III (H_III), we have made the model less accurate in order to obtain a less efficient model-based method. In the water heater system case, to introduce some inaccuracy into the generated residuals, we assume that a physical parameter is incorrectly estimated: the section is set to about 0.35 m² instead of 1 m². In this situation, the proposed method gives good results (it does not make any error) compared with the model-based method used alone (error rate 13/120). These results can be seen in Table 21.

Hypothesis IV

Finally, in hypothesis IV (H_IV), where the two conditions of hypotheses II and III are combined, the proposed method makes better decisions (it makes only two errors) than the two other methods (see Table 22).

Synthesis

One may notice that, for each hypothesis, the proposed method at least approaches or matches the performance of each individual method, and can even improve the decision making. In hypothesis I, the three methods perform well and mostly give the right answers, but our method is the most accurate. Indeed, given the probabilities of the two methods (DD, MB), it provides more consistency on some states, which could be useful for experts who have to make a decision. Concerning hypothesis II, where the data-driven method is used with a lack of data, the proposed method is relatively good but not as accurate as the model-based method. However, the proposed method gives better results than the data-driven method used in an inappropriate environment. The same holds for hypothesis III, where we have degraded the model. Finally, in hypothesis IV, the proposed approach makes better decisions than the two methods used alone. In this case, it is interesting to see that the two methods complement each other within our framework, as they use different information about the system, and thus give better decisions. We can therefore conclude that the probabilistic framework takes advantage of both methods.
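To make this comparison concrete, the short Python sketch below tabulates the number of misclassified samples per method and per hypothesis. The counts are the off-diagonal sums read from Tables 19 to 22 (taking the blank cell of Table 21(a) as 0); the variable and function names are illustrative only.

```python
TOTAL = 120  # samples per confusion matrix (4 states x 30 samples)

# Misclassified samples (off-diagonal sums) read from Tables 19 to 22.
ERRORS = {
    "I":   {"DD": 2,  "MB": 1,  "BNF": 0},
    "II":  {"DD": 31, "MB": 1,  "BNF": 1},
    "III": {"DD": 2,  "MB": 13, "BNF": 0},
    "IV":  {"DD": 32, "MB": 20, "BNF": 2},
}


def best_methods(errors):
    """Return the method(s) with the fewest misclassified samples."""
    lowest = min(errors.values())
    return [method for method, count in errors.items() if count == lowest]


for hypothesis, errors in ERRORS.items():
    rates = ", ".join(f"{m} {e}/{TOTAL}" for m, e in errors.items())
    print(f"Hypothesis {hypothesis}: {rates} -> fewest errors: {best_methods(errors)}")
```

Counted this way, BNF is among the methods with the fewest errors under every hypothesis, and it is the only such method under hypotheses I, III and IV.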

Conclusions

The aim of this paper is to present a new method for monitoring industrial systems. We have presented a probabilistic framework (i) using discrete and Gaussian variables and (ii) allowing us to model and combine two CGNs for system monitoring: one for data-driven monitoring and the other representing the incidence matrix and the residuals evaluation, two essential phases of model-based methods. This framework can enhance decision making during monitoring by using data and residuals simultaneously. This original method was tested on a water heater process, where decisions were improved in a natural way in most cases (accurate model, degraded model, and more or less data). Given the properties of BNs, the evident outlook of the proposed approach is to model and integrate other methods or information about the system (maintainability information, component reliability and so on) in order to further enhance the decision making.

Footnotes

Acknowledgements

The authors gratefully acknowledge the contribution of the reviewers' comments.

Funding

Mohamed Amine Atoui is supported by a PhD purpose grant from ‘la Région Pays de la Loire’.

References

Bilmes (2004) On virtual evidence and soft evidence in Bayesian networks. Technical Report UWEETR-2004-0016, Department of Electrical Engineering, University of Washington.

Chiang, Russell and Braatz (2001) Fault Detection and Diagnosis in Industrial Systems. New York: Springer.

Cowell, Dawid, Lauritzen and Spiegelhalter (2007) Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. New York: Springer.

Ding (2008) Model-Based Fault Diagnosis Techniques. Berlin: Springer-Verlag.

Ding (2011) A survey of the application of basic data-driven and model-based methods in process monitoring and fault diagnosis. In: 18th IFAC World Congress, Milano, Italy.

Ding, Zhang, Naik, Ding and Huang (2009) Subspace method aided data-driven design of fault detection and isolation systems. Journal of Process Control 19: 1496–1510.

Frank, Ding and Marcu (2000) Model-based fault diagnosis in technical processes. Transactions of the Institute of Measurement and Control 22: 57–101.

Gertler (1991) Analytical redundancy methods in fault detection and isolation: survey and synthesis. In: IFAC symposium on fault detection supervision and safety for technical processes SAFEPROCESS, vol. 1, pp. 9–12.

Gertler and Singer (1990) A new structural framework for parity equation-based failure detection and isolation. Automatica 26: 381–388.

Ghosh and Srinivasan (2011) Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods. Computers and Chemical Engineering 35: 342–355.

Isermann (2006) Fault Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. New York: Springer.

Jensen (1996) An Introduction to Bayesian Networks. London: Taylor and Francis.

Lauritzen (2001) Causal inference from graphical models. In: Complex stochastic systems, pp. 63–107.

Luo, Namburu, Pattipati, Qiao and Chigusa (2010) Integrated model-based and data-driven diagnosis of automotive antilock braking systems. IEEE Transactions on Systems, Man, and Cybernetics, Part A 40: 321–336.

Patton, Chen and Nielsen (1995) Model-based methods for fault diagnosis: some guide-lines. Transactions of the Institute of Measurement and Control 17: 73–83.

Pearl (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Qin (2006) An overview of subspace identification. Computers and Chemical Engineering 30: 1502–1513.

Schubert, Kruger, Arellano-Garcia, de Sa Feital and Wozny (2011) Unified model-based fault diagnosis for three industrial application studies. Control Engineering Practice 19: 479–490.

Venkatasubramanian, Rengaswamy, Kavuri and Yin (2003) A review of process fault detection and diagnosis: Part III: Process history based methods. Computers and Chemical Engineering 27: 327–346.

Verron, Tiplica and Kobi (2010) Fault diagnosis of industrial systems by conditional Gaussian network including a distance rejection criterion. Engineering Applications of Artificial Intelligence 23: 1229–1235.

Verron, Weber, Theilliol, Tiplica, Kobi and Aubrun (2009) Decision with Bayesian network in the concurrent faults event. In: 7th IFAC symposium on fault detection, supervision and safety of technical processes (SafeProcess’09), Barcelona, Spain, pp. 306–311.

Weber, Theilliol and Aubrun (2008) Component reliability in fault diagnosis decision-making based on dynamic Bayesian networks. Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability 222: 161–172.

Yew and Rajagopalan (2010) Multi-agent based collaborative fault detection and identification in chemical processes. Engineering Applications of Artificial Intelligence 23: 934–949.

Yin, Ding, Haghani, Hao and Zhang (2012) A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. Journal of Process Control 22: 1567–1581.