A selective ensemble learning approach based on evolutionary algorithm

Abstract

This paper proposes an evolutionary-based selective ensemble learning framework for solving classification problems. In the proposed framework, the extreme learning machine (ELM) is selected as the base learner and evolutionary algorithms are employed to optimize the weights of the base learners in the ensemble. Then the base learners whose weights are larger than a threshold are selected for decision making. The proposed framework is evaluated on 20 benchmark data sets from the KEEL repository using four different evolutionary algorithms. Results show that the proposed evolutionary-based ensemble learning framework outperforms the simple voting based ensemble method in terms of classification performance. Among the four evolutionary optimization algorithms, the PSOGA-based and DE-based weight optimization algorithms are the most effective at improving classification accuracy and generalization ability.

Keywords

Extreme learning machine, evolutionary algorithm, ensemble learning, classification

1 Introduction

Extreme learning machine (ELM), proposed by Huang et al. [1], is an effective learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). Compared with traditional gradient-based learning algorithms, ELM not only learns fast but also achieves better generalization performance. In recent years, ELM has attracted tremendous attention. Many applications of ELM have been proposed in the literature, e.g., credit risk assessment [2], face recognition [3], and image analysis [4].

However, a single ELM suffers from some problems, such as instability and overfitting, which may be caused by the random assignment of its parameters. Optimization strategies have been widely used for parameter selection in ELM. Zhu et al. [5] presented an evolutionary ELM (E-ELM) that selects the input weights using the differential evolution (DE) algorithm. The E-ELM can achieve good generalization performance with much more compact networks. On the basis of E-ELM, Cao et al. [6] further optimized the number of hidden nodes using a self-adaptive DE algorithm, and proposed a self-adaptive evolutionary ELM for SLFNs. Feng et al. [7] proposed an evolutionary selection ELM (ES-ELM) for regression. ES-ELM uses the crossover mechanism derived from the genetic algorithm (GA) to select the optimal number of hidden nodes for ELM. However, these strategies are mainly used for parameter optimization of a single ELM.

Moreover, recent research shows that combining multiple learners, i.e., ensemble learning, is also an effective strategy for dealing with the above problems. Generally, an ensemble learning method trains multiple base classifiers and combines their predictions. Many ensemble strategies based on ELM have been proposed [8–11]. The EN-ELM proposed by Liu and Wang [8] is an effective ensemble method that uses cross validation and ensemble learning to alleviate overfitting and enhance predictive stability. All base classifiers in EN-ELM are considered equally important. Wang and Li [9] used a dynamic AdaBoost algorithm to combine the outputs of ELMs, while Zhai et al. [10] proposed a dynamic ensemble ELM based on sample entropy. Their aim is to deal with the instability and overfitting of a single ELM, especially on large data sets. The voting-based ELM ensemble learning method (V-ELM) [11] incorporates voting into ELMs and makes the final decision by majority voting. V-ELM enhances the classification accuracy and shows much smaller variation across simulation trials than a single ELM. These ensemble methods assume that all base learners in the ensemble contribute equally to decision making [12]. However, this assumption does not work well in many real-world applications. It is necessary to distinguish the importance of each base learner in decision making. Some researchers have focused on the use of evolutionary algorithms, such as GA, DE, and particle swarm optimization (PSO), to optimize the weight of each base learner in the ensemble [12–14].

Based on the above considerations, this paper presents an evolutionary-based selective ensemble learning framework built on ELM for solving classification problems. In the proposed framework, evolutionary algorithms are employed to optimize the weights of the base learners in the ensemble. The proposed framework is evaluated on 20 benchmark data sets from the KEEL repository using four different evolutionary algorithms. Results show that the proposed evolutionary-based ensemble learning framework outperforms the simple voting based ensemble method in terms of classification performance.

The remainder of this paper is organized as follows. Section 2 briefly introduces ELM and several evolutionary algorithms. Section 3 presents the evolutionary-based selective ensemble learning framework, together with two evolutionary-based weight optimization strategies, the DE-based and the PSOGA-based optimization algorithms. Section 4 evaluates the performance of several different ensembles on 20 benchmark data sets. Section 5 concludes this paper.

2 Preliminaries

This section briefly reviews the basic ELM algorithm and several evolutionary algorithms as the background knowledge for our work.

2.1 Extreme learning machine

ELM [1] works on single-hidden-layer feedforward neural networks, where the input weights and biases are randomly chosen and need not be adjusted. The output weights are calculated analytically using the Moore-Penrose generalized inverse.

For N training samples (x_i, t_i), where x_i = [x_i1, x_i2, …, x_in]^T is an n-dimensional input vector and t_i = [t_i1, t_i2, …, t_im]^T is an m-dimensional target vector representing the expected output for input vector x_i, the SLFN with L hidden nodes and activation function g(x) can be mathematically modeled as

$\sum_{j=1}^{L} \beta_{j} \, g(a_{j}, b_{j}, x_{i}) = t_{i}, \quad i = 1, 2, \dots, N$ (1)

where a_j = [a_j1, a_j2, …, a_jn]^T is the weight vector connecting the jth hidden node and the input nodes, and b_j is the bias of the jth hidden node. β_j = [β_j1, β_j2, …, β_jm]^T denotes the output weight vector connecting the jth hidden node and the output nodes.

Equation (1) can be rewritten as follows

$H \beta = T,$ (2)

where

$H = \begin{bmatrix} g(a_{1}, b_{1}, x_{1}) & \cdots & g(a_{L}, b_{L}, x_{1}) \\ \vdots & \ddots & \vdots \\ g(a_{1}, b_{1}, x_{N}) & \cdots & g(a_{L}, b_{L}, x_{N}) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_{1}^{T} \\ \vdots \\ \beta_{L}^{T} \end{bmatrix}_{L \times m}, \quad \text{and} \quad T = \begin{bmatrix} t_{1}^{T} \\ \vdots \\ t_{N}^{T} \end{bmatrix}_{N \times m}.$

H is the hidden-layer output matrix of the neural network. β is the output weight matrix and T is the target output matrix.

Thus, the output weight matrix β of Equation (2) can be estimated by minimizing the approximation error as follows

$\hat{\beta} = \arg\min_{\beta} \| H \beta - T \|^{2} = H^{+} T,$ (3)

where H⁺ is the Moore-Penrose generalized inverse of H. The solution in Equation (3) is the minimum-norm least-squares solution: it achieves the minimum squared training error and, as reported in [15], good generalization performance on novel patterns.
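As a side note, under the additional assumption (not stated above) that H has full column rank, which requires N ≥ L, the Moore-Penrose inverse has a closed form and Equation (3) reduces to the familiar normal-equation solution:

```latex
H^{+} = (H^{T} H)^{-1} H^{T}
\quad\Longrightarrow\quad
\hat{\beta} = (H^{T} H)^{-1} H^{T} T .
```

When H is rank deficient, this closed form does not apply and H⁺ has to be computed by other means, for example via a singular value decomposition.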

In ELM, different activation functions can be used in different hidden neurons [16]. Any function that satisfies the ELM universal approximation capability theorems [17] can serve as an activation function in ELM.

2.2 Differential evolution algorithm

DE is an evolutionary optimization technique proposed by Storn and Price [18]. It has been widely applied to many real-world optimization problems [19]. Like other evolutionary algorithms, DE is a population-based heuristic search algorithm. It uses three principal operations, mutation, crossover and selection, to search for a global optimum. These operations are carried out iteratively with the aim of improving the fitness of the best vector in the population.

The DE algorithm starts with a randomly generated initial population of M individuals X_i,G, i = 1, …, M, where the index i denotes the ith candidate solution of the population at generation G. Each individual is a D-dimensional vector X_i,G = [x_i,G(1), x_i,G(2), …, x_i,G(j), …, x_i,G(D)].

For each target vector X_i,G, i = 1, …, M, a mutant vector V_i,G is created by adding the weighted difference of two randomly selected vectors to a third randomly selected vector. According to the most common DE version [20],

$V_{i,G} = X_{r_{1},G} + F \times (X_{r_{2},G} - X_{r_{3},G})$ (4)

where r₁, r₂, r₃ ∈ {1, 2, …, M} are mutually distinct, randomly chosen indices. The parameter F is a real, constant factor that controls the amplification of the differential variation (X_r₂,G − X_r₃,G).

After the mutation operation, the crossover operation combines the mutant vector V_i,G and the target vector X_i,G to generate a trial vector U_i,G = [u_i,G(1), u_i,G(2), …, u_i,G(j), …, u_i,G(D)]:

$u_{i,G}(j) = \begin{cases} v_{i,G}(j), & \text{if } \mathrm{rand}_{j}[0,1] \leq CR \text{ or } j = j_{rand}, \\ x_{i,G}(j), & \text{otherwise,} \end{cases}$ (5)

where j = 1, 2, …, D, and CR ∈ [0, 1] is a control parameter that represents the probability that a component of the trial vector is taken from the mutant vector. rand_j is a uniformly distributed random number in the range [0, 1]. j_rand is a randomly chosen integer in the range [1, D], which ensures that the trial vector U_i,G gets at least one component from the mutant vector V_i,G.

During the selection process, the fitness value of the trial vector U_i,G is compared with that of the target vector X_i,G to determine whether the trial vector survives to the next generation. For an objective function f, the selection operation is carried out as

$X_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) < f(X_{i,G}), \\ X_{i,G}, & \text{otherwise.} \end{cases}$ (6)

If the objective function value of the trial vector is less than that of the target vector, the target vector will be replaced by the trial vector to survive to the next generation.
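The three operations can be put together in a short sketch. The following Python code is only an illustration of Equations (4)–(6) under stated assumptions: the function name `differential_evolution`, the default values of `F` and `CR`, and the choice to exclude the target index i when drawing r₁, r₂, r₃ are ours and are not prescribed by the text. The sketch needs a population of at least four individuals and does not clip mutant vectors to the initialization bounds.

```python
import random


def differential_evolution(f, dim, bounds=(0.0, 1.0), pop_size=20,
                           F=0.5, CR=0.9, max_gen=50, seed=0):
    """Minimise f with mutation (4), crossover (5) and selection (6)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population and its fitness values.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(max_gen):
        new_pop, new_fit = [], []
        for i in range(pop_size):
            # Mutation, Eq. (4): three mutually distinct indices
            # (assumption: also different from the target index i).
            others = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.sample(others, 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                 for j in range(dim)]
            # Crossover, Eq. (5): j_rand guarantees one mutant component.
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                 for j in range(dim)]
            # Selection, Eq. (6): keep the trial vector only if it is better.
            fu = f(u)
            if fu < fit[i]:
                new_pop.append(u)
                new_fit.append(fu)
            else:
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For example, `differential_evolution(lambda x: sum(c * c for c in x), dim=3, bounds=(-5.0, 5.0))` searches for the minimum of a three-dimensional sphere function. Because selection never replaces an individual with a worse one, the returned fitness is never worse than the best fitness in the initial population.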

2.3 GA, PSO and PSO-GA

GA is an adaptive heuristic search algorithm inspired by biological evolution. A fitness function is used to evaluate individuals. A typical GA involves four major steps: fitness evaluation, selection, crossover and mutation, which together produce a new population [21].

PSO is a population-based global stochastic search method for solving optimization problems [22]. Compared with GA, PSO is easier to implement and has fewer control parameters to adjust [23]. The population consists of particles, which are randomly initialized and fly through the multi-dimensional search space to find the best solution according to an objective function. During optimization, the velocity and position of each particle are updated iteratively.

Garg [24] combined the advantages of GA and PSO, and presented a hybrid PSO-GA approach to solving optimization problems. Its basic idea is to embed the genetic operators in the standard PSO, combining the social thinking ability of PSO with the local search capability of GA.

The PSO-GA approach also starts with an initialization phase. The particles of the swarm and their corresponding velocities are generated randomly over the search space. During optimization, each particle keeps track of its own best previous position (personal best, pbest) and of the best previous position of its neighbors (global best, gbest).

Let X_i,G and V_i,G be the position and velocity, respectively, of the ith particle in the search space at the Gth iteration. The velocity and position of this particle at the (G + 1)th iteration are then updated as follows:

$V_{i,G+1} = w \cdot V_{i,G} + c_{1} \cdot r_{1} \cdot (pbest_{i,G} - X_{i,G}) + c_{2} \cdot r_{2} \cdot (gbest_{G} - X_{i,G})$ (7)

$X_{i,G+1} = X_{i,G} + V_{i,G+1}$ (8)

where r₁ and r₂ are random numbers in the range [0, 1], c₁ and c₂ are constants, and w is the inertia factor.
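As a small worked illustration of Equations (7) and (8), the sketch below updates a single particle. The function name `pso_step` is hypothetical, and r₁ and r₂ are passed in explicitly (as scalars shared by all dimensions, which is an assumption) so that the arithmetic can be checked by hand.

```python
def pso_step(x, v, pbest, gbest, w, c1, c2, r1, r2):
    """Update one particle: velocity by Eq. (7), then position by Eq. (8)."""
    new_v = [w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
             for xj, vj, pj, gj in zip(x, v, pbest, gbest)]
    new_x = [xj + vj for xj, vj in zip(x, new_v)]
    return new_x, new_v


# Illustrative values only: a two-dimensional particle.
new_x, new_v = pso_step(x=[0.0, 1.0], v=[0.5, -0.5],
                        pbest=[1.0, 1.0], gbest=[2.0, 0.0],
                        w=0.5, c1=2.0, c2=2.0, r1=0.5, r2=0.25)
# First component: 0.5*0.5 + 2*0.5*(1-0) + 2*0.25*(2-0) = 2.25
print(new_v, new_x)  # [2.25, -0.75] [2.25, 0.25]
```

In an actual run r₁ and r₂ would be drawn afresh from [0, 1] at each update.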

After a new generation has been formed in a PSO iteration, some particles of the new population are selected and GA is applied to each of them separately. The number of selected particles is determined as

$GA_{num} = GA_{numMax} - \left(\frac{PSO_{G}}{PSO_{MaxIter}}\right)^{\gamma} \times (GA_{numMax} - GA_{numMin})$ (9)

where GA_numMax and GA_numMin are the maximum and minimum numbers of individuals affected by GA, respectively. PSO_G is the current PSO iteration and PSO_MaxIter is the maximum number of PSO iterations. γ is the rate at which the number of individuals affected by GA decreases.
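To make the schedule in Equation (9) concrete, the following sketch evaluates it for a few iterations. The function name `ga_num` and the example values (10 and 2 individuals, 50 iterations) are illustrative assumptions. The text does not say how a fractional result is converted to an integer count, so the sketch returns the real value.

```python
def ga_num(pso_iter, pso_max_iter, ga_num_max, ga_num_min, gamma):
    """Number of particles handed to GA at the current PSO iteration, Eq. (9)."""
    progress = (pso_iter / pso_max_iter) ** gamma
    return ga_num_max - progress * (ga_num_max - ga_num_min)


# The count shrinks from ga_num_max at the start to ga_num_min at the end.
print(ga_num(0, 50, 10, 2, gamma=1))   # 10.0
print(ga_num(25, 50, 10, 2, gamma=1))  # 6.0
print(ga_num(25, 50, 10, 2, gamma=2))  # 8.0
print(ga_num(50, 50, 10, 2, gamma=1))  # 2.0
```

With γ = 1 the decrease is linear in the iteration count, while γ > 1 keeps more particles under GA during the early iterations.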

After selecting individuals from the population, the PSO-GA algorithm creates a new population by replacing points in the current population with better points via GA.

Then, PSO-GA algorithm updates population size and maximum iteration number for GA after evaluating the new population.

PSO-GA algorithm repeats the process and seeks the global optimum.

3 The evolutionary-based selective ensemble learning framework

Zhou et al. [25] suggested that using many, rather than all, of the predictors in an ensemble could be better for decision making. Ensemble learning methods have two important phases: constructing multiple base learners and combining them [26].

Generally, two approaches are employed to construct an ensemble model, namely the homogeneous model and the heterogeneous model [27]. A homogeneous ensemble contains a group of base learners of the same type, while a heterogeneous ensemble is generated by applying different learning algorithms to the same dataset. In this paper, we construct a homogeneous ensemble learning model using ELM as the base learner.

Many approaches have been presented to combine base learners in an ensemble. Simple voting and weighted voting strategies are widely used [28]. In the weighted voting strategy, selecting appropriate voting weights for the base learners in the ensemble is a key issue.

In this paper, we employ four evolutionary-based algorithms, namely the GA-based, PSO-based, DE-based and PSOGA-based algorithms, to optimize the voting weights of the base learners. Before optimization, each base learner in the ensemble is assigned a random initial weight between 0 and 1. An evolutionary algorithm is then used to optimize these weights. To apply an evolutionary-based optimization algorithm, we combine the weights of the base learners w_k (k = 1, …, K, where K is the number of base classifiers) into a vector W = [w₁, w₂, …, w_K], which is treated as an individual of the population in the evolutionary algorithm and is the quantity to be optimized.

In the optimization process, we use the error rate on the validation set as the fitness (objective) function. The goal is therefore to minimize this objective function using the search capability of the evolutionary algorithm.
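A minimal sketch of such a fitness function is given below. It assumes that the labels predicted by each base learner on the validation samples have already been computed and stored in `predictions[k][i]` (learner k, sample i). The names `weighted_vote` and `error_rate` and the data layout are ours, and ties are broken in favor of the label that was voted for first, a detail the text leaves open.

```python
from collections import defaultdict


def weighted_vote(weights, votes):
    """Return the label with the largest total weight; votes[k] is learner k's label."""
    score = defaultdict(float)
    for w, label in zip(weights, votes):
        score[label] += w
    return max(score, key=score.get)


def error_rate(weights, predictions, targets):
    """Fitness of a weight vector: fraction of validation samples misclassified."""
    wrong = 0
    for i, target in enumerate(targets):
        votes = [pred_k[i] for pred_k in predictions]
        if weighted_vote(weights, votes) != target:
            wrong += 1
    return wrong / len(targets)


# Toy data (not from the experiments): 3 learners, 4 validation samples.
predictions = [[0, 1, 1, 0],   # learner 1 is always right
               [0, 0, 1, 1],
               [1, 1, 0, 0]]
targets = [0, 1, 1, 0]
print(error_rate([1.0, 0.0, 0.0], predictions, targets))  # 0.0
print(error_rate([0.2, 0.2, 0.9], predictions, targets))  # 0.5
```

An evolutionary algorithm would call `error_rate` once per candidate weight vector and prefer the vectors with lower values.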

After optimization, a population of candidate weight vectors has been generated, and we select the individual with the best fitness value as the weights of the base learners in the ensemble. We then use a selective strategy: only the ELM base learners whose weights are larger than the threshold λ take part in the decision, which is made by weighted majority voting.

The proposed evolutionary-based selective ensemble learning framework for classification problems is given in Algorithm 1.

Algorithm 1. The evolutionary-based selective ensemble learning framework.

Input:

Sequence of N samples D = {(x_i, t_i) |x_i ∈ Rⁿ, t_i ∈ R^m, i = 1, 2, …, N}; number of base learners K; number of ELM hidden nodes L; threshold λ; population size M.

Process:

1. for k = 1 to K

Divide the dataset into a training set $D_{train} = \{(x_{i}, t_{i})\}_{i=1}^{N^{tr}}$ and a validation set $D_{validation} = \{(x_{i}, t_{i})\}_{i=1}^{N^{va}}$, where N^tr and N^va are the numbers of training and validation samples, respectively. D_train is used to generate the base learner and D_validation is used to evaluate the fitness in the evolutionary algorithm.

Randomly assign the input weights a_j and biases b_j, j = 1, 2, …, L.

Calculate the hidden layer output matrix H_k.

Calculate the output weight $\hat{\beta}_{k} = H_{k}^{+} T_{k}$.

Construct base learner C_k.

end for

2. Generate an initial population of M individuals W_G = {W_1,G, W_2,G, …, W_M,G}, where W_i,G = [w_i,G (1) , w_i,G (2) , …, w_i,G (K)], (i = 1, …, M) represents the weights of K base learners.

3. Use evolutionary-based method to evolve the population, where the fitness of a weight vector is measured by the error rate on the validation set D_validation.

4. Obtain the evolved best weight vector W^*.

5. Choose the base learners whose weights are larger than the threshold λ to form the ensemble.

Output:

An ensemble learner: $C^{*}(x) = \arg\max_{t} \sum_{w_{i}^{*} \geq \lambda, \; C_{i}(x) = t} w_{i}^{*}.$

When the ensemble makes a decision, the proposed algorithm sums, for each class, the weights of the selected base learners that vote for that class (weighted majority voting). For a sample x, the class that receives the highest weight sum is taken as the predicted label.
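The decision rule can be sketched as follows. The function name `selective_ensemble_predict` is hypothetical, and returning `None` when no base learner passes the threshold is our own choice, since the text does not cover that case. The sketch uses a strict comparison, following the wording "larger than the threshold", whereas the output formula of Algorithm 1 writes ≥. The two differ only for a weight exactly equal to λ.

```python
def selective_ensemble_predict(weights, votes, threshold=0.5):
    """Weighted majority vote over base learners whose weight exceeds the threshold."""
    score = {}
    for w, label in zip(weights, votes):
        if w > threshold:  # selective step: low-weight learners do not vote
            score[label] = score.get(label, 0.0) + w
    if not score:
        return None  # assumption: no decision if every learner is filtered out
    return max(score, key=score.get)


# Toy example (not from the experiments): five learners voting on one sample.
weights = [0.9, 0.6, 0.55, 0.4, 0.3]
votes = ["A", "B", "B", "A", "A"]
print(selective_ensemble_predict(weights, votes, threshold=0.5))  # B
print(selective_ensemble_predict(weights, votes, threshold=0.0))  # A
```

In this toy case the three selected learners give class B a weight sum of 1.15 against 0.9 for class A, whereas letting all five learners vote would favor class A (1.6 against 1.15), so the selection step can change the final decision.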

In Step 3 of Algorithm 1, we employ an evolutionary-based method to evolve the population and optimize the weight of each base learner in the ensemble. Algorithm 2 presents the DE-based optimization algorithm, and Algorithm 3 presents the PSOGA-based optimization algorithm [24].

Algorithm 2. DE-based weight optimization algorithm.

Input:

A series of base learners C_k, k = 1, 2 … , K; number of iterations G_max.

Process:

1. Initialization: Set the generation iterator G = 0.

2. while G < G_max do

Mutation: Generate a mutant vector V_i,G corresponding to the ith target vector X_i,G using Equation (4);

Crossover: Generate a trial individual U_i,G for the ith target vector X_i,G using Equation (5);

Fitness evaluation and selection: Consider X_i,G and U_i,G as the weights of K base classifiers respectively, and output the corresponding weighted majority voting label;

Evaluate the fitness of the vector U_i,G and X_i,G using K base classifiers C_k on validation set D_validation, where the fitness is defined as the error rate;

Update the vector X_i,G+1 of the next generation (G + 1) using Equation (6);

Increase generation iterator G = G + 1.

end while

Output:

The best weight vector W^* = X_i,G, i.e., the individual with the lowest error rate in the final population.

Algorithm 3. PSOGA-based weight optimization algorithm.

Input:

A series of base learners C_k, k = 1, 2 … , K; number of iterations G_max.

Process:

1. Initialization: Set the generation iterator G = 0; initialize the position and velocity of each particle.

2. while G < G_max do

Update pbest and gbest position of the particle;

Update the velocity of each particle using Equation (7);

Update the position of each particle using Equation (8);

Evaluate the fitness of each individual on validation set D_validation, where the fitness is defined as the error rate;

Choose the GA_num best particles, where GA_num is calculated by Equation (9); apply GA to each selected particle separately, and replace some particles in the current PSO population with the best individuals obtained through the GA operation;

Update GA parameters: GA_num, population size of the particles in GA, and maximum number of iteration in GA;

Increase generation iterator G = G + 1.

end while

Output:

The best weight vector W^* = X_i,G, i.e., the particle position with the lowest error rate found by the swarm (gbest).

4 Experimental results

4.1 Benchmark data sets

To validate the effectiveness of the ensemble methods, we investigate the classification accuracy and computational efficiency of five ensemble approaches, namely the simple voting based ensemble, GA-based ensemble, PSO-based ensemble, DE-based ensemble, and PSOGA-based ensemble, on twenty benchmark data sets. The data sets come from the KEEL dataset repository [29] and their details are summarized in Table 1, where N_sample is the number of samples, N_feature is the number of features and N_class is the number of classes.

4.2 Experimental settings

Five ensemble methods are evaluated in the experiments. The simple voting based ensemble applies the simple voting strategy to decide the final class label of a test sample. The GA-based, PSO-based, DE-based and PSOGA-based ensemble methods employ the GA, PSO, DE and PSO-GA [24] algorithms, respectively, to optimize the weights of the base learners, and use the weighted majority voting strategy to make decisions.

For all ELM base learners in the simple voting based ensemble method and the evolutionary-based ensemble methods, the sigmoid function is selected as the activation function. The number of ELM base learners K is set to 30. The number of hidden nodes L of the ELM base learner needs to be tuned. We employ a grid search over 10 to 200 hidden nodes with an interval of 10. Ten trials are repeated for a single ELM base learner and the number of hidden nodes is determined by the validation error. The tuned number of hidden nodes is then used in all ensemble methods.

For the evolutionary-based ensemble methods in the experiments, all ELM base learners are constructed on a randomly selected 70% of the training samples, and the remaining 30% are used to evaluate the fitness in the evolutionary algorithms. The population size M is set to 20. The maximum number of iterations is 50. The dimension of each individual is 30, corresponding to the 30 base learners. The threshold λ is set to 0.5. That is, only the base learners whose weights are larger than 0.5 are included in the ensemble, and the test samples are classified by weighted voting.
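The 70%/30% partition described above can be sketched as a simple random split. The helper name `split_train_validation` and the use of a fixed seed are assumptions made for reproducibility of the illustration. The original experiments were run in Matlab, so this is not the code that was used.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into a training part and a validation part."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    validation = [samples[i] for i in indices[cut:]]
    return train, validation


# Ten toy samples give 7 for building base learners and 3 for fitness evaluation.
train, validation = split_train_validation(list(range(10)))
print(len(train), len(validation))  # 7 3
```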

In the experiments, 5-fold cross-validation is used to evaluate the performance of the different ensemble methods. For a fair comparison, all ensemble methods are run on the same 5-fold partitions of the data sets. The training and testing process is repeated 50 times. All experiments are carried out in the Matlab 7 environment on a PC with an Intel 3.2 GHz CPU and 4 GB RAM.

4.3 Results and discussion

The evaluation results of the five ensemble methods are presented in Table 2. For each data set we record the best number of hidden nodes (# hidden nodes) of the ELM base learner, the training time, the testing accuracy (acc) and the testing standard deviation (dev). Each result, including training time, testing accuracy and testing standard deviation, is an average over 50 trials. The highest value among the five compared values for each data set is shown in boldface.

The comparison results in Table 2 show that the PSOGA-based ensemble is more time consuming than the simple voting based, GA-based, PSO-based and DE-based ensembles. However, it obtains the best accuracy on 8 out of 20 data sets, more than any of the other four methods. The training times of the GA-based, PSO-based and DE-based ensembles are similar and slightly higher than that of the simple voting based ensemble. The DE-based ensemble obtains the best accuracy on 7 out of 20 data sets, while the GA-based and PSO-based ensembles do so on 3 and 2 data sets, respectively. Comparing the PSOGA-based and DE-based ensembles, Table 2 shows that the classification performance of the PSOGA-based ensemble is slightly superior to that of the DE-based ensemble, whereas the DE-based ensemble is more efficient in terms of computational cost. It should also be noted that although the simple voting based ensemble has the lowest complexity, its classification performance is the worst.

A statistical significance test is also carried out to check which method is statistically better than the others in terms of accuracy. Table 3 shows the rankings of the five ensemble methods. The rankings are computed by the Friedman test, which is a non-parametric equivalent of the repeated-measures ANOVA and compares the average ranks of the methods [30]. A lower average rank indicates better performance. Table 3 shows that the PSOGA-based ensemble method has the lowest average rank, only 1.9000. The average rank of the DE-based ensemble method is slightly larger than that of the PSOGA-based ensemble method and lower than those of the PSO-based and GA-based ensemble methods, while the simple voting based ensemble method has the highest average rank, 4.350. This shows that, compared with the simple voting based ensemble method, the four evolutionary-based ensemble methods improve the classification performance to some extent. Furthermore, the Friedman test returns a p value of 6.7618 × 10^-6, so the null hypothesis that all methods perform equivalently is rejected.
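For readers unfamiliar with the procedure, the sketch below shows how average ranks and the basic Friedman statistic, χ²_F = 12N / (k(k+1)) · (Σ_j R_j² − k(k+1)²/4), are obtained from an accuracy table with N data sets and k methods. The function names and the toy accuracy values are hypothetical and are not the values of Table 2 or Table 3. No tie correction is applied, and the p value, which requires the chi-square distribution, is not computed here.

```python
def average_ranks(accuracies):
    """accuracies[d][m] is the accuracy of method m on data set d.

    Rank 1 is the most accurate method; tied methods share the mean rank.
    """
    n_methods = len(accuracies[0])
    totals = [0.0] * n_methods
    for row in accuracies:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
            for m in order[pos:end + 1]:
                totals[m] += mean_rank
            pos = end + 1
    return [t / len(accuracies) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


# Toy table: 3 data sets (rows) and 3 methods (columns).
toy = [[0.80, 0.85, 0.90],
       [0.70, 0.75, 0.72],
       [0.60, 0.60, 0.65]]
ranks = average_ranks(toy)
print([round(r, 3) for r in ranks])  # [2.833, 1.833, 1.333]
print(round(friedman_statistic(ranks, len(toy)), 3))  # 3.5
```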

We further employ the Holm test to compare the best-ranked method (the PSOGA-based ensemble) with the remaining methods. In the Holm test, the PSOGA-based ensemble is selected as the control method. The result of this test is shown in Table 4. The Holm test applies a step-down procedure that sequentially tests the hypotheses ordered by their significance.

In Table 4, the Holm test starts with the most significant p value, namely p₄ = 9.5837 × 10^-7. Because p₄ is less than the associated Holm value (α/4 = 0.0125) in the same row, we reject the hypothesis and then compare p₃ with α/3. Similarly, the second and the third hypotheses are also rejected. The results show that the PSOGA-based approach outperforms the simple voting based, GA-based and PSO-based ensemble methods. For the last hypothesis, however, p₁ is larger than the corresponding Holm value (α/1 = 0.05), so this hypothesis cannot be rejected. This indicates that there is no statistically significant difference between the PSOGA-based and DE-based ensemble methods.
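The step-down logic can be sketched in a few lines. The function name `holm` is ours, and apart from 9.5837 × 10^-7 the p values in the example are hypothetical placeholders chosen only to reproduce the pattern described above (three rejections followed by one retained hypothesis). They are not the values of Table 4.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns True for each rejected hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # most significant first
    rejected = [False] * m
    for step, i in enumerate(order):
        # Compare with alpha/m, alpha/(m-1), ..., alpha/1 in turn.
        if p_values[i] < alpha / (m - step):
            rejected[i] = True
        else:
            break  # this and all less significant hypotheses are retained
    return rejected


# Hypothetical p values for four comparisons against the control method.
print(holm([0.30, 0.004, 9.5837e-7, 0.011]))  # [False, True, True, True]
```

Once a hypothesis is retained the procedure stops, so a later p value is never tested even if it would fall below its own threshold.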

5 Conclusion

In this paper, we employ several evolutionary-based optimization methods, namely PSOGA, PSO, DE and GA, to optimize the weights in ELM-based ensemble learning, and propose a general selective ensemble learning framework. Five ensemble methods are evaluated on 20 benchmark data sets from the KEEL dataset repository. Compared with the simple voting ensemble, the four evolutionary-based ensemble methods improve the classification performance and generalization ability. The PSOGA-based and DE-based ensemble methods in particular show an advantage in classification performance. However, the PSOGA-based ensemble has no advantage in terms of time complexity. In future work, we will balance complexity and accuracy, and develop a better adaptive selective ensemble framework.

Acknowledgments

This work is partly supported by National Natural Science Foundation of China (No. 61373127), and the State Key Laboratory for Novel Software Technology (Nanjing University) of China (No. KFKT2015B16).

References

1. Huang G.B., Zhu Q.Y. and Siew C.K., Extreme learning machine: Theory and applications, Neurocomputing 70 (2006), 489–501.

2. Zhang, Dai and Ma, Extreme learning machines’ ensemble selection with GRASP, Applied Intelligence 43 (2015), 439–459.

3. Jin, Cao, Wang and Zhi, Ensemble based extreme learning machine for cross-modality face matching, Multimedia Tools and Applications 75(19) (2016), 11831–11846.

4. Liu and Wang, Evolutionary extreme learning machine and its application to image analysis, Journal of Signal Processing Systems 73 (2013), 73–81.

5. Zhu Q.Y., Qin A.K., Suganthan P.N. and Huang G.B., Evolutionary extreme learning machine, Pattern Recognition 38(10) (2005), 1759–1763.

6. Cao, Lin and Huang G.B., Self-adaptive evolutionary extreme learning machine, Neural Processing Letters 36(3) (2012), 285–305.

7. Feng, Qian and Zhang, Evolutionary selection extreme learning machine optimization for regression, Soft Computing 16(9) (2012), 1485–1491.

8. Liu and Wang, Ensemble based extreme learning machine, IEEE Signal Processing Letters 17(8) (2010), 754–757.

9. Wang G.T. and Li, Dynamic Adaboost ensemble extreme learning machine, in: 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), vol. 3, 2010, Piscataway, NJ, IEEE, pp. 54–58.

10. Zhai J.H., Xu H.Y. and Wang X.Z., Dynamic ensemble extreme learning machine based on sample entropy, Soft Computing 16 (2012), 1493–1502.

11. Cao, Lin, Huang G.B. and Liu, Voting based extreme learning machine, Information Sciences 185(1) (2012), 66–77.

12. Liu, Cao, Lin, Pek P.P., Koh Z.X. and Ong M.E.H., Evolutionary voting-based extreme learning machines, Mathematical Problems in Engineering 2014, p.7. Article ID 808292.

13. Zhang, Liu, Cai and Zhang, Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution, Neural Computing and Applications (2016), in press, doi: 10.1007/s00521-016-2342-4

14. Yang and Han, An improved ensemble of extreme learning machine based on attractive and repulsive particle swarm optimization, LNCS 8588 (2014), 213–220.

15. Liang N.Y., Saratchandran, Huang G.B. and Sundararajan, Classification of mental tasks from EEG signals using extreme learning machine, International Journal of Neural Systems 16(1) (2006), 29–38.

16. Huang G.B., Bai, Kasun L.L.C. and Vong C.M., Local receptive fields based extreme learning machine, IEEE Computational Intelligence Magazine 10(2) (2015), 18–29.

17. Huang G.B. and Chen, Convex incremental extreme learning machine, Neurocomputing 70(16) (2007), 3056–3062.

18. Storn and Price, Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11(4) (1997), 341–359.

19. Das and Suganthan P.N., Differential evolution - a survey of the state-of-the-art, IEEE Transactions on Evolutionary Computation 15(1) (2011), 4–31.

20. Mukherjee, Patra G.R., Kundu and Das, Cluster-based differential evolution with crowding archive for niching in dynamic environments, Information Sciences 267 (2014), 58–82.

21. Srinivas and Patnaik L.M., Genetic algorithms: A survey, Computer 27(6) (1994), 17–26.

22. Kennedy and Eberhart R.C., Particle swarm optimization, in: IEEE Int Conf Neural Networks, 1995, Piscataway, NJ, IEEE, pp. 1942–1948.

23. Cao, Zhao and Zaiane O.R., A PSO-based cost-sensitive neural network for imbalanced data classification, in: PAKDD 2013 workshops, LNAI 7867, 2013, pp. 452–463.

24. Garg, A hybrid PSO-GA algorithm for constrained optimization problems, Applied Mathematics and Computation 274 (2016), 292–305.

25. Zhou Z.H., Wu and Tang, Ensembling neural networks: Many could be better than all, Artificial Intelligence 137(1-2) (2002), 239–263.

26. Zhang, Dai and Ma, Extreme learning machines’ ensemble selection with GRASP, Applied Intelligence 43 (2015), 439–459.

27. Ekbal and Saha, Weighted vote-based classifier ensemble for named entity recognition: A genetic algorithm-based approach, ACM Transactions on Asian Language Information Processing 10(2) (2011), 37.

28. Zhang, Zhang, Cai and Yang, A weighted voting classifier based on differential evolution, Abstract and Applied Analysis (2014), 6. Article ID 376950.

29. KEEL dataset repository. http://sci2s.ugr.es/keel/datasets.php

30. Demsar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7(1) (2006), 1–30.