Min-Max Ensemble Feature Selection 1

Abstract

Feature selection is one of the key problems in machine learning and data mining. It involves identifying a subset of the most useful features that produces compatible results as the original entire set of features. It can reduce the dimensionality of original data, speed up the learning process and build comprehensible learning models with good generalization performance. Nowadays, ensemble idea has been used to improve the performance of feature selection by integrating multiple base feature selection models into an ensemble one. In this paper, in order to improve the efficiency of feature selection in dealing with large scale, high dimension and imbalanced problems, a Min-Max Ensemble Feature Selection (M2-EFS) is proposed, which is based on balanced data partition and min-max ensemble strategy. The experimental results demonstrate that the M2-EFS can obtain higher performance than other classical ensemble methods in most cases, especially for large scale, high dimension and imbalanced data.

Keywords

Feature selection Min-Max strategy ensemble data partition

1 Introduction

Feature selection is an important preprocessing step in many machine learning applications, and it is often used to find the smallest subset of feature that maximally increases the performance of following learning model. Besides maximizing model performance, other benefits of applying feature selection include the ability to build simpler and faster models using only a subset of all features, as well as gaining a better understanding of the insight of data [1]. Feature selection techniques can be divided into three categories depending on how they interact with classifier. Filter methods directly operate on the data set, and provide a feature weighting, ranking or subset as output. These methods have the advantage of being fast and independent of the classification model, but at the cost of inferior results. Wrapper methods perform a search in the space of feature subsets, guided by the outcome of the model (e.g. classification performance on a cross-validation of the training set). They often report better results than filter methods, but at the price of an increased computational cost [2]. Finally, embedded methods use internal information of the classification model to perform feature selection (e.g. use of the weight vector in support vector machines). They often provide a good trade-off between performance and computational cost [1 –3]. We have reviewed the recent advances of feature selection in survey paper [4].

Nowadays, the growth of the high-throughput technologies has resulted in exponential growth in the harvested data with respect to both dimensionality and sample size. Efficient and effective preprocess for these data becomes increasing important and challenging, and traditional feature selection is impractical for these data [5]. We will solve the feature selection problem of large scale imbalanced data efficiently using the ensemble idea. The ensemble feature selection involves creating a set of different base feature selectors and aggregating the results of all base feature selectors. There are three categories of methods for generating base feature selectors: one is to use Bootstrap strategy on the original data sets to obtain many data subsets, and then train different base feature selectors on these data subsets [6 –11]. Another is to train base feature selectors using different feature selection algorithms on the original data set [12], or obtain different base feature selectors using a feature selection algorithm based on random strategy, such as genetic algorithm, random forward backward search. The last one is the combination of Bootstrap strategy and different feature selection algorithm [13]. The ensemble strategy for base feature selection results depends on the output type of feature selection [14]. For example, if the output is feature weights or feature ranking, the outputs of base feature selection will be summed by the linear combination method, and then to reorder the feature based the obtained feature weights or the ranking value and to obtain the final ensemble result [9 –11]. If the output of feature selection is feature subset, the voting method will be used. That is to say, the times of feature being selected by base feature selection is counted, and then the final ensemble result is the feature subset with top largest number. The ensemble methods mentioned above are ineffective for the large-scale imbalanced data set. In this paper, the Min-Max Ensemble Feature Selection (M2-EFS) is proposed where the base feature selectors are produced based on data partition and the results are integrated by min-max strategy. M2-EFS can efficiently handle both large scale and imbalanceddata.

This paper is the expand of our previous work [15]. The rest of this paper is organized as follow. Section 2 introduces the proposed method M2-EFS. Section 3 presents the experimental results to show the effectiveness of our approach. Section 4 concludes this paper.

2 Min-Max ensemble feature selection M2-EFS

Nowadays, ensemble idea has been used to improve the performance of feature selection. Same to the ensemble models for supervised learning, there are two essential steps in ensemble feature selection: Creating a set of different base feature selectors with outputs and aggregating the results of all feature selectors. Each feature selector is always learned from a bootstrapped sample subset of original data. Some classical bootstrap strategies are bagging, random-based and equal-based, et al. The well-known aggregation strategies are voting,K-median, mean weight, and so on. However, these traditional ensemble feature selections did not consider the large scale imbalanced problem. To address the large scale and imbalanced issues, we propose a Min-Max Ensemble Feature Selection (M2-EFS) that consists of data partition and Min-Max ensemble strategy.

2.1 Data partition

Before the introduction of data partition, some notations used in this paper will be introduced. We use X = (x₁, y₁) , (x₂, y₂) , . . . , (x_n, y_n) to denote a data set X containing n samples. The k-th sample in X is $x_{k} = (x_{k}^{1}, x_{k}^{2}, . . ., x_{k}^{d}) \in R^{d} (k = 1, 2, . . ., n)$ , and each sample is described by d features. For binary classification problem, the label of each sample is y_k = {+ 1, - 1} (k = 1, 2, . . . , n). As for multi-class classification problems, we could take advantage of either “one versus one” strategy or “one versus other” strategy to convert it into binary classification problems [16, 17].

The main idea of data partition in M2-EFS is as follows. Suppose the original data set X is a binary classification problem consists of positive class X⁺ with label +1 and negative class X^- with label -1. $X^{+} = {x_{k}^{+}, + 1}_{k = 1}^{l^{+}}, X^{-} = {x_{k}^{-}, - 1}_{k = 1}^{l^{-}},$ (1) where $x_{k}^{+} \in R^{d}$ is a sample in positive class (positive sample) and $x_{k}^{-} \in R^{d}$ is a sample in negative class (negative sample). l⁺, l^- is the number of samples in positive and negative class respectively.

Then, for large scale and imbalanced binary classification problem, we firstly divide X⁺, X^- into N⁺, N^- data blocks respectively. To achieve it, random based method, hyper plane based method and clustering based method can be used [17]. The results of partition is described as (2), (3), $X_{i}^{+} = {x_{k}^{+ i}, + 1}_{k = 1}^{l_{i}^{+}} (i = 1, 2, . . ., N^{+}),$ (2) $X_{j}^{-} = {x_{k}^{- j}, - 1}_{k = 1}^{l_{j}^{-}} (j = 1, 2, . . ., N^{-}),$ (3) where $X_{i}^{+}$ denotes the i-th data block of positive class, and $l_{i}^{+}$ is the number of samples in $X_{i}^{+}$ . $X_{j}^{-}$ and $l_{j}^{-}$ are the counterparts for negative class.

After the partition, we will merge positive data blocks and negative ones above to obtain N⁺ × N^- data subsets T, as shown in (4). $T^{i, j} = X_{i}^{+} \cup X_{j}^{-} (i = 1, 2, . . ., N^{+}, j = 1, 2, . . ., N^{-}),$ (4) where T^i,j denotes a training sample subset, which is the combination of the i-th positive sample block and the j-th negative sample block.

Now, we have introduced the partition of original data set. Then, for imbalance data sets, we could adjust the block number N⁺ or N^- to generate a balance training sample subset T^i,j, i.e., the number of positive samples $l_{i}^{+}$ is similar to the number of negative samples $l_{j}^{-}$ in T^i,j. The determination of these two parameters N⁺ and N^- depends on the number of samples in each class. For example, for an imbalanced sample set of 1000 instances includes 800 positive samples and 200 negative samples, we could set N⁺ = 8 and N^- = 2 to obtain 100 instances for each data blocks and then to generate 16 balanced training sample subsets with 200 samples. Generally, N⁺, N^- are set before data partition, however, we could not know the number of training samples and the sample ratio between different categories, especially in online learning and stream learning. Therefore, an adaptive partitioning strategy is used in this paper, which only requires the parameter N⁺ to be set at the beginning. In order to maintain the relative balance between positive samples and negative samples, we could compute the parameter N^- according to N⁺, as (5). $N^{-} = ⌊ l^{-} / l_{\arg}^{+} ⌋,$ (5) where $l_{\arg}^{+} = ⌊ l^{+} / N^{+} ⌋$ and $l_{\arg}^{+}$ denotes the average number of positive samples in each positive block after partition. This method could ensure the number of positive samples and negative samples are similar, and positive one is smaller than negative one, i.e., $l_{i}^{+} < l_{j}^{-} (i = 1, 2, . . ., N^{+}, j = 1, 2, . . ., N^{-})$ .

For large scale imbalanced data processing, the data partition above usually divides the large scale imbalanced problem into several small balanced sub-problems. For each sub-problem, feature selection can be implemented independently. Therefore, M2-EFS could easily deal with each independent sub-problem in a parallel way and finally achieve the process of large scale problem. The time complexity of this part depends on the data partition strategy. For example, the time complexity of random partition strategy is O (n).

2.2 Min-Max ensemble strategy

For N⁺ × N^- training sample subset, classical feature selection algorithm is learned on each subset T^i,j and output N⁺ × N^- base feature selection results. Suppose each base feature selection result is feature weighting as in (6). $W^{i, j} = F (T^{i, j}) (i = 1, 2, . . ., N^{+}, j = 1, 2, . . ., N^{-}),$ (6) where F denotes a feature selection algorithm. Then the Min-Max ensemble strategy is used to integrate these base feature selection results, and it is described in (7) and (8). $\begin{matrix} W^{i} & = & MIN (W^{i, 1}, W^{i, 2}, . . ., W^{i, N^{-}}) \\ (i = 1, 2, . . ., N^{+}), \end{matrix}$ (7) $W = MAX (W^{1}, W^{2}, . . ., W^{N^{+}}) .$ (8)

The rule of minimal ensemble unit-MIN is to take the minimum weight value of each feature in different base feature selection results. These base feature selection results are derived from the training subsets containing the same positive samples and different negative samples. The rule of maximum ensemble unit-MAX is to take the maximum weight value of each feature in different feature selection results after the MIN operation. For this ensemble strategy, the time complexity is O (N⁺N^-).

Now, we are at position to summarize the framework of M2-EFS and present its pseudo code, which are shown in Fig. 1 and in Algorithm 1, respectively.

Fig.1

Min-Max Ensemble Feature Selection-M2 EFS model.

2.3 Highlights of M2-EFS

M2-EFS is similar to the integration of sampling methods with ensemble learning. However, both the sampling and the ensemble strategy of M2-EFS are quit different from others. The sampling of M2-EFS is task decomposition, instead of under-sampling the majority classes or over-sampling the minority classes for imbalanced data set. And the ensemble of M2-EFS is through a specially designed Min-Max network, different from the commonly used weighted voting or linear combination.

Algorithm 1 Min-Max Ensemble Feature Selection-M2-EFS

Input : Training data X, the number of positive data blocks N⁺, method of data partition P, feature selection algorithm F

Output : Feature weights W

(1) Data partition phase:

(a) Compute N^- using Equation (5).

(b) Transform X into binary classification X⁺, X^-, and divide X⁺, X^- using P

${X_{1}^{+} \cup X_{2}^{+} \cup . . . \cup X_{N^{+}}^{+} = X^{+}}$

${X_{1}^{-} \cup X_{2}^{-} \cup . . . \cup X_{N^{-}}^{-} = X^{-}}$

For j = 1: N^-

$T^{i, j} = {X_{i}^{+}} \cup {X_{j}^{-}}$

End

(2) Min-Max ensemble phase:

(d) For i = 1: N⁺

For j = 1: N^-

W^i,j = F (T^i,j)

End

Wⁱ = MIN (W^i,1, W^i,2, . . . , W^{i,N^-})

End

(e) Output W = MAX (W¹, W², . . . , W^{N
⁺})

Both M2-EFS and other integration methods of sampling and ensemble can convert imbalanced data into balanced, thus eventually solve the imbalanced learning problem. However, M2-EFS has a notable merit of parallel learning, which is precise in facing large-scale data sets. The modules in M2-EFS are independent thus they can be learned individual, while the modules in Adaboost must be generated sequentially.

3 Experiments and results analysis

We like to evaluate M2-EFS on various data sets. Some common data sets are used to verify the effectiveness of the M2-EFS. Moreover, Internet traffic data set is also used to evaluate M2-EFS on large scale and imbalanced scenario.

3.1 Preliminary knowledge

3.1.1 Data sets

Five data sets are used in this experiment. Four data sets names, the size of each data set and their sizes of attribute are shown in Table 1. The PcMac is the data set about text recognition, the HIVA is a medical experimental data and the GINA is a data set about image identification. These three data sets are small scale with high dimensions. In addition, the HIVA is more imbalanced than other two. And the fourth data set Adult is large scale, low dimension and imbalanced [18 , 21].

Table 1
The summary of common data sets

Name Size of Dataset (Positive/Negative) Size of Attribute

PcMac 1943 (982/961) 3289

GINA 3153 (1550/1603) 970

HIVE 3845 (135/3710) 1617

Adult 32561 (7841/24720) 123

Name	Size of Dataset (Positive/Negative)	Size of Attribute
PcMac	1943 (982/961)	3289
GINA	3153 (1550/1603)	970
HIVE	3845 (135/3710)	1617
Adult	32561 (7841/24720)	123

We also validate the performance of M2-EFS on Internet traffic data set, which is provided by A.W. Moore et al. in [22]. It consists of 20000 samples, 29 key features and 6 classes. This data set is not only large scale, but imbalanced rate is very high. The number of samples in some categories account for only 5% or less, which will result in severe under-fitting. The class names, the size of each class and their sample proportions are shown in Table 2. The training set and test set have been presented in [22], so we directly used them in the experiments.

Table 2

The description of imbalanced Internet traffic data set

Class name	Size of class	Proportion%	Blocks
DATABASE	337	1.7	2
SERVICE	489	2.4	2
P2P	802	4.0	4
FTP	2635	13.2	16
MAIL	6037	30.2	36
HTTP	9700	48.5	60

3.1.2 Evaluation criteria

For imbalanced problem, the correct classification of minority class should be paid more attention. Then G-Mean metric is widely adopted in imbalanced classification, which treats the classification accuracy of each class equally. The G-Mean metric is defined as follows, $G - Mean = PR \times NR,$ (9) $PR = TP / (TP + FN), NR = TN / (TN + FP),$ (10) where PR is the classification accuracy of the positive class, while NR is the negative one. TP denotes the numbers of positive samples that are correctly classified, FN denotes that numbers of positive samples that are incorrectly classified, TN denote that numbers of negative samples that are correctly classified, and TP denotes that numbers of negative samples that are incorrectly classified.

3.1.3 Feature selection algorithms

As introduced in Section 2, the traditional feature selection algorithm should be chosen to produce the base feature selector in ensemble feature selection. In our case, two well-known feature selection algorithms, i.e., Fisher score [23] and Relief [24], are utilized. Both of them belong to filter model with lower time complexity, and they are suitable for high dimensional data. The outputs of Fisher score and Relief are feature weights.

Fisher score is one of the most widely used feature selection methods. The key idea of Fisher score is to find feature weights such that in the data space spanned by the weighted features, the distances between data points in different classes are as large as possible, while the distances between data points in the same class are as small as possible. Then the criterion for Fisher score prefers features that have similar values for the samples from the same class and different values for the samples from different classes. Feature weights are obtained by computing the deviation of each feature from its mean value on all classes.

The Relief algorithm is considered one of the most successful feature selection algorithms due to its simplicity and effectiveness. The key idea of Relief is to iteratively estimate feature weights according to their ability to discriminate between neighboring samples.

3.2 Experimental results

M2-EFS is originally designed for two-class feature selection. For the multi-class problem in Internet traffic classification, the “one versus one” strategy is adopted to transform it into some binary problems. One class is considered as positive class, and the other one is considered as negative class. In our experiments, we will present the results of different ensemble strategies comparison on the data sets shown in Table 1 and Internet traffic data set, different ensemble feature selection algorithms comparison on Internet traffic data set and different strategies comparison for dealing with imbalanced problem on Internet traffic data set. For Internet traffic data set, to ensure the balanced training subset after data partition, the number of division blocks for each class are listed in last column (from left to right) of Table 2. 10-cross validation is adopted in the experiments. SVM [19] is utilized to classify the samples after feature selection, where its sigmoid value is set to 2, the loss function C is set to 32 and the SMO algorithm is used to calculate the parameters.

3.2.1 Validation of ensemble strategies

After data partition by hyper plane (HP), three ensemble strategies (i.e., Min-Max strategy, voting and K-median) are used to integrate the results of base feature selection trained on each block. The K-median clustering ensemble strategy is mainly based on the idea to choose K representative results from all base feature selection results as its output. In our case, only one representative result is needed, then K = 1. Voting method is to compute the frequency of each feature being selected from each base feature selection result.

Figures 2 and 3 are the experimental results of PcMac data set, Figs. 4 and 5 are the results of GINA data set, Figs. 6 and 7 are the results of HIVA data set, Figs. 8 and 9 are the results of Adult data set. For feature selection algorithms, Figs. 2, 4, 6 and 8 show results for the Fisher score, while Figs. 3, 5, 7 and 9 are the experimental results for Relief algorithm. The X-axis is the number of selected features, and Y-axis is the G-Mean value. From the experimental results, we can observe that the performance of Min-Max ensemble strategy is better or similar to others in most cases. For the HIVA data set, the Min-Max strategy has obvious advantages compared to other ensemble strategies in the low dimensional space. This result should be attributed to the imbalanced property of the data set. When Relief is used as base feature selection algorithm, the Min-Max strategy has obvious advantages compared to other ensemble strategies in any number of selected feature space. However, for the Adult data set, the advantage of the Min-Max ensemble strategy is not obvious, which might be rooted in the low dimensions of the data set.

Fig.2

The experimental results of three ensemble strategies after data partition for Fisher score feature selection algorithm on PcMac data set.

Fig.3

The experimental results of three ensemble strategies after data partition for Relief feature selection algorithm on PcMac data set.

Fig.4

The experimental results of three ensemble strategies after data partition for Fisher score feature selection algorithm on GINA data set.

Fig.5

The experimental results of three ensemble strategies after data partition for Relief feature selection algorithm on GINA data set.

Fig.6

The experimental results of three ensemble strategies after data partition for Fisher score feature selection algorithm on HIVE data set.

Fig.7

The experimental results of three ensemble strategies after data partition for Relief feature selection algorithm on HIVE data set.

Fig.8

The experimental results of three ensemble strategies after data partition for Fisher score feature selection algorithm on Adult data set.

Fig.9

The experimental results of three ensemble strategies after data partition for Relief feature selection algorithm on Adult data set.

On the other hand, the same experiments are also conducted on Internet traffic data set. Figure 10 shows the results for the Fisher score algorithm, while Fig. 11 is the experimental results for Relief algorithm. From the experimental results, we also can observe that the performance of Min-Max ensemble strategy is better or similar to others in most cases.

Fig.10

The experimental results of three ensemble strategies after data partition for Fisher score feature selection algorithm on Internet traffic data set.

Fig.11

The experimental results of three ensemble strategies after data partition for Relief feature selection algorithm on Internet traffic data set.

3.2.2 Comparisons on internet traffic data set

We compare the M2-EFS with other traditional ensemble feature selection methods, i.e., bagging-voting and bagging-K-median, on imbalanced Internet traffic data set. Bagging-voting and bagging-K-median are classical ensemble feature selection algorithms that bagging is used to sampling data set and voting strategy or K-median strategy is adopted to integrate the outputs of base feature selectors. For M2-EFS, two partition strategies (hyper plane (HP), random partition (RP)) are explored, which leads to two methods, i.e., HP-M2-EFS and RP-M2-EFS. The base feature selectors are built by traditional feature selection algorithm Fisher score and Relief. The experimental results for Fisher score are shown in Fig. 12, while Fig. 13 is for Relief. The experimental results have shown that the bagging coupled with voting (bagging-voting) has better performance than bagging coupled with K-median (bagging-K-median). However, Regardless of the partition strategy, M2-EFS are always better than bagging-voting. Obviously, the combination of Min-Max strategy with hyper plane (HP-M2-EFS) has better performance than Min-Max combined with random data partition strategy (RP-M2-EFS).

Fig.12

The experimental results of different ensemble feature selection algorithms for Fisher score on Internet traffic data set.

Fig.13

The experimental results of different ensemble feature selection algorithms for Relief on Internet traffic data set.

Moreover, two commonly used methods for dealing with imbalanced problem, i.e., the 1:1 Random undersampling (1:1 RUS) and 1:1.5 RUS, are compared with M2-EFS with HP partition strategy. 1:1 RUS is to make the number of samples in the majority class is same to the number of samples in the minority class through randomly sampling. And 1:1.5RUS is to make the number of samples in majority samples is 1.5 times as much as the number of samples in minority class. In the experiments, Fisher score is used to choose features for 1:1 RUS and 1:1.5 RUS, and it is also used to produce base feature selectors for M2-EFS. The number of selected features is 18 as example. The experimental results are shown in Fig. 14, where X-axis is different methods for imbalanced problems and Y-axis is the G-Mean value. From the experimental results, we can observe that M2-EFS is better than others, and M2-EFS is efficient to deal with imbalanced data sets.

Fig.14

The experimental results of different methods dealing with imbalanced problem.

4 Conclusions

In this paper, the ensemble feature selection method based on balanced data partition and Min-Max strategy is proposed and validated on real world data set, such as the large scale imbalanced Internet traffic data set. The proposed method can efficiently deal with imbalanced data set through data partition. Since the training process of each base feature selector in the proposed method M2-EFS is independent, then it is easy to be parallelized and can handle large scale data. The various experiments have shown its higher performance than other methods.

References

Guyon

and Elisseeff

, An introduction to variable and feature selection, Journal of Machine Learning Research3 (2003), 1157–1182.

Kohavi

and John

G.H.

, Wrappers for feature subset selection, Artificial Intelligence97(1) (1997), 273–324.

Saeys

, Inza

and Larrasaga

, A review of feature selection techniques in bioinformatics, bioinformatics23(19) (2007), 2507–2517.

, Li

and Liu

, Recent advances in feature selection and its application, Knowledge and Information Systems (2017). DOI: https://doi.org/10.1007/s10115-017-1059-8.

Tang

, Alelyani

and Liu

, Feature selection for classification: A review, Data Classification: Algorithms and Applications37 (2014), 313–334.

Abeel

, Helleputte

and Van de Peer

, Dupont

and Saeys

, Robust biomarker identification for cancer diagnosis with ensemble feature selection methods, Bioinformatics26(3) (2010), 392–398.

Saeys

, Abeel

and Van de Peer

, Robust feature selection using ensemble feature selection techniques, Proc. of the 25th European Conf. on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2008, pp. 313–325.

Kursa

M.B.

, Robustness of random forest-based gene selection methods, BMC Bioinformatics15 (2014), 1–8.

and Li

, Energy-based feature ranking for assessing the dysphonia measurements in Parkinson detection, IET Signal Processing6(4) (2012), 300–305.

10.

and Gao

S.Y.

, Energy-based feature selection and its ensemble version, Intąŕl Conf. on Neural Information Processing (ICONIP), Lecture Notes in Computer Science (LNCS), 7063, 2011, pp. 53–62.

11.

, Si

, Zhou

G.J.

, Huang

S.S.

and Chen

S.C.

, FREL: A stable feature selection algorithm, IEEE Trans on Neural Networks and Learning Systems26 (2015), 1388–1402.

12.

Bolon-Canedo

, Sánchez-Maroño

and Alonso-Betanzos

, Data classification using an ensemble of filters, Neurocomputing135(5) (2014), 13–20.

13.

Zhou

Q.F.

, Ding

J.C.

, Ning

Y.P.

, Luo

L.K.

and Li

, Stable feature selection with ensembles of multi-reliefF,ŕl Conf. on Natural Computation (ICNC), Proc. of 10th Intąŕl Conf. on Natural Computation (ICNC), 2014, pp. 742–747.

14.

Woznica

, Nguyen

and Kalousis

, Model mining for robust feature selection, Proc. of ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD), 2012, pp. 913–921.

15.

Huang

Y.X.

, Li

and Qiang

B.H.

, Internet traffic classification based on Min-Max Ensemble Feature Selection, Proc. Int’l Joint Conf. on Neural Networks (IJCNN) (2016), pp. 3485–3492.

16.

Mitchell

, Machine Learning, McGraw Hill, New York, 1997.

17.

B.L.

and Ito

, Task decomposition and module combination based on class relations: A modular neural network for pattern classification, IEEE Transactions on Neural Networks10(5) (1999), 1244–1256.

18.

Bache

and Lichman

, UCI machine learning repository, http://archive.ics.uci.edu/ml, 2013.

19.

Chang

C.C.

and Lin

C.J.

, LIBSVM: A library for support vector machines, http://www.csie.ntu.edu.tw/cjlin/libsvm/, 2014.

20.

Guyon

, Agnostic learning vs prior knowledge. http://www.agnostic.inf.ethz.ch/datasets.php, 2007.

21.

Liu

, Ye

J.P.

, Zhao

, et al., Feature selection at Arizona State University in conjunction with the DMML, http://featureselection.asu.edu/datasets.php (2008).

22.

Moore

A.W.

and Zuev

, Internet traffic classification using bayesian analysis techniques, ACM SIGMETRICS Performance Evaluation Review, ACM33(1) (2005), 50–60.

23.

, Li

and Han

, Generalized fisher score for feature selection, arXiv preprint arXiv:1202.3725, 2012.

24.

Robnik-Sikonja

and Kononenko

, Theoretical and empirical analysis of ReliefF and RReliefF, Machine Learning53(1-2) (2003), 23–69.

Min-Max Ensemble Feature Selection 1

Abstract

Keywords

1 Introduction

2 Min-Max ensemble feature selection M2-EFS

2.1 Data partition

3 Experiments and results analysis

3.1 Preliminary knowledge

3.1.1 Data sets

Table 1 The summary of common data sets Name Size of Dataset (Positive/Negative) Size of Attribute PcMac 1943 (982/961) 3289 GINA 3153 (1550/1603) 970 HIVE 3845 (135/3710) 1617 Adult 32561 (7841/24720) 123

3.2 Experimental results

3.2.1 Validation of ensemble strategies

References

Table 1
The summary of common data sets

Name Size of Dataset (Positive/Negative) Size of Attribute

PcMac 1943 (982/961) 3289

GINA 3153 (1550/1603) 970

HIVE 3845 (135/3710) 1617

Adult 32561 (7841/24720) 123