A hybrid filter-wrapper feature selection method for DDoS detection in cloud computing

Abstract

In today’s rapidly evolving computing environment, cloud computing has become a significant trend for the delivery of IT business services, and represents a technology choice that offers cost-effective and scalable processing. However, Distributed Denial of Service (DDoS) attacks continually target cloud services and resource availability, rendering the cloud unavailable to the detriment of both cloud providers and users. Previous research has shown the importance of feature selection in removing irrelevant and redundant features, which increases detection rates and shortens the processing time needed to evaluate intrusive patterns, while reducing computational complexity. In this work we propose a Hybrid Filter-Wrapper Feature Selection (HFWFS) method for DDoS detection, which takes advantage of both filter and wrapper methods. It first uses filter methods to remove the most irrelevant and redundant features in order to form a reduced input subset. Subsequently, it applies a wrapper method to achieve the optimal selection of features. To evaluate the performance of our proposed model, we used two datasets (NSL-KDD and UNSW-NB15) and a Random Tree classifier. The results indicated that the proposed model can reduce the number of features from more than 40 to nine while maintaining high detection accuracy, compared with well-known feature selection methods.

Keywords

Filter methods, wrapper methods, hybrid feature selection, cloud, DDoS, intrusion detection system

1. Introduction

The adoption of cloud computing has been a lively topic of discussion over the last few years. NIST has advanced the most commonly employed definition of cloud computing: a model that facilitates convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that may be rapidly provisioned and released with minimal management effort or service provider interaction. There are three primary service models, namely Software as a Service (SaaS), Infrastructure as a Service (IaaS), and Platform as a Service (PaaS) [4, 8]. The emergence of cloud computing motivates both industry players and academia to embrace the model in order to host a wide range of functions, ranging from highly computationally intensive functions down to lightweight services.

Concomitant with the drastic increase in the popularity of cloud computing, attention has shifted toward security issues that have been introduced since the model was operationalized. One such issue comprises Distributed Denial of Service (DDoS) attacks. A Denial of Service attack is one that has the purpose of preventing legitimate users from engaging with a specified network resource, such as a website, web service, or computer system [30]. A Distributed Denial of Service (DDoS) attack is a coordinated assault on the availability of services of a given target system or network, which is launched indirectly through multiple compromised computing systems known as zombies [30, 25]. Such attacks have continued to propagate both in size and sophistication, and extortion has been identified as one of the main underlying motives [21].

Generally, DDoS detection systems are based on the assumption that the behavior of intruders differs from that of legitimate users. They may be categorized into two primary approaches: misuse-based and anomaly-based. Misuse detection, also known as signature-based detection, stores the features of all known attacks in a knowledge database in order to identify an attacker, which is why it is efficient for the detection of known attacks. Alternatively, anomaly-based techniques monitor the behavior of users in the network, where any deviation from normal or expected behavior is considered to be associated with an intruder.

One of the key issues with DDoS detection systems relates to response times, due to the increasing volume of information and data that must be processed in real time [23, 9, 55, 54]. Feature selection can address this issue: it improves classification accuracy and reduces computational complexity by identifying only the critical and relevant attributes of a dataset. Feature selection is utilized in various areas such as statistical pattern recognition, machine learning, and data mining [11]. There are three principal categories of feature selection algorithms: filter, wrapper, and embedded methods [31].

Filter feature selection methods apply a statistical measure to assign a score to each feature. Features are ranked by the score, and are either kept or removed from the dataset. The methods are often univariate and consider each feature independently, or with regard to a dependent variable (e.g., Chi-squared test, information gain, and correlation coefficient scores) [33, 49, 48]. Wrapper methods, on the other hand, treat the selection of a set of features as a search problem, where different combinations are prepared, evaluated, and compared to other combinations. A predictive model is employed to evaluate a combination of features and assign a score based on model accuracy (e.g., the recursive feature elimination algorithm). Lastly, embedded methods learn which features best contribute to the accuracy of the model while the model is being created. The most common type of embedded feature selection methods are regularization methods (LASSO, Elastic Net, and Ridge Regression).

Recent studies have shown that combining feature selection methods can improve the performance of classifiers by identifying features that are individually weak but strong as a group [44], removing redundant features [48], and determining features that have a high correlation with the output class. In this work we propose a hybrid feature selection approach that combines both filter and wrapper methods. We present a Hybrid Filter-Wrapper Feature Selection (HFWFS) method that combines the output of Chi-square, Info Gain, Gain Ratio and WrapperSubset.

This paper presents a novel hybrid feature selection strategy for DDoS detection in cloud environments. It combines filter methods (Chi-Square, InfoGain and GainRatio) with a wrapper based on a greedy stepwise search, in order to select important features and to significantly reduce the feature set while maintaining or improving classification accuracy using a Random Tree classifier. The main contributions of our work are as follows:

• Novelty. The proposed approach feeds the reduced set of substantial attributes obtained by the filters into a wrapper phase, where a predictive model is used to evaluate different combinations and to obtain the most relevant subset of features for identifying DDoS attacks in cloud environments.
• Effectiveness. The hybrid algorithm combines the efficiency of filters with the accuracy of wrappers, which improves the classification accuracy and decreases the processing time of the wrapper. The features selected by our model show the best detection accuracy compared with existing feature selection approaches.
• Robustness. Our proposed feature selection approach is tested on two different datasets, NSL-KDD and UNSW-NB15. On both datasets the hybrid model produces an improvement in detection accuracy compared to other approaches.

The remainder of the paper is organized as follows: related work is presented in Section 2, while the proposed HFWFS method is described in Section 3. In Section 4, the classification algorithm and benchmark datasets are presented. In Section 5, our experimental findings are discussed. Section 6 concludes the paper.

2. Literature review

Due to the continuous expansion of network traffic data volume and the high dimensionality of the feature space, feature selection is becoming an essential phase in building intrusion detection systems [2]. In any intrusion attempt there are a number of behavioral patterns and interrelations that are unique and recognizable. Since these patterns are hidden within irrelevant and redundant features, it is often difficult to discover them. Feature selection methods have been applied in classification problems to select a reduced feature subset from the original set, to achieve faster and more accurate classification, and to eliminate less relevant features [42]. Selecting the correct features can be quite a challenging task; several methods have been proposed to resolve this issue while discarding redundant, irrelevant, and noisy features [22].

Mukkamala and Sung [36] proposed a novel feature selection algorithm to reduce the feature space of the KDD Cup 99 dataset, from 41 dimensions to six dimensions, and evaluated these six selected features using an IDS based on SVM. The results showed that the classification accuracy increased by 1% when using the selected features. Chebrolu et al. [35] investigated the performance of a Markov blanket model and decision tree analysis for feature selection, which showed the capacity to reduce the number of features in KDD Cup 99, from 41 to 12 features. Chen et al. [50] proposed an IDS based on Flexible Neural Tree (FNT). The model applied a pre-processing feature selection phase to improve the detection performance. Using KDD Cup 99, the FNT model achieved 99.19% detection accuracy with only four features.

Huang et al. [46] presented an anomaly detection model based on a quantum wavelet neural network and normalized mutual information feature selection, where the method is used to select the best feature combination from a given set of sample features. Experimental results using the KDD99 dataset showed a reduction from 41 to 20 features, which is still a large number that affects the training and execution time, even though the model achieved a detection accuracy of 98.88% when dealing with DoS attacks. The authors of [28] presented an intelligent water drops (IWD) algorithm for feature selection, a nature-inspired optimization algorithm, with a support vector machine as the classifier. In experiments conducted using the KDDCUP99 dataset, the model was able to reduce the number of features to nine while maintaining the detection accuracy at 99.09%.

Wang and Combault [47] used Information Gain to extract and rank nine of the most important attributes as input values to train and detect a DDoS attack in the network traffic. The results obtained using C4.5 and a Bayesian network indicated that the detection accuracy remained the same or was even improved. Bolon-Canedo et al. [44] proposed a new method for combining discretizers, filters, and classifiers to improve the classification performance for both binary and multi-class classification problems. This method was applied to the KDD Cup 99 dataset, where the results obtained showed an improved performance, while reducing the number of features.

The Group Method for Data Handling (GMDH) is a supervised inductive learning method [53]. The KDD’99 dataset was preprocessed using Info Gain, Gain Ratio, and GMDH to rank features, and the detection rate reached 98%. Lin et al. [37] developed an anomaly-based intrusion detection strategy that combined SVM, Decision Tree (DT), and Simulated Annealing (SA). SVM and SA identified the most critical features from the KDD’99 dataset to detect new attacks, with the aim of improving the detection accuracy of SVM and DT. Feature selection and multi-agent intrusion detection were applied to an Industrial Control System (ICS) [12]. The NSL-KDD dataset was used to compare the performance of the proposed method to common feature selection techniques (Info Gain, Gain Ratio, Relief, and Chi-square).

Sindhu et al. [39] addressed the challenge of identifying important features by employing a wrapper-based feature selection algorithm, and realized an IDS with a neuro tree to achieve a higher detection accuracy. Zhang et al. [11] proposed a feature selection approach based on a Bayesian Network classifier with the aim of decreasing the attack detection time and improving the classification accuracy, as well as the true positive rates. The performance of this proposed approach was evaluated with the NSL-KDD dataset, and compared with other commonly used feature selection methods. The authors of [16] combined a Correlation-based Feature Selection (CFS) technique and ANN for DoS detection. The method consists of collecting the incoming network traffic, selecting relevant features for DoS detection using the CFS method, and classifying the network traffic into DoS traffic or normal traffic. The proposed method achieved satisfactory results on both UNSW-NB15 and NSL-KDD datasets.

Clustering is an unsupervised technique that is also used for feature selection. Data with similar characteristics are placed in a group called a cluster. The task of feature selection involves two steps, namely partitioning the original feature set into a number of homogeneous subsets (clusters) and selecting a representative feature from each cluster. A K-Means method, which creates mutually exclusive clusters and thereby helps to identify the features that are significant for an attack, was used in [34] to build a multi-measure approach containing filter, wrapper, and clustering methods that work in parallel in order to generate a ranking of features by assigning a multi-weight to each feature. The experimental studies show that the less important features, as identified by the proposed approach, do not affect the performance of most of the classifiers in detecting DoS and probe attacks in various types of network datasets. Li et al. [51] proposed a gradual feature removal method that processed the dataset prior to combining a clustering method, an ant colony algorithm, and SVM to classify network traffic as either normal or abnormal.

On the other hand, some research has been conducted on the detection of DDoS attacks on cloud computing. Wang et al. [45] proposed a graphic model based attack detection technique that may deal with the dataset shift problem in the era of cloud computing. Xu and Shelton [19] employed a continuous time Bayesian network model, which considered temporal sequences of events, to construct both network-based and host-based intrusion detection systems. Osanaiye et al. [31] proposed an ensemble-based multi-filter feature selection method that combined the output of four filter methods to achieve an optimal selection. An experimental evaluation performed using the NSL-KDD dataset and a decision tree revealed that the detection rate and classification accuracy were enhanced, while the number of features was reduced from 41 to 13.

The aim of this paper is to present an algorithm that supports data mining and security detection for cloud DDoS attacks with minimal features for real-time response. HFWFS is a method that applies and combines the output of filter and wrapper methods to identify the most relevant features selected by each technique. The performance of the proposed feature selection algorithm was evaluated using the NSL-KDD dataset, an improved version of the original KDD Cup 99, and the UNSW-NB15 dataset, a recent dataset generated at the Australian Centre for Cyber Security (2015). We implemented our HFWFS method using the Weka data mining software [10], and used the Random Tree classification algorithm to classify the incoming network traffic as normal, or as a DoS attack.

3. A hybrid feature selection method HFWFS

3.1 Filters vs wrappers

As mentioned above, filter methods carry out the feature selection as a pre-processing phase. The information of a set of features might be calculated by the distances between classes, or via statistical measures over a training dataset. Figure 1 shows that filters possess three primary stages: initially, feature subsets are generated, whereafter the features are scored and a threshold is determined in order to remove the features that score below it. Finally, the testing step proceeds through the use of a learning algorithm, where the results include the testing results of the selected features. The filter model is faster than the wrapper approach and generalizes better, as it acts independently of the induction algorithm. Hence, filters are often applied to feature selection in high-dimensional data.

Figure 1.

The filters.

The working procedure of wrappers is the same as that of filters, except that the measurement stage is replaced by a learning algorithm (Fig. 2). This makes wrappers slow; however, they achieve better feature selection results in most cases. The stopping criterion may be a predefined number of selected features, or the point at which performance begins to degrade [15].

Figure 2.

The wrappers.

In this section, we describe our proposed hybrid filter-wrapper method. We initially used the Chi-square, Info Gain, and Gain Ratio methods to eliminate a large number of irrelevant and redundant features, with the aim of reducing the complexity of the search space. Secondly, we applied a wrapper method to find a smaller set of features than that obtained through the filters. The proposed method uses the efficiency of filter methods and the high classification accuracy of wrapper methods so that each complements the other’s selections. Figure 3 depicts an overview of the proposed hybrid filter-wrapper method for feature selection.

Figure 3.

A flowchart of the proposed hybrid feature selection method.

3.1.1 Chi-square

The $\chi^{2}$ statistic (Chi-square) is used in mathematical statistics to test the independence of two variables. We used $\chi^{2}$ in feature selection to measure the dependence between each feature and the class. The greater the calculated score, the stronger the dependence and the more class information the feature carries [15]. The Chi-square formula is defined as follows [29]:

$\displaystyle\chi^{2}(r,c_{i})=\frac{N\left[P(r,c_{i})P(\bar{r},\bar{c}_{i})-P(r,\bar{c}_{i})P(\bar{r},c_{i})\right]^{2}}{P(r)P(\bar{r})P(c_{i})P(\bar{c}_{i})}$ (1)

where $N$ represents the total number of records in the dataset, $r$ denotes the presence of the feature ( $\bar{r}$ its absence) and $c_{i}$ signifies the class. $P(r,c_{i})$ is the probability that feature $r$ occurs in class $c_{i}$ , and $P(\bar{r},c_{i})$ is the probability that feature $r$ does not occur in class $c_{i}$ . Likewise, $P(r,\bar{c}_{i})$ and $P(\bar{r},\bar{c}_{i})$ are the probabilities that the feature does or does not occur in a class that is not labelled $c_{i}$ . $P(r)$ is the probability that the feature appears in the dataset, while $P(\bar{r})$ is the probability that the feature does not appear in the dataset. $P(c_{i})$ and $P(\bar{c}_{i})$ are the probabilities that a record is labelled with class $c_{i}$ or not.
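As a minimal sketch (not part of the Weka-based implementation used in this work), Eq. (1) can be evaluated from the four joint counts of a binarized feature and a class. The function name `chi_square_score` and its count arguments are illustrative assumptions.

```python
def chi_square_score(n_rc, n_r_notc, n_notr_c, n_notr_notc):
    """Evaluate Eq. (1) from joint counts of feature presence and class.

    n_rc        -- records where the feature is present and the class is c_i
    n_r_notc    -- feature present, class is not c_i
    n_notr_c    -- feature absent, class is c_i
    n_notr_notc -- feature absent, class is not c_i
    """
    n = n_rc + n_r_notc + n_notr_c + n_notr_notc
    # Joint probabilities P(r, c_i), P(r, not c_i), P(not r, c_i), P(not r, not c_i)
    p_rc, p_r_notc, p_notr_c, p_notr_notc = (
        count / n for count in (n_rc, n_r_notc, n_notr_c, n_notr_notc)
    )
    # Marginal probabilities P(r) and P(c_i)
    p_r = p_rc + p_r_notc
    p_c = p_rc + p_notr_c
    denominator = p_r * (1 - p_r) * p_c * (1 - p_c)
    if denominator == 0:
        return 0.0  # feature or class is constant, so no dependence is measurable
    return n * (p_rc * p_notr_notc - p_r_notc * p_notr_c) ** 2 / denominator


# A feature independent of the class scores 0; a perfectly dependent one scores N.
independent = chi_square_score(25, 25, 25, 25)
dependent = chi_square_score(50, 0, 0, 50)
```

In this toy setting the score ranges from 0 (independence) up to $N$ (perfect dependence), which is why a higher score ranks a feature as more informative.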

3.1.2 Information gain

Information Gain (IG) (Quinlan, 1986) is a filter feature selection method that is used as a criterion for determining the relevant attributes in a set of features. Based on information theory, information gain measures the reduction in uncertainty about the class when the value of the attribute is known. The higher the information gain of a feature, the more important the feature is for the detection of the class category. Uncertainty is determined by the entropy of the distribution, sample entropy, or estimated model entropy of the dataset. The entropy of variable $X$ [36] may be defined as [32]:

$\displaystyle H(X)=-\sum_{i}P(x_{i})\log_{2}(P(x_{i}))$ (2)

$P(x_{i})$ denotes the prior probability of each value $x_{i}$ of $X$ . The entropy of $X$ after observing the value of another variable $Y$ is defined as:

$\displaystyle H(X\mid Y)=-\sum_{j}P(y_{j})\sum_{i}P(x_{i}\mid y_{j})\log_{2}(P(x_{i}\mid y_{j}))$ (3)

In Eq. (3), $P(x_{i}\mid y_{j})$ is the posterior probability of $X$ given $Y$ . The information gain is the amount by which the entropy of $X$ decreases, reflecting the additional information about $X$ provided by $Y$ , and is defined as:

$\displaystyle\textit{IG}(X\mid Y)=H(X)-H(X\mid Y)$ (4)

Based on this measure, features $Y$ and $X$ are more correlated than features $Y$ and $Z$ if $\textit{IG}(X\mid Y)>\textit{IG}(Z\mid Y)$ . Therefore, the feature ranking may be calculated using Eq. (4). This ranking is used to select the most important features.
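The entropy, conditional entropy, and information gain defined above can be sketched in a few lines of Python. This is an illustrative sketch for discrete-valued variables, not the Weka routine used in our experiments; the function names and the toy protocol and label values are assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(X) over the empirical distribution of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): entropy of x within each group of equal y, weighted by P(y_j)."""
    n = len(x)
    total = 0.0
    for y_value, count in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == y_value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(x, y):
    """IG(X | Y) = H(X) - H(X | Y)."""
    return entropy(x) - conditional_entropy(x, y)


# Toy data: the class label and two candidate features.
label = ["dos", "dos", "normal", "normal"]
informative = ["tcp", "tcp", "udp", "udp"]    # determines the label completely
uninformative = ["tcp", "udp", "tcp", "udp"]  # tells us nothing about the label

ig_informative = information_gain(label, informative)
ig_uninformative = information_gain(label, uninformative)
```

Ranking features by this value places the feature that fully determines the label (gain of one bit) ahead of the one that carries no class information (gain of zero).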

3.1.3 Gain ratio

The gain ratio technique was introduced to correct the bias of IG toward multi-valued attributes, by taking the number and size of branches into account when selecting an attribute. The intrinsic information of a given feature may be determined by the entropy of the distribution of its instance values. The gain ratio of a given feature $x$ and a feature value $y$ may be calculated [42] using Eqs (5) and (6) [14].

$\displaystyle\textit{Gain Ratio}(x,y)=\frac{\textit{Information Gain}(x,y)}{\textit{Intrinsic Value}(x)}$ (5)

where

$\displaystyle\textit{Intrinsic Value}(x)=-\sum_{i}\frac{\left|S_{i}\right|}{\left|S\right|}\log_{2}\frac{\left|S_{i}\right|}{\left|S\right|}$ (6)

Note that $|S|$ is the total number of instances in the dataset, while $|S_{i}|$ is the number of instances for which feature $x$ takes its $i$-th value. In our work, we selected 14 features, which represented a one-third split of the ranked features using the NSL-KDD benchmark dataset. These 14 features represented the highest ranked features using the gain ratio.
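To illustrate how Eqs (5) and (6) penalize multi-valued attributes, the following sketch computes the gain ratio for discrete features. It assumes the standard reading in which the intrinsic value is the entropy of the feature's own value distribution; the function names and toy data are illustrative and are not taken from our Weka setup.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy of the empirical distribution of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by its intrinsic value."""
    n = len(labels)
    # Entropy of the labels remaining after splitting on each feature value
    remaining = sum(
        (count / n) * entropy([l for f, l in zip(feature, labels) if f == value])
        for value, count in Counter(feature).items()
    )
    info_gain = entropy(labels) - remaining
    intrinsic_value = entropy(feature)  # Eq. (6): |S_i| / |S| are value frequencies
    return info_gain / intrinsic_value if intrinsic_value > 0 else 0.0


labels = ["dos", "dos", "normal", "normal"]
binary_flag = [0, 0, 1, 1]   # two values, separates the classes
record_id = [1, 2, 3, 4]     # unique per record, also "separates" the classes

ratio_flag = gain_ratio(binary_flag, labels)
ratio_id = gain_ratio(record_id, labels)
```

Both toy features have the same information gain (one bit), but the identifier-like feature has an intrinsic value of two bits, so its gain ratio is halved relative to the binary flag.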

3.1.4 Wrapper based feature selection

The wrapper feature subset evaluation conducts a search for a viable subset using the learning algorithm itself as part of the evaluation function. In this work, repeated five-fold cross-validation was used to estimate the accuracy of the Random Tree classifier. A greedy stepwise forward search was employed to procure a list of attributes, ranked according to their overall contribution to the accuracy of the attribute set with respect to the target learning algorithm [3].
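The greedy stepwise forward search can be sketched as follows. This is a simplified illustration rather than Weka's implementation: `evaluate` stands for any subset-scoring function (in our setting, the cross-validated accuracy of the Random Tree classifier), while `toy_merit`, its per-feature gains, and the stopping rule (stop when no candidate improves the score) are assumptions made for the example.

```python
def greedy_forward_selection(candidates, evaluate):
    """Add, one at a time, the feature that most improves evaluate(subset)."""
    selected = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no extension improves the score, so stop searching
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical merit function standing in for classifier accuracy:
# three features help, and every added feature carries a small cost.
GAINS = {3: 0.30, 5: 0.15, 23: 0.05}


def toy_merit(subset):
    return 0.5 + sum(GAINS.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen, merit = greedy_forward_selection([3, 4, 5, 6, 23], toy_merit)
```

Because each step evaluates every remaining candidate with the learning algorithm, the cost grows with the number of candidate features, which is the motivation for shrinking the candidate set with filters first.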

Table 1
HFWFS algorithms characteristics

Algorithm	Type	Execution time	Search criteria
3.1	Filter	Fast	Threshold
3.2	Combined filters	Fast	Threshold
3.3	Wrapper	Slow	Classifier performance

3.2 HFWFS execution process

The proposed HFWFS method for feature selection has two primary objectives. The first is to remove the most irrelevant features, which may improve the classification accuracy and reduce the computational complexity of the wrapper method; the second is to identify a small set of features with minimum redundancy, which can reduce the computational cost of the modeling process without reducing the classification accuracy.

In the initial step, we applied Chi-square to rank the feature set of the original dataset, with the score computed as given by Eq. (1). Features above the threshold were considered the most critical features and were selected to create a new subset for the second filter stage; Algorithm 1 details the three steps of the Chi-square filter. In the second step, the Info Gain and Gain Ratio filter methods were used separately to rank the output of the Chi-square stage and to select the candidate features that were more information rich. The features selected by the two filters (IG and GR) were then combined. Algorithm 2 describes the combined model. The relevance of a feature within the combined model is defined as given in Eqs (4) and (5). Subsequently, we applied the wrapper method, using a greedy stepwise forward search algorithm and a machine learning model, to identify a small set of features that might result in higher classification accuracy. Algorithm 3 describes the search and evaluation of a subset. Wrapper methods are not suitable for datasets that contain an enormous number of features. The previous filters reduced the complexity of the search space; hence, the wrapper can be applied with less computational effort. Figure 3 shows the proposed HFWFS method, which is constructed through the algorithms presented below.

Algorithm 1: Filter Based Chi-square method

Step 1:
Let $X$ be the feature set in the dataset, where $X=\{X_{1},X_{2},X_{3},\dots,X_{N}\}$ , let $C_{i}\in\{C_{1},C_{2}\}$ represent the class (normal, DoS), and let $k$ be the predefined score threshold.
Step 2:
Rank and sort the features $X_{i}$ according to their importance in determining the class $C_{i}$ .
Step 3:
Select features $X_{i}$ above the threshold $k$ .

Algorithm 2: Filter based IG and GR methods

Step 1:

For each filter method, rank and sort the features $X_{i}$ selected by Algorithm 1 according to their importance in determining the output class $C_{i}$ .

Step 2:

Determine the threshold $k$ for each method.

Step 3:

For each method, select the features $X_{i}$ above the threshold $k$ .

Step 4:

Combine the selected features $X_{i}$ of the two methods (IG and GR).

Algorithm 3: Wrapper based Greedy Stepwise FS

Step 1:

Search for a subset of features.

Step 2:

Evaluate the selected subset of features by the performance of the classifier.

Step 3:

Repeat Step 1 and Step 2 until the desired quality is reached.
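The filter stage formed by Algorithms 1 and 2 can be sketched as two small functions: one that ranks features and keeps those above a threshold, and one that chains the Chi-square stage into the IG and GR stages and unions their outputs. This is an illustrative sketch only; the function names, feature names, scores, and thresholds are all made up for the example and are not values from our experiments.

```python
def rank_and_threshold(scores, k):
    """Sort features by descending score and keep those scoring above k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [feature for feature, score in ranked if score > k]


def filter_stage(chi2, ig, gr, k_chi2, k_ig, k_gr):
    """Algorithm 1 followed by Algorithm 2, returning the combined feature set."""
    # Algorithm 1: Chi-square ranking and threshold on the full feature set
    stage_one = rank_and_threshold(chi2, k_chi2)
    # Algorithm 2: IG and GR are applied separately to the Chi-square output
    ig_selected = rank_and_threshold({f: ig[f] for f in stage_one}, k_ig)
    gr_selected = rank_and_threshold({f: gr[f] for f in stage_one}, k_gr)
    # Step 4: combine (union) the two selections
    return sorted(set(ig_selected) | set(gr_selected))


# Made-up scores for five hypothetical features "a" to "e"
chi2_scores = {"a": 9.0, "b": 7.0, "c": 0.5, "d": 6.0, "e": 5.0}
ig_scores = {"a": 0.80, "b": 0.10, "c": 0.90, "d": 0.30, "e": 0.05}
gr_scores = {"a": 0.10, "b": 0.60, "c": 0.90, "d": 0.10, "e": 0.05}

candidates = filter_stage(chi2_scores, ig_scores, gr_scores, 1.0, 0.2, 0.5)
```

In this toy run, feature "c" is dropped by the Chi-square stage despite its high IG and GR scores, and "e" survives Chi-square but is rejected by both second-stage filters; the surviving set is what the wrapper of Algorithm 3 would then search.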

Table 1 summarizes the HFWFS algorithms and their characteristics, and Fig. 3 shows the hybrid feature selection procedure. Three filter models were first applied in order to remove the most redundant or irrelevant features. Chi-square was initially used to reduce the dimensions of the dataset, followed by Info Gain and Gain Ratio, which were used separately. The two resulting feature sets were combined into a preprocessed feature set for the next fine-tuning step. In the second step, the wrapper model was applied to improve classification accuracy while reducing dimensionality, using the Random Tree classifier.

4. NSL-KDD and UNSW-NB15 datasets

Currently, there are only a few publicly available datasets for the evaluation of intrusion detection. Among these, the NSL-KDD and UNSW-NB15 datasets are commonly cited in the literature, and we used them to assess the performance of the HFWFS feature selection method. NSL-KDD is a revised version of KDD Cup 99 proposed by Tavallaee et al. [41]. This dataset addresses some problems of the KDD Cup 99 dataset, such as its vast number of redundant records. As in the case of the KDD Cup 99 dataset, each record in the NSL-KDD dataset is composed of 41 different quantitative and qualitative features. Despite these improvements over the KDD Cup 99 dataset, the NSL-KDD dataset still suffers from several issues, such as the existence of large numbers of redundant records and its high complexity. Further, NSL-KDD does not represent actual existing networks. However, it may still be applied as an effective benchmark dataset to assist researchers in comparing different intrusion detection methods. To overcome these limitations, Moustafa and Slay [27] created the UNSW-NB15 dataset in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS), which represents a hybrid of modern realistic normal network activities and synthetic contemporary attack behaviors from network traffic.

4.1 NSL-KDD dataset

The NSL-KDD dataset, an improved version of KDDCUP 99 that has been widely cited in the literature for intrusion detection [37, 11, 14], was used to validate our proposed algorithm. NSL-KDD is a labelled benchmark dataset derived from KDDCUP 99 to address its flaws. Each NSL-KDD record is made up of 41 features and is labelled as either normal or attack, and NSL-KDD comprises both training and testing datasets. The attacks are grouped into four categories: DoS, Probe, U2R, and R2L. Table 2 lists the NSL-KDD features.

Table 2
NSL-KDD dataset features

No.	Data features	No.	Data features	No.	Data features
1	Duration	15	Su_attempted	29	Same_srv_rate
2	Protocol_type	16	Num_root	30	Diff_srv_rate
3	Service	17	Num_file_creations	31	Srv_diff_host_rate
4	Flag	18	Num_shells	32	Dst_host_count
5	Src_bytes	19	Num_access_files	33	Dst_host_srv_count
6	Dst_bytes	20	Num_outbound_cmds	34	Dst_host_same_srv_rate
7	Land	21	Is_host_login	35	Dst_host_diff_srv_rate
8	Wrong_fragment	22	Is_guest_login	36	Dst_host_same_src_port_rate
9	Urgent	23	Count	37	Dst_host_srv_diff_host_rate
10	Hot	24	Srv_count	38	Dst_host_serror_rate
11	Num_failed_logins	25	Serror_rate	39	Dst_host_srv_serror_rate
12	Logged_in	26	Srv_serror_rate	40	Dst_host_rerror_rate
13	Num_compromised	27	Rerror_rate	41	Dst_host_srv_rerror_rate
14	Root_shell	28	Srv_rerror_rate

Table 3

UNSW-NB15 dataset Features

No.	Data feature	No.	Data feature	No.	Data feature
1	Srcip	17	Spkts	33	Tcprtt
2	Sport	18	Dpkts	34	Synack
3	Dstip	19	Swin	35	Ackdat
4	Dsport	20	Dwin	36	Is_sm_ips_ports
5	Proto	21	Stcpb	37	Ct_state_ttl
6	State	22	Dtcpb	38	Ct_flw_http_mthd
7	Dur	23	Smeansz	39	Is_ftp_login
8	Sbytes	24	Dmeansz	40	Ct_ftp_cmd
9	Dbytes	25	Trans_depth	41	Ct_srv_src
10	Sttl	26	Res_bdy_len	42	Ct_srv_dst
11	Dttl	27	Sjit	43	Ct_dst_ltm
12	Sloss	28	Djit	44	Ct_src_ltm
13	Dloss	29	Stime	45	Ct_src_dport_ltm
14	Service	30	Ltime	46	Ct_dst_sport_ltm
15	Sload	31	Sintpkt	47	Ct_dst_src_ltm
16	Dload	32	Dintpkt

4.2 UNSW-NB15 dataset

As previously mentioned, the UNSW-NB15 dataset was created [27] at the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using the IXIA PerfectStorm tool to create a hybrid of modern normal and abnormal network traffic. This dataset includes nine families of attack: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. In addition, the UNSW-NB15 dataset is decomposed into training and testing sets and contains 49 features (Table 3) [26].

5. Experimental results

In this section, we deployed our proposed HFWFS method to pre-process the dataset toward the selection of the most important features for the Random Tree classification algorithm, which classifies data as either normal or attack in cloud computing. Our analysis was carried out using Weka [44], which contains a collection of machine learning algorithms for data mining tasks. The parameters for classification in the experiments were set to the Weka default values. We employed the NSL-KDD and UNSW-NB15 datasets with the distribution presented in Table 4, in order to evaluate the performance of our HFWFS method and random tree classifier using train and test datasets. All experiments were performed on a 2.5 GHz Intel Core i5 CPU with 8 GB of RAM.

Table 4
Normal and DoS Distribution on NSL-KDD and UNSW-NB15

Dataset		Normal	DoS	Total
NSL-KDD	Training	54,027	33,981	88,008
	Testing	23,027	14,692	37,719
UNSW-NB15	Training	32,469	17,250	49,719
	Testing	14,031	7,278	21,309

Table 5

HFWFS applied to NSL-KDD

Algorithm	Feature selection method	No of selected features	Selected features
3.1.1	Chi-square	22	5,30,4,29,3,35,34,6,23,39,25,33,26,38,12,36,32,37,24,31,41,40
3.1.2	Info gain	12	3,4,5,6,23,29,30,33,34,35,38,39
	Gain ratio	9	4,5,12,25,26,29,30,38,39
	Combination	15	3,4,5,6,12,23,25,26,29,30,33,34,35,38,39
3.1.3	Wrapper	9	3,4,5,6,23,30,33,34,39

Table 6

HFWFS applied to UNSW-NB15

Algorithm	Feature selection method	No of selected features	Selected features
3.1.1	Chi-square	22	7,27,2,12,1,8,9,28,16,32,4,13,6,17,11,5,14,25,24,20,18,15
3.1.2	Info Gain	10	1,2,7,8,9,12,16,27,28,5
	Gain Ratio	9	2,3,4,7,11,12,17,27,32
	Combination	14	1,2,4,5,7,8,9,11,12,16,17,27,28,32
3.1.3	Wrapper	9	2,3,4,5,7,8,12,27,28

5.1 Datasets preparation and pre-processing

We applied the HFWFS method for feature selection to determine the most relevant features of the NSL-KDD and UNSW-NB15 datasets. Applying Algorithm 3.1.1 with the Chi-square filter method, we selected 22 of the most important features from each of the NSL-KDD and UNSW-NB15 datasets (Tables 5 and 6, respectively). Subsequently, we applied Algorithm 3.1.2, ranking these features with Info Gain and Gain Ratio and keeping the top-ranked features of each method (Tables 5 and 6), after which we combined the results to obtain 15 relevant features for NSL-KDD and 14 for UNSW-NB15. Finally, we applied the last part of our method, Algorithm 3.1.3, using a wrapper method based on a greedy stepwise search. We selected nine of the most important features from both the NSL-KDD and UNSW-NB15 datasets (Tables 5 and 6). In order to compare our model with the filter methods used in our approach, we selected nine of the most important features for each method. Tables 7 and 8 show the nine features selected from the NSL-KDD and UNSW-NB15 datasets, respectively, using the HFWFS method, Chi-square, Info Gain, and Gain Ratio. We employed these features as input for the training and testing of the Random Tree classification algorithm.
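The combination step can be checked directly against the NSL-KDD feature indices reported in Table 5. The short sketch below uses only those published index lists; the variable names are illustrative.

```python
# Feature indices copied from Table 5 (HFWFS applied to NSL-KDD)
info_gain = {3, 4, 5, 6, 23, 29, 30, 33, 34, 35, 38, 39}
gain_ratio = {4, 5, 12, 25, 26, 29, 30, 38, 39}
combination = {3, 4, 5, 6, 12, 23, 25, 26, 29, 30, 33, 34, 35, 38, 39}
wrapper = {3, 4, 5, 6, 23, 30, 33, 34, 39}

# The combined set should be the union of the two filter selections
combined = info_gain | gain_ratio
union_matches = combined == combination

# The wrapper only searches within the combined set
wrapper_is_subset = wrapper <= combined
```

For the NSL-KDD lists both checks hold: the union of the Info Gain and Gain Ratio selections contains 15 features, and the nine wrapper-selected features all come from that set.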

Table 7
Feature selection using filter methods on NSL-KDD dataset

Filter method	Features selected
Chi-squared	5, 30, 4, 29, 3, 35, 34, 6, 23
Gain ratio	4, 26, 25, 5, 39, 12, 38, 30, 29
Info gain	5, 30, 4, 29, 3, 6, 35, 34, 23
Domain knowledge [20]	1, 2, 3, 4, 23, 34, 38, 39, 40
Our method	3, 4, 5, 6, 23, 30, 33, 34, 39

Table 8

Feature selection using filter methods on UNSW-NB15 dataset

Filter method	Features selected
Chi-squared	7, 27, 2, 12, 1, 8, 9, 28, 16
Gain ratio	2, 20, 11, 21, 22, 23, 4, 35, 32
Info gain	7, 27, 2, 12, 8, 1, 9, 28, 16
Our method	2, 3, 4, 5, 7, 8, 12, 27, 28

Figure 4.

Classification accuracy for feature selection methods.

5.2 Performance evaluation

Several experiments were conducted to evaluate the performance of the proposed HFWFS method. For this purpose, different metrics were applied, including accuracy and false alarm rate (FAR). These metrics are computed from four counts: true positives (TP), the number of actual attacks classified as attacks; true negatives (TN), the number of actual normal records classified as normal; false positives (FP), the number of actual normal records classified as attacks; and false negatives (FN), the number of actual attacks classified as normal. In this work, we compared the accuracy, the false alarm rate, and the time to build and test the model for our proposed HFWFS method, each filter method, and the full feature set, using the Random Tree classification algorithm. The time required to build the classification model is the duration of the classifier’s learning process after applying each feature selection method. Tables 9 and 10 present the performance measures of the Random Tree classifier on both datasets with all features, with the nine features selected by each filter method, and with the nine features selected using our proposed HFWFS method.
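The four counts can be tallied directly from paired true and predicted labels. The following Python sketch is illustrative only; the function name `confusion_counts`, the label strings, and the five sample records are assumptions made for the example, not data from the experiments.

```python
from collections import Counter
from typing import Sequence


def confusion_counts(actual: Sequence[str],
                     predicted: Sequence[str],
                     attack_label: str = "attack") -> Counter:
    """Tally TP, TN, FP and FN, treating `attack_label` as the positive class."""
    counts: Counter = Counter()
    for truth, guess in zip(actual, predicted):
        if truth == attack_label:
            # An actual attack is either detected (TP) or missed (FN).
            counts["TP" if guess == attack_label else "FN"] += 1
        else:
            # A normal record is either flagged wrongly (FP) or passed (TN).
            counts["FP" if guess == attack_label else "TN"] += 1
    return counts


# Hypothetical labels for five records.
actual = ["attack", "attack", "normal", "normal", "normal"]
predicted = ["attack", "normal", "normal", "attack", "normal"]
counts = confusion_counts(actual, predicted)
```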

5.2.1 Classification accuracy

Classification accuracy pertains to the number of correct predictions made divided by the total number of predictions made, multiplied by 100 to convert it into a percentage. This may be determined by:

$\displaystyle\textit{Accuracy}=\frac{\textit{TP}+\textit{TN}}{\textit{TP}+\textit{TN}+\textit{FP}+\textit{FN}}\times 100$

Figure 4 depicts the classification accuracy across different filter feature selection methods and the HFWFS method. Relative to the full feature set, our proposed method presents a slight increase in classification accuracy, of 0.05 and 0.04 percentage points on the NSL-KDD and UNSW-NB15 datasets, respectively (Tables 9 and 10).

Table 9
Performance measure using NSL-KDD dataset

Filter method	Number of features	Accuracy	False alarm rate	Building time (s)	Testing time (s)
No FS	41	99.88	0.122	1.73	1.12
Chi-square	9	99.88	0.156	0.71	0.41
Gain ratio	9	99.83	0.183	0.77	0.43
Info gain	9	99.88	0.163	0.59	0.47
Our method	9	99.93	0.088	0.64	0.36

Table 10

Performance measure using UNSW-NB15 dataset

Filter method	Number of features	Accuracy	False alarm rate	Building time (s)	Testing time (s)
No FS	42	98.91	1.07	1.03	0.37
Chi-square	9	98.54	1.60	0.54	0.28
Gain ratio	9	94.14	6.96	0.69	0.34
Info gain	9	98.37	1.69	0.43	0.28
Our method	9	98.95	1.30	0.35	0.26

5.2.2 False alarm rate

The false alarm rate (FAR) is the percentage of normal records that are misclassified as attacks. This may be determined by:

$\displaystyle\textit{FAR}=\frac{\textit{FP}}{\textit{FP}+\textit{TN}}\times 100$

Figure 5 illustrates the false alarm rate of the full feature set and the different feature selection methods. On the UNSW-NB15 dataset, Gain Ratio produced the highest false alarm rate (6.96%), while the full feature set had the lowest at 1.07%; our method generated a false alarm rate of 1.30%, the lowest among the nine-feature subsets. On the NSL-KDD dataset, our method presented the lowest rate overall, at 0.088%.
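Both metrics follow directly from the confusion counts. A minimal Python sketch of the accuracy and FAR formulas is given below; the function names and the example counts are hypothetical and are not taken from the experiments.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Percentage of all records that were classified correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def false_alarm_rate(fp: int, tn: int) -> float:
    """Percentage of normal records that were flagged as attacks."""
    return 100.0 * fp / (fp + tn)


# Hypothetical counts: 100 attack records and 100 normal records.
example_accuracy = accuracy(tp=90, tn=95, fp=5, fn=10)
example_far = false_alarm_rate(fp=5, tn=95)
```

With these counts, 185 of the 200 records are classified correctly and 5 of the 100 normal records raise a false alarm. Note that FAR depends only on the normal records, so a method can improve accuracy while its FAR worsens.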

Figure 5.

False alarm rate for feature selection methods.

5.2.3 Time to build and test model

Figure 6 presents the time required to build the model across the different filter selection methods and the full feature set. The full feature set had the worst learning time, at 1.73 s and 1.03 s with NSL-KDD and UNSW-NB15, respectively, owing to the large number of features that the classifier had to process. Our proposed method built the model in 0.64 s and 0.35 s, respectively: the best time on UNSW-NB15, and on NSL-KDD second only to Info Gain (0.59 s; Table 9).

Figure 6.

Time to build the model for feature selection methods.

Figure 7 depicts the time required to test the model across the different filter selection methods and the full feature set. Our proposed method showed the best time to test the model, at 0.36 s with NSL-KDD and 0.26 s with UNSW-NB15, and, as expected, the full feature sets presented the worst time to test the model.

5.3 Comparison study

To demonstrate the performance of HFWFS, we compared our method, using the Random Tree classifier, with state-of-the-art approaches that have been tested on the NSL-KDD dataset. Table 11 presents the resulting comparison.

Table 11
The comparison with other methods (using NSL-KDD)

Approaches	Classifier	No of features	Accuracy	Time to build model (s)
Linear correlation based [46]	C4.5	17	99.10%	12.02
CFS [18]	C4.5	NA	99.13%	NA
EMFFS [31]	J48	13	99.67%	0.78
MMIFS [6]	SVM	8	84.11%	19
FMIFS [5]	SVM	12	98.93%	10.06
Domain knowledge	Decision tree	9	99.90%	1.31
Our method (HFWFS)	Random tree	9	99.93%	0.64

Figure 7.

Time to test the model for feature selection methods.

With regard to the performance measures in Tables 9 and 10 for the NSL-KDD and UNSW-NB15 datasets, respectively, our method achieves the best accuracy and testing time on both datasets, and the lowest false alarm rate among the nine-feature subsets. Table 11 compares our proposed method with other existing feature selection methods on the NSL-KDD dataset. The results show the advantage of our method in classification accuracy and in the time to build the model. MMIFS, proposed in [6], reduced the features to 8 but with a low classification accuracy of 84.11% and a much longer time to build the model than our method (Table 11). The authors of [20] used domain knowledge, selecting each feature according to its practical significance and feasibility; from the 41 features of NSL-KDD, they selected 9 features for the DoS layer. Notably, our hybrid feature selection, with 99.93% classification accuracy and 0.64 s to train the model, performed better than feature selection based on domain knowledge and the other models presented in Table 11. Overall, these results suggest that the HFWFS feature selection approach using the Random Tree classifier is a feasible scheme for building reliable intrusion detection systems for DoS attacks.

5.4 Discussion

Real-time and efficacious DDoS attack detection systems in cloud computing have become a necessity, which has translated into increased complexity of detection techniques. Filter and wrapper methods for feature selection each present advantages and weaknesses when identifying important features. In this study, we proposed a new feature selection mechanism that utilizes the advantages of both filters and wrappers. Our HFWFS method greatly decreased the number of features, from more than 40 for both datasets to nine, while slightly improving accuracy and shortening the time to build and test the model relative to the full feature set. We compared our method with the full set and with nine features selected using other filter methods. The results indicated that the classification accuracy obtained using our method was an improvement over the other filter methods for both the NSL-KDD and UNSW-NB15 datasets. Additionally, our method presented the best time to test the model on both datasets and the best time to build it on UNSW-NB15, which makes HFWFS an effective hybrid feature selection method with less complexity. From Table 11, we conclude that the nine features selected using our method, in conjunction with the Random Tree classifier, provided the best results in terms of classification accuracy and the time required to build the model among the approaches compared.

6. Conclusion

In this paper we proposed and tested a hybrid feature selection mechanism to address the massive complexity of DDoS attack detection in cloud computing. The idea was to initially utilize the efficiency of filters, followed by the accuracy of wrappers. Chi-square, InfoGain, and GainRatio were first employed to remove the most irrelevant and redundant features toward the formation of a reduced subset. A wrapper method was then applied to improve the classification accuracy and to identify a small set of features. The performance of the hybrid mechanism was evaluated using two different datasets (NSL-KDD and UNSW-NB15). Experimental results showed that the proposed approach, on one hand, was more efficient than individual filter feature selection methods, and on the other hand, identified fewer features than other proposed feature selection methods, while the overall efficacy was improved.

References

1. Tesfahun and Bhaskari, Intrusion Detection using Random Forests Classifier with SMOTE and Feature Reduction, In Proceedings of the International Conference on Cloud & Ubiquitous Computing & Emerging Technologies (CUBE), IEEE, Pune, 2013, pp. 127–132.

2. Abraham, Jain, Thomas and Han S.Y., D-scids: Distributed soft computing intrusion detection system, Journal of Network and Computer Applications 30(1) (2007), 81–98.

3. Afzal and Torkar, Towards benchmarking feature subset selection methods for software fault prediction, Computational Intelligence and Quantitative Software Engineering, Springer International Publishing, 2016, pp. 33–58.

4. Mohamed, Grundy and Müller, An analysis of the cloud computing security problem, arXiv preprint arXiv:1609.01107, 2016.

5. Mohammed et al., Building an intrusion detection system using a filter-based feature selection algorithm, IEEE Transactions on Computers 65(10) (2016), 2986–2998.

6. Fatemeh et al., Mutual information-based feature selection for intrusion detection systems, Journal of Network and Computer Applications 34(4) (2011), 1184–1199.

7. Agarwal and Mittal, Optimal feature selection for sentiment analysis, Proceedings of 14th International Conference on Computational Linguistics and Intelligent Text Processing CICLing, Springer, Samos, Greece, 2013, pp. 13–24.

8. Chang, Yen-Hung and Muthu, Cloud computing adoption framework: A security framework for business clouds, Future Generation Computer Systems 57 (2016), 24–41.

9. Quick and Choo K.-K.R., Impacts of increasing volume of digital forensic data: a survey and future research challenges, Digit Investig 11(4) (2014), 273–294.

10. Data mining software in Java. http://www.cs.waikato.ac.nz/ml/weka/ Last accessed 7 May 2017.

11. Zhang and Wang, An effective feature selection approach for network intrusion detection, In Eighth IEEE International Conference on Networking, Architecture and Storage (NAS), 2013, pp. 307–311.

12. et al., Multi-agent intrusion detection system using feature selection approach, Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2014 Tenth International Conference on, IEEE, 2014.

13. Eid, Hassanien, Kim and Banerjee, Linear correlation-based feature selection for network intrusion detection model, In Proceedings of the 1st International Conference on Advances in Security of Information and Communication Networks (SecNet), Springer, Cairo, 2013, pp. 240–248.

14. Ibrahim, Badr and Shaheen, Adaptive layered approach using machine learning techniques with gain ratio for intrusion detection systems, Int J Comput Appl 56(7) (2012), 10–16.

15. Hui-Huang, Hsieh C.-W. and Lu M.-D., Hybrid feature selection by combining filters and wrappers, Expert Systems with Applications 38(7) (2011), 8144–8150.

16. Idhammad, Afdel and Belouch, DoS Detection Method based on Artificial Neural Networks, International Journal of Advanced Computer Science and Applications, 2017.

17. Peng, Choo K.-K.R. and Ashman, Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles, J Netw Comput Appl, Elsevier, 2016, in press.

18. Kang, Park, Bang and Kang, An in-depth analysis on traffic flooding attacks detection and system using data mining techniques, J Syst Architect 59(10) (2013), 1005–1012.

19. and Shelton C.R., Intrusion detection using continuous time bayesian networks, J Artif Intell Res, 2010.

20. Gupta, Nath and Kotagiri, Layered approach using conditional random fields for intrusion detection, IEEE Transactions on Dependable and Secure Computing 7(1) (2010), 35.

21. Krämer, Krupp, Makita, Nishizoe, Koide, Yoshioka and Rossow, AmpPot: monitoring and defending against amplification DDoS attacks, In Proceedings of 18th International Symposium on Research in Attacks Intrusion and Defenses (RAID), Springer, Kyoto, 2015.

22. Koc, Mazzuchi and Sarkani, A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier, Expert Syst Appl 39(18) (2012), 13492–13500.

23. Zhao, Chen, Ranjan, Choo K.-K.R. and He, Geographical information system parallelization for spatial big data processing: A review, Cluster Comput, Springer, 2015, in press.

24. and Liu, Feature selection for high-dimensional data: A fast correlation-based filter solution, In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Springer, Washington DC, 2003, pp. 856–863.

25. Ficco and Rak, Stealthy denial of service strategy in cloud computing, IEEE Trans Cloud Comput 3(1) (2015), 80–94.

26. Nour and Slay, The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set, Information Security Journal: A Global Perspective 25(1-3) (2016), 18–31.

27. Nour and Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), Military Communications and Information Systems Conference (MilCIS), 2015, IEEE, 2015.

28. Acharya and Singh, An IWD-based feature selection method for intrusion detection system, Soft Computing (2017), 1–10.

29. Nissim, Moskovitch, Rokach and Elovici, Detecting unknown computer worm activity via support vector machines and active learning, Pattern Anal Appl 15(4) (2012), 459–475.

30. Osanaiye, Choo K.-K.R. and Dlodlo, Distributed denial of service (DDoS) resilience in cloud: review and conceptual cloud DDoS mitigation framework, J Netw Comput Appl 67 (2016), 147–65.

31. Osanaiye, Cai, Choo K.K., Dehghantanha and Dlodlo, Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing, EURASIP Journal on Wireless Communications and Networking, No. 1, 2016, pp. 1–10.

32. Opeyemi et al., Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing, EURASIP Journal on Wireless Communications and Networking, 2016.

33. Bermejo, de la Ossa, Gámez and Puerta, Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking, Knowl-Based Syst 25(1) (2012), 35–44.

34. Bhattacharya and Selvakumar, Multi-measure multi-weight ranking approach for the identification of the network features for the detection of DoS and Probe attacks, Comput J (2015), 1–21.

35. Chebrolu, Abraham and Thomas J.P., Feature deduction and ensemble design of intrusion detection systems, Computers & Security 24(4) (2005), 295–307.

36. Mukkamala and Sung A.H., Significant feature selection using computational intelligent techniques for intrusion detection, Advanced Methods for Knowledge Discovery from Complex Data, Springer, 2005, pp. 285–306.

37. Lin, Ying, Lee and Lee, An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection, Appl Soft Comput 12(10) (2012), 3285–3290.

38. Rastegari, Hingston and Lam, Evolving statistical rulesets for network intrusion detection, Appl Soft Comput 33 (2015), 348–359.

39. Sindhu, Geetha and Kannan, Decision tree based light weight intrusion detection using a wrapper approach, Expert Syst Appl 39(1) (2012), 129–141.

40. Zargar, Joshi and Tipper, A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks, IEEE Commun Surv Tutorials, 2013.

41. Mahbod et al., A detailed analysis of the KDD CUP 99 data set, Computational Intelligence for Security and Defense Applications, 2009. CISDA 2009. IEEE Symposium on, IEEE, 2009.

42. Cover T.M. and Thomas J.A., Elements of information theory, John Wiley & Sons, 2012.

43. Peng, Leckie and Ramamohanarao, Survey of network-based defense mechanisms countering the DoS and DDoS problems, ACM Comput Surv (1), 2007.

44. Bolon-Canedo, Sanchez-Marono and Alonso-Betanzos, Feature selection and classification in multiple class datasets: an application to KDD Cup 99 dataset, Expert Syst Appl 38(5) (2011), 5947–5957.

45. Wang et al., DDoS attack protection in the era of cloud computing and software-defined networking, Computer Networks 81 (2015), 308–319.

46. Huang, Zhang, Sun et al., An Anomaly Detection Method Based on Normalized Mutual Information Feature Selection and Quantum Wavelet Neural Network, Wireless Personal Communications 96(2) (2017), 2693–2713.

47. Wang and Gombault, Efficient detection of DDoS attacks with important attributes, In Proceedings of the 3rd International conference on Risks and Security of Internet and Systems (CRiSIS’08), IEEE, Tozeur, 2008, pp. 61–67.

48. Wang, Liu and Gombault, Constructing important features from massive network traffic for lightweight intrusion detection, IET Inform Secur 9(6) (2015), 374–379.

49. Chen, Cheng and Guo, Survey and taxonomy of feature selection algorithms in intrusion detection system, In Proceedings of the 2nd SKLOIS Conference Information Security and Cryptology (INSCRYPT), Springer, Beijing, 2006, pp. 153–167.

50. Chen, Abraham and Yang, Feature selection and classification flexible neural tree, Neurocomputing 70(1) (2006), 305–313.

51. Xia, Zhang, Yan and Dai, An efficient intrusion detection system based on support vector machines and gradually feature removal method, Expert Syst Appl 39(1) (2012), 424–430.

52. Yang et al., A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization, Information Processing & Management 48(4) (2012), 741–754.

53. Baig, Sait and Shaheen, GMDH-based networks for intelligent intrusion detection, Eng Appl Artif Intel 26(7) (2013), 1731–1740.

54. Zhang, Sugumaran, Choo K.-K.R., Mei and Zhu, Participatory sensing-based semantic and spatial analysis of urban emergency events using mobile social media, EURASIP J Wirel Commun Netw 1 (2016), 1–9.

55. Liu, Yen, Mei, Luo, Wei and Hu, Crowdsourcing based description of urban emergency events using social media big data, IEEE Trans Cloud Comput, 2016.
