Abstract
The aim of feature selection (FS) is to select a small subset of most important and discriminative features. Many FS approaches based on rough set theory up to now, have employed reduct analysis using feature dependency measures. However the critical shortcoming for such approaches is that they are not able to manage useful information that may be destroyed by noise elements. Therefore several extensions to the original theory have been proposed. Three notable extensions are fuzzy rough set (FRS), variable precision rough set (VPRS), and tolerance rough set model (TRSM). Although successful, each of the extensions exhibits a critical shortcoming which makes that extension inapplicable in most of scenarios. For example, FRS is able to describe the existing dependencies between different attributes accurately, but its high run-times makes it inapplicable to larger datasets. As another example, VPR is very fast, but requires more information than contained within the data itself, which is inaccessible for most of the applications. This paper examines a rough set FS technique which uses a noise resistant dependency measure to quantify information that may be hidden due to the noise elements. Experimental results demonstrate that the use of this measure can result more discriminative reducts than those obtained using other RSFS approaches. Moreover, the proposed measure is as fast as VPRS and as accurate as FRS and TRSM, while it need no additional information other than contained within the data.
Introduction
Although larger feature set gives more information about a concept, we need to keep the feature set as small as possible to reduce computational complexity, gain good generalization performance and increase accuracy [9, 30]. In the feature selection problems, a set of feature vectors with n input features and one or more output features, is given and the task is to select a small subset of most important and discriminative input features.
On feature selection, a specific theoretical framework is Pawlak’s rough set model [25]. This model tries to express vagueness by means of positive and boundary regions of a set. The main advantage of this theory is that it requires no human input or domain knowledge other than the given data set [24]. This property makes the rough sets a powerful mathematical tool for analyzing various types of data [4, 35].
Most existing classical rough-set-based feature selection (RSFS) approaches [10, 36] rely on reduct analysis. A reduct is a sufficient subset of input features that fully characterizes the positive region structure generated by the full feature set. Although successful, these positive region-based approaches has three shortcomings which make them ineffective in real world applications; Firstly, they only operate effectively with datasets containing discrete values and therefore it is necessary to perform a discretization step for real-valued attributes. Secondly, these methods examine only the information contained within the lower approximation of a set and ignore the information contained in the boundary region. Finally, These methods are highly sensitive to noisy data and useful information that may be destroyed by noise elements are all ignored by classical RSFS methods [15, 24].
Motivated by these shortcomings, several extensions to the original theory have been proposed. This extensions can be divided into two categories: Extensions which focus on the modification of the equivalence relation, such as tolerance rough sets model (TRSM) [28] and fuzzy rough sets (FRS) [6, 15]. TRSM uses a similarity relation instead of indiscernibility relation to relax the crisp manner of classical rough sets theory. FRS, on the other hand, uses fuzzy equivalence classes generated by a fuzzy similarity relation to represent vagueness in data. While these extensions are able to handle the classical rough set shortcomings, they needs considerable computational time to apply their modifications. Therefore these extensions are not computationally efficient in large datasets. Extensions which consider manipulation of the subset operator used in calculating rough set approximations. The notable extension in this group is variable precision rough sets (VPRS) [37]. VPRS attempts to overcome the classical rough set shortcomings by generalizing the standard set inclusion relation. While this extension is fast, it requires more information than contained within the data itself. This is contrary to the rough sets theory consideration of operating with no domain knowledge.
In addition to rough sets extensions, there are also some classical rough set based feature selection approaches which consider the boundary region information. In [12, 18] the upper approximations are examined instead of lower approximations, to define feature selection criterions. The approach proposed in [24] examines both the information in the lower approximation and the information contained in the boundary region for the selection of feature subsets. While all this methods are successful in dealing with real-valued datasets and using information contained in the boundary region, they are not designed to deal with noisy datasets. Noise is an integral part of many real world datasets, which could hide important information about the output features. This paper presents a method which can handle noisy data. The proposed method uses a set impurity measure to quantify information that may be hidden due to the noise elements. This measure combined with the proximity measure adopted in [24], constitute a noise resistant dependency measure which enables the RSFS approaches to deal with real-valued datasets, use information contained in the boundary region and control possible noise elements.
In summary, the unique contributions that distinguish the proposed work from existing approaches are threefold: 1) Our work advances the classical RSFS one step further for handling noisy data; 2) a novel feature selection criterion is proposed which needs no human input or domain knowledge; and 3) a new feature selection algorithm is proposed, with extensive comparisons and experimental studies to prove its accuracy and speed.
The remainder of the paper is organized as follows: Section 2 summarizes the theoretical background of RSFS along with a look at the rough set extensions and QUICKREDUCT algorithm. Section 3 presents the main contribution of the new approach with a description of the noise resistant measure for RSFS. Section 4 reports experimental results and Section 6 concludes the paper.
Rough sets
The rough sets theory is introduced by Pawlak [25] to express vagueness by means of boundary region of a set. The main advantage of this implementation of vagueness is that it requires no human input or domain knowledge other than the given data set [24, 31, 24, 31]. This section describes the fundamentals of thetheory.
Information system and indiscernibility
An information system is a pair IS = (U, F), where U is a non-empty finite set of objects called universe and F is a non-empty finite set of features such that f : U → V f , for every f ∈ F. The set V f is called the value set or domain of f. Information system in rough sets theory is analogous with data set in unsupervised machine learning and classification tasks. A decision system is an information system of the form IS = (U, F, d), where d is called the decision feature. data set in a supervised classification and learning can be seen as a decision system, where instances are the objects of universe, features are the elements of F and labels represent decision feature values.
For any set B ⊆ F ∪ {d}, we define the B-indiscernibility relation as:
If (x, y) belongs to IND IS (B), x and y are indiscernible according to the feature subset B. Equivalence classes of the relation IND IS (B) are denoted [x] B and referred to as B-elementary sets. The partitioning of U to B-elementary subsets is denoted U/IND IS (B) or simply U/B. Generating such a partitioning is a common computational routine, that effects the performance of any rough set based operation. Algorithm 1 represents a general procedure PARTITION to compute U/B.
The time complexity of PARTITION is Θ (|B||P||U|), where |P| is the number of generated B-elementary subsets. If none of the objects in U are indiscernible according to B, the number of B-elementary subsets is |U| and therefore the worst-case complexity of PARTITION is O (|B||U|2).
The fundamental notions of rough sets are lower and upper approximations of sets. let B ⊆ F and X ⊆ U, the B-lower and B-upper approximations of X are defined as follow:
The
By the definition of
While all the patterns in the positive region could be certainly included in the set, the boundary region consists of those patterns that can neither be ruled in nor ruled out as members of the set. Moreover, all the patterns in the negative region could be certainly excluded from the set.
Discovering dependencies between attributes is an important issue in data analysis. Let D and C be subsets of F ∪ {d}. It is said that D depends on C in a degree k (0 ≤ k ≤ 1), denoted C ⇒
k
D, if
A reduct R is a minimal set of features R ⊆ C, such that γ (R, d) = γ (C, d) [15, 17]. An optimal reduct is a reduct with minimum cardinality. Finding a minimal reduct is NP-hard [15], because all the possible subsets of conditional features must be generated to retrieve such a reduct. Therefore finding a near optimal reduct has generated much of interest [14, 17]. Figure 2 represents the steps of QUICKREDUCT algorithm [14], which searches for a minimal subset without exhaustively generating all possiblesubsets.
Rough set extensions
Several efforts has been made to make close the attribute reduction concept in rough sets theory and feature selection in machine learning and classification tasks. However traditional rough set based attribute reduction (RSAR) has three shortcomings which make it ineffective in real world applications [15, 24]; First, it only operates effectively with datasets containing discrete values and therefore it is necessary to perform a discretization step for real-valued attributes. Second, RSAR is highly sensitive to noisy data. Finally, RSAR methods examine only the information contained within the lower approximation of a set and ignore the information contained in the boundary region.
Therefore several extensions to the original theory have been proposed to overcome such shortcomings. Four notable extensions are variable precision rough sets (VPRS) [37], tolerance rough set model (TRSM) [28], fuzzy rough sets (FRS) [6, 15] and an extension to dependency measure proposed in [24].
VPRS [37] attempts to overcome the traditional rough sets shortcomings by generalizing the standard set inclusion relation (⊆). In the generalized inclusion relation, a set X is considered to be a subset of Y if the proportion of elements in X which are not in Y is less than a predefined threshold. However, the introduction of a suitable threshold requires more information than contained within the data itself. This is contrary to the rough sets theory consideration of operating with no domain knowledge.
TRSM [28] uses a similarity relation instead of indiscernibility relation to relax the crisp manner of classical rough sets theory. As equivalence classes (elementary sets) in classical rough sets, tolerance classes are generated using similarity relation in TRSM, which are used to define lower and upper approximations. TRSM has two deficiencies; First, it needs a tolerance threshold to generate tolerance classes, which like VPRS this threshold is human defined. Second, the time complexity of generating all tolerance classes, using attribute subset B, is Θ (|B||U|2), which is equal to worst-case time complexity of PARTITION algorithm.
FRS [6, 15] uses fuzzy equivalence classes generated by a fuzzy similarity relation to represent vagueness in data. Fuzzy Lower and upper approximations are generated based on fuzzy equivalence classes. These approximations are extended versions of their crisp notions in classical rough sets, except that in the fuzzy approximations, elements may have membership degree in the range [0, 1]. FRS needs no extra knowledge to define operations on a given dataset, however as tolerance classes in TRSM, generating fuzzy equivalence classes in FRS is an expensive routine (Θ (|B||U|2)).
In addition to rough sets extensions, there are also some modifications, which does not change classical rough sets principals. [24] redefines the dependency notion in classical rough sets to deal with useful information that may be contained in the boundary region. Unlike the other three extensions, this extension does not redefine the lower and upper approximations in classical rough sets, therefore it needs no human input knowledge to deal with available data. Moreover the PARTITION algorithm may be applied directly, that is, generating equivalence classes in this extension is more efficient that generating tolerance classes and fuzzy equivalence classes in TRSM and FRS, respectively. Since this extension has been used as a part of our work, it is explained more extensive in the following subsection.
Useful information in the boundary region
Almost all the classical rough set based attribute reduction methods use only the information contained in the positive region. However the boundary region may also contain useful information that are ignored in this methods [24]. Such scenario is common in real-valued datasets, where some adjacent values may placed in different regions because of crisp manner of classical rough sets. Measuring the proximity of objects in the boundary region to the objects in positive region could help to qualify the information contained in boundary region. The method proposed in [24] uses a distance metric to calculate such proximities.
Let X be a set of objects and B a subset of attributes. The mean positive region, which is the mean of all object attribute values in POS
B
(X), is defined as
The proximity of any object y ∈ BND
B
(X) from the mean positive region is defined as
The proximity of the boundary region to the positive region is defined as
This proximity measure combined with rough-set dependency value create a new evaluation measure M as
The DMQUICKREDUCT (distance metric assisted QUICKREDUCT), a version of QUICKREDUCT that uses the new measure to guide the attribute reduction process, is presented in Algorithm 3.
It is important to point out that the value of M is used as an attribute selection criterion, not as a termination criterion. At each iteration, the algorithm adds a conditional feature that causes maximum increase in the value of M for the selected subset. The algorithm terminates when the dependency value of selected subset (γ (R, d)) reaches that of the unreduced data set.
Using classical dependency combined with proximity of boundary and positive regions enables the attribute reduction algorithm to deal with real-valued datasets. However this measure is insufficient to control noisy data, specially in non real-valued datasets. For example suppose two binary attributes a and b (Table 1), where a is 0 for all first half instances and 1 for all second half and b is similar to a, except that a value from first half is exchanged with a value from second half. Obviously b is a noisy version of a. Let C = {a} and D = {b} then
Therefore γ (C, D) =0 based on (7), which implies that the two attribute sets are independent from the classical rough sets view point. Moreover they have no dependency even if we use (12), because ω (C, D) =0 based on (10).
An example table
An example table
The problem comes from the crisp manner of the inclusion relation in defining lower approximations. Suppose for example the case
The approach described in this section uses a noise resistant dependency measure to search for reducts. The NRRSAR method uses the information that may be hidden due to noise elements and assigns a significance value to this information.
The noise resistant measure attempts to qualify the information that may be unseen due to the crisp manner of the inclusion relation in defining lower approximations. This measure uses an impurity rate value to calculate the noisy portion of a set. Let A and B be two sets. The impurity rate of A with respect to B can be defined as follow:
This value calculates the portion of the elements that should be eliminated from A to make it totally included in B. It is important to note that if c (A, B) >0.5, the impurity of A with respect to B is more than its impurity with respect to
This formulation can be applied to elementary sets to extract information that may be unseen in calculating lower approximations. To do this, a noise measure function, φ, is defined as
This function quantifies the possibility of transferring some objects from boundary to the positive region of a set, if the noisy elements could be removed.
Let C and D be two attribute sets. The noisy dependency of D on C can be defined as follow:
Suppose the example in Table 1. The noisy dependency of D = {b} on C = {a} can be calculated as
The noisy dependency operates on boundary region as proximity measure (10). However the proximity measure considers each point in the boundary region separately and calculates its distance from the positive region, while the noisy dependency considers subsets of objects to measure their transmission possibility to the positive region. Therefore the two values are combined to create a new measure for evaluating boundary region as
This new measure can be used alongside the classical dependency. As one measure only operates on the objects in boundary region and the other only on the objects in positive region, the two operators are combined to create a noise resistant evaluation measure ρ:
For the example in Table 1,
The proposed noise resistant measure can be used as an attribute selection criterion in rough set based attribute reduction algorithms. Algorithm 4 shows NRQUICKREDUCT algorithm, which is similar to QUICKREDUCT and DMQUICKREDUCT algorithms, but uses the proposed measure to guide the attribute reduction process.
The algorithm starts with an empty selected attribute set R. At each iteration, an attribute is added to R from non-selected attributes. This attribute is selected such that its addition causes maximum increase in the value of the noise resistant measure. This process continues until the dependency value of the selected attributes (γ (R, d)) equals the dependency of complete conditional attribute set (γ (C, d)). It is important to point out that as DMQUICKREDUCT, the new measure is used as an attribute selection criterion, not as a termination criterion.
Experimental results
In this section, we provide several experimental results to illustrate the performance of the proposed measure. To do this, NRRSAR is compared with two types of algorithms 1) classical rough set-based attribute reduction algorithms, and 2) attribute reduction algorithms based on rough sets extensions. Results are presented in terms of the selected subset size (compactness), the time to locate the subset (run-time) in seconds, the classification accuracy and the robustness against different types of noises.
Table 2 summarizes the 13 data sets, used in our experiments. The data sets range from 6 to 100 attributes, and 32 to 4177 objects. The monk3, abalone, tic-tac-toe, wine, credit, zoo, dermatology, ionosphere, soybean-small, heart, and lung-cancer datasets are from the UCI machine learning repository [1] and the tm1, and tm2, two synthetic problems called threshold max, are from [26]. If available, we have used the original test set as evaluating dataset, and for the remaining data sets, 10-fold cross-validation is used.
Summary of the Benchmark Datasets (C: categorical, R: real-valued, I: integer)
Summary of the Benchmark Datasets (C: categorical, R: real-valued, I: integer)
All the experiments are carried out on a DELL workstation with Windows 7, 2 GB memory, and 2.4 GHz CPU. Four classifiers are employed for the classification of the data, J48 [27, 33], JRip [3], Naive Bayes [5], and kernel SVM with RBF kernel function [2]. To set the classifier-related parameters, such as the C and γ parameters of the kernel SVM, we used the 5-fold cross validation.
Here, the performance of the proposed measure, is compared with RSAR and DMRSAR [24]. In this regard, NRQUICKREDUCT algorithm is compared with QUICKREDUCT and DMQUICKREDUCT algorithms.
Table 3, report the selected subset sizes found by the three measures. As it can be seen, RSAR found smaller results in comparison to both DMRSAR and NRRSAR. This is reasonable, since the termination criterion and the objectives are the same in QUICKREDUCT iterations, while the DMQUICKREDUCT and NRQUICKREDUCT algorithms may find features, which do not necessarily increase the dependency function value (the stopping criterion). While the result obtained by NRRSAR is larger than that obtained by DMRSAR for dermatology, DMRSAR found subsets which are close to the original feature set in 2 cases (ionosphere and lung-cancer). This is an important result which demonstrates the unreliability of adopting DMRSAR on real valued data sets. Table 3 reports the compactness of the selected subsets using the three SBE algorithms on the large scale data sets. We can conclude that SBE-NRRSAR selects fewer features than the other two algorithms.
Comparison of Selected Subsets Size for RSAR, DMRSAR and NRRSAR
Comparison of Selected Subsets Size for RSAR, DMRSAR and NRRSAR
Table 4 reports the run-times of the three measures on small scale data sets. It is clear that the selected subset size affects the run-times, but considering those tests with equal selected subset sizes enables us to make a clear comparison between the methods. Comparing with RSAR and DMRSAR, the results show that there is only a marginal increase in runtime for the NRRSAR iterations. This means that, there is little overall difference in runtime between the proposed noise resistant dependency measure and the other two measures.
Comparison of Run Times for RSAR, DMRSAR and NRRSAR
Table 5 reports the classification results. As it can be seen, the proposed NRRSAR based algorithm (NRQUICKREDUCT) performs very well and shows increase in classification accuracies for most of the tests.
Comparison of Classification Results for RSAR, DMRSAR and NRRSAR, Using J48, JRip, Naive Bayes (NB), and SVM
Compared with RSAR (QUICKREDUCT) in terms of J48 classifier, the NRRSAR is superior in all tests. Moreover, NRRSAR shows an increase of up to 60 percent for lung-cancer and up to 30 percent for credit. Comparing the two algorithms in terms of JRip, we see the similar results to those discussed for J48. Using Naive Bayes, the RSAR shows slightly better results for the wine and heart datasets, but we can see that our proposed algorithm is superior in the other eleven datasets and it shows an increase of up to 40 percent for lung-cancer, up to 35 percent for soybean-small and lung-cancer, and up to 30 percent for abalone datasets. In terms of SVM classifier, NRRSAR lost the tests only for two cases wine and zoo, while it shows significant increases in classification results for most of the remaining tests.
Compared with DMRSAR (DMQUICKRED-UCT) in terms of J48, our proposed algorithm is superior in all tests and shows an increase of up to 55 percent for abalone, and up to 30 percent for credit, tm1 and tm2. Using JRip, NRRSAR is superior in all tests and we can see that this algorithm shows an increase of up to 55 percent for abalone, up to 45 percent for tm1 and up to 30 percent for credit and tm2. Although NRRSAR lost the test for wine in terms of Naive Bayes classifier, it shows an increase of up to 55 percent for abalone and up to 45 percent for tm2. Using SVM, we can see the similar results to those discussed for JRip. Using SVM, the DMRSAR shows slightly better results for three data sets wine, zoo and heart, but we can see that our proposed algorithm shows significant increases for ninecases.
Here, the performance of the proposed measure, is compared with three rough set extensions, VPRS, TRSM, and FRS.
Comparison with VPRS
VPRS [37] attempts to overcome the traditional rough sets shortcomings by generalizing the standard set inclusion relation (⊆). In the generalized inclusion relation, a set X is considered to be a subset of Y if the proportion of elements in X which are not in Y is less than a predefined threshold β.
To compare the proposed measure with VPRS, NRQUICKREDUCT is compared with a QUICKREDUCT algorithm witch employs the VPRS-based dependency function as feature selection criterion. In our experiments two different threshold values β = 0.1 and β = 0.2 are employed to define the generalized inclusion relation in VPRS-based algorithm.
Table 6 reports the selected subsets size and run times. Comparing with NRRSAR, VPRS found smaller subsets for some tests, however the results show a strong dependence of this method on the β-value. Although the ideal threshold value can be obtained by repeated experimentation for a given data set, this value will be biased to the used data set. Moreover, finding such a value will impose a large computational time to the overall feature selection process. There are few tests with equal selected subsets for NRRSAR and VPRS, but considering those few tests demonstrate that the two methods are comparable in terms of run-times for small scale data sets.
Comparison of Selected Subsets Size and Run Times for NRRSAR and VPRS
Comparison of Selected Subsets Size and Run Times for NRRSAR and VPRS
Table 7 reports the classification results using NRRSAR and VPRS. As it can be seen, NRQUICKREDUCT shows increases in classification accuracies for most of the tests. In terms of J48 classifier, NRQUICKREDUCT is superior for at least one β setting. Moreover this algorithm is superior in eleven cases considering both β settings. The proposed algorithm won the tests for ten, nine and eleven data sets using JRip, Naive Bayes and SVM classifiers, respectively.
Comparison of Classification Results for NRRSAR and VPRS, Using J48, JRip, Naive Bayes (NB), and SVM
TRSM [28] and FRS [6, 15] try to relax the crisp manner of the classical rough set theory by modifying the equivalence relation. TRSM uses a similarity relation instead of indiscernibility relation to define such a relaxation. FRS, on the other hand, uses fuzzy equivalence classes generated by a fuzzy similarity relation to represent vagueness in data.
Although different similarity measures can be adopted in TRSM, we have used a standard measure for this purpose given in [28]:
When the objects have more than one feature, the similarity relation is defined as:
Where P is the feature set and α is a predefined threshold. In our experiments, two different values of tolerance thresholds α = 0.9 and α = 0.95 are used for TRSM.
Generating all tolerance classes in TRSM as well as fuzzy equivalence classes in FRS is quadratic with respect to the number of objects, because, the similarities of all pairs of objects must be calculated. These complexity burdens cause high computational times for TRSM and FRS, even for small scale data sets. Therefore, the comparisons in this subsection are reported only for eight data sets monk3, wine, zoo, dermatology, ionosphere, soybean-small, heart, and lung-cancer. In this regard, NRQUICKREDUCT is compared with a TRSM-based and FRS-based QUICKREDUCT algorithms.
Table 8 reports the selected subsets size and run times for the eight data sets. Similar to VPRS, the optimal threshold value in TRSM is data driven and needs a pre-processing step for each data set. This is contrary to the rough sets theory consideration of operating with no domain knowledge. The results show that NRRSAR outperforms the FRS for two data sets (ionosphere and lung-cancer), while the two methods found same size subsets for the remaining tests. High computational times of FRS and TRSM, is the most prominent result from this table.
Comparison of Selected Subsets Size for NRRSAR, TRSM, and FRS
In order to strengthen the comparisons, we used a paired hypothesis t-test. Let A and B be two methods and d
A
, d
B
be the set of results obtained using methods A and B, respectively, for different executions. We define the following two one-tailed t-tests [7]:
Based on results of these tests, the variable t is defined as
In all the tests, A corresponds to our proposed NRRSAR method and B corresponds to one of the five methods RSAR, DMRSAR, VPRS, TRSM and FRS. The results are reported in Table 10.
Comparison of Classification Results for NRRSAR, TRSM and FRS, Using J48, JRip, Naive Bayes (NB), and SVM
Statistical Comparisons of NRRSAR with RSAR, DMRSAR, VPRS, TRSM and FRS
Here, we investigate the effect(s) of noise on selected subsets of different attribute reduction algorithms. In this regard, several levels of noises are added to the datasets and then the six feature selection algorithms are applied again on these datasets. In order to add a noise with probability K to a decision system D = (U, F, d), we changed f
i
(x) , i = 1, 2, …, |F|, x = 1, 2, …, |U|, with probability K, as follows If f
i
is categorical or integer-valued feature, with value set (domain) V
f
i
, then we replaced f
i
(x) with a randomly selected value from V
f
i
- {f
i
(x)}. If f
i
is a real-valued feature, then we first normalized f
i
(x) using the following formula:
For each algorithm A and decision system D, we define two functions
Where,
A: an attribute reduction algorithm,
D(n): the decision system D which is disrupted using noise,
For each data set, we generated 30 disrupted versions by different settings for K (the noise probability). Then, we applied the comparing feature selection algorithms on the original data sets and their corresponding disrupted versions. For each disrupted version, we recorded the Θ and Δ measures.
Tables 11 and 12 report the Θ and Δ values, respectively, recorded for the comparing algorithms. The results for each algorithm and data set are averaged over the 30 executions. As it can be seen from these tables, noises have least effect on results of the proposed NRRSAR algorithm, while the other algorithms are very noise-sensitive and their results are destroyed, even using small levels of noises.
Θ Results Recorded for NRRSAR, RSAR, DMRSAR, VPRS, TRSM and FRS
Δ Results Recorded for NRRSAR, RSAR, DMRSAR, VPRS, TRSM and FRS
Feature selection, as a pre-processing step, is to select a small subset of most important and discriminative input features. This paper investigated a new feature selection method, NRRSAR, which aims to meet the existing RS based feature selection methods of dealing with noisy data sets. In the experiments, our study has shown that, NRRSAR outperforms the existing RS based feature selection methods, RSAR and DMRSAR, in terms of classification accuracies. Moreover, the results show that there is only a marginal increase in runtime for the NRRSAR iterations, comparing it with RSAR andDMRSAR.
Comparison with a VPRS-based feature selection method has demonstrated that while this method may sometimes find smaller subsets, it requires an additional threshold value, which makes it an unreliable method for real world data sets. NRRSAR requires no such human input knowledge and relies only on the information in the data. Moreover, NRRSAR is able to find subsets with higher classification accuracies than those obtained by VPRS. Comparison of NRRSAR with a TRSM-based feature selection method has shown that the NRRSAR method is able to outperform TRSM in terms of classification accuracy and run times. In addition, as VPRS, the requirement to a thresholding value is a big disadvantage for the TRSM-based feature selection. Comparison with a FRS-based feature selection method has demonstrated that, while NRRSAR is very similar to FRS in terms of selected subset size and classification accuracy, the very faster run-times of NRRSAR makes it a preferable method.
Additional comparisons have shown that noises have least effect on results of the proposed NRRSAR algorithm, while the other five algorithms are very noise-sensitive and even small levels of noises have significant effects on their results.
