Feature Selection by Hybrid Brain Storm Optimization Algorithm for COVID-19 Classification

Abstract

A large number of features leads to very high-dimensional data. Feature selection reduces the dimension of the data, improves prediction performance, and reduces computation time. It is the process of selecting the optimal set of input features from a given data set in order to reduce noise in the data and keep the relevant features. The optimal feature subset contains all useful and relevant features and excludes irrelevant ones, which allows machine learning models to better understand and more efficiently differentiate the patterns in data sets. In this article, we propose a binary hybrid metaheuristic-based algorithm for selecting the optimal feature subset. Concretely, the brain storm optimization algorithm is hybridized with the firefly algorithm and adopted as a wrapper method for feature selection problems on classification data sets. The proposed algorithm is evaluated on 21 data sets and compared with 11 metaheuristic algorithms. In addition, the proposed method is applied to a coronavirus disease data set. The obtained experimental results substantiate the robustness of the proposed hybrid algorithm. It efficiently reduces the feature subset and at the same time achieves higher classification accuracy than other methods in the literature.

1. INTRODUCTION

Over the past two decades, the dimensionality of data has increased substantially owing to the rapid growth of information science. However, many features in data sets carry redundant or irrelevant information for the machine learning models used in data mining, which slows computation and degrades performance. Thus, an efficient algorithm is required that reduces the dimensionality of the data sets by removing features that provide no useful information to the learning algorithm, and that keeps only the features that are significant for interpretation, capture the patterns in the data, and maintain information quality (Van Der Maaten et al., 2009).

Feature selection is a preprocessing method in data mining that reduces the dimensionality of data by removing noise and irrelevant attributes, yielding an optimal feature subset. Two key concepts in feature selection are the evaluation criterion and the selection strategy. With respect to the evaluation criterion, there are three main types of feature selection approaches: wrapper-based, filter-based, and embedded methods. Filter-based methods, such as Gini Index, Information Gain, Relief, and Chi-Square, use statistical measures to assign a score to each feature; the features are then ranked by score and a subset is chosen.

The wrapper-based method uses a machine learning technique to evaluate candidate attribute subsets and select the optimal one. It is the most commonly employed approach in feature selection because it typically yields better classification accuracy. Embedded methods combine the filter and wrapper approaches to take advantage of both.

The total number of possible attribute subsets in a data set with n features is $2^{n}$. As the dimension of the data increases, the number of possible feature subsets grows exponentially. Hence, the goal in the feature selection problem is to minimize the dimension and at the same time maximize the classification accuracy on the given data set, which makes it an optimization task. Optimization can be defined as the process of finding the best possible result for a specific problem under some set of constraints. Many real-world problems can be framed as optimization tasks; hence, optimization is a crucial paradigm in many distinct domains, such as scheduling, engineering design, networks, transportation, manufacturing, economics, finance, and others.
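To make the growth of the search space concrete, the short sketch below counts the feature subsets for a few feature counts that appear in Table 2 and enumerates them for a tiny example. The function names are illustrative and not part of the original study.

```python
from itertools import combinations


def count_feature_subsets(n_features: int) -> int:
    """Number of distinct feature subsets (including the empty set)."""
    return 2 ** n_features


def enumerate_feature_subsets(features):
    """Yield every subset of `features`; only feasible for very small n."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


if __name__ == "__main__":
    # Feature counts taken from Table 2 (e.g., Breast Cancer, Wine EW, Sonar EW).
    for n in (9, 13, 60):
        print(n, "features ->", count_feature_subsets(n), "candidate subsets")
```

Exhaustive enumeration is already impractical at 60 features, which is why a stochastic search over binary masks is used instead.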

Because the number of candidate feature subsets grows exponentially with the number of features, feature selection belongs to the class of nondeterministic polynomial-time hard (NP-hard) optimization problems, for which traditional exact optimization algorithms become impractical. However, stochastic approximation algorithms, such as metaheuristic algorithms, are very successful in tackling such problems. They provide an optimal or near-optimal solution in a reasonable amount of time. In this study, we propose a hybrid metaheuristic-based approach to address the feature selection problem.

Metaheuristic algorithms are very successful in different optimization problems, such as architecture design in convolutional neural networks (Bacanin et al., 2020a,b; Bezdan et al., 2020a), connection weight optimization in artificial neural networks (Milosevic et al., 2021), task scheduling in cloud computing environments (Bacanin et al., 2019a,b; Bezdan et al., 2020b,c), and wireless sensor network lifetime optimization (Bacanin et al., 2019c; Zivkovic et al., 2020a,b).

The two main processes in any metaheuristic algorithm are intensification and diversification, and an algorithm often has one of them more developed than the other. One approach to striking the right balance between these two phases is to hybridize two or more algorithms. In this study, we hybridize the brain storm optimization (BSO) algorithm (Xue et al., 2012) with the firefly algorithm (FA) (Yang, 2009) to achieve a better trade-off between exploration and exploitation, and apply it to the feature selection problem as a wrapper-based method.

The main motivation behind the research presented in this article is to improve classification accuracy by further enhancing the feature selection task. Even a slight improvement in accuracy on medical data sets can be significant because it can increase the chances of successful treatment. Consequently, the scientific contributions of this article are threefold:

Improvement of the performances of the basic BSO algorithm by hybridizing it with the FA metaheuristics to address the known deficiencies of the original implementation.

Utilization of the novel hybridized algorithm as a wrapper method to improve the feature selection process and obtain better classification results on the benchmark data sets.

Application of the devised method on a novel coronavirus disease 2019 (COVID-19) data set.

The rest of the article is organized as follows: Section 2 gives an overview of the background and related studies, Section 3 describes the original BSO as well as the hybridized version with FA, Section 4 presents the results of the experiment, and Section 5 concludes and summarizes the article.

2. BACKGROUND AND RELATED STUDIES

Metaheuristic algorithms have been applied successfully in many areas. A gaining sharing knowledge (GSK)-based algorithm is introduced and applied to optimizing electric distribution scheduling in Hassan et al. (2020). In Chen et al. (2020), a swarm intelligence algorithm is applied to optimizing unmanned aerial vehicle-aided (UAV-aided) Internet of Things (IoT) data acquisition deployment. A new GSK-based method is utilized in Xiong et al. (2021) for parameter extraction of solar photovoltaic models. An improved binary differential evolution (DE) algorithm is developed and used for assessing candidate locations for solar energy stations in ElQuliti and Mohamed (2016).

A chaotic variant of the Harris hawks optimization metaheuristic, combined with quasi-reflection-based learning, was utilized to improve convolutional neural network (CNN) design for medical image classification in Basha et al. (2021). A hybrid approach combining artificial intelligence and the beetle antennae algorithm was utilized to estimate the number of COVID-19 cases in Zivkovic et al. (2021).

Recently, many feature selection algorithms have been proposed. However, according to the No-Free-Lunch theorem (Wolpert and Macready, 1997), no single algorithm solves all optimization problems. Metaheuristic algorithms have been employed successfully for feature selection (Agrawal et al., 2021a); nevertheless, there is still a need to develop more robust methods. A GSK-based optimization algorithm is applied to feature selection in Agrawal et al. (2021b). In Agrawal et al. (2021c), a chaotic GSK-based optimization algorithm is utilized for optimizing the number of features. The authors used a binary GSK-based optimization algorithm for feature selection in Agrawal et al. (2021d).

Clustering-based particle swarm optimization (PSO) feature selection was proposed in Zhang et al. (2021) and Song et al. (2021), targeting high-dimension data that contains missing values. The same problem was tackled with coevolutionary particle swarm feature selection in Song et al. (2020). Multiobjective PSO was utilized for feature selection with fuzzy cost in Hu et al. (2020), whereas Xue et al. (2019) used self-adaptive PSO for large scale feature selection in classifying tasks.

As mentioned in the introduction, we propose a hybrid approach as a wrapper-based method for feature selection, in which the well-known BSO algorithm is hybridized with the FA. BSO has a wide range of applications in different areas. Cao et al. (2015) proposed BSO with a new step size technique and DE strategy and employed it for artificial neural network training. BSO is employed for the optimization of wireless sensor network dynamic deployment in Chen et al. (2015). Different BSO solution clustering analyses are performed in Cheng et al. (2013). Tuba et al. (2019) use BSO for classification problems.

3. PROPOSED METHOD

This section first outlines the motivation of the original BSO algorithm and its mathematical formulation; afterward, the proposed hybridized BSO is described.

3.1. Original BSO algorithm

In 2011, Shi (2011) developed a metaheuristic-based BSO algorithm. The algorithm is motivated by the brainstorming process of humans. Brainstorming is a creative way of solving a specific problem by a group of people. In the brainstorming process, several people share their ideas with each other related to the problem that should be solved, where any idea is acceptable and criticism is not allowed. In the end, from all suggested ideas, the best possible solution is selected. Similarly, in the BSO algorithm, initially, random solutions are generated; as in any other swarm intelligence algorithm, each solution is analogous to an idea in the brainstorming process. At every iteration the idea is modified, in other words, the solution's position is updated according to the previous knowledge.

At the beginning of the procedure, N ideas are generated randomly and then grouped into m clusters by the K-means algorithm. Then each solution's fitness value is evaluated and the best solution in each cluster is selected as the cluster centroid. Next, the solution's position is updated according to the following formula: $x_{new}^{d} = x_{selected}^{d} + β \times N(μ, σ),$ (1)

where the dth dimension of the selected idea (solution) is denoted by $x_{selected}^{d}$, the updated position is indicated by $x_{new}^{d}$, $N(μ, σ)$ indicates the Gaussian random function, and the weighting factor is denoted by $β$.

The value of $β$ is obtained by the following equation: $β = \mathrm{logsig}\left(\frac{0.5 \times \mathit{MaxIter} - t}{k}\right) \times r(),$ (2)

where $\mathrm{logsig}()$ denotes the logarithmic sigmoid function, MaxIter denotes the maximum number of iterations, and t is the iteration counter, that is, the current iteration number. The value of k modifies the slope of the logarithmic sigmoid function, and $r()$ returns a random number drawn from the uniform distribution.

For more details about the BSO algorithm, see Shi (2011).
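Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that $r()$ is uniform on [0, 1), that a fresh $β$ is drawn for every dimension, and that the Gaussian term uses $μ = 0$ and $σ = 1$, none of which is fixed by the text above. The default `k=20.0` follows Table 1; the function names are hypothetical.

```python
import math
import random


def logsig(a: float) -> float:
    """Logarithmic sigmoid transfer function, 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def step_size(t: int, max_iter: int, k: float, rng: random.Random) -> float:
    """Weighting factor beta from Equation (2)."""
    return logsig((0.5 * max_iter - t) / k) * rng.random()


def bso_update(selected, t, max_iter, k=20.0, mu=0.0, sigma=1.0, rng=None):
    """Equation (1): perturb each dimension of the selected idea.

    Assumptions (not stated in the text): beta is redrawn per dimension,
    and the Gaussian noise is N(mu=0, sigma=1).
    """
    rng = rng or random.Random()
    new_idea = []
    for x_d in selected:
        beta = step_size(t, max_iter, k, rng)
        new_idea.append(x_d + beta * rng.gauss(mu, sigma))
    return new_idea
```

Because the argument of the logarithmic sigmoid decreases with t, the upper bound on the step size shrinks as the run progresses, so early iterations take larger steps than late ones.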

3.2. Hybrid BSO algorithm

BSO performs well on many problems; however, its drawback is slow convergence. The algorithm is very good at exploration but poor at exploitation. In this study, we propose a hybridized BSO method to speed up the algorithm's convergence and achieve better results, as well as to strike a better balance between exploration and exploitation. Algorithm hybridization blends the benefits of several algorithms, producing a synergy between the fused algorithms. In some runs, when BSO finds the promising region of the search space early enough, it converges successfully because it has enough time for exploitation.

However, in other runs, when it reaches the promising region of the search space later, it does not have enough iterations left for exploitation (already the weakest aspect of the algorithm), and it is not able to converge. To enhance the exploitation capability, we incorporated the search mechanism of the FA (Yang, 2009), which is known for its intensification capability. Various studies have shown that the FA exploitation procedure is efficient and can be used in solving a wide variety of optimization problems. By hybridizing BSO with the FA search, the novel algorithm aims to take the best characteristics of both BSO (excellent exploration) and FA (superior exploitation) and to minimize their known deficiencies. The FA search mechanism is mathematically formulated as follows: $x_{i}^{t+1} = x_{i}^{t} + β_{0} \cdot e^{-γ r_{i,j}^{2}} (x_{j}^{t} - x_{i}^{t}) + α^{t} (κ - 0.5),$ (3)

where the current solution is denoted by $x_{i}^{t}$, and the newly updated position is denoted by $x_{i}^{t+1}$. $α$, $β_{0}$, and $γ$ are modifiable parameters, which can be adjusted to achieve better performance on a specific problem. The distance between the two solutions $x_{i}$ and $x_{j}$ is denoted by $r_{i,j}$, and $κ$ is a randomly generated number from the Gaussian distribution.
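The FA move in Equation (3) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes Euclidean distance for $r_{i,j}$, a standard normal draw for $κ$ that is repeated per dimension, and a value of $α$ supplied by the caller for the current iteration. The defaults follow Table 1; the function name is hypothetical.

```python
import math
import random


def fa_move(x_i, x_j, alpha=1.0, beta0=1.0, gamma=1.0, rng=None):
    """Move solution x_i toward solution x_j following Equation (3).

    Assumptions: r_ij is the Euclidean distance, and kappa is drawn from
    a standard normal distribution independently for each dimension.
    """
    rng = rng or random.Random()
    # Squared distance r_ij^2 between the two solutions.
    r_sq = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    # Attractiveness decays exponentially with the squared distance.
    attraction = beta0 * math.exp(-gamma * r_sq)
    return [
        a + attraction * (b - a) + alpha * (rng.gauss(0.0, 1.0) - 0.5)
        for a, b in zip(x_i, x_j)
    ]
```

With `alpha=0.0` the random term vanishes, which makes the attraction term easy to inspect: for `gamma=0.0` the solution jumps exactly onto `x_j`, while larger `gamma` values weaken the pull between distant solutions.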

In the proposed hybrid approach, if the iteration counter t is even, the original BSO updating mechanism is utilized for solution modification; if the iteration counter is odd, the FA search mechanism is utilized for the position update.
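The even/odd alternation can be illustrated with a deliberately simplified continuous-space loop. The sketch below is not the proposed bBSOFA: clustering and the BSO idea-selection probabilities are omitted, the FA step moves each solution toward the current best one only, and a toy sphere function stands in for the fitness function. These are all assumptions made to keep the example short. It only shows how the two update rules are interleaved with greedy replacement.

```python
import math
import random


def sphere(x):
    """Toy objective to be minimized; a stand-in for the real fitness."""
    return sum(v * v for v in x)


def hybrid_search(fitness, dim=5, pop_size=8, max_iter=70, k=20.0,
                  alpha=1.0, beta0=1.0, gamma=1.0, seed=0):
    """Alternate a BSO-style step (even t) and an FA-style step (odd t)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for t in range(max_iter):
        use_bso = (t % 2 == 0)
        for i in range(pop_size):
            if use_bso:
                # Equations (1)-(2): Gaussian perturbation with a shrinking step.
                beta = rng.random() / (1.0 + math.exp(-(0.5 * max_iter - t) / k))
                cand = [v + beta * rng.gauss(0.0, 1.0) for v in pop[i]]
            else:
                # Equation (3), simplified: attract toward the current best.
                j = min(range(pop_size), key=fit.__getitem__)
                r_sq = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                pull = beta0 * math.exp(-gamma * r_sq)
                cand = [a + pull * (b - a) + alpha * (rng.gauss(0.0, 1.0) - 0.5)
                        for a, b in zip(pop[i], pop[j])]
            cand_fit = fitness(cand)
            if cand_fit < fit[i]:  # keep the better of old and new
                pop[i], fit[i] = cand, cand_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    solution, value = hybrid_search(sphere)
    print("best fitness:", value)
```

Because a candidate replaces its parent only when it is strictly better, the best fitness in the population can never get worse from one iteration to the next, regardless of which rule produced the candidate.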

Because the proposed method is applied to feature selection, we adapted it to the binary selection problem. First, the sigmoid function is used to map each solution value into the interval between 0 and 1; the result is then compared with a threshold, set to 0.5 in this experiment, to decide whether the value becomes 0 or 1. If the mapped value does not exceed the threshold, it is set to 0; otherwise, it is set to 1. The binary conversion of the solution value is described in Equation (4). $x_{i} = \begin{cases} 1, & \text{if } S(x_{i}) > 0.5 \\ 0, & \text{otherwise,} \end{cases}$ (4)

where the ith solution is denoted by $x_{i}$ and $S()$ represents the sigmoid transfer function.

In this way, the solution representation in the feature selection problem is encoded by binary values. The total number of solutions is N and the dimension of the solutions corresponds to the number of features in a given data set. If the solution value is 0, it means that the corresponding feature is removed, in contrast, if its value is 1, the feature is selected for the classification model.
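The binary encoding described by Equation (4) can be sketched as follows. The helper names and the generic feature labels are illustrative only; the 0.5 threshold is the value stated in the text.

```python
import math


def sigmoid(v: float) -> float:
    """Sigmoid transfer function S(), written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(position, threshold=0.5):
    """Equation (4): 1 if S(x) > threshold, otherwise 0, per dimension."""
    return [1 if sigmoid(v) > threshold else 0 for v in position]


def selected_features(mask, names):
    """Return the names of the features whose mask bit is 1."""
    return [name for name, bit in zip(names, mask) if bit == 1]


if __name__ == "__main__":
    mask = binarize([-2.0, 0.0, 3.0, 0.7])
    print(mask)                                         # [0, 0, 1, 1]
    print(selected_features(mask, ["f0", "f1", "f2", "f3"]))  # ['f2', 'f3']
```

Note that a continuous value of exactly 0 maps to S(0) = 0.5, which does not exceed the threshold, so the corresponding feature is dropped.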

Algorithm 1 describes the pseudocode of the proposed binary BSO algorithm, and we named it binary BSO FA, in short bBSOFA.

4. SIMULATION RESULTS

This section first provides the experimental setup and the data sets used for evaluating the proposed bBSOFA's performance, and then presents the obtained experimental results.

The dimension of each solution equals the number of features in the given data set. Initially, the solutions are generated randomly. In this study, the mixed initialization method is utilized, in which only about half of the features are selected randomly, as in Xue et al. (2014). At the end of the algorithm's procedure, the sigmoid function squeezes each solution value between 0 and 1; afterward, the threshold determines whether 0 or 1 is assigned to the corresponding feature. If the value obtained after applying the transfer function does not exceed the threshold, which is set to 0.5, the solution value is set to 0; otherwise, it is set to 1.

A value of 1 indicates that the feature is used in the evaluation; conversely, a value of 0 means that the feature is removed. The K-NN classifier is utilized for evaluation purposes, and it represents the objective function. The fitness function takes into account the number of selected features as well as the classification error rate, and it is formulated in Equation (5).

Algorithm 1: bBSOFA pseudocode
Result: Best solution
Randomly generate N solutions (ideas);
while $t < MaxIter$ do
    if $t \bmod 2 == 0$ then
        Assign the solutions to m clusters;
        Evaluate the fitness value of each individual;
        In each cluster, sort the individuals according to their fitness value and store the fittest individual as the cluster centroid;
        if $random_{1} < p_{replace}$ then
            Select a cluster centroid randomly;
            Generate a new random solution to replace the selected cluster center;
            Generate new individual;
        end
        if $random_{2} < p_{1}$ then
            Select a random cluster;
            if $random_{3} < p_{1center}$ then
                Select the cluster center and add a random value to create a new individual;
            else
                Randomly select an individual from this cluster and add a random value to generate a new individual;
            end
        else
            Generate a new individual from two randomly chosen individuals:
            if $random_{4} < p_{2center}$ then
                Combine the two cluster centers and add a random value to generate a new individual;
            else
                Combine two individuals from a randomly selected cluster and add a random value to generate a new individual;
            end
        end
        Compare the new individual with the old one; if the new one is fitter, keep it and replace the old one;
    else
        for all individuals i in the population do
            Update the position of the individual by utilizing the FA search according to Equation (3);
            Compare the fitness value of the new individual with the old one and keep the better one;
        end
    end
end
Sort all individuals based on their fitness value;
Return the best individual;

$F = α E_{R}(D) + β \frac{|R|}{|C|},$ (5)

where the error rate of the classification is denoted by $E_{R} (D)$ , C is the total number of features, whereas R indicates the number of features selected by the algorithm. $α$ and $β$ are weight coefficients, and their values are 0.99 and 0.01, respectively.
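A wrapper-style evaluation based on Equation (5) can be sketched in pure Python. The weights 0.99 and 0.01 come from the text; everything else is an assumption made for illustration: the value of k, the use of squared Euclidean distance, the hold-out split, and the convention of returning the worst error for an empty feature subset are not specified in the article, and the function names are hypothetical.

```python
from collections import Counter


def knn_error_rate(train_X, train_y, test_X, test_y, mask, k=5):
    """Hold-out error of a k-NN classifier restricted to the selected features.

    Assumptions: squared Euclidean distance, majority vote, and an error
    rate of 1.0 when no feature is selected.
    """
    cols = [d for d, bit in enumerate(mask) if bit == 1]
    if not cols:
        return 1.0
    errors = 0
    for x, y in zip(test_X, test_y):
        # Distance to every training sample, using selected columns only.
        dists = sorted(
            (sum((x[d] - z[d]) ** 2 for d in cols), label)
            for z, label in zip(train_X, train_y)
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] != y:
            errors += 1
    return errors / len(test_X)


def fitness(error_rate, n_selected, n_total, alpha=0.99, beta=0.01):
    """Equation (5): F = alpha * E_R(D) + beta * |R| / |C| (lower is better)."""
    return alpha * error_rate + beta * n_selected / n_total


if __name__ == "__main__":
    train_X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -5.0]]
    train_y = [0, 0, 1, 1]
    test_X = [[0.05, 0.0], [0.95, 0.0]]
    test_y = [0, 1]
    for mask in ([1, 0], [0, 1], [1, 1]):
        err = knn_error_rate(train_X, train_y, test_X, test_y, mask, k=1)
        print(mask, fitness(err, sum(mask), len(mask)))
```

In this toy data only the first feature separates the classes, so the mask `[1, 0]` reaches zero error and its fitness is determined entirely by the feature-ratio term.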

The experiment is repeated for 20 runs, with a maximum of 70 iterations per run and a population size of 8. The control parameters are summarized in Table 1.

Table 1.

Control Parameters

Parameter	Value
BSO parameters
One cluster selection probability p₁	0.8
Total number of clusters $cluster_{number}$	5
Replacing operator probability $p_{replace}$	0.2
Probability of choosing the center of one cluster $p_{1center}$	0.4
Probability of choosing the centers of two clusters $p_{2center}$	0.5
Step size k	20
Parameter $Ω_{1}$	0.5
Parameter $Ω_{2}$	0.5
FA parameters
Randomization parameter $α$	1.0
Attractiveness parameter $β_{0}$	1.0
Light absorption coefficient $γ$	1.0

BSO, brain storm optimization; FA, firefly algorithm.

To evaluate the performance of the proposed method, 21 data sets of different dimensions from the UCI Machine Learning Repository are used; they are summarized in Table 2.

Table 2.

Data Sets

Data set name	No. of features	No. of samples
Breast Cancer	9	699
Tic Tac Toe	9	958
Zoo	16	101
Wine EW	13	178
Spect EW	22	267
Sonar EW	60	208
Ionosphere EW	34	351
Heart EW	13	270
Congress EW	16	435
Krvskp EW	36	3196
Waveform EW	40	5000
Exactly	13	1000
Exactly 2	13	1000
M of N	13	1000
Vote	16	300
Breast EW	30	569
Semeion	265	1593
Clean 1	166	476
Clean 2	166	6598
Lymphography	18	148
PenglungEW	325	73

In addition, the bBSOFA is applied to COVID-19 patient health prediction (Iwendi et al., 2020) using the data set available at https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics, which has 15 different attributes, including the patient's location, country, gender, age, and different symptoms.

The results of the experiment are compared with other approaches, as well as with the original FA and the original BSO algorithm. The results of WOA, bWOA-S, bWOA-V, BALO1, BALO2, BALO3, PSO, bGWO, and bDA are taken from Hussien et al. (2020), where the experiments are conducted in the same way as in this study and on the same data sets. In addition, the proposed method is compared with three GSK algorithm variants (Mohamed et al., 2020), namely V-shaped GSK-based algorithm (bGSK-V4) (Agrawal et al., 2021b), chaotic GSK-based optimization algorithm (CBi-GSK1) (Agrawal et al., 2021c), and binary GSK-based optimization (FS-pBGSK) (Agrawal et al., 2021d).

The results on the COVID-19 data set are compared with HLBDA, BDA, BMVO, and BPSO, whose results are taken from Too and Mirjalili (2021). It is worth noting that the authors implemented all mentioned algorithms and recreated the experiments independently, thereby confirming the results that were used for comparison. The original parameter values of the core algorithms were utilized, as given in Table 3. For additional information about the algorithms, please refer to the original publications in which the mentioned metaheuristics were introduced.

Table 3.

Control Parameters’ Settings for Metaheuristics Used in the Comparative Analysis

Algorithm	Parameters
WOA	a₁ linearly decreasing from 2 to 0, a₂ linearly decreasing from −1 to −2, b = 1
PSO	Inertia factor 0.1, individual-best acceleration factor of PSO 0.1
GWO	a linearly decreasing from 2 to 0, and C random values in $[0, 2]$
DA	Food attraction weight f = 2 * random
GSK	Probability $p = 0.1$ , knowledge ratio $k_{r} = 0.95$

DA, dragonfly algorithm; GSK, gaining sharing knowledge; GWO, grey wolf optimizer; PSO, particle swarm optimization; WOA, whale optimization algorithm.

The obtained experimental results on the 21 data sets are presented in Tables 4 and 5. Table 4 shows the comparison of the average fitness value of the comparable approaches, whereas Table 5 presents the comparison of the average accuracy of the comparable approaches.

Table 4.

Comparison of the Average Fitness over Different Approaches

No.	WOA	bWOA-S	bWOA-V	BALO1	BALO2	BALO3	PSO	bGWO	bDA	bFA	bGSK-V4	CBi-GSK1	FS-pBGSK	bBSO	bBSOFA
1	0.054	0.052	0.079	0.100	0.099	0.076	0.031	0.035	0.032	0.019	0.029	0.034	0.032	0.025	0.012
2	0.220	0.207	0.215	0.245	0.252	0.246	0.204	0.215	0.209	0.219	0.212	0.208	0.195	0.180	0.165
3	0.153	0.148	0.120	0.183	0.146	0.141	0.078	0.096	0.071	0.010	0.049	0.041	0.042	0.003	0.046
4	0.925	0.928	0.910	0.935	0.938	0.938	0.884	0.903	0.882	0.053	0.043	0.042	0.040	0.005	0.003
5	0.313	0.307	0.289	0.319	0.321	0.312	0.242	0.280	0.255	0.135	0.088	0.085	0.083	0.078	0.140
6	0.304	0.286	0.254	0.278	0.298	0.285	0.168	0.235	0.194	0.128	0.102	0.083	0.090	0.081	0.045
7	0.159	0.158	0.152	0.156	0.169	0.165	0.113	0.141	0.124	0.134	0.067	0.063	0.076	0.065	0.046
8	0.328	0.308	0.259	0.319	0.324	0.308	0.158	0.233	0.167	0.228	0.144	0.142	0.146	0.175	0.110
9	0.389	0.380	0.372	0.393	0.397	0.384	0.337	0.359	0.341	0.046	0.036	0.031	0.035	0.029	0.021
10	0.071	0.074	0.081	0.074	0.072	0.074	0.040	0.061	0.053	0.030	0.042	0.039	0.037	0.024	0.023
11	0.193	0.193	0.195	0.198	0.195	0.193	0.182	0.187	0.188	0.168	0.183	0.183	0.168	0.158	0.158
12	0.303	0.308	0.301	0.301	0.307	0.308	0.151	0.272	0.226	0.238	0.142	0.136	0.128	0.005	0.005
13	0.241	0.244	0.252	0.237	0.244	0.253	0.238	0.244	0.243	0.248	0.385	0.381	0.298	0.227	0.222
14	0.139	0.133	0.155	0.151	0.150	0.136	0.022	0.112	0.072	0.054	0.052	0.049	0.039	0.005	0.005
15	0.084	0.084	0.081	0.089	0.090	0.085	0.048	0.069	0.052	0.073	0.036	0.035	0.033	0.028	0.026
16	0.081	0.058	0.062	0.086	0.088	0.086	0.033	0.057	0.031	0.073	0.087	0.087	0.078	0.050	0.046
17	0.044	0.043	0.037	0.043	0.043	0.044	0.032	0.034	0.030	0.119	0.025	0.020	0.014	0.074	0.071
18	0.191	0.187	0.176	0.184	0.192	0.197	0.136	0.158	0.149	0.161	0.097	0.080	0.053	0.056	0.036
19	0.052	0.052	0.049	0.051	0.052	0.052	0.041	0.044	0.042	0.033	0.056	0.051	0.048	0.026	0.033
20	0.235	0.230	0.223	0.258	0.243	0.237	0.138	0.211	0.160	0.165	0.495	0.478	0.489	0.018	0.066
21	0.260	0.244	0.242	0.276	0.262	0.274	0.149	0.217	0.180	0.275	0.131	0.130	0.129	0.129	0.060

Table 5.

Comparison of the Average Accuracy over Different Approaches

No.	WOA	bWOA-S	bWOA-V	BALO1	BALO2	BALO3	PSO	bGWO	bDA	bFA	bGSK-V4	CBi-GSK1	FS-pBGSK	bBSO	bBSOFA
1	0.785	0.619	0.628	0.740	0.725	0.726	0.802	0.962	0.789	0.987	0.973	0.971	0.972	0.980	0.991
2	0.787	0.799	0.786	0.686	0.681	0.686	0.720	0.764	0.673	0.786	0.794	0.795	0.807	0.827	0.841
3	0.841	0.839	0.822	0.656	0.706	0.680	0.789	0.900	0.779	0.995	0.952	0.962	0.961	1.000	0.958
4	0.065	0.056	0.053	0.039	0.033	0.031	0.039	0.086	0.031	0.951	0.957	0.961	0.963	0.999	1.000
5	0.678	0.670	0.664	0.635	0.623	0.625	0.656	0.707	0.649	0.869	0.872	0.893	0.895	0.924	0.863
6	0.698	0.703	0.703	0.645	0.639	0.647	0.721	0.765	0.705	0.875	0.895	0.916	0.908	0.910	0.958
7	0.835	0.836	0.831	0.819	0.803	0.802	0.835	0.860	0.827	0.869	0.933	0.934	0.922	0.933	0.957
8	0.656	0.654	0.652	0.625	0.621	0.623	0.668	0.751	0.652	0.775	0.854	0.857	0.856	0.824	0.893
9	0.598	0.582	0.595	0.573	0.559	0.577	0.589	0.631	0.571	0.958	0.978	0.980	0.977	0.974	0.983
10	0.936	0.930	0.918	0.766	0.765	0.757	0.794	0.943	0.754	0.975	0.976	0.981	0.981	0.981	0.982
11	0.812	0.808	0.804	0.642	0.649	0.647	0.763	0.816	0.747	0.836	0.822	0.820	0.837	0.844	0.847
12	0.687	0.683	0.691	0.644	0.656	0.648	0.664	0.706	0.642	0.765	1.000	1.000	1.000	1.000	1.000
13	0.738	0.740	0.735	0.733	0.711	0.703	0.723	0.735	0.712	0.756	0.739	0.742	0.770	0.776	0.781
14	0.865	0.865	0.833	0.734	0.732	0.744	0.761	0.883	0.728	0.952	0.953	0.968	0.971	1.000	1.000
15	0.915	0.908	0.900	0.829	0.823	0.829	0.884	0.930	0.866	0.931	0.968	0.969	0.972	0.974	0.976
16	0.761	0.610	0.615	0.730	0.744	0.727	0.810	0.944	0.769	0.930	0.939	0.940	0.939	0.951	0.955
17	0.964	0.964	0.965	0.924	0.939	0.925	0.956	0.972	0.959	0.884	0.977	0.982	0.991	0.924	0.934
18	0.815	0.818	0.803	0.729	0.720	0.724	0.806	0.845	0.791	0.842	0.902	0.921	0.950	0.937	0.969
19	0.956	0.956	0.955	0.908	0.910	0.911	0.953	0.962	0.952	0.972	0.977	0.981	0.983	0.978	0.972
20	0.756	0.755	0.749	0.639	0.672	0.659	0.705	0.786	0.709	0.838	0.504	0.520	0.506	0.987	0.938
21	0.744	0.755	0.725	0.553	0.568	0.563	0.765	0.781	0.730	0.727	0.895	0.899	0.901	0.873	0.943

The bold values in both of the tables indicate the best result among all comparable methods.

By analyzing the obtained results, we can conclude that the proposed bBSOFA is a robust and high-performing method for feature selection. In terms of average fitness, bBSOFA achieves the best value on 15 of the 21 data sets. In terms of classification accuracy, the proposed method has the best average accuracy on 16 of the 21 data sets. On three data sets, Exactly, M-of-N, and Waveform EW, bBSOFA and bBSO share the same average fitness value, which is the best among all methods.

Compared with the other approaches, the average fitness of the proposed method is 7% lower than that of bBSO, 49% lower than that of bFA, and 48%–66% lower than that of the other nine comparable methods.

Over all 21 data sets, the average classification accuracy of bBSOFA is 1% higher than that of bBSO, 6% higher than that of bFA, and 10%–22% higher than that of the other nine comparable methods. The proposed method excels especially on data sets with a high number of features.

To test the performance of the proposed method, the Wilcoxon signed-rank nonparametric test is employed to detect significant differences between the proposed bBSOFA and the other 14 comparable methods. Table 6 presents the Wilcoxon test results at significance level $α = 0.05$; all p-values are below 0.05, which indicates a significant difference between the proposed method and each of the other algorithms. The second nonparametric test used for the method assessment is the Friedman test followed by the Holm step-down procedure. The results are presented in Table 7, where the proposed method significantly outperforms the compared methods at significance level $α = 0.05$.

Table 6.

Wilcoxon Signed-Rank Test Nonparametric Statistical Test Results at Significance Level $α = 0.05$

bBSOFA vs.	Fitness p-value	Accuracy p-value
WOA	2.38E-06	9.54E-04
bWOA-S	3.33E-06	9.54E-04
bWOA-V	3.33E-06	9.54E-04
BALO1	2.38E-06	1.03E-03
BALO2	2.38E-06	9.92E-04
BALO3	1.43E-06	1.03E-03
PSO	4.19E-05	9.54E-04
bGWO	6.67E-06	8.47E-04
bDA	4.25E-04	3.80E-04
bGSK-V4	1.63E-03	1.90E-04
CBi-GSK1	1.63E-03	1.59E-04
FS-pBGSK	3.33E-05	9.54E-04
bFA	3.33E-05	6.26E-04
bBSO	1.42E-03	6.62E-05

Table 7.

Friedman Statistical Test Results at Significance Level $α = 0.05$

bBSOFA vs.	p	Ranking
BALO2	1.67E-13	1
BALO1	8.59E-12	2
BALO3	8.59E-12	3
WOA	1.22E-11	4
bWOA-S	2.24E-09	5
bWOA-V	1.03E-08	6
bGWO	1.19E-05	7
bGSK-V4	1.78E-04	8
bFA	3.61E-04	9
PSO	2.46E-03	10
bDA	3.21E-03	11
CBi-GSK1	7.15E-03	12
FS-pBGSK	7.50E-03	13
bBSO	1.62E-02	14

Convergence graphs of the average fitness function, generated over 20 runs of 70 iterations each on all 21 data sets, are shown in Figure 1. The graphs include the algorithms with the best results: the proposed bBSOFA method, WOA, PSO, BALO3, bDA, bFA, bBSO, and bGSK-V4. As observed in Figure 1, the proposed bBSOFA method exhibits the fastest overall convergence, which is most obvious for the Wine EW, Ionosphere EW, and Congress EW data sets.

FIG. 1.

Average fitness convergence speed graphs of the proposed hybrid BSO and other competitor methods over 21 data sets. BSO, brain storm optimization.

The fitness values of the binary FA, the binary BSO, and the proposed approach are shown in Figures 2 and 3. The presented box plot diagrams allow a visual comparison of the performance of both basic algorithms and the resulting hybridized method. It can be clearly seen that the basic version of BSO has a larger dispersion over the runs, meaning that the proposed bBSOFA method is more stable. In addition, in some runs the basic BSO evidently lacks exploitation powerful enough to converge to better solutions. The proposed hybrid BSO has a powerful exploitation mechanism and converges faster to the optimum.

FIG. 2.

Box plot of the original BSO, original FA, and the proposed hybrid BSO. FA, firefly algorithm.

FIG. 3.

Box plot of the original BSO, original FA, and the proposed hybrid BSO.

Moreover, the bBSOFA approach is applied to the COVID-19 data set and compared with other state-of-the-art methods, all of which it outperforms. The accuracy of bBSOFA is 93.57%, whereas the second-best approach, HLBDA, achieves a classification accuracy of 92.21%. In terms of the number of selected features, both approaches selected two to three features on average; the feature size comparison is depicted in Figure 4. Based on the analysis of the selected features, we can conclude that certain features are not important for the prediction: symptom4, symptom5, and symptom6 are never selected by the algorithm in the experiments.

FIG. 4.

Feature size of comparable methods.

5. CONCLUSION

Feature selection is a very important preprocessing method in data mining for reducing the dimensionality of data, which leads to computation time reduction and also can positively affect the classification accuracy. In this article, a novel approach is proposed for the feature selection problem by employing the hybridized BSO algorithm with the FA. The hybrid algorithm overcomes the lack of exploitation in the original BSO algorithm. For the purpose of performance evaluation, 21 data sets are utilized from the UCI Machine Learning repository. Moreover, the method is employed for COVID-19 classification.

The obtained statistical results support the efficiency and robustness of the proposed method compared with other state-of-the-art approaches.

In future work, we plan to apply the bBSOFA method to other machine learning problems and prediction tasks, such as neural network weight and architecture optimization, and to experiment with additional data sets.

Footnotes

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

This research is supported by Project Grant No. III-44006 of the Ministry of Education, Science, and Technological Development of the Republic of Serbia.
