Abstract
In this study, we propose an analysis system with feature selection to improve the classification accuracy of single-trial electroencephalogram (EEG) data. Using event-related brain potential data acquired from the sensorimotor cortices, the system performs artifact and background noise removal, feature extraction, feature selection, and feature classification. First, artifacts and background noise are removed automatically by independent component analysis and a surface Laplacian filter, respectively. Several candidate features, namely band power, autoregressive model parameters, coherence, and phase-locking value, are then extracted for subsequent classification. Next, the artificial bee colony (ABC) algorithm is used to select a subset of features from this feature combination. Finally, the selected subfeatures are classified by a support vector machine. On single-trial EEG data from 6 subjects, we compare results with and without artifact removal, and feature selection by ABC against feature selection by a genetic algorithm. The results indicate that the proposed system is promising and suitable for brain–computer interface applications.
Keywords
Introduction
A brain–computer interface (BCI) provides humans with a communication channel that transmits messages directly from the brain to a computer by analyzing mental activity.1-3 Single-trial analysis of EEG signals associated with finger lifting (FL) has grown rapidly in the past decade. 2 This work discriminates between left and right FL using event-related brain potentials (ERP), and has revealed characteristic patterns of event-related desynchronization (ERD) and synchronization (ERS) in the mu and beta rhythms over the sensorimotor cortices during FL tasks. 4 Many studies discriminate FL EEG data by means of ERD/ERS components.5,6 However, most of these studies use multichannel data and do not perform feature selection, which can reduce both performance and practicality. The principal goal of this study is to propose a practical EEG-based BCI system that uses fewer channels and applies an ABC algorithm to select significant features from several candidate features for FL classification. Feature selection with the ABC algorithm further improves performance in BCI applications. The proposed noninvasive BCI system may help improve the accuracy of motor control for subjects with disabilities.
Independent component analysis (ICA) is a blind source separation technique that estimates source components with little or no prior knowledge of the sources. It is a statistical method that transforms observed multidimensional mixed signals into statistically independent components. Whereas principal component analysis (PCA) only ensures that its outputs are uncorrelated, ICA seeks outputs that are statistically independent. ICA has been applied in a variety of fields to remove artifacts, for example muscle artifacts,7,8 and blind source separation can reveal neurophysiologically and neuroanatomically meaningful neuronal components without assuming a prior physical model. 9 In this study, a modified method is proposed that automatically removes electro-oculography (EOG) artifacts by means of the FastICA algorithm 10 and the correlation coefficient.
Feature extraction strongly affects BCI classification accuracy. In this study, several candidate features are extracted for subsequent classification. The first is band power: the powers of the mu and beta spectral bands are obtained by calculating the mean and variance of the band-filtered data. Band power is useful for recognizing mental tasks that modulate brain activity at specific frequencies. The second is the autoregressive (AR) model, whose parameters generally change very slowly over time. AR models have been widely applied to feature extraction in BCI. 11 The all-pole AR model is well suited to modeling the EEG as filtered white noise with certain preferred energy bands: the EEG time series is fitted with an AR model, which can be intuitively rephrased in the frequency domain as a white noise source driving an all-pole spectral shaping network. 12 The third is coherence 13 and phase-locking value (PLV).14,15 Phase synchronization is a phenomenon that occurs in numerous oscillatory processes, including those of the human brain. 16 Phase-based analysis has also been applied in other areas, such as probabilistic graphical models in artificial intelligence 17 and high-speed target tracking. 18 The motivation for studying the phase relations of scalp-recorded EEG signals is that they may reflect cooperative interactions between anatomically disparate neural populations. 19 Such cooperative brain processes, detectable at a variety of spatial scales, are thought to be fundamental to the dynamic organization of sensory and cognitive brain functions. 20 The phase relations, quantified by coherence and PLV, are used as features.
To select representative subfeatures for each subject, the ABC algorithm 21 is used. It is an optimization algorithm based on the intelligent foraging behavior of honey bees. 21 Here, the ABC algorithm is used to increase classification accuracy by selecting an appropriate subset of features and discarding redundant information.
The support vector machine (SVM) 22 is a supervised learning method used for classification and regression; in its basic form, it separates data into 2 categories. Because it balances accuracy and generalization, 22 it is used for classification in this study.
Materials and Methods
A flowchart of the EEG analysis system for single-trial FL EEG classification is illustrated in Figure 1. It consists of EOG artifact removal, background removal, feature extraction, feature selection using the ABC algorithm, and classification. Several common features, namely band power, AR model parameters, coherence, and PLV, are extracted from the acquired EEG signals. Next, the ABC algorithm selects features for each subject during off-line training. Finally, the SVM is used for classification.

Figure 1. Flowchart of the proposed brain–computer interface system. It consists of electro-oculography (EOG) artifact removal, background removal, feature extraction, feature selection using the artificial bee colony (ABC) algorithm, and feature classification.
Data Acquisition
EEG signals from 6 untrained subjects (5 males and 1 female; 4 right-handed and 2 left-handed) were recorded in a shielded room using 13 silver/silver chloride electrodes: 10 scalp EEG channels (C3, C5, FC3, C1, CP3, C4, C2, C6, FC4, and CP4), 2 electromyography (EMG) channels for monitoring left and right muscle activity, and 1 channel on the forehead to record possible EOG artifacts and eye blinks during the experiment, 23 as illustrated in Figure 2. All electrodes were referenced to the A1 lead at the left earlobe. Before sampling at 256 Hz, the EEG data were filtered by an analog band-pass filter with cutoff frequencies of 0.5 and 100 Hz and amplified with a gain of 10 000. Each test consisted of 2 trials, one of left FL and one of right FL. Figure 3 illustrates the experimental protocol. Each trial was 10 seconds long, so each test lasted 20 seconds. In each left/right lifting trial, the first 4 seconds were quiet; an acoustic stimulus was then given as a cue to signal the beginning of left/right FL, and the subject executed the corresponding finger lift. We recorded 60 tests for each subject, giving 120 trials per subject (60 left and 60 right). No trials were removed during EEG processing. For each trial, the movement trigger (second 0) was located at the peak of the EMG signal after linear envelope processing, and the data segment from second −2 to second 2 was extracted; only data within this window were considered event related.
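The trigger-and-window step can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the study's code: the linear envelope is taken to be full-wave rectification followed by a moving-average low-pass filter, and the 0.1 s smoothing window and the function names `linear_envelope` and `extract_fl_epoch` are hypothetical. At 256 Hz, the window from second −2 to second 2 (treated here as half-open) spans 1024 samples.

```python
import numpy as np

FS = 256  # sampling rate (Hz) used in the recordings


def linear_envelope(emg, fs=FS, win_s=0.1):
    """Full-wave rectify the EMG, then smooth it with a moving average.

    The moving average stands in for the low-pass stage of the linear
    envelope; the 0.1 s window length is an assumed value.
    """
    rectified = np.abs(emg - np.mean(emg))  # remove DC offset, full-wave rectify
    n = max(1, int(round(win_s * fs)))
    return np.convolve(rectified, np.ones(n) / n, mode="same")


def extract_fl_epoch(eeg, emg, fs=FS, pre_s=2.0, post_s=2.0):
    """Cut the event-related window [-pre_s, +post_s) around the EMG envelope peak.

    eeg: (n_channels, n_samples) trial; emg: (n_samples,) EMG of the moving side.
    Returns the epoch and the trigger sample index (second 0).
    """
    t0 = int(np.argmax(linear_envelope(emg, fs)))  # movement trigger
    start, stop = t0 - int(pre_s * fs), t0 + int(post_s * fs)
    if start < 0 or stop > eeg.shape[1]:
        raise ValueError("event window extends beyond the recorded trial")
    return eeg[:, start:stop], t0
```

For a 10 s trial (2560 samples), a lift executed a few seconds after the 4 s cue leaves enough samples on both sides of the EMG peak for the full 4 s window.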

Figure 2. Ten EEG electrode locations in the international 10-20 system: C3, C5, FC3, C1, CP3, C4, C2, FC4, C6, and CP4. All electrodes are referenced to the A1 lead at the left earlobe.

Figure 3. Experimental protocol.
Electro-oculography Artifact Removal
ICA transforms multidimensional mixed signals into components that are as statistically independent of one another as possible, thereby solving the blind source separation problem: the source components are estimated with almost no prior knowledge of the nature of the sources. PCA only ensures that its outputs are uncorrelated, whereas ICA seeks outputs that are statistically independent. Statistical independence requires all higher-order cross-statistics, not only covariances, to vanish, whereas decorrelation constrains only second-order statistics. ICA is applied to the blind source separation of EEG signals under the reasonable assumption that EEG data acquired from multiple scalp electrodes are linear combinations of temporally independent components.
Each test was arranged into an
In addition, a natural similarity measure, the absolute value of the correlation coefficient between the EOG channel and each estimated independent component, is proposed to eliminate EOG artifacts automatically. The independent component with maximal similarity, provided that this similarity exceeds a predefined threshold, is regarded as pure EOG artifact. In the experiments, the threshold is chosen between 0.8 and 0.85. After this component is removed, the artifact-free EEG signals are reconstructed from the remaining independent components.
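The component-selection rule can be sketched in NumPy as below. It assumes the ICA decomposition has already been computed by some FastICA implementation that returns sources S and a mixing matrix A with X ≈ A S after mean removal (for instance, scikit-learn's `FastICA` returns the sources from `fit_transform` as samples × components, so they must be transposed, and stores the mixing matrix in `mixing_`). The function name `remove_eog_component` is hypothetical.

```python
import numpy as np


def remove_eog_component(sources, mixing, eog, threshold=0.8):
    """Zero the independent component most correlated with the EOG channel.

    sources: (n_components, n_samples) ICs estimated by FastICA (not shown here)
    mixing:  (n_channels, n_components) mixing matrix, so eeg ~= mixing @ sources
    eog:     (n_samples,) forehead EOG reference channel
    The IC with the largest |r| is removed only if |r| exceeds `threshold`
    (the study reports thresholds between 0.8 and 0.85).
    """
    r = np.array([abs(np.corrcoef(s, eog)[0, 1]) for s in sources])
    k = int(np.argmax(r))
    kept = sources.copy()
    removed = None
    if r[k] > threshold:
        kept[k] = 0.0  # discard the EOG-like component
        removed = k
    # Back-project the remaining components to the channels
    # (add back the channel means if they were removed before ICA).
    return mixing @ kept, removed, r
```

Zeroing a row of S before back-projection is equivalent to dropping the corresponding column of A, which is what "recovering the EEG from the remaining independent components" amounts to.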
Background Removal
Non-EEG noise differs markedly from EEG signals in both topography and frequency. The mu and beta rhythms of the EEG have frequencies between 8 and 30 Hz and are located over the sensorimotor cortex, whereas EOG signals are maximal at low frequencies (<5 Hz) and most prominent over the anterior head regions. Hence, an appropriate filtering method can increase the signal-to-noise ratio by enhancing EEG signals and reducing non-EEG noise at the same time. The surface Laplacian filter is a simple but effective filtering method. 24 It approximates the second spatial derivative of the voltage distribution at a selected electrode and acts as a high-pass spatial filter that enhances localized activity and reduces background noise. The filter subtracts the average potential of a set of surrounding electrodes from the electrode of interest,
$$V_i^{\mathrm{LAP}}(t) = V_i(t) - \frac{1}{|S_i|}\sum_{j \in S_i} V_j(t),$$
where $V_i(t)$ is the potential at the electrode of interest $i$, $S_i$ is the set of its surrounding electrodes, and $|S_i|$ is the number of electrodes in that set.
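A minimal NumPy sketch of this filter follows. The neighbour sets are an assumption inferred from the 10-channel montage (C3 and C4 each surrounded by 4 recorded electrodes); the text does not list them explicitly, and the names `NEIGHBOURS` and `surface_laplacian` are hypothetical.

```python
import numpy as np

# Hypothetical neighbour sets: each centre electrode with its 4 surrounding
# electrodes among the 10 recorded channels (an assumption about the montage).
NEIGHBOURS = {
    "C3": ["C5", "C1", "FC3", "CP3"],
    "C4": ["C2", "C6", "FC4", "CP4"],
}


def surface_laplacian(eeg, channels, neighbours=NEIGHBOURS):
    """Small surface Laplacian: centre potential minus the mean of its neighbours.

    eeg: (n_channels, n_samples); channels: channel names in row order.
    Returns a dict mapping each centre electrode to its filtered signal.
    """
    row = {name: i for i, name in enumerate(channels)}
    out = {}
    for centre, nbrs in neighbours.items():
        ring_mean = np.mean([eeg[row[n]] for n in nbrs], axis=0)
        out[centre] = eeg[row[centre]] - ring_mean
    return out
```

Because the neighbour average is subtracted, any potential common to the centre and its neighbours (such as widespread background activity) cancels, which is the high-pass spatial behaviour described above.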
Feature Extraction
Feature extraction is performed on the preprocessed EEG data. Its aim is to find a transform of the data that improves subsequent classification, and it strongly affects classification performance: the more discriminative the extracted features, the higher the classification accuracy. To achieve good recognition rates, several candidate features are extracted in this study. Each is described in detail below.
Band Power
The preprocessed EEG data are filtered into the mu and beta bands using Butterworth band-pass filters whose pass bands cover the respective rhythms. The power of each band is then characterized by the mean and variance of the filtered data.
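The band-power step can be sketched as follows. Several details are assumptions: the band edges (8-13 Hz for mu and 13-30 Hz for beta; the text only places both rhythms within 8-30 Hz), the filter order of 4, zero-phase filtering, and the reading of "mean and variance" as statistics of the instantaneous power of the filtered signal. To stay NumPy-only, the sketch applies the magnitude response of an analog Butterworth band-pass prototype directly to the spectrum; with SciPy, a more conventional choice would be `scipy.signal.butter(..., output="sos")` with `scipy.signal.sosfiltfilt`. Function names are hypothetical.

```python
import numpy as np

# Assumed band edges (Hz); the text only states that mu and beta lie within 8-30 Hz.
BANDS = {"mu": (8.0, 13.0), "beta": (13.0, 30.0)}


def butterworth_bandpass_fft(x, fs, f_lo, f_hi, order=4):
    """Zero-phase band-pass: apply the magnitude response of an analog
    Butterworth band-pass prototype directly to the signal's spectrum."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    f0, bw = np.sqrt(f_lo * f_hi), f_hi - f_lo  # centre frequency, bandwidth
    with np.errstate(divide="ignore"):
        # Low-pass -> band-pass frequency mapping; omega = 1 at f_lo and f_hi (-3 dB)
        omega = np.abs(freqs ** 2 - f0 ** 2) / (freqs * bw)
    gain = 1.0 / np.sqrt(1.0 + omega ** (2 * order))
    gain[0] = 0.0  # reject DC explicitly
    return np.fft.irfft(np.fft.rfft(x) * gain, n=n)


def band_power_features(x, fs, bands=BANDS):
    """For each band: mean and variance of the instantaneous power y(t)**2."""
    feats = []
    for f_lo, f_hi in bands.values():
        power = butterworth_bandpass_fft(x, fs, f_lo, f_hi) ** 2
        feats.extend([power.mean(), power.var()])
    return np.array(feats)
```

For a 10 Hz test sinusoid, nearly all of the power appears in the mu features and only a small fraction leaks into the beta features, as expected from the Butterworth roll-off.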
Autoregressive Model
Autoregressive models have been applied to feature extraction in BCI. 11 The all-pole AR model lends itself well to modeling the EEG as filtered white noise with certain preferred energy bands. The EEG time series is fitted with an AR model, which can be intuitively rephrased in the frequency domain as a white noise source driving an all-pole spectral shaping network. 12 In this study, the AR model is estimated with the least mean square (LMS) algorithm, and the model order is set to 6.
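An LMS fit of the AR coefficients can be sketched as below. The step size `mu`, the standardisation of the input (so that `mu` does not depend on the EEG amplitude scale), and the use of the final adapted weights as features are assumptions; the text specifies only LMS estimation and order 6. The function name is hypothetical.

```python
import numpy as np


def lms_ar_coefficients(x, order=6, mu=0.002):
    """Adaptive AR(order) fit with the LMS algorithm.

    Predicts x[n] from x[n-1], ..., x[n-order] and returns the final weights
    a_1..a_order, so that x[n] ~= sum_k a_k * x[n-k]. The signal is
    standardised first so the (assumed) step size is scale-independent.
    """
    x = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    w = np.zeros(order)
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]  # x[n-1], x[n-2], ..., x[n-order]
        err = x[n] - w @ past        # one-step prediction error
        w += 2 * mu * err * past     # LMS weight update
    return w
```

Because LMS adapts sample by sample, it can track slowly varying AR parameters; a smaller `mu` gives less noisy estimates at the cost of slower adaptation.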
Coherence and Phase-Locking Value
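A variety of approaches have been proposed to measure the synchronization of 2 signals. Coherence 13 is popular in analyzing EEG signals. It is derived from the cross-spectrum of 2 time series. More specifically, the Fourier transform of a signal $x_i(t)$ is represented in terms of its amplitude $r_i$ and phase $\phi_i$,
$$X_i(f) = r_i(f)\, e^{j\phi_i(f)}.$$
The cross-spectrum of 2 signals is then defined as
$$S_{ik}(f) = X_i(f)\, X_k^{*}(f) = r_i(f)\, r_k(f)\, e^{j[\phi_i(f) - \phi_k(f)]},$$
where $^{*}$ denotes complex conjugation. The coherence is then obtained as the absolute value of the complex coherence,
$$\mathrm{Coh}_{ik}(f) = \left| \frac{S_{ik}(f)}{\sqrt{S_{ii}(f)\, S_{kk}(f)}} \right|.$$
In addition, PLV is another measure recently used to quantify the synchrony of 2 signals in EEG studies.14,15 It is defined as
$$\mathrm{PLV}_{ik}(f) = \left| e^{j[\phi_i(f) - \phi_k(f)]} \right|.$$
PLV is similar to coherence, except that it contains only the phase difference between the 2 signals and not their amplitudes. Because only phase synchronization is evaluated, PLV may be a more suitable measure for investigating synchronization phenomena in EEG signals. 15 For a single spectral estimate, both quantities above are identically 1, so in single-trial applications the coherence and PLV are calculated by averaging over time,
$$\mathrm{Coh}_{ik}(f) = \frac{\left| \frac{1}{N}\sum_{n=1}^{N} S_{ik}(f, t_n) \right|}{\sqrt{\frac{1}{N}\sum_{n=1}^{N} S_{ii}(f, t_n) \cdot \frac{1}{N}\sum_{n=1}^{N} S_{kk}(f, t_n)}}, \qquad \mathrm{PLV}_{ik}(f) = \left| \frac{1}{N}\sum_{n=1}^{N} e^{j[\phi_i(f, t_n) - \phi_k(f, t_n)]} \right|,$$
where $S_{ik}(f, t_n)$ and $\phi_i(f, t_n)$ are the cross-spectrum and phase estimated at time $t_n$, and $N$ is the number of time points averaged.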
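The time-averaged coherence and PLV can be sketched with NumPy as follows. The text does not specify how the time-resolved spectra are obtained; this sketch assumes non-overlapping Hann-windowed segments of 128 samples (a hypothetical choice) and evaluates a single frequency bin. The function name `coherence_plv` is hypothetical.

```python
import numpy as np


def coherence_plv(x, y, fs, freq, nperseg=128):
    """Single-trial coherence and PLV at one frequency, averaged over time.

    Each signal is cut into non-overlapping Hann-windowed segments; the FFT bin
    nearest `freq` in each segment gives X(f, t_n) and Y(f, t_n).
    """
    n_seg = len(x) // nperseg
    win = np.hanning(nperseg)
    k = int(round(freq * nperseg / fs))  # FFT bin closest to freq
    segs_x = np.asarray(x[: n_seg * nperseg], dtype=float).reshape(n_seg, nperseg)
    segs_y = np.asarray(y[: n_seg * nperseg], dtype=float).reshape(n_seg, nperseg)
    X = np.fft.rfft(segs_x * win, axis=1)[:, k]
    Y = np.fft.rfft(segs_y * win, axis=1)[:, k]
    s_xy = X * np.conj(Y)  # cross-spectrum per segment
    coh = np.abs(s_xy.mean()) / np.sqrt(np.mean(np.abs(X) ** 2) * np.mean(np.abs(Y) ** 2))
    plv = np.abs(np.mean(np.exp(1j * (np.angle(X) - np.angle(Y)))))
    return float(coh), float(plv)
```

Two sinusoids with a constant phase lag give values near 1 for both measures, whereas independent noise signals give a PLV close to 0, since their phase differences are spread uniformly around the circle.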
Feature Selection
An ABC algorithm 21 is used as the feature-selection optimization procedure for the combined features in this study. The features, namely band power, AR model parameters, coherence, and PLV, are concatenated into a 1-dimensional vector. The ABC algorithm, an optimization algorithm based on the intelligent foraging behavior of honey bees, was originally designed for continuous optimization problems. 21 Its extended versions have since been applied to a wide variety of optimization problems, such as image segmentation and clustering. 25 The ABC algorithm is simple and uses only common control parameters, such as colony size and maximum cycle number. Its advantages include few control parameters, fast convergence, and a balance between exploration and exploitation.
The ABC algorithm is a population-based search procedure in which artificial bees update food sources over time; the bees aim to discover food sources with high nectar amounts and, ultimately, the one with the highest nectar amount. The model of intelligent foraging in a honeybee swarm contains 3 kinds of bees: employed bees, each associated with a specific food source; onlooker bees, which watch the dances of employed bees within the hive to choose a food source; and scout bees, which search for food sources randomly. 26 More specifically, employed bees exploit previously found nectar sources and share information about the quality of those sources with the onlooker bees in the hive. Onlooker bees tend to select good food sources from those found by the employed bees and then search further around the selected source. Scout bees perform exploration, whereas employed and onlooker bees perform exploitation. Accordingly, the ABC algorithm combines local search, carried out by employed and onlooker bees, with global search, managed by onlookers and scouts, to balance exploration and exploitation.
In the ABC algorithm, the position of a food source represents a candidate solution of the optimization problem, and the nectar amount of a food source represents the quality (fitness) of that solution. The bees' search for rich food sources therefore corresponds to the search for the optimal solution. The algorithm begins with a population of randomly distributed food-source positions.
The steps of ABC algorithm are described as follows:
Step 1: Assign control parameters
Step 2: Initialize solutions
Step 3: Perform employed bee phase and evaluate the fitness
Step 4: Perform onlooker bee phase and evaluate the fitness
Step 5: Perform scout bee phase
Step 6: Calculate the best solution and update it
Step 7: Check whether the stopping criterion is met. If not, return to step 3 and repeat the iteration.
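The steps above can be sketched as a compact NumPy implementation of the standard ABC search. Since ABC is a continuous optimizer, this sketch assumes that each food source lives in [0, 1]^D and that a feature is selected when its coordinate exceeds 0.5; the text does not state how the study binarizes solutions. In the study, the fitness would be the SVM classification accuracy of the selected subfeatures; here `fitness` is any non-negative callable supplied by the user, and the parameter defaults and the name `abc_feature_selection` are hypothetical.

```python
import numpy as np


def abc_feature_selection(fitness, n_features, n_sources=20, limit=30,
                          max_cycles=300, seed=0):
    """Artificial bee colony search for a feature subset.

    Food-source positions live in [0, 1]^n_features; a feature counts as
    selected when its coordinate exceeds 0.5 (an assumed binarisation).
    `fitness(mask)` must return a non-negative score to maximise.
    """
    rng = np.random.default_rng(seed)  # Step 1: control parameters are the arguments

    def to_mask(p):
        return p > 0.5

    # Step 2: random initial food sources and their fitness
    pos = rng.random((n_sources, n_features))
    fit = np.array([fitness(to_mask(p)) for p in pos], dtype=float)
    trials = np.zeros(n_sources, dtype=int)
    best = int(np.argmax(fit))
    best_pos, best_fit = pos[best].copy(), fit[best]

    def search_near(i):
        # v_ij = x_ij + phi * (x_ij - x_kj) on one random dimension j, partner k != i
        j = rng.integers(n_features)
        k = rng.integers(n_sources - 1)
        k = k + 1 if k >= i else k
        cand = pos[i].copy()
        cand[j] = np.clip(cand[j] + rng.uniform(-1.0, 1.0) * (cand[j] - pos[k, j]), 0.0, 1.0)
        cand_fit = fitness(to_mask(cand))
        if cand_fit > fit[i]:  # greedy selection
            pos[i], fit[i], trials[i] = cand, cand_fit, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_sources):  # Step 3: employed bees
            search_near(i)
        probs = (fit + 1e-12) / np.sum(fit + 1e-12)
        for i in rng.choice(n_sources, size=n_sources, p=probs):  # Step 4: onlookers
            search_near(int(i))
        worst = int(np.argmax(trials))  # Step 5: scout replaces one exhausted source
        if trials[worst] > limit:
            pos[worst] = rng.random(n_features)
            fit[worst] = fitness(to_mask(pos[worst]))
            trials[worst] = 0
        i_best = int(np.argmax(fit))  # Step 6: memorise the best solution
        if fit[i_best] > best_fit:
            best_pos, best_fit = pos[i_best].copy(), fit[i_best]
    return to_mask(best_pos), float(best_fit)  # Step 7: stop after max_cycles
```

Onlookers choose sources in proportion to fitness (exploitation of good regions), while the `limit` counter triggers scouts that restart stagnant sources (exploration). A typical call would be `abc_feature_selection(lambda m: cv_accuracy(features[:, m], labels), features.shape[1])`, where `cv_accuracy` is a hypothetical cross-validated SVM scorer.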
Classification
Establishing a stable neural network can be difficult, because an appropriate number of hidden layers and neurons must be carefully selected to approximate the target function to the desired accuracy. The SVM, first proposed by Vapnik, 22 rests on a solid foundation in statistical learning theory and yields the maximum-margin decision function for a given set of training data. The main idea of the SVM is to construct a hyperplane as the decision surface such that the margin of separation between positive and negative examples is maximized. The SVM optimization problem is
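$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^{\mathsf T}\varphi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0,$$
where $\mathbf{x}_i$ is the $i$th of $N$ training feature vectors, $y_i \in \{-1, +1\}$ is its class label, $\mathbf{w}$ and $b$ define the hyperplane, $\xi_i$ are slack variables, $C > 0$ is a penalty parameter, and $\varphi(\cdot)$ maps the features into a (possibly higher-dimensional) feature space. The resulting decision function is
$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{N}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right),$$
where $\alpha_i$ are the Lagrange multipliers of the dual problem and $K(\mathbf{x}_i, \mathbf{x}) = \varphi(\mathbf{x}_i)^{\mathsf T}\varphi(\mathbf{x})$ is the kernel function.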
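To make the soft-margin objective concrete, the sketch below trains a linear SVM by stochastic sub-gradient descent on the primal problem (the Pegasos method). This is a swapped-in solver for illustration only: it uses a linear kernel, the regularization weight `lam` plays the role of 1/(C·N), and a constant feature is appended so that the bias is learned as the last weight. The study's actual solver and kernel are not stated here, and the function names are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal soft-margin objective
        lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * w.x_i),
    with labels y_i in {-1, +1}. A constant column is appended so the bias is
    learned as the last weight (and is therefore also lightly regularised)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            violated = y[i] * (Xb[i] @ w) < 1     # inside the margin or misclassified
            w *= 1.0 - eta * lam                  # shrink: gradient of the regulariser
            if violated:
                w += eta * y[i] * Xb[i]           # hinge-loss sub-gradient step
    return w


def predict(w, X):
    """Sign of the linear decision function."""
    return np.where(np.hstack([X, np.ones((len(X), 1))]) @ w >= 0, 1, -1)
```

Only samples that violate the margin contribute a hinge-loss step, mirroring the role of the slack variables ξ_i; in practice a library SVM with a nonlinear kernel would replace this sketch.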
K-Fold Cross-Validation
The data set for each subject is divided into k subsets, and the following procedure is repeated k times. Each time, one of the k subsets is used as the test set and the remaining k − 1 subsets are used as the training set. The recognition rate is then averaged across all k folds.
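The procedure can be sketched as follows. The value of k is not given in this section, so the default `k=10`, the shuffling seed, and the names `k_fold_accuracy` and `train_and_score` are hypothetical; `train_and_score` stands for any routine (such as feature selection plus SVM training) that fits on the training fold and returns the recognition rate on the test fold.

```python
import numpy as np


def k_fold_accuracy(X, y, train_and_score, k=10, seed=0):
    """Average recognition rate over k folds.

    Each fold serves once as the test set while the other k - 1 folds form
    the training set. Returns the mean score and the per-fold scores.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx]))
    return float(np.mean(scores)), scores
```

Any subject-specific step that learns from data, including ABC feature selection, must run inside `train_and_score` on the training fold only; otherwise the test fold leaks into training and the averaged rate is optimistic.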
Results
Performance of EOG Artifact Removal
To verify the practicality and efficacy of automatic EOG artifact removal, we compare classification accuracy without and with EOG artifact removal, using features selected by a genetic algorithm (GA) and classified by SVM. In this experiment, the features for both conditions are extracted from the event-related windows. The classification accuracies are listed in Table 1. The average accuracy without EOG artifact removal is 77.1%, whereas that with EOG artifact removal increases to 83.1%. Hence, automatic EOG artifact removal improves FL EEG classification accuracy by 6.0 percentage points on average.
Table 1. Comparison of Classification Accuracy Without and With EOG Artifact Removal.
Performance of Feature Selection
An experiment is performed to evaluate different feature selection algorithms applied to the combination of candidate features. Table 2 compares the classification accuracy of feature selection using GA and the ABC algorithm, where the features include band power, AR model parameters, coherence, and PLV, with EOG artifact removal and an SVM classifier. Both feature selection methods are applied after EOG artifact removal for all subjects, so the differences in the listed values reflect only the choice of feature selection algorithm. The average accuracy for feature selection using GA is 83.1%, whereas that using ABC is 88.8%. Feature selection using ABC produces better results for all subjects, indicating that it is more effective for FL EEG classification.
Table 2. Comparison of Classification Accuracy Between Feature Selection Using Genetic Algorithm (GA) and Artificial Bee Colony (ABC) Algorithm.
Discussion
Statistical Evaluation of EOG Artifact Removal
Independent component analysis can separate multivariate signals into additive subcomponents, assuming mutual statistical independence, and it is therefore applied to eliminate artifacts from the acquired EEG data. A natural similarity measure is then used to detect the EOG artifact among the independent components. Finally, the artifact-free EEG signals are recovered from the remaining independent components. Table 1 compares performance without and with EOG artifact removal. Moreover, a 2-way analysis of variance (ANOVA) is performed to evaluate whether the difference due to EOG artifact removal is significant. The performance improvement is significant (P = .0007), indicating that removing EOG artifacts from the event-related windows improves classification accuracy.
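The text does not state the factors of the 2-way ANOVA. One plausible design, assumed here, is condition (for example, without vs with EOG artifact removal) × subject, with one accuracy per cell, so the interaction term serves as the error term. The sketch below computes the F statistics for that design with NumPy; the P value would then follow from the F distribution, for example `scipy.stats.f.sf(F, df1, df2)`. The function name is hypothetical.

```python
import numpy as np


def two_way_anova_no_replication(table):
    """2-way ANOVA without replication.

    table[r, c]: accuracy of condition r (e.g. without/with artifact removal)
    for subject c, one value per cell. Returns F statistics and degrees of freedom.
    """
    a = np.asarray(table, dtype=float)
    r, c = a.shape
    grand = a.mean()
    ss_rows = c * np.sum((a.mean(axis=1) - grand) ** 2)  # condition effect
    ss_cols = r * np.sum((a.mean(axis=0) - grand) ** 2)  # subject effect
    ss_total = np.sum((a - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols                # residual (interaction)
    df_rows, df_cols = r - 1, c - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    return {
        "F_condition": (ss_rows / df_rows) / ms_err,
        "F_subject": (ss_cols / df_cols) / ms_err,
        "df": (df_rows, df_cols, df_err),
    }
```

Treating subjects as a blocking factor removes between-subject variability from the error term, which makes the comparison between conditions more sensitive than a one-way test on pooled accuracies.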
Statistical Evaluation of Feature Selection
Band powers of the mu and beta rhythms are calculated from the mean and variance of the filtered data. AR model parameters, which vary slowly over time, are estimated with the least mean square (LMS) algorithm. Phase relations, quantified by coherence and PLV, are thought to reflect cooperative interactions between anatomically disparate neural populations. ABC is a global search algorithm that takes both exploration and exploitation into consideration and can efficiently select appropriate subfeatures, increasing classification accuracy by reducing redundant data. Table 2 compares performance between feature selection using GA and the ABC algorithm. A 2-way ANOVA is again performed to test whether these 2 feature selection methods differ significantly. The difference between them is significant (P = .0005), suggesting that feature selection using ABC is the better choice for this BCI application.
Conclusion
A BCI system has been proposed for the classification of single-trial EEG data. After the removal of artifacts and background noise, several candidate features, namely band power, AR model parameters, coherence, and PLV, are extracted and concatenated. The ABC algorithm is then used to select significant features from this combination, which raised the average classification accuracy from 83.1% with GA-based selection to 88.8%. Finally, the SVM is used for feature classification. The experimental results show that feature selection using ABC can further enhance performance in BCI applications.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: They express their sincere appreciation for grants from NSC102-2633-E-194-002, MOST103-2410-H-194-070-MY2, NSC103-2622-E-194-003-CC2, MOST103-2815-C-194-021-E, and MOST103-2815-C-194-007-H, Ministry of Science and Technology, Taiwan.
