Abstract
In recent decades, automatic emotion state classification has become an important technology for human-machine interaction. In Electroencephalography (EEG) based emotion classification, most existing methodologies cannot capture the context information of the EEG signal and ignore the correlation information between different EEG channels. Therefore, in this study, a deep learning based automatic method is proposed for effective emotion state classification. Firstly, the EEG signals are acquired from a real-time database and from the Database for Emotion Analysis using Physiological Signals (DEAP), and a band-pass filter from 0.3 Hz to 45 Hz is applied to eliminate both high- and low-frequency noise. Next, two feature extraction techniques, power spectral density and differential entropy, are employed to extract feature values that capture the contextual and spatial information of the EEG signals. Finally, principal component analysis and an artificial neural network are used for feature dimensionality reduction and emotion state classification. The experimental evaluation showed that the proposed method achieved 96.38% and 97.36% accuracy on DEAP, and 92.33% and 89.37% accuracy on the real-time database, for the arousal and valence emotion states, respectively. The achieved recognition accuracy is higher than that of the support vector machine on both databases.
Keywords
Introduction
Human emotions are psychological experiences characterized by mental activities that entail synchronized features such as action tendency, response, expression, and knowledge [1, 2]. Generally, human emotions are recognized from body postures, facial expressions, speech, physiological activities, etc. [3, 4, 5]. These techniques rely on expressed emotions and do not capture a person’s innermost feelings. In contrast, the Electroencephalography (EEG) signal reflects the underlying emotional pattern and information that is not outwardly expressed [6, 7], because human emotions originate from the peripheral and central nervous systems, where the synchronized firing of neurons causes transient agitation. Visual analysis of the EEG signal is a tedious and time-consuming procedure, and it leads to false point detection [8, 9]. In recent years, many researchers have applied machine learning methods such as random forest, k-nearest neighbor, and decision tree to emotion state classification using bio-signals [10, 11]. Compared to conventional machine learning methods, deep learning methods provide better results in emotion state classification using EEG signals [12]. Still, deep learning methods need to cope with the non-stationary characteristics of the EEG signal [13, 14]. To enhance emotion classification, a novel dimensionality-reduced deep learning method is implemented in this article. The contributions of this article are stated below:
Initially, the EEG signals are acquired from the Database for Emotion Analysis using Physiological Signals (DEAP) and a real-time database. Further, a band-pass filter from 0.3 Hz to 45 Hz is applied to eliminate high- and low-frequency noise, such as electrooculography artifacts, from the acquired EEG signals. In addition, a hybrid feature extraction scheme combining Power Spectral Density (PSD) and differential entropy is utilized to extract feature values from the denoised EEG signals. In this application, the hybrid feature extraction offers benefits such as improved data visualization, reduced over-fitting risk, faster training, and improved accuracy of the classification model. The extracted multi-dimensional feature values are reduced by the Principal Component Analysis (PCA) method to lower the computational time and system complexity of the proposed framework. Additionally, the feature dimensionality reduction eases the interpretation of the hyper-parameters in the Artificial Neural Network (ANN) method. Lastly, the ANN method is applied for emotion classification in terms of arousal and valence. The performance of the PCA based ANN method is evaluated in terms of F1-score, recall, and recognition accuracy.
This article is organized as follows: papers related to emotion state classification are surveyed in Section 2. The theoretical and mathematical explanation of the PCA based ANN method is given in Section 3. The experimental results and the conclusion of the PCA based ANN method are presented in Sections 4 and 5, respectively.
Yin et al. [15] used differential entropy to extract feature values from the EEG data, and the extracted feature values were then given to two deep learning models, a Long Short Term Memory (LSTM) network and a Graph Convolutional Neural Network (GCNN), to achieve efficient emotion classification. The efficiency of the developed model was tested on the DEAP database in terms of accuracy. However, the presented deep learning model was too expensive for practical problems. Liu and Fu [16] integrated the relief and PCA techniques for selecting significant signal channels. Further, frequency and time domain feature extraction techniques were employed to extract feature values from the acquired EEG data. Finally, the extracted feature values were given to the Support Vector Machine (SVM) for emotion classification. The experiment conducted on the DEAP database showed the efficiency of the presented model. The main limitation of this work was that the SVM classifier natively supports only binary classification. Subasi et al. [17] applied a noise reduction methodology, the Discrete Wavelet Transform (DWT), to eliminate noise from the acquired EEG signals. Next, the Tunable Q Wavelet Transform (TQWT) and Multi-Scale PCA (MS-PCA) were applied for discriminative feature extraction and dimensionality reduction. Lastly, the rotation forest ensemble classifier was utilized for emotion classification [41, 42]. However, the computational complexity of the presented model was high due to the incorporation of several models.
Salankar et al. [18] integrated the 2nd order difference plot and the Empirical Mode Decomposition (EMD) technique to extract feature values from the DEAP database. Further, the Multi-Layer Perceptron (MLP) classification technique was used for emotion recognition. The simulation result revealed that the developed model was useful for the clinical analysis of high and low dominance regions in patients, although the labelling procedure was harder for the MLP classifier than for other learning models. Song et al. [19] developed a Graph Embedded CNN (GECNN) model for EEG based emotion classification. The GECNN model was computationally costly because it required high-end graphics processing units for data validation. Gao et al. [20] integrated power spectral density, sample entropy, differential entropy, and Hjorth entropy for feature extraction. Then, GoogleNet and SVM techniques were utilized for feature fusion and emotion classification. The experimental results on the DEAP database showed that the developed model attained excellent performance in terms of recognition accuracy. However, the SVM was inappropriate for multi-class classification, because it mainly supports binary classification. Shen et al. [21] presented a new multi-scale frequency band based ensemble learning method for emotion classification using EEG signals. The experimental result demonstrated the efficiency of the presented model in emotion classification. Still, it remained essential to find the optimal frequency band scale to enhance emotion detection.
Sharma et al. [22] decomposed the acquired EEG signals using the DWT technique, and then applied the Particle Swarm Optimization (PSO) technique and an LSTM network for data reduction and emotion classification. The effectiveness of the presented model was evaluated on the DEAP database with a ten-fold cross-validation method. The simulation result confirmed that the presented model has the potential for rapid and accurate recognition of human emotions. In addition, Yang et al. [23] implemented a multi-column CNN model for emotion classification using different bio-signals. The developed model achieved higher classification accuracy on the DEAP database than the traditional models. As noted earlier, the CNN and LSTM models needed a large amount of training data, which made them computationally costly. Mert and Akan [24] integrated Multivariate EMD (MEMD) and Independent Component Analysis (ICA) for feature extraction and dimensionality reduction. Additionally, the dimensionally reduced feature values were given to the ANN for emotion classification into low/high valence and low/high arousal states. However, the linear decomposition delivers an inappropriate analysis when detecting emotions from non-stationary signals. Song et al. [25] developed a Dynamical Graph CNN (DGCNN) model for multi-channel EEG based emotion classification. The simulation result demonstrated that the developed DGCNN model attained better classification performance than the prior models in terms of recognition accuracy. However, over-fitting was the major problem reported in this work.
Chen et al. [26] integrated approximation entropy and the EMD technique to extract feature values from the acquired EEG signal. Further, the extracted feature values were given to the Deep Belief Network (DBN) for classifying human emotions such as fear, sadness, happiness, and calm. The simulation results demonstrated that the presented model attained higher accuracy than the existing models. The reduction of interference and the selection of effective emotion properties were left as future work to further improve emotion recognition. He et al. [27] developed a Firefly Integrated Optimization Algorithm (FIOA) for effective emotion classification, although the computational cost was a significant factor in this work. Chao et al. [28] implemented a novel method based on a capsule network and a multi-band feature matrix for emotion classification using the EEG signal. The simulation outcomes showed that the capsule network obtained satisfactory results on the DEAP database. However, it was difficult to integrate the spatial characteristics and common characteristics of the EEG signals. In addition, Ullah et al. [29] integrated sparse projection coefficients and the PCA technique for effective emotion classification. The experimental results on the online databases confirmed the superiority of the developed model over the conventional models. The trade-off between dimensionality reduction and information loss was a major concern in the PCA technique.
Li et al. [30] implemented a Hierarchical CNN (HCNN) model for classifying negative, neutral, and positive emotional states. In this work, the HCNN model relied on the higher frequency waves, such as Gamma and Beta, for processing human emotions, but this process was computationally expensive. Additionally, Pandey and Seeja [31] combined a Deep Neural Network (DNN) and the Variational Mode Decomposition (VMD) technique for emotion classification. The simulation on the DEAP database confirmed the effectiveness of the developed model in terms of recognition accuracy. However, the DNN model needed a larger amount of data to classify better than the existing models. Liu et al. [32] integrated a capsule network and multilevel features for effective human emotion detection. As noted earlier, the capsule network faces difficulty in integrating the common and spatial characteristics of the EEG signal. Zhang et al. [33] used two logistic regression models and a sparse autoencoder method for valence and arousal classification. The test results showed that the presented model attained acceptable classification performance on the DEAP database. The assumption of linearity between the independent and dependent variables was the main concern in the logistic regression models.
Aboneh et al. [43] implemented a stack-based ensemble learning model for land use and land cover image classification. The implemented ensemble learning model integrates the XGBoost classifier to improve the classification accuracy. Its performance was tested on a real-time database collected from Bishoftu town, Ethiopia. The main objective of that study was to analyze the performance of land use and land cover classification on multi-spectral images, but the implemented model was computationally complex. Alexandropoulos et al. [44] developed a hybrid decision support system based on an ensemble of classifiers in order to make strong predictions. The performance of the developed system was compared to other well-known classifiers, and extensive experiments on standard benchmark databases showed its effectiveness. Kwon et al. [45] integrated machine learning classifiers such as a generalized linear model, a gradient boosted model, a deep neural network, and a distributed random forest for breast cancer classification. The developed stacking ensemble model obtained higher classification accuracy, but it was computationally expensive and complex. To address the above-stated concerns, a novel PCA based ANN method is implemented for effective human emotion classification.
Principal Component Analysis (PCA) based Artificial Neural Network (ANN) method
Generally, human emotion states are recognized from body movement, facial expression, and speech, but the EEG signal is effective in identifying the human emotion state because it directly records the electrical activity of the brain. Additionally, the EEG signal measures the voltage fluctuation in the brain, which helps in extracting the essential information about the human emotion state. The proposed framework includes five phases: signal collection (real-time and DEAP databases), pre-processing (band-pass filter), feature extraction (differential entropy and PSD), feature dimensionality reduction (PCA), and emotion state classification (ANN). The schematic diagram of the proposed framework is shown in Fig. 1.
Schematic diagram of the proposed framework.
In this article, the effectiveness of the PCA based ANN method is tested on the DEAP and real-time databases. The DEAP database comprises physiological EEG signals of thirty-two individuals (16 male and 16 female), aged between 19 and 37. In the DEAP database, the EEG signal is recorded while the individuals watch 40 one-minute music video excerpts on different topics. The EEG signal is originally sampled at 512 Hz and is recorded from 32 locations: CP1, FC2, FC6, CP2, FC1, CP6, Fp1, AF3, Fp2, T7, FC5, P4, PO3, AF4, F8, P7, CP5, F3, T8, PO4, P3, Pz, Fz, P8, F4, C3, C4, F7, O1, Oz, Cz and O2. The main properties of the DEAP database are: 32 individuals, 32 channels, 40 video stimuli, a sampling rate of 128 Hz after down-sampling, and valence and arousal labels. DEAP database link:
In the real-time database, the physiological EEG signals are acquired from 50 individuals aged between 9 and 32. The main characteristics of the real-time database are: 50 individuals, 50 channels, 2 video stimuli, a sampling rate of 128 Hz, and valence and arousal labels. Sample acquired EEG signals are depicted in Fig. 2.
Sample acquired EEG signals.
After signal acquisition, a band-pass filter from 0.3 Hz to 45 Hz is employed to eliminate high- and low-frequency noise such as electrooculography artifacts. The pre-processed EEG data is then given to the feature extraction techniques, differential entropy and PSD, to extract feature values from the denoised EEG signals. The differential entropy captures the local non-linear information between adjacent elements, which characterizes the non-linear complexity of the EEG signals. The feature values extracted by the differential entropy are appropriate for classifying the human emotion states (valence and arousal) using the energy spectrum, and the differential entropy is mathematically determined in Eq. (1).
where
By using Eq. (2), the discriminating pattern of the EEG signal is balanced between the high- and low-frequency bands. Additionally, the power spectral properties of the EEG signals are divided into five frequency bands: theta (4–8 Hz), lower alpha (8–10 Hz), higher alpha (10–12 Hz), beta (12–30 Hz), and gamma (30 Hz and above). In this manuscript, fourteen pairs of electrodes on the left and right hemispheres are chosen to estimate the asymmetry potential of the brain area, which enhances the power spectral properties of the EEG signal. Around 670 feature values are extracted using PSD and differential entropy, and these are fed to the PCA technique for feature dimensionality reduction [36, 37].
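The pre-processing and feature extraction steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Butterworth filter design and its order, the Welch segment length, the 45 Hz upper edge of the gamma band, the Gaussian closed form used for the differential entropy, and the synthetic input signal are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

FS = 128  # sampling rate in Hz, as listed for both databases
BANDS = {  # band edges in Hz; the 45 Hz gamma ceiling is an assumption
    "theta": (4, 8),
    "lower_alpha": (8, 10),
    "higher_alpha": (10, 12),
    "beta": (12, 30),
    "gamma": (30, 45),
}

def bandpass(x, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter along the last axis."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def band_psd(x, fs=FS):
    """Mean Welch PSD per band; returns an array of shape (channels, bands)."""
    freqs, pxx = welch(x, fs=fs, nperseg=2 * fs, axis=-1)
    cols = [pxx[..., (freqs >= lo) & (freqs < hi)].mean(axis=-1)
            for lo, hi in BANDS.values()]
    return np.stack(cols, axis=-1)

def band_de(x, fs=FS):
    """Differential entropy per band, assuming a Gaussian band-limited signal."""
    cols = []
    for lo, hi in BANDS.values():
        var = bandpass(x, lo, hi, fs).var(axis=-1)
        cols.append(0.5 * np.log(2 * np.pi * np.e * var))
    return np.stack(cols, axis=-1)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 60 * FS))  # synthetic stand-in: 32 channels, 60 s
clean = bandpass(eeg, 0.3, 45.0)          # 0.3 Hz to 45 Hz pre-processing filter
features = np.concatenate([band_psd(clean), band_de(clean)], axis=-1)
print(features.shape)
```

Each channel yields one PSD value and one differential entropy value per band in this sketch; the hemispheric asymmetry features mentioned above would be computed on top of these per-channel values and are not shown.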
The 670 extracted feature values are given to the PCA for dimensionality reduction, which transforms the larger set of features into a smaller set that retains the valuable information. The transformed feature values are called principal components, and PCA is a popular method for predictive modelling and exploratory data analysis. The PCA is effective for drawing strong patterns from the DEAP and real-time databases because it reduces the dimensionality while retaining most of the variance. In this scenario, the PCA concentrates on the variance and co-variance structure of the extracted feature values.
The extracted 670 feature values are partitioned into different scales for decreasing the feature dimensions. Next, the correlation matrix
where
where
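As an illustration of the reduction step, the sketch below projects a 670-dimensional feature matrix onto 429 principal components using a singular value decomposition. It is a generic PCA written under stated assumptions, not the authors' code: the number of samples and the random data are hypothetical, and the features are only mean-centered. Standardizing each feature before the decomposition would correspond to working with the correlation matrix instead of the covariance matrix.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X (samples x features) onto its leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # center each feature
    # Rows of Vt are the eigenvectors of the covariance matrix of X.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return Xc @ components.T, components, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.standard_normal((1280, 670))             # hypothetical: 1280 samples, 670 features
Z, components, ratio = pca_fit_transform(X, n_components=429)
print(Z.shape, round(float(ratio.sum()), 3))
```

The cumulative value of `ratio` indicates how much of the total variance the retained components preserve, which is the quantity that governs the trade-off between dimensionality reduction and information loss.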
The 429 dimensionally reduced feature values are given to the ANN method for emotion recognition. In this manuscript, the proposed ANN method comprises 3 stacked autoencoders and 2 softmax layers for recognition. The 2 softmax layers correspond to the arousal and valence emotions, and they share the results learned from the unlabelled raw EEG data. The input and output layers comprise numerous nodes, whose number mainly depends on the input and output feature values. The dimensionally reduced feature values move between the layers of the ANN across several weighted connections. The current layer accepts the feature values from the previous layer and then calculates the weighted sum of all the inputs
where
Architecture of the ANN model.
Eq. (6) on its own yields only a linear function of the neuron values. Therefore, a non-linear function is employed, and the partial derivatives of the error are identified by utilizing the chain rule [39], as specified in Eqs (7) and (8).
where
where
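The weighted sum, the non-linear activation, and the chain-rule gradients can be illustrated with a deliberately small network. This sketch is an assumption-laden simplification: it uses one sigmoid hidden layer and a single softmax output trained with plain gradient descent on toy data, whereas the proposed method uses stacked autoencoders and two softmax layers. The layer sizes, learning rate, and labels are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1                             # weighted sum of the inputs
    a1 = sigmoid(z1)                             # non-linear activation
    p = softmax(a1 @ W2 + b2)                    # class probabilities
    return a1, p

def loss(X, Y, params):
    _, p = forward(X, params)
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))

def gradients(X, Y, params):
    """Back-propagate the cross-entropy loss with the chain rule."""
    W1, b1, W2, b2 = params
    a1, p = forward(X, params)
    n = X.shape[0]
    d2 = (p - Y) / n                             # gradient at the output pre-activation
    dW2, db2 = a1.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * a1 * (1.0 - a1)           # chain rule through the sigmoid
    dW1, db1 = X.T @ d1, d1.sum(axis=0)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))               # toy stand-in for reduced features
labels = (X[:, 0] + X[:, 1] > 0).astype(int)     # toy low/high label
Y = np.eye(2)[labels]
params = [rng.standard_normal((20, 8)) * 0.1, np.zeros(8),
          rng.standard_normal((8, 2)) * 0.1, np.zeros(2)]

initial = loss(X, Y, params)
for _ in range(500):                             # plain gradient descent
    grads = gradients(X, Y, params)
    params = [w - 0.5 * g for w, g in zip(params, grads)]
final = loss(X, Y, params)
accuracy = np.mean(forward(X, params)[1].argmax(axis=1) == labels)
print(f"loss {initial:.3f} -> {final:.3f}, training accuracy {accuracy:.2f}")
```

Without the sigmoid, the two weighted sums would collapse into a single linear map, which is why a non-linear function is needed between the layers.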
In this scenario, the PCA based ANN method is implemented in Python on a computer with 2 GB of random access memory, Windows 10 (64-bit operating system), an Intel Core i5 processor, and a 1 TB hard disk. The efficiency of the developed PCA based ANN method is investigated in terms of F1-score, recall, and recognition accuracy on the DEAP and real-time databases. In human emotion classification, these are important statistical performance metrics for validating the efficiency of the proposed method. The recognition accuracy is defined as the ratio of correct predictions to the total number of observations. Correspondingly, recall is defined as the ratio of correct positive predictions to the total number of positive observations. The mathematical definitions of the recognition accuracy and recall are given in Eqs (9) and (10).
Similarly, the F1-score is defined as the harmonic mean of recall and precision, as given in Eq. (11), where FN denotes the false negative samples and FP denotes the false positive samples, i.e., the classification errors. TP denotes the true positive samples, which are correctly identified as low valence/arousal, and TN denotes the true negative samples, which are correctly identified as high valence/arousal [40].
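The verbal definitions above correspond to the standard confusion-matrix formulas: accuracy = (TP + TN) / (TP + TN + FP + FN), recall = TP / (TP + FN), precision = TP / (TP + FP), and F1 = 2 · precision · recall / (precision + recall). A minimal sketch that evaluates them is given below; the function name and the counts are hypothetical and serve only to exercise the formulas.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Hypothetical counts, chosen only to exercise the formulas.
m = classification_metrics(tp=45, tn=40, fp=5, fn=10)
print({k: round(v, 4) for k, v in m.items()})
```

Because the F1-score combines precision and recall, it can be lower than the accuracy when the two error types are unbalanced, which is consistent with the gap between F1-score and accuracy reported on the real-time database.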
In this segment, the efficiency of the PCA based ANN method is validated on the real-time database. Further, different k-fold cross-validations are carried out to analyze the efficiency of the proposed PCA based ANN method. The experimental outcome of the proposed method is analyzed for the overall channels of valence and arousal on the real-time database. By viewing Table 1, the results of over-all channels (FH, FS, FH
Experimental results of overall channels in valence on the real time database
Graphical evaluation of the proposed method in valence on the real time database.
Similar to Table 1, the experimental results of the proposed method for the overall channels of arousal on the real-time database are given in Table 2. According to Table 2, the ANN classification technique with PCA attained better results on all channels than the comparative SVM classification technique. In this scenario, the ANN classifier attained an F1-score of 84.47%, a recall of 83.15%, and a recognition accuracy of 92.33% in human emotion classification, whereas the comparative SVM method attained only an F1-score of 81.23%, a recall of 82.36%, and a recognition accuracy of 89.97%. The graphical evaluation of the proposed method in arousal on the real-time database is shown in Fig. 5.
Experimental results of overall channels in Arousal on the real time database
In addition, the efficiency of the proposed PCA based ANN method is tested on real-time multi-modal data such as body movements, facial expressions, and EEG signals, and the obtained experimental results are compared with pre-trained models such as AlexNet and graph convolutional networks. According to Table 3, the proposed PCA based ANN method achieved higher classification results than the existing models in terms of recall and classification accuracy.
Experimental results of the proposed PCA based ANN method and the existing models on the real time multi-modal data
Graphical evaluation of the proposed method in Arousal on the real time database.
In this segment, the efficiency of the proposed PCA based ANN method is tested on the DEAP database. For the DEAP database, the PCA based ANN method uses an 80:20 split of the data for training and testing, together with a ten-fold cross-validation method. According to Table 4, the proposed PCA based ANN methodology attained an F1-score of 96.79%, a recall of 97.20%, and an accuracy of 97.36% in EEG based emotion recognition. This outcome is higher than that of the PCA based SVM methodology, which achieved an F1-score of 91.57%, a recall of 91.23%, and a recognition accuracy of 92.52%. The graphical evaluation of the proposed method in valence on the DEAP database is shown in Fig. 6. In addition, the incorporation of the PCA technique brings benefits such as low noise sensitivity, reduced data redundancy, and decreased memory and capacity requirements.
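One way to combine an 80:20 split with ten-fold cross-validation is sketched below using scikit-learn. The exact protocol used in the experiments is not specified beyond these two settings, so this is only one plausible reading: the sketch holds out 20% of the samples for a final test and runs stratified ten-fold cross-validation on the remaining 80%. The synthetic data, the number of PCA components, the hidden layer size, and the use of `MLPClassifier` and `SVC` as stand-ins for the ANN and SVM are assumptions.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for a feature matrix with binary low/high labels.
X, y = make_classification(n_samples=400, n_features=60, n_informative=15,
                           random_state=0)

# Hold out 20% for the final test; cross-validate on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "PCA + ANN": make_pipeline(
        StandardScaler(), PCA(n_components=30),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    "PCA + SVM": make_pipeline(StandardScaler(), PCA(n_components=30), SVC()),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # PCA is refit inside every fold because it is part of the pipeline.
    fold_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    test_score = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (fold_scores, test_score)
    print(f"{name}: CV accuracy {fold_scores.mean():.3f}, "
          f"held-out accuracy {test_score:.3f}")
```

Placing the scaler and PCA inside the pipeline keeps the held-out fold from influencing the projection, which avoids an optimistic bias in the cross-validated accuracy.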
Experimental results of overall channels in valence on the DEAP database
Graphical evaluation of the proposed method in valence on the DEAP database.
The experimental results of the proposed method in Arousal on the DEAP database are stated in Table 5. By inspecting Table 5, the proposed method is individually investigated for over-all channels (FH, FS, FH
Experimental results of overall channels in Arousal on the DEAP database
Comparative results between PCA based ANN method and the existing methods
Graphical evaluation of the proposed method in Arousal on the DEAP database.
The comparative evaluation between the PCA based ANN method and the published methods is given in Table 6. Yin et al. [15] used differential entropy to extract features from the EEG signals of the DEAP database, and the hybrid deep learning models, LSTM and GCNN, were implemented for emotion classification. The extensive experiments showed that the LSTM with the GCNN model achieved 90.60% and 90.45% accuracy for the arousal and valence emotion states on the DEAP database. Gao et al. [20] combined GoogleNet and SVM for feature fusion and emotion classification. The experimental result on the DEAP database showed that the developed model achieved 75.22% and 80.52% recognition accuracy for the arousal and valence emotion states.
Ullah et al. [29] used PCA and sparse projection coefficients for feature dimensionality reduction and emotion state classification. The experiments showed that the implemented model attained 74.50% and 82.80% recognition accuracy for the arousal and valence states on the DEAP database. Zhang et al. [33] integrated two logistic regression models and a sparse auto-encoder for emotion state classification. The implemented model attained 80.80% and 73.10% accuracy for the arousal and valence emotion states on the DEAP database. Compared to these published methods, the proposed PCA based ANN method attained better classification performance, with 96.38% and 97.36% accuracy for the arousal and valence emotion states on the DEAP database. The comparative result is shown graphically in Fig. 8.
Comparative evaluation of the proposed and the existing methods.
In this article, a PCA based ANN method is proposed for human emotion state classification. After the acquisition of EEG signals from the real-time and DEAP databases, a band-pass filter from 0.3 Hz to 45 Hz is employed to eliminate high- and low-frequency noise. Further, the feature values are extracted from the denoised EEG signal using differential entropy and PSD. The multi-dimensional feature values are reduced by PCA, and the dimensionally reduced features are given to the ANN classifier for emotion detection in terms of arousal and valence. The conducted experiments showed that the PCA based ANN achieved 96.38% and 97.36% accuracy for the arousal and valence states on the DEAP database. On the real-time database, the PCA based ANN attained 92.33% and 89.37% accuracy for the arousal and valence emotion states. In addition, the PCA based ANN is computationally efficient because it decreases the extracted feature dimension, and it takes 24.22 and 32.12 seconds for training on the DEAP and real-time databases, respectively. As a future extension, multi-modal data such as body movements, facial expressions, and EEG signals can be given as the input to the proposed model to further enhance emotion state classification. In addition, an ensemble based deep learning model can be integrated with the proposed model to further improve classification accuracy.
Funding
This study was not funded by any organization.
Data availability statement
The datasets generated and/or analysed during the current study are available in the DEAP dataset repository, https://www.eecs.qmul.ac.uk/mmv/datasets/deap/.
Footnotes
Conflict of interest
The authors declare that they have no conflict of interest.