The aim of this study is to improve the performance of P300-based BCIs through feature extraction and sampling in the feature-engineering stage. I investigated which feature extraction method is most effective, whether feature extraction with deep learning algorithms helps, and whether the data imbalance problem can be solved with sampling techniques to improve performance, evaluating each technique separately.
Models using deep-learning feature extraction were also significantly more sensitive to training duration than models without it. These results suggest that deep learning can serve as a feature extraction method in P300-based BCIs, and that borderline-SMOTE can improve accuracy and ITR by preventing the classification result from being biased toward the non-target class.
- Introduction to Brain-Computer Interface
- P300-based Brain-Computer Interfaces
- Feature extraction for BCIs
- Data imbalance problem for BCIs
The P300, first reported by Sutton (Sutton et al., 1965), is one of the event-related potential (ERP) components and refers to the positive peak amplitude that appears between 250 and 500 ms after the onset of the cognitive process (Polich, 2007). To apply the P300-based BCI system to real life, a smart-home BCI system that can control living devices has also been investigated (Kim et al., 2019). Liu et al. achieved a good information transfer rate (ITR) with small numbers of blinking epochs using a convolutional neural network (CNN) that includes a batch normalization layer (M. Liu et al., 2018).
- Experimental design
- EEG acquisition and preprocessing
Simultaneously, the EEG device measures the user's EEG, and the measured EEG is transmitted to a PC for analysis. The EEG is then analyzed on the PC in MATLAB to classify the user's intention, and the classification result is sent to the home device through the middleware. The home device operates on the transmitted result, and the MATLAB analysis result is delivered to the device over a TCP connection (Figure 4).
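The thesis implements this hand-off in MATLAB; the TCP step can be sketched in Python as a minimal client/server pair. The command string `"LIGHT_ON"`, the port number, and the function names are hypothetical, chosen only to illustrate the classifier-to-middleware hand-off described above.

```python
import socket
import threading

def serve_once(host="127.0.0.1", port=50007):
    """Toy stand-in for the home-device middleware: accept one
    connection and record whatever command string it receives."""
    received = []
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)

    def _accept():
        conn, _ = srv.accept()
        with conn:
            received.append(conn.recv(1024).decode("utf-8"))
        srv.close()

    t = threading.Thread(target=_accept)
    t.start()
    return t, received

def send_classification(result, host="127.0.0.1", port=50007):
    """Send the classifier's decision over TCP, mimicking the
    MATLAB-to-middleware hand-off."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(result.encode("utf-8"))

t, received = serve_once()
send_classification("LIGHT_ON")   # hypothetical device command
t.join()
print(received[0])                # LIGHT_ON
```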
EEG data were acquired with an amplifier (actiCHamp, Brain Products GmbH, Germany) and 32-channel electrodes at a sampling rate of 500 Hz. EEG data were preprocessed through a pipeline in the following order: high-pass filtering (> 0.5 Hz), bad-channel rejection, common average re-referencing, low-pass filtering (< 50 Hz), and artifact subspace reconstruction (ASR). EEG data were epoched from −200 to 600 ms relative to stimulus onset, and baseline correction was performed using the data from −200 ms to onset.
After epoching, the data were standardized by removing the mean and scaling to unit variance.
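The standardization step above (zero mean, unit variance per feature) can be sketched with numpy. The epoch and feature counts are hypothetical; 400 samples corresponds to an 800 ms window at 500 Hz as described above.

```python
import numpy as np

def standardize_epochs(X):
    """Z-score each feature (channel x time point) across epochs:
    remove the mean and scale to unit variance, as done after epoching.

    X : array of shape (n_epochs, n_features)
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant features
    return (X - mu) / sigma

rng = np.random.default_rng(0)
# Hypothetical sizes: 120 epochs, 32 channels x 400 samples flattened
X = rng.normal(loc=5.0, scale=3.0, size=(120, 32 * 400))
Xs = standardize_epochs(X)
print(np.allclose(Xs.mean(axis=0), 0), np.allclose(Xs.std(axis=0), 1))  # True True
```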
Principal Component Analysis (PCA)
Deep learning approach
- Convolutional Neural Network (CNN)
- Convolutional Long-Short Term Memory (ConvLSTM)
- Stacked Auto-Encoder (SAE)
The second model applies two convolution layers, unlike the first model, and the shape of its input features also differs. Therefore, I mapped the preprocessed EEG data onto locations matching the 10-20 system EEG montage (montage mapping). Like the first model, the second model is divided into two variants (Model 2-1 and Model 2-2) according to kernel size.
This method was proposed to address the degradation problem in which deeper neural networks train poorly due to vanishing/exploding gradients. The first stage is stacked auto-encoder pre-training, which extracts only the important features of the input data. The second is a fine-tuning stage, in which supervised learning is performed on a network built from hidden layers copied from the learned weights plus new hidden layers.
In other words, the stacked auto-encoder stage reduces the dimensionality of the input data and extracts important features, and the fine-tuning stage trains the classification model. The first to third hidden layers each halve the input shape, which is then expanded again so that the output shape matches the input shape. The weights of hidden layers 1-3 were then copied from the stacked auto-encoder, and two new hidden layers were stacked on top.
Up to the third hidden layer, the weights learned in the stacked auto-encoder stage are copied and frozen, so they are not updated during the fine-tuning stage (Figure 12).
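The shape bookkeeping of the two stages can be sketched in numpy with untrained random weights. The input size (400), hidden widths, and two-class output are assumptions for illustration; the point is the halving-then-expanding encoder/decoder and the copy-and-freeze of the first three layers.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

n_in = 400  # hypothetical flattened epoch size
# Stacked auto-encoder stage: halve the width three times, then mirror
# back out so the output shape equals the input shape.
ae_sizes = [n_in, n_in // 2, n_in // 4, n_in // 8,
            n_in // 4, n_in // 2, n_in]
ae_W = [rng.normal(0, 0.01, (a, b)) for a, b in zip(ae_sizes, ae_sizes[1:])]

x = rng.normal(size=(1, n_in))
h = x
for W in ae_W:
    h = relu(h @ W)
assert h.shape == x.shape   # reconstruction has the input's shape

# Fine-tuning stage: copy the first three (encoder) layers and freeze
# them, then stack two fresh trainable layers for classification.
frozen = [(W.copy(), False) for W in ae_W[:3]]        # (weights, trainable)
new = [(rng.normal(0, 0.01, (n_in // 8, 32)), True),
       (rng.normal(0, 0.01, (32, 2)), True)]          # 2 classes: target / non-target
model = frozen + new

h = x
for W, trainable in model:
    h = relu(h @ W)       # a real classifier head would end with softmax
print(h.shape)            # (1, 2)
```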
Techniques for data balancing
- Random Over-Sampling (ROS)
- Synthetic Minority Over-sampling Technique (SMOTE)
- Borderline-SMOTE (B-SMOTE)
- Support Vector Machine SMOTE (SVM-SMOTE)
- Adaptive Synthetic sampling (ADASYN)
(Figure: borderline minority samples, filled squares; (c) synthetic borderline minority samples, hollow squares; Han et al., 2005.)
SVM-SMOTE, another version of SMOTE that uses support vectors, proposed by Nguyen et al., is similar to borderline-SMOTE except that a support vector machine is used to define the decision boundary (Nguyen et al., 2011). For each minority-class support vector, the m nearest neighbors are found and the number of majority-class instances among them is counted.
If fewer than half of those neighbors belong to the majority class, the distance between the minority-class support vector and each of the m nearest neighbors is multiplied by a random value between 0 and 1 and added to the support vector. ADASYN, an adaptive oversampling method proposed to address SMOTE's shortcomings, adjusts the number of synthetic samples to be generated for each minority-class instance according to the distribution of the surrounding majority-class instances (H. He et al., 2008).
- Random Under-Sampling (RUS)
- Neighborhood Cleaning Rule (NCR)
- Tomek’s Links (Tomek)
- Weighted Under-Sampling bagging (WUS)
Support Vector Machine (SVM)
If the data cannot be classified linearly, the support vector machine uses the kernel trick, mapping the data from a lower- to a higher-dimensional space to improve classification performance. A binary SVM model that classifies target versus non-target is fitted on the training-block data, and the target probability for each stimulus is extracted by feeding the test-block data into the trained model.
The stimulus with the highest target probability is classified as the target stimulus, and the remaining stimuli are classified as non-target stimuli.
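The target-selection rule above reduces to an argmax over per-stimulus target probabilities. A minimal numpy sketch, with hypothetical stimulus and repetition counts and simulated probabilities in place of real SVM outputs:

```python
import numpy as np

def select_target(probas):
    """probas: array of shape (n_stimuli, n_repetitions) holding the
    classifier's target-class probability for each stimulus epoch.
    Average over repetitions and pick the stimulus with the highest
    mean target probability; all others are labelled non-target."""
    mean_p = probas.mean(axis=1)
    return int(np.argmax(mean_p))

# Hypothetical 6-stimulus interface, 10 repetitions per stimulus:
rng = np.random.default_rng(1)
probas = rng.uniform(0.1, 0.4, size=(6, 10))
probas[2] += 0.3                 # stimulus 2 elicits a P300-like response
print(select_target(probas))     # 2
```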
BCI performance evaluation
Information Transfer Rate (ITR)
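The ITR reported in this thesis is conventionally computed with the standard Wolpaw formula, sketched below; the example accuracy, number of classes, and selection time are hypothetical.

```python
import math

def itr_bits_per_min(p, n, t_sec):
    """Wolpaw information transfer rate. Bits per selection:
    B = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)),
    scaled by selections per minute (60 / t_sec)."""
    if p <= 0 or n < 2:
        raise ValueError("need p > 0 and n >= 2")
    bits = math.log2(n)
    if p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / t_sec

print(round(itr_bits_per_min(1.0, 4, 10.0), 2))   # 12.0  (2 bits every 10 s)
print(round(itr_bits_per_min(0.25, 4, 10.0), 2))  # 0.0   (chance level carries no information)
```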
BCI performance improvement by better features
Performance of the all home appliances BCI system
(Figure: average accuracy over test blocks of the electric-light BCI system for each feature extraction method, averaged over all test blocks of all subjects; the red line represents the chance level.)
(Figure: average accuracy over test blocks of the Bluetooth-speaker BCI system for each feature extraction method.)
Comparing Recall, Precision, and F1 score, there was a large difference between the Raw model and the CNN Full ch model. The CNN Full ch model showed the opposite trend, meaning that most test samples were classified as targets. F1 scores were higher for models without feature extraction; that is, they showed better binary-classification performance.
Strikingly, the Morph., DCNN, ConvLSTM, and SAE models all classify the test data as non-target.
(Figure: average accuracy over test blocks of the BCI system for all home appliances for each feature extraction method.)
Comparing performances of two CNN-applied models
In this comparison, the average accuracy of the CNN Full ch model is 87.13% and that of the CNN model is 85.67% (each averaged over 10 trials).
BCI performance improvement by balancing data
Application of sampling techniques in the CNN Full ch model
In the Bluetooth-speaker BCI system, the CNN Full ch model without balancing achieved the best accuracy, and oversampling methods such as SMOTE and ADASYN significantly reduced performance. For the BCI system covering all home appliances, the model without balancing again had the highest accuracy, and all techniques except ROS and ADASYN significantly reduced it.
Application of sampling and ensemble techniques in the Raw model
In the Bluetooth-speaker BCI system, the model using borderline-SMOTE performed best, but the improvement was not significant. In the BCI system for all home appliances, borderline-SMOTE, which had the highest performance on the three home appliances, showed a marginally significant improvement. In contrast, the SVM-SMOTE, RUS, and Tomek techniques significantly reduced BCI performance (Figure 30).
The results of applying methods dealing with the data imbalance problem to Poor Performers
Information Transfer Rate (ITR) comparison
Effect of borderline-SMOTE on decision boundary
Effect of the ratio between target and non-target data
As the target ratio increases, Recall increases and Precision decreases (Figure 37). Both measures correlated strongly with the binary-classification test accuracy (correlation between Recall and test accuracy: 0.96; between Precision and test accuracy: −0.91), and Recall also showed a significant positive correlation with the accuracy of the BCI system, which is evaluated via the target probability (Table 12).
(Table: performance of the Raw model with borderline-SMOTE under different target/non-target ratios in the test set.)
The accuracy of the BCI system is the accuracy of selecting the target from the presented stimuli via the target probability.
Effect of the training duration
Here, the model whose features were extracted by a CNN with a full-channel-size kernel, learning both spatial and temporal information, performed best at 87.13% (accuracy averaged over 10 trials). The model fitted to the SVM classifier with only raw ERP waveform features also achieved 86.06%, the next-best accuracy after the 'CNN Full ch' model. The reason the CNN Full ch model's test-block accuracy was good is that the smart-home BCI system selects the icon with the highest target probability from among many icons.
In other words, binary classification may achieve better model performance without deep learning algorithms, but the CNN Full ch model may perform better at target detection via the target probability among multiple stimuli. When the data imbalance problem of the oddball paradigm was addressed with sampling or ensemble techniques, the results differed from those of feature extraction: borderline-SMOTE provided an accuracy improvement of 1.21%, comparable to the accuracy of the CNN Full ch model.
To alleviate this, applying borderline-SMOTE increased the density of target-class data near the decision boundary and reduced the tendency to classify target-class test data as non-target (G. Wu & Chang, 2003). Because the decision boundaries of the Poor Performers' models are less clear, borderline-SMOTE can sharpen them by creating synthetic data around the boundary. This suggests that the CNN model's performance depends more on the amount of training data, and that additional training blocks would yield larger accuracy gains for it than for the Raw model.
If no oversampling method is applied, the model with CNN feature extraction performs best; if oversampling is applied, fitting the raw ERP waveform can be expected to improve BCI performance.