An emotion electroencephalogram classification method based on self-supervised contrastive learning

By applying self-supervised contrastive learning, pre-training on emotional EEG signals from the SEED dataset, and attaching a multilayer perceptron classifier, the method addresses the time-consuming annotation of EEG data and achieves efficient emotion recognition and downstream task adaptation with an accuracy of up to 93.77%.

CN117204864B · Active · Publication Date: 2026-06-23 · HANGZHOU DIANZI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HANGZHOU DIANZI UNIV
Filing Date
2023-09-13
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing technologies, emotion-based brain-computer interfaces face a bottleneck in real-world applications: EEG data annotation is time-consuming and resource-intensive, so emotions must be recognized from a limited number of labels.

Method used

A self-supervised contrastive learning method is adopted. The model is pre-trained on emotional EEG signals from the SEED dataset by contrasting the representations of unlabeled EEG signals in a high-dimensional feature space. The encoder part is then retained and connected to a multilayer perceptron classifier, and the resulting model is fine-tuned for emotion recognition.

Benefits of technology

It achieved a recognition accuracy of up to 93.77% for emotional EEG signals and can be fine-tuned for other downstream tasks, reducing reliance on labeled data.



Abstract

The application discloses an emotion electroencephalogram (EEG) classification method based on self-supervised contrastive learning, comprising the following steps: step 1, preprocessing the EEG data, normalizing the data, dividing it into a training set and a test set, and dividing the EEG signal data into 1 s segments as input; step 2, constructing a pre-training framework module; step 3, using unlabeled EEG data for self-supervised contrastive pre-training; step 4, retaining the encoder part from the pre-training process and connecting a multilayer perceptron to its output to construct a one-dimensional CNN classifier; and step 5, using labeled data to train the constructed one-dimensional CNN classifier to obtain the required classification network. The method can effectively classify emotion EEG signals, with a highest recognition accuracy of 93.77%, and can also be fine-tuned for other downstream tasks.

Description

Technical Field

[0001] This invention relates to the field of emotion EEG classification technology, specifically to an emotion EEG classification method based on self-supervised contrastive learning.

Background Technology

[0002] Significant progress has been made in the field of emotion-based brain-computer interfaces, with researchers able to effectively interpret labeled electroencephalogram (EEG) data collected in controlled laboratory environments. However, the time-consuming and resource-intensive nature of EEG data annotation limits its practical application in real-world scenarios. How to identify emotions using limited labels has become a new research and application bottleneck.

Summary of the Invention

[0003] This invention addresses the shortcomings of existing technologies by proposing an emotion EEG classification method based on self-supervised contrastive learning. This method can effectively classify emotion EEG signals with a maximum recognition accuracy of 93.77%, and can also be fine-tuned for other downstream tasks.

[0004] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:

[0005] This invention uses the SEED dataset as samples. After the emotional EEG data is preprocessed, pre-training is performed. After pre-training, only the encoder part is retained, and a multilayer perceptron classifier is then connected for fine-tuning on emotion recognition and classification. The aim is to learn a high-dimensional general representation of the signal through self-supervised pre-training, by comparing the representations of unlabeled EEG signals in a high-dimensional feature space after different signal transformations. Then, in the fine-tuning stage, the model is transferred to a specific downstream classification task using only a small number of labels. Specifically, the following steps are included:

[0006] Step 1: Preprocess the EEG data, normalize the data, divide it into training and testing sets, and divide the EEG signal data into 1-second segments as input.

[0007] Step 2: Construct the pre-trained framework module.

[0008] Step 3: Perform self-supervised contrastive pre-training using unlabeled EEG data.

[0009] Step 4: Retain the encoder part from the pre-training process, connect its output to a multilayer perceptron, and construct a one-dimensional CNN classifier.

[0010] Step 5: Use labeled data to train the constructed one-dimensional CNN classifier to obtain the desired classification network.

[0011] Preferably, in step 1, the preprocessing of the EEG data includes signal noise reduction, downsampling to 200Hz, and filtering with a 0-75Hz low-pass filter. This preprocessing removes most of the noise and contamination introduced during signal acquisition.
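The segmentation and normalization part of step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the channel-by-channel flattening order, the per-segment normalization, and the `target_std` parameter are all assumptions made for the example, and the noise reduction and filtering stages are left out. The 62-channel count is taken from the experiment described later in the text.

```python
import math

def segment_and_flatten(recording, fs=200, window_s=1):
    """Split a multi-channel recording into non-overlapping windows and flatten each.

    recording: list of channels, each a list of samples at `fs` Hz.
    Returns flat segments of length n_channels * fs * window_s
    (channels concatenated one after another; this ordering is an assumption).
    """
    win = fs * window_s
    n_samples = min(len(ch) for ch in recording)
    segments = []
    for start in range(0, n_samples - win + 1, win):
        flat = []
        for ch in recording:
            flat.extend(ch[start:start + win])
        segments.append(flat)
    return segments

def normalize(segment, target_std=1.0):
    """Shift a segment to zero mean and rescale it to `target_std` (assumed per segment)."""
    mean = sum(segment) / len(segment)
    var = sum((v - mean) ** 2 for v in segment) / len(segment)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0 for _ in segment]
    return [(v - mean) / std * target_std for v in segment]

# Toy recording: 62 channels, 3 seconds at 200 Hz.
recording = [[math.sin(0.01 * t + c) for t in range(600)] for c in range(62)]
segments = [normalize(s) for s in segment_and_flatten(recording)]
print(len(segments), len(segments[0]))  # 3 12400
```

Each 1-second segment then has 62 × 200 = 12400 values, which matches the 1×12400 input dimension given in the embodiment.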

[0012] Preferably, in step 2, a pre-training framework module is constructed. The main structure of the framework is a two-stream branch structure, with one branch being a training branch and the other a target branch. The training branch consists of an encoder, a projector, and a predictor, while the target branch consists of an encoder and a projector. The parameters of the two branches are not shared. The encoder is composed of a one-dimensional convolutional neural network, while the projector and predictor are composed of a multilayer perceptron.

[0013] Preferably, in step 3, during the contrastive pre-training process, the model parameters of the training branch are updated using gradient descent, while the model parameters of the target branch are updated using the exponential moving average of the parameters of the training branch. Finally, the contrastive loss is calculated from the prediction result p_θ of the training branch and the projection result z′_ξ of the target branch:

[0014] 3-1. The EEG signal segment x is first processed by two random signal transformation methods V(x) and U(x) to obtain the transformed signals v and v'.

[0015] 3-2. High-dimensional features y and y' of two transformed signals are obtained through a one-dimensional convolutional neural network f(x).

[0016] 3-3. The EEG feature data of the training branch needs to go through a projector and a predictor to obtain feature p, while the data of the target branch only goes through a projector to obtain feature z.

[0017] 3-4. Calculate the contrastive loss using features p and z. The closer the features are, the smaller the loss, and vice versa.
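Steps 3-1 and 3-4 can be illustrated with a short sketch. Everything specific in it is an assumption: the text does not list the signal transformations or give the loss formula, so the three transforms below (`scale`, `add_noise`, `time_shift`) are hypothetical, and the loss is the normalized squared-distance form 2 − 2·cos(p, z) commonly used in two-branch methods without negative samples. The encoder, projector, and predictor of steps 3-2 and 3-3 are omitted.

```python
import math
import random

# Hypothetical transformation set; the source does not enumerate its transforms.
def scale(x, rng):
    a = rng.uniform(0.8, 1.2)
    return [a * v for v in x]

def add_noise(x, rng):
    return [v + rng.gauss(0.0, 0.05) for v in x]

def time_shift(x, rng):
    k = rng.randrange(1, len(x))
    return x[k:] + x[:k]

TRANSFORMS = [scale, add_noise, time_shift]

def two_views(x, rng):
    """Step 3-1: apply two different randomly chosen transforms to segment x."""
    t1, t2 = rng.sample(TRANSFORMS, 2)
    return t1(x, rng), t2(x, rng)

def l2_normalize(v):
    # Assumes v is not the zero vector.
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def contrastive_loss(p, z):
    """Step 3-4 (assumed form): squared distance between unit vectors, 2 - 2*cos(p, z)."""
    p, z = l2_normalize(p), l2_normalize(z)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

rng = random.Random(0)
x = [math.sin(0.1 * t) for t in range(200)]
v, v_prime = two_views(x, rng)

print(contrastive_loss([1.0, 2.0], [2.0, 4.0]))  # parallel features: loss near 0
print(contrastive_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal features: loss 2.0
```

With this form the loss shrinks as p and z point in the same direction and grows as they diverge, which is the behavior described in step 3-4.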

[0018] Preferably, in step 4, the structure of the encoder retained from the pre-training stage is as follows:

[0019] 4-1. The network starts by building a convolutional layer as input, followed by a BN layer and a ReLU layer.

[0020] 4-2. Subsequently, a residual connection module with max pooling is connected. The module mainly consists of a one-dimensional convolution, a batch normalization (BN) layer, a ReLU activation unit, a dropout layer, and another convolutional layer. The dropout layer is mainly used to randomly disable some neurons in the network to prevent overfitting.

[0021] 4-3. The third part is a one-dimensional convolutional layer module stacked 8 times. This module first goes through BN and ReLU, and then through convolution, BN, ReLU, Dropout, and convolution again, which also includes residual connections from the module's input.

[0022] 4-4. The encoder output is obtained after passing through a BN and average pooling layer.

[0023] Preferably, in step 5, the encoder output is followed by a fully connected multilayer perceptron classifier to perform the emotion classification task.

[0024] 5-1. The multilayer perceptron starts with a fully connected layer with an output dimension of 384, followed by a batch normalization (BN) layer and a ReLU layer, and then a dropout layer.

[0025] 5-2. Repeat the structure in step 5-1 once, with the dimension of the fully connected layer being 192.

[0026] 5-3. Finally, a fully connected layer is used to finish the output, which has a dimension of 3.
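The 3-dimensional output of step 5-3 can be turned into class probabilities with a softmax, as the embodiment does in its final step. The sketch below assumes nothing beyond that; the logit values are made up for illustration and the class indices carry no particular emotion labels.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.2, -0.3, 0.4]  # hypothetical 3-dimensional classifier output
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

The predicted class is the index with the largest probability, here index 0.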

[0027] This invention has the following characteristics and beneficial effects:

[0028] By employing the above technical solution, the training branch learns, imitates, and compares with the target branch during the unlabeled pre-training process, enabling it to learn high-dimensional general representations of EEG signals without requiring a large number of negative samples. This allows the pre-trained encoder to achieve excellent emotion classification accuracy even with fine-tuning using only partially labeled data. Furthermore, the learned general representations can be used for other downstream classification tasks.

Attached Figure Description

[0029] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0030] Figure 1 is a flowchart of the present invention;

[0031] Figure 2 shows the structure of the classifier in an embodiment of the present invention;

[0032] Figure 3 is a framework diagram of the pre-training framework module in an embodiment of the present invention.

Detailed Implementation

[0033] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other.

[0034] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientations or positional relationships based on the orientations or positional relationships shown in the accompanying drawings, are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined with "first," "second," etc., may explicitly or implicitly include one or more of that feature. In the description of this invention, unless otherwise stated, "a plurality of" means two or more.

[0035] In the description of this invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art will understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0036] This embodiment uses the SEED dataset as samples, preprocessing and segmenting the emotional EEG data, then normalizing the data. First, a contrastive learning method is used for pre-training, enabling the model to learn a high-dimensional, general representation of the EEG data. Then, the pre-trained encoder is extracted, other modules are discarded, and a classifier composed of multilayer perceptrons is connected. Labeled data is then used as input to fine-tune the entire model, allowing the high-dimensional feature representation obtained from pre-training to be fine-tuned for specific data distributions and classification tasks. This method can effectively classify emotional EEG signals, achieving a maximum recognition accuracy of 93.77%, and can also be fine-tuned for other downstream tasks.

[0037] Specifically, as shown in Figure 1, the method includes the following steps:

[0038] Step 1: Use an EEG acquisition device to collect human brainwave emotion signals. The EEG signal source is obtained using the international 10-20 lead system, and the sampling frequency is 1000Hz.

[0039] Step 2: Downsample the acquired EEG data to 200Hz and perform bandpass filtering from 0-75Hz.

[0040] Step 3: Segment the processed EEG data into segments of 1 second each and flatten each segment into a single vector. The input data dimension is 1×12400, which corresponds to 62 channels × 200 samples per second. The data is then normalized. The preprocessed EEG data includes unlabeled EEG data and labeled EEG data.

[0041] Step 4: The preprocessed unlabeled EEG data is input into the pre-training framework. During training, each unlabeled EEG segment undergoes two different random data augmentations to obtain two different views of the same source data. These views are then used as inputs to the two branches for training. The parameters of the training branch are updated using the gradient descent algorithm, while the target branch receives no gradient; its parameters are updated using the exponential moving average of the training-branch parameters. Given a decay rate τ∈[0,1], the parameter ξ is updated as ξ ← τξ + (1-τ)θ, where ξ on the right side is the previous parameter of the target branch and θ is the parameter of the training branch.

[0042] Specifically, as shown in Figure 3, the pre-training framework module includes a training branch and a target branch. The training branch consists of an encoder, a projector, and a predictor. The target branch consists of an encoder and a projector. The parameters of the two branches are not shared. The encoder is composed of a one-dimensional convolutional neural network, and the projector and predictor are composed of a multilayer perceptron.

[0043] Step 5: Contrastive learning learns a general representation of the data by minimizing the differences between the output features of the two branches during the training process.

[0044] Step 6: After pre-training is complete, retain only the encoder part and its parameters, and remove the other parts.

[0045] Step 7: Connect a multilayer perceptron after the encoder in the training branch to form a classifier. The parameters of the multilayer perceptron are randomly initialized.

[0046] Step 8: Use labeled EEG data to perform regular fully supervised training on the classifier built in Step 7 to obtain a one-dimensional convolutional neural network model that can use EEG data for emotion recognition.
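The target-branch update described in Step 4, ξ = τξ + (1-τ)θ, can be sketched as below. The function name `ema_update`, the flat parameter lists, and the value τ = 0.9 are assumptions for illustration; in practice the same rule would be applied to every weight tensor of the target encoder and projector.

```python
def ema_update(target, online, tau):
    """Return updated target parameters: xi <- tau * xi + (1 - tau) * theta."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [tau * xi + (1.0 - tau) * th for xi, th in zip(target, online)]

theta = [1.0, -2.0, 0.5]  # training-branch parameters (updated by gradient descent)
xi = [0.0, 0.0, 0.0]      # target-branch parameters (receive no gradient)

# With theta held fixed, the target drifts toward it a little on every step.
for _ in range(3):
    xi = ema_update(xi, theta, tau=0.9)
print(xi)
```

A decay rate close to 1 makes the target branch change slowly, so it acts as a smoothed copy of the training branch; τ = 1 freezes the target and τ = 0 copies the training branch outright.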

[0047] Specifically, as shown in Figure 2, the encoding method of the encoder in the constructed classifier is as follows:

[0048] S8-1. The network starts by building a convolutional layer as the entry point. The input is the EEG data after data segmentation, with an input dimension of 1×12400. Then, a batch normalization (BN) layer and a ReLU activation layer are connected. The one-dimensional convolutional layer with BN and ReLU has 12 filters, a convolutional kernel of 1*32, a stride of 1*1, and an output dimension of 12×6200.

[0049] S8-2. Next is a residual connection module with max pooling, which takes the output of S8-1 as input. The residual connection module mainly consists of a one-dimensional convolution, a BN layer, a ReLU activation unit, a Dropout layer, and another convolutional layer. The main path starts with a convolutional layer, goes through BN and ReLU activation, performs Dropout, and ends with another one-dimensional convolutional layer; the max-pooled input is then added to its output as the residual connection. The max pooling kernel size is 1*3, the stride is 2, the convolutional kernel size is 1*33, and the output dimension is 24×3100.

[0050] S8-3. Next is a set of convolutional layer modules stacked 8 times. The convolutional layer module first goes through BN and ReLU, and then through convolution, BN, ReLU, Dropout, and convolution. It also contains residual blocks from the input of the convolutional layer modules stacked 8 times.

[0051] S8-4. The output obtained from S8-3 is then passed through a BN layer and an average pooling layer to obtain the encoder output, which has a dimension of 1×384.

[0052] S8-5. The multilayer perceptron starts with a fully connected layer with a 384-dimensional input feature, followed by a BN layer and a ReLU layer, and then a Dropout layer.

[0053] S8-6. Repeat the structure in step S8-5 once, where the output dimension of the fully connected layer is 192-dimensional.

[0054] S8-7. Finally, a fully connected layer is used to finish the output, which has a dimension of 3 and is then processed by softmax to obtain the classification result.
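The sequence lengths quoted in S8-1 and S8-2 (12400, then 6200, then 3100) can be checked with the standard output-length formula for one-dimensional convolution and pooling. The padding values below are not given in the text and are assumptions chosen so that the arithmetic reproduces the stated lengths. The stride of 2 used for the first convolution is likewise an assumption: S8-1 lists a stride of 1*1, which under this formula would leave the length close to 12400 rather than halving it.

```python
def out_len(n, kernel, stride, padding):
    """Output length of a 1-D convolution or pooling layer (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1

n0 = 62 * 200                                      # 62 channels x 200 samples
n1 = out_len(n0, kernel=32, stride=2, padding=15)  # first convolution (assumed stride and padding)
n2 = out_len(n1, kernel=3, stride=2, padding=1)    # max pooling (assumed padding)
print(n0, n1, n2)  # 12400 6200 3100
```

Under these assumptions each stage halves the sequence length, consistent with the 12×6200 and 24×3100 output dimensions stated above.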

[0055] The entire process is shown in the flowchart of Figure 1. Taking the SEED dataset as an example, 16 subjects were selected for the experiment. Each subject participated in 3 experiments, with 62 channels recorded in each experiment. After preprocessing and data segmentation, the samples were used as input to the model network. Table 1 compares the three-class classification performance with that of a fully supervised classification model on the SEED dataset.

[0056] Table 1 Comparison of model classification accuracy

[0057]
Classification model                           Accuracy
ResNet (2021)                                  93.43%
Model in this study, without pre-training      90.64%
Model in this study, with pre-training         93.77%

[0058] The self-supervised learning model, combining pre-training and fine-tuning, achieved a respectable accuracy of 93.77% in the three-class classification of emotion recognition, representing an improvement over other classification algorithms. Table 2 compares the model's three-class classification performance using different percentages of labels during the fine-tuning phase, based on the SEED dataset.

[0059] Table 2 Comparison of classification accuracy of models with different label rates

[0060]

[0061] It can be concluded that, after pre-training, the model can still achieve considerable accuracy when fine-tuned using only part of the labeled data, thus reducing the need for data labels in subsequent related research.
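Fine-tuning at a given label rate requires choosing which labeled segments to keep. The text does not say how the subsets were drawn, so the sketch below is only one plausible way to do it: a class-stratified random subset. The function name `subsample_labels`, the stratification, the seed, and the 10% rate are all assumptions for illustration.

```python
import random
from collections import defaultdict

def subsample_labels(labels, rate, seed=0):
    """Return sorted indices of a class-stratified subset covering `rate` of each class."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(rate * len(idxs)))  # keep at least one sample per class
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)

labels = [0, 1, 2] * 100  # toy labels: 3 classes, 100 segments each
subset = subsample_labels(labels, rate=0.1)
print(len(subset))  # 30
```

Only the segments at the returned indices would be used with their labels during fine-tuning; the remaining segments can still contribute to pre-training, which needs no labels.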

[0062] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. For those skilled in the art, various changes, modifications, substitutions, and variations can be made to these embodiments, including components, without departing from the principles and spirit of the present invention, and these variations still fall within the protection scope of the present invention.

Claims

1. A method for classifying emotion-related EEG patterns based on self-supervised contrastive learning, characterized in that it includes the following steps:
S1. Acquire and preprocess EEG data. Segment the preprocessed EEG data into segments with a time window of 1 second in length and perform normalization transformation. The EEG data includes unlabeled EEG data and labeled EEG data.
S2. Construct a pre-training framework module, which includes a training branch and a target branch. The training branch consists of an encoder, a projector, and a predictor. The target branch consists of an encoder and a projector. The parameters of the two branches are not shared. The encoder is composed of a one-dimensional convolutional neural network. The projector and predictor are composed of a multilayer perceptron.
S3. Input unlabeled EEG data collected in the laboratory according to standard experimental paradigms into the pre-training framework module for self-supervised contrastive pre-training.
S4. Retain the encoder of the training branch from the pre-training process, and connect the output of the encoder of the training branch to a multilayer perceptron to form a one-dimensional convolution classifier.
S5. Train the constructed one-dimensional convolution classifier using labeled EEG data collected in the laboratory according to standard experimental paradigms to obtain the required classification network.

2. The emotion EEG classification method based on self-supervised contrastive learning according to claim 1, characterized in that, The preprocessing method for the EEG data includes principal component analysis, noise reduction, downsampling to 200Hz, and filtering with a 0-75Hz low-pass filter.

3. The emotion EEG classification method based on self-supervised contrastive learning according to claim 1, characterized in that, in step S1, the segmented EEG data are normalized to a mean of 0 and a variance of 0.5.

4. The emotion EEG classification method based on self-supervised contrastive learning according to claim 1, characterized in that, The unlabeled and labeled EEG data were collected in the same way: the subjects watched relevant film clips to induce related emotions, and then the data was collected by an EEG acquisition device.

5. The emotion EEG classification method based on self-supervised contrastive learning according to claim 1, characterized in that the specific method for step S3 is as follows:
S3-1. First, randomly select any two signal transformation methods V(x) and U(x) from the available signal transformation method set and apply them to the segmented signal x of the normalized EEG data to obtain the transformed signals v and u.
S3-2. Signals v and u are convolved through a one-dimensional convolutional neural network f(x) to obtain high-dimensional features y and y' of the two transformed signals.
S3-3. The high-dimensional feature y passes through a projector consisting of a multilayer perceptron and a predictor in the training branch to obtain feature p. The high-dimensional feature y' passes through the projector in the target branch to obtain feature z.
S3-4. The contrastive loss is calculated from feature p and feature z according to a formula that is not reproduced in this text: the closer the features are, the smaller the loss, and vice versa.

6. The emotion EEG classification method based on self-supervised contrastive learning according to claim 5, characterized in that the model parameters of the training branch are updated via backpropagation using gradient descent, while the model parameters of the target branch are updated using the exponential moving average of the parameters of the training branch.

7. The emotion EEG classification method based on self-supervised contrastive learning according to claim 5, characterized in that, in step S4, the encoder's encoding method is as follows:
S4-1. The network starts with a convolutional layer as the entry point, and the input is the data y after signal processing in S3-2. The input dimension is 1×12400. Then, a batch normalization layer and a ReLU activation layer are connected, and the output dimension is 12×6200.
S4-2. Next is a residual connection module with max pooling. The input is the output from S4-1. The residual connection module includes a one-dimensional convolution, a BN layer, a ReLU activation unit, a Dropout layer, and another convolutional layer. The output dimension is 24×3100.
S4-3. Next is a convolutional layer module that is stacked 8 times and takes the output of S4-2 as input. This convolutional layer module first passes through BN and ReLU, and then through convolution, BN, ReLU, Dropout, and convolution. It also contains a residual connection from the input of the module.
S4-4. The output obtained from S4-3 is then passed through a BN layer and an average pooling layer to obtain the encoder output, which has a dimension of 384.

8. The emotion EEG classification method based on self-supervised contrastive learning according to claim 7, characterized in that, in step S5, the classification method of the one-dimensional convolution classifier is as follows:
S5-1. The multilayer perceptron starts with a fully connected layer whose input is the output of the encoder in S4-4 and whose output feature vector has a dimension of 384. It is followed by a batch normalization layer, a ReLU activation layer, and then a Dropout layer. The output feature vector still has a dimension of 384.
S5-2. Repeat the structure in step S5-1 once. The input is the 384-dimensional output of S5-1, and the dimension of the fully connected layer is 192. The output is a 192-dimensional feature vector.
S5-3. Finally, a fully connected layer produces the output. Its input is the 192-dimensional feature vector output from S5-2, and its output dimension is 3. After a softmax operation, the classification result is obtained.