Hybrid brain-computer interface control method and device based on motor imagery and steady-state visual evoked potential, and computer readable storage medium

By combining adaptive filtering and independent component analysis with wavelet transform, fast Fourier transform, and a fusion model of bidirectional long short-term memory network and attention mechanism, the problems of poor adaptability and insufficient robustness in traditional brain-computer interface control methods are solved, and efficient EEG signal processing and accurate control command generation are achieved.

CN122219771A · Pending · Publication Date: 2026-06-16 · CHINA ACADEMY OF INFORMATION & COMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA ACADEMY OF INFORMATION & COMM
Filing Date
2026-03-23
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing brain-computer interface control methods based on motor imagery and steady-state visual evoked potentials rely on traditional classification algorithms that adapt poorly to complex EEG signals. They fail to address differences in EEG signals between users and over time for the same user, resulting in large fluctuations in recognition rate and insufficient system robustness.

Method used

A preprocessing method combining adaptive filtering and independent component analysis is adopted, and features are extracted by combining wavelet transform and fast Fourier transform. A fusion model of bidirectional long short-term memory network and attention mechanism is used for feature weighting and fusion, and control commands are generated through an ensemble learning classifier.

Benefits of technology

It improves the purity of EEG signals, reduces noise interference, enhances the robustness and recognition accuracy of the system in complex scenarios, adapts to the differences in EEG signals of different users and the signal changes of the same user at different times, and reduces the fluctuation of recognition rate.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122219771A_ABST
Patent Text Reader

Abstract

The application relates to the technical field of brain-computer interfaces, and particularly relates to a hybrid brain-computer interface control method and device based on motor imagery and steady-state visual evoked potentials, and a computer readable storage medium. The control method comprises the following steps: acquiring original electroencephalogram signals of a user, wherein the original electroencephalogram signals comprise motor imagery electroencephalogram signals and steady-state visual evoked potential signals; performing adaptive filtering processing and independent component analysis processing on the collected original electroencephalogram signals; performing wavelet transform on the processed motor imagery signals to extract time-frequency features in a predetermined frequency band; performing fast Fourier transform on the processed steady-state visual evoked potential signals to extract frequency domain features at a stimulation frequency and a harmonic frequency; inputting the extracted time-frequency features and frequency domain features into a fusion model constructed based on a bidirectional long short-term memory network and an attention mechanism to perform weighted fusion, to obtain fused features; and inputting the fused features into a learning classifier to perform classification, to generate a control instruction.

Description

Technical Field

[0001] This application relates to the field of brain-computer interface technology, and in particular to a hybrid brain-computer interface control method and device based on motor imagery and steady-state visual evoked potentials, and a computer-readable storage medium.

Background Technology

[0002] Brain-computer interface is a human-computer interaction method that does not rely on peripheral nerves and muscle tissue. It collects and analyzes electrical signals generated by the brain, converts them into control commands, and enables control of external devices.

[0003] In related technologies, a hybrid brain-computer interface control method based on Motor Imagery (MI) and Steady-State Visual Evoked Potential (SSVEP) simultaneously acquires the user's MI and SSVEP signals using an electroencephalogram (EEG) device; preprocesses the raw signals by filtering and noise reduction; extracts event-related desynchronization / synchronization features from the MI signals and extracts harmonic components or signal-to-noise ratio features of specific frequencies from the SSVEP signals; performs feature fusion using weighted averaging or serial fusion; and uses traditional classification algorithms such as support vector machines or linear discriminant analysis for command recognition and output.

[0004] During implementation, at least the following problems exist: Traditional classification algorithms are poorly adapted to complex EEG signals, exhibiting significant fluctuations in recognition rates among different users and at different times for the same user, demonstrating insufficient robustness. Furthermore, the signal processing workflow is cumbersome, with inefficient preprocessing algorithms such as filtering and noise reduction.

[0005] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0006] To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not intended as a general commentary, nor is it intended to identify key / important components or describe the scope of protection of these embodiments, but rather as a prelude to the detailed description that follows.

[0007] This disclosure provides a hybrid brain-computer interface control method and device based on motor imagery and steady-state visual evoked potentials, as well as a computer-readable storage medium. They address the following technical problem in related technologies: a single traditional classification algorithm, such as a support vector machine or linear discriminant analysis, adapts poorly to complex EEG signals and cannot effectively cope with individual differences in EEG signals between users or with changes in the same user's EEG signals over time, resulting in large fluctuations in command recognition rate and insufficient system robustness.

[0008] In some embodiments, a hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials is provided, comprising: acquiring the user's raw EEG signals, the raw EEG signals including motor imagery EEG signals and steady-state visual evoked potential signals; performing adaptive filtering and independent component analysis on the acquired raw EEG signals; performing wavelet transform on the processed motor imagery signals to extract time-frequency features within a predetermined frequency band; performing fast Fourier transform on the processed steady-state visual evoked potential signals to extract frequency domain features at the stimulus frequency and harmonic frequencies; inputting the extracted time-frequency features and frequency domain features into a fusion model constructed based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain fused features; and inputting the fused features into a learning classifier for classification to generate control commands for controlling external devices.

[0009] Optionally, the acquired raw EEG signals are subjected to adaptive filtering and independent component analysis, including: using a normalized least mean square adaptive filter, with a sine and cosine signal synchronized with power frequency interference as a reference, to estimate and filter out power frequency interference components; using the FastICA algorithm to perform blind source separation on the adaptively filtered signal, identifying and removing independent components related to electrooculography and electromyography, and then reconstructing the EEG signal.

[0010] Optionally, the extracted time-frequency features and frequency domain features are input into a fusion model constructed based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain fused features. This includes: inputting the time-frequency feature sequence and the frequency domain feature sequence into two independent bidirectional long short-term memory networks respectively to learn their respective temporal context representations; calculating the original attention weights for the temporal context representation at each time step; calculating the recognition confidence of the time-frequency features and frequency domain features at each time step in real time using an auxiliary classifier; multiplying the original attention weights by the corresponding recognition confidences and then normalizing them to obtain fusion weights; and using the fusion weights to perform a weighted summation of the temporal context representations to output a fused feature vector.

[0011] Optionally, the step of calculating the recognition confidence of time-frequency features and frequency domain features at each time step in real time through an auxiliary classifier includes: inputting the temporal context representation of each time step into the auxiliary classifier, which consists of a fully connected layer and a softmax function, outputting the probability of each category corresponding to the feature stream at that time step, and taking the maximum value as the recognition confidence at that time step. Optionally, the step of multiplying the original attention weights with the corresponding recognition confidence and then normalizing them to obtain the fusion weights includes: for each time step of each feature stream, multiplying the original attention weights with the recognition confidence corresponding to that time step to obtain a corrected attention score; performing softmax normalization on the corrected attention scores of all time steps of each feature stream to obtain the fusion weights of each time step of the feature stream.

[0012] Optionally, the step of using dynamic fusion weights to perform weighted summation of the temporal context representations and outputting a fused feature vector includes: for each feature stream, using the fusion weights of the feature stream to perform weighted summation of the temporal context representations at each time step to obtain the fused context representation of the feature stream; combining the fused context representations corresponding to the time-frequency feature streams and the fused context representations corresponding to the frequency domain feature streams to output the final fused feature vector; wherein the combination is to take the average of the two fused context representations, or to concatenate the two fused context representations and then reduce their dimensionality through a linear transformation.
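The two combination options named above can be sketched as follows. This is only an illustration: the function name `combine_streams`, the vector length of 64, and the random projection matrix `W` are assumptions, and in a trained model the projection would be learned.

```python
import numpy as np

def combine_streams(c_tf, c_fd, proj=None):
    """Combine two fused context vectors of equal length d.

    proj is None: element-wise average (output length d).
    proj given:   concatenate to length 2d, then apply a linear map
                  of shape (d_out, 2d) to reduce the dimension.
    """
    c_tf = np.asarray(c_tf, dtype=float)
    c_fd = np.asarray(c_fd, dtype=float)
    if proj is None:
        return 0.5 * (c_tf + c_fd)
    return np.asarray(proj, dtype=float) @ np.concatenate([c_tf, c_fd])

# Illustrative values only (random stand-ins for the fused context vectors).
rng = np.random.default_rng(0)
c_tf, c_fd = rng.normal(size=64), rng.normal(size=64)
avg = combine_streams(c_tf, c_fd)
W = rng.normal(size=(64, 128))  # hypothetical projection, untrained
reduced = combine_streams(c_tf, c_fd, proj=W)
```

Averaging adds no parameters, while the concatenate-and-project option lets the model weight the two streams differently per dimension at the cost of a learned matrix.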

[0013] Optionally, the steps of constructing a fusion model based on bidirectional long short-term memory networks and attention mechanisms include: constructing two parallel bidirectional long short-term memory networks; constructing a confidence evaluation subnetwork consisting of fully connected layers and a softmax function; and constructing an attention fusion subnetwork.

[0014] Optionally, the fused features are input into a learning classifier for classification to generate control commands, including: inputting the fused features into a support vector machine model and a random forest model respectively to obtain preliminary class decision values and class probabilities; concatenating the class decision values output by the support vector machine model with the class probability vector output by the random forest model to form a new feature vector; inputting the new feature vector into a meta-learner to obtain the final classification result, and generating corresponding control commands based on the classification result.
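The stacking scheme described above might look like the following scikit-learn sketch. Several choices here are assumptions that the text does not specify: the meta-learner type (logistic regression), the use of out-of-fold predictions to train it, the RBF kernel, the synthetic Gaussian data standing in for fused features, and the helper name `predict_command`. The four command labels are taken from the example categories used elsewhere in this description.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

COMMANDS = ["forward", "backward", "left turn", "right turn"]

# Synthetic stand-in for fused feature vectors: 4 well-separated classes.
rng = np.random.default_rng(0)
n_per_class, n_classes, dim = 40, 4, 8
centers = rng.normal(scale=4.0, size=(n_classes, dim))
X = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

svm = SVC(kernel="rbf", decision_function_shape="ovr")
rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Out-of-fold outputs, so the meta-learner is not fitted on base-model
# predictions for samples those base models were trained on.
svm_dec = cross_val_predict(svm, X, y, cv=5, method="decision_function")
rf_prob = cross_val_predict(rf, X, y, cv=5, method="predict_proba")
meta_X = np.hstack([svm_dec, rf_prob])  # SVM decision values + RF probabilities

meta = LogisticRegression(max_iter=1000).fit(meta_X, y)

# Refit the base models on all data for use at prediction time.
svm.fit(X, y)
rf.fit(X, y)

def predict_command(x):
    """Map one fused feature vector to a control command string."""
    x = np.atleast_2d(x)
    z = np.hstack([svm.decision_function(x), rf.predict_proba(x)])
    return COMMANDS[int(meta.predict(z)[0])]
```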

[0015] In some embodiments, a hybrid brain-computer interface control device based on motor imagery and steady-state visual evoked potentials is provided, including a processor and a memory storing program instructions, characterized in that the processor is configured to execute the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials as described in any of the above embodiments when running the program instructions.

[0016] In some embodiments, a computer-readable storage medium is provided storing program instructions that, when executed, cause a computer to perform the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials as described in any of the above embodiments.

[0017] The hybrid brain-computer interface control method and device based on motor imagery and steady-state visual evoked potentials, and the computer-readable storage medium provided in this disclosure can achieve the following technical effects: The method employs a preprocessing approach combining adaptive filtering and independent component analysis (ICA), both of which are efficient signal processing algorithms. Adaptive filtering tracks and filters out power frequency interference in real time, while ICA efficiently separates EEG signals from artifacts before reconstruction. Compared to existing ordinary filtering and noise reduction methods, this more thoroughly removes noise such as power frequency interference, electrooculogram (EOG) artifacts, and electromyogram (EMG) artifacts, resulting in high-purity EEG signals. The purified signal reduces noise interference in feature extraction, fusion, and classification, enabling the classifier to recognize commands based on genuine features, avoiding recognition errors caused by noise, and further improving the system's robustness in complex scenarios. Wavelet transform and fast Fourier transform are matched to the characteristics of MI and SSVEP signals, respectively. Each transform is efficient and targeted for its corresponding signal, capturing the core features of the signal in a short time and avoiding redundant calculation of irrelevant features. Compared to existing feature extraction methods, this simplifies the computational logic and reduces the time cost of feature extraction. Instead of a single traditional classification algorithm, an ensemble learning classifier is used to classify the fused features. This combines the advantages of multiple algorithms and compensates for the limitations of any single algorithm. It has stronger learning and recognition capabilities for complex EEG signal features and adapts to differences in EEG signals among different users and to signal variations over time for the same user, significantly reducing fluctuations in recognition rate and improving system robustness.

[0018] The above general description and the description below are exemplary and illustrative only and are not intended to limit this application.

Attached Figure Description

[0019] One or more embodiments are illustrated by way of example with reference to the accompanying drawings. These illustrations and drawings do not constitute a limitation on the embodiments. Elements having the same reference numerals in the drawings denote similar elements. The drawings are not necessarily to scale. In the drawings:
Figure 1 is a flowchart of a hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials provided in an embodiment of this disclosure;
Figure 2 is a schematic flowchart of adaptive filtering and independent component analysis processing of the acquired raw EEG signals provided in an embodiment of this disclosure;
Figure 3 is a flowchart of constructing a fusion model based on a bidirectional long short-term memory network and an attention mechanism provided in an embodiment of this disclosure;
Figure 4 is a flowchart of inputting the extracted time-frequency and frequency domain features into the fusion model based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain fused features, provided in an embodiment of this disclosure;
Figure 5 is a schematic flowchart of inputting the fused features into a learning classifier for classification and generating control commands, provided in an embodiment of this disclosure;
Figure 6 is a schematic diagram of a hybrid brain-computer interface control system based on motor imagery and steady-state visual evoked potentials provided in an embodiment of this disclosure;
Figure 7 is a schematic diagram of a hybrid brain-computer interface control device based on motor imagery and steady-state visual evoked potentials provided in an embodiment of this disclosure.

Detailed Implementation

[0020] To provide a more detailed understanding of the features and technical content of the embodiments of this disclosure, the implementation of the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings. The accompanying drawings are for illustrative purposes only and are not intended to limit the embodiments of this disclosure. In the following technical description, for ease of explanation, several details are used to provide a full understanding of the disclosed embodiments. However, one or more embodiments may still be implemented without these details. In other cases, well-known structures and devices may be simplified in their depiction to simplify the drawings.

[0021] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate for the embodiments of this disclosure described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion.

[0022] Unless otherwise stated, the term "multiple" means two or more.

[0023] In this embodiment of the disclosure, the character " / " indicates that the objects before and after it are in an "or" relationship. For example, A / B means: A or B.

[0024] The term "and / or" describes an association between objects, indicating that three relationships can exist. For example, A and / or B means: A or B, or A and B.

[0025] The term "correspondence" can refer to an association or binding relationship. The correspondence between A and B means that there is an association or binding relationship between A and B.

[0026] In some embodiments, as shown in Figure 1, a hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials is provided, including: S101, acquire the user's raw EEG signals, which include motor imagery EEG signals and steady-state visual evoked potential signals.

[0027] Optionally, the user's raw EEG signals are collected in real time by contacting the user's scalp with the multi-channel EEG (Electroencephalogram) electrodes of the EEG acquisition module. The sampling frequency is set to 500Hz. The raw EEG signals include motor imagery (MI) EEG signals and steady-state visual evoked potential (SSVEP) EEG signals.

[0028] S102, perform adaptive filtering and independent component analysis on the acquired raw EEG signals.

[0029] S103, perform wavelet transform on the processed motor imagery signal to extract time-frequency features within a predetermined frequency band; perform fast Fourier transform on the processed steady-state visual evoked potential signal to extract frequency domain features at the stimulus frequency and harmonic frequencies.

[0030] Optionally, wavelet transform is performed on the processed motor imagery signal, with the db4 wavelet basis function selected to perform a 5-level decomposition of the signal. Energy and phase features within a predetermined frequency band of 8-30 Hz are extracted and combined to form a time-frequency feature sequence of the motor imagery signal. The introduction of phase features increases the feature dimension, which helps improve classification accuracy.
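As a worked illustration of how a 5-level decomposition relates to the 8-30 Hz band at the 500 Hz sampling frequency, the nominal sub-band edges can be computed as below. This is a sketch under the usual dyadic approximation, in which detail level j spans roughly fs/2^(j+1) to fs/2^j; real db4 filters are not ideal brick-wall filters, so the edges are approximate. The function names are hypothetical.

```python
def dwt_bands(fs, levels):
    """Nominal frequency range (Hz) of each discrete wavelet sub-band.

    Detail D_j spans roughly [fs / 2**(j + 1), fs / 2**j]; the final
    approximation A_levels spans [0, fs / 2**(levels + 1)].
    """
    bands = {f"D{j}": (fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)}
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands

def bands_overlapping(bands, f_lo, f_hi):
    """Names of sub-bands whose nominal range overlaps [f_lo, f_hi]."""
    return [name for name, (lo, hi) in bands.items() if lo < f_hi and hi > f_lo]

bands = dwt_bands(fs=500.0, levels=5)
selected = bands_overlapping(bands, 8.0, 30.0)
print(selected)     # ['D4', 'D5']
print(bands["D4"])  # (15.625, 31.25)
print(bands["D5"])  # (7.8125, 15.625)
```

Under this approximation, detail levels 4 and 5 together cover about 7.8-31.25 Hz, which is close to the stated 8-30 Hz band.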

[0031] Optionally, a Fast Fourier Transform (FFT) is performed on the processed steady-state visual evoked potential (SSVEP) signal to convert the time-domain signal into a frequency-domain signal. The amplitude and phase features at the visual stimulus frequency and its second and third harmonic frequencies are extracted and combined to form a frequency-domain feature sequence of the SSVEP signal. By utilizing the amplitude features at the stimulus frequency and harmonic frequencies, the harmonic characteristics of the SSVEP signal are leveraged to enhance the robustness of frequency recognition. Simultaneously, extracting phase features increases the feature dimension, which helps improve discriminative power in multi-target classification.
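A minimal sketch of this extraction, assuming NumPy and a hypothetical 10 Hz stimulus frequency (the text does not give specific stimulus frequencies), reads the amplitude and phase from the FFT bins nearest the fundamental and its second and third harmonics. The function name `ssvep_features` is an assumption.

```python
import numpy as np

def ssvep_features(x, fs, f_stim, n_harmonics=3):
    """Amplitude and phase at f_stim and its harmonics.

    Returns an array of shape (n_harmonics, 2): [amplitude, phase in rad].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    feats = []
    for h in range(1, n_harmonics + 1):
        k = int(np.argmin(np.abs(freqs - h * f_stim)))  # nearest FFT bin
        amp = 2.0 * np.abs(spectrum[k]) / n             # single-sided amplitude
        phase = np.angle(spectrum[k])
        feats.append((amp, phase))
    return np.array(feats)

# Synthetic 2-second window at 500 Hz with known amplitudes and phases.
fs = 500.0
t = np.arange(1000) / fs
x = (2.0 * np.cos(2 * np.pi * 10 * t + 0.5)
     + 1.0 * np.cos(2 * np.pi * 20 * t)
     + 0.5 * np.cos(2 * np.pi * 30 * t - 1.0))
features = ssvep_features(x, fs, f_stim=10.0)
```

With a 2-second window the frequency resolution is 0.5 Hz, so stimulus frequencies that are multiples of 0.5 Hz fall exactly on a bin; other frequencies would leak across neighbouring bins.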

[0032] S104, input the extracted time-frequency features and frequency domain features into the fusion model constructed based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain the fused features; S105, input the fused features into the learning classifier for classification and generate control commands to control external devices.

[0033] The hybrid brain-computer interface control method based on motor imagery (MI) and steady-state visual evoked potentials (SSVEP) provided in this disclosure simultaneously acquires two types of EEG signals: MI and SSVEP. This approach takes into account the characteristics of both signals, laying the foundation for subsequent precise control. A preprocessing method combining adaptive filtering and independent component analysis (ICA) effectively removes power frequency interference and electrooculography (EOG) and electromyography (EMG) artifacts, improving the purity of the EEG signals compared to traditional single-filtering methods. Wavelet transform is used to extract the time-frequency features of the MI signal, simultaneously capturing energy and phase information. Fast Fourier transform is used to extract the frequency domain features of the SSVEP signal, including harmonic amplitude and phase information, improving the robustness of frequency recognition. A fusion model based on bidirectional long short-term memory (BiLSTM) networks and an attention mechanism can learn the temporal dependencies of the two signals and perform weighted fusion. This exploits the dynamic characteristics and complementarity of the two types of signals, which traditional simple fusion strategies fail to fully utilize. A learning classifier is used to classify the fused features, improving classification accuracy. Thus, by adopting the control method disclosed herein, control accuracy is effectively improved, control delay is reduced, and the system's adaptability to different scenarios and users is enhanced.

[0034] As shown in Figure 2, a workflow for adaptive filtering and independent component analysis of the acquired raw EEG signals is provided, including: S201, employ a normalized least mean square adaptive filter, using sine and cosine signals synchronized with the power frequency interference as a reference, to estimate and filter out the power frequency interference component.

[0035] Optionally, a Normalized Least Mean Square (NLMS) adaptive filter with order 32 and step size factor of 0.01 is used. A sine and cosine signal synchronized with 50Hz power frequency interference is used as a reference signal. The power frequency interference component is estimated in real time through an adaptive algorithm and then filtered out from the original EEG signal.
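The filter described above can be sketched with NumPy as follows. The text gives the order (32), the step size factor (0.01), and the 50 Hz sine and cosine reference, but not how the 32 weights are arranged; this sketch assumes 16 delayed taps of the sine reference and 16 of the cosine reference. The function name `nlms_remove_mains` and the regularization constant `eps` are also assumptions.

```python
import numpy as np

def nlms_remove_mains(x, fs=500.0, f0=50.0, order=32, mu=0.01, eps=1e-8):
    """Cancel f0 interference from one EEG channel with an NLMS filter.

    The reference vector holds order/2 delayed samples of sin and of cos
    at f0. The error signal e[n] = x[n] - w.u[n] is the cleaned output.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))[:, None]
    k = np.arange(order // 2)[None, :]
    arg = 2.0 * np.pi * f0 * (n - k) / fs
    U = np.hstack([np.sin(arg), np.cos(arg)])  # shape (N, order)
    w = np.zeros(order)
    cleaned = np.empty_like(x)
    for i, u in enumerate(U):
        e = x[i] - w @ u                 # subtract interference estimate
        w += mu * e * u / (eps + u @ u)  # normalized LMS weight update
        cleaned[i] = e
    return cleaned

# Synthetic check: a 10 Hz "EEG" rhythm plus strong 50 Hz interference.
fs = 500.0
t = np.arange(4000) / fs
raw = np.sin(2 * np.pi * 10 * t) + 5.0 * np.sin(2 * np.pi * 50 * t + 0.7)
out = nlms_remove_mains(raw)
```

Because the weights adapt continuously, the filter behaves like a narrow notch around 50 Hz whose width grows with the step size, so a small step size such as 0.01 leaves nearby EEG rhythms largely untouched at the cost of slower convergence.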

[0036] S202, use the FastICA algorithm to perform blind source separation on the adaptively filtered signal, identify and remove independent components related to electrooculography and electromyography, and then reconstruct the electroencephalogram (EEG) signal.

[0037] Optionally, the FastICA algorithm is used to perform blind source separation on the adaptively filtered EEG signal, decompose the signal into multiple independent components, identify and remove artifact independent components related to electrooculography and electromyography through preset feature thresholds, and reconstruct the remaining effective independent components to obtain the purified EEG signal.

[0038] In this embodiment, by employing a normalized least mean square adaptive filter, changes in power frequency interference can be tracked in real time. Compared to traditional fixed-parameter filtering, it has a better effect on removing power frequency interference and does not cause excessive attenuation of the EEG signal itself. The FastICA algorithm, as a highly efficient blind source separation algorithm, can quickly separate EEG signals from artifact components such as electrooculography (EOG) and electromyography (EMG). After accurately identifying and removing artifact components, the signal is reconstructed, effectively improving the efficiency and effect of signal preprocessing, significantly reducing the interference of artifacts on subsequent feature extraction and fusion, and providing high-quality EEG signals for subsequent steps. At the same time, the efficiency of the preprocessing algorithm can also reduce the overall processing time and control latency.

[0039] For example, the FastICA algorithm is used to perform blind source separation on a 2-second sliding window of data (16 channels × 1000 points), decomposing it into 16 independent components. Based on their spectral characteristics, components dominated by low frequencies (<5 Hz) are identified as electrooculogram (EOG) artifacts, and components dominated by high frequencies (>30 Hz) are identified as electromyogram (EMG) artifacts. These components are set to zero, and the purified EEG signal is reconstructed from the remaining components.
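The identification and zeroing step might be sketched as below, assuming the independent components and the mixing matrix have already been obtained from a FastICA implementation. Using the peak of each component's power spectrum as the "spectral characteristic" is an assumption, as is the function name `drop_artifact_components`; the 5 Hz and 30 Hz thresholds come from the example above.

```python
import numpy as np

def drop_artifact_components(sources, mixing, fs, eog_max_hz=5.0, emg_min_hz=30.0):
    """Zero components whose dominant frequency suggests EOG or EMG.

    sources: (n_components, n_samples) independent components.
    mixing:  (n_channels, n_components) mixing matrix.
    Returns the reconstructed channel data and a boolean keep mask.
    """
    sources = np.asarray(sources, dtype=float)
    freqs = np.fft.rfftfreq(sources.shape[1], d=1.0 / fs)
    centered = sources - sources.mean(axis=1, keepdims=True)  # ignore DC offset
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    peak_hz = freqs[np.argmax(power, axis=1)]
    keep = (peak_hz >= eog_max_hz) & (peak_hz <= emg_min_hz)
    cleaned_sources = sources * keep[:, None]  # zero the flagged components
    return np.asarray(mixing, dtype=float) @ cleaned_sources, keep

# Synthetic check with three known sources mixed into 16 channels.
fs = 500.0
t = np.arange(1000) / fs
S = np.vstack([np.sin(2 * np.pi * 2 * t),    # EOG-like, below 5 Hz
               np.sin(2 * np.pi * 10 * t),   # EEG rhythm, kept
               np.sin(2 * np.pi * 60 * t)])  # EMG-like, above 30 Hz
A = np.random.default_rng(0).normal(size=(16, 3))
X_clean, keep = drop_artifact_components(S, A, fs)
```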

[0040] In some embodiments, as shown in Figure 3, the steps of constructing a fusion model based on bidirectional long short-term memory networks and attention mechanisms include: S301, construct two parallel bidirectional long short-term memory networks (BiLSTM).

[0041] The two parallel bidirectional long short-term memory network layers are used to receive time-frequency feature sequences and frequency domain feature sequences as inputs, respectively, and output their respective time-series context representations.

[0042] For example, identical BiLSTM networks are constructed for the time-frequency feature stream and the frequency domain feature stream, with independent, non-shared parameters. Each network consists of two bidirectional LSTM (Long Short-Term Memory) layers with 32 hidden units per direction per layer, resulting in an output dimension of 64 for each layer (32 units each for the forward and backward directions). Dropout layers with a dropout rate of 0.3 are inserted between layers to prevent overfitting.

[0043] S302, construct a confidence evaluation subnetwork consisting of fully connected layers and a softmax function.

[0044] The confidence evaluation subnetwork provides a lightweight auxiliary classifier for each time step of each feature stream. It is configured to: input the temporal context representation of each time step into the auxiliary classifier, classify it, output the probability of each control instruction category corresponding to the feature stream at that time step, and take the maximum probability as the recognition confidence of the time-frequency feature or frequency domain feature at that time step.

[0045] For example, the confidence evaluation subnetwork has the following structure: a single-layer fully connected network, with the input being the BiLSTM output at the current time step, and the output dimension being the number of categories C=4, corresponding to forward, backward, left turn, and right turn, followed by a Softmax function. The confidence is calculated by taking the maximum probability value of the Softmax output as the recognition confidence at that time step. Confidence vectors of length T are calculated separately for the time-frequency feature stream and the frequency domain feature stream.
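The confidence calculation can be sketched with NumPy as follows. The random, untrained weights `W` and bias `b`, the sequence length T = 10, and the function names are assumptions for illustration; the input dimension of 64 and C = 4 follow the example values above.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def step_confidence(H, W, b):
    """Per-time-step recognition confidence for one feature stream.

    H: (T, d) BiLSTM outputs; W: (d, C) and b: (C,) form the single
    fully connected layer. Returns a length-T vector holding the
    maximum class probability at each time step.
    """
    probs = softmax(H @ W + b)  # (T, C) class probabilities
    return probs.max(axis=1)

rng = np.random.default_rng(0)
T, d, C = 10, 64, 4
H = rng.normal(size=(T, d))             # stand-in for BiLSTM outputs
W = rng.normal(scale=0.1, size=(d, C))  # hypothetical, untrained weights
b = np.zeros(C)
conf = step_confidence(H, W, b)
```

Since the maximum of C probabilities is at least 1/C, each confidence lies between 0.25 and 1 for the four-class case.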

[0046] S303, construct the attention fusion subnetwork.

[0047] The attention fusion subnetwork is configured to connect to the outputs of two parallel bidirectional long short-term memory network layers for weighted fusion of the temporal context representation.

[0048] The attention fusion subnetwork includes: a query vector generation unit, used to generate a global query vector based on the temporal context representation; an attention score calculation unit, used to calculate the dot product of the global query vector and the temporal context representation at each time step to obtain the original attention weights; a confidence fusion unit, used to multiply the original attention weights by the recognition confidence at the corresponding time step to obtain the corrected attention score; a normalization unit, used to perform softmax normalization on the corrected attention score at each time step to obtain the fusion weights; and a weighted summation unit, used to perform weighted summation on the temporal context representation using the fusion weights to output the fusion feature vector.

[0049] Specifically, the query vector generation unit is used to: take the output of the last time step of the bidirectional long short-term memory network layer of the time-frequency feature stream and the frequency domain feature stream, and calculate the average of the two as the global query vector.
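The units of the attention fusion subnetwork can be sketched end to end as below, assuming NumPy. Random arrays stand in for the BiLSTM outputs and confidence vectors, the function names `fuse_stream` and `fuse` are hypothetical, and averaging is used as the final combination of the two streams.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_stream(H, q, conf):
    """Confidence-corrected attention over one stream's time steps."""
    raw = H @ q                  # original attention weights (dot products)
    corrected = raw * conf       # scale by recognition confidence
    weights = softmax(corrected)  # fusion weights, sum to 1
    return weights @ H, weights  # weighted sum of context representations

def fuse(H_tf, H_fd, conf_tf, conf_fd):
    """Fuse the time-frequency and frequency domain streams."""
    q = 0.5 * (H_tf[-1] + H_fd[-1])  # global query: mean of last time steps
    c_tf, w_tf = fuse_stream(H_tf, q, conf_tf)
    c_fd, w_fd = fuse_stream(H_fd, q, conf_fd)
    return 0.5 * (c_tf + c_fd), w_tf, w_fd

rng = np.random.default_rng(0)
T, d = 10, 64
H_tf, H_fd = rng.normal(size=(T, d)), rng.normal(size=(T, d))
conf_tf = rng.uniform(0.25, 1.0, size=T)
conf_fd = rng.uniform(0.25, 1.0, size=T)
fused, w_tf, w_fd = fuse(H_tf, H_fd, conf_tf, conf_fd)
```

One property of multiplying before the softmax is that a low confidence pulls a time step's score toward zero rather than toward minus infinity, so low-confidence steps are down-weighted relative to steps with large positive scores but are never excluded outright.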

[0050] In this embodiment, two parallel bidirectional long short-term memory networks can simultaneously perform temporal learning on MI and SSVEP features, improving the model's computational efficiency and avoiding the time loss caused by serial processing. Using the average of the outputs of the last time step of the two networks as the global query vector can integrate the global feature information of the two types of signals, making the global query vector more representative. The original attention weights calculated based on this vector can more accurately capture the importance of features at each time step. The standardized configuration of the attention mechanism layer realizes standardized and automated processing from the calculation of the original attention weights to the generation of fusion weights and then to the feature weighted fusion, ensuring the stability and repeatability of the fusion model. At the same time, the standardized model structure also facilitates subsequent model optimization and porting. In addition, the construction of the entire fusion model fully considers computational efficiency, and the operation logic of each step is simple and efficient, which can effectively reduce the processing time of feature fusion and reduce control latency.

[0051] In some embodiments, as shown in Figure 4, the step of inputting the extracted time-frequency features and frequency domain features into a fusion model constructed based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain the fused features includes: S401, input the time-frequency feature sequence and the frequency domain feature sequence into two independent bidirectional long short-term memory networks respectively, and learn their respective temporal context representations.

[0052] S402, calculate the original attention weights for the temporal context representation at each time step.

[0053] S403, use an auxiliary classifier to calculate, in real time, the recognition confidence of the time-frequency features and frequency domain features at each time step.

[0054] Optionally, the recognition confidence of time-frequency features and frequency domain features at each time step is calculated in real time by an auxiliary classifier, including: inputting the temporal context representation of each time step into the auxiliary classifier, which consists of a fully connected layer and a softmax function, outputting the probability of each category corresponding to the feature stream at that time step, and taking the maximum value among them as the recognition confidence of that time step.
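
The auxiliary classifier can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the weight matrix `W`, the bias `b`, the four-class setup, and the input vector are all made up, and a trained model would supply learned parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognition_confidence(h_t, weights, bias):
    """Fully connected layer followed by softmax; confidence is the largest class probability."""
    logits = [sum(w * x for w, x in zip(row, h_t)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return max(probs), probs


# Hypothetical parameters: 4 control classes, hidden size 2.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0, 0.0]

conf, probs = recognition_confidence([2.0, 0.0], W, b)
```

Because the confidence is a maximum of softmax probabilities, it always lies between 1/K and 1 for K classes.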

[0055] In this embodiment, the auxiliary classifier, composed of fully connected layers and a softmax function, is simple in structure and computationally efficient. It can quickly classify the temporal context representation at each time step and output the class probability, avoiding increased processing time due to complex calculations. Using the maximum class probability as the recognition confidence level can intuitively and accurately reflect the reliability of the feature representation of the control intention at that time step, providing a precise and real-time reference for the dynamic adjustment of fusion weights, ensuring the effectiveness and real-time performance of the dynamic fusion strategy. At the same time, the simple network structure can also reduce the computational load of the model and reduce processing latency.

[0056] S404, multiply the original attention weights by the corresponding recognition confidence, and then normalize the products to obtain the fusion weights.

[0057] Optionally, the original attention weights are multiplied by the corresponding recognition confidence scores and then normalized to obtain the fusion weights, including: for each time step of each feature stream, multiplying the original attention weights by the recognition confidence scores corresponding to that time step to obtain the corrected attention score; and performing softmax normalization on the corrected attention scores of all time steps of each feature stream to obtain the fusion weights of each time step of that feature stream.
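
The correction and normalization for one feature stream can be sketched as below. The function name and the example numbers are hypothetical. Note one property of the multiplication as described: if a raw attention weight is negative, a higher confidence makes the corrected score more negative; the source does not say how that case is handled.

```python
import math


def fusion_weights(raw_scores, confidences):
    """Multiply each raw attention weight by its confidence, then softmax over time steps."""
    corrected = [s * c for s, c in zip(raw_scores, confidences)]
    peak = max(corrected)
    exps = [math.exp(x - peak) for x in corrected]
    total = sum(exps)
    return [e / total for e in exps]


# Three time steps with equal raw attention but different confidence (made-up values).
weights = fusion_weights([0.5, 0.5, 0.5], [0.9, 0.3, 0.6])
```

With equal raw scores, the ordering of the fusion weights follows the ordering of the confidences.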

[0058] In this embodiment, the corrected attention score is obtained by multiplying the original attention weights by the recognition confidence, thus correcting the attention weights. This allows the fusion weights to simultaneously consider the inherent importance of features and the reliability of real-time recognition. The corrected attention score is processed using softmax normalization, which maps the weight values of each time step to the 0-1 range, and the sum of the weights of all time steps is 1. This ensures the rationality and normalization of the fusion weights, enabling the features of each time step to be reasonably weighted and fused according to the weights. This avoids the fusion features deviating from the user's true control intention due to weight imbalance, thereby improving the accuracy of the fusion features.

[0059] S405, use fusion weights to perform a weighted summation on the temporal context representation and output a fusion feature vector.

[0060] Optionally, the temporal context representation is weighted and summed using dynamic fusion weights to output a fused feature vector, including: for each feature stream, the temporal context representation at each time step is weighted and summed using the fusion weights of the feature stream to obtain the fused context representation of the feature stream; the fused context representation corresponding to the time-frequency feature stream and the fused context representation corresponding to the frequency domain feature stream are combined to output the final fused feature vector; wherein, the combination is to take the average of the two fused context representations, or to concatenate the two fused context representations and then reduce the dimensionality through a linear transformation.
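
Both combination options can be sketched as follows, assuming the fusion weights are already computed. The projection matrix `proj` is a placeholder for a learned linear transformation, and all numbers are made up for illustration.

```python
def weighted_sum(contexts, weights):
    """Weighted sum of per-time-step context vectors for one feature stream."""
    dim = len(contexts[0])
    return [sum(w * h[i] for w, h in zip(weights, contexts)) for i in range(dim)]


def combine_average(z_tf, z_fd):
    """Option 1: element-wise average of the two fused context representations."""
    return [(a + b) / 2.0 for a, b in zip(z_tf, z_fd)]


def combine_concat_linear(z_tf, z_fd, proj):
    """Option 2: concatenate, then reduce dimensionality with a linear map."""
    joined = z_tf + z_fd
    return [sum(p * x for p, x in zip(row, joined)) for row in proj]


z_tf = weighted_sum([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
z_fd = weighted_sum([[2.0, 2.0], [0.0, 4.0]], [0.5, 0.5])

averaged = combine_average(z_tf, z_fd)

# Hypothetical 2x4 projection (4 concatenated inputs reduced to 2 outputs).
proj = [[1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0]]
reduced = combine_concat_linear(z_tf, z_fd, proj)
```

Averaging requires both streams to have the same hidden size, whereas the concatenation route does not, which may matter if the two BiLSTM networks are sized differently.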

[0061] In this embodiment, a weighted summation of the temporal context representations at each time step of each feature stream is performed. This integrates the effective features from different time steps in the feature stream, resulting in a more representative fused context representation. The fused context representations of the two types of feature streams are combined using either averaging or concatenation followed by linear transformation for dimensionality reduction. This approach ensures effective fusion of the two types of features while allowing for flexible selection of the combination method based on the specific application scenario. Averaging is computationally simple, quickly yielding the fused feature vector and reducing processing latency. Concatenation followed by linear transformation for dimensionality reduction retains more feature information while reducing the feature dimension, decreasing the computational load on subsequent classifiers, and improving classification efficiency. Both methods ensure the compactness and effectiveness of the fused feature vector, providing high-quality feature input for subsequent instruction classification.

[0062] In this embodiment, two independent bidirectional long short-term memory networks can respectively learn the temporal context information of MI and SSVEP features, fully exploring the temporal dynamic characteristics of the two types of signals. Compared with traditional feature extraction methods, this approach can capture the dynamic evolution patterns of the two signals. The original attention weights are calculated using an attention mechanism, automatically focusing on the feature portions of the two types of signals that are more representative of the control intent. The attention weights are adjusted based on the recognition confidence calculated in real-time by the auxiliary classifier, enabling dynamic adjustment of the fusion weights. This allows the fusion process to allocate weights according to the real-time recognition reliability of the two types of features, giving higher weights to features with high confidence and strong representational value. This fully leverages the complementarity of MI and SSVEP signals, enabling the fused features to more accurately and comprehensively reflect the user's control intent, significantly improving the effectiveness and relevance of feature fusion, and ultimately enhancing the accuracy of subsequent command recognition.

[0063] In some embodiments, the learning classifier includes a support vector machine model, a random forest model, and a meta-learner, wherein the meta-learner employs a logistic regression model. As shown in Figure 5, the process of inputting the fused features into the learning classifier for classification and generating control commands includes: S501, input the fused features into the support vector machine model and the random forest model respectively to obtain the preliminary class decision values and class probabilities.

[0064] The support vector machine (SVM) model uses a radial basis function as its kernel function and outputs preliminary class decision values. The random forest model uses 100 decision trees and outputs preliminary class probability vectors.

[0065] S502, concatenate the class decision values output by the support vector machine model with the class probability vectors output by the random forest model to form a new feature vector.

[0066] S503, input the new feature vector into the meta-learner to obtain the final classification result, and generate the corresponding control command based on the classification result.

[0067] The new feature vector is input into the logistic regression meta-learner, which performs secondary classification on the new feature vector and outputs the final classification result. Based on the classification result, corresponding control commands are generated, such as forward, backward, left turn, right turn, etc.
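
Steps S502 and S503 can be sketched as below. This is an illustration only: the SVM decision values and random forest probabilities are made-up numbers standing in for real model outputs, the meta-learner weights are untrained placeholders, one SVM decision value per class is assumed, and the class order in `COMMANDS` is an assumption.

```python
import math

# Assumed class order; the source lists these commands only as examples.
COMMANDS = ["forward", "backward", "left turn", "right turn"]


def stack_features(svm_decision_values, rf_probabilities):
    """S502: concatenate the two base-model outputs into one meta-feature vector."""
    return list(svm_decision_values) + list(rf_probabilities)


def meta_predict(features, weights, bias):
    """S503: multinomial logistic regression over the stacked features."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return COMMANDS[best], probs


svm_out = [-0.8, 1.2, -0.3, -1.1]    # hypothetical per-class decision values
rf_out = [0.10, 0.70, 0.15, 0.05]    # hypothetical class probabilities
features = stack_features(svm_out, rf_out)

# Placeholder weights: each class adds its own SVM value and RF probability.
weights = [[1.0 if j % 4 == i else 0.0 for j in range(8)] for i in range(4)]
bias = [0.0, 0.0, 0.0, 0.0]
command, probs = meta_predict(features, weights, bias)
```

In practice the meta-learner would be fitted on base-model outputs produced for held-out data, which is the usual way to keep a stacked ensemble from overfitting; the source does not specify the training procedure.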

[0068] In this embodiment, the support vector machine (SVM) model performs well on small samples and high-dimensional features, outputting accurate class decision values. The random forest (RF) model exhibits strong resistance to overfitting and good generalization capability, outputting reliable class probabilities. Combining these two models for initial classification leverages their respective algorithmic advantages and compensates for the shortcomings of a single algorithm. Concatenating the class decision values and class probability vectors to form a new feature vector integrates the information from both initial classifications, providing a richer and more comprehensive basis for final classification. A meta-learner performs secondary classification on the new feature vector, optimizing and calibrating the initial classification results, thus improving the accuracy and reliability of the classification. The overall design of the ensemble learning classifier enhances the system's adaptability to differences in EEG signals among different users and signal variations within the same user at different times, reducing fluctuations in the recognition rate and enhancing the system's robustness. At the same time, the computational logic of each model is efficient, and the concatenation and secondary classification steps do not significantly increase processing time, ensuring real-time instruction generation.

[0069] As shown in Figure 6, this disclosure provides a hybrid brain-computer interface control system 60 based on motor imagery and steady-state visual evoked potentials, including: an EEG acquisition module 610, a signal processing module 620, a fusion decision module 630, and a control output module 640.

[0070] The EEG acquisition module 610 includes multi-channel EEG electrodes, a signal amplification circuit, and an A/D converter for acquiring EEG signals.

[0071] The signal processing module 620 includes a signal preprocessing unit and a feature extraction unit, which are used for signal preprocessing and feature extraction.

[0072] The fusion decision module 630 includes a fusion model and an ensemble learning classifier built on a bidirectional long short-term memory network and an attention mechanism, for feature fusion and instruction classification.

[0073] The control output module 640 consists of a communication interface and an instruction execution unit, and is used for outputting control instructions and controlling external devices.

[0074] The control system provided in this disclosure uses the multi-channel EEG electrodes of the EEG acquisition module 610 to contact the user's scalp and acquire the user's raw EEG signals in real time. The raw EEG signals include motor imagery (MI) EEG signals and steady-state visual evoked potential (SSVEP) EEG signals. After the acquired EEG signals are amplified by the signal amplification circuit, they are converted into digital signals by the A/D converter and transmitted to the signal preprocessing unit of the signal processing module 620.

[0075] The signal preprocessing unit sequentially performs adaptive filtering and independent component analysis on the raw digital EEG signal. Specifically, the preprocessing unit employs a Normalized Least Mean Square (NLMS) adaptive filter, using a sine and cosine signal synchronized with 50Hz power frequency interference as a reference signal. An adaptive algorithm is used to estimate the power frequency interference component in real time and filter it out from the raw EEG signal. Then, the FastICA algorithm is used to perform blind source separation on the adaptively filtered EEG signal, decomposing the signal into multiple independent components. Artifacts related to electrooculography (EOG) and electromyography (EMG) are identified and removed using preset feature thresholds. The remaining effective independent components are reconstructed to obtain the purified EEG signal, which is then transmitted to the feature extraction unit.
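
The power frequency cancellation step can be sketched as a two-weight NLMS filter driven by a sine and cosine reference. This is a minimal single-channel illustration: the step size `mu`, the 250 Hz sampling rate, and the synthetic test signal are assumptions not taken from the disclosure, and the FastICA stage is not shown.

```python
import math


def nlms_powerline_canceller(signal, fs, f0=50.0, mu=0.1, eps=1e-8):
    """Remove sinusoidal interference at f0 Hz using a two-weight NLMS filter."""
    w_sin, w_cos = 0.0, 0.0
    cleaned = []
    for n, sample in enumerate(signal):
        phase = 2.0 * math.pi * f0 * n / fs
        ref_sin, ref_cos = math.sin(phase), math.cos(phase)
        estimate = w_sin * ref_sin + w_cos * ref_cos   # interference estimate
        error = sample - estimate                      # error doubles as the cleaned sample
        power = ref_sin ** 2 + ref_cos ** 2 + eps      # reference power for normalization
        w_sin += mu * error * ref_sin / power
        w_cos += mu * error * ref_cos / power
        cleaned.append(error)
    return cleaned


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic example: a 10 Hz "brain" rhythm plus 50 Hz interference.
fs = 250.0
brain = [math.sin(2.0 * math.pi * 10.0 * n / fs) for n in range(2000)]
noisy = [b + 2.0 * math.sin(2.0 * math.pi * 50.0 * n / fs + 0.7)
         for n, b in enumerate(brain)]
cleaned = nlms_powerline_canceller(noisy, fs)

tail = slice(1500, 2000)  # compare after the weights have converged
err_before = rms([x - b for x, b in zip(noisy[tail], brain[tail])])
err_after = rms([x - b for x, b in zip(cleaned[tail], brain[tail])])
```

A larger `mu` converges faster but widens the effective notch around 50 Hz, so the step size trades tracking speed against distortion of nearby EEG frequencies.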

[0076] The feature extraction unit separates the purified EEG signal into a motor imagery signal and a steady-state visual evoked potential signal, and performs feature extraction on each. Specifically, the processed motor imagery signal is subjected to wavelet transform to extract energy and phase features within a predetermined frequency band, which are then combined to form a time-frequency feature sequence of the motor imagery signal. The processed steady-state visual evoked potential signal is subjected to Fast Fourier Transform (FFT) to convert the time-domain signal into a frequency-domain signal, and its amplitude and phase features at the visual stimulus frequency and its second and third harmonic frequencies are extracted and combined to form a frequency-domain feature sequence of the steady-state visual evoked potential signal. The extracted time-frequency feature sequence and frequency-domain feature sequence are then transmitted to the fusion model of the fusion decision module 630.
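
The SSVEP branch can be sketched as below. To stay dependency-free, the sketch evaluates the discrete Fourier transform directly at the bins of interest rather than calling an FFT routine; the resulting coefficients are the same values an FFT would return at those bins. The sampling rate, stimulus frequency, and synthetic signal are assumptions for illustration, and the wavelet branch for motor imagery is not shown.

```python
import cmath
import math


def ssvep_features(signal, fs, stim_freq, harmonics=(1, 2, 3)):
    """Amplitude and phase at the stimulus frequency and its 2nd and 3rd harmonics."""
    n = len(signal)
    features = []
    for h in harmonics:
        k = round(h * stim_freq * n / fs)   # nearest DFT bin index
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        amplitude = 2.0 * abs(coeff) / n    # single-sided amplitude
        features.append((amplitude, cmath.phase(coeff)))
    return features


# Synthetic 2-second epoch: 10 Hz response plus a weaker 20 Hz harmonic.
fs = 250.0
n_samples = 500
epoch = [math.cos(2.0 * math.pi * 10.0 * i / fs)
         + 0.5 * math.cos(2.0 * math.pi * 20.0 * i / fs)
         for i in range(n_samples)]

feats = ssvep_features(epoch, fs, stim_freq=10.0)
```

The epoch length here is chosen so that each harmonic falls exactly on a DFT bin; with other lengths, spectral leakage would spread energy across neighbouring bins and the amplitudes would be underestimated.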

[0077] The fusion model is a dynamic fusion model based on a bidirectional long short-term memory (BiLSTM) network and an attention mechanism. Its weighted fusion process for features is as follows: Two parallel and structurally identical BiLSTM networks are constructed. The time-frequency feature sequence and the frequency domain feature sequence are input into the two independent BiLSTM networks respectively. The two BiLSTM networks perform temporal learning on the two types of feature sequences, mining temporal context information and outputting their respective temporal context representations at each time step. An attention mechanism layer is constructed, taking the average of the outputs of the last time step of the two BiLSTM networks as the global query vector. The dot product of this global query vector and the temporal context representation at each time step is calculated to obtain the original attention weight for each time step. An auxiliary classifier composed of a fully connected layer and a softmax function is then constructed. It takes the temporal context representation at each time step as input, classifies it, and outputs the probability of each control instruction category corresponding to the feature stream at that time step. The maximum probability is taken as the recognition confidence of the time-frequency feature or frequency domain feature at that time step. For each time step of each feature stream, the original attention weight is multiplied by the recognition confidence corresponding to that time step to obtain the corrected attention score. The corrected attention scores of all time steps of each feature stream are subjected to softmax normalization to obtain the fusion weight of each time step of the feature stream. For each feature stream, the temporal context representation of each time step is weighted and summed using the fusion weights of the feature stream to obtain the fusion context representation of the feature stream. The fusion context representation corresponding to the time-frequency feature stream and the fusion context representation corresponding to the frequency domain feature stream are concatenated, and the dimensionality of the concatenated feature vector is then reduced to a preset dimension through a linear transformation to output the final fusion feature vector.

[0078] The fused features are input into a learning classifier for classification, generating control commands. The learning classifier is an ensemble learning model consisting of a support vector machine (SVM) model, a random forest model, and a meta-learner. The meta-learner uses a logistic regression model. Its classification process for the fused features is as follows: the fused feature vectors are input into the SVM model and the random forest model respectively. The SVM model uses a radial basis function as its kernel function and outputs preliminary class decision values. The random forest model has 100 decision trees and outputs preliminary class probability vectors. The class decision values output by the SVM model and the class probability vector output by the random forest model are concatenated to form a new feature vector with higher dimensions. This new feature vector is then input into the logistic regression meta-learner, which performs secondary classification on the new feature vector and outputs the final classification result. Based on this classification result, corresponding control commands are generated, such as forward, backward, left turn, and right turn.

[0079] The fusion decision module 630 transmits the generated control commands to the control output module 640. The control output module 640 receives the commands through its communication interface, converts them into signals that can be recognized by external devices through the command execution unit, and then sends them to external devices via wireless Bluetooth, thereby achieving precise and real-time control of external devices.

[0080] As shown in Figure 7, this disclosure provides a hybrid brain-computer interface control device 70 based on motor imagery and steady-state visual evoked potentials, including a processor 700 and a memory 701. Optionally, the device 70 may further include a communication interface 702 and a bus 703. The processor 700, communication interface 702, and memory 701 can communicate with each other via the bus 703. The communication interface 702 can be used for information transmission. The processor 700 can call logical instructions in the memory 701 to execute the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials described in the above embodiments.

[0081] Furthermore, the logic instructions in the aforementioned memory 701 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium.

[0082] The memory 701, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of this disclosure. The processor 700 executes functional applications and data processing by running the program instructions/modules stored in the memory 701, that is, it implements the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials in the above embodiments.

[0083] The memory 701 may include a program storage area and a data storage area. The program storage area may store the operating system and application programs required for at least one function; the data storage area may store data created based on the use of the terminal device. Furthermore, the memory 701 may include high-speed random access memory and may also include non-volatile memory.

[0084] In some embodiments, a computer-readable storage medium is provided storing program instructions that, when executed, cause a computer to perform the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials as described in any of the above embodiments.

[0085] The technical solutions of this disclosure can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes one or more instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the method described in this disclosure. The aforementioned storage medium can be a non-transitory storage medium, such as a USB flash drive, external hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, etc., and other media capable of storing program code.

[0086] The foregoing description and accompanying drawings fully illustrate embodiments of this disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, procedural, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the order of operation may vary. Parts and features of some embodiments may be included in or replace parts and features of other embodiments. Moreover, the terminology used in this application is for describing embodiments only and is not intended to limit the claims. As used in the description of embodiments and claims, the singular forms “a,” “an,” and “the” are intended to equally include the plural forms unless the context clearly indicates otherwise. Similarly, the term “and/or” as used in this application means including one or more of the associated listed items and all possible combinations thereof. Additionally, when used in this application, the term "comprise" and its variations "comprises" and/or "comprising" refer to the presence of stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitations, an element defined by the phrase "comprises a..." does not exclude the presence of other identical elements in the process, method, or apparatus that includes said element. In this document, each embodiment may focus on its differences from other embodiments, and similar or identical parts of the embodiments can be cross-referenced. For methods, products, etc., disclosed in the embodiments, if they correspond to the method section disclosed in the embodiments, reference can be made to the description of the method section for the relevant parts.

[0087] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of this disclosure. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0088] The methods and products disclosed in the embodiments herein (including but not limited to devices and equipment) can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of units may be merely a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or other forms. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to implement this embodiment according to actual needs. In addition, the functional units in the embodiments of this disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

[0089] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in a block diagram and/or flowchart, and combinations of blocks in a block diagram and/or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

Claims

1. A hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials, characterized in that the method includes: acquiring the user's raw EEG signals, which include motor imagery EEG signals and steady-state visual evoked potential signals; performing adaptive filtering and independent component analysis on the acquired raw EEG signals; performing wavelet transform on the processed motor imagery signal to extract time-frequency features within a predetermined frequency band; performing Fast Fourier Transform on the processed steady-state visual evoked potential signal to extract frequency domain features at the stimulation frequency and harmonic frequency; inputting the extracted time-frequency features and frequency domain features into a fusion model based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain the fused features; and inputting the fused features into a learning classifier for classification, generating control commands to control external devices.

2. The method according to claim 1, characterized in that performing adaptive filtering and independent component analysis on the acquired raw EEG signals includes: using a normalized least mean square adaptive filter, with sine and cosine signals synchronized with the power frequency interference as a reference, to estimate and filter out the power frequency interference component; and using the FastICA algorithm to perform blind source separation on the adaptively filtered signal and, after identifying and removing independent components related to electrooculography and electromyography, reconstructing the EEG signal.

3. The method according to claim 1, characterized in that inputting the extracted time-frequency features and frequency domain features into a fusion model based on a bidirectional long short-term memory network and an attention mechanism for weighted fusion to obtain the fused features includes: inputting the time-frequency feature sequence and the frequency domain feature sequence into two independent bidirectional long short-term memory networks respectively to learn their respective temporal context representations; calculating, for each time step, the original attention weights based on the temporal context representation; calculating, in real time by an auxiliary classifier, the recognition confidence of the time-frequency features and frequency domain features at each time step; multiplying the original attention weights by the corresponding recognition confidence and then normalizing to obtain the fusion weights; and performing a weighted summation on the temporal context representation using the fusion weights to output a fusion feature vector.

4. The method according to claim 3, characterized in that calculating the recognition confidence of the time-frequency features and frequency domain features at each time step in real time using an auxiliary classifier includes: inputting the temporal context representation at each time step into the auxiliary classifier, which consists of a fully connected layer and a softmax function; and outputting, by the auxiliary classifier, the probability of each category corresponding to the feature stream at that time step, and taking the maximum value as the recognition confidence at that time step.

5. The method according to claim 3, characterized in that multiplying the original attention weights by the corresponding recognition confidence and then normalizing to obtain the fusion weights includes: for each time step of each feature stream, multiplying the original attention weights by the recognition confidence corresponding to that time step to obtain the corrected attention score; and performing softmax normalization on the corrected attention scores of all time steps of each feature stream to obtain the fusion weights for each time step of that feature stream.

6. The method according to claim 3, characterized in that performing a weighted summation on the temporal context representation using the fusion weights and outputting a fusion feature vector includes: for each feature stream, using the fusion weights of the feature stream to perform a weighted summation of the temporal context representations at each time step to obtain the fusion context representation of the feature stream; and combining the fusion context representations corresponding to the time-frequency feature stream and the frequency domain feature stream to output the final fusion feature vector; wherein the combination is either the average of the two fusion context representations, or the concatenation of the two fusion context representations followed by a linear transformation to reduce dimensionality.

7. The method according to any one of claims 1 to 6, characterized in that building the fusion model based on a bidirectional long short-term memory network and an attention mechanism includes: constructing two parallel bidirectional long short-term memory networks; constructing a confidence evaluation subnetwork consisting of a fully connected layer and a softmax function; and constructing an attention fusion subnetwork.

8. The method according to any one of claims 1 to 6, characterized in that inputting the fused features into a learning classifier for classification and generating control commands includes: inputting the fused features into a support vector machine model and a random forest model respectively to obtain the preliminary class decision values and class probabilities; concatenating the class decision values output by the support vector machine model with the class probability vector output by the random forest model to form a new feature vector; and inputting the new feature vector into a meta-learner to obtain the final classification result, and generating corresponding control commands based on the classification result.

9. A hybrid brain-computer interface control device based on motor imagery and steady-state visual evoked potentials, comprising a processor and a memory storing program instructions, characterized in that the processor is configured to execute, when running the program instructions, the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials as described in any one of claims 1 to 8.

10. A computer-readable storage medium storing program instructions, characterized in that the program instructions, when executed, cause a computer to perform the hybrid brain-computer interface control method based on motor imagery and steady-state visual evoked potentials as described in any one of claims 1 to 8.