Neural decoding method based on attention-blurred visual priors and lightweight electroencephalogram (EEG) coding
By introducing attention-weighted multi-level blurred visual priors on the visual side and lightweight adaptive EEG coding on the EEG side, the problem of mismatch between visual stimuli and EEG signal information is solved, and the performance of cross-modal alignment and visual neural decoding is improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN UNIV
- Filing Date
- 2026-05-18
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, there is an information mismatch between visual stimuli and EEG signals. Fixed blur priors adapt poorly to image content, and lightweight EEG encoders have limited ability to model channel differences and temporal dynamics, resulting in poor cross-modal alignment quality and visual neural decoding performance.
The method introduces attention-weighted multi-level blurred visual priors and a lightweight adaptive EEG coding structure: attention-weighted multi-level blurring is performed on the visual side, and channel scaling, temporal gating, residual correction, and feature reweighting are performed on the EEG side, improving cross-modal alignment capability and robustness.
It effectively reduces the information mismatch between visual stimuli and EEG representations, improves cross-modal alignment ability and visual neural decoding performance, and enhances the stability and alignment quality of the model.
Figure CN122241610A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the technical fields of brain-computer interfaces, EEG signal processing, computer vision, and cross-modal learning, and specifically relates to a neural decoding method based on attention-blurred visual priors and lightweight EEG coding.
Background Technology
[0002] Visual neural decoding aims to infer a subject's perception and cognitive content of visual stimuli based on brain signals. Current mainstream methods typically employ a frozen visual encoder to extract image features and train an EEG encoder to map EEG signals to the same latent space. Then, cross-modal alignment between the visual and EEG modalities is achieved through contrastive learning. This type of visual-brain signal contrastive learning framework has been used for visual neural decoding tasks such as brain-to-image retrieval.
[0003] In existing technologies, visual stimuli contain a large number of high-frequency details. However, EEG signals are constrained by visual perception processes, cognitive fluctuations, and acquisition noise during their formation, and cannot stably reflect all visual details. Therefore, there is an information mismatch between visual stimuli and EEG representations. Existing methods have mitigated this problem by introducing blur priors, and it has been demonstrated that moderately reducing high-frequency details helps reduce the information discrepancy between visual stimuli and brain signals.
[0004] However, existing blur priors mostly take fixed or foveated forms, with relatively rigid allocation of blur intensity, making it difficult to perform finer-grained adaptive adjustment based on image content and spatial location. Furthermore, existing lightweight EEG encoders are typically shallow projection structures, which limits their ability to model channel differences, temporal dynamics, and inter-sample fluctuations in EEG signals. Therefore, it is necessary to improve both the visual prior construction method and the EEG encoding method while keeping the existing visual-brain signal contrastive learning framework, thereby further enhancing cross-modal alignment quality and visual neural decoding performance.
Summary of the Invention
[0005] The purpose of this invention is to address the problems in existing technologies, namely the significant information mismatch between visual stimuli and EEG signals, the insufficient adaptability of fixed blur priors, and the limited ability of lightweight EEG encoders to model channel differences and temporal dynamics. This invention provides a neural decoding method based on attention-weighted multi-level blurred visual priors and lightweight EEG coding. By introducing attention-weighted multi-level blurred visual priors on the visual side and a lightweight adaptive EEG coding structure on the EEG side, this invention reduces the information mismatch between visual stimuli and EEG representations, and improves cross-modal alignment capability, robustness, and visual neural decoding performance.
[0006] The objective of this invention is achieved through the following technical solution: a neural decoding method based on attention-blurred visual priors and lightweight EEG coding, the specific implementation steps of which are as follows:
[0007] S1. Acquire paired visual stimulus images and EEG signals;
[0008] S2. Perform attention-weighted multi-level blurring processing on the visual stimulus image; first, generate a structural saliency response based on the image grayscale gradient information, and generate an attention map based on the center prior; construct low-blurred, medium-blurred, and high-blurred images respectively according to the preset weak blur kernel size, intermediate blur kernel ratio, and strong blur kernel size; calculate the weights of the low-blurred, medium-blurred, and high-blurred branches based on the attention map, and perform weighted fusion on the low-blurred, medium-blurred, and high-blurred images to obtain the transformed visual input image;
[0009] S3. Input the transformed visual input image into the frozen visual encoder and extract visual features to obtain the visual embedding representation;
[0010] S4. Input the EEG signal into a lightweight adaptive EEG encoder: first, perform channel scaling and time-gated input calibration on the EEG signal; then, obtain the basic EEG embedding through a shallow projection backbone network; finally, combine adapter residual correction, sample-level residual gating, and feature reweighting to obtain the EEG embedding representation;
[0011] S5. Map the visual embedding representation and the EEG embedding representation to a shared latent space, so that visual modality features and EEG modality features are aligned in a unified representation space;
[0012] S6. Based on a contrastive learning loss function, perform cross-modal alignment training on the visual embedding representation and the EEG embedding representation in the shared latent space to obtain a visual neural decoding model, and use the trained visual neural decoding model for visual neural decoding.
[0013] The beneficial effects of this invention are as follows. On the visual side, the attention-weighted multi-level blurred visual prior allocates blur intensity adaptively according to local image content, which better preserves task-relevant structural information and suppresses high-frequency details that are detrimental to stable EEG representation. On the EEG side, the lightweight adaptive EEG coding structure combines channel calibration, temporal modulation, residual correction, sample-level gating, and feature reweighting, which reduces the information mismatch between visual stimuli and EEG representations and improves cross-modal alignment capability, robustness, and visual neural decoding performance. With the basic task setting of visual-brain signal contrastive learning unchanged, the attention-weighted multi-level blurred visual prior can replace the existing visual prior module while the visual encoder itself remains an existing frozen pre-trained visual encoder; likewise, the lightweight adaptive EEG encoder can replace the existing EEG coding module, improving EEG representation capability and training stability while remaining lightweight, thereby enhancing visual neural decoding performance.
Description of the Drawings
[0014] Figure 1 is an overall flowchart of the neural decoding method based on attention-blurred visual priors and lightweight EEG coding.
[0015] Figure 2 is a flowchart of the attention-weighted multi-level blurring of step S2.
[0016] Figure 3 shows the attention maps and blurred example images generated in step S2.
[0017] Figure 4 is a flowchart of the lightweight adaptive EEG coding of step S4.
Detailed Implementation
[0018] This embodiment performs visual neural decoding within a visual-brain signal contrastive learning framework. The visual neural decoding task employed is a brain-to-image retrieval task, with inputs including a visual stimulus image and the corresponding EEG signal. The visual stimulus image is input to the visual branch, and the EEG signal is input to the EEG branch. The visual branch uses a frozen pre-trained visual encoder to extract visual embedding representations, while the EEG branch uses a lightweight adaptive EEG encoder to extract EEG embedding representations. The embedding representations of both modalities are then mapped to the same shared latent space, and cross-modal alignment training is performed using a contrastive learning loss function to obtain the visual neural decoding model. The technical solution of this invention is further described below with reference to the accompanying drawings and specific embodiments.
[0019] This embodiment is illustrated with the THINGS-EEG dataset, which contains 10 subjects. The training set contains 1654 concepts with 10 images per concept, and each image is presented 4 times per subject. The test set contains 200 concepts with 1 image per concept, and each image is presented 80 times per subject. To improve the signal-to-noise ratio, both training and testing use EEG signals averaged over the repeated presentations of each image. For a single subject, this gives 1654 × 10 = 16540 training samples and 200 test samples. The EEG data used are the 250 Hz preprocessed version, and the time window is set to [0, 250].
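The repetition averaging and the sample counts above can be illustrated with a short sketch. It assumes the per-subject EEG is held as an array of shape (images, repetitions, channels, time points); that layout and the helper name `average_repetitions` are illustrative and not taken from the source.

```python
import numpy as np

def average_repetitions(eeg):
    """Average EEG over the repetition axis.

    eeg: array of shape (n_images, n_reps, n_channels, n_times)
    returns: array of shape (n_images, n_channels, n_times)
    """
    return eeg.mean(axis=1)

# Sample counts stated in the embodiment.
n_train = 1654 * 10   # concepts x images per concept
n_test = 200 * 1

# Toy data: 8 images, 4 repetitions, 17 channels, 250 time points.
rng = np.random.default_rng(0)
demo = rng.standard_normal((8, 4, 17, 250))
averaged = average_repetitions(demo)
```

After averaging, each image contributes exactly one EEG sample, which is why the number of training samples equals the number of training images.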
[0020] In this embodiment, the visual encoder uses the CLIP model as the visual backbone network, with RN50 as the default pre-trained weights. Its input image size is 224×224, and its output embedding dimension is 1024. The EEG encoder is the lightweight adaptive EEG encoder. Training uses an intra-subject setting, meaning that each subject is trained and tested separately. The number of training epochs is 50, the learning rate is 1×[…], the optimizer is AdamW, the training batch size is 1024, and the test batch size is 200. The training script uses the same settings for all subjects. For the EEG input, this embodiment selects 17 channels over the posterior parietal and occipital regions: P7, P5, P3, P1, Pz, P2, P4, P6, P8, PO7, PO3, POz, PO4, PO8, O1, Oz, and O2. This focuses the input on brain regions related to visual stimulus processing. For the visual prior, this embodiment uses the attention-weighted multi-level blurred visual prior.
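The channel selection described above amounts to indexing the channel axis by name. A minimal sketch, assuming the recording's channel names are available as a list; the three extra names in the toy montage (`Fp1`, `Fz`, `Cz`) and the helper `select_channels` are illustrative, not from the source:

```python
import numpy as np

# The 17 posterior channels listed in the embodiment.
POSTERIOR_CHANNELS = [
    "P7", "P5", "P3", "P1", "Pz", "P2", "P4", "P6", "P8",
    "PO7", "PO3", "POz", "PO4", "PO8", "O1", "Oz", "O2",
]

def select_channels(eeg, channel_names, wanted):
    """Keep only the wanted channels, in the order given by `wanted`.

    eeg: array of shape (..., n_channels, n_times)
    """
    idx = [channel_names.index(name) for name in wanted]
    return eeg[..., idx, :]

# Toy montage: three non-posterior channels followed by the 17 of interest.
toy_names = ["Fp1", "Fz", "Cz"] + POSTERIOR_CHANNELS
toy_eeg = np.arange(2 * 20 * 250, dtype=float).reshape(2, 20, 250)
selected = select_channels(toy_eeg, toy_names, POSTERIOR_CHANNELS)
```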
[0021] As shown in Figure 1, this embodiment provides a visual neural decoding method based on attention-weighted multi-level blurred visual priors and lightweight adaptive EEG coding. The specific implementation steps are as follows:
[0022] S1. Acquire paired visual stimulus images and EEG signals, wherein the visual stimulus images serve as the visual branch input and the EEG signals serve as the EEG branch input, and construct a one-to-one correspondence between them. Specifically, a paired training dataset is constructed in which each sample consists of a visual stimulus image and the EEG signal recorded in response to that image. During training, mini-batches of paired samples are drawn from the training dataset; the visual stimulus images are input into the visual encoder and the EEG signals into the EEG encoder, establishing the pairing between the visual and EEG modalities. In this embodiment, training uses the intra-subject setting, meaning that data from only one subject is used in each training run; for a single subject, the number of training samples is 16,540 and the number of test samples is 200.
[0023] S2. Perform attention-weighted multi-level blurring processing on the visual stimulus image; first, generate a structural saliency response based on the image grayscale gradient information, and generate an attention map based on the center prior; construct low-blurred, medium-blurred, and high-blurred images respectively according to the preset weak blur kernel size, intermediate blur kernel ratio, and strong blur kernel size; calculate the weights of the low-blurred, medium-blurred, and high-blurred branches based on the attention map, and perform weighted fusion on the low-blurred, medium-blurred, and high-blurred images to obtain the transformed visual input image;
[0024] The attention-weighted multi-level blurring process is shown in Figure 2 and is implemented as follows. A visual stimulus image is first converted into a grayscale image, and the gradient responses of the grayscale image in the horizontal and vertical directions are calculated to obtain a gradient magnitude map $G$ reflecting the saliency of local image structure; the gradient magnitude map satisfies:
[0025] $G = \sqrt{G_x^{2} + G_y^{2}}$ (1);
[0026] where $G_x$ denotes the gradient response of the grayscale image in the horizontal direction and $G_y$ denotes the gradient response of the grayscale image in the vertical direction; in this embodiment, the image height h is 224, the image width w is 224, and the Sobel gradient kernel size is 3.
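The gradient magnitude step can be sketched with a 3×3 Sobel filter in NumPy. The grayscale conversion weights (ITU-R BT.601 luma) and the edge padding at the image border are assumptions of this sketch; the source only states that a Sobel kernel of size 3 is used.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, using edge padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def to_gray(rgb):
    """Convert an (H, W, 3) RGB image to grayscale (assumed BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def gradient_magnitude(rgb):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gray = to_gray(rgb)
    gx = filter3x3(gray, SOBEL_X)  # horizontal gradient response
    gy = filter3x3(gray, SOBEL_Y)  # vertical gradient response
    return np.sqrt(gx ** 2 + gy ** 2)

# Toy image with a vertical step edge: left half black, right half white.
step = np.zeros((8, 8, 3))
step[:, 4:, :] = 1.0
g = gradient_magnitude(step)
```

On the toy step image the response is concentrated on the two columns adjacent to the edge and is zero in the flat regions.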
[0027] Then, the gradient magnitude map obtained from formula (1) is normalized and its nonlinear response is adjusted to obtain:
[0028] $\tilde{G} = \mathrm{Norm}(G)^{\gamma}$ (2);
[0029] where $G$ denotes the gradient magnitude map of formula (1), $\mathrm{Norm}(\cdot)$ denotes the normalization operation, $\gamma$ denotes the gradient response adjustment parameter, and $\tilde{G}$ denotes the adjusted gradient magnitude map; in this embodiment, $\gamma$ is set to 0.8. The adjusted result obtained from formula (2) is then smoothed, and a center prior map is introduced to generate the attention map:
[0030] (3);
[0031] where the weighting coefficient in the formula is the center prior weight and the prior term is a two-dimensional Gaussian center prior map; in this embodiment, the attention map smoothing kernel size is 9, the center prior weight is 0.15, and the center prior scale ratio is 0.45.
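The text above does not fix the normalization operation, the type of smoothing filter, or how the center prior is blended into the smoothed saliency. The sketch below is one possible reading, with every such choice an assumption: min-max normalization, a 9×9 mean filter, a Gaussian prior whose standard deviation is the scale ratio times the image size, and a convex blend weighted by the center prior weight.

```python
import numpy as np

def minmax(x, eps=1e-8):
    """Min-max normalization to [0, 1] (assumed form of Norm)."""
    return (x - x.min()) / (x.max() - x.min() + eps)

def box_smooth(x, k):
    """k x k mean filter with edge padding (assumed smoothing filter)."""
    h, w = x.shape
    r = k // 2
    padded = np.pad(x, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + h, dj:dj + w]
    return out / (k * k)

def center_prior(h, w, scale=0.45):
    """2-D Gaussian centered on the image; sigma = scale * size (assumed)."""
    ys = np.arange(h) - (h - 1) / 2
    xs = np.arange(w) - (w - 1) / 2
    return np.exp(-0.5 * ((ys[:, None] / (scale * h)) ** 2
                          + (xs[None, :] / (scale * w)) ** 2))

def attention_map(grad_mag, gamma=0.8, smooth_k=9, lam=0.15, scale=0.45):
    """Attention map from a gradient magnitude map (one assumed form)."""
    adjusted = minmax(grad_mag) ** gamma          # formula (2)
    smoothed = box_smooth(adjusted, smooth_k)     # smoothing step
    prior = center_prior(*grad_mag.shape, scale)  # center prior map
    return minmax((1 - lam) * smoothed + lam * prior)  # assumed blend for (3)

rng = np.random.default_rng(0)
att = attention_map(rng.random((32, 32)))
```

The default values are those of the embodiment (0.8, 9, 0.15, 0.45); only the functional forms are guesses.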
[0032] Then, according to the preset weak blur kernel size, intermediate blur kernel ratio, and strong blur kernel size, a low-blur image, a medium-blur image, and a high-blur image are constructed. Based on the attention value at each spatial location of the attention map obtained from formula (3), soft-assignment weights are calculated for the low-blur, medium-blur, and high-blur branches:
[0033] (4);
[0034] where the parameters of the formula are the attention center corresponding to each blur level and the smoothing control parameter; in this embodiment, the attention center is 0.85 for the low-blur branch, 0.5 for the medium-blur branch, and 0.15 for the high-blur branch, and the smoothing control parameter is 0.15.
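[0035] The soft-assignment weights obtained from formula (4) are then normalized to obtain:
[0036] $\hat{w}_k = \dfrac{w_k}{\sum_{j} w_j + \epsilon}$ (5);
[0037] where $w_k$ denotes the soft-assignment weight of blur branch $k$ (low, medium, or high) at a given spatial location, the sum runs over the three branches, $\hat{w}_k$ denotes the normalized weight, and $\epsilon$ denotes the numerical stability term, which is taken as 1×[…] in this embodiment.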
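The exact kernel used for the soft assignment is not reproduced in the text. A minimal sketch, assuming a Gaussian kernel around each branch's attention center with the smoothing control parameter as its width, followed by the normalization of formula (5); the value `eps=1e-6` is a placeholder, since the embodiment's value is not legible.

```python
import numpy as np

# Attention centers from the embodiment: low-, medium-, high-blur branch.
CENTERS = np.array([0.85, 0.5, 0.15])

def branch_weights(att, centers=CENTERS, sigma=0.15, eps=1e-6):
    """Per-pixel normalized weights for the three blur branches.

    att: attention map of shape (H, W) with values in [0, 1]
    returns: array of shape (3, H, W), ordered low / medium / high blur
    """
    diff = att[None, :, :] - centers[:, None, None]
    soft = np.exp(-(diff ** 2) / (2 * sigma ** 2))      # assumed form of (4)
    return soft / (soft.sum(axis=0, keepdims=True) + eps)  # formula (5)

# One pixel at each attention center.
toy_att = np.array([[0.85, 0.5, 0.15]])
w = branch_weights(toy_att)
```

With these centers, high attention favors the low-blur branch and low attention favors the high-blur branch, matching the intent of preserving salient structure.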
[0038] Finally, the normalized weights obtained from formula (5) are used to perform pixel-by-pixel weighted fusion of the low-blur image, medium-blur image, and high-blur image to obtain the transformed visual input image:
[0039] $X' = \hat{w}_{\mathrm{low}} \odot X_{\mathrm{low}} + \hat{w}_{\mathrm{mid}} \odot X_{\mathrm{mid}} + \hat{w}_{\mathrm{high}} \odot X_{\mathrm{high}}$ (6);
[0040] where $X_{\mathrm{low}}$, $X_{\mathrm{mid}}$, and $X_{\mathrm{high}}$ denote the low-blur, medium-blur, and high-blur images, $\hat{w}_{\mathrm{low}}$, $\hat{w}_{\mathrm{mid}}$, and $\hat{w}_{\mathrm{high}}$ denote the corresponding normalized weight maps, $X'$ denotes the transformed visual input image, and ⊙ denotes element-wise multiplication. In this embodiment, the strong blur kernel size is 51, the weak blur kernel size is 3, and the intermediate blur kernel ratio is 0.35; the intermediate blur kernel size is therefore determined from the weak and strong blur kernel sizes at a ratio of 0.35 and is converted to an odd kernel size in the implementation. Figure 3 shows the attention maps and three-level blurred example images obtained by performing attention-weighted multi-level blurring on three example images selected from the test set of the THINGS-EEG dataset.
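The fusion step and the intermediate kernel size can be sketched as follows. The interpolation rule in `mid_kernel_size` (linear interpolation between the weak and strong sizes, rounded, then bumped to the next odd number) is an assumption; the source only says the size follows from the 0.35 ratio and is made odd, so a different rounding rule could give a different value.

```python
import numpy as np

def mid_kernel_size(weak=3, strong=51, ratio=0.35):
    """Intermediate kernel size (assumed rule: interpolate, round, make odd)."""
    k = int(round(weak + ratio * (strong - weak)))
    return k if k % 2 == 1 else k + 1

def fuse(weights, low, mid, high):
    """Pixel-wise weighted fusion of three blurred images, as in formula (6).

    weights: (3, H, W) normalized branch weights (low, medium, high)
    low, mid, high: (H, W, 3) blurred images
    """
    stack = np.stack([low, mid, high])            # (3, H, W, 3)
    return (weights[..., None] * stack).sum(axis=0)

# Toy check: if all weight goes to the low-blur branch, the output is `low`.
rng = np.random.default_rng(0)
low, mid, high = (rng.random((2, 2, 3)) for _ in range(3))
one_hot = np.zeros((3, 2, 2))
one_hot[0] = 1.0
fused = fuse(one_hot, low, mid, high)
k_mid = mid_kernel_size()
```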
[0041] S3. Input the visual input image into the frozen visual encoder, extract the corresponding visual features, and map the visual feature representation into a visual embedding representation. During training, the parameters of the visual encoder remain frozen and are used only as a feature extractor for the visual modality. In this embodiment, the visual encoder uses the CLIP model with pre-trained parameters of RN50, an input image size of 224×224, and an output embedding dimension z_dim of 1024.
[0042] S4. Input the EEG signal into a lightweight adaptive EEG encoder: first, perform channel scaling and time-gated input calibration on the EEG signal; then, obtain the basic EEG embedding through a shallow projection backbone network; finally, combine adapter residual correction, sample-level residual gating, and feature reweighting to obtain the EEG embedding representation.
[0043] The complete lightweight adaptive EEG coding workflow is shown in Figure 4. While retaining the shallow projection backbone, this EEG encoder introduces channel scaling and time gating on the input side, and a small residual adapter, sample-level residual gating, and feature reweighting on the output side, to improve EEG representation ability and training stability without significantly increasing model complexity. The specific implementation is as follows: for the input EEG signal $X \in \mathbb{R}^{C \times T}$, adaptive scaling is first performed along the channel dimension to obtain:
[0044] $X_s = s \odot X$ (7);
[0045] where $s \in \mathbb{R}^{C}$ denotes the learnable channel scaling parameter, applied per channel across all time points; in this embodiment, the number of EEG input channels C is 17 and the time length T is 250, so the EEG signal input size is 17×250, and the initial value of the channel scaling parameter is 1.
[0046] From the channel-scaled EEG signal $X_s$ obtained from formula (7), the mean is calculated along the channel dimension to extract the context information $u_t$ at each time point:
[0047] $u_t = \mathrm{Mean}_{c}\big(X_s(c, t)\big)$ (8);
[0048] where $\mathrm{Mean}_{c}$ denotes averaging along the channel dimension, $c$ denotes the EEG channel index, and $t$ denotes the time index;
[0049] The context information at time t obtained from formula (8) is transformed by a time-gated network to obtain the time modulation term:
[0050] $m_t = \tanh\big(f_{\mathrm{tg}}(u_t)\big)$ (9);
[0051] where $f_{\mathrm{tg}}(\cdot)$ denotes the time-gated network that transforms the temporal context information, $\tanh$ denotes the hyperbolic tangent function, used for bounded compression of the network output, and $m_t$ denotes the time modulation result at the t-th time point. In this embodiment, the hidden layer dimension of the time-gated network is max(8, 250 / 8) = 31, the initial value of the time modulation intensity parameter is 0.14, and the weights and biases of the last layer of the time-gated network are initialized to zero.
[0052] The input EEG signal is calibrated according to formulas (7) and (9) to obtain:
[0053] (10);
[0054] where the coefficient of the time modulation term is the time modulation intensity parameter, and the result is the calibrated EEG signal;
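The input-side calibration can be sketched end to end in NumPy (forward pass only). Several details are assumptions of this sketch rather than statements of the source: the time-gated network is taken to be a two-layer ReLU network acting on the whole length-T context vector (suggested by its hidden size depending on T), and the calibration of formula (10) is taken to be multiplicative, scaling each time point by one plus the intensity parameter times the modulation term.

```python
import numpy as np

class InputCalibration:
    """Channel scaling plus time gating on the EEG input (forward only)."""

    def __init__(self, n_channels=17, n_times=250, alpha=0.14, seed=0):
        rng = np.random.default_rng(seed)
        hidden = max(8, n_times // 8)              # 31 for T = 250
        self.scale = np.ones(n_channels)           # channel scaling, init 1
        self.w1 = rng.standard_normal((n_times, hidden)) * 0.02
        self.b1 = np.zeros(hidden)
        self.w2 = np.zeros((hidden, n_times))      # last layer zero-initialized
        self.b2 = np.zeros(n_times)
        self.alpha = alpha                         # time modulation intensity

    def __call__(self, x):
        """x: (batch, C, T) -> calibrated signal of the same shape."""
        xs = x * self.scale[None, :, None]            # formula (7)
        ctx = xs.mean(axis=1)                         # formula (8): (batch, T)
        hid = np.maximum(ctx @ self.w1 + self.b1, 0)  # assumed ReLU hidden layer
        mod = np.tanh(hid @ self.w2 + self.b2)        # formula (9)
        return xs * (1.0 + self.alpha * mod[:, None, :])  # assumed form of (10)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 17, 250))
calib = InputCalibration()
x_cal = calib(x)
```

Because the last layer of the gate starts at zero and the channel scale starts at 1, the module is an identity mapping at initialization, which is one plausible reason for the zero initialization described in the embodiment.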
[0055] The calibrated EEG signal obtained from formula (10) is then input into the shallow projection backbone network to obtain the basic EEG embedding $z_0$. In this embodiment, the flattened dimension of the EEG input is 17 × 250 = 4250. The output dimension of the shallow projection backbone is consistent with the visual embedding dimension, which is 1024. The dropout value of the residual projection layer in the backbone network is 0.2. On this basis, a residual correction term $r$ is generated by the adapter residual correction branch:
[0056] $r = f_{a}(z_0)$ (11);
[0057] where $f_{a}(\cdot)$ denotes the adapter transformation; in this embodiment, the adapter hidden dimension is max(32, 1024 / 8) = 128, the adapter branch dropout is 0.1, the initial value of the residual correction global scaling parameter is 0.1, and the weights and biases of the last layer of the adapter are initialized to zero.
[0058] Then, a sample-level modulation factor is generated by the sample-level residual gating branch:
[0059] (12);
[0060] where the transformation in the formula is the sample-level residual gating transformation and its coefficient is the sample-level modulation intensity parameter; in this embodiment, the initial value of the sample-level modulation intensity parameter is 0.03.
[0061] Then, the intermediate EEG embedding is updated according to formulas (11) and (12):
[0062] (13);
[0063] where the scaling coefficient is the global scaling parameter for residual correction;
[0064] Then, a feature modulation vector is generated by feature reweighting:
[0065] (14);
[0066] where the transformation in the formula is the feature reweighting transformation; in this embodiment, the feature gate hidden dimension is max(16, 1024 / 4) = 256, the initial value of the feature reweighting strength parameter is 0.03, and the weights and biases of the last layer of the feature gate are initialized to zero.
[0067] Finally, the final EEG embedding representation is obtained according to formulas (13) and (14):
[0068] (15);
[0069] where the coefficient of the feature modulation vector is the feature reweighting strength parameter.
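Formulas (12) to (15) are not reproduced in the text, so the sketch below is one plausible reading of the output-side refinement, not the patented form. Assumed here: each branch is a two-layer ReLU network; the sample-level factor is one plus the intensity parameter times a tanh-bounded scalar; the residual is added after scaling by the global parameter and that factor; and feature reweighting multiplies the embedding by one plus the strength parameter times a tanh-bounded vector. The hidden size of the sample gate and its zero-initialized last layer are also assumptions.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer network with ReLU (assumed structure for every branch)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

class OutputRefinement:
    """Adapter residual, sample-level gating, feature reweighting (forward only)."""

    def __init__(self, dim=1024, beta=0.1, eta=0.03, rho=0.03, seed=0):
        rng = np.random.default_rng(seed)
        h_adapter = max(32, dim // 8)   # 128 for dim = 1024
        h_feature = max(16, dim // 4)   # 256 for dim = 1024
        init = lambda i, o: rng.standard_normal((i, o)) * 0.02
        # Last layers start at zero, so every branch is inactive at init.
        self.adapter = (init(dim, h_adapter), np.zeros(h_adapter),
                        np.zeros((h_adapter, dim)), np.zeros(dim))
        self.sample_gate = (init(dim, h_adapter), np.zeros(h_adapter),
                            np.zeros((h_adapter, 1)), np.zeros(1))
        self.feature_gate = (init(dim, h_feature), np.zeros(h_feature),
                             np.zeros((h_feature, dim)), np.zeros(dim))
        self.beta, self.eta, self.rho = beta, eta, rho

    def __call__(self, z0):
        """z0: (batch, dim) basic EEG embedding -> refined embedding."""
        r = mlp(z0, *self.adapter)                                 # formula (11)
        g = 1.0 + self.eta * np.tanh(mlp(z0, *self.sample_gate))   # assumed (12)
        z1 = z0 + self.beta * g * r                                # assumed (13)
        v = np.tanh(mlp(z1, *self.feature_gate))                   # assumed (14)
        return z1 * (1.0 + self.rho * v)                           # assumed (15)

rng = np.random.default_rng(1)
z0 = rng.standard_normal((4, 1024))
refine = OutputRefinement()
z = refine(z0)
```

Under these assumptions the refinement leaves the backbone embedding unchanged at initialization and only departs from it as the branches are trained.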
[0070] S5. Map the visual embedding representation obtained in step S3 and the EEG embedding representation obtained in step S4 to the same shared latent space, so that visual modality features and EEG modality features are comparable in a unified representation space for subsequent cross-modal similarity calculation and alignment training. In this embodiment, both the visual embedding representation and the EEG embedding representation have a dimension of 1024 and are normalized during training.
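Once both embeddings live in the same space and are normalized, cross-modal similarity reduces to a matrix product. A minimal sketch, assuming the normalization is L2 normalization (the source only says the embeddings are normalized); the helper names are illustrative.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def similarity_matrix(z_eeg, z_img):
    """Cosine similarity between every EEG and every image embedding.

    z_eeg: (n_eeg, d), z_img: (n_img, d) -> (n_eeg, n_img)
    """
    return l2_normalize(z_eeg) @ l2_normalize(z_img).T

toy = np.array([[3.0, 4.0], [0.0, 2.0]])
sim = similarity_matrix(toy, toy)
```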
[0071] S6. Based on the contrastive learning loss function, cross-modal alignment training is performed on the visual embedding representation and the EEG embedding representation in the shared latent space to obtain a visual neural decoding model, which is then used for visual neural decoding. Specifically, cross-modal positive and negative sample pairs are constructed. A paired visual stimulus image and EEG signal form a positive sample pair. A negative sample pair is formed either by keeping the visual stimulus image fixed and selecting another EEG signal from the current batch that is not paired with it (marked with a superscript minus sign to distinguish it from the paired EEG signal), or by keeping the EEG signal fixed and selecting another visual stimulus image from the current batch that is not paired with it.
[0072] The contrastive learning loss function is constructed as:
[0073] (16);
[0074] where the symbols of the formula denote, respectively, the visual encoder, the brain (EEG) encoder, the temperature parameter, the negative-sample EEG signals and negative-sample visual stimulus images, and the mathematical expectation operation. The three expectations are taken over paired visual stimulus image and EEG signal sample pairs, over negative-sample EEG signals, and over negative-sample visual stimulus images, respectively, and each expectation is implemented by averaging over the samples of the current batch.
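Formula (16) is not reproduced in the text. A common loss matching the description (positive pairs on the batch diagonal, in-batch negatives in both directions, a temperature) is the symmetric InfoNCE loss used by CLIP-style models. The sketch below assumes that form, with the positive included in each denominator and the two directions averaged; the actual formula may differ in these details.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(z_img, z_eeg, tau=0.07):
    """Symmetric InfoNCE over a batch of paired, L2-normalized embeddings.

    z_img, z_eeg: (batch, d); row i of each forms a positive pair.
    """
    logits = (z_eeg @ z_img.T) / tau
    idx = np.arange(logits.shape[0])
    eeg_to_img = -log_softmax(logits, axis=1)[idx, idx].mean()
    img_to_eeg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (eeg_to_img + img_to_eeg)

# Perfectly aligned orthogonal embeddings give a loss near zero.
aligned = np.eye(4)
loss_aligned = contrastive_loss(aligned, aligned)

# Identical embeddings for every sample carry no pairing information.
same = np.ones((4, 3)) / np.sqrt(3.0)
loss_uninformative = contrastive_loss(same, same)
```

When every embedding is the same, each positive is indistinguishable from the other three candidates, so the loss equals the log of the batch size.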
[0075] During training, the contrastive learning loss function shown in formula (16) is used as the optimization objective. The loss value is calculated for the samples in the current batch, and the EEG encoder parameters are updated through the backpropagation algorithm, so that the visual embedding representation and the EEG embedding representation achieve cross-modal alignment in the shared latent space. In this embodiment, the temperature parameter is obtained by softplus transformation of the learnable logit_scale in the EEG encoder, and its initial value corresponds to log(1 / 0.07). During training, the visual encoder parameters are frozen, and only the EEG encoder parameters are updated. After multiple rounds of iterative training, the visual neural decoding model is obtained. The visual neural decoding model is not an additional model independent of the visual encoder and the EEG encoder, but is composed of the frozen visual encoder, the trained and optimized EEG encoder, and the similarity matching mechanism in the shared latent space.
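The softplus parameterization of the temperature can be checked numerically. The sketch only evaluates the transformation at the stated initial value; whether the softplus output is used as the temperature itself or as a multiplier on the similarity logits is not stated in the source and is left open here.

```python
import math

def softplus(x):
    """softplus(x) = log(1 + exp(x)); always positive."""
    return math.log1p(math.exp(x))

logit_scale_init = math.log(1 / 0.07)       # initial value of logit_scale
scale_init = softplus(logit_scale_init)     # value after the softplus transform
```

At initialization `logit_scale` is about 2.66 and its softplus is about 2.73. Softplus keeps the resulting quantity strictly positive for any value the learnable parameter takes during training.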
[0076] During the inference phase, i.e., when performing visual neural decoding using the trained visual neural decoding model, the EEG signal to be decoded is input into the EEG encoder to obtain the corresponding EEG embedding representation. Simultaneously, each candidate visual stimulus image is input into the visual blurring prior module for transformation, resulting in transformed candidate visual input images. These transformed candidate visual input images are then input into the frozen visual encoder to obtain the corresponding visual embedding representation. The similarity between the EEG embedding representation and each candidate visual embedding representation is then calculated and sorted according to similarity. The candidate visual image with the highest similarity or the top K candidate visual images are used as the visual neural decoding result. Under this experimental setup, this embodiment achieves TOP-1 and TOP-5 accuracy improvements on a 200-class zero-shot retrieval task on the THINGS-EEG dataset.
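The retrieval evaluation described above can be sketched as a top-K accuracy over a similarity matrix. It assumes that EEG sample i is paired with candidate image i, as in a test set with one image per concept; the helper name `topk_accuracy` is illustrative.

```python
import numpy as np

def topk_accuracy(sim, k):
    """Fraction of EEG samples whose paired image is among the top-k candidates.

    sim[i, j]: similarity between EEG sample i and candidate image j;
    the correct candidate for row i is assumed to be column i.
    """
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    return (topk == targets).any(axis=1).mean()

toy_sim = np.array([
    [0.9, 0.1, 0.0],   # correct image ranked first
    [0.2, 0.5, 0.7],   # correct image ranked second
    [0.3, 0.2, 0.5],   # correct image ranked first
])
top1 = topk_accuracy(toy_sim, 1)
top2 = topk_accuracy(toy_sim, 2)
```

For the 200-class setting of the embodiment, `sim` would be a 200×200 matrix and K would be 1 or 5.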
[0077] Those skilled in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to such specific statements and embodiments. Those skilled in the art can make various other specific modifications and combinations based on the technical teachings disclosed in this invention without departing from the spirit of the invention, and these modifications and combinations remain within the scope of protection of this invention.
Claims
1. A neural decoding method based on attention-blurred visual priors and lightweight EEG coding, characterized in that the specific implementation steps are as follows: S1. Acquire paired visual stimulus images and EEG signals; S2. Perform attention-weighted multi-level blurring processing on the visual stimulus image: first, generate a structural saliency response based on the image grayscale gradient information, and generate an attention map based on the center prior; construct low-blur, medium-blur, and high-blur images respectively according to the preset weak blur kernel size, intermediate blur kernel ratio, and strong blur kernel size; calculate the weights of the low-blur, medium-blur, and high-blur branches based on the attention map, and perform weighted fusion of the low-blur, medium-blur, and high-blur images to obtain the transformed visual input image; S3. Input the visual input image into the frozen visual encoder to obtain the visual embedding representation; S4. Input the EEG signal into the lightweight adaptive EEG encoder: first, perform channel scaling and time-gated input calibration on the EEG signal; then, obtain the basic EEG embedding through the shallow projection backbone network; finally, combine adapter residual correction, sample-level residual gating, and feature reweighting to obtain the EEG embedding representation; S5. Map the visual embedding representation and the EEG embedding representation to a shared latent space, so that visual modality features and EEG modality features are aligned in a unified representation space; S6. Based on a contrastive learning loss function, perform cross-modal alignment training on the visual embedding representation and the EEG embedding representation in the shared latent space to obtain a visual neural decoding model, and use the trained visual neural decoding model for visual neural decoding.
2. The neural decoding method based on attention-blurred visual priors and lightweight EEG coding according to claim 1, characterized in that step S1 is implemented as follows: a paired training dataset is constructed in which each sample consists of a visual stimulus image and the EEG signal corresponding to that visual stimulus image; during training, mini-batches of paired samples are drawn from the training dataset, the visual stimulus images are input into the visual encoder and the EEG signals into the EEG encoder, thereby establishing the pairing correspondence between the visual modality and the EEG modality.
3. The neural decoding method based on attention-blurred visual priors and lightweight EEG coding according to claim 1, characterized in that step S2 is implemented as follows: a visual stimulus image is first converted into a grayscale image, and the gradient responses of the grayscale image in the horizontal and vertical directions are calculated to obtain a gradient magnitude map $G$ reflecting the saliency of local image structure; the gradient magnitude map satisfies: $G = \sqrt{G_x^{2} + G_y^{2}}$ (1); where $G_x$ denotes the gradient response of the grayscale image in the horizontal direction and $G_y$ denotes the gradient response of the grayscale image in the vertical direction; then, the gradient magnitude map is normalized and its nonlinear response is adjusted to obtain: $\tilde{G} = \mathrm{Norm}(G)^{\gamma}$ (2); where $\mathrm{Norm}(\cdot)$ denotes the normalization operation, $\gamma$ denotes the gradient response adjustment parameter, and $\tilde{G}$ denotes the adjusted gradient magnitude map; the adjusted result is then smoothed, and a center prior map is introduced to generate the attention map: (3); where the weighting coefficient is the center prior weight and the prior term is a two-dimensional Gaussian center prior map; then, according to the preset weak blur kernel size, intermediate blur kernel ratio, and strong blur kernel size, a low-blur image, a medium-blur image, and a high-blur image are constructed, and, based on the attention value at each spatial location of the attention map obtained from formula (3), soft-assignment weights are calculated for the low-blur, medium-blur, and high-blur branches: (4); where the parameters are the attention center corresponding to each blur level and the smoothing control parameter; the soft-assignment weights obtained from formula (4) are then normalized to obtain: $\hat{w}_k = \dfrac{w_k}{\sum_{j} w_j + \epsilon}$ (5); where $w_k$ denotes the soft-assignment weight of blur branch $k$, $\hat{w}_k$ denotes the normalized weight, and $\epsilon$ denotes the numerical stability term; finally, the normalized weights obtained from formula (5) are used to perform pixel-by-pixel weighted fusion of the low-blur image, medium-blur image, and high-blur image to obtain the transformed visual input image: $X' = \hat{w}_{\mathrm{low}} \odot X_{\mathrm{low}} + \hat{w}_{\mathrm{mid}} \odot X_{\mathrm{mid}} + \hat{w}_{\mathrm{high}} \odot X_{\mathrm{high}}$ (6); where $X_{\mathrm{low}}$, $X_{\mathrm{mid}}$, and $X_{\mathrm{high}}$ denote the low-blur, medium-blur, and high-blur images, $X'$ denotes the transformed visual input image, and ⊙ denotes element-wise multiplication.
4. The neural decoding method based on attention-blurred visual priors and lightweight EEG coding according to claim 1, characterized in that step S4 is implemented as follows: for the input EEG signal $X$, adaptive scaling is first performed along the channel dimension to obtain: $X_s = s \odot X$ (7); where $s$ denotes the learnable channel scaling parameter; the context information $u_t$ at each time point is extracted as: $u_t = \mathrm{Mean}_{c}\big(X_s(c, t)\big)$ (8); where $\mathrm{Mean}_{c}$ denotes averaging along the channel dimension, $c$ denotes the EEG channel index, and $t$ denotes the time index; the time modulation term is obtained by transformation through a time-gated network: $m_t = \tanh\big(f_{\mathrm{tg}}(u_t)\big)$ (9); where $f_{\mathrm{tg}}(\cdot)$ denotes the time-gated network and $\tanh$ denotes the hyperbolic tangent function; the input EEG signal is calibrated according to formulas (7) and (9) to obtain: (10); where the coefficient of the time modulation term is the time modulation intensity parameter and the result is the calibrated EEG signal; the calibrated EEG signal is then input into the shallow projection backbone network to obtain the basic EEG embedding $z_0$; on this basis, a residual correction term $r$ is generated by the adapter residual correction branch: $r = f_{a}(z_0)$ (11); where $f_{a}(\cdot)$ denotes the adapter transformation; then, a sample-level modulation factor is generated by the sample-level residual gating branch: (12); where the transformation is the sample-level residual gating transformation and its coefficient is the sample-level modulation intensity parameter; then, the intermediate EEG embedding is updated according to formulas (11) and (12): (13); where the scaling coefficient is the global scaling parameter for residual correction; then, a feature modulation vector is generated by feature reweighting: (14); where the transformation is the feature reweighting transformation; finally, the final EEG embedding representation is obtained according to formulas (13) and (14): (15); where the coefficient of the feature modulation vector is the feature reweighting strength parameter.
5. The neural decoding method based on attention-blurred visual priors and lightweight EEG coding according to claim 1, characterized in that step S6 is specifically implemented as follows:
cross-modal positive and negative sample pairs are constructed, wherein a visual stimulus image and its paired EEG signal form a positive sample pair, and a negative sample pair is formed either by keeping the visual stimulus image unchanged and selecting from the current batch another EEG signal that is not paired with it, or by keeping the EEG signal unchanged and selecting from the current batch another visual stimulus image that is not paired with it;
a contrastive learning loss function is constructed as given by formula (16), which involves the visual encoder, the EEG encoder, a temperature parameter, and mathematical expectations taken over paired visual stimulus images and EEG signals, over negative-sample EEG signals, and over negative-sample visual images, respectively;
with the contrastive learning loss function as the optimization objective, the parameters of the EEG encoder are updated through the backpropagation algorithm to obtain the visual neural decoding model, which is composed of the frozen visual encoder, the trained EEG encoder, and a similarity matching mechanism in the shared latent space;
in the inference phase, the EEG signal to be decoded is input into the EEG encoder to obtain the EEG embedding representation; each candidate visual stimulus image is input into the visual blur prior module to obtain a transformed candidate visual input image, which is then input into the frozen visual encoder to obtain a visual embedding representation; the similarity between the EEG embedding representation and each candidate visual embedding representation is calculated and the candidates are ranked by similarity; the candidate visual image with the highest similarity, or the top K candidate visual images, is taken as the visual neural decoding result.
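The training objective and the retrieval step of this claim can be illustrated with a small sketch. Formula (16) is not reproduced in this text, so the symmetric in-batch form of the loss, the use of cosine similarity, the default temperature, and all function names below are assumptions; the embeddings are taken as already computed by the two encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_diagonal_nll(rows):
    """Mean negative log-softmax of the matching (diagonal) entry per row."""
    total = 0.0
    for i, row in enumerate(rows):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[i]
    return total / len(rows)


def contrastive_loss(image_emb, eeg_emb, temperature=0.07):
    """Symmetric in-batch contrastive loss; index i in both lists is a positive pair."""
    n = len(image_emb)
    logits = [[cosine(image_emb[i], eeg_emb[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Image as anchor: the other EEG signals in the batch act as negatives.
    image_to_eeg = mean_diagonal_nll(logits)
    # EEG as anchor: the other images in the batch act as negatives.
    eeg_to_image = mean_diagonal_nll([list(col) for col in zip(*logits)])
    return 0.5 * (image_to_eeg + eeg_to_image)


def rank_candidates(eeg_vec, candidate_embs, k=1):
    """Return indices of the k candidates most similar to the EEG embedding."""
    scores = [cosine(eeg_vec, c) for c in candidate_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical 2-D embeddings for a batch of two image/EEG pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.2, 0.8]]
swapped = [aligned[1], aligned[0]]
loss_aligned = contrastive_loss(images, aligned)
loss_swapped = contrastive_loss(images, swapped)
top1 = rank_candidates(aligned[0], images, k=1)
```

Correctly paired embeddings give a lower loss than mismatched ones, and at inference the same similarity score is reused to rank candidate images, so no additional parameters are needed beyond the two encoders.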