Multimodal neurophysiological signal processing method, apparatus, server, and storage medium
By using deep learning networks to extract and fuse features from multimodal neurobiological signals, the problem of nonlinear relationship analysis of neurobiological signals is solved, enabling accurate prediction of biological representations, improving prediction accuracy, and providing personalized labeling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
- Filing Date
- 2022-11-24
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies struggle to effectively analyze the complex nonlinear relationships in neurobiological signals, making it difficult to accurately predict neurobiological characteristics and affecting the accuracy of predicting drug response phenotypes for mental illness treatments.
A multimodal neurobiological signal processing method is adopted, which uses deep learning networks for feature extraction and fusion, and deep regression models for vital sign prediction. A feature fusion layer, a fully connected layer and a regression layer are constructed, and signal processing is performed by combining the Transformer model and a convolutional neural network.
It enables high-dimensional feature extraction of multimodal neurobiological signals, improves the prediction accuracy of biological representations, eliminates the need for manually designed features, can analyze nonlinear relationships, provide personalized biomarkers, and assist clinicians in assessment.
Figure CN115730269B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically to a multimodal neurobiological signal processing method, device, server, and storage medium.
Background Technology
[0002] With the widespread use of medications for treating mental illnesses, the neurobiological characterization of these drugs has received increasing attention. However, due to a lack of cross-validation across different datasets and small sample sizes, robust neurobiological characterization of drug response phenotypes for mental illnesses remains scarce.
[0003] Currently, classic machine learning methods are commonly used to analyze the complex and variable relationships in neurobiological data such as resting-state EEG (rsEEG) and magnetic resonance imaging (MRI) to identify neurobiological representations after drug treatment. While classic machine learning models can reflect neurobiological characteristics to some extent, they rely on human experience for feature design or selection. This makes it difficult to analyze the complex nonlinear relationships in neurobiological signals and to determine the neurobiological characteristics of therapeutic drug response phenotypes, which limits the accuracy of predicting neurobiological representations. Therefore, achieving accurate prediction of neurobiological representations remains a pressing issue.
Summary of the Invention
[0004] In view of this, embodiments of the present invention provide a multimodal neurobiological signal processing method, apparatus, server, and storage medium to solve the problem of the difficulty in accurately predicting neurobiological features.
[0005] According to a first aspect, the multimodal neurobiological signal processing method provided by embodiments of the present invention is used in a deep learning network. The deep learning network includes a pre-constructed deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer. The method includes: acquiring a multimodal neurobiological signal to be processed; preprocessing the multimodal neurobiological signal to obtain a preprocessed multimodal neurobiological signal; inputting the preprocessed multimodal neurobiological signal into the deep learning model; extracting deep features of each modality of the multimodal neurobiological signal based on the deep learning model; inputting multiple deep features into the feature fusion layer for feature fusion to obtain a target fusion feature; inputting the target fusion feature into the regression layer via the fully connected layer; using the regression layer to predict the vital signs of the target fusion feature to generate a biological vital sign prediction result.
[0006] The multimodal neurobiological signal processing method provided in this invention employs a deep learning network to extract features from multimodal neurobiological signals and fuse multiple deep features. Then, it uses the fused target features to predict biosignatures, obtaining the predicted biosignature results. This method enables the fusion of multiple deep features based on objective multimodal neurobiological signals, achieving effective capture of various deep features without the need for manual feature design or selection. Furthermore, the deep learning network's multi-level nonlinear structure effectively analyzes the nonlinear relationships present in neurobiological signals, thereby extracting effective biosignatures from multiple deep features and achieving effective prediction of the biological representation of neural signals, improving the accuracy of biological representation prediction.
[0007] In conjunction with the first aspect, in the first embodiment of the first aspect, the deep learning model constructs multiple deep learning sub-models; the multimodal neurobiological signals are input into the deep learning model, and the deep features of each modality of neurobiological signals in the multimodal neurobiological signals are extracted based on the deep learning model, including: determining the type of multimodal neurobiological signals; determining each deep learning sub-model corresponding to each type of neurobiological signal; and inputting each type of neurobiological signal into the corresponding deep learning sub-model to obtain the deep features corresponding to each type of neurobiological signal.
[0008] The multimodal neurobiological signal processing method provided in this invention improves the accuracy of deep feature extraction by inputting different types of neurobiological signals into corresponding deep learning sub-models for deep feature extraction.
[0009] In conjunction with the first embodiment of the first aspect, in the second embodiment of the first aspect, when the neurobiological signal is a first type of neurobiological signal, the first deep learning sub-model corresponding to the first type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a second sub-model; inputting the first type of neurobiological signal into the first deep learning sub-model to obtain the depth features corresponding to the first type of neurobiological signal includes: inputting the first type of neurobiological signal into the second sub-model for feature extraction to obtain first feature data; fusing the first feature data with a pre-configured position code to obtain fused data; and inputting the fused data into the model group for feature extraction to obtain the depth features corresponding to the first type of neurobiological signal.
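As a rough illustration of the data flow in this embodiment, the sketch below passes a signal through a feature-extraction stage, fuses the result with a position code, and leaves it ready for the model group. The convolution-plus-pooling stand-in for the second sub-model, the sinusoidal form of the position code, fusion by element-wise addition, and all function names and shapes are assumptions made for illustration; the text only states that a pre-configured position code is fused with the first feature data.

```python
import numpy as np

def second_sub_model(signal, kernels, pool=4):
    """Hypothetical stand-in for the second sub-model: 1-D convolution, ReLU, average pooling.

    signal:  (channels, samples)
    kernels: (n_filters, channels, width)
    returns: (n_tokens, n_filters) first feature data
    """
    n_filters, _, width = kernels.shape
    steps = signal.shape[1] - width + 1
    conv = np.empty((steps, n_filters))
    for t in range(steps):
        window = signal[:, t:t + width]
        # Sum over channels and kernel width for every filter at once.
        conv[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    conv = np.maximum(conv, 0.0)
    n_tokens = steps // pool
    # Average pooling over non-overlapping windows of `pool` time steps.
    return conv[:n_tokens * pool].reshape(n_tokens, pool, n_filters).mean(axis=1)

def position_code(n_tokens, d_model):
    """Sinusoidal position code (an assumed choice of pre-configured code)."""
    pos = np.arange(n_tokens)[:, None]
    dim = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, 2 * (dim // 2) / d_model)
    return np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))

def fuse_with_position(features):
    """Fuse first feature data with the position code by addition (assumed)."""
    return features + position_code(*features.shape)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 103))            # toy signal: 4 channels, 103 samples
kernels = rng.standard_normal((8, 4, 4)) * 0.1
first_features = second_sub_model(eeg, kernels)
fused = fuse_with_position(first_features)     # input to the model group
```

Each row of `fused` acts as one token for the series-connected first sub-models.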
[0010] In conjunction with the second embodiment of the first aspect, in the third embodiment of the first aspect, the first type of neurobiological signal is an electroencephalogram (EEG) signal; the first sub-model includes a Transformer model; the second sub-model includes a convolutional neural network (CNN) model; the CNN includes convolutional layers and average pooling layers; the Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; the attention module and the feedforward network module have corresponding shortcut connections.
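The layer ordering described here (multi-head attention, first normalization, fully connected feedforward, second normalization, with a shortcut connection around each module) can be sketched in NumPy as follows. This is a minimal sketch, not the patented implementation: the random weights, the ReLU activation, the normalization without learnable scale or shift, and all names are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, p, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    context = softmax(scores) @ v
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ p["wo"]

def transformer_block(x, p, num_heads):
    # Attention module: multi-head attention, shortcut, first normalization.
    h = layer_norm(x + multi_head_attention(x, p, num_heads))
    # Feedforward module: fully connected feedforward, shortcut, second normalization.
    ff = np.maximum(h @ p["w1"] + p["b1"], 0.0) @ p["w2"] + p["b2"]
    return layer_norm(h + ff)

def init_params(d_model, d_ff, rng):
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.standard_normal((d_model, d_model)) * s,
        "wk": rng.standard_normal((d_model, d_model)) * s,
        "wv": rng.standard_normal((d_model, d_model)) * s,
        "wo": rng.standard_normal((d_model, d_model)) * s,
        "w1": rng.standard_normal((d_model, d_ff)) * s,
        "b1": np.zeros(d_ff),
        "w2": rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff),
        "b2": np.zeros(d_model),
    }

def model_group(x, blocks, num_heads):
    """Several Transformer blocks connected in series."""
    for p in blocks:
        x = transformer_block(x, p, num_heads)
    return x

rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 16))                 # 10 tokens, 16 features each
blocks = [init_params(16, 32, rng) for _ in range(3)]
deep_features = model_group(tokens, blocks, num_heads=4)
```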
[0011] The multimodal neurobiological signal processing method provided in this invention constructs a first deep learning sub-model for electroencephalogram (EEG) signals, which is composed of a multi-layer deep structure (a second sub-model and a model group consisting of multiple first sub-models). This enables high-dimensional feature extraction of EEG signals, thereby enabling more accurate mining and extraction of the spatiotemporal features and long-range dependencies of EEG signals.
[0012] In conjunction with the first embodiment of the first aspect, in the fourth embodiment of the first aspect, when the neurobiological signal is a second type of neurobiological signal, the second deep learning sub-model corresponding to the second type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a third sub-model. The third sub-model includes a convolutional layer, a max pooling layer, and a fully connected layer. Inputting the second type of neurobiological signal into the second deep learning sub-model to obtain the depth features corresponding to the second type of neurobiological signal includes: inputting the second type of neurobiological signal into the convolutional layer of the third sub-model for multidimensional convolution processing to obtain the convolution processing result; inputting the convolution processing result into the model group for feature extraction, and outputting the depth features corresponding to the second type of neurobiological signal through the max pooling layer and the fully connected layer.
[0013] In conjunction with the fourth embodiment of the first aspect, in the fifth embodiment of the first aspect, the second type of neurobiological signal is a functional magnetic resonance imaging (fMRI) signal; the first sub-model includes a Transformer model; the third sub-model includes a point 4D convolutional network model; the point 4D convolutional network model includes a point 4D convolutional layer, a max pooling layer, and a fully connected layer; a model group composed of multiple Transformer models connected in series is set between the point 4D convolutional layer and the max pooling layer; the Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; the attention module and the feedforward network module have corresponding shortcut connections.
[0014] The multimodal neurobiological signal processing method provided in this invention constructs a second deep learning sub-model for functional magnetic resonance imaging (fMRI) signals. This second deep learning sub-model consists of a multi-layer deep structure (a third sub-model and a model group composed of multiple first sub-models), thereby achieving high-dimensional feature extraction of fMRI signals. This enables more accurate mining and extraction of the spatial and functional connectivity features of fMRI signals.
[0015] In conjunction with the first aspect, in the sixth embodiment of the first aspect, before inputting the target fusion feature into the regression layer via the fully connected layer, and using the regression layer to predict the vital signs of the target fusion feature to generate a biological vital sign prediction result, the method further includes: obtaining the data type corresponding to the target fusion feature; constructing a loss function corresponding to the data type; and using the loss function to optimize the parameters of the deep regression network, wherein the regression layer is deployed in the deep regression network.
[0016] The multimodal neurobiological signal processing method provided in this invention constructs a loss function, which facilitates joint learning and optimization of the target fusion feature learning loss and regression loss to improve the regression performance of the regression layer and ensure the analysis accuracy of the regression layer.
[0017] In conjunction with the sixth embodiment of the first aspect, in the seventh embodiment of the first aspect, the construction of the loss function corresponding to the data type includes: when the data type is continuous, constructing a loss function based on data error parameters; wherein, the data error parameters include mean absolute error, root mean square error, and median absolute error, and the loss function is: Loss=α*RMSE+β*MAE+μ*MedAE+λ*||W||; where Loss represents the loss function; RMSE represents the root mean square error; MAE represents the mean absolute error; MedAE represents the median absolute error; ||W|| represents the regularization term; and α, β, μ, and λ represent network parameters.
[0018] The multimodal neurobiological signal processing method provided in this invention adds a regularization term to the loss function during its construction to correct the loss function, thereby minimizing overfitting.
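A minimal sketch of the loss for continuous data, Loss=α*RMSE+β*MAE+μ*MedAE+λ*||W||, is given below. The text does not say which norm ||W|| denotes; the L2 norm over all weight arrays is assumed here, and the default coefficient values and function name are placeholders.

```python
import numpy as np

def regression_loss(y_true, y_pred, weights, alpha=1.0, beta=1.0, mu=1.0, lam=1e-3):
    """Weighted sum of RMSE, MAE, MedAE and a regularization term.

    weights: iterable of weight arrays; ||W|| is taken as their joint L2 norm (assumed).
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    rmse = np.sqrt(np.mean(err ** 2))      # root mean square error
    mae = np.mean(np.abs(err))             # mean absolute error
    medae = np.median(np.abs(err))         # median absolute error
    w_norm = np.sqrt(sum(np.sum(w ** 2) for w in weights))
    return alpha * rmse + beta * mae + mu * medae + lam * w_norm

loss = regression_loss(
    y_true=[0.0, 0.0, 0.0, 0.0],
    y_pred=[1.0, 1.0, 1.0, 3.0],
    weights=[np.array([3.0, 4.0])],
    lam=0.1,
)
```

For this toy input the terms are RMSE = sqrt(3), MAE = 1.5, MedAE = 1.0 and ||W|| = 5, so the loss is sqrt(3) + 3.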
[0019] In conjunction with the first aspect, in the eighth embodiment of the first aspect, the method further includes: obtaining the numerical range of the predicted biosign result; quantifying the numerical range, dividing the numerical range into several intervals, and obtaining the prediction level corresponding to the several intervals.
[0020] The multimodal neurobiological signal processing method provided in this invention can assist clinicians in effectively evaluating biological representations by quantifying numerical ranges to divide them into several intervals and setting different prediction levels for different intervals.
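One way to turn a numerical range into prediction levels is sketched below. Equal-width intervals, the number of levels, and the function name are assumptions; the text does not specify how the intervals are chosen.

```python
import numpy as np

def to_prediction_levels(scores, num_levels=3, value_range=None):
    """Divide the numerical range into equal-width intervals and map each score to a level.

    Returns (levels, edges); level 0 is the lowest interval.
    """
    scores = np.asarray(scores, dtype=float)
    if value_range is None:
        value_range = (scores.min(), scores.max())
    edges = np.linspace(value_range[0], value_range[1], num_levels + 1)
    # Use interior edges only, so a score equal to the upper bound lands in the top level.
    levels = np.digitize(scores, edges[1:-1])
    return levels, edges

levels, edges = to_prediction_levels([0, 9.9, 10, 19, 20, 30], num_levels=3, value_range=(0, 30))
```

Here the intervals are [0, 10), [10, 20) and [20, 30], and the six scores map to levels 0, 0, 1, 1, 2, 2.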
[0021] In conjunction with the first aspect, in the ninth embodiment of the first aspect, the method further includes: extracting biomarkers from the biosignature prediction results based on a preset interpretable artificial intelligence method; performing replication analysis and convergence analysis on the biomarkers to determine individual biomarkers from the biomarkers.
[0022] The multimodal neurobiological signal processing method provided in this invention extracts biological features by applying interpretable artificial intelligence to the prediction results of biological signs, and determines individualized biomarkers through replication analysis and convergence analysis. This method has significant neurobiological implications for the portability and stability of biological features and can provide potentially powerful assistance to clinicians dealing with mental illnesses.
[0023] According to a second aspect, embodiments of the present invention provide a multimodal neurobiological signal processing device for use in a deep learning network. The deep learning network includes a pre-constructed deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer. The device includes: an acquisition module for acquiring multimodal neurobiological signals to be processed and preprocessing the multimodal neurobiological signals to obtain preprocessed multimodal neurobiological signals; a feature extraction module for inputting the preprocessed multimodal neurobiological signals into the deep learning model and extracting deep features of each modality of the multimodal neurobiological signals based on the deep learning model; a feature fusion module for inputting multiple deep features into the feature fusion layer for feature fusion to obtain target fused features; and a prediction module for inputting the target fused features into the regression layer via the fully connected layer and using the regression layer to predict the vital signs of the target fused features to generate a biological vital sign prediction result.
[0024] According to a third aspect, embodiments of the present invention provide a server, including: a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the multimodal neurobiological signal processing method described in the first aspect or any embodiment of the first aspect.
[0025] According to a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer instructions for causing a computer to perform the multimodal neurobiological signal processing method described in the first aspect or any embodiment of the first aspect.
[0026] It should be noted that the beneficial effects of the multimodal neurobiological signal processing device, server, and computer-readable storage medium provided in the embodiments of the present invention can be found in the description of the corresponding content in the multimodal neurobiological signal processing method, and will not be repeated here.
Attached Figure Description
[0027] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0028] Figure 1 is a flowchart of a multimodal neurobiological signal processing method according to an embodiment of the present invention;
[0029] Figure 2 is a schematic diagram of a deep learning network according to an embodiment of the present invention;
[0030] Figure 3 is another flowchart of a multimodal neurobiological signal processing method according to an embodiment of the present invention;
[0031] Figure 4 is another schematic diagram of a deep learning network according to an embodiment of the present invention;
[0032] Figure 5 is a schematic diagram of the fusion of the CNN-Trans model according to an embodiment of the present invention;
[0033] Figure 6 is a schematic diagram of the fusion of a 4D-Trans model according to an embodiment of the present invention;
[0034] Figure 7 is another flowchart of the multimodal neurobiological signal processing method according to an embodiment of the present invention;
[0035] Figure 8 is a schematic diagram illustrating the determination of biometric features according to an embodiment of the present invention;
[0036] Figure 9 is a schematic diagram illustrating a clinical auxiliary application according to an embodiment of the present invention;
[0037] Figure 10 is a schematic diagram of a deep learning system according to an embodiment of the present invention;
[0038] Figure 11 is a structural block diagram of a multimodal neurobiological signal processing device according to an embodiment of the present invention;
[0039] Figure 12 is a schematic diagram of the hardware structure of the server provided in an embodiment of the present invention.
Detailed Implementation
[0040] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0041] According to an embodiment of the present invention, an embodiment of a multimodal neurobiological signal processing method is provided. It should be noted that the steps shown in the flowcharts in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order than that shown here.
[0042] This embodiment provides a multimodal neurobiological signal processing method that can be used in a deep learning network. The deep learning network includes a pre-built deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer.
[0043] Figure 1 is a flowchart of a multimodal neurobiological signal processing method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps:
[0044] S11, acquire the multimodal neurobiological signal to be processed, preprocess the multimodal neurobiological signal to obtain the preprocessed multimodal neurobiological signal.
[0045] Multimodal neurobiological signals are various types of neurobiological signals collected from individuals with mental illnesses using acquisition devices. These signals can be obtained from clinical databases containing datasets of neurobiological signals from multiple research sites targeting patients with mental illnesses.
[0046] Specifically, multimodal neurobiological signals can include electroencephalogram (EEG) signals, functional magnetic resonance imaging (fMRI) signals, and behavioral scale data, etc.
[0047] Since artifacts are inevitably introduced during the acquisition of multimodal neurobiological signals due to the acquisition equipment itself, the patient, or external interference, preprocessing is required before analyzing multimodal neurobiological signals.
[0048] For EEG, a fully automated artifact removal workflow is employed to minimize biases caused by manual artifact removal during preprocessing. Specific preprocessing steps include:
[0049] a) Downsample the EEG to 250 Hz.
b) Use a notch filter to eliminate 50 Hz power-line interference in the EEG.
c) Use a 0.01 Hz high-pass filter to remove non-physiological slow-wave drift in the EEG.
d) Re-reference the EEG data to the whole-brain average.
e) Segment the EEG along the time dimension, for example into segments 3 seconds long.
f) Remove artifact segments using amplitude thresholding, i.e., discard segments in which the EEG amplitude exceeds a set threshold.
g) Remove bad channels by thresholding the spatial correlation between channels, and interpolate the EEG of bad channels from the EEG of adjacent channels. The EEG of a normal channel is correlated with the EEG of adjacent channels, whereas the EEG of a bad channel is uncorrelated or only weakly correlated with it, so bad channels can be detected by calculating the correlation between adjacent channels, for example the Pearson correlation coefficient. When the Pearson correlation coefficient is below a threshold, the channel is treated as a bad channel and discarded. EEG data from bad channels may be unusable because of interference and noise introduced during acquisition, but simply discarding them would reduce the number of EEG leads and leave insufficient data. In this case, the EEG data of bad channels can be reconstructed by interpolation (such as spherical spline interpolation or cubic spline interpolation) based on the correlation between the EEG data of normal channels and adjacent channels.
h) Discard EEG data from subjects whose bad channel rate exceeds 20%.
i) Use independent component analysis (ICA) to identify and remove artifacts such as electromyography (EMG), electrooculography (EOG), and electrocardiography (ECG).
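Several of the EEG steps above (average re-referencing, segmentation, amplitude thresholding, correlation-based bad-channel detection, and the 20% subject-level cutoff) can be sketched in NumPy as follows. The threshold values, the neighbor map, the use of the mean correlation with neighbors as the criterion, and the neighbor-average interpolation (a much simpler stand-in for spherical spline interpolation) are all assumptions; filtering and ICA are omitted.

```python
import numpy as np

def rereference_to_average(eeg):
    """Subtract the across-channel mean at every sample. eeg: (channels, samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def segment(eeg, fs=250, seconds=3.0):
    """Cut (channels, samples) into (segments, channels, samples_per_segment)."""
    n = int(fs * seconds)
    n_seg = eeg.shape[1] // n
    return eeg[:, :n_seg * n].reshape(eeg.shape[0], n_seg, n).transpose(1, 0, 2)

def reject_by_amplitude(segments, threshold):
    """Keep only segments whose absolute amplitude never exceeds the threshold."""
    keep = np.abs(segments).max(axis=(1, 2)) <= threshold
    return segments[keep], keep

def find_bad_channels(eeg, neighbors, min_corr=0.4):
    """Flag channels whose mean Pearson correlation with their neighbors is low."""
    corr = np.corrcoef(eeg)
    return [ch for ch, nbrs in neighbors.items()
            if np.mean([corr[ch, n] for n in nbrs]) < min_corr]

def interpolate_from_neighbors(eeg, bad, neighbors):
    """Replace each bad channel with the mean of its good neighbors (simplified)."""
    fixed = eeg.copy()
    for ch in bad:
        good = [n for n in neighbors[ch] if n not in bad]
        if good:
            fixed[ch] = eeg[good].mean(axis=0)
    return fixed

def subject_is_usable(n_bad, n_channels, max_rate=0.2):
    """Discard a subject only when the bad-channel rate exceeds max_rate."""
    return n_bad / n_channels <= max_rate
```

A typical call order would be `find_bad_channels`, `interpolate_from_neighbors`, `rereference_to_average`, `segment`, then `reject_by_amplitude`, although the order in a real pipeline depends on the acquisition setup.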
[0050] For fMRI, preprocessing can be performed using the FSL tool, and the specific steps include:
[0051] a) Using affine registration matrices and boundary-based registration, the functional images from fMRI are registered to the structural images. The T1-weighted images are non-linearly normalized to the standard MNI T1-weighted brain template.
b) Nuisance signals corresponding to the segmented white matter and cerebrospinal fluid are regressed out of the motion-corrected functional images. During MRI scans, subjects may move, and the resulting positional mismatch produces motion artifacts in the images. These artifacts can be corrected using rigid-body transformations or motion-correction software such as AIR. The FSL tool can be used to directly segment brain tissue types such as white matter, gray matter, and cerebrospinal fluid. The nuisance signals corresponding to white matter and cerebrospinal fluid are then regressed out of the motion-corrected functional images using a regression algorithm (e.g., an autoregressive moving average (ARMA) model).
c) The fMRI images are smoothed with a Gaussian kernel of 6 mm full width at half maximum (FWHM).
d) An absolute motion level cutoff point is set to ensure image measurement quality. This cutoff point can be determined based on the actual usage scenario, or a default value can be selected.
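The nuisance-regression idea in step b) can be illustrated with ordinary least squares, which is swapped in here for simplicity in place of the ARMA model mentioned as an example. The function name and array layout are assumptions, and the sketch operates on already extracted time series rather than on image files.

```python
import numpy as np

def regress_out(voxel_ts, nuisance):
    """Remove nuisance signals from voxel time series by ordinary least squares.

    voxel_ts: (timepoints, voxels) motion-corrected functional signals
    nuisance: (timepoints, regressors), e.g. mean white-matter and CSF signals
    returns:  residual time series with the fitted nuisance part (and mean) removed
    """
    design = np.column_stack([np.ones(len(nuisance)), nuisance])
    beta, *_ = np.linalg.lstsq(design, voxel_ts, rcond=None)
    return voxel_ts - design @ beta

rng = np.random.default_rng(0)
nuisance = rng.standard_normal((120, 2))              # toy WM and CSF signals
signal = 0.1 * rng.standard_normal((120, 3))          # toy signal of interest
voxels = nuisance @ rng.standard_normal((2, 3)) + 5.0 + signal
cleaned = regress_out(voxels, nuisance)
```

After the fit, `cleaned` is orthogonal to both nuisance regressors, which is the property the regression step relies on.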
[0052] Behavioral scale data include age, heart rate, blood pressure, gender, ethnicity, and scale assessment data. Preprocessing digitizes and quantifies these data to generate the corresponding feature matrices.
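A minimal sketch of digitizing behavioral scale data into a feature matrix is shown below. Standardizing numeric fields and one-hot encoding categorical fields are assumed choices, and the field names and values are made up for illustration.

```python
import numpy as np

def encode_behavioral(records, numeric_keys, categorical_keys):
    """Build a (subjects, features) matrix from a list of per-subject dicts."""
    columns, names = [], []
    for key in numeric_keys:
        col = np.array([float(r[key]) for r in records])
        std = col.std()
        # Standardize; a constant column becomes all zeros.
        columns.append((col - col.mean()) / std if std > 0 else np.zeros_like(col))
        names.append(key)
    for key in categorical_keys:
        for category in sorted({r[key] for r in records}):
            columns.append(np.array([1.0 if r[key] == category else 0.0 for r in records]))
            names.append(f"{key}={category}")
    return np.column_stack(columns), names

records = [
    {"age": 31, "heart_rate": 72, "gender": "F"},
    {"age": 45, "heart_rate": 80, "gender": "M"},
    {"age": 38, "heart_rate": 64, "gender": "F"},
]
matrix, names = encode_behavioral(records, ["age", "heart_rate"], ["gender"])
```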
[0053] The aforementioned multimodal neurobiological signals are not limited to EEG, fMRI, and behavioral scale data, but may also include magnetoencephalography (MEG), near-infrared spectroscopy (NIRS), diffusion tensor imaging (DTI), positron emission tomography (PET), eye movement, blood samples, genetic data, etc. No specific limitations are set here, and those skilled in the art can determine the appropriate data based on actual needs. It should be noted that the aforementioned multimodal neurobiological signals include control data comparing psychiatric medications with placebos.
[0054] S12, the preprocessed multimodal neurobiological signals are input into the deep learning model, and the deep features of each modality of neurobiological signals are extracted based on the deep learning model.
[0055] A deep learning model is a model pre-trained using multimodal neurobiological signals. This model takes multimodal neurobiological signals as input and outputs deep features, and consists of multiple deep structures. Preprocessed multimodal neurobiological signals are input into the deep learning model, which then extracts deep features for each modality of the neurobiological signal.
[0056] S13, input multiple deep features into the feature fusion layer to perform feature fusion and obtain the target fused features.
[0057] Multiple deep features extracted from multimodal neurobiological signals are input into the feature fusion layer in the deep regression model. The feature fusion layer fuses the multiple deep features to obtain the fused target feature.
[0058] S14: The target fusion features are input into the regression layer after passing through the fully connected layer. The regression layer is used to predict the vital signs of the target fusion features and generate biological vital sign prediction results.
[0059] The biosignature prediction results characterize the change between the states before and after drug treatment for mental illness. They are represented by prediction labels, which are calculated as the difference between the biomarker scale values before and after drug treatment.
[0060] The feature fusion layer is connected to the regression layer through two fully connected layers. The target fused features output by the feature fusion layer are fed into the regression layer of the deep regression model after passing through the two fully connected layers to obtain the biometric prediction results. As shown in Figure 2, the deep regression model is a supervised learning model that uses the target fused features in the regression layer to predict biological characteristics and outputs the predicted biological characteristics.
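The forward pass of the deep regression model (feature fusion layer, two fully connected layers, regression layer) can be sketched as below. Fusion by concatenation, the ReLU activations, the layer widths, and the randomly initialized weights are assumptions; the text does not specify the fusion operator.

```python
import numpy as np

def fuse(features):
    """Feature fusion layer, assumed here to be concatenation along the feature axis."""
    return np.concatenate(features, axis=-1)

def deep_regression_forward(features, p):
    """features: list of (batch, d_i) deep features, one per modality."""
    x = fuse(features)
    h1 = np.maximum(x @ p["w1"] + p["b1"], 0.0)          # first fully connected layer
    h2 = np.maximum(h1 @ p["w2"] + p["b2"], 0.0)         # second fully connected layer
    return (h2 @ p["w_out"] + p["b_out"]).squeeze(-1)    # regression layer, one value per subject

rng = np.random.default_rng(0)
eeg_feat = rng.standard_normal((4, 8))     # toy deep features for 4 subjects
fmri_feat = rng.standard_normal((4, 16))
scale_feat = rng.standard_normal((4, 3))
d_in = 8 + 16 + 3
params = {
    "w1": rng.standard_normal((d_in, 32)) * 0.1, "b1": np.zeros(32),
    "w2": rng.standard_normal((32, 16)) * 0.1, "b2": np.zeros(16),
    "w_out": rng.standard_normal((16, 1)) * 0.1, "b_out": np.zeros(1),
}
prediction = deep_regression_forward([eeg_feat, fmri_feat, scale_feat], params)
```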
[0061] Taking HAMD17 as an example, the HAMD17 value H0 is collected in advance for each subject before they take the psychiatric medication, and the HAMD17 value H8 is collected again for each subject 8 weeks after medication. The HAMD17 value before medication minus the HAMD17 value after eight weeks of medication (H0 - H8) represents the predicted biosignature. This allows the deep features of multimodal neurobiological signals to be correlated with the efficacy of psychiatric medications, enabling accurate prediction of drug efficacy and outputting corresponding biosignature prediction results for evaluation.
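The label computation in this example is a simple difference of scale scores. The sketch below uses made-up scores and a hypothetical function name.

```python
def hamd_change_labels(baseline, week8):
    """Prediction labels as pre-medication score minus week-8 score (H0 - H8)."""
    if len(baseline) != len(week8):
        raise ValueError("baseline and week8 must have one score per subject")
    return [h0 - h8 for h0, h8 in zip(baseline, week8)]

labels = hamd_change_labels(baseline=[24, 18, 21], week8=[10, 17, 21])
```

A larger label means a larger drop in the scale score over the eight weeks.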
[0062] The multimodal neurobiological signal processing method provided in this embodiment employs a deep learning network to extract features from multimodal neurobiological signals and fuse multiple deep features. Then, it uses the fused target features to predict biosignatures, obtaining the predicted biosignature results. This method enables the fusion of multiple deep features based on objective multimodal neurobiological signals, achieving effective capture of various deep features without the need for manual feature design or selection. Furthermore, the deep learning network's multi-level nonlinear structure effectively analyzes the nonlinear relationships present in neurobiological signals, thereby extracting effective biosignatures from multiple deep features and achieving effective prediction of the biological representation of neural signals, improving the accuracy of biological representation prediction.
[0063] This embodiment provides a multimodal neurobiological signal processing method that can be used in a deep learning network. The deep learning network includes a pre-built deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer.
[0064] Figure 3 is a flowchart of a multimodal neurobiological signal processing method according to an embodiment of the present invention. As shown in Figure 3, the process includes the following steps:
[0065] S21, acquire the multimodal neurobiological signal to be processed, preprocess the multimodal neurobiological signal to obtain the preprocessed multimodal neurobiological signal. For detailed explanation, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0066] S22, the preprocessed multimodal neurobiological signals are input into the deep learning model, and the deep features of each modality of neurobiological signals are extracted based on the deep learning model.
[0067] Specifically, the deep learning model pre-constructs multiple deep learning sub-models, each designed for different types of neurobiological signals. Each sub-model is pre-trained based on its corresponding neurobiological signal sample. Accordingly, step S22 may include:
[0068] S221, Identify the types of multimodal neurobiological signals.
[0069] As mentioned above, multimodal neurobiological signals include various modalities of neurobiological signals, each with different characteristic information. When multimodal neurobiological signals are input into a deep learning network, different types of neurobiological signals are distinguished based on their characteristic information.
[0070] S222, determine the deep learning sub-models corresponding to each type of neurobiological signal.
[0071] Deep learning sub-models correspond to types of neurobiological signals, and based on this correspondence, the deep learning sub-model corresponding to each type of neurobiological signal can be determined. For example, as shown in Figure 4, the multimodal neurobiological signals include signals A, B, and C, which belong to different types, and the deep learning model includes sub-models a, b, and c. Signal A corresponds to sub-model a, signal B corresponds to sub-model b, and signal C corresponds to sub-model c.
[0072] S223, input each type of neurobiological signal into the corresponding deep learning sub-model to obtain the deep features corresponding to each type of neurobiological signal.
[0073] Deep features are high-dimensional features generated for neurobiological signals, represented by a high-dimensional feature matrix. Based on the types of neurobiological signals determined in the above steps and the corresponding deep learning sub-models, each type of neurobiological signal can be input into the corresponding deep learning sub-model. For example, signal A can be input into sub-model a, signal B into sub-model b, and signal C into sub-model c, thereby obtaining the corresponding deep features.
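Steps S221 to S223 can be illustrated with a minimal Python sketch of type-based dispatch. The functions `sub_model_a`, `sub_model_b`, and `sub_model_c` are hypothetical stand-ins for trained sub-models, and the signal type is assumed to be already known and supplied as a dictionary key; how the type is identified from the signal's characteristic information is not specified here.

```python
from typing import Callable, Dict, List

Signal = List[float]
Features = List[float]

# Hypothetical stand-ins for the trained sub-models a, b, and c.
def sub_model_a(signal: Signal) -> Features:
    return [sum(signal) / len(signal)]

def sub_model_b(signal: Signal) -> Features:
    return [max(signal)]

def sub_model_c(signal: Signal) -> Features:
    return [min(signal)]

# S222: correspondence between signal types and sub-models.
SUB_MODELS: Dict[str, Callable[[Signal], Features]] = {
    "A": sub_model_a,
    "B": sub_model_b,
    "C": sub_model_c,
}

def extract_deep_features(signals: Dict[str, Signal]) -> Dict[str, Features]:
    """S223: route each signal to the sub-model registered for its type."""
    features = {}
    for signal_type, signal in signals.items():
        if signal_type not in SUB_MODELS:
            raise KeyError(f"no sub-model registered for type {signal_type!r}")
        features[signal_type] = SUB_MODELS[signal_type](signal)
    return features
```

Keeping the correspondence in a lookup table means a new modality can be supported by registering one more sub-model, without changing the routing logic.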
[0074] When the neurobiological signal is a first type of neurobiological signal, the first deep learning sub-model corresponding to the first type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a second sub-model, thereby realizing the multi-layer deep structure design for the first deep learning sub-model.
[0075] Accordingly, the first type of neurobiological signal is input into the first deep learning sub-model to obtain deep features corresponding to the first type of neurobiological signal, including:
[0076] (1) Input the first type of neurobiological signal into the second sub-model for feature extraction to obtain the first feature data.
[0077] (2) The first feature data is fused with the pre-configured position code (positional encoding) to obtain fused data.
[0078] (3) Input the fused data into the model group for feature extraction to obtain the deep features corresponding to the first type of neurobiological signal.
[0079] The second sub-model is used to initially extract the neurobiological features of the first type of neurobiological signal. A model group constructed from multiple first sub-models is used to further extract the deep features of the neurobiological signal.
[0080] The neurobiological features initially extracted by the second sub-model serve as the first feature data and are fused with position codes (positional encodings) to obtain fused data that combines the neurobiological features and the position codes. Embedding position codes in the neurobiological features makes it possible to determine the position of each segment of the first type of neurobiological signal, or the location where it was acquired. Taking EEG signals as an example, the position of each acquisition electrode can be encoded using certain encoding rules and fused into the initially extracted neurobiological features, which facilitates the subsequent determination of the deep features at each acquisition electrode location.
[0081] The fused data obtained above is then input into the model group for high-dimensional feature extraction, to extract features that characterize the spatiotemporal features and long-range dependencies of EEG signals. The features output by the model group are taken as the deep features of the first type of neurobiological signal.
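As a hedged illustration of the fusion in step (2), the sketch below adds a sinusoidal positional encoding to a sequence of feature vectors. Both the sinusoidal form and fusion by element-wise addition are assumptions borrowed from common Transformer practice; the encoding rule and the fusion operation are left open here, and an electrode-based code would be built differently. The function names are hypothetical.

```python
import math
from typing import List

def sinusoidal_code(num_positions: int, dim: int) -> List[List[float]]:
    """Assumed encoding rule: sine on even channels, cosine on odd channels."""
    code = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        code.append(row)
    return code

def fuse_with_position(features: List[List[float]]) -> List[List[float]]:
    """Fuse features with the position code by element-wise addition (assumed)."""
    code = sinusoidal_code(len(features), len(features[0]))
    return [[f + c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(features, code)]
```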
[0082] There are no restrictions on the first and second sub-models here, as long as they can achieve the extraction of deep features. Those skilled in the art can determine them according to actual needs.
[0083] In one specific implementation, the first type of neurobiological signal is an electroencephalogram (EEG) signal, the first sub-model includes a Transformer model, and the second sub-model includes a convolutional neural network model. For example, as shown in Figure 5, the convolutional neural network includes convolutional layers and average pooling layers. The Transformer model includes an attention module and a feedforward network module. The attention module includes a multi-head attention layer and a first normalization layer, while the feedforward network module includes a fully connected feedforward layer and a second normalization layer. The multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; the attention module and the feedforward network module have corresponding shortcut connections.
[0084] The Transformer model, based on multi-head attention, offers advantages such as broad applicability, good interpretability, and the ability to capture long-range dependencies in sequential signals. To more accurately mine and extract the spatiotemporal features and long-range dependencies of EEG signals, the Transformer model can be fused with a convolutional neural network (CNN). Figure 5 is a block diagram of the CNN-Trans model obtained by fusing the Transformer model with the CNN model.
[0085] Here, Figure 5(a) is a structural diagram of the Transformer model. The Transformer model includes an attention module and a feedforward network module. The attention module includes a multi-head attention layer, followed immediately by a first normalization layer. The feedforward network module includes a fully connected feedforward layer, followed immediately by a second normalization layer. The attention module and the feedforward network module are each wrapped by a shortcut connection. The multi-head attention layer allows the Transformer model to attend to information from different positions; for example, an 8-head attention layer (i.e., 8 attention heads) can be used. The number of attention heads is not limited here; those skilled in the art can determine it according to actual needs.
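The data flow of such a Transformer block can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the model of Figure 5: the attention heads use identity projections instead of learned query/key/value weights, the feedforward layer is reduced to a ReLU placeholder, and each normalization is assumed to be applied after its shortcut addition. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token

def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(vec: List[float], eps: float = 1e-5) -> List[float]:
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product attention with identity projections (assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out

def multi_head_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split channels into heads, attend per head, concatenate the results."""
    d = len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    hd = d // num_heads
    heads = [self_attention([row[h * hd:(h + 1) * hd] for row in x])
             for h in range(num_heads)]
    return [[v for head in heads for v in head[t]] for t in range(len(x))]

def feed_forward(x: Matrix) -> Matrix:
    """Placeholder for the fully connected feedforward layer (ReLU only)."""
    return [[max(0.0, v) for v in row] for row in x]

def transformer_block(x: Matrix, num_heads: int = 8) -> Matrix:
    attn = multi_head_attention(x, num_heads)
    # Shortcut around the attention layer, then the first normalization.
    x = [layer_norm([a + b for a, b in zip(r, s)]) for r, s in zip(x, attn)]
    ff = feed_forward(x)
    # Shortcut around the feedforward layer, then the second normalization.
    return [layer_norm([a + b for a, b in zip(r, s)]) for r, s in zip(x, ff)]
```

A model group of several blocks connected in series corresponds to applying `transformer_block` repeatedly, feeding each block's output into the next.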
[0086] Figure 5(b) is a structural diagram of the CNN model, which includes convolutional layers and average pooling layers. As shown in Figure 5(c), the output of the CNN model, after being fused with the positional encoding, is input into a model group obtained by cascading multiple Transformer models, which constructs the CNN-Trans model structure. For example, the model group may be obtained by cascading three Transformer models. The number of cascaded Transformer models is not limited here, and those skilled in the art can determine it according to actual needs.
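The two layer types of the CNN model can be illustrated with a one-dimensional sketch. The kernel values, kernel size, and pooling window below are hypothetical, and a real EEG front end would use learned multi-channel kernels; the convolution is written as cross-correlation, following the usual deep learning convention.

```python
from typing import List

def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid (no padding) 1D convolution, stride 1, cross-correlation form."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def average_pool(values: List[float], size: int) -> List[float]:
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]
```

For instance, `average_pool(conv1d(samples, kernel), 2)` halves the temporal resolution of the filtered signal before it is fused with the positional encoding.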
[0087] The deep features of EEG signals can be extracted using the CNN-Trans model. Here, transfer learning can be employed, using a CNN-Trans model trained on normal human EEG samples as the initial model to effectively improve the efficiency and accuracy of CNN-Trans model training.
[0088] When the neurobiological signal is a second type of neurobiological signal, the second deep learning sub-model corresponding to the second type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a third sub-model. The third sub-model includes a convolutional layer, a max pooling layer and a fully connected layer, thereby realizing the multi-layer deep structure design for the second deep learning sub-model.
[0089] Accordingly, the second type of neurobiological signal is input into the second deep learning sub-model to obtain deep features corresponding to the second type of neurobiological signal, including:
[0090] (1) Input the second type of neurobiological signal into the convolutional layer of the third sub-model for multidimensional convolution processing to obtain the convolution processing result.
[0091] (2) Input the convolution processing result into the model group for feature extraction, and output the deep features corresponding to the second type of neurobiological signal through the max pooling layer and the fully connected layer.
[0092] The third sub-model is used to perform multidimensional convolution processing on the second type of neurobiological signal, and a model group composed of multiple first sub-models connected in series is used to extract deep features from the second type of neurobiological signal.
[0093] The second type of neurobiological signal is processed by multidimensional convolution in the third sub-model to convert it into a multidimensional matrix representation. The convolution result is then input into the model group for high-dimensional feature extraction, extracting features that characterize the spatial and functional connectivity of the functional magnetic resonance imaging (fMRI) signal. The high-dimensional features output from the model group are then passed through a max pooling layer and a fully connected layer to obtain the deep features of the second type of neurobiological signal.
[0094] There are no restrictions on the first and third sub-models here, as long as they can achieve the extraction of deep features. Those skilled in the art can determine them according to actual needs.
[0095] In one specific implementation, the second type of neurobiological signal is a functional magnetic resonance imaging (fMRI) signal; the first sub-model includes a Transformer model; and the third sub-model includes a point 4D convolutional network model. For example, as shown in Figure 6, the point 4D convolutional network model includes point 4D convolutional layers, max pooling layers, and fully connected layers; a model group consisting of multiple Transformer models cascaded together is set between the point 4D convolutional layers and the max pooling layers; the Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; the attention module and the feedforward network module have corresponding shortcut connections.
[0096] To more accurately mine and extract the spatial and functional connectivity features of fMRI signals, the Transformer model can be fused with a point 4D convolutional network model. Figure 6 is a block diagram of the 4D-Trans model obtained by fusing the Transformer model with the point 4D convolutional network model.
[0097] The Transformer model includes an attention module and a feedforward network module. The attention module consists of a multi-head attention layer followed by a first normalization layer. The feedforward network module consists of a fully connected feedforward layer followed by a second normalization layer. The attention module and the feedforward network module are each wrapped by a shortcut connection; see Figure 5 for details. In the Transformer model, the corresponding two-dimensional operations can be converted into three-dimensional / four-dimensional operations as needed.
[0098] The point 4D convolutional network model includes point 4D convolutional layers, max pooling layers, and fully connected layers. For example, as shown in Figure 6, the output of the point 4D convolutional layer is used as the input to a model group obtained by cascading multiple Transformer models. The output of the model group is then fed into a max pooling layer and a fully connected layer to construct the 4D-Trans model structure. For example, the model group may be obtained by cascading three Transformer models. The number of cascaded Transformer models is not limited here; those skilled in the art can determine it according to actual needs.
[0099] This 4D-Trans model can be used to extract deep features from fMRI signals. Here, transfer learning can be employed, using a 4D-Trans model trained on normal human fMRI samples as the initialization model to effectively improve the efficiency and accuracy of 4D-Trans model training.
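The output stage of the 4D-Trans structure (model group output, then max pooling, then a fully connected layer) can be sketched as below. Pooling over the token axis and the toy weights are assumptions made for illustration; the pooling window and layer sizes are not specified here, and the function names are hypothetical.

```python
from typing import List

def max_pool_tokens(tokens: List[List[float]]) -> List[float]:
    """Global max pooling: keep the largest value of each feature channel."""
    return [max(channel) for channel in zip(*tokens)]

def fully_connected(vec: List[float],
                    weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Dense layer: one weight row and one bias value per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]
```

Global pooling yields a fixed-length vector regardless of how many tokens the model group emits, which is what allows a fully connected layer of fixed size to follow it.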
[0100] S23, multiple deep features are input to the feature fusion layer for feature fusion to obtain the target fused features. For detailed explanations, please refer to the relevant descriptions in the above embodiments; they will not be repeated here.
[0101] S24, the target fusion features are input into the regression layer after passing through a fully connected layer. The regression layer is used to predict the vital signs of the target fusion features, generating biological characteristic prediction results. For detailed explanations, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0102] The multimodal neurobiological signal processing method provided in this embodiment extracts deep features by inputting different types of neurobiological signals into corresponding deep learning sub-models, which facilitates the targeted extraction of deep features of various types of neurobiological signals and improves the extraction accuracy of deep features.
[0103] This embodiment provides a multimodal neurobiological signal processing method that can be used in a deep learning network. The deep learning network includes a pre-built deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer.
[0104] Figure 7 is a flowchart of a multimodal neurobiological signal processing method according to an embodiment of the present invention. As shown in Figure 7, the process includes the following steps:
[0105] S31, acquire the multimodal neurobiological signal to be processed, preprocess the multimodal neurobiological signal to obtain the preprocessed multimodal neurobiological signal. For detailed explanation, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0106] S32, the preprocessed multimodal neurobiological signals are input into a deep learning model, and the deep features of each modality of neurobiological signal are extracted based on the deep learning model. For detailed explanations, please refer to the relevant descriptions in the above embodiments; they will not be repeated here.
[0107] S33, multiple deep features are input to the feature fusion layer for feature fusion to obtain the target fused features. For detailed explanations, please refer to the relevant descriptions in the above embodiments; they will not be repeated here.
[0108] S34, Obtain the data type corresponding to the target fusion feature.
[0109] Data types include discrete and continuous data. The regression layer can perform classification prediction for discrete values and regression prediction for continuous values. When inputting target fusion features into the regression layer for vital sign prediction, the value characteristics of the target fusion features can be analyzed to determine the data type of the obtained target fusion features.
[0110] S35, Construct the loss function corresponding to the data type.
[0111] Different loss functions can be constructed for different data types.
[0112] Specifically, for discrete data, cross-entropy can be used as the loss function to train the deep regression model in the regression layer for classification, and a softmax activation function can be added after the last fully connected layer to normalize the output value.
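For the discrete case, the softmax normalization and the cross-entropy loss can be written out as a small sketch for a single sample. The function names are hypothetical, and `target_class` is assumed to be an integer class index such as the prediction levels 0 to 3.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Normalize raw outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits: List[float], target_class: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target_class])
```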
[0113] Specifically, when the data type is continuous, a loss function can be constructed using data error parameters to optimize the deep regression model during training. These data error parameters include mean absolute error, root mean square error, and median absolute error. The constructed loss function is as follows:
[0114] Loss=α*RMSE+β*MAE+μ*MedAE+λ*||W||;
[0115] Where Loss represents the loss function; RMSE represents the root mean square error; MAE represents the mean absolute error; MedAE represents the median absolute error; ||W|| represents the regularization term; and α, β, μ, and λ represent the network parameters.
[0116] Since RMSE, MAE, and MedAE are three different evaluation metrics, this embodiment considers all three and integrates them into the loss function to improve regression accuracy. ||W|| is a regularization term related to the network parameters; adding a regularization term to the loss function can reduce overfitting to some extent.
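The value of the loss Loss=α*RMSE+β*MAE+μ*MedAE+λ*||W|| can be computed as in the following sketch. The default coefficient values are hypothetical placeholders, and ||W|| is assumed to be the L2 norm of the flattened weights, since the norm type is not stated. The sketch only evaluates the loss value; gradient-based training would be carried out in a deep learning framework.

```python
import math
import statistics
from typing import List

def composite_loss(y_true: List[float], y_pred: List[float],
                   weights: List[float],
                   alpha: float = 1.0, beta: float = 1.0,
                   mu: float = 1.0, lam: float = 0.01) -> float:
    """alpha*RMSE + beta*MAE + mu*MedAE + lam*||W|| (L2 norm assumed)."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(errors) / len(errors)
    medae = statistics.median(errors)
    l2_norm = math.sqrt(sum(w * w for w in weights))
    return alpha * rmse + beta * mae + mu * medae + lam * l2_norm
```

Because RMSE weights large errors more heavily while MedAE is insensitive to outliers, the three terms penalize different aspects of the error distribution.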
[0117] α, β, μ, and λ need to be determined through network training. Specifically, the backpropagation algorithm can be used to calculate the gradient of the loss function with respect to the network parameters, and optimization methods can be used to update the network parameters to reduce the loss function until the loss function is reduced to its minimum value, at which point the network converges. The optimal network parameters are then determined.
[0118] It should be noted that in regression prediction with continuous data types, no activation function is added after the last fully connected layer, or only a linear activation function is added. Linear or non-linear activation functions can be added to other network layers as needed. The continuous predicted values output by the regression layer also need to be normalized to the range [0, 1] to make the deep learning model easier to train and converge, ensuring that the model can obtain better prediction results.
[0119] S36, the parameters of the deep regression network are optimized using the loss function, wherein the regression layer is deployed in the deep regression network.
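One common way to map continuous values to the range [0, 1] is min-max scaling, sketched below. The choice of min-max scaling is an assumption, since the normalization method is not specified here, and the function names are hypothetical.

```python
from typing import List

def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(values: List[float], lo: float, hi: float) -> List[float]:
    """Map normalized values back to the original range [lo, hi]."""
    return [lo + v * (hi - lo) for v in values]
```

The inverse mapping is needed when normalized predictions have to be reported on the original clinical scale.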
[0120] After obtaining the loss function, all parameters of the deep regression network can be optimized using the loss function until the loss function is reduced to its minimum value and the network converges. The network parameters determined at this point are then identified as the optimal network parameters, and the deep regression network model is deployed according to the optimal network parameters.
[0121] S37, the target fusion features are input into the regression layer after passing through a fully connected layer. The regression layer is used to predict the vital signs of the target fusion features, generating biological characteristic prediction results. For detailed explanations, please refer to the relevant descriptions in the above embodiments, which will not be repeated here.
[0122] S38, obtain the numerical range of the predicted biological signs.
[0123] Biometric prediction results include discrete value prediction results and continuous value prediction results. In the discrete value prediction results, the minimum and maximum values constitute the numerical range corresponding to the discrete value prediction result, while the continuous value prediction results are the numerical range formed by the continuous data values.
[0124] S39, quantize the numerical range, dividing it into several intervals to obtain the prediction level corresponding to each interval.
[0125] For discrete values, the biometric prediction results here are equivalent to classification. Different prediction levels can be set to correspond to the classification results of the discrete values. For example, 0 - completely ineffective, 1 - slightly effective, 2 - effective, 3 - very effective.
[0126] For continuous values, the numerical range can be quantized, dividing it into multiple intervals for classification training and prediction, thus determining the prediction level corresponding to each interval. For example, if the numerical range is [-5, 15], it can be divided into four intervals, with a corresponding prediction level assigned to each interval. Specifically, the interval [-5, 0] is set to 0 - completely ineffective, the interval [0, 5] to 1 - slightly effective, the interval [5, 10] to 2 - effective, and the interval [10, 15] to 3 - very effective. Finally, the prediction results are comprehensively evaluated using accuracy, precision, F1 score, and AUC score.
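The quantization of the range [-5, 15] into four prediction levels can be sketched as follows. Because the intervals as stated share their endpoints, the sketch assumes that a boundary value (0, 5, or 10) belongs to the higher level; this tie-breaking rule and the function name are assumptions.

```python
from bisect import bisect_right

LEVELS = ["completely ineffective", "slightly effective",
          "effective", "very effective"]

def prediction_level(value: float, lower: float = -5.0,
                     upper: float = 15.0, num_levels: int = 4) -> int:
    """Return the index of the equal-width interval containing value."""
    if not lower <= value <= upper:
        raise ValueError(f"{value} is outside [{lower}, {upper}]")
    width = (upper - lower) / num_levels
    # Interior edges only, e.g. [0, 5, 10] for the default range.
    edges = [lower + width * i for i in range(1, num_levels)]
    return bisect_right(edges, value)
```

For example, `LEVELS[prediction_level(7.2)]` gives the label for a continuous prediction of 7.2.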
[0127] Specifically, the prediction results can be comprehensively evaluated by calculating indicators such as mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), root mean squared error (RMSE), mean squared logarithmic error (MSLE), median absolute error (MedAE), and coefficient of determination (r² score).
[0128] Among them, indicators such as MAE, MAPE, MSE, RMSE, MSLE, and MedAE are used to assess the difference between the actual value and the predicted value, and the smaller the value, the better.
[0129] The r² score is a statistic used to measure the goodness of fit. The maximum value of the r² score is 1. The closer the r² score is to 1, the better the deep regression model fits the observations. Conversely, the smaller the r² score is, the worse the deep regression model fits the observations.
[0130] S310, extract biological features from the biometric prediction results based on a pre-defined interpretable artificial intelligence method.
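The evaluation indicators listed above can be computed with the following standard-library sketch. It uses the usual textbook definitions and assumes that the true values are non-zero (for MAPE), that all values are greater than -1 (for MSLE), and that the true values are not all identical (for the r² score). The function name is hypothetical.

```python
import math
import statistics
from typing import Dict, List

def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs_errors) / n,
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MSLE": sum((math.log1p(t) - math.log1p(p)) ** 2
                    for t, p in zip(y_true, y_pred)) / n,
        "MedAE": statistics.median(abs_errors),
        "R2": 1.0 - ss_res / ss_tot,
    }
```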
[0131] Because mental illnesses arising from abnormal neurobiological signals exhibit high neurobiological heterogeneity, conventional group analysis methods struggle to identify biomarkers characterizing these illnesses. Interpretable artificial intelligence methods, based on deep learning networks, can achieve individual-level analysis of organisms.
[0132] This section utilizes Explainable Artificial Intelligence (XAI) methods from deep learning networks to analyze biometrics related to the efficacy of medications for mental illnesses from biometric prediction results. Specifically, by applying XAI to a deep regression model and performing individual-level analysis on the biometric prediction results output by the deep regression model, individual-level biometrics related to the response to medications for mental illnesses can be identified.
[0133] Explainable artificial intelligence (XAI) methods include local surrogate visualization methods, occlusion analysis methods, gradient visualization methods, and layer-wise relevance propagation methods. No limitation is made on the specific XAI method to be used here, and those skilled in the art can determine it according to actual needs.
[0134] For example, the layer-wise relevance propagation (LRP) method can be used to extract biometrics. The specific implementation process includes: inputting the processed neurobiological signals into the deep regression model to obtain the output of the regression layer (i.e., the current biometric prediction result); and propagating the relevance of the regression layer's output back to the input layer according to the LRP relevance rules. This yields the LRP values for all features (data points) of the input data. A higher LRP value indicates a greater contribution of that feature (data point) to the prediction / decision process. Based on this process, an LRP heatmap can be generated for each subject, allowing for the extraction of individual-level biometrics.
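The backward step of this propagation can be illustrated for a single bias-free dense layer using the LRP epsilon rule, one commonly used propagation rule. Which rule is applied is not specified here, so the choice of the epsilon rule, the function name, and the weight layout are assumptions; a full model would apply such a step layer by layer from the regression output back to the input.

```python
from typing import List

def lrp_epsilon_dense(inputs: List[float],
                      weights: List[List[float]],
                      relevance_out: List[float],
                      eps: float = 1e-6) -> List[float]:
    """Redistribute output relevance to the inputs of a bias-free dense layer.

    weights[j][i] connects input i to output j. Each input receives a share
    of output j's relevance proportional to its contribution a_i * w_ji.
    """
    relevance_in = [0.0] * len(inputs)
    for j, row in enumerate(weights):
        z = sum(w * a for w, a in zip(row, inputs))
        denom = z + (eps if z >= 0 else -eps)  # stabilizer avoids division by 0
        for i, (w, a) in enumerate(zip(row, inputs)):
            relevance_in[i] += a * w / denom * relevance_out[j]
    return relevance_in
```

For small `eps`, the input relevances sum approximately to the output relevance, so the resulting heatmap values can be read as shares of the prediction.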
[0135] S311, perform replication and convergence analysis on the biometrics to identify individual biomarkers from the biometrics.
[0136] Replication and convergence analyses are performed on the above biometrics. As shown in Figure 8, the portability and stability of the biometrics are evaluated through replication analysis and convergence analysis to determine whether the biometrics used for drug efficacy prediction can serve as potential biomarkers. Furthermore, the drug efficacy prediction results and biomarkers can be cross-validated using the results of other non-pharmacological treatments, further validating the neurobiological and clinical significance of the drug efficacy prediction results and biomarkers.
[0137] The replication analysis involves applying the drug efficacy prediction and the extracted biological features to a local database. Cross-dataset replication analysis within the database is then used to assess the portability and stability of the drug efficacy predictions and biological features, determining whether a biological feature can serve as a biomarker. For example, the LRP method described above can generate an LRP heatmap. Biological features with higher LRP values contribute more to the prediction, and those with high contributions are considered potential biomarkers. The contribution magnitude can be comprehensively evaluated using a combined contribution value calculated through various deep learning visualization analysis methods. When the combined contribution value of a biological feature exceeds a set threshold, that biological feature can be considered a potential biomarker.
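The threshold test on the combined contribution value can be sketched as follows. How contributions from different visualization methods are combined is not specified here; the sketch assumes each method's scores are scaled by their largest absolute value and then averaged, and that every method scores the same set of features. The method names, feature names, and threshold in the usage are hypothetical.

```python
from typing import Dict, List

def combined_contribution(scores_by_method: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average of per-method contribution scores after max-abs scaling (assumed)."""
    combined: Dict[str, float] = {}
    for scores in scores_by_method.values():
        scale = max(abs(v) for v in scores.values()) or 1.0
        for feature, value in scores.items():
            combined[feature] = combined.get(feature, 0.0) + abs(value) / scale
    n_methods = len(scores_by_method)
    return {feature: total / n_methods for feature, total in combined.items()}

def candidate_biomarkers(scores_by_method: Dict[str, Dict[str, float]],
                         threshold: float) -> List[str]:
    """Features whose combined contribution value exceeds the threshold."""
    combined = combined_contribution(scores_by_method)
    return sorted(f for f, v in combined.items() if v > threshold)
```

Scaling each method separately keeps a method with large raw magnitudes from dominating the combined value.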
[0138] Regarding convergence analysis: the method described above primarily uses multimodal neurobiological signals before and after medication administration to predict the efficacy of psychiatric drugs. However, in actual clinical practice, subjects receive not only drug treatment but also non-drug treatments, such as physical therapy (TMS) or psychotherapy. Therefore, the same subject may have treatment data under different treatment methods. Convergence analysis with non-drug treatment data can further validate the neurobiological and clinical significance of biomarkers.
[0139] Specifically, convergence analysis includes: comparing data on predicted poor drug efficacy with actual poor non-drug treatment efficacy to identify commonalities; comparing data on predicted poor drug efficacy with actual good non-drug treatment efficacy to identify differences; comparing data on predicted good drug efficacy with actual poor non-drug treatment efficacy to identify differences; and comparing data on predicted good drug efficacy with actual good non-drug treatment efficacy to identify commonalities. For data on different treatment methods, the commonalities and differences identified through the above cross-analysis are more biologically and clinically significant.
[0140] As an optional implementation, as shown in Figure 9, deep learning networks and individual-level biomarkers mined using interpretable artificial intelligence methods can provide potentially powerful assistance to clinicians in the treatment of mental illnesses. In actual treatment, clinicians can perform replication analysis on their real-time collected clinical data using biomarkers identified by deep learning networks and interpretable artificial intelligence methods. Furthermore, by combining the results of actual diagnosis and treatment, they can further validate, optimize, and refine the deep learning and deep regression models within the deep learning networks, achieving cross-validation.
[0141] The multimodal neurobiological signal processing method provided in this embodiment constructs a loss function to facilitate joint learning and optimization of the target fusion feature learning loss and regression loss, thereby improving the regression performance of the regression layer and ensuring its analytical accuracy. By quantifying the numerical range to divide it into several intervals and setting different prediction levels for different intervals, it can assist clinicians in effectively assessing biological representations. By applying interpretable artificial intelligence to the prediction results of biological signs to extract biomarkers, and by using replication analysis and convergence analysis to determine individualized biomarkers, it has significant neurobiological implications for the transferability and stability of biological features, and can provide potentially powerful assistance to clinicians dealing with mental illnesses.
[0142] Based on the aforementioned multimodal neurobiological signal processing methods, this embodiment designs a deep learning system for predicting the efficacy of medications for mental illnesses and analyzing their biometric features. For example, as shown in Figure 10, the deep learning system mainly includes: a clinical data processing module, a computational model building module, a biomarker analysis module, and a clinical application verification module.
[0143] The clinical data processing module is mainly responsible for the processing and analysis of clinical data, including data organization, cleaning and preprocessing.
[0144] The computational model building module is mainly responsible for the construction and training of deep learning networks for predicting drug treatment effects, including the design, construction, training and testing of deep learning models and supervised regression models based on deep learning networks, and quantitative evaluation of the prediction results of biological signs.
[0145] The biomarker analysis module is mainly responsible for the mining and analysis of individual-level biological characteristics. Here, based on the Explainable Artificial Intelligence (XAI) method, the prediction results of biological characteristics output by the trained deep learning network are visualized and analyzed to mine and analyze individualized biological characteristics for drug efficacy response. The model and biological characteristics are verified through replication and convergence analysis on datasets from different research sites and on different types of datasets.
[0146] The clinical application validation module is primarily responsible for guiding and assisting psychiatrists in clinical treatment through individualized biomarkers, optimizing and clinically validating models and biomarkers to reveal response phenotypes to psychiatric drug treatment, separating psychiatric drug and placebo responses, promoting neurobiological understanding of the efficacy of psychiatric drugs, and providing preliminary evidence for potential treatment options.
[0147] It should be noted that the clinical data processing module, computational model building module, biomarker analysis module, and clinical application validation module of the aforementioned deep learning system can be integrated into a large-scale intelligent medical assistance and treatment system, or each module can be implemented independently. The clinical dataset, drug efficacy prediction deep network, and personalized biometrics of the aforementioned deep learning system can be stored on a dedicated server and accessed by other modules through local or cloud services, or integrated into the same system as other modules.
[0148] The deep learning system provided in this embodiment, through the construction of a deep learning network using objective biological information and big data mining, exhibits greater predictive power in assessing drug responses to mental illnesses. Furthermore, deep learning is applicable to automated analysis at the individual level, allowing models built upon deep learning networks to effectively support clinical treatment. Simultaneously, the personalized analysis method based on interpretable artificial intelligence is more accurate in mining the biomarkers of highly heterogeneous mental illnesses, providing generalizable and replicable biomarkers for clinical treatment. This assists clinicians in developing appropriate interventions, offering effective support for the clinical treatment of mental illnesses.
[0149] This embodiment also provides a multimodal neurobiological signal processing device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and/or hardware that performs a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
[0150] This embodiment provides a multimodal neurobiological signal processing device that can be used in a deep learning network. The deep learning network includes a pre-built deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer. As shown in Figure 11, the multimodal neurobiological signal processing device includes:
[0151] The acquisition module 41 is used to acquire the multimodal neurobiological signal to be processed, preprocess the multimodal neurobiological signal, and obtain the preprocessed multimodal neurobiological signal.
[0152] The feature extraction module 42 is used to input the preprocessed multimodal neurobiological signals into the deep learning model and extract the deep features of each modality of neurobiological signals based on the deep learning model.
[0153] The feature fusion module 43 is used to input multiple deep features into the feature fusion layer for feature fusion to obtain the target fused features.
[0154] The prediction module 44 is used to input the target fusion features into the regression layer after passing them through the fully connected layer, and to use the regression layer to predict biological signs from the target fusion features, generating biological sign prediction results.
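The data flow through modules 43 and 44 can be illustrated with a minimal sketch. It assumes that fusion is plain concatenation and that the fully connected layer uses a ReLU activation; the embodiment does not fix either choice. The feature values, layer sizes, and the names `fuse`, `dense`, and `predict` are hypothetical.

```python
import random


def fuse(features):
    """Feature fusion layer, assumed here to be simple concatenation."""
    fused = []
    for vector in features:
        fused.extend(vector)
    return fused


def relu(value):
    return value if value > 0.0 else 0.0


def dense(x, weights, bias, activation=None):
    """Fully connected layer: one output per weight row, y = W x + b."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def predict(features, fc_w, fc_b, reg_w, reg_b):
    """Fusion -> fully connected layer -> regression layer (scalar output)."""
    fused = fuse(features)
    hidden = dense(fused, fc_w, fc_b, relu)
    return dense(hidden, reg_w, reg_b)[0]


# Hypothetical deep features for three modalities (illustrative values only).
eeg_features = [0.2, -0.1, 0.4]
fmri_features = [0.5, 0.3]
scale_features = [1.0]

# Randomly initialised weights stand in for trained parameters.
rng = random.Random(0)
n_in, n_hidden = 6, 4
fc_w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
fc_b = [0.0] * n_hidden
reg_w = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
reg_b = [0.0]

score = predict([eeg_features, fmri_features, scale_features],
                fc_w, fc_b, reg_w, reg_b)
```

In a trained network the weights would come from optimization rather than random initialization; the sketch only shows the order in which the layers are applied.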
[0155] Optionally, the deep learning model may have multiple pre-built deep learning sub-models, and the feature extraction module 42 may include:
[0156] The type determination submodule is used to determine the type of multimodal neural biological signals.
[0157] The sub-model determination sub-module is used to determine the various deep learning sub-models corresponding to each type of neurobiological signal.
[0158] The feature extraction submodule is used to input various types of neurobiological signals into the corresponding deep learning sub-models to obtain deep features corresponding to each type of neurobiological signal.
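The three submodules above amount to a lookup from signal type to sub-model, which can be sketched as follows. This is an illustration only: the modality keys and the placeholder sub-models (`eeg_submodel`, `fmri_submodel`, `scale_submodel`) are hypothetical stand-ins for the pre-trained networks, and the signal type is assumed to be given by the dictionary key.

```python
from typing import Callable, Dict, List

Features = List[float]


def eeg_submodel(signal: List[float]) -> Features:
    # Placeholder for the EEG branch (not the real network).
    return [sum(signal) / len(signal)]


def fmri_submodel(signal: List[float]) -> Features:
    # Placeholder for the fMRI branch (not the real network).
    return [max(signal)]


def scale_submodel(signal: List[float]) -> Features:
    # Placeholder for behavioral scale data: passed through unchanged.
    return list(signal)


# Sub-model determination: one sub-model per signal type.
SUBMODELS: Dict[str, Callable[[List[float]], Features]] = {
    "eeg": eeg_submodel,
    "fmri": fmri_submodel,
    "scale": scale_submodel,
}


def extract_deep_features(signals: Dict[str, List[float]]) -> Dict[str, Features]:
    """Route each modality to its sub-model and collect the deep features."""
    features = {}
    for modality, signal in signals.items():  # type determination via the key
        if modality not in SUBMODELS:
            raise KeyError(f"no sub-model registered for modality {modality!r}")
        features[modality] = SUBMODELS[modality](signal)
    return features


deep = extract_deep_features({"eeg": [1.0, 3.0], "fmri": [0.2, 0.9]})
```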
[0159] Optionally, when the neurobiological signal is a first type of neurobiological signal, the first deep learning sub-model corresponding to the first type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a second sub-model.
[0160] Accordingly, the above-mentioned feature extraction submodule can be used to: input the first type of neurobiological signal into the second sub-model for feature extraction to obtain the first feature data; fuse the first feature data with the pre-configured position code to obtain fused data; and input the fused data into the model group for feature extraction to obtain the deep features corresponding to the first type of neurobiological signal.
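The fusion of the first feature data with the position code can be sketched as an element-wise addition. Both the additive fusion and the sinusoidal form of the code are assumptions borrowed from common Transformer practice; the embodiment only states that the position code is pre-configured. The function names are hypothetical.

```python
import math


def sinusoidal_position_code(seq_len, dim):
    """Fixed position code: sine on even indices, cosine on odd indices."""
    code = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        code.append(row)
    return code


def fuse_with_position_code(features, code):
    """Add the position code to the feature sequence, element by element."""
    if len(features) != len(code) or len(features[0]) != len(code[0]):
        raise ValueError("feature sequence and position code shapes differ")
    return [[f + c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(features, code)]


# Hypothetical first feature data: 3 time steps, 4 channels, all zeros.
first_feature_data = [[0.0] * 4 for _ in range(3)]
position_code = sinusoidal_position_code(3, 4)
fused_data = fuse_with_position_code(first_feature_data, position_code)
```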
[0161] Specifically, the first type of neurobiological signal is the electroencephalogram (EEG) signal, and the first sub-model includes the Transformer model; the second sub-model includes the convolutional neural network model. As shown in Figure 5, the convolutional neural network includes convolutional layers and average pooling layers. The Transformer model includes an attention module and a feedforward network module. The attention module includes a multi-head attention layer and a first normalization layer, and the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; and the first normalization layer is connected to the fully connected feedforward layer.
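The layer ordering described for the Transformer model (attention, first normalization, feedforward, second normalization) can be sketched as below, with the shortcut connections taken from claim 3. This is a simplified illustration: it uses a single attention head with identity projections instead of learned multi-head attention, and a bare ReLU in place of the learned feedforward layer. All function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        weights = softmax([sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                           for key in tokens])
        out.append([sum(w * value[j] for w, value in zip(weights, tokens))
                    for j in range(dim)])
    return out


def feed_forward(token):
    """Stand-in for the learned fully connected feedforward layer."""
    return [max(0.0, v) for v in token]


def transformer_block(tokens):
    # Attention module: attention, shortcut connection, first normalization.
    attended = self_attention(tokens)
    normed = [layer_norm([a + b for a, b in zip(tok, att)])
              for tok, att in zip(tokens, attended)]
    # Feedforward module: feedforward, shortcut connection, second normalization.
    return [layer_norm([a + b for a, b in zip(tok, feed_forward(tok))])
            for tok in normed]


def model_group(tokens, depth=2):
    """Several Transformer blocks connected in series."""
    for _ in range(depth):
        tokens = transformer_block(tokens)
    return tokens


deep_features = model_group([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
```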
[0162] Optionally, when the neurobiological signal is a second type of neurobiological signal, the second deep learning sub-model corresponding to the second type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a third sub-model. The third sub-model includes a convolutional layer, a max pooling layer and a fully connected layer.
[0163] Accordingly, the aforementioned feature extraction submodule can be used to: input the second type of neurobiological signal into the convolutional layer of the third sub-model for multidimensional convolution processing to obtain the convolution processing result; input the convolution processing result into the model group for feature extraction, and output the deep features corresponding to the second type of neurobiological signal through the max pooling layer and the fully connected layer.
[0164] Specifically, the second type of neurobiological signal is the functional magnetic resonance imaging (fMRI) signal, and the first sub-model includes the Transformer model; the third sub-model includes the point 4D convolutional network model. As shown in Figure 6, the point 4D convolutional network model includes point 4D convolutional layers, max pooling layers, and fully connected layers; a model group consisting of multiple Transformer models connected in series is set between the point 4D convolutional layers and the max pooling layers; the Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; and the first normalization layer is connected to the fully connected feedforward layer.
[0165] Optionally, the above-mentioned multimodal neurobiological signal processing device may further include:
[0166] The data type acquisition module is used to obtain the data type corresponding to the target fusion feature.
[0167] The loss function building module is used to construct loss functions corresponding to data types.
[0168] The optimization module is used to optimize the parameters of the deep regression network using the loss function. The regression layer is deployed in the deep regression network.
[0169] Specifically, when the data type is continuous, the loss function construction module described above is used to construct a loss function based on data error parameters. These data error parameters include mean absolute error, root mean square error, and median absolute error. The constructed loss function is as follows:
[0170] Loss=α*RMSE+β*MAE+μ*MedAE+λ*||W||;
[0171] Where Loss represents the loss function; RMSE represents the root mean square error; MAE represents the mean absolute error; MedAE represents the median absolute error; ||W|| represents the regularization term; and α, β, μ, and λ represent the network parameters.
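The loss function above can be computed as in the following minimal sketch. It assumes that ||W|| is the L2 norm of the network weights and treats α, β, μ, and λ as fixed coefficients with illustrative default values; the embodiment specifies neither. The function name `combined_loss` is hypothetical.

```python
import math
import statistics


def combined_loss(y_true, y_pred, weights,
                  alpha=1.0, beta=1.0, mu=1.0, lam=0.01):
    """Loss = alpha*RMSE + beta*MAE + mu*MedAE + lam*||W||."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(errors) / len(errors)
    medae = statistics.median(errors)
    # Regularization term, assumed here to be the L2 norm of the weights.
    weight_norm = math.sqrt(sum(w * w for w in weights))
    return alpha * rmse + beta * mae + mu * medae + lam * weight_norm


# Illustrative values: absolute errors are [0, 0, 2] and ||W|| is 5.
loss = combined_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], weights=[3.0, 4.0])
```

Because the median absolute error ignores extreme residuals while the root mean square error emphasizes them, weighting the three terms lets the objective trade outlier sensitivity against robustness.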
[0172] Optionally, the above-mentioned multimodal neurobiological signal processing device may further include:
[0173] The numerical range acquisition module is used to obtain the numerical range of the predicted biological signs.
[0174] The quantization module is used to quantize the numerical range, dividing the numerical range into several intervals and obtaining the prediction level corresponding to the several intervals.
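The quantization step can be sketched as a mapping from a predicted value to the index of the interval that contains it. Equal-width intervals and the example range are assumptions; the embodiment does not state how the interval boundaries are chosen. The names `make_edges` and `prediction_level` are hypothetical.

```python
import bisect


def make_edges(low, high, n_levels):
    """Interior boundaries that split [low, high] into equal-width intervals."""
    step = (high - low) / n_levels
    return [low + step * i for i in range(1, n_levels)]


def prediction_level(value, edges):
    """Index of the interval containing value (0 is the lowest level)."""
    return bisect.bisect_right(edges, value)


# Hypothetical score range 0 to 30 divided into three prediction levels.
edges = make_edges(0.0, 30.0, 3)
levels = [prediction_level(v, edges) for v in (5.0, 10.0, 25.0)]
```

With `bisect_right`, a value that falls exactly on a boundary is assigned to the higher level; `bisect_left` would assign it to the lower one.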
[0175] Optionally, the above-mentioned multimodal neurobiological signal processing device may further include:
[0176] The biological feature extraction module is used to extract biological features from the biological sign prediction results based on a preset interpretable artificial intelligence method.
[0177] The analysis module is used to perform replication analysis and convergence analysis on the biological features to identify individual biomarkers from them.
[0178] In this embodiment, the multimodal neurobiological signal processing device is presented in the form of functional units. Here, a unit refers to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
[0179] The further functional descriptions of the above modules and sub-modules are the same as those in the corresponding embodiments described above, and will not be repeated here.
[0180] The multimodal neurobiological signal processing device provided in this embodiment employs a deep learning network to extract features from multimodal neurobiological signals and fuse multiple deep features. It then uses the fused target features to predict biosignatures, obtaining the predicted biosignature results. Therefore, this device can fuse multiple deep features based on objective multimodal neurobiological signals, achieving effective capture of various deep features without requiring manual feature design or selection. Furthermore, the deep learning network has a multi-level nonlinear structure, effectively analyzing the nonlinear relationships present in neurobiological signals, thereby extracting effective biosignatures from multiple deep features and achieving effective prediction of the biological representation of neural signals, improving the accuracy of biological representation prediction.
[0181] This invention also provides a server, which may be a server, a computer, etc., and which includes the multimodal neurobiological signal processing device shown in Figure 11.
[0182] Please refer to Figure 12, which is a schematic diagram of the structure of a server provided in an optional embodiment of the present invention. As shown in Figure 12, the server may include: at least one processor 501, such as a central processing unit (CPU), at least one communication interface 503, memory 504, and at least one communication bus 502. The communication bus 502 is used to implement communication between these components. The communication interface 503 may include a display screen and a keyboard; optionally, the communication interface 503 may also include a standard wired interface or a wireless interface. The memory 504 may be high-speed volatile random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Optionally, the memory 504 may also be at least one storage device located remotely from the aforementioned processor 501. The processor 501 may be combined with the device described in Figure 11; an application program is stored in memory 504, and the processor 501 calls the program code stored in memory 504 to perform any of the above method steps.
[0183] The communication bus 502 can be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The communication bus 502 can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 12 as a single thick line, but this does not mean that there is only one bus or one type of bus.
[0184] The memory 504 may include volatile memory, such as random-access memory (RAM); the memory may also include non-volatile memory, such as flash memory, hard disk drive (HDD) or solid-state drive (SSD); the memory 504 may also include a combination of the above types of memory.
[0185] The processor 501 can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
[0186] The processor 501 may further include a hardware chip. This hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
[0187] Optionally, the memory 504 is also used to store program instructions. The processor 501 can invoke the program instructions to implement the multimodal neurobiological signal processing method as shown in the above embodiments of this application.
[0188] This invention also provides a non-transitory computer storage medium storing computer-executable instructions that can execute the multimodal neurobiological signal processing method described in any of the above method embodiments. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk drive (HDD), or solid-state drive (SSD), etc.; the storage medium may also include combinations of the above types of memory.
[0189] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.
Claims
1. A multimodal neurobiological signal processing method, characterized in that it is used in a deep learning network; the deep learning network includes a pre-built deep learning model and a deep regression model; the deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer; the deep learning model contains multiple deep learning sub-models, each used for a different type of neurobiological signal; each deep learning sub-model is pre-trained based on corresponding neurobiological signal samples; the method includes: acquiring multimodal neurobiological signals to be processed, and preprocessing the multimodal neurobiological signals to obtain preprocessed multimodal neurobiological signals, wherein the multimodal neurobiological signals are various types of neurobiological signals collected from individuals with mental illnesses and include electroencephalogram (EEG) signals, functional magnetic resonance imaging (fMRI) signals, and behavioral scale data; inputting the preprocessed multimodal neurobiological signals into the deep learning model, and extracting the deep features of each modality of neurobiological signals based on the deep learning model, including: determining the type of the multimodal neurobiological signals; determining the deep learning sub-model corresponding to each type of neurobiological signal; and inputting each type of neurobiological signal into the corresponding deep learning sub-model to obtain the deep features corresponding to each type of neurobiological signal; inputting multiple deep features into the feature fusion layer for feature fusion to obtain the target fusion features; inputting the target fusion features into the regression layer after passing through the fully connected layer, and using the regression layer to predict biological signs from the target fusion features, generating biological sign prediction results, wherein the biological sign prediction results are used to characterize the prediction results before and after drug treatment for mental illness; extracting biological features from the biological sign prediction results based on a preset interpretable artificial intelligence method; and performing replication analysis and convergence analysis on the biological features to identify individual biomarkers.
2. The method according to claim 1, characterized in that, When the neurobiological signal is a first type of neurobiological signal, the first deep learning sub-model corresponding to the first type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a second sub-model; the first type of neurobiological signal is input into the first deep learning sub-model to obtain deep features corresponding to the first type of neurobiological signal, including: The first type of neurobiological signal is input into the second sub-model for feature extraction to obtain the first feature data; The first feature data is fused with the pre-configured location code to obtain fused data; The fused data is input into the model group for feature extraction to obtain the deep features corresponding to the first type of neurobiological signal.
3. The method according to claim 2, characterized in that, The first type of neurobiological signal is an electroencephalogram (EEG) signal; the first sub-model includes a Transformer model; the second sub-model includes a convolutional neural network model. The convolutional neural network includes convolutional layers and average pooling layers; The Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; The attention module and the feedforward network module have corresponding shortcut connections.
4. The method according to claim 1, characterized in that, When the neurobiological signal is a second type of neurobiological signal, the second deep learning sub-model corresponding to the second type of neurobiological signal includes a model group composed of multiple first sub-models connected in series and a third sub-model. The third sub-model includes convolutional layers, max pooling layers, and fully connected layers. The second type of neurobiological signal is input into the second deep learning sub-model to obtain deep features corresponding to the second type of neurobiological signal, including: The second type of neurobiological signal is input into the convolutional layer of the third sub-model for multidimensional convolution processing to obtain the convolution processing result; The convolution processing result is input into the model group for feature extraction, and after passing through the max pooling layer and the fully connected layer, the deep features corresponding to the second type of neurobiological signal are output.
5. The method according to claim 4, characterized in that, The second type of neurobiological signal is a functional magnetic resonance imaging (fMRI) signal; the first sub-model includes a Transformer model; the third sub-model includes a point-based 4D convolutional network model. The point 4D convolutional network model includes point 4D convolutional layers, max pooling layers, and fully connected layers. A model group consisting of multiple Transformer models connected in series is set between the point 4D convolutional layer and the max pooling layer; The Transformer model includes an attention module and a feedforward network module; the attention module includes a multi-head attention layer and a first normalization layer; the feedforward network module includes a fully connected feedforward layer and a second normalization layer; the multi-head attention layer is connected to the first normalization layer; the fully connected feedforward layer is connected to the second normalization layer; the first normalization layer is connected to the fully connected feedforward layer; The attention module and the feedforward network module have corresponding shortcut connections.
6. The method according to claim 1, characterized in that, before the step of inputting the target fusion features into the regression layer after passing through the fully connected layer and using the regression layer to predict biological signs from the target fusion features to generate biological sign prediction results, the method further includes: obtaining the data type corresponding to the target fusion features; constructing a loss function corresponding to the data type; and optimizing the parameters of the deep regression network using the loss function, wherein the regression layer is deployed in the deep regression network.
7. The method according to claim 6, characterized in that constructing the loss function corresponding to the data type includes: when the data type is continuous, constructing a loss function based on data error parameters; the data error parameters include mean absolute error, root mean square error, and median absolute error, and the loss function is: Loss = α * RMSE + β * MAE + μ * MedAE + λ * ||W||; where Loss represents the loss function; RMSE represents the root mean square error; MAE represents the mean absolute error; MedAE represents the median absolute error; ||W|| represents the regularization term; and α, β, μ, and λ represent the network parameters.
8. The method according to claim 1, characterized in that the method further includes: obtaining the numerical range of the predicted biological signs; and quantizing the numerical range by dividing it into several intervals to obtain the prediction levels corresponding to the intervals.
9. A multimodal neural biological signal processing device, characterized in that, Used in deep learning networks, the deep learning network includes a pre-built deep learning model and a deep regression model. The deep regression model includes a feature fusion layer, a fully connected layer, and a regression layer. The deep learning model contains multiple deep learning sub-models, each used for different types of neurobiological signals. Each deep learning sub-model is pre-trained based on corresponding neurobiological signal samples. The device includes: The acquisition module is used to acquire the multimodal neurobiological signals to be processed, preprocess the multimodal neurobiological signals to obtain preprocessed multimodal neurobiological signals; the multimodal neurobiological signals are various types of neurobiological signals collected from individuals with mental illnesses; the multimodal neurobiological signals include electroencephalogram (EEG) signals, functional magnetic resonance imaging (fMRI) signals, and behavioral scale data; The feature extraction module is used to input the preprocessed multimodal neurobiological signals into the deep learning model, and extract the deep features of each modality of neurobiological signals in the multimodal neurobiological signals based on the deep learning model; The feature fusion module is used to input multiple deep features into the feature fusion layer for feature fusion to obtain the target fused features; The prediction module is used to input the target fusion features into the regression layer after passing through the fully connected layer, and use the regression layer to predict the vital signs of the target fusion features to generate biological sign prediction results; the biological sign prediction results are used to characterize the prediction results before and after drug treatment for mental illness. 
A biometric feature extraction module is used to extract biometric features from the biometric prediction results based on a preset interpretable artificial intelligence method. An analysis module is used to perform replication analysis and convergence analysis on the biometrics to determine individual biomarkers from the biometrics; The feature extraction module includes: The type determination submodule is used to determine the type of multimodal neural biological signals; The sub-model determination sub-module is used to determine the various deep learning sub-models corresponding to each type of neurobiological signal; The feature extraction submodule is used to input the various types of neurobiological signals into the corresponding deep learning sub-models to obtain deep features corresponding to the various types of neurobiological signals.
10. A server, characterized in that it includes: a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the multimodal neurobiological signal processing method according to any one of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the multimodal neurobiological signal processing method according to any one of claims 1-8.