Deep learning-based brain connectivity signal recognition methods
By improving the Res_ICML model and combining multi-scale convolution and channel attention mechanisms, the accuracy problem of brain functional connectivity signal recognition was solved, achieving efficient recognition of brain functional connectivity signals in patients with depression, and improving the recognition success rate and the model's generalization ability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SICHUAN UNIV
- Filing Date
- 2022-07-13
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to effectively identify abnormal patterns in brain functional connectivity signals, making it impossible to accurately assess the brain functional connectivity signals of patients with depression.
We employ the Res_ICML model, an improvement upon the ResNet model, and combine z-score normalization, random dropout, multi-scale convolution, channel attention mechanism, and early termination condition to preprocess and extract features from brain connectivity data using deep learning methods, thereby generating the probability of patients with depression.
It improved the success rate of brain connectivity signal recognition, enhanced the model's accuracy and generalization ability for brain functional connectivity signals, and reduced overfitting.
Figure CN115392283B_ABST (abstract figure)
Description
Technical Field
[0001] This invention relates to the field of data recognition, specifically to a brain connectivity signal recognition method based on deep learning.
Background Technology
[0002] With the development of deep learning theory and the improvement of machine performance, deep learning has demonstrated powerful learning capabilities and has been widely applied. In image classification, deep learning has even surpassed human performance. In speech recognition, deep learning can recognize human language and respond accordingly[8]. In machine translation, its performance is comparable to that of humans. Beyond image and natural language tasks, deep learning is also widely used in fields such as medicine, transportation, and security. Brain functional connectivity signals describe the connectivity between the activities of different parts of the brain. These parts are not isolated brain regions; they are closely connected and work together to affect human cognition and activity. Studies in modern brain science have shown that higher cognitive functions depend on the joint action of multiple brain regions rather than on a single region. Abnormalities in the connections between brain regions affect mental and behavioral functions and prevent normal cognitive functions from being completed. Therefore, how to identify brain functional connectivity data is a topic that requires study.
Summary of the Invention
[0003] The purpose of this invention is to overcome the shortcomings of the prior art and provide a brain connectivity signal recognition method based on deep learning, comprising the following steps:
[0004] Step 1: Preprocess the brain connectivity dataset using z-score normalization and random dropout to obtain the preprocessed dataset, which serves as the training set. Then, normalize the brain connectivity dataset using z-score to generate the validation set.
[0005] Step 2: Improve the ResNet model to obtain the Res_ICML model, input the training set into the Res_ICML model, train the Res_ICML model, and after a set number of training rounds, proceed to Step 3;
[0006] Step 3: Validate the trained Res_ICML model using the validation set. If the early termination condition is met, proceed to Step 4. If not, determine whether the set maximum number of iterations has been reached. If yes, proceed to Step 4. Otherwise, return to Step 2 to continue training.
[0007] Step 4: End the training and save the completed Res_ICML model. Input the obtained brain functional connectivity signal data into the completed Res_ICML model to obtain the probability that the brain functional connectivity signal data belongs to a patient with depression.
[0008] Furthermore, the brain connectivity dataset is preprocessed using z-score normalization and random dropout to obtain a preprocessed dataset, which serves as the training set. This process includes the following steps:
[0009] The formula for z-score standardization is:
[0010] $$x^{*} = \frac{x - \mu}{\sigma}$$
[0011] Where μ is the mean of the data, σ is the standard deviation of the data, x is the original data, and x* is the standardized data.
[0013] The random discarding strategy is as follows: elements are discarded at random, so that each element in the brain connectivity dataset is rendered inactive with probability p.
[0014] Furthermore, the aforementioned improvement of the ResNet model to obtain the Res_ICML model includes:
[0015] The brain functional connectivity signal data first goes through a two-dimensional convolutional module, then uses an improved Inception module for feature extraction to generate feature maps. After generating the feature maps, a channel attention mechanism module is used to assign weights to each feature map and continue convolution operations. Residual connections are used to increase the network depth and avoid network degradation. After continuous convolution operations, X 1x1 feature maps are finally obtained, each with a global receptive field. Finally, two fully connected layers are used, and the value p is obtained through the softmax activation function.
[0016] Furthermore, the early termination condition is met when the loss of the model on the validation set no longer decreases.
[0017] Furthermore, the process of using the improved Inception module for feature extraction and generating a feature map includes the following steps:
[0018] Based on the Inception module structure, two convolutional kernels of different sizes were added to extract features. A 3x1 convolutional kernel was used, corresponding to the horizontal coordinate in the two-dimensional matrix of brain functional connectivity signals. A larger 7x7 convolutional kernel was introduced to capture effective features within a certain time and range. The number of parameters was reduced by decreasing the number of larger convolutional kernels.
[0019] Furthermore, the channel attention mechanism module is as follows: In the SE module, two fully connected layers are used to generate weights for the feature maps. On this basis, another fully connected layer is added to learn the weights of each feature map. At the same time, dropout is used in the fully connected layers of the SE module to improve performance.
[0020] The beneficial effect of this invention is that the proposed Res_ICML model can improve the success rate of brain connectivity data recognition in the task of discriminating brain functional connectivity signals.
Attached Figure Description
[0021] Figure 1 is a flowchart of the deep learning-based brain connectivity signal recognition method;
[0022] Figure 2 is a schematic diagram of model training;
[0023] Figure 3 is a diagram of the Inception module;
[0024] Figure 4 is a schematic diagram of the basic structure of an SE block;
[0025] Figure 5 is a schematic diagram of the improved SE module;
[0026] Figure 6 is a schematic diagram of the Res_ICML model structure;
[0027] Figure 7 is a schematic diagram of solving the L2-constrained optimization problem.
Detailed Implementation
[0028] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.
[0029] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention; that is, the described embodiments are only a part of the embodiments of the invention, and not all of them. The components of the embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0030] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention. It should be noted that relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
[0031] Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0032] The features and performance of the present invention will be further described in detail below with reference to embodiments.
[0033] As shown in Figure 1, the brain connectivity signal recognition method based on deep learning includes the following steps:
[0034] Step 1: Preprocess the brain connectivity dataset using z-score normalization and random dropout to obtain the preprocessed dataset, which serves as the training set. Then, normalize the brain connectivity dataset using z-score to generate the validation set.
[0035] Step 2: Improve the ResNet model to obtain the Res_ICML model, input the training set into the Res_ICML model, train the Res_ICML model, and after a set number of training rounds, proceed to Step 3;
[0036] Step 3: Validate the trained Res_ICML model using the validation set. If the early termination condition is met, proceed to Step 4. If not, determine whether the set maximum number of iterations has been reached. If yes, proceed to Step 4. Otherwise, return to Step 2 to continue training.
[0037] Step 4: End the training and save the completed Res_ICML model. Input the obtained brain functional connectivity signal data into the completed Res_ICML model to obtain the probability that the brain functional connectivity signal data belongs to a patient with depression.
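The control flow of Steps 2 to 4 can be sketched as a small loop. This is only an illustrative outline: `train_rounds`, `validation_loss`, and the `patience` parameter are hypothetical names that do not appear in the description, and "loss no longer decreases" is interpreted here as `patience` consecutive validations without improvement.

```python
from typing import Callable


def train_with_early_termination(
    train_rounds: Callable[[], None],
    validation_loss: Callable[[], float],
    max_iterations: int,
    patience: int = 1,
) -> dict:
    """Alternate training (Step 2) and validation (Step 3) until a stop condition holds."""
    best_loss = float("inf")
    stale_checks = 0  # consecutive validations without improvement
    for iteration in range(1, max_iterations + 1):
        train_rounds()            # Step 2: a set number of training rounds
        loss = validation_loss()  # Step 3: evaluate on the validation set
        if loss < best_loss:
            best_loss = loss
            stale_checks = 0
        else:
            stale_checks += 1
        if stale_checks >= patience:  # early termination condition met
            return {"iterations": iteration, "best_loss": best_loss,
                    "reason": "early_termination"}
    # Maximum number of iterations reached; Step 4 (saving the model) would follow.
    return {"iterations": max_iterations, "best_loss": best_loss,
            "reason": "max_iterations"}


# Usage with a scripted validation curve standing in for a real model.
_losses = iter([0.9, 0.7, 0.6, 0.65, 0.5])
result = train_with_early_termination(lambda: None, lambda: next(_losses),
                                      max_iterations=10)
```

In this scripted run the validation loss rises at the fourth check, so training stops there even though a later check would have been lower; a larger `patience` trades extra training time for tolerance of such fluctuations.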
[0038] The brain connectivity dataset is preprocessed using z-score normalization and random dropout to obtain a preprocessed dataset, which serves as the training set. The process includes the following steps:
[0039] The formula for z-score standardization is:
[0040] $$x^{*} = \frac{x - \mu}{\sigma}$$
[0041] Where μ is the mean of the data, σ is the standard deviation of the data, x is the original data, and x* is the standardized data.
[0043] The random discarding strategy is as follows: elements are discarded at random, so that each element in the brain connectivity dataset is rendered inactive with probability p.
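The preprocessing described above can be sketched with NumPy. Several details are assumptions made for illustration: the statistics are computed over the whole matrix (the description does not say whether μ and σ are per sample or per feature), a discarded element is set to zero, and the 90x90 input shape is hypothetical.

```python
import numpy as np


def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize to zero mean and unit standard deviation: (x - mu) / sigma."""
    mu = x.mean()
    sigma = x.std()  # assumed non-zero; a constant input is not handled here
    return (x - mu) / sigma


def random_discard(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Zero each element independently with probability p."""
    keep = rng.random(x.shape) >= p
    return x * keep


rng = np.random.default_rng(0)
raw = rng.normal(loc=0.3, scale=2.0, size=(90, 90))  # hypothetical connectivity matrix

validation_sample = z_score(raw)                                # validation: z-score only
training_sample = random_discard(z_score(raw), p=0.1, rng=rng)  # training: z-score + discard
```

Applying the random discard only to the training data matches Step 1, where the validation set is produced by z-score normalization alone.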
[0044] The aforementioned Res_ICML model, derived from the ResNet model improvement, includes:
[0045] The brain functional connectivity signal data first goes through a two-dimensional convolutional module, then uses an improved Inception module for feature extraction to generate feature maps. After generating the feature maps, a channel attention mechanism module is used to assign weights to each feature map and continue convolution operations. Residual connections are used to increase the network depth and avoid network degradation. After continuous convolution operations, X 1x1 feature maps are finally obtained, each with a global receptive field. Finally, two fully connected layers are used, and the value p is obtained through the softmax activation function.
[0046] The early termination condition is met when the loss of the model on the validation set no longer decreases.
[0047] The process of using the improved Inception module for feature extraction and generating a feature map includes the following steps:
[0048] Based on the Inception module structure, two convolutional kernels of different sizes were added to extract features. A 3x1 convolutional kernel was used, corresponding to the horizontal coordinate in the two-dimensional matrix of brain functional connectivity signals. A larger 7x7 convolutional kernel was introduced to capture effective features within a certain time and range. The number of parameters was reduced by decreasing the number of larger convolutional kernels.
[0049] The channel attention mechanism module is as follows: In the SE module, two fully connected layers are used to generate weights for the feature maps. On this basis, another fully connected layer is added to learn the weights of each feature map. At the same time, dropout is used in the fully connected layers of the SE module to improve performance.
[0050] The brain functional connectivity signals mentioned above are connectivity signals for the activities of different parts of the brain region.
[0051] Specifically, this approach employs multi-scale convolutional operations to enhance the model's expressive power and accuracy. It refines the Inception module to suit the dataset used in this paper. Because the ResNet model used as the foundation already includes pooling layers, the pooling layers in Inception are unnecessary and are removed to reduce the number of parameters. To extract as many useful features as possible, more convolutional kernels of varying sizes are added to the Inception module for feature extraction.
[0052] This approach adds 3x1 and 7x7 convolutional kernels to the existing Inception module. Adding two extra kernels of different sizes to the original Inception structure allows more useful features to be extracted. Instead of the commonly used square kernel, a 3x1 kernel is used because, given the characteristics of brain functional connectivity signals, the relationships between different brain functional connectivity signals matter most, i.e., the horizontal coordinate in the two-dimensional matrix of brain functional connectivity signals. In addition, to capture as much effective information as possible, a larger 7x7 convolutional kernel is introduced to capture effective features within a certain time and range. However, this introduces additional parameters to the model. To further optimize the structure, this paper reduces the number of larger convolutional kernels to reduce the number of parameters: smaller kernels, such as the 3x1 and 3x3 kernels, generate more feature maps, while larger kernels generate relatively fewer. The Inception module used in this paper is shown in Figure 3.
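A quick parameter count illustrates why giving the larger kernels fewer output feature maps keeps the module compact. The input channel count and the branch widths below are hypothetical values chosen for illustration; the description does not state the actual numbers.

```python
def conv_params(kernel_h: int, kernel_w: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per output channel for a 2-D convolution."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


IN_CHANNELS = 64  # hypothetical number of input feature maps

# Hypothetical branch widths: (kernel_h, kernel_w) -> number of output feature maps.
# Smaller kernels get more feature maps, larger kernels get fewer.
branches = {(1, 1): 32, (3, 1): 32, (3, 3): 32, (5, 5): 16, (7, 7): 8}

per_branch = {k: conv_params(k[0], k[1], IN_CHANNELS, n) for k, n in branches.items()}
total = sum(per_branch.values())

# For comparison: a 7x7 branch as wide as the small-kernel branches.
wide_7x7 = conv_params(7, 7, IN_CHANNELS, 32)

for kernel, count in per_branch.items():
    print(kernel, count)
print("total:", total, "| 7x7 with 32 maps:", wide_7x7)
```

With these assumed widths, the 7x7 branch with 8 feature maps needs 25,096 parameters, whereas the same branch with 32 feature maps would need 100,384, more than the rest of the module combined.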
[0053] This approach replaces some convolutional layers with Inception modules on top of the ResNet model. On one hand, compared to the original ResNet structure, the improved Inception module can extract more effective features by using multiple convolutional kernels of different sizes, thus improving model accuracy. On the other hand, by retaining some convolutional layers from the ResNet network structure instead of replacing all convolutional layers with Inception modules, the number of parameters is reduced to some extent, thus shortening the training time.
[0054] After a convolutional network generates multiple feature maps through convolution operations, the model analyzes these feature maps. Typically, each feature map has the same impact on the result, meaning they all have the same weight. In reality, however, some feature maps may have a greater impact than others. This approach aims to make the model focus on higher-value feature maps while ignoring lower-value ones. Building upon the ResNet model, a channel attention mechanism module is used to assign different weights to feature maps, allowing the model to focus more on valuable information during training and thus improving accuracy to some extent. The basic structure of the SE block is shown in Figure 4.
[0055] After the normal convolution operation (i.e., the Ftr operation in Figure 4), C feature maps of width W and height H are obtained, and a weight needs to be learned for each feature map. First, a Squeeze operation (i.e., the Fsq(·) operation in Figure 4) is performed. The Squeeze operation reduces each two-dimensional feature map to a single real number, which is equivalent to global pooling over that feature map, so this real number has the global receptive field of the corresponding feature map. After the Squeeze operation, C 1x1 feature maps are obtained. Then, the Excitation operation shown in Figure 4 generates a weight for each feature channel using a parameter w. w is not set manually but is learned during model training, and it explicitly represents the correlation between feature channels. The Excitation operation is generally implemented with fully connected layers: the first has input size C and output size C / r, and the second has input size C / r and output size C, where r is a hyperparameter, typically chosen to be 16. A sigmoid function is applied after the last fully connected layer to obtain the weight of each feature channel. By assigning a weight to each channel, this channel attention mechanism allows the deep neural network to focus more on features with higher weights during learning, which enhances the discriminability of the features. Finally, the weights obtained in the previous step are multiplied by the C feature channels, resulting in C weighted feature channels of height H and width W. The SE module can be embedded in other deep neural networks, such as Inception and ResNet, and improves performance while adding only a small number of parameters.
[0056] This approach uses the SE module to generate weights for feature maps in the model, allowing the model to focus more on higher-value feature maps and improving its expressive power. There is still room for optimization in the SE module. The hyperparameter r mentioned above is a dimensionality reduction parameter, typically chosen as 16. If r is too large, the number of hidden units (C / r) will be too small, resulting in weaker expressive power; conversely, if r is too small, the number of hidden units (C / r) will be too large, increasing the number of parameters. In this paper, the model prioritizes accuracy over training speed, so a larger number of parameters can be tolerated to some extent. A smaller r value, 12, is therefore chosen to allow the SE module to learn the weights better. The original SE module uses two fully connected layers to generate weights for the feature maps; this approach adds another fully connected layer to better learn the weight of each feature map. Dropout is also used in the fully connected layers of the SE module to improve performance. The improved SE module is shown in Figure 5.
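A forward pass through an SE-style block with three fully connected layers and r = 12 might look like the NumPy sketch below. Where the extra layer sits and how wide it is (here C / r to C / r), the ReLU between layers, the inverted-dropout scaling, and the channel count C = 48 are all assumptions; the description only states that a third fully connected layer and dropout are added.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def improved_se(features, layers, drop_p=0.0, rng=None):
    """features: array of shape (C, H, W); layers: list of (weight, bias) pairs."""
    squeezed = features.mean(axis=(1, 2))  # Squeeze: one real number per feature map
    h = squeezed
    last = len(layers) - 1
    for i, (weight, bias) in enumerate(layers):
        h = weight @ h + bias              # fully connected layer
        if i < last:
            h = np.maximum(h, 0.0)         # ReLU between layers (assumed)
            if rng is not None and drop_p > 0.0:
                keep = rng.random(h.shape) >= drop_p
                h = h * keep / (1.0 - drop_p)  # inverted dropout, training time only
    channel_weights = sigmoid(h)           # one weight in (0, 1) per channel
    # Re-weight every feature map by its learned channel weight.
    return features * channel_weights[:, None, None], channel_weights


C, r = 48, 12
hidden = C // r  # 4 hidden units
rng = np.random.default_rng(0)
shapes = [(hidden, C), (hidden, hidden), (C, hidden)]  # three fully connected layers
layers = [(rng.normal(scale=0.1, size=s), np.zeros(s[0])) for s in shapes]

x = rng.normal(size=(C, 8, 8))
y, channel_weights = improved_se(x, layers, drop_p=0.5, rng=rng)
```

At inference time the same function would be called with `drop_p=0.0` so that no units are dropped.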
[0057] Generally, after using residual networks to address the training difficulties of overly deep models, deeper models tend to have better performance. This approach, building upon the ResNet model, further increases network depth using residual structures to maintain model complexity and extract more abstract features. In using convolutional neural networks to solve image processing problems, there's a priori understanding that pixels in an image are often closely related to their nearest neighbors. However, when two pixels are far apart, in extreme cases, the top-left and bottom-right pixels in an image may have almost no connection. Therefore, convolutional neural networks often don't require excessively large receptive fields for image processing.
[0058] In the dataset used in this approach, each point represents the brain functional connectivity signal between two brain regions at a certain time. Due to the unknown nature of the brain functional connectivity signals, it is impossible to determine which brain functional connectivity signals have a significant impact on the classification results, while others have almost no impact. In other words, some experience in image processing is not applicable in this paper. This paper employs convolutional kernels of various scales for feature extraction to capture more effective features. Based on this, the receptive field of the model is further expanded so that the final feature map has a global receptive field, thereby capturing more relationships between brain functional connectivity signals and enabling the model to make more accurate judgments. A common operation is global pooling; however, this loses effective information in the feature map. This paper uses convolutional layers instead of pooling layers on top of the residual network. Multiple convolutional kernels with a stride of 2 are used for convolution operations, each convolution operation reducing the feature map size to half of the original feature map size. Multiple downsampling convolution operations are performed until the feature map size is 1x1, resulting in n 1x1 feature maps. Each feature map has a global receptive field.
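The number of stride-2 convolutions needed to reach a 1x1 feature map follows from the usual convolution output-size formula. The sketch below assumes a 3x3 kernel with padding 1, which the description does not specify, and the input sizes 64 and 90 are hypothetical.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def sizes_until_1x1(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> list:
    """Feature-map side length after each downsampling convolution, ending at 1."""
    sizes = [size]
    while sizes[-1] > 1:
        sizes.append(conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes


trace = sizes_until_1x1(64)
print(trace)                    # side length after each stride-2 convolution
print(len(trace) - 1, "convolutions")
print(sizes_until_1x1(90))      # an input whose side is not a power of two
```

Under these assumptions a 64x64 input needs six stride-2 convolutions, and each roughly halves the side length, as described above.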
[0059] Ultimately, each feature and each initial point will be integrated with information from all other coordinate points, as shown in the following formula:
[0060]
[0061] Where W represents the final acquired information, t represents the position of the feature map, i and j are the x and y coordinates in the matrix, m is the width of the input matrix, and n is the length of the input matrix. In the final 1x1 feature matrix, not only does each element of the input matrix participate in the decision-making process, but the relationships between any points are also taken into account. Therefore, more useful features can be extracted, which helps to improve the results of the network model.
[0062] This paper improves upon the ResNet network model by refining its structure, ultimately proposing the Res_ICML model. A schematic diagram of the model structure is shown in Figure 6. Brain functional connectivity signal data first passes through a standard two-dimensional convolutional module, followed by multi-scale convolution (i.e., the Inception module) to extract more effective features. After the feature maps are generated, a channel attention mechanism module assigns weights to each feature map to improve training performance; the channel attention mechanism module used in this paper is shown in Figure 5. Convolutional operations then continue. Because the Res_ICML model is an improvement on the ResNet model, residual connections are used to increase network depth and avoid network degradation. After continuous convolutional operations, X 1x1 feature maps are finally obtained, each with a global receptive field. Finally, two fully connected layers are used, and the softmax activation function is applied to obtain a value p, where p is the probability that the data belongs to a patient with depression.
[0063] The problem studied in this paper, namely determining whether a patient is a patient with depression based on brain functional connectivity signals, is a typical binary classification problem. The cross-entropy loss function is commonly used in classification problems and has achieved good results; therefore, this paper also adopts the cross-entropy loss function. Currently, the commonly used activation functions for classification problems are the sigmoid activation function and the softmax activation function. This paper uses experiments to verify the selection of a suitable activation function to use with the cross-entropy loss function. Using the VGG16 model, ResNet model, and AlexNet model as experimental models, after experiments with different activation functions, it was found that simply changing the activation layer function still affects the results. This approach found that using the softmax activation function in the output layer of the neural network achieves the best results. Ultimately, this approach selects the softmax activation function as the output layer and the cross-entropy loss function as the model's loss function.
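The combination of a softmax output layer and cross-entropy loss for this two-class problem can be sketched as follows. The logits, the labels, and the convention that class index 1 means "patient with depression" are illustrative assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


# Hypothetical outputs of the last fully connected layer for three samples.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
labels = np.array([0, 1, 1])  # 1 = patient with depression (assumed ordering)

probs = softmax(logits)
p_depression = probs[:, 1]    # the value p for each sample
loss = cross_entropy(probs, labels)
```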
[0064] Commonly used gradient descent algorithms include stochastic gradient descent, batch gradient descent, and mini-batch gradient descent. The differences between these three have been explained above and will not be repeated here. Taking batch gradient descent as an example, let's assume a general linear regression function is:
[0065]
[0066] The corresponding loss function is:
[0067]
[0068] The model needs to minimize the loss function. After initializing the parameters, the weight parameters θ need to be continuously updated to reduce the model loss until the requirement is met. The update is performed using the initialized weights θ, as shown below:
[0069]
[0070] Where θj represents the model parameters in the j-th round, and a represents the learning rate, i.e., the step size in each learning iteration. To update the weights θ, the partial derivative of the function J needs to be calculated. When there is only one data point (x, y), the corresponding partial derivative of the function J is:
[0071]
[0072] Based on this, taking the partial derivative with respect to all data points, the partial derivative of the loss function is:
[0073]
[0074] When minimizing the loss function, the weights θ are continuously updated so that each update reduces the loss. The process of updating the weights θ in each round is as follows:
[0075]
[0076] As can be seen from the weight update formula, batch gradient descent uses all training data for each update of the model parameters. Since all samples participate in every update, training slows significantly when the number of samples is large, and the performance requirements of the training machine grow with the number of samples. Stochastic gradient descent was proposed on the basis of batch gradient descent. It uses the partial derivative of the loss function of a single sample with respect to the weights θ to obtain the descent gradient, and then uses this gradient to update the weights θ, as shown in the following equation:
[0077]
[0078] While stochastic gradient descent can significantly improve training speed and reduce hardware requirements compared to batch gradient descent, it also has a drawback. Because it updates model parameters using only a single sample, the parameters move in the direction that minimizes the loss of that sample, and there is no guarantee that the update reduces the loss over all samples. In fact, after the (j+1)th iteration, the model loss may even increase compared to the jth iteration. Furthermore, stochastic gradient descent is not guaranteed to converge to the global optimum, i.e., to minimize the sum of losses of all training samples.
[0079] Both batch gradient descent and stochastic gradient descent have their own advantages and disadvantages. Based on these two algorithms, mini-batch gradient descent [58] was proposed. Mini-batch gradient descent strikes a balance between stochastic gradient descent and batch gradient descent, taking into account the convergence speed during training. Compared with stochastic gradient descent, it also has higher model accuracy and robustness. Each parameter update uses neither all training samples nor a single training sample, but a portion of the training samples. Assuming that 10 samples are used when updating the model parameters, the corresponding weight parameter update formula is:
[0080]
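The three variants differ only in how many samples contribute to each gradient estimate, which the sketch below makes explicit through a single `batch_size` argument. It assumes a linear model with a squared-error loss averaged over the batch, i.e. (1 / 2m) Σ (Xθ − y)², which is a common textbook form and not necessarily the exact normalization used in the original equations; the data are synthetic.

```python
import numpy as np


def mse_gradient(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of (1 / 2m) * sum((X @ theta - y)**2) over the m rows supplied."""
    residual = X @ theta - y
    return X.T @ residual / len(y)


def gradient_step(theta, X, y, lr, batch_size, rng):
    """One update using `batch_size` randomly chosen samples.

    batch_size == len(y) -> batch gradient descent
    batch_size == 1      -> stochastic gradient descent
    otherwise            -> mini-batch gradient descent
    """
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return theta - lr * mse_gradient(theta, X[idx], y[idx])


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # synthetic inputs
true_theta = np.array([1.5, -2.0, 0.5])  # synthetic ground truth
y = X @ true_theta

theta = np.zeros(3)
for _ in range(500):
    theta = gradient_step(theta, X, y, lr=0.1, batch_size=10, rng=rng)
```

With `batch_size=10`, as in the example above, each update touches only 10 of the 200 samples yet still recovers the synthetic parameters on this noise-free problem.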
[0081] Among the three gradient descent algorithms, this paper selects the mini-batch gradient descent algorithm with momentum to accelerate learning and improve accuracy. Mini-batch gradient descent with momentum offers the following advantages: 1. It accelerates model parameter convergence, allowing convergence in a shorter time. 2. It makes parameter convergence more stable. Compared to mini-batch gradient descent without momentum, momentum fuses multiple gradients, which makes convergence more stable and steers it towards the optimal solution. When updating weights using stochastic gradients, the update formula can be expressed as follows:
[0082]
[0083] After introducing momentum, its weight update formula can be expressed as follows:
[0084]
[0085] Here, β is a hyperparameter, and researchers choose an appropriate value according to their needs. Through multiple experiments, researchers found that a default setting of 0.9 generally yields good convergence speed and robustness. As shown in the above equation, the convergence direction is related not only to the current gradient but also to the directions of previous descent gradients. When the two gradient descent directions are the same, the convergence speed can be accelerated. Furthermore, using gradient descent with momentum can, to some extent, avoid getting trapped in local optima and reach the global optimum.
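One common way to write the momentum update is shown below. It accumulates a velocity v ← βv + g and steps θ ← θ − lr·v; this particular form is an assumption, since equivalent variants exist (for example with a (1 − β) factor on the gradient), and the quadratic objective is a hypothetical test problem.

```python
import numpy as np


def momentum_step(theta, velocity, grad, lr, beta=0.9):
    """Accumulate past gradients in `velocity`, then move against it."""
    velocity = beta * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity


# When successive gradients point the same way, the velocity builds up:
# with a constant gradient of 1 it approaches 1 / (1 - beta) = 10.
v = 0.0
for _ in range(200):
    _, v = momentum_step(0.0, v, 1.0, lr=0.0)

# Hypothetical quadratic objective 0.5 * sum(A * theta**2), whose gradient is A * theta.
A = np.array([1.0, 10.0])
theta, velocity = np.array([5.0, 5.0]), np.zeros(2)
for _ in range(300):
    theta, velocity = momentum_step(theta, velocity, A * theta, lr=0.02)
```

The first loop illustrates the acceleration effect described above: with β = 0.9, consistent gradients are amplified by up to a factor of ten.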
[0086] In deep learning, a common problem is overfitting. Due to problems such as excessively large models, overly complex network structures, and insufficient training samples, the network model may perform well on the training dataset but poorly on the test dataset; this is overfitting. This paper uses L2 regularization, Dropout, and early termination to prevent overfitting. The L2 regularization loss function [61] is obtained by adding the sum of squares of the weight parameters to the original loss function:
[0087]
[0088] Where C is the new loss function, C0 is the original loss function, $\sum_{w} w^{2}$ is the sum of squares of all parameters in the model, and λ is the weight decay coefficient. From the above formula, we can see that the magnitude of the model parameter values also affects the loss function. Simply put, under the same error, the larger the absolute value of the model parameter w, the greater the loss with L2 regularization. Generally, the lower the complexity of the network model, the smaller the values of the model parameters w, the lower the loss with L2 regularization, the better the fit to the data, and the stronger the generalization ability. The following provides a more detailed explanation of L2 regularization. From the viewpoint of constrained optimization, L2 regularization is defined as:
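The effect of the penalty term can be checked numerically. The sketch assumes the form C = C0 + λ Σ w², without the 1/2 or 1/(2n) scaling that some texts include, and the parameter values are hypothetical.

```python
import numpy as np


def l2_regularized_loss(original_loss: float, params: list, lam: float) -> float:
    """C = C0 + lam * (sum of squares of all model parameters)."""
    penalty = sum(float(np.sum(w ** 2)) for w in params)
    return original_loss + lam * penalty


# Hypothetical parameters of a tiny model and a hypothetical original loss C0.
params = [np.array([1.0, -2.0]), np.array([[0.5]])]
c0 = 0.3

loss_small = l2_regularized_loss(c0, params, lam=0.01)
# Same error, but every weight doubled: the penalty grows fourfold.
loss_large = l2_regularized_loss(c0, [2 * w for w in params], lam=0.01)
```

Here the sum of squares is 5.25, so the regularized loss is 0.3 + 0.01 × 5.25 = 0.3525; doubling the weights raises it to 0.51 even though the original error is unchanged.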
[0089] $$\min_{w} J(w; X, y)$$
[0090] Where X represents the training samples, y represents the corresponding labels, and w represents the model parameters. Adding an L2 regularization term to the model requires satisfying the constraint $\lVert w \rVert_{2} \le C$. A schematic diagram illustrating the solution to this problem is shown in Figure 7.
[0091] As shown in Figure 7, the ellipse is a contour line of the original objective function C0, and the circle is the L2 norm sphere with radius λ. Due to the constraint, w must lie within the L2 norm sphere. Consider a point w on the boundary. At this point, the gradient direction at w is labeled Ein in the figure, and the normal direction is that of the norm sphere at that point. Since w cannot leave the boundary (otherwise it would violate the constraint), when w is updated by the gradient descent algorithm it can only move along the tangent direction at w. As w is continuously updated, the optimal solution is eventually reached at w*. Furthermore, as shown in the figure, the larger the L2 regularization coefficient λ, the larger the corresponding circle in the figure and the greater the influence on the update direction of the original objective function, which tends towards smaller model parameters. In practical applications, an excessively large or small weight coefficient λ will reduce the expressive power of the model; therefore, an appropriate weight coefficient λ must be chosen.
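The effect of the L2 term on the loss and on the gradient can be sketched in a few lines of Python. The penalty is assumed here to be exactly λ times the sum of squared weights, with no additional 1/2 or 1/n factor; the function names are hypothetical, and the default λ = 0.001 follows the weight decay coefficient stated in this description.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam * sum of squared weights (assumed scaling)."""
    return lam * sum(w * w for w in weights)


def regularized_loss(original_loss, weights, lam=0.001):
    """New loss C = C0 + lam * sum(w^2)."""
    return original_loss + l2_penalty(weights, lam)


def regularized_gradient(original_grad, weights, lam=0.001):
    """Gradient of C with respect to w: dC0/dw + 2 * lam * w.

    The extra term pulls every weight towards zero ("weight decay").
    """
    return [g + 2.0 * lam * w for g, w in zip(original_grad, weights)]


# Same original error, different weight magnitudes:
small = regularized_loss(0.5, [1.0, -2.0])   # 0.5 + 0.001 * 5
large = regularized_loss(0.5, [3.0, -4.0])   # 0.5 + 0.001 * 25
```

For the same original error, the parameter vector with larger magnitudes receives the larger regularized loss, which is the behaviour described above.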
[0092] Another commonly used technique to prevent overfitting is dropout, which is also employed in this paper to mitigate overfitting to some extent. When training deep neural networks, each neuron is dropped at random with probability p, that is, its output is set to 0, so the neuron plays no role in that round of training. With dropout, only a subset of neurons is trained in each iteration, which reduces the effective complexity of the network model and helps prevent overfitting. Early termination is a regularization strategy that can be used alone in deep neural networks or in combination with other regularization strategies to reduce overfitting. As the number of training epochs increases, the model performs better on the training set and the training loss decreases. On the validation set, however, the loss may reach a minimum and then rise again as training continues, because the network is overfitting. Early termination can be used to prevent this.
[0093] When using early termination, the current model (network structure and parameter values) must first be saved. After training for one epoch, a new model is obtained and evaluated on the validation set. If the validation loss in this epoch is greater than the previous loss, training does not stop immediately: because of randomness in training, the model continues to train and validate. If the validation error still does not decrease, training is considered to have ended at the point where the lowest validation error was last reached, and the previously saved network model is taken as the final network model.
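The early termination procedure can be sketched as the following loop. It is an illustration under stated assumptions, not the exact implementation: `train_one_epoch` and `validate` are hypothetical callbacks, "no decrease" is interpreted as no new minimum of the validation loss for `patience` consecutive epochs, and the defaults of 100 epochs and 10 waiting epochs follow the hyperparameters given in this description.

```python
import copy


def train_with_early_stopping(train_one_epoch, validate, model,
                              max_epochs=100, patience=10):
    """Train until the validation loss stops improving.

    Returns (best_model, best_epoch, best_loss), where best_model is a
    saved copy of the model at the epoch with the lowest validation loss.
    """
    best_loss = float("inf")
    best_model = copy.deepcopy(model)  # save the current model first
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            # New minimum: save this model and reset the waiting counter.
            best_loss = val_loss
            best_model = copy.deepcopy(model)
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            # Do not stop immediately; allow for random fluctuations.
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_model, best_epoch, best_loss


def demo():
    """Toy run with a fixed validation-loss curve (illustrative values)."""
    curve = [1.0, 0.8, 0.6, 0.7, 0.65, 0.9, 0.5]
    model = {"epochs_trained": 0}

    def train_one_epoch(m):
        m["epochs_trained"] += 1

    def validate(m):
        return curve[m["epochs_trained"] - 1]

    return train_with_early_stopping(train_one_epoch, validate, model,
                                     max_epochs=len(curve), patience=3)
```

In the toy run, training stops after three epochs without a new minimum, and the model saved at the lowest validation loss is returned rather than the last one trained.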
[0094] The specific process of model training is as follows: First, the model is built and the various hyperparameters required for the experiment are set. Then, the dataset is preprocessed, and the model is trained on the preprocessed data. After a certain number of training rounds, the model is evaluated on the validation set to observe whether the loss is still decreasing and the accuracy is still improving. When the model's loss on the validation set no longer decreases after a certain number of training rounds, the early termination strategy is adopted, and the best-performing model is saved as the final model.
[0095] To improve model accuracy and reduce training time, dataset preprocessing is typically performed before training to enhance model performance. This paper also employs preprocessing for the brain functional connectivity signal dataset. Common preprocessing strategies for image data, such as rotation, cropping, and scaling, are not well-suited for this dataset. Therefore, this paper utilizes z-score normalization (zero mean normalization) and random dropout for preprocessing, taking into account the characteristics of the brain functional connectivity signal dataset.
[0096] (1) z-score standardization:
[0097] After z-score processing, the data have a mean of 0 and a standard deviation of 1.
[0098] The formula for z-score standardization is:
[0099] $x^* = \frac{x - \mu}{\sigma}$
[0100] Where μ is the mean of the data, σ is the standard deviation of the data, x is the original data, and x* is the standardized value of the original data x.
[0101] z-score standardization rescales the data distribution and highlights the contrast between data points, making it easier for the model to discover relationships between them; it is therefore well-suited for analyzing brain functional connectivity signals. Even for data whose distribution is unknown, z-score standardization rescales the data to a mean of 0 and a variance of 1, which is beneficial for model training and can, to some extent, accelerate training and improve model accuracy.
[0102] (2) Random drop strategy:
[0103] Inspired by the dropout technique, this paper applies a random discarding strategy to the brain functional connectivity signal dataset, in addition to the dropout and early termination strategies, to avoid overfitting. Under this strategy, each element in the sample data has a probability p of being rendered ineffective, so that it does not contribute to model training. The strategy also simulates the scenario in which, during brain data collection from a patient, some signals are missing or abnormal and unusable for model training. In that case, the other known brain functional connectivity signals are used to distinguish data belonging to patients with depression from data belonging to people without depression. In this paper, this means that the sample label stays the same even when some elements are missing: the missing elements do not affect the label, which remains that of the complete sample. The random discarding strategy not only mitigates overfitting to some extent but also simulates different sample data (different elements are discarded each time, so a single sample corresponds to multiple randomly discarded samples during training), improving the model's generalization performance. To simulate the loss of some data in brain functional connectivity signals, which renders them ineffective during training, each element in a brain functional connectivity signal sample is set to 0 with probability p. Setting p too large discards effective information from the sample data, reducing training accuracy and potentially preventing the model from converging. Setting p too small has no effect. Experiments showed that setting p = 0.005 improved model accuracy. Compared to the original data, only some elements are set to 0, while the values of the remaining elements are unchanged. This approach helps prevent overfitting to some extent and, by effectively increasing the number of training samples, improves both model accuracy and generalization ability.
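The two preprocessing steps can be sketched together in Python. Several details are assumptions rather than statements from this description: the statistics are computed per sample with the population standard deviation, standardization is applied before random discarding, and all function names are hypothetical. Only the value p = 0.005 comes from the text.

```python
import random
import statistics


def z_score(values):
    """Standardize values to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std (an assumption)
    if sigma == 0:
        raise ValueError("z-score is undefined for constant data")
    return [(x - mu) / sigma for x in values]


def random_drop(values, p=0.005, rng=None):
    """Set each element to 0 independently with probability p."""
    if rng is None:
        rng = random.Random()
    return [0.0 if rng.random() < p else x for x in values]


def make_training_sample(values, p=0.005, rng=None):
    """Training sample: z-score first, then random discarding.

    The label of the sample is left unchanged by this step.
    """
    return random_drop(z_score(values), p=p, rng=rng)


def make_validation_sample(values):
    """Validation sample: z-score only, no random discarding."""
    return z_score(values)
```

Calling `make_training_sample` again on the same sample in a later epoch generally zeroes a different subset of elements, which is how a single sample yields several variants during training.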
[0104] In deep learning, some parameters are not learned during training but are set manually before training begins; these are called hyperparameters. Different hyperparameters affect the training results of the model, so selecting a suitable set of hyperparameters based on the characteristics of the network model and the dataset can improve the learning efficiency and accuracy of a deep learning network model. There is no universal choice of hyperparameters: no single set performs well on all models and datasets, and different models and datasets often require different hyperparameters. In this paper, the main hyperparameters to be set are the learning rate, the number of training epochs, the weight decay coefficient, the momentum coefficient, the mini-batch size, and the number of training epochs used by the early termination strategy. The hyperparameter values used in this paper are shown in the table below:
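[0105]
Hyperparameter | Value
Learning rate | 0.001 (epochs 0 to 50), 0.0005 (epochs 50 to 80), 0.0001 (epochs 80 to 100)
Maximum number of training epochs | 100
Weight decay coefficient | 0.001
Momentum coefficient | 0.9
Mini-batch size | 15
Early termination strategy training epochs | 10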
[0106] Learning rate: In the early stages of model training, a large learning rate is needed so that the model parameters converge rapidly. After a certain number of training epochs, however, a large learning rate can prevent the model from reaching the global optimum and cause the loss to oscillate, so that the model parameters fail to converge. Therefore, the learning rate is reduced after a certain number of training epochs. In this paper, a learning rate of 0.001 is used for training epochs 0 to 50, 0.0005 for epochs 50 to 80, and 0.0001 for epochs 80 to 100.
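The stepwise learning rate can be written as a small lookup function. The text does not say which rate applies at the boundary epochs 50 and 80, so this sketch assumes half-open intervals [0, 50), [50, 80) and [80, 100) with zero-based epoch numbers; the names `SCHEDULE` and `learning_rate` are hypothetical.

```python
# (end_epoch, learning_rate) pairs; each rate applies up to, but not
# including, end_epoch. The boundary handling is an assumption.
SCHEDULE = ((50, 0.001), (80, 0.0005), (100, 0.0001))


def learning_rate(epoch, schedule=SCHEDULE):
    """Return the learning rate for a zero-based epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    for end_epoch, lr in schedule:
        if epoch < end_epoch:
            return lr
    raise ValueError("epoch is beyond the end of the schedule")
```

For example, `learning_rate(49)` returns 0.001 and `learning_rate(50)` returns 0.0005 under the assumed boundary convention.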
[0107] Training epochs: Too many training epochs may lead to overfitting, resulting in excellent performance on the training set but poor performance on the test set. Too few training epochs may lead to underfitting, meaning the model cannot learn sufficient expressive power from the training set, resulting in insufficient accuracy on both the training and test sets. In our experiments, we found that the model can converge within approximately 100 training epochs; therefore, we selected a maximum of 100 training epochs.
[0108] Weight decay coefficient: This paper employs a weight decay coefficient to reduce overfitting. By using L2 regularization, i.e., a weight decay strategy, the model parameters tend to select smaller values, thereby reducing overfitting. The weight decay coefficient selected in this paper is 0.001.
[0109] Momentum coefficient: Considering the impact of the weight update direction in previous training rounds on the parameter update in this round, choosing an appropriate momentum coefficient can accelerate the model's convergence speed and avoid unnecessary oscillations. Research suggests that a momentum coefficient between 0.9 and 0.95 generally yields good results. In this paper, a momentum coefficient of 0.9 is ultimately chosen.
[0110] Mini-batch size: When using the mini-batch gradient descent algorithm, the number of samples used for each model parameter update must be specified. Too many samples in a batch consume more memory, placing higher demands on the machine used to train the model. Furthermore, as the number of samples per batch increases, the model's convergence speed decreases and the training time increases. A small batch size can speed up convergence, but it cannot guarantee reaching the global optimum. This paper uses 15 samples for each model weight update.
Early termination strategy training epochs: This paper adopts an early termination strategy to avoid overfitting. After a certain number of training epochs, if the model's loss on the validation set begins to increase and its accuracy on the validation set begins to decrease, early termination can be considered. If the model's performance on the validation set keeps declining in the subsequent training epochs, the early termination strategy is applied: the model that performed best on the validation set is saved as the final model and training stops. The early termination strategy selected in this paper uses 10 training epochs: if the model's performance on the validation set declines over the subsequent 10 training epochs, the model is considered to have overfitted, the better-performing model from the earlier epochs is saved, and training is stopped.
[0111] The above description is merely a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the concept described herein through the above teachings or related technologies or knowledge. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.
Claims
1. A brain connectivity signal recognition method based on deep learning, characterized in that it includes the following steps: Step 1: Preprocess the brain connectivity dataset using z-score normalization and random dropout to obtain the preprocessed dataset, which serves as the training set; then normalize the brain connectivity dataset using z-score normalization to generate the validation set. Step 2: Improve the ResNet model to obtain the Res_ICML model, input the training set into the Res_ICML model, train the Res_ICML model, and after a set number of training rounds, proceed to Step 3. Step 3: Validate the trained Res_ICML model using the validation set. If the early termination condition is met, proceed to Step 4. If not, determine whether the set maximum number of iterations has been reached. If yes, proceed to Step 4; otherwise, return to Step 2 to continue training. Step 4: End the training and save the trained Res_ICML model. Input the obtained brain functional connectivity signal data into the trained Res_ICML model to obtain the probability that the brain functional connectivity signal data belongs to a patient with depression. The aforementioned improvement of the ResNet model to obtain the Res_ICML model includes the following: the brain functional connectivity signal data first passes through a two-dimensional convolutional module, and an improved Inception module is then used for feature extraction to generate feature maps. After the feature maps are generated, a channel attention mechanism module assigns a weight to each feature map, and the convolution operations continue. Residual connections are used to increase network depth and avoid network degradation. After successive convolution operations, X feature maps of size 1x1 are finally obtained, each with a global receptive field. Finally, two fully connected layers are used, and the value p is obtained through the softmax activation function.
The process of using the improved Inception module for feature extraction and generating a feature map includes the following steps: based on the Inception module structure, two convolutional kernels of different sizes are added to extract features. A 3x1 convolutional kernel is used, corresponding to the horizontal coordinate in the two-dimensional matrix of brain functional connectivity signals. A larger 7x7 convolutional kernel is introduced to capture effective features within a certain time and range. The number of parameters is reduced by decreasing the number of the larger 7x7 convolutional kernels.
2. The brain connectivity signal recognition method based on deep learning according to claim 1, characterized in that the preprocessing of the brain connectivity dataset using z-score normalization and random dropout to obtain a preprocessed dataset, which serves as the training set, includes the following: The formula for z-score standardization is $x^* = \frac{x - \mu}{\sigma}$, where μ is the mean of the data, σ is the standard deviation of the data, x is the original data, and x* is the data obtained after standardizing the original data x; The random drop strategy is as follows: by randomly dropping elements, each element in the brain connectivity dataset has a probability p of playing no role.
3. The brain connectivity signal recognition method based on deep learning according to claim 1, characterized in that the early termination condition is defined as follows: the early termination condition is met when the loss of the model on the validation set no longer decreases.
4. The brain connectivity signal recognition method based on deep learning according to claim 1, characterized in that the channel attention mechanism module is as follows: in the SE module, two fully connected layers are used to generate weights for the feature maps. On this basis, another fully connected layer is added to learn the weights of each feature map. At the same time, dropout is used in the fully connected layers of the SE module to improve performance.