A lightweight classification method for brain tumor MRI images
By constructing a lightweight MDCA-MobileNetV2 network and combining multi-scale feature extraction and attention mechanisms, the problems of high computational cost and insufficient feature extraction in existing technologies are solved, and high-precision MRI classification of brain tumors is achieved on resource-constrained devices, which is suitable for rapid and accurate clinical auxiliary diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- TIANJIN UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2026-04-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing MRI classification models for brain tumors are computationally expensive and extract insufficient features from small, morphologically variable lesions. As a result, they are difficult to deploy in real time on clinical terminal devices with limited computing resources, and their classification accuracy and sensitivity remain inadequate.
A lightweight network based on MobileNetV2 is constructed, into which a multi-scale feature extraction module and attention mechanisms are introduced. Training is further optimized with a dynamic learning rate and a label-smoothed loss function, yielding the MDCA-MobileNetV2 network.
While maintaining low computational complexity, it improves the classification accuracy and generalization ability of brain tumor MRI images, adapts to embedded device deployment, and enables rapid and accurate assisted diagnosis.

Figure CN122244547A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of medical image processing and artificial intelligence, and specifically relates to an automatic classification method for brain tumor MRI images based on a lightweight convolutional neural network.
Background Technology
[0002] Brain tumors are among the most common malignant tumors of the central nervous system, characterized by high incidence, high recurrence rate, and high mortality. According to the World Health Organization, the global incidence of brain tumors is on the rise, seriously threatening human life and health. In clinical practice, early and accurate diagnosis is crucial for developing effective treatment plans (including surgical resection, radiotherapy, and chemotherapy) and improving patient prognosis. Magnetic resonance imaging (MRI), with its superior soft tissue resolution and lack of radiation damage, has become the preferred imaging technique for brain tumor screening, diagnosis, and treatment evaluation.
[0003] However, the interpretation and analysis of MRI images still rely heavily on the expertise and clinical experience of radiologists. Faced with an ever-increasing volume of image data, manual interpretation is not only time-consuming, labor-intensive, and inefficient, but also easily affected by physicians' subjective experience, fatigue, and differing diagnostic criteria, potentially leading to inconsistent diagnoses, missed diagnoses, or misdiagnoses. This is especially true in brain tumor classification, where different tumor types (such as gliomas, meningiomas, and pituitary adenomas) differ significantly in morphology, boundaries, signal intensity, and spatial location. Some lesions are small, have low contrast with normal tissue, and are irregular in shape, further increasing the difficulty and uncertainty of manual identification.
[0004] In recent years, deep learning technology, especially convolutional neural networks (CNNs), has shown great potential in medical image analysis. Some studies have attempted to use classic CNN models (such as VGG, ResNet, and Inception) for the automatic classification of brain tumor MRI images, and have achieved some progress. However, these models generally suffer from the following limitations: First, mainstream high-performance CNNs are typically complex in structure and have a huge number of parameters (often tens of millions or even hundreds of millions), resulting in high computational costs and large memory consumption, making it difficult to achieve real-time deployment and application in clinical terminal devices or mobile medical scenarios with limited computing resources. Second, most general-purpose CNN models are designed for natural images and are not well adapted to the characteristics of medical images, such as small lesion regions, subtle features, and varied morphologies. In particular, the process of lightweighting often comes at the cost of sacrificing feature expression capabilities, leading to a decrease in the sensitivity of identifying small tumors and atypical lesions. In addition, although existing lightweight network architectures (such as the MobileNet series) have significantly reduced computational complexity through techniques such as depthwise separable convolution, they still have problems such as weak multi-scale feature fusion capabilities, insufficient utilization of spatial context information, and insufficient attention to key regions when facing complex and varied medical images, which restricts further improvement in classification accuracy.
[0005] Therefore, there is an urgent need for a lightweight brain tumor MRI image classification method that can maintain low computational complexity and parameter count while possessing strong feature extraction and discrimination capabilities, especially one that can effectively integrate multi-scale information and enhance focus on lesion areas. This method would assist in achieving efficient, objective, and accurate computer-aided diagnosis, alleviate clinical diagnostic pressure, and improve the consistency and reliability of diagnosis and treatment.
Summary of the Invention
[0006] The purpose of this invention is to address the problems of high computational cost and insufficient feature extraction for small and morphologically variable lesions in existing brain tumor MRI classification models, and to provide a lightweight classification method for brain tumor MRI images. This invention constructs a lightweight network with MobileNetV2 as its backbone, introduces a multi-scale feature extraction module to fuse global and local information, and combines channel and spatial attention mechanisms to enhance the model's ability to identify tumor regions, ultimately achieving high-precision, low-complexity end-to-end classification.
[0007] The lightweight classification method for brain tumor MRI images of the present invention mainly includes the following key steps:
[0008] 1. Construct a lightweight backbone network:
[0009] 1.1 Use the MobileNetV2 network as the basic framework;
[0010] 1.2 Build the lightweight framework from depthwise separable convolutions and inverted residual structures;
[0011] 2. Design a multi-scale feature enhancement module:
[0012] 2.1 Construct a multi-scale feature extraction module based on dilated convolution;
[0013] 2.2 Replace the first standard convolutional layer of MobileNetV2 with this module;
[0014] 3. Introduce attention mechanism modules:
[0015] 3.1 Embed a channel attention mechanism;
[0016] 3.2 Embed a spatial attention mechanism;
[0017] 4. Construct and optimize an end-to-end classification network:
[0018] 4.1 Integrate the modules into the MDCA-MobileNetV2 network;
[0019] 4.2 Apply a dynamic learning rate adjustment strategy;
[0020] 4.3 Optimize the loss function using a label smoothing strategy.
[0021] Furthermore, using the MobileNetV2 network as the basic framework in step 1.1 means adopting MobileNetV2 as the initial architecture and feature extraction core of the entire classification model. This framework is a stack of inverted residual blocks with linear bottlenecks. Each inverted residual block achieves efficient feature representation through a "dimensionality increase - feature extraction - dimensionality reduction" process, laying the structural foundation for the lightweight model.
[0022] Step 1.2 builds the lightweight framework from depthwise separable convolutions and inverted residual structures. Specifically, in each inverted residual block, the input feature map first undergoes a 1×1 pointwise convolution that increases its dimensionality, then a 3×3 depthwise convolution for spatial feature extraction, and finally a 1×1 pointwise convolution that reduces the dimensionality, after which the residual connection is added. A depthwise separable convolution decomposes a standard convolution into a depthwise convolution (an independent convolution per channel) and a pointwise convolution (1×1 channel fusion), which significantly reduces computational cost. Assuming the input feature map has size $H \times W \times C_{in}$, the kernel size is $K \times K$, and the number of output feature channels is $C_{out}$, the computational cost of a standard convolution is:
[0023] $$F_{std} = H \cdot W \cdot C_{in} \cdot C_{out} \cdot K^{2} \tag{1}$$
[0024] The total computational cost of the depthwise separable convolution is approximately:
[0025] $$F_{dsc} = H \cdot W \cdot C_{in} \cdot K^{2} + H \cdot W \cdot C_{in} \cdot C_{out} \tag{2}$$
[0026] The ratio of their computational costs is:
[0027] $$\frac{F_{dsc}}{F_{std}} = \frac{1}{C_{out}} + \frac{1}{K^{2}} \tag{3}$$
[0028] Since $C_{out}$ is typically large (tens to hundreds of channels) and $K = 3$, the ratio approaches $1/9$, i.e. the computational cost is reduced by roughly an order of magnitude. A linear bottleneck structure (no nonlinear activation after the final 1×1 projection) is used to prevent nonlinear activation functions from destroying low-dimensional features, thus maintaining feature expressiveness while compressing the parameter size.
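Formulas (1) to (3) can be checked numerically with the short Python sketch below. It counts multiply-accumulate operations for one layer with stride 1 and no bias, as in the formulas; the layer size in the usage lines is a hypothetical illustration, not a configuration taken from this invention.

```python
def conv_costs(h: int, w: int, c_in: int, c_out: int, k: int):
    """Multiply-accumulate counts for one convolution layer (stride 1, 'same' padding, no bias)."""
    standard = h * w * c_in * c_out * k * k       # formula (1)
    depthwise = h * w * c_in * k * k              # one k×k filter per input channel
    pointwise = h * w * c_in * c_out              # 1×1 channel fusion
    separable = depthwise + pointwise             # formula (2)
    return standard, separable, separable / standard  # ratio, formula (3)


# Hypothetical layer: 112×112 feature map, 32 -> 64 channels, 3×3 kernel.
std, sep, ratio = conv_costs(112, 112, 32, 64, 3)
print(std, sep, round(ratio, 4))   # 231211008 29302784 0.1267
```

The ratio 0.1267 equals 1/64 + 1/9, i.e. roughly an 8-fold saving for this layer; larger output widths push it closer to 1/9.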
[0029] The multi-scale feature extraction module based on dilated convolution in step 2.1 is a parallel multi-branch module. Its first convolutional layer consists of 1×1, 3×3, 5×5, and 7×7 convolution kernels, where the 3×3, 5×5, and 7×7 kernels are obtained by dilating an ordinary 3×3 kernel with dilation rates d = 1, 2, 3, respectively (a 3×3 kernel with dilation d covers a window of side 3 + 2(d - 1)), so that each branch captures features with a different receptive field. The feature maps of all branches are concatenated along the channel dimension, and the second convolutional layer, consisting of 1×1 kernels, fuses the global and fine-grained feature information into a feature map with uniform dimensionality.
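The multi-branch structure of step 2.1 can be sketched in NumPy as below. This is a minimal illustration under assumptions not stated in the text: stride 1, "same" zero padding, a ReLU after each branch, and hypothetical channel counts and random weights. A practical implementation would instead use a framework layer such as PyTorch's `torch.nn.Conv2d`, which takes a `dilation` argument.

```python
import numpy as np


def effective_kernel_size(k: int, d: int) -> int:
    """Window side covered by a k×k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """Single-channel 2D cross-correlation, dilation d, stride 1, 'same' zero padding (odd k)."""
    k = w.shape[0]
    pad = d * (k - 1) // 2
    xp = np.pad(x, pad)
    h, wd = x.shape
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * d : i * d + h, j * d : j * d + wd]
    return out


def conv_branch(x: np.ndarray, weights: np.ndarray, d: int) -> np.ndarray:
    """Multi-channel convolution: x (C_in, H, W), weights (C_out, C_in, k, k) -> (C_out, H, W)."""
    c_out, c_in = weights.shape[:2]
    return np.stack([
        sum(dilated_conv2d(x[c], weights[o, c], d) for c in range(c_in))
        for o in range(c_out)
    ])


def multiscale_block(x, branches, fuse_w):
    """branches: list of (weights, dilation); fuse_w: (C_fused, C_total) weights of the 1×1 fusion conv."""
    feats = [np.maximum(conv_branch(x, w, d), 0.0) for w, d in branches]  # ReLU is an assumption
    cat = np.concatenate(feats, axis=0)              # concatenate along the channel axis
    return np.einsum("oc,chw->ohw", fuse_w, cat)     # 1×1 conv = per-pixel channel mixing


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 32))                 # hypothetical single-channel MRI patch
branches = [(rng.standard_normal((4, 1, 1, 1)), 1)]  # 1×1 branch
branches += [(rng.standard_normal((4, 1, 3, 3)), d) for d in (1, 2, 3)]  # dilated 3×3 branches
fuse_w = rng.standard_normal((8, 16))                # 4 branches × 4 channels -> 8 channels
y = multiscale_block(x, branches, fuse_w)
print([effective_kernel_size(3, d) for d in (1, 2, 3)], y.shape)  # [3, 5, 7] (8, 32, 32)
```

Dilation enlarges the receptive field without adding weights: every dilated branch still has only nine weights per input-output channel pair.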
[0030] Step 2.2 replaces the original first 3×3 standard convolutional layer of MobileNetV2 with the multi-scale feature extraction module. This allows the network to integrate multi-scale features at the initial stage, enhancing its adaptability to variations in tumor size and morphology.
[0031] Furthermore, the embedded channel attention mechanism described in step 3.1 specifically refers to: inserting a channel attention module after the network feature layer. First, global average pooling and global max pooling are performed on the input feature map to obtain two types of channel feature vectors. Then, a shared multilayer perceptron (MLP) is used to adjust the dimension and perform nonlinear mapping on the feature vectors to generate channel attention weights. This module establishes dependencies between feature channels and redistributes weights to different feature channels, increasing the network's attention to important information such as tumor regions and suppressing interference from redundant information such as normal tissues. The calculation formula is as follows:
[0032]
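[0033] $$M_c(F) = \sigma\left(\mathrm{MLP}(F^{c}_{avg}) + \mathrm{MLP}(F^{c}_{max})\right) \tag{4}$$
[0034] $$F' = M_c(F) \otimes F \tag{5}$$
[0035] where σ is the Sigmoid activation function, ⊗ denotes channel-wise multiplication, $F^{c}_{avg}$ is the channel feature vector obtained by global average pooling, $F^{c}_{max}$ is the channel feature vector obtained by global max pooling, and $W_0$ and $W_1$ are the shared weights of the MLP layers, i.e. $\mathrm{MLP}(v) = W_1\,\phi(W_0 v)$ with a nonlinear activation φ between them;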
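Formulas (4) and (5) can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: it assumes a ReLU inside the shared MLP and a channel-reduction ratio r = 4 (common CBAM-style choices, not values given above), and the weights are small random placeholders standing in for the shared MLP weights W0 and W1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(f, w0, w1):
    """f: (C, H, W); w0: (C//r, C) and w1: (C, C//r) are the shared MLP weights W0, W1."""
    f_avg = f.mean(axis=(1, 2))                 # global average pooling -> (C,)
    f_max = f.max(axis=(1, 2))                  # global max pooling -> (C,)

    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)     # same weights for both descriptors (ReLU assumed)

    m_c = sigmoid(mlp(f_avg) + mlp(f_max))      # formula (4): one weight per channel
    return m_c[:, None, None] * f, m_c          # formula (5): channel-wise multiplication


rng = np.random.default_rng(0)
c, r = 16, 4
f = rng.standard_normal((c, 8, 8))
w0 = rng.standard_normal((c // r, c)) * 0.1
w1 = rng.standard_normal((c, c // r)) * 0.1
refined, m_c = channel_attention(f, w0, w1)
print(refined.shape, m_c.shape)   # (16, 8, 8) (16,)
```

Sharing one MLP between the two pooled descriptors keeps the module's parameter cost at about 2C²/r weights, which suits the lightweight design goal.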
[0036] The embedded spatial attention mechanism described in step 3.2 specifically refers to: inserting a spatial attention module after the channel attention module, performing average pooling and max pooling on the feature maps in the channel dimension respectively, concatenating the results and generating a spatial attention map through a convolutional layer.
[0037] Furthermore, step 4.1 integrates the above modules into the MDCA-MobileNetV2 network by connecting them in the order "input image → multi-scale feature extraction module → max pooling layer → channel attention module → spatial attention module → MobileNetV2 backbone → adaptive average pooling layer → fully connected classification layer", forming a complete end-to-end classification network. Attention modules are additionally inserted, as needed, into key feature layers of the MobileNetV2 backbone to achieve deep fusion of multi-scale features and attention enhancement. The overall architecture gives the network strong feature representation capability while keeping it lightweight.
[0038] The dynamic learning rate adjustment strategy described in step 4.2 specifically refers to using a multi-step decay method (MultiStepLR) to dynamically adjust the learning rate. During training, when the number of iterations reaches a preset milestone, the current learning rate is multiplied by a decay coefficient. The formula is:
[0039] $$\eta_t = \alpha \cdot \gamma^{\left|\{\, m \in \mathrm{milestones} : m \le t \,\}\right|} \tag{6}$$
[0040] where $\eta_t$ is the learning rate at iteration $t$, α is the initial learning rate, milestones is the set of preset iteration counts at which the learning rate decays, and γ is the learning rate decay coefficient, usually 0.1. This strategy lets the network converge quickly with a large learning rate early in training and then fine-tune its parameters with a smaller learning rate later, reducing fluctuation of the loss function near the optimal solution;
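The multi-step decay rule can be sketched in plain Python as below. The initial rate and milestones in the usage lines are hypothetical. PyTorch's `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma)` applies the same rule each time its `step()` is called.

```python
from bisect import bisect_right


def multistep_lr(alpha: float, milestones, gamma: float, t: int) -> float:
    """Learning rate at step t: alpha decayed by gamma once for every milestone <= t."""
    passed = bisect_right(sorted(milestones), t)   # number of milestones already reached
    return alpha * gamma ** passed


# Hypothetical schedule: initial rate 1e-3, decay by 0.1 at steps 30 and 60.
for t in (0, 29, 30, 59, 60, 90):
    print(t, multistep_lr(1e-3, [30, 60], 0.1, t))
```

Using `bisect_right` means the decay takes effect exactly at the milestone step, matching "when the number of iterations reaches a preset milestone".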
[0041] Step 4.3 uses a label smoothing strategy to optimize the loss function. Label smoothing is a regularization method that prevents the model from placing complete confidence in the one-hot distribution of the true labels during training. Its core idea is to modify the true label distribution of a sample, $q(k \mid x) = \delta_{k,y}$ (where $\delta_{k,y}$ is the Dirac (Kronecker) delta, equal to 1 when $k = y$ and 0 otherwise), into:
$$q'(k \mid x) = (1 - \varepsilon)\,\delta_{k,y} + \varepsilon\,u(k) \tag{7}$$
[0042] where $u(k)$ is a prior distribution over the $N$ classes, typically the uniform distribution $u(k) = 1/N$, and ε is the label smoothing coefficient. The loss function therefore becomes:
[0043] $$L(q', p) = (1 - \varepsilon)\,L(q, p) + \varepsilon\,L(u, p) \tag{8}$$
[0044] where $L(q, p)$ is the original cross-entropy loss and $L(u, p)$ measures the deviation of the predicted probability distribution $p$ from the uniform distribution (a penalty term). This strategy reduces the network's over-reliance on the true labels, alleviates overfitting, and improves the model's ability to generalize to new data.
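A NumPy sketch of the label smoothing computation, assuming a uniform prior u(k) = 1/N over N classes. The labels, logits and ε = 0.1 in the usage lines are hypothetical. The last line checks numerically that the smoothed loss equals the weighted combination of the ordinary cross-entropy and the uniform-target penalty. In PyTorch, `torch.nn.CrossEntropyLoss(label_smoothing=0.1)` applies the same uniform mixing.

```python
import numpy as np


def smoothed_targets(labels, num_classes, eps):
    """q'(k|x) = (1 - eps) * delta_{k,y} + eps * u(k), with u(k) = 1/N."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes


def cross_entropy(targets, logits):
    """Mean cross-entropy between target distributions and softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_p).sum(axis=1).mean())


labels = np.array([0, 2, 1])                                 # hypothetical classes
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]])
eps, n = 0.1, 3

smoothed = cross_entropy(smoothed_targets(labels, n, eps), logits)  # L(q', p)
plain = cross_entropy(np.eye(n)[labels], logits)                    # L(q, p)
uniform = cross_entropy(np.full((3, n), 1.0 / n), logits)           # L(u, p)
print(smoothed, (1 - eps) * plain + eps * uniform)                  # equal up to rounding
```

The identity holds because cross-entropy is linear in the target distribution.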
[0045] The advantages and positive effects of this invention are:
[0046] This invention addresses the core pain points of existing technologies through a combined design of a lightweight backbone, multi-scale feature enhancement, dual attention mechanisms, and optimized training strategies. The application of depthwise separable convolutions and inverted residual structures significantly reduces the number of model parameters and computational cost, making it suitable for embedded device deployment. The multi-scale feature extraction module achieves the fusion of global and subtle features, improving adaptability to tumors of different sizes and shapes. The synergistic effect of channel and spatial attention mechanisms strengthens the feature representation of tumor regions and reduces redundant information interference. Dynamic learning rate and label smoothing strategies further optimize the model training process, improving classification accuracy and generalization ability. Compared with existing classic networks, the proposed MDCA-MobileNetV2 network significantly improves classification accuracy, sensitivity, and specificity while maintaining its lightweight characteristics, providing an effective technical solution for rapid and accurate auxiliary diagnosis of brain tumors.
Attached Figure Description
[0047] Figure 1 is a diagram of the MobileNetV2 architecture;
[0048] Figure 2 is a diagram of the multi-scale feature extraction module based on dilated convolution;
[0049] Figure 3 is a diagram of the channel attention module;
[0050] Figure 4 is a diagram of the MDCA-MobileNetV2 architecture;
[0051] Figure 5 shows brain tumor images from the CE-MRI dataset after rotation and translation;
[0052] Figure 6 is a graph of the accuracy (Acc) on the training set of the original MDCA-MobileNetV2 network and of the MDCA-MobileNetV2 network with a dynamically adjusted learning rate;
[0053] Figure 7 is a graph of the loss on the training set of the original MDCA-MobileNetV2 network and of the MDCA-MobileNetV2 network with a dynamically adjusted learning rate;
[0054] Figure 8 is the confusion matrix of the original MDCA-MobileNetV2 network on the test set;
[0055] Figure 9 is the confusion matrix of MDCA-MobileNetV2 on the test set after adding the label smoothing strategy;
[0056] Figure 10 shows the classification results of the five networks;
[0057] Figure 11 is a flowchart of the lightweight classification method for brain tumor MRI images according to the present invention.
Detailed Implementation
[0058] Example 1
[0059] This embodiment designs a brain tumor MRI image classification system based on the PyTorch deep learning framework and the Python programming language. The core objective is to verify the classification accuracy, generalization ability, and deployment efficiency of the MDCA-MobileNetV2 network under lightweight conditions, adapting it to clinical auxiliary diagnostic scenarios. Key operations mainly involve dataset construction and preprocessing, network model building, training optimization, performance verification, and ablation experiments.
[0060] As shown in Figure 11, the lightweight classification method for brain tumor MRI images of this invention mainly includes the following key steps:
[0061] 1. Construct a lightweight backbone network:
[0062] 1.1 Use the MobileNetV2 network as the basic framework;
[0063] 1.2 Build the lightweight framework from depthwise separable convolutions and inverted residual structures;
[0064] 2. Design a multi-scale feature enhancement module:
[0065] 2.1 Construct a multi-scale feature extraction module based on dilated convolution;
[0066] 2.2 Replace the first standard convolutional layer of MobileNetV2 with this module;
[0067] 3. Introduce attention mechanism modules:
[0068] 3.1 Embed a channel attention mechanism;
[0069] 3.2 Embed a spatial attention mechanism;
[0070] 4. Construct and optimize an end-to-end classification network:
[0071] 4.1 Integrate the modules into the MDCA-MobileNetV2 network;
[0072] 4.2 Apply a dynamic learning rate adjustment strategy;
[0073] 4.3 Optimize the loss function using a label smoothing strategy.
[0074] Key steps explained in detail:
[0075] In step 1.1 of this invention, the MobileNetV2 network shown in Figure 1 serves as the initial architecture and core feature extractor of the entire classification model. This backbone is a stack of inverted residual blocks with linear bottlenecks. Each inverted residual block achieves efficient feature representation through a "dimensionality increase - feature extraction - dimensionality reduction" process, laying the structural foundation for the lightweight model.
[0076] Step 1.2 builds the lightweight framework from depthwise separable convolutions and inverted residual structures. Specifically, in each inverted residual block, the input feature map first undergoes a 1×1 pointwise convolution that increases its dimensionality, then a 3×3 depthwise convolution for spatial feature extraction, and finally a 1×1 pointwise convolution that reduces the dimensionality, after which the residual connection is added. A depthwise separable convolution decomposes a standard convolution into a depthwise convolution (an independent convolution per channel) and a pointwise convolution (1×1 channel fusion), which significantly reduces computational cost. Assuming the input feature map has size $H \times W \times C_{in}$, the kernel size is $K \times K$, and the number of output feature channels is $C_{out}$, the computational cost of a standard convolution is:
[0077] $$F_{std} = H \cdot W \cdot C_{in} \cdot C_{out} \cdot K^{2} \tag{1}$$
[0078] The total computational cost of the depthwise separable convolution is approximately:
[0079] $$F_{dsc} = H \cdot W \cdot C_{in} \cdot K^{2} + H \cdot W \cdot C_{in} \cdot C_{out} \tag{2}$$
[0080] The ratio of their computational costs is:
[0081] $$\frac{F_{dsc}}{F_{std}} = \frac{1}{C_{out}} + \frac{1}{K^{2}} \tag{3}$$
[0082] Since $C_{out}$ is typically large (tens to hundreds of channels) and $K = 3$, the ratio approaches $1/9$, i.e. the computational cost is reduced by roughly an order of magnitude. A linear bottleneck structure (no nonlinear activation after the final 1×1 projection) is used to prevent nonlinear activation functions from destroying low-dimensional features, thus maintaining feature expressiveness while compressing the parameter size.
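The inverted residual block of step 1.2 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: stride 1, ReLU6 activations (the MobileNetV2 default), an expansion factor t = 6, no batch normalization, and random placeholder weights rather than trained values. As in MobileNetV2, the residual connection is added only when the input and output shapes match.

```python
import numpy as np


def relu6(x):
    return np.clip(x, 0.0, 6.0)


def depthwise_conv3x3(x, w):
    """Per-channel 3×3 convolution, stride 1, zero padding 1. x: (C, H, W), w: (C, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i : i + h, j : j + wd]
    return out


def inverted_residual(x, w_expand, w_dw, w_project):
    """x: (C_in, H, W); w_expand: (t*C_in, C_in); w_dw: (t*C_in, 3, 3); w_project: (C_out, t*C_in)."""
    h = relu6(np.einsum("oc,chw->ohw", w_expand, x))   # 1×1 pointwise expansion
    h = relu6(depthwise_conv3x3(h, w_dw))              # 3×3 depthwise spatial filtering
    out = np.einsum("oc,chw->ohw", w_project, h)       # 1×1 linear projection (linear bottleneck)
    if out.shape == x.shape:                           # shortcut only when shapes match
        out = out + x
    return out


rng = np.random.default_rng(0)
c_in, t = 8, 6
x = rng.standard_normal((c_in, 16, 16))
w_expand = rng.standard_normal((t * c_in, c_in)) * 0.1
w_dw = rng.standard_normal((t * c_in, 3, 3)) * 0.1
w_project = rng.standard_normal((c_in, t * c_in)) * 0.1
y = inverted_residual(x, w_expand, w_dw, w_project)
print(y.shape)  # (8, 16, 16)
```

The projection is deliberately left without an activation; this is the "linear bottleneck" that keeps low-dimensional features from being truncated by ReLU.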
[0083] Step 2.1 proposes a multi-scale feature extraction module based on dilated convolution, namely the parallel multi-branch module shown in Figure 2. Its first convolutional layer consists of 1×1, 3×3, 5×5, and 7×7 convolution kernels, where the 3×3, 5×5, and 7×7 kernels are dilated 3×3 kernels that capture features at different receptive-field scales through dilation rates d = 1, 2, 3. The feature maps of all branches are concatenated along the channel dimension, and the second convolutional layer, consisting of 1×1 kernels, fuses the global and fine-grained feature information into a feature map with uniform dimensionality.
[0084] Step 2.2 replaces the original first 3×3 standard convolutional layer of MobileNetV2 with the multi-scale feature extraction module. This allows the network to integrate multi-scale features at the initial stage, enhancing its adaptability to variations in tumor size and morphology.
[0085] Furthermore, the embedded channel attention mechanism in step 3.1 inserts the channel attention module shown in Figure 3 after the network feature layer. The module first performs global average pooling and global max pooling on the input feature map to obtain two channel feature vectors, then uses a shared multilayer perceptron (MLP) to adjust their dimension and apply a nonlinear mapping, generating the channel attention weights. By establishing dependencies between feature channels and redistributing their weights, the module increases the network's attention to important information such as tumor regions and suppresses interference from redundant information such as normal tissue. The calculation formulas are as follows:
[0086]
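[0087] $$M_c(F) = \sigma\left(\mathrm{MLP}(F^{c}_{avg}) + \mathrm{MLP}(F^{c}_{max})\right) \tag{4}$$
[0088] $$F' = M_c(F) \otimes F \tag{5}$$
[0089] where σ is the Sigmoid activation function, ⊗ denotes channel-wise multiplication, $F^{c}_{avg}$ is the channel feature vector obtained by global average pooling, $F^{c}_{max}$ is the channel feature vector obtained by global max pooling, and $W_0$ and $W_1$ are the shared weights of the MLP layers, i.e. $\mathrm{MLP}(v) = W_1\,\phi(W_0 v)$ with a nonlinear activation φ between them;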
[0090] The embedded spatial attention mechanism in step 3.2 inserts a spatial attention module after the channel attention module. Average pooling and max pooling are performed on the feature map along the channel dimension to obtain two 2D spatial feature maps, which are concatenated and compressed to a single channel by a 1×1 convolutional layer to generate the spatial attention map. This map focuses on the spatial location of the tumor, compensating for the fact that channel attention considers only the importance of each channel and ignores spatial location information.
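The spatial attention of step 3.2 can be sketched in NumPy as below. Following the text, the two channel-pooled maps are fused by a 1×1 convolution, which on two input channels reduces to a per-pixel weighted sum. The sigmoid gate and the fusion weights used here are assumptions for illustration only.

```python
import numpy as np


def spatial_attention(f, w, b=0.0):
    """f: (C, H, W); w: (2,) weights of the 1×1 conv over the [avg, max] maps; b: bias."""
    avg_map = f.mean(axis=0)                        # average pooling along channels -> (H, W)
    max_map = f.max(axis=0)                         # max pooling along channels -> (H, W)
    stacked = np.stack([avg_map, max_map])          # concatenate -> (2, H, W)
    logits = np.tensordot(w, stacked, axes=1) + b   # 1×1 conv: 2 channels -> 1 channel
    m_s = 1.0 / (1.0 + np.exp(-logits))             # sigmoid gate (assumed)
    return m_s[None, :, :] * f, m_s                 # reweight every channel at each location


rng = np.random.default_rng(0)
f = rng.standard_normal((16, 8, 8))
refined, m_s = spatial_attention(f, np.array([0.5, 0.5]))
print(refined.shape, m_s.shape)  # (16, 8, 8) (8, 8)
```

Channel attention scales whole channels, whereas this map scales whole spatial positions, which is why the two modules are applied in sequence.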
[0091] Furthermore, step 4.1 integrates the above modules into the MDCA-MobileNetV2 network by connecting them in the order "input image → multi-scale feature extraction module → max pooling layer → channel attention module → spatial attention module → MobileNetV2 backbone → adaptive average pooling layer → fully connected classification layer", forming a complete end-to-end classification network. The overall architecture is shown in Figure 4. Attention modules are additionally inserted, as needed, into key feature layers of the MobileNetV2 backbone to achieve deep fusion of multi-scale features and attention enhancement. The overall architecture gives the network strong feature representation capability while keeping it lightweight.
[0092] The dynamic learning rate adjustment strategy described in step 4.2 specifically refers to using a multi-step decay method (MultiStepLR) to dynamically adjust the learning rate. During training, when the number of iterations reaches a preset milestone, the current learning rate is multiplied by a decay coefficient. The formula is:
[0093] $$\eta_t = \alpha \cdot \gamma^{\left|\{\, m \in \mathrm{milestones} : m \le t \,\}\right|} \tag{6}$$
[0094] where $\eta_t$ is the learning rate at iteration $t$, α is the initial learning rate, milestones is the set of preset iteration counts at which the learning rate decays, and γ is the learning rate decay coefficient, usually 0.1. This strategy lets the network converge quickly with a large learning rate early in training and then fine-tune its parameters with a smaller learning rate later, reducing fluctuation of the loss function near the optimal solution;
[0095] Step 4.3 uses a label smoothing strategy to optimize the loss function. Label smoothing is a regularization method that prevents the model from placing complete confidence in the one-hot distribution of the true labels during training. Its core idea is to modify the true label distribution of a sample, $q(k \mid x) = \delta_{k,y}$ (where $\delta_{k,y}$ is the Dirac (Kronecker) delta, equal to 1 when $k = y$ and 0 otherwise), into:
$$q'(k \mid x) = (1 - \varepsilon)\,\delta_{k,y} + \varepsilon\,u(k) \tag{7}$$
[0096] where $u(k)$ is a prior distribution over the $N$ classes, typically the uniform distribution $u(k) = 1/N$, and ε is the label smoothing coefficient. The loss function therefore becomes:
[0097] $$L(q', p) = (1 - \varepsilon)\,L(q, p) + \varepsilon\,L(u, p) \tag{8}$$
[0098] where $L(q, p)$ is the original cross-entropy loss and $L(u, p)$ measures the deviation of the predicted probability distribution $p$ from the uniform distribution (a penalty term). This strategy reduces the network's over-reliance on the true labels, alleviates overfitting, and improves the model's ability to generalize to new data.
[0099] The CE-MRI dataset used in this invention is a T1-weighted contrast-enhanced MRI dataset of brain tumors obtained from two hospitals in China (Southern Hospital of Guangzhou and Tianjin Medical University General Hospital). It includes three types of brain tumors: glioma, meningioma, and pituitary adenoma, comprising 3064 enhanced MRI images. Each image is a 2D slice, and the boundaries between the tumor and normal tissue in each slice were manually marked by three experienced neuroradiologists, clearly displaying the fine structures of the brain and detailed information about the lesion area. It can be used for brain tumor image classification tasks. Therefore, the CE-MRI dataset was chosen as the dataset for this invention. Table 1 shows the distribution of the CE-MRI dataset.
[0100] Table 1 Distribution of CE-MRI Datasets
[0101]
[0102] Furthermore, the total number of brain tumor images is relatively small, which can cause classification performance to differ markedly across categories. To increase the number of images of each category in the training dataset, data augmentation is applied to the brain tumor images before training the classification model, as follows:
[0103] (1) Vertical image flip: flip the image upside down (mirror it about its horizontal axis).
[0104] (2) Horizontal image flip: mirror the image left to right.
[0105] (3) Random image rotation: randomly rotate the image by x degrees, where x ∈ [−90, 90].
[0106] (4) Image color transformation: randomly scale the brightness and contrast of the image by a factor y, where y ∈ [0.5, 1.5].
[0107] (5) Image normalization: subtract the mean of each channel of the training images and divide by the corresponding standard deviation, so that each channel has zero mean and unit variance.
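Operations (1), (2), (4) and (5) can be sketched on single-channel images in NumPy as follows. Random rotation (3) is omitted because it needs an interpolation routine; in a PyTorch pipeline, `torchvision.transforms.RandomRotation(90)` samples angles from [−90°, 90°] and `torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5)` samples factors from [0.5, 1.5]. The [0, 1] pixel range, the contrast formula (scaling deviations from the image mean) and the random images are assumptions for illustration. With one channel, the per-channel statistics reduce to a single mean and standard deviation.

```python
import numpy as np


def vertical_flip(img):
    return img[::-1, :]            # (1) mirror top-bottom


def horizontal_flip(img):
    return img[:, ::-1]            # (2) mirror left-right


def color_jitter(img, rng, low=0.5, high=1.5):
    """(4) Scale brightness and contrast by random factors in [low, high]; pixels assumed in [0, 1]."""
    out = img * rng.uniform(low, high)                                 # brightness
    out = (out - out.mean()) * rng.uniform(low, high) + out.mean()     # contrast around the mean
    return np.clip(out, 0.0, 1.0)


def standardize(img, mean, std):
    """(5) Subtract the training-set mean and divide by the training-set standard deviation."""
    return (img - mean) / std


rng = np.random.default_rng(0)
train = rng.uniform(0.0, 1.0, size=(10, 64, 64))      # hypothetical training images
mean, std = train.mean(), train.std()
augmented = color_jitter(horizontal_flip(vertical_flip(train[0])), rng)
normalized = standardize(train, mean, std)
print(augmented.shape, round(float(normalized.mean()), 6), round(float(normalized.std()), 6))
```

The statistics for step (5) come from the training set only, so the same mean and standard deviation are reused when normalizing test images.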
[0108] Images obtained by rotating and translating brain tumor images from the CE-MRI dataset are shown in Figure 5.
[0109] The number of images in the CE-MRI dataset and the expanded CE-MRI' (rotation, translation) dataset is shown in Table 2. 1000 images of glioma, meningioma, and pituitary tumor were generated using traditional image enhancement methods to expand the dataset.
[0110] Table 2. Number of CE-MRI datasets before and after expansion
[0111]
[0112] The experiment uses K-fold cross-validation to divide the data into training and validation sets, ensuring image diversity. K-fold cross-validation can also be used for model tuning, to find the hyperparameter values that optimize the model's generalization performance. It divides all samples into k non-overlapping subsets of equal size, so that each sample belongs to exactly one subset. The k subsets are then iterated over in turn: the current subset serves as the validation set and the remaining k − 1 subsets as the training set for model training and evaluation. Finally, the average over the k iterations is used as the evaluation metric. The value of k depends on the dataset size and is typically between 5 and 10; the experiments in this embodiment use k = 7. Table 3 shows the hyperparameters set during the training of the MDCA-MobileNetV2 network.
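The split described above can be sketched with NumPy index arithmetic, using k = 7 and the 3064 images of the original CE-MRI dataset; the shuffling seed is a hypothetical choice. scikit-learn's `sklearn.model_selection.KFold(n_splits=7, shuffle=True)` provides an equivalent splitter.

```python
import numpy as np


def kfold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, k)                    # k nearly equal, non-overlapping subsets
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


splits = list(kfold_indices(3064, 7))
print([len(val) for _, val in splits])  # [438, 438, 438, 438, 438, 437, 437]
```

Because 3064 is not divisible by 7, the folds differ in size by at most one image, which is as close to "equal-sized" as the data allows.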
[0113] Table 3 Hyperparameter settings for the MDCA-MobileNetV2 network
[0114]
[0115] Integrating the above modules and theories, a complete deep learning-based brain tumor image classification algorithm is designed:
[0116] Input parameters:
[0117] Training dataset;
[0118] Test dataset;
[0119] Learning rate;
[0120] Batch size;
[0121] Number of training epochs;
[0122] Optimizer (Adam);
[0123] Output:
[0124] The trained MDCA-MobileNetV2 model;
[0125] Classification performance metrics on the test set (accuracy, precision, recall, F1 score);
[0126] Algorithm steps:
[0127] Initialization: initialize the MDCA-MobileNetV2 model and its network parameters, choose Adam as the optimizer, and set the learning rate.
[0128] Data preprocessing: preprocess all images in the training and test datasets, including size normalization, data augmentation (such as random flipping and rotation), and normalization.
[0129] Model training:
[0130] Randomly shuffle the training dataset.
[0131] Divide the training dataset into batches of the specified batch size.
[0132] For each batch:
[0133] Forward propagation: feed the batch into the MDCA-MobileNetV2 model and compute the predicted output. This involves the multi-scale feature extraction module and the channel and spatial attention modules, including the channel attention computation of formulas (4) and (5).
[0134] Loss calculation: compute the loss between the predicted output and the true labels using the cross-entropy loss function with label smoothing, as described in formulas (7) and (8).
[0135] Backpropagation: compute the gradient of the loss function with respect to the model parameters.
[0136] Parameter update: update the model parameters with the Adam optimizer based on the gradients.
[0137] Model evaluation: after training is complete, set the trained model to evaluation mode.
[0138] Prediction: pass each image in the test set through the model to obtain its predicted class.
[0139] Performance calculation: collect all prediction results, compare them with the true labels, and compute performance metrics such as accuracy, precision, recall, and F1 score.
[0140] Output: return the trained model and the performance metrics.
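The training and evaluation steps above can be illustrated with the NumPy sketch below. It is a stand-in, not the MDCA-MobileNetV2 model: a hypothetical linear softmax classifier on toy 2-D data replaces the network so that the gradients can be written by hand. The loop keeps the structure of the algorithm: shuffle, mini-batches, forward pass, label-smoothed cross-entropy, backpropagation, and an Adam update with the standard bias-corrected moments of Kingma and Ba, followed by evaluation. All hyperparameter values are illustrative.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(x, y, n_classes, lr=0.05, batch_size=32, epochs=30, eps=0.1, seed=0):
    """Mini-batch training of a linear softmax classifier with Adam and label smoothing."""
    rng = np.random.default_rng(seed)
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    params = [w, b]
    m = [np.zeros_like(p) for p in params]          # Adam first moments
    v = [np.zeros_like(p) for p in params]          # Adam second moments
    beta1, beta2, step = 0.9, 0.999, 0
    for _ in range(epochs):
        order = rng.permutation(len(x))             # randomly shuffle the training set
        for start in range(0, len(x), batch_size):  # split into batches
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            p = softmax(xb @ w + b)                 # forward propagation
            target = (1 - eps) * np.eye(n_classes)[yb] + eps / n_classes
            grad_logits = (p - target) / len(idx)   # gradient of mean smoothed cross-entropy
            grads = [xb.T @ grad_logits, grad_logits.sum(axis=0)]  # backpropagation
            step += 1
            for i, (param, g) in enumerate(zip(params, grads)):
                m[i] = beta1 * m[i] + (1 - beta1) * g
                v[i] = beta2 * v[i] + (1 - beta2) * g * g
                m_hat = m[i] / (1 - beta1 ** step)
                v_hat = v[i] / (1 - beta2 ** step)
                param -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)  # in-place Adam update
    return w, b


def evaluate(w, b, x, y):
    """Accuracy of argmax predictions (inference only, no parameter updates)."""
    return float((np.argmax(x @ w + b, axis=1) == y).mean())


rng = np.random.default_rng(1)
centers = np.array([[0.0, -5.0], [5.0, 5.0], [-5.0, 5.0]])   # hypothetical toy classes
y_all = np.repeat(np.arange(3), 100)
x_all = centers[y_all] + rng.standard_normal((300, 2))
w, b = train(x_all[::2], y_all[::2], 3)                       # even rows: training set
print(evaluate(w, b, x_all[1::2], y_all[1::2]))               # odd rows: test set accuracy
```

With label smoothing, the gradient with respect to the logits is still "prediction minus target"; only the target changes from one-hot to the smoothed distribution.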
[0141] Algorithm 1 Description:
[0142]
[0143] Time and space complexity analysis:
[0144] Space complexity (number of model parameters):
[0145] The number of parameters is determined mainly by the convolutional and fully connected layers. MDCA-MobileNetV2 extensively uses depthwise separable convolutions instead of standard convolutions. A standard convolution has $K^2 C_{in} C_{out}$ parameters, whereas a depthwise separable convolution has $K^2 C_{in} + C_{in} C_{out}$, i.e. about $1/C_{out} + 1/K^2$ of the former. Therefore, the space complexity of MDCA-MobileNetV2 is much lower than that of networks such as VGG or ResNet that use standard convolutions. Its space complexity is $O\left(\sum_{l=1}^{D} \left(K_l^2 C_{l-1} + C_{l-1} C_l\right)\right)$, where D is the network depth and l is the layer index.
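The parameter counts in the paragraph above can be compared layer by layer with the short Python sketch below. The three-layer channel configuration is hypothetical, and bias and batch-normalization parameters are ignored.

```python
def params_standard(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out              # K^2 * C_in * C_out


def params_separable(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in + c_in * c_out       # depthwise K^2 * C_in + pointwise C_in * C_out


layers = [(3, 32, 64), (3, 64, 128), (3, 128, 256)]   # hypothetical (K, C_in, C_out) per layer
std_total = sum(params_standard(*layer) for layer in layers)
sep_total = sum(params_separable(*layer) for layer in layers)
print(std_total, sep_total, round(sep_total / std_total, 3))  # 387072 45024 0.116
```

Summing over layers in this way is exactly the per-layer sum in the space complexity expression, so deeper and wider networks save proportionally more.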
[0146] Time complexity (computational cost / FLOPs):
[0147] Like the space complexity, the time complexity is primarily determined by the number of multiply-add operations in the convolutions. For an output feature map of size $D_F \times D_F$, the FLOPs of a standard convolution are approximately $D_F^2 K^2 M N$, and those of a depthwise separable convolution are approximately $D_F^2 K^2 M + D_F^2 M N$. The computational cost is therefore reduced to $\frac{1}{N} + \frac{1}{K^2}$ of that of a standard convolution, i.e., roughly 8 to 9 times fewer operations for 3×3 kernels with typical channel counts. Because MDCA-MobileNetV2 uses this efficient convolution throughout, its overall time complexity is significantly reduced, enabling faster training and inference, which is crucial for deployment on resource-constrained devices such as mobile edge computing (MEC) devices.
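The FLOPs comparison can be sketched the same way by counting multiply-accumulate operations (MACs). The sketch assumes stride 1 and "same" padding, so both convolutions produce a D_F × D_F output; the 56 × 56 × 128 example layer is an arbitrary choice, not a layer taken from MDCA-MobileNetV2.

```python
def standard_conv_macs(df, k, m, n):
    # Each of the D_F x D_F x N outputs needs K*K*M multiply-accumulates.
    return df * df * k * k * m * n


def separable_conv_macs(df, k, m, n):
    depthwise = df * df * k * k * m   # one K x K filter per input channel
    pointwise = df * df * m * n       # 1 x 1 convolution, M -> N channels
    return depthwise + pointwise


std = standard_conv_macs(56, 3, 128, 128)
sep = separable_conv_macs(56, 3, 128, 128)
print(std, sep, round(std / sep, 2))  # 462422016 54992896 8.41
```

For this example layer the depthwise separable version needs about 8.4 times fewer MACs, consistent with the ratio 1/N + 1/K^2 for K = 3 and N = 128.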
[0148] Algorithm 2 Description:
[0149]
[0150] Algorithm 3 Description:
[0151]
[0152] This experiment uses five evaluation metrics to evaluate the classification results: accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV). They are calculated as follows:
[0153] (9)
[0154] (10)
[0155] (11)
[0156] (12)
[0157] (13)
[0158] The meanings of α, β, δ, and γ in the formulas are shown in Table 4.
[0159] Table 4 Confusion Matrix
[0160]
[0161] HGG represents high-grade glioma, and LGG represents low-grade glioma.
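Table 4, which defines α, β, δ, and γ, is not reproduced here, so the following sketch uses the conventional confusion-matrix names instead: TP, FN, FP, and TN, with HGG assumed to be the positive class. The definitions below are the standard ones for these five metrics; how they map onto formulas (9) to (13) depends on Table 4. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="HGG"):
    # Count the four confusion-matrix cells for a binary HGG/LGG task.
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def binary_metrics(tp, fn, fp, tn):
    # Standard definitions; each denominator must be nonzero.
    return {
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "SEN": tp / (tp + fn),   # sensitivity: recall on the positive (HGG) class
        "SPE": tn / (tn + fp),   # specificity: recall on the negative (LGG) class
        "PPV": tp / (tp + fp),   # positive predictive value (precision)
        "NPV": tn / (tn + fn),   # negative predictive value
    }
```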
[0162] The results of this experiment are as follows:
[0163] 1. Ablation Experiment Results and Analysis
[0164] Two sets of ablation experiments were conducted on the network structure. In the first set, only the MD module was used as the first convolutional layer of the network; this variant is named MD-MobileNetV2. In the second set, only the CA module was added to the network structure; this variant is named CA-MobileNetV2. To ensure the scientific validity of the experimental results, the initialization parameters of all compared networks in both sets of ablation experiments were randomly generated with the same random number seed. The experiments used the MRI dataset, and the reported classification results are the mean and standard deviation over five experimental runs.
[0165] 1) Network structure ablation experiment
[0166] The effectiveness of the MD module and the attention module was verified, and the results are shown in Table 5:
[0167] Table 5 MD-MobileNetV2 and CA-MobileNetV2
[0168]
[0169] Analysis: The complete model significantly outperforms the single-module models on all metrics. Its ACC is improved by 4.68% compared with MD-MobileNetV2 and by 4.88% compared with CA-MobileNetV2, with a smaller standard deviation, demonstrating that the synergy between the modules improves both classification accuracy and robustness. Feature map visualization shows that the complete model simultaneously highlights the spatial location of the tumor and suppresses irrelevant responses outside the tumor region.
[0170] 2) Optimization strategy ablation experiment
[0171] First, each optimization strategy was added individually during network training, and the accuracy and loss curves during training, together with the classification results on the test set, were compared and analyzed to verify the effectiveness of the optimization strategies used in this invention. Then, ablation experiments controlling a single variable were conducted to select the optimal combination of optimization strategies. The experiments in this section used the BraTS2017 dataset with five-fold cross-validation. For ease of description, the comparative analysis of each individual optimization strategy uses the same cross-validation fold. The classification results of the ablation experiment are reported as the mean and standard deviation over the five folds.
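The five-fold protocol and the mean ± standard deviation reporting can be sketched as follows. The fold construction (one seeded shuffle, then dealing indices round-robin into folds) and the use of the sample standard deviation are assumptions; the source does not state how its folds were formed or which standard deviation estimator was used. The function names are hypothetical.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    # Shuffle once with a fixed seed, then deal indices into k nearly equal folds.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def summarize(scores):
    # Mean and sample standard deviation, as reported in "mean ± std" tables.
    return statistics.mean(scores), statistics.stdev(scores)
```

Each sample appears in exactly one test fold, so every model in a comparison is evaluated on all of the data exactly once.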
[0172] Appended Figures 6 and 7 show the curves of the Acc and Loss values on the training set for the original MDCA-MobileNetV2 network and for the MDCA-MobileNetV2 network with a dynamically adjusted learning rate, respectively.
[0173] Appended Figures 8 and 9 show the confusion matrices on the test set for the original MDCA-MobileNetV2 network and for the MDCA-MobileNetV2 network with the label smoothing strategy added, respectively.
[0174] The effectiveness of dynamic learning rate and label smoothing was verified, and the results are shown in Table 6:
[0175] Table 6 Comparison of Dynamic Learning Rate and Label Smoothing Data
[0176]
[0177] Analysis: The combined use of the two strategies yielded the best results, improving ACC by 3.31%, reducing loss by 45.4%, and decreasing the number of convergence iterations by 28.6%, thus verifying the effectiveness of the optimization strategy.
[0178] 3) Classic network comparison experiment
[0179] Two sets of comparative experiments were set up. In the first set, the MobileNetV1, MobileNetV2, ResNet, and VGG19 networks were applied to the MRI dataset, and the classification results of these four classic networks were compared with those of the MDCA-MobileNetV2 network. In the second set, the classification results of the three optimized classic networks were compared with those of the MDCA-MobileNetV2 network.
[0180] To ensure the scientific validity of the experimental results, all original networks used in the comparative experiments in this section used the same random number seed for parameter initialization and the same hyperparameter settings.
[0181] According to Table 7, on the MRI dataset the MDCA-MobileNetV2 network proposed in this invention achieves the best classification results among the compared classic networks. Compared with MobileNetV1, its ACC, SEN, SPE, PPV, and NPV are improved by 4.04%, 4.29%, 3.82%, 3.99%, and 3.94%, respectively; compared with MobileNetV2, its ACC, SPE, PPV, and NPV are improved by 2.75%, 4.08%, 4.70%, and 0.35%, respectively; and compared with ResNet, its ACC, SEN, SPE, PPV, and NPV are improved by 2.85%, 3.82%, 0.80%, 1.06%, and 3.32%, respectively. Furthermore, the standard deviation of MDCA-MobileNetV2 is smaller than that of the other networks.
[0182] Table 7 Evaluation of Classification Results for the MRI Dataset
[0183]
[0184] Appended Figure 10 shows the classification results of the five networks. The MDCA-MobileNetV2 network proposed in this invention achieves the highest classification accuracy, 97.19%.
[0185] 2. Experimental Conclusions
[0186] This embodiment verifies the effectiveness of the MDCA-MobileNetV2 network through specific experiments: the multi-scale feature extraction module addresses the insufficient feature capture for small lesions, the channel-spatial attention mechanism enhances the representation of tumor-region features, and the dynamic learning rate and label smoothing strategies optimize the training process. Experimental results show that, with a significantly reduced number of parameters and computational cost, this method achieves a classification accuracy of 97.19%, strong generalization ability, and an inference speed that meets the requirements of clinical deployment, providing a feasible technical solution for rapid and accurate auxiliary diagnosis of brain tumors.
Claims
1. A lightweight classification method for brain tumor MRI images, characterized in that the method mainly comprises the following steps:
Step 1. Construct a lightweight backbone network:
1.1 Use the MobileNetV2 network as the basic framework;
1.2 Use depthwise separable convolution and the inverted residual structure to construct a lightweight framework;
Step 2. Design a multi-scale feature enhancement module:
2.1 Build a multi-scale feature extraction module based on dilated convolution;
2.2 Use this module to replace the first standard convolutional layer of MobileNetV2;
Step 3. Introduce the attention mechanism modules:
3.1 Embed a channel attention mechanism;
3.2 Embed a spatial attention mechanism;
Step 4. Construct and optimize an end-to-end classification network:
4.1 Integrate the above modules into the MDCA-MobileNetV2 network;
4.2 Apply a dynamic learning rate adjustment strategy;
4.3 Optimize the loss function using a label smoothing strategy.
2. The lightweight classification method for brain tumor MRI images as described in claim 1, characterized in that the use of the MobileNetV2 network as the basic framework in step 1.1 specifically means the following: the MobileNetV2 network shown in Figure 1 is used as the initial architecture and feature extraction core of the entire classification model. This framework is composed of multiple stacked inverted residual blocks with linear bottlenecks. Each inverted residual block achieves efficient feature representation through a "dimensionality increase - feature extraction - dimensionality reduction" process, laying the structural foundation for the lightweight model. Constructing a lightweight framework using depthwise separable convolution and inverted residual structures in step 1.2 specifically means: in each inverted residual block, the input feature map first undergoes a 1×1 pointwise convolution for dimensionality increase, then a 3×3 depthwise convolution for spatial feature extraction, and finally a 1×1 pointwise convolution for dimensionality reduction, after which a residual connection is added. Depthwise separable convolution decomposes a standard convolution into a depthwise convolution and a pointwise convolution, significantly reducing computational complexity. Assuming the input feature map size is $D_F \times D_F \times M$, the kernel size is $K \times K$, and the number of output feature channels is $N$, the computational cost of standard convolution is:
$C_{std} = D_F \cdot D_F \cdot K \cdot K \cdot M \cdot N$ (1)
The total computational cost of depthwise separable convolution is approximately:
$C_{dsc} = D_F \cdot D_F \cdot K \cdot K \cdot M + D_F \cdot D_F \cdot M \cdot N$ (2)
The ratio of their computational costs is:
$\frac{C_{dsc}}{C_{std}} = \frac{1}{N} + \frac{1}{K^2}$ (3)
Since K and N are both greater than 1 (N typically much greater), this ratio is much smaller than 1. The linear bottleneck structure prevents the nonlinear activation function from destroying low-dimensional features and maintains the expressive power of the features.
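The "dimensionality increase - feature extraction - dimensionality reduction" pipeline of one inverted residual block can be costed layer by layer. In this sketch the expansion factor `t = 6`, the omission of the expansion layer when `t = 1`, and the rule that the shortcut is added only when the stride is 1 and the input and output channel counts match follow the original MobileNetV2 design and are assumptions here; batch normalization and activation costs are ignored, and the function name is hypothetical.

```python
def inverted_residual_macs(h, w, c_in, c_out, t=6, stride=1, k=3):
    """Multiply-accumulate count of one MobileNetV2-style inverted residual block.

    Assumptions: expansion factor t, 'same' padding, stride applied in the
    depthwise layer, batch normalization and activations ignored.
    """
    hidden = c_in * t
    h_out, w_out = -(-h // stride), -(-w // stride)    # ceil(h / stride), ceil(w / stride)
    # 1x1 pointwise "dimensionality increase" (omitted when t == 1).
    expand = h * w * c_in * hidden if t != 1 else 0
    # 3x3 depthwise convolution for spatial feature extraction.
    depthwise = h_out * w_out * k * k * hidden
    # Linear 1x1 pointwise projection ("dimensionality reduction", no activation).
    project = h_out * w_out * hidden * c_out
    # The shortcut is added only when input and output shapes match.
    use_residual = stride == 1 and c_in == c_out
    return expand + depthwise + project, use_residual


macs, residual = inverted_residual_macs(56, 56, 24, 24)
print(macs, residual)  # 25740288 True
```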
3. The lightweight classification method for brain tumor MRI images as described in claim 1, characterized in that the multi-scale feature extraction module based on dilated convolution in step 2.1 specifically refers to a module with a parallel multi-branch structure. The first convolutional layer of this module consists of 1×1, 3×3, 5×5, and 7×7 convolution kernels, where the 3×3, 5×5, and 7×7 kernels are all obtained by dilating an ordinary 3×3 kernel (dilation rates of 1, 2, and 3, respectively). The second convolutional layer consists of 1×1 convolution kernels; its function is to fuse the concatenated feature maps from the dilated convolution branches into a feature map that contains both global and fine-grained feature information. Replacing the first standard convolutional layer of MobileNetV2 with this module in step 2.2 specifically means replacing the original 3×3 standard convolutional layer of MobileNetV2 with the multi-scale feature extraction module. This allows the network to integrate multi-scale features at the initial stage, enhancing its adaptability to variations in tumor size and morphology.
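A dilated K × K kernel with dilation rate d spans K + (K - 1)(d - 1) pixels per side, so a 3 × 3 kernel covers 3 × 3, 5 × 5, and 7 × 7 windows at d = 1, 2, and 3. The single-channel, pure-Python sketch below illustrates the module under stated assumptions: zero "same" padding (so all branch outputs have equal size and can be concatenated), one output map per branch, and no normalization or activation. The function names and weights are hypothetical.

```python
def effective_kernel(k, d):
    # A K x K kernel with dilation d spans K + (K - 1)(d - 1) pixels per side.
    return k + (k - 1) * (d - 1)


def dilation_for(span, k=3):
    # Dilation rate that makes a K x K kernel span `span` pixels per side.
    d, rem = divmod(span - 1, k - 1)
    if rem:
        raise ValueError("span not reachable by dilating this kernel")
    return d


def dilated_conv2d(img, kernel, d):
    # Single-channel 2-D cross-correlation with dilation d and zero "same" padding.
    h, w, k = len(img), len(img[0]), len(kernel)
    pad = d * (k - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a * d - pad, j + b * d - pad
                    if 0 <= y < h and 0 <= x < w:
                        s += kernel[a][b] * img[y][x]
            out[i][j] = s
    return out


def multi_scale_module(img, w1x1, kernels3x3, fuse):
    # Layer 1: a 1x1 branch plus three 3x3 kernels dilated to span 3, 5 and 7 pixels.
    branches = [dilated_conv2d(img, [[w1x1]], 1)]
    for kernel, span in zip(kernels3x3, (3, 5, 7)):
        branches.append(dilated_conv2d(img, kernel, dilation_for(span)))
    # Layer 2: 1x1 convolution over the concatenated branch maps (one weight per branch).
    h, w = len(img), len(img[0])
    return [
        [[sum(wt * br[i][j] for wt, br in zip(weights, branches)) for j in range(w)]
         for i in range(h)]
        for weights in fuse
    ]
```

Feeding a single bright pixel through the d = 3 branch with an all-ones kernel yields nonzero responses on a 3 × 3 grid spread over a 7 × 7 window, which is how the dilated branches enlarge the receptive field without adding weights.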
4. The lightweight classification method for brain tumor MRI images as described in claim 1, characterized in that the embedded channel attention mechanism described in step 3.1 specifically refers to inserting a channel attention module after the network feature layer. First, global average pooling and global max pooling are performed on the input feature map; then, channel attention weights are generated through a shared multilayer perceptron (MLP). By modeling the relationships between the channels of the feature maps, this module generates a channel attention map and redistributes weights across the feature channels, increasing the network's attention to important information in the input image, i.e., to the tumor region within the whole image. The calculation formulas are as follows:
$M_c(F) = \sigma\left(W_1\left(W_0\left(F^c_{avg}\right)\right) + W_1\left(W_0\left(F^c_{max}\right)\right)\right)$ (4)
$F' = M_c(F) \otimes F$ (5)
where $\sigma$ is the Sigmoid activation function, $\otimes$ denotes channel-wise multiplication, $F^c_{avg}$ denotes the channel features after global average pooling, $F^c_{max}$ denotes the channel features after global max pooling, and $W_0$ and $W_1$ are the shared weights of the MLP layers. The embedded spatial attention mechanism described in step 3.2 specifically refers to inserting a spatial attention module after the channel attention module; the spatial attention map is generated by performing average pooling and max pooling along the channel dimension, concatenating the results, and passing them through a convolutional layer.
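Formulas (4) and (5) and the spatial attention step closely resemble the Convolutional Block Attention Module (CBAM). The sketch below implements that pattern on nested Python lists; the shared MLP weights `w0` (C/r × C) and `w1` (C × C/r), the ReLU between them, and the Sigmoid on the spatial map are assumptions taken from CBAM rather than details stated in the claim, and the kernel size of the spatial convolution is left as a parameter. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shared_mlp(v, w0, w1):
    # C -> C/r with ReLU, then C/r -> C; the same w0, w1 serve both pooled vectors.
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w0]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w1]


def channel_attention(feat, w0, w1):
    # feat: C channels, each an H x W list of lists.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]  # global average pooling
    mx = [max(map(max, ch)) for ch in feat]                             # global max pooling
    weights = [sigmoid(a + b)
               for a, b in zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))]
    scaled = [[[wc * x for x in row] for row in ch] for wc, ch in zip(weights, feat)]
    return scaled, weights


def spatial_attention(feat, kernel):
    # kernel: two k x k weight planes, applied to the channel-wise [avg; max] maps.
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    k = len(kernel[0])
    pad = k // 2                                         # 'same' padding for odd k
    att = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for plane, kern in zip((avg, mx), kernel):   # concatenated 2-channel input
                for a in range(k):
                    for b in range(k):
                        y, x = i + a - pad, j + b - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += kern[a][b] * plane[y][x]
            att[i][j] = sigmoid(s)
    scaled = [[[att[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in feat]
    return scaled, att
```

Applying `channel_attention` and then `spatial_attention` to its output reproduces the channel-then-spatial ordering of steps 3.1 and 3.2.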
5. The lightweight classification method for brain tumor MRI images as described in claim 1, characterized in that integrating the above modules to form the MDCA-MobileNetV2 network in step 4.1 specifically means: connecting, in sequence, the input image → multi-scale feature extraction module → max pooling layer → channel attention module → spatial attention module → MobileNetV2 backbone → adaptive average pooling layer → fully connected classification layer, to construct a complete end-to-end classification network. The dynamic learning rate adjustment strategy described in step 4.2 specifically refers to a multi-step decay method: whenever training reaches a preset milestone round, the learning rate is multiplied by the decay coefficient γ. The formula is:
$\eta_e = \alpha \cdot \gamma^{\sum_{i} \mathbb{1}\left[e \ge m_i\right]}$ (6)
where α is the initial learning rate, $e$ is the current training round, $m_i$ are the specified rounds (milestones) at which the learning rate decays, $\mathbb{1}[\cdot]$ is the indicator function, and γ is the learning rate decay coefficient, typically set to 0.1. Optimizing the loss function with a label smoothing strategy in step 4.3 specifically means the following: label smoothing is a regularization method typically used to prevent the model from placing blind, full confidence in the true-label distribution during training. Its mechanism primarily involves altering the probability assigned to a sample's true class label. The modified label probability is:
$q'(k) = (1 - \varepsilon)\,\delta_{k,y} + \varepsilon\, u(k)$ (7)
where $\delta_{k,y}$ equals 1 when $k$ is the true class $y$ and 0 otherwise, $u(k)$ is a distribution over the category labels, typically taken as the uniform distribution $u(k) = 1/C$ for $C$ classes, and ε is the label smoothing coefficient. The loss function therefore becomes:
$H(q', p) = -\sum_{k=1}^{C} q'(k)\log p(k) = (1 - \varepsilon)\,H(q, p) + \varepsilon\, H(u, p)$ (8)
where ε is a small hyperparameter, typically in the range 0.05 to 0.1, and $q$ is the original one-hot label distribution. This is equivalent to adding the penalty term $\varepsilon H(u, p)$ to the loss function; when $u$ is the uniform distribution, $H(u, p)$ measures the degree of deviation between the predicted probability distribution $p(k)$ and $u(k)$.
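The multi-step schedule and the label-smoothed cross-entropy can be sketched in a few functions. `multistep_lr` counts how many milestones the current epoch has reached, which mirrors the closed form of PyTorch's `MultiStepLR` scheduler; `label_smoothing_ce` uses a uniform prior u(k) = 1/C. The function names are hypothetical, and the milestone epochs and coefficients used with them are arbitrary examples, not the invention's settings.

```python
import bisect
import math


def multistep_lr(epoch, alpha, milestones, gamma=0.1):
    # Learning rate = alpha * gamma ** (number of milestones already reached).
    passed = bisect.bisect_right(sorted(milestones), epoch)
    return alpha * gamma ** passed


def smoothed_targets(label, num_classes, eps):
    # q'(k) = (1 - eps) * delta(k, y) + eps * u(k), with uniform u(k) = 1 / C.
    return [(1 - eps) * (k == label) + eps / num_classes for k in range(num_classes)]


def softmax(logits):
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def label_smoothing_ce(logits, label, eps=0.1):
    # Cross-entropy between the smoothed targets q' and the predicted distribution p.
    p = softmax(logits)
    q = smoothed_targets(label, len(logits), eps)
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p))
```

With α = 0.1 and milestones [30, 80], the learning rate is 0.1 for epochs 0 to 29, 0.01 for epochs 30 to 79, and 0.001 afterwards; setting `eps=0` in `label_smoothing_ce` recovers the ordinary cross-entropy.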