A brain tumor lesion detection method, device, medium and equipment
By combining the improved YOLOv8 detection model with the U-Net network, the problem of insufficient accuracy of YOLOv8 in small target detection in brain tumor detection is solved, and high-precision detection of brain tumor lesions is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTHWEST UNIVERSITY FOR NATIONALITIES
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, YOLOv8 struggles to capture the semantic relationship between small targets and the global scene in brain tumor detection, resulting in a high false detection rate and failing to meet the clinical need for precise localization.
An improved YOLOv8 detection model, whose backbone network contains MambaC2f modules, is combined with an improved U-Net denoising network. Feature computation is performed through the State Space Model (SSM) and combined with local detail feature extraction to improve the detection accuracy of small targets. Multi-scale pooling and feature fusion further enhance detection accuracy.
It significantly improves the detection accuracy of small brain tumor lesions by better capturing the feature information of small targets and the long-distance spatial dependencies in images.

Figure CN122243944A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition, and in particular to a method, apparatus, medium, and device for detecting brain tumor lesions.
Background Technology
[0002] Brain tumors are common space-occupying lesions of the central nervous system and include both benign and malignant types. The harm they cause depends not only on the nature of the tumor but also, closely, on its location. Benign tumors that compress key neural centers, such as those controlling respiration, heartbeat, and speech, can cause severe functional impairment; malignant tumors can invade and destroy surrounding normal brain tissue and even spread intracranially, significantly affecting patient prognosis. Therefore, accurately locating brain tumor lesions and clarifying their spatial proximity to important neural centers, blood vessels, and other key structures is a core prerequisite for the diagnosis and treatment of brain tumors. Planning the boundaries of surgical resection, delineating the target area for radiotherapy, and evaluating the efficacy of targeted drug therapy all require precise tumor localization and structural correlation information as a foundation, which directly affects the safety and effectiveness of treatment.
[0003] Magnetic Resonance Imaging (MRI) is a non-invasive medical imaging technique based on the principle of nuclear magnetic resonance, which can clearly present information about the human body's anatomical structure and physiological function. It utilizes the magnetic resonance phenomenon of hydrogen nuclei in human brain tissue to capture differences in relaxation time and signal characteristics among different tissues, generating multi-parameter, multi-planar imaging data to clearly present the morphological boundaries, extent of invasion, and anatomical relationship with surrounding nerves and blood vessels of brain tumors. Although MRI possesses extremely high soft tissue resolution and the advantage of no ionizing radiation, providing crucial support for the localization, staging, and monitoring of treatment efficacy of brain tumors, the overlapping of imaging features due to tumor heterogeneity and the confusion between small lesions and artifacts still pose challenges to accurate quantitative analysis and early detection of small tumors.
[0004] YOLOv8, an object detection algorithm in the YOLO series released by Ultralytics in 2023, achieves efficient object detection through its four-part architecture consisting of an input, a backbone network, a neck network, and a head network (Detect). However, in brain tumor detection scenarios, YOLOv8 struggles to capture the semantic relationship between small targets and the global scene, resulting in a high false detection rate and failing to meet the clinical need for precise localization.
Summary of the Invention
[0005] This invention provides a method, apparatus, medium, and device for detecting brain tumor lesions to solve the aforementioned problem in the prior art, namely, how to improve the detection accuracy for small brain tumor lesions. This invention provides a method for detecting brain tumor lesions, the method comprising: acquiring imaging data of the brain tumor to be detected; and inputting the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network. The backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected sequentially. The C2f module extracts the first-scale feature map from the brain tumor imaging data. The Conv+MambaC2f module expands the dimensions of the brain tumor imaging data to determine the corresponding spatial dimension features, fuses contextual information into these features to generate fused global features, and extracts local features from the brain tumor imaging data; the global and local features are then concatenated to determine a second-scale feature map. The Conv+C2f+SPPF module extracts deep features from the brain tumor imaging data and performs multi-scale pooling to determine a third-scale feature map. The neck network then fuses the first-, second-, and third-scale feature maps bottom-up and top-down to determine a fused feature map. Finally, the head network performs target detection on the fused feature map to obtain the brain tumor detection results.
[0006] Optionally, before the brain tumor image data is input into the pre-trained improved YOLOv8 detection model, the brain tumor image data is denoised using an improved U-Net network to determine the denoised brain tumor image data. The improved U-Net network adds a spatial attention module (SA) between the encoder and decoder of the original U-Net network. The encoder performs multiple downsampling and convolution operations on the brain tumor image data to extract low-frequency structural features. The SA module performs a convolution on the low-frequency structural features to generate a single-channel spatial attention map, which is used to weight the low-frequency structural features and thereby determine a weighted feature map. The decoder upsamples the weighted feature map to determine the denoised brain tumor image data.
[0007] Optionally, the Conv+MambaC2f modules are obtained by replacing the C2f modules of the fourth and sixth layers in the original backbone network with MambaC2f modules.
[0008] Optionally, the MambaC2f module includes parallel Mamba branches, Bottleneck branches, and a feature fusion layer connected to both. The Mamba branches expand the dimensions of the brain tumor image data to determine the corresponding spatial features. The State Space Model (SSM) is used to fuse the spatial features with contextual information to generate fused global features. The Bottleneck branches extract local features from the brain tumor image data. The feature fusion layer concatenates the global and local features to determine the second-scale feature map.
[0009] Optionally, the construction of the state-space model (SSM) specifically includes: $h'(t) = A h(t) + B x(t)$, where $x(t)$ is the input sequence, $h(t)$ is the hidden state at time $t$, $h'(t)$ is the derivative of the hidden state with respect to time $t$, $A$ is the state transition matrix, and $B$ is the input matrix; and $y(t) = C h(t) + D x(t)$, where $y(t)$ is the output sequence, $C$ is the output matrix, and $D$ is a mapping matrix.
[0010] This invention provides a brain tumor lesion detection device, comprising: an acquisition module, used to acquire imaging data of the brain tumor to be detected; and a detection module, used to input the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network. The backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected in sequence. The C2f module extracts the first-scale feature map from the brain tumor imaging data, and the Conv+MambaC2f module expands the dimensions of the brain tumor imaging data to determine the corresponding spatial dimension features, which are fused with contextual information to generate a fused global feature map. Local features are then extracted from the brain tumor imaging data, and the global and local features are concatenated to determine a second-scale feature map. The Conv+C2f+SPPF module extracts deep features from the brain tumor imaging data and performs multi-scale pooling to determine a third-scale feature map. The neck network then fuses the first-, second-, and third-scale feature maps bottom-up and top-down to determine the fused feature map. Finally, the head network performs target detection on the fused feature map to obtain the brain tumor detection results.
[0011] The present invention provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the above-described method for detecting brain tumor lesions.
[0012] The present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the above-mentioned brain tumor lesion detection method.
[0013] Compared with the prior art, this invention has the following beneficial effects. This invention provides a method for detecting brain tumor lesions that introduces the Mamba module into the backbone network of the YOLOv8 model to construct the MambaC2f module. Mamba performs feature computation through the State-Space Model (SSM), which better captures the feature information of small targets. Combined with local detail feature extraction, this significantly improves the feature discrimination capability for small targets and thereby the detection accuracy for small targets such as brain tumors. Furthermore, the SSM of the Mamba module enables the model to capture long-distance spatial dependencies and semantic associations in images and to better understand the complex relationships between features in different image regions, which helps detect targets accurately and thus improves the detection accuracy for small brain tumor lesions.
Attached Figure Description
[0014] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
[0015] Figure 1 is a flowchart of a brain tumor lesion detection method provided in an embodiment of the present invention; Figure 2 is a diagram of the improved U-Net network architecture provided in an embodiment of the present invention; Figure 3 is a diagram of the YOLOv8 model architecture integrated with Mamba provided in an embodiment of the present invention; Figure 4 shows the architectures of the Mamba module, the MambaC2f module, the C2f module, and the spatial pyramid pooling-fast (SPPF) module provided in the embodiments of the present invention; Figure 5 is a schematic diagram of the recognition results provided in an embodiment of the present invention; Figure 6 is a schematic diagram of a computer device for the brain tumor lesion detection method provided in an embodiment of the present invention.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0017] The technical solution of the present invention and how the technical solution of the present invention solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of the present invention will now be described with reference to the accompanying drawings.
[0018] Figure 1 is a flowchart of a brain tumor lesion detection method provided in an embodiment of the present invention. As shown in Figure 1, in this embodiment, a method for detecting brain tumor lesions includes: S1: Acquire imaging data of the brain tumor to be detected.
[0019] For example, the present invention first takes the patient's brain tumor image data as input and uses an improved U-Net network to denoise the raw data, removing redundant interference information in the image to obtain clearer denoised data. The improved U-Net network adds a spatial attention module (SA) at the bottleneck layer of the basic U-Net architecture and mainly comprises an encoder, the spatial attention module SA, and a decoder.
[0020] Specifically, the encoder filters high-frequency noise out of the image through multiple downsampling and convolution operations, preserving the low-frequency true structural features. The spatial attention module (SA) generates a single-channel spatial attention map using a 1×1 convolution, which is normalized with a sigmoid function to obtain weights. These weights are multiplied by the feature map to enhance tumor-region features and suppress background noise. The decoder upsamples the low-resolution semantic features to the original size and uses skip connections to concatenate the high-resolution detail features from the encoder, compensating for detail lost during upsampling. Finally, a 1×1 convolution outputs a denoising result with the same size and number of channels as the input.
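The SA step described above (1×1 convolution to one channel, sigmoid, element-wise multiplication) can be sketched in NumPy as follows. This is a minimal illustration, not the inventors' implementation: the function name `spatial_attention`, the random stand-in weights, and the 8-channel, 16×16 bottleneck map are all assumptions, and the encoder, decoder, and training are omitted.

```python
import numpy as np

def spatial_attention(features: np.ndarray, weight: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Reweight a (C, H, W) feature map with a single-channel spatial attention map."""
    # A 1x1 convolution from C channels to 1 channel is a per-pixel weighted
    # sum over channels, written here as a tensordot over the channel axis.
    logits = np.tensordot(weight, features, axes=([0], [0])) + bias  # (H, W)
    # The sigmoid maps the logits to attention weights in (0, 1).
    attn = 1.0 / (1.0 + np.exp(-logits))
    # The same (H, W) map scales every channel at each pixel.
    return features * attn[np.newaxis, :, :]

# Usage with random stand-in data (8 channels, 16 x 16 bottleneck map).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w = rng.standard_normal(8)
y = spatial_attention(x, w)
print(y.shape)  # (8, 16, 16)
```

Because the weights lie in (0, 1), the module can only attenuate features, so "enhancing" the tumor region means suppressing the background relative to it.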
[0021] Introducing the improved U-Net network as a front-end module leverages its encoder-decoder architecture and skip connections to adaptively suppress Gaussian noise and artifacts in a data-driven way. At the same time, multi-scale feature fusion repairs key details such as the edges and contours of small targets, enhancing the distinction between targets and background while preserving the integrity of effective features. This provides a high signal-to-noise-ratio input to the subsequent detection network, allowing it to focus on target feature recognition and localization and thus improving overall performance in complex scenes.
[0022] S2: Input the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network, wherein the backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected in sequence. The first-scale feature map of the brain tumor imaging data is extracted through the C2f module. The Conv+MambaC2f module expands the dimensionality of the brain tumor imaging data to determine the corresponding spatial dimension features, fuses contextual information into the spatial dimension features to generate a fused global feature map, and extracts local features from the brain tumor image data; the global and local features are then concatenated to determine the second-scale feature map. Deep features of the brain tumor image data are extracted using the Conv+C2f+SPPF module, and multi-scale pooling is performed to determine the third-scale feature map. The neck network then performs bottom-up and top-down feature fusion on the first-, second-, and third-scale feature maps to determine the fused feature map. Finally, the head network performs target detection on the fused feature map to obtain the brain tumor detection results.
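The bottom-up and top-down fusion performed by the neck can be illustrated with shapes only. The sketch below assumes the three scales sit at strides 8, 16, and 32 of a 640×640 input (80×80, 40×40, 20×20), as in a typical YOLOv8 configuration; the channel counts are illustrative, and nearest-neighbour upsampling, 2×2 max pooling, and random 1×1 projections stand in for the learned upsampling, stride-2 convolutions, and C2f blocks of the real neck. The helper names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling of a (C, H, W) map by a factor of 2.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    # 2 x 2 max pooling with stride 2, standing in for a stride-2 convolution.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def project(x, out_ch, rng):
    # Random 1 x 1 projection standing in for the C2f block after each concat.
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.tensordot(w, x, axes=([1], [0]))

def fuse(p3, p4, p5, rng):
    # Top-down path: deep semantics flow toward the high-resolution map.
    t4 = project(np.concatenate([upsample2x(p5), p4]), p4.shape[0], rng)
    t3 = project(np.concatenate([upsample2x(t4), p3]), p3.shape[0], rng)
    # Bottom-up path: localization detail flows back toward the coarse maps.
    b4 = project(np.concatenate([downsample2x(t3), t4]), p4.shape[0], rng)
    b5 = project(np.concatenate([downsample2x(b4), p5]), p5.shape[0], rng)
    return t3, b4, b5  # one fused map per detection scale

rng = np.random.default_rng(0)
p3 = rng.standard_normal((64, 80, 80))   # first-scale map (stride 8, assumed)
p4 = rng.standard_normal((128, 40, 40))  # second-scale map (stride 16, assumed)
p5 = rng.standard_normal((256, 20, 20))  # third-scale map (stride 32, assumed)
outs = fuse(p3, p4, p5, rng)
print([o.shape for o in outs])  # [(64, 80, 80), (128, 40, 40), (256, 20, 20)]
```

The highest-resolution output is where small lesions are most likely to be detected, which is why it receives deep semantic context through the top-down path.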
[0023] For example, the denoised brain tumor image data is input into the improved YOLOv8 detection model (a YOLOv8 detection model that integrates the Mamba module). The model's feature extraction capability accurately captures the feature information of the brain tumor, and finally outputs the detection result of the brain tumor.
[0024] The specific architecture of the improved YOLOv8 detection model proposed in this invention is shown in Figure 3. The model is mainly divided into four parts: input, backbone, neck, and head (Detect), with YOLOv8 as the overall architecture. The improved model replaces the C2f modules in the fourth and sixth layers of the original YOLOv8 model with MambaC2f modules.
[0025] As shown in Figure 4, the MambaC2f module replaces some of the Bottleneck branches in the C2f module with Mamba branches, retaining both local details and global dependencies, which makes it suitable for medical image detection. The Mamba module is a pure Mamba feature extraction unit that is not combined with convolution (Conv). The C2f module is the original YOLOv8 feature fusion module, which enhances feature representation through multi-branch residual fusion.
[0026] The Mamba module first expands the dimensionality of the input feature map with a linear layer to enhance the model's expressive power, and then processes it with a State-Space Model (SSM). The SSM first maps the features to dynamic parameters and then uses wavefront scanning to traverse the positions of the feature map diagonally, so that the state at each position incorporates contextual information from its left and top neighbors while the state decay and update strength are adjusted dynamically. The processed states are then combined with the original features through a residual connection, and finally a linear layer compresses the features back to their original dimension.
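The wavefront traversal can be sketched as a 2D recurrence evaluated anti-diagonal by anti-diagonal. The update rule below (averaging the top and left states, scaling by a decay factor, and adding a gated input) is a hypothetical choice made for illustration; the source only says that each state incorporates its left and top context with dynamically adjusted decay and update strength. The linear expansion and compression layers are omitted.

```python
import numpy as np

def wavefront_scan(x, decay, gain):
    """Scan an (H, W) map along anti-diagonals (wavefront order).

    The state at (i, j) mixes the states of its top and left neighbours,
    which lie on the previous anti-diagonal and are therefore already done,
    with the current input. Positions on one anti-diagonal are independent
    of each other, so a real kernel could process each wavefront in parallel.
    """
    H, W = x.shape
    h = np.zeros((H, W))
    for d in range(H + W - 1):                      # anti-diagonal d = i + j
        for i in range(max(0, d - W + 1), min(H, d + 1)):
            j = d - i
            top = h[i - 1, j] if i > 0 else 0.0
            left = h[i, j - 1] if j > 0 else 0.0
            # Hypothetical update: decay the averaged neighbour state, add gated input.
            h[i, j] = decay[i, j] * 0.5 * (top + left) + gain[i, j] * x[i, j]
    return h

# In Mamba, decay and gain would be computed from the features themselves;
# constant values are used here so the result is easy to follow by hand.
x = np.ones((3, 3))
h = wavefront_scan(x, decay=np.ones((3, 3)), gain=np.ones((3, 3)))
print(h[1, 1])  # 2.5
```

Here h[0, 0] = 1, its right and lower neighbours are 0.5 × 1 + 1 = 1.5, and h[1, 1] = 0.5 × (1.5 + 1.5) + 1 = 2.5, showing how context accumulates toward the bottom-right corner.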
[0027] The original C2f module first adjusts the dimensionality of the input features and extracts features using a convolution followed by batch normalization (BN) and SiLU activation. It then splits the result into two branches along the channel dimension: one branch retains the features directly for the residual path, and the other passes through n cascaded Bottlenecks. Each Bottleneck is a bottleneck structure consisting of two cascaded convolutional blocks. When `shortcut` is False, no residual connection is used; when `shortcut` is True, the input feature map and the output of the two convolutional blocks are added element-wise before being output. The output of each Bottleneck becomes the input of the next, and the outputs of all Bottlenecks are concatenated channel-wise with the split branches. Finally, a Conv layer adjusts the number of channels of the concatenated feature map to match the input dimension, completing feature compression and integration.
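The data flow of the C2f module described above (split, chained Bottlenecks, concatenation of every intermediate output, final channel compression) can be sketched in NumPy. This is an illustration under simplifying assumptions: BN is omitted, 1×1 kernels replace the 3×3 kernels of the real Bottleneck, and the weights are random rather than trained. The class and function names are hypothetical.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def conv_bn_silu(x, w):
    # 1x1 conv on a (C_in, H, W) map with weight (C_out, C_in), then SiLU.
    # BN is omitted and 1x1 kernels replace the 3x3 kernels for brevity.
    return silu(np.tensordot(w, x, axes=([1], [0])))

class C2f:
    def __init__(self, c_in, c_out, n=1, shortcut=True, seed=0):
        rng = np.random.default_rng(seed)
        self.c = c_out // 2                         # hidden width per branch
        self.shortcut = shortcut
        self.cv1 = rng.standard_normal((2 * self.c, c_in)) * 0.1
        # Each Bottleneck holds two cascaded conv blocks.
        self.m = [(rng.standard_normal((self.c, self.c)) * 0.1,
                   rng.standard_normal((self.c, self.c)) * 0.1) for _ in range(n)]
        # cv2 compresses the (2 + n) * c concatenated channels back to c_out.
        self.cv2 = rng.standard_normal((c_out, (2 + n) * self.c)) * 0.1

    def bottleneck(self, x, w1, w2):
        y = conv_bn_silu(conv_bn_silu(x, w1), w2)
        return x + y if self.shortcut else y       # element-wise residual add

    def __call__(self, x):
        a, b = np.split(conv_bn_silu(x, self.cv1), 2, axis=0)  # channel split
        outs = [a, b]
        for w1, w2 in self.m:
            outs.append(self.bottleneck(outs[-1], w1, w2))     # previous output feeds next
        return conv_bn_silu(np.concatenate(outs, axis=0), self.cv2)

block = C2f(32, 32, n=2)
y = block(np.random.default_rng(1).standard_normal((32, 8, 8)))
print(y.shape, block.cv2.shape)  # (32, 8, 8) (32, 64)
```

With n = 2 Bottlenecks and a hidden width of 16, the final convolution sees (2 + 2) × 16 = 64 channels, which is the multi-branch concatenation that gives C2f its rich gradient paths.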
[0028] The MambaC2f module retains the basic structure of the original C2f module but replaces the fully cascaded chain of Bottlenecks with an alternating sequence of Mamba and Bottleneck modules.
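The alternating chain can be sketched by keeping the C2f wiring and swapping the block type at every step. Both blocks below are hypothetical toy stand-ins chosen only to make the global/local contrast visible: `toy_mamba` is a causal running mean over row-major positions (global context), and `toy_bottleneck` is a 3×3 box average (local context). Whether the real chain starts with a Mamba or a Bottleneck block is not stated in the source, so it is a parameter here.

```python
import numpy as np

def toy_mamba(x):
    # Hypothetical global mixer: causal running mean over row-major positions.
    c, h, w = x.shape
    seq = x.reshape(c, h * w)
    run = np.cumsum(seq, axis=1) / np.arange(1, h * w + 1)
    return x + run.reshape(c, h, w)

def toy_bottleneck(x):
    # Hypothetical local mixer: 3 x 3 box average with edge padding.
    c, h, w = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    local = sum(p[:, i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return x + local

def mamba_c2f_chain(a, b, n, first_is_mamba=True):
    """C2f-style chain in which Mamba and Bottleneck blocks alternate."""
    blocks = [toy_mamba if (k % 2 == 0) == first_is_mamba else toy_bottleneck
              for k in range(n)]
    outs = [a, b]                          # the two channel-split branches
    for blk in blocks:
        outs.append(blk(outs[-1]))         # each block feeds the next
    return np.concatenate(outs, axis=0), [blk.__name__ for blk in blocks]

a = b = np.ones((4, 5, 5))
fused, order = mamba_c2f_chain(a, b, n=3)
print(order, fused.shape)  # ['toy_mamba', 'toy_bottleneck', 'toy_mamba'] (20, 5, 5)
```

Interleaving lets every later block see features that already carry both kinds of context, whereas a pure Bottleneck chain only widens the local receptive field step by step.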
[0029] The SPPF module first applies a convolution to the input feature map to adjust the number of channels, then applies max pooling several times in series, each pooling operating on the output of the previous one, to obtain features with progressively larger receptive fields. These pooling results are then concatenated with the convolved features along the channel dimension to fuse multi-scale spatial information. Finally, a single convolution integrates the concatenated features so that the output dimension matches the input. Through convolution, repeated max pooling, concatenation, and convolutional integration, multi-scale features are aggregated efficiently.
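In the Ultralytics YOLOv8 implementation, SPPF applies a 5×5, stride-1 max pooling three times in series. The sketch below omits the two convolutions and shows only the pooling and concatenation, assuming a `k = 5` kernel as the default. It also shows why the serial form is "fast": two chained 5×5 pools equal one 9×9 pool, and three equal one 13×13 pool, which are the parallel kernel sizes of the older SPP module.

```python
import numpy as np

def maxpool_same(x, k=5):
    # k x k max pooling, stride 1, padding k // 2 (output keeps H x W).
    c, h, w = x.shape
    r = k // 2
    p = np.pad(x, ((0, 0), (r, r), (r, r)), constant_values=-np.inf)
    out = np.full_like(x, -np.inf, dtype=float)
    for i in range(k):
        for j in range(k):
            out = np.maximum(out, p[:, i:i + h, j:j + w])
    return out

def sppf(x, k=5):
    # The cv1 / cv2 convolutions are omitted; only pooling and concat are shown.
    y1 = maxpool_same(x, k)
    y2 = maxpool_same(y1, k)   # serial: pools the previous result
    y3 = maxpool_same(y2, k)
    return np.concatenate([x, y1, y2, y3], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 12, 12))
out = sppf(x)
print(out.shape)  # (8, 12, 12)
```

Reusing each pooled result means the 9×9 and 13×13 receptive fields cost only two extra 5×5 pools instead of separate large-kernel pools.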
[0030] For example, a state-space model (SSM) is a model that predicts what its next state might be based on its inputs. In its basic form, a state-space model is defined over continuous-time sequential data and cannot directly accept discrete data. The SSM is given by formulas (1)-(2):
[0031] $h'(t) = A h(t) + B x(t)$ (1)
where $x(t)$ is the input sequence, $h(t)$ is the hidden state at time $t$, and $h'(t)$ is the derivative of the hidden state with respect to time $t$. The state transition matrix $A$ describes how the previous hidden state naturally evolves over time, and the input matrix $B$ describes how the input is injected into the state.
[0032] $y(t) = C h(t) + D x(t)$ (2)
where $y(t)$ is the output sequence, the output matrix $C$ maps the state to the final output, and $D$ is a mapping matrix; the term $D x(t)$ can usually be omitted.
[0033] Because the SSM is defined over continuous-time sequential data and cannot directly receive discrete, unserialized image data, it needs to be discretized. The discretization applies the zero-order hold (ZOH) rule to the SSM parameters, converting the continuous-time parameters $A$ and $B$ into the discrete-time parameters $\bar{A}$ and $\bar{B}$ as follows.
[0034] $\bar{A} = \exp(\Delta A)$, $\bar{B} = (\Delta A)^{-1}(\exp(\Delta A) - I)\cdot \Delta B$ (3)
Here, $\Delta$ is an additional learnable parameter that controls the step size, or sampling interval, of the continuous parameters. $\Delta$ can also be regarded as an attention mechanism that controls the degree to which a specific sample in the sequence is "remembered". $\exp(x)$ is the exponential function $e^x$, $I$ is the identity matrix, and $(\Delta A)^{-1}$ denotes the inverse matrix of $\Delta A$. After discretization, the formulas become:
[0035] $h_t = \bar{A} h_{t-1} + \bar{B} x_t$ (4)
$y_t = C h_t + D x_t$ (5)
where $h_{t-1}$ denotes the state vector at time $t-1$.
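Formulas (3) to (5) can be checked numerically with a minimal sketch. It assumes a diagonal $A$ (as in Mamba's selective SSM, where the exponential and the inverse then act element-wise), a single input channel, and a fixed step $\Delta$; in Mamba, $\Delta$, $B$, and $C$ are input-dependent, which is omitted here. The function names are illustrative.

```python
import numpy as np

def discretize_zoh(A_diag, B, delta):
    """ZOH discretization, formula (3), for a diagonal A.

    A_diag: (N,) diagonal of A (nonzero); B: (N,) input vector; delta: scalar step.
    """
    dA = delta * A_diag
    A_bar = np.exp(dA)
    B_bar = (A_bar - 1.0) / dA * (delta * B)  # (dA)^-1 (exp(dA) - I) dB
    return A_bar, B_bar

def ssm_scan(x, A_bar, B_bar, C, D=0.0):
    """Run formulas (4)-(5) over a 1D input sequence x."""
    h = np.zeros_like(A_bar)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar * h + B_bar * x_t   # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = C @ h + D * x_t        # y_t = C h_t + D x_t
    return y

# Scalar system h' = -h + x with x = 1: the exact solution is h(t) = 1 - exp(-t).
A, B, C = np.array([-1.0]), np.array([1.0]), np.array([1.0])
A_bar, B_bar = discretize_zoh(A, B, delta=0.1)
y = ssm_scan(np.ones(10), A_bar, B_bar, C)
print(y[-1], 1 - np.exp(-1.0))
```

Because ZOH assumes the input is constant within each step, ten steps of $\Delta = 0.1$ with a constant input reproduce the continuous solution at $t = 1$ exactly (up to rounding), which a simple Euler step would not.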
[0036] The HiPPO theory, which uses continuous-time memory, provides the initialization mechanism for the parameters of matrix $A$. The HiPPO formula is:
$A_{nk} = -\begin{cases} \sqrt{(2n+1)(2k+1)}, & n > k \\ n+1, & n = k \\ 0, & n < k \end{cases}$ (6)
The $A$ matrix is obtained from this formula. The hidden state at each position is computed iteratively with the discretized $A$ matrix by scanning the image pixels in wavefront order. The iteratively obtained hidden states are then converted into features through the output matrix $C$ and fused with the original input.
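As an illustration of HiPPO initialization, the sketch below builds the HiPPO-LegS matrix used in S4-style models. That this is exactly the variant referenced as formula (6) is an assumption. The matrix is lower triangular, so its eigenvalues are its diagonal entries $-(n+1)$; all are negative, so the resulting state dynamics are stable. Mamba's default diagonal initialization, $A_n = -(n+1)$, equals this diagonal.

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS matrix: A[n, k] for n, k = 0..N-1 (assumed form of formula (6))."""
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = -np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = -(n + 1)
            # n < k: entry stays 0 (upper triangle)
    return A

A = hippo_legs(4)
print(np.round(A, 3))
```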
[0037] The above are one or more embodiments of the brain tumor lesion detection method provided in this specification. Based on the same idea, this specification also provides a corresponding brain tumor lesion detection device, comprising: an acquisition module, used to acquire imaging data of the brain tumor to be detected; and a detection module, used to input the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network. The backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected in sequence. The C2f module extracts the first-scale feature map from the brain tumor imaging data, and the Conv+MambaC2f module expands the dimensions of the brain tumor imaging data to determine the corresponding spatial dimension features, which are fused with contextual information to generate a fused global feature map. Local features are then extracted from the brain tumor imaging data, and the global and local features are concatenated to determine a second-scale feature map. The Conv+C2f+SPPF module extracts deep features from the brain tumor imaging data and performs multi-scale pooling to determine a third-scale feature map. The neck network then fuses the first-, second-, and third-scale feature maps bottom-up and top-down to determine the fused feature map. Finally, the head network performs target detection on the fused feature map to obtain the brain tumor detection results.
[0038] Specific limitations regarding the brain tumor lesion detection device can be found in the limitations of the brain tumor lesion detection method described above, and will not be repeated here. Each module in the aforementioned brain tumor lesion detection device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of a computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0039] The present invention also provides a computer-readable storage medium storing a computer program that can be used to execute the brain tumor lesion detection method provided above.
[0040] The present invention also provides a computer device, a schematic diagram of which is shown in Figure 6. As shown in Figure 6, at the hardware level, the computer device includes a processor, an internal bus, a network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then runs it to implement the brain tumor lesion detection method provided in the above embodiment.
[0041] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), etc.
[0042] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this invention.
Claims
1. A method for detecting brain tumor lesions, characterized in that it comprises: acquiring imaging data of the brain tumor to be detected; and inputting the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network. The backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected in sequence. The C2f module extracts a first-scale feature map from the brain tumor imaging data. The Conv+MambaC2f module expands the dimensions of the brain tumor imaging data to determine the corresponding spatial dimension features; contextual information is fused into the spatial dimension features to generate a fused global feature; local features are extracted from the brain tumor imaging data; and the global and local features are concatenated to determine a second-scale feature map. The Conv+C2f+SPPF module extracts deep features from the brain tumor imaging data and performs multi-scale pooling to determine a third-scale feature map. The neck network fuses the first-, second-, and third-scale feature maps bottom-up and top-down to determine the fused feature map. The head network is used to perform target detection on the fused feature map to obtain the detection results of brain tumors.
2. The method for detecting brain tumor lesions as described in claim 1, characterized in that, before the brain tumor image data is input into the pre-trained improved YOLOv8 detection model, the brain tumor image data is denoised using an improved U-Net network to determine the denoised brain tumor image data. The improved U-Net network adds a spatial attention module (SA) between the encoder and decoder of the original U-Net network. The encoder performs multiple downsampling and convolution operations on the brain tumor image data to extract low-frequency structural features. The SA module then performs a convolution on the low-frequency structural features to generate a single-channel spatial attention map, which is used to weight the low-frequency structural features and thereby determine a weighted feature map. Finally, the decoder upsamples the weighted feature map to determine the denoised brain tumor image data.
3. The method for detecting brain tumor lesions as described in claim 1, characterized in that the Conv+MambaC2f modules are obtained by replacing the C2f modules in the fourth and sixth layers of the original backbone network with MambaC2f modules.
4. The method for detecting brain tumor lesions as described in claim 1, characterized in that the MambaC2f module includes parallel Mamba branches, Bottleneck branches, and a feature fusion layer connected to both. The Mamba branches expand the dimensions of the brain tumor image data to determine the corresponding spatial features. The State Space Model (SSM) is used to fuse the spatial features with contextual information to generate fused global features. The Bottleneck branches extract local features from the brain tumor image data. The feature fusion layer concatenates the global and local features to determine the second-scale feature map.
5. The method for detecting brain tumor lesions as described in claim 1, characterized in that the construction of the state-space model SSM specifically includes: $h'(t) = A h(t) + B x(t)$, where $x(t)$ is the input sequence, $h(t)$ is the hidden state at time $t$, $h'(t)$ is the derivative of the hidden state with respect to time $t$, $A$ is the state transition matrix, and $B$ is the input matrix; and $y(t) = C h(t) + D x(t)$, where $y(t)$ is the output sequence, $C$ is the output matrix, and $D$ is a mapping matrix.
6. A brain tumor lesion detection device, characterized in that it comprises: an acquisition module, used to acquire imaging data of the brain tumor to be detected; and a detection module, used to input the brain tumor imaging data into a pre-trained improved YOLOv8 detection model to obtain brain tumor detection results. The improved YOLOv8 detection model includes a backbone network, a neck network, and a head network. The backbone network includes a C2f module, two Conv+MambaC2f modules, and a Conv+C2f+SPPF module connected in sequence. The C2f module extracts the first-scale feature map from the brain tumor imaging data. The Conv+MambaC2f module expands the dimensionality of the brain tumor imaging data to determine the corresponding spatial dimension features, fuses contextual information into these features to generate a fused global feature map, extracts local features from the brain tumor imaging data, and concatenates the global and local features to determine a second-scale feature map. The Conv+C2f+SPPF module extracts deep features from the brain tumor imaging data and performs multi-scale pooling to determine a third-scale feature map. The neck network fuses the first-, second-, and third-scale feature maps bottom-up and top-down to determine the fused feature map. The head network is used to perform target detection on the fused feature map to obtain the detection results of brain tumors.
7. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the brain tumor lesion detection method according to any one of claims 1-5.
8. A computer device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the brain tumor lesion detection method according to any one of claims 1-5.