Task-driven brain tumor multi-modal MRI image fusion method and system
The task-driven multimodal MRI image fusion system for brain tumors combines convolutional neural networks and Transformer coding modules to extract features, fuses and optimizes the generator to produce fused images suitable for downstream tasks, solving the problem of insufficient semantic information capture in high-level visual tasks in existing technologies and improving the accuracy of tumor segmentation and classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QIDONG FUDAN INSTITUTE OF MEDICAL INNOVATION
- Filing Date
- 2026-02-13
- Publication Date
- 2026-06-23
AI Technical Summary
Existing multimodal MRI image fusion methods for brain tumors fail to effectively capture high-order semantic information and neglect semantic information from advanced visual tasks, resulting in negative impacts on downstream applications. Furthermore, deep learning models rely on cumbersome manual parameter tuning and have high computational costs, making them difficult to widely apply in clinical brain tumor auxiliary diagnosis.
A task-driven multimodal MRI image fusion system for brain tumors is adopted. Local and global features are extracted through parallel convolutional neural networks and Transformer coding modules, and reconstructed by bidirectional connection feature fusion modules and decoders. A discriminator is used to integrate a downstream task recognition network, and a generator is optimized by a joint loss function to generate fused images suitable for downstream tasks.
It improves the quality of multimodal MRI image fusion and the accuracy of downstream tasks, breaks through the bottleneck of model accuracy, realizes high-quality fused image applications, and supports more accurate tumor segmentation and classification.
Figure CN122265048A_ABST
Description
Technical Field
[0001] This invention belongs to the field of medical image processing, and specifically relates to a task-driven method and system for multimodal MRI image fusion of brain tumors, and more specifically to a task-driven network for such fusion.
Background Technology
[0002] In recent years, combining multimodal imaging with artificial intelligence has become one of the most popular approaches to clinical brain tumor screening, assisted subtype identification, molecular classification, and personalized treatment: multimodal MRI features are acquired directly from the images, and reliable, robust models are built by analyzing and mining the relationship between these features and disease [5].
[0003] However, the complex structure, numerous subtypes, and similar imaging and clinical presentations of brain tumors limit the clinical application of existing methods. Brain MRI comprises many sequences, and brain tumors vary considerably in shape, size, and location on MRI images, often with indistinct boundaries and complex internal structures. Clinicians must repeatedly switch between and compare information from different imaging sequences, which increases their workload and reduces their efficiency and accuracy [6].
[0004] Therefore, how to jointly model and analyze patients' clinical data, imaging findings, and biochemical indicators, so as to help doctors detect and diagnose different subtypes of brain tumors earlier, optimize treatment plans, and achieve personalized and precise treatment, has become both a research direction that closely links medical image analysis with clinical brain tumor diagnosis and treatment and an urgent clinical problem.
[0005] Currently, radiologists face a heavy workload in interpreting images, because they must repeatedly switch between and compare information on organs, tissues, and anatomical structures displayed by different imaging sequences [7]. Image fusion technology can extract and fuse information on lesion shape, tissue structure, and relative spatial location from multimodal MRI, generating new images with higher resolution and more information and thus providing more comprehensive and accurate medical imaging information. It is therefore crucial to fuse the tumor tissue structure and spatial location information carried by multimodal MRI images of brain tumors into fused images that help overcome bottlenecks in downstream tasks. Traditional image fusion methods typically follow a "decomposition-fusion-reconstruction" principle, mostly target single-scale image fusion, and are not suitable for multimodal medical image fusion tasks. In contrast, multiscale image fusion methods such as Laplacian pyramid (LP) decomposition [8] and the wavelet transform [9] are more widely used in multimodal medical image fusion.
[0006] However, these methods often result in overly smoothed images, leading to a loss of texture information and potentially causing problems such as halos and residual artifacts. Given the significant performance of sparse representation (SR) in computer vision and image processing...
[0007] Deep learning-based image fusion methods can automatically learn fusion rules and are more robust, so they are widely used in the field of image fusion [10]. While convolution operations are effective at extracting local features, they are less effective at extracting global information, which lowers the sharpness of fused images. To improve image sharpness, researchers have used GANs for image fusion. Ma et al. [11] were the first to introduce GANs into image fusion tasks, but their approach suffers from severe blurring and distortion. Furthermore, GAN generators typically use simple single-scale networks for feature fusion, leading to loss of image information and over-smoothing. Following the success of Transformer models in natural language processing (NLP), researchers began to apply their powerful feature extraction and relation modeling capabilities to image fusion tasks [12][13]. Existing image fusion methods primarily focus on improving the overall quality and visual effect of the fused image, neglecting the more important preservation of texture details in lesion areas and of the contrast between diseased tissues. As a result, the fused images cannot be applied to downstream tasks such as ROI segmentation and disease classification, and cannot help overcome bottlenecks in those tasks.
[0008] In multimodal medical image segmentation, precise localization and segmentation of the region of interest in the lesion is the first step in the diagnosis and treatment of brain tumors. Precise segmentation of brain tumors based on multimodal MRI can help doctors pinpoint the tumor's location and its relationship to surrounding normal tissues, quantify its size and shape, and further assess its biological characteristics and degree of invasion. This is crucial for achieving personalized medicine and precision treatment [14]. Early multimodal MRI segmentation of brain tumors relied primarily on manual work by specialists. Later, researchers proposed a series of fast and repeatable machine learning (ML) segmentation methods to improve efficiency. However, the uneven intensity, large contrast variations, and noise in MRI images pose significant challenges to accurate lesion segmentation.
[0009] With the development of deep learning, U-Net [15], a symmetric encoder-decoder structure with skip connections, and the Transformer model have become widely used in medical image segmentation tasks. Compared with traditional CNN models, the Transformer's self-attention mechanism captures global correlations and makes information transfer between different locations more efficient and parallel. Unlike CNN methods, Transformers are not only powerful at modeling global contextual features but also transfer well after large-scale pre-training. Xing et al. [16] proposed a nested modality-aware Transformer that jointly considers the feature relationships within and between modalities to improve segmentation performance.
[0010] While Transformer-based methods have made some progress in tumor segmentation, they are typically computationally expensive, and the pre-training process is cumbersome and time-consuming. Furthermore, researchers need to manually select the number of layers, the type of each layer, the connection methods, and other hyperparameters, a process that often relies on experience and trial and error. To address these issues, image segmentation research has begun to shift its focus to AutoML, which enables the automatic design of network structures [17]. Yan et al. [18] proposed a multi-scale NAS framework with a multi-scale search space, in which a partial channel connection scheme and a two-step decoding method mitigate the computational overhead caused by the large search space. Existing deep learning models are designed for specific tasks, and their development, design, and manual parameter tuning typically require extensive expertise and experience. This is not only tedious and time-consuming but also limits the application of deep learning models in the auxiliary diagnosis of clinical brain tumors.
[0011] Regarding multimodal medical image-assisted interpretation, Lambin et al. [19] introduced radiomics into the field, which promoted the application of machine learning in medical image analysis. Radiomics methods capture shape, grayscale, texture, and wavelet features from the Region of Interest (ROI) of medical images for analysis and modeling to accomplish specific tasks. In recent years, the rapid development and application of deep learning have enabled researchers to predict tumor subtypes directly from clinical MRI images, making this a widely used approach [20][21]. Ghassemi et al. [22] used GANs to perform tumor classification. In addition, networks such as MobileNetV3, VGG16, and ResNet have also been trained for brain tumor subtype classification.
[0012] Owing to differences in parameters and training processes, CNN models rely mainly on local features and weight sharing to obtain tumor texture information, which may introduce inductive bias and degrade classification results. Since the advent of Transformer models, their excellent ability to represent the intrinsic dependencies of multimodal data has attracted widespread attention in brain disease research. Tummala et al. [23] used fine-tuned ViTs to identify brain tumors, providing radiologists with prior knowledge for decision-making. Aloraini et al. [24] established a dual-path approach combining CNN and Transformer for local and global feature extraction, improving brain tumor classification results. Considering that Transformers rely on large-scale datasets for self-attention computation, Ferdous et al. [25] used a teacher-student strategy and an external attention mechanism to perform low-complexity Transformer computation for rapid brain tumor classification. However, different subtypes of brain tumors exhibit small inter-class differences but large intra-class differences. Subtle changes in tumor tissue therefore play a decisive role in classification, and the presentation of the same tumor category varies significantly across patients. Existing brain tumor classification algorithms mostly focus on global or local features within images and fail to mine and exploit the correlations among multimodal images, among labels, and between images and labels, which could optimize tumor feature representation and further improve the accuracy of image-assisted interpretation and classification of brain tumors.
[0013] Regarding its application prospects and technological innovation, multimodal MRI, as an important component of modern medical imaging technology, has garnered widespread attention and successful application in the early diagnosis and treatment of brain tumors. In recent years, with the development of computer vision technology, AI-based image analysis technology has rapidly developed and been widely applied in brain metastasis research. An increasing number of studies utilize multimodal MRI to acquire information such as the size, shape, density, and spatial relative location of brain metastases, constructing automated models for early screening, lesion segmentation, and assisted diagnosis of brain tumors, providing a scientific basis for clinical brain tumor imaging classification.
[0014] However, existing methods have obvious shortcomings: (1) There is no expert consensus to guide the collection and annotation of clinical brain tumor MRI data, and no usable standard multimodal MRI database for brain tumors has been formed. (2) Tumor lesion segmentation is the first task in image-assisted classification of brain tumors. Radiomics models rely on manual annotation to obtain regions of interest; deep learning models can segment lesions automatically, but they rely on massive amounts of labeled data and complex expert-designed network structures, resulting in huge computational demands. (3) Existing brain tumor classification models are easily affected by training data and generalize poorly; they usually ignore the correlation between different local features and the spatial relationship between local and global features, and have not achieved satisfactory results for finer-grained subtype prediction. (4) At present, brain tumor image-assisted models are still at the laboratory research stage, and no clinically usable engineering prototype for brain tumor assistance has been formed.
[0015] The references are as follows:
[5] Zhang R, Yang Y, Hu C, et al. Comprehensive analysis reveals potential therapeutic targets and an integrated risk stratification model for solitary fibrous tumors [J]. Nature Communications, 2023, 14 (1): 7479.
[6] Affia Aa O, Finch H, Jung W, et al. IoT health devices: exploring security risks in the connected landscape [J]. IoT, 2023, 4 (2): 150–182.
[7] Park SH, Han K, Jang HY, et al. Methods for clinical evaluation of artificial intelligence algorithms for medical diagnosis [J]. Radiology, 2023, 306 (1): 20–31.
[8] Du J, Li W, Xiao B, et al. Union Laplacian pyramid with multiple features for medical image fusion [J]. Neurocomputing, 2016, 194: 326–339.
[9] Li H, Manjunath B, Mitra S. Multisensor image fusion using the wavelet transform [J]. Graphical Models and Image Processing, 1995, 57 (3): 235–245.
[10] Liu Y, Chen X, Cheng J, et al. A medical image fusion method based on convolutional neural networks [C]. In Proceedings of the International Conference on Information Fusion, 2017: 1–7.
[11] Ma J, Zhou Z, Wang B, et al. Infrared and visible image fusion based on visual saliency map and weighted least square optimization [J]. Infrared Physics & Technology, 2017, 82: 8–17.
[12] Tang W, He F, Liu Y, et al. MATR: multimodal medical image fusion via multiscale adaptive transformer [J]. IEEE Transactions on Image Processing, 2022, 31: 5134–5149.
[13] Rao D, Wu X-J, Xu T. TGFuse: an infrared and visible image fusion approach based on transformer and generative adversarial network [J]. arXiv preprint arXiv:2201.10147, 2022.
[14] Wang S, Zhou M, Liu Z, et al. Central focused convolutional neural networks: developing data-driven model for lung nodule segmentation [J]. Medical Image Analysis, 2017, 40: 172–183.
[15] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation [C]. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2015: 234–241.
[16] Xing Z, Yu L, Wan L, et al. NestedFormer: nested modality-aware transformer for brain tumor segmentation [C]. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2022: 140–150.
[17] Wei J, Fan Z. Genetic U-net: automatically designing lightweight u-shaped cnn architectures using the genetic algorithm for retinal vessel segmentation [J]. arXiv preprint arXiv:2010.15560, 2020.
[18] Yan X, Jiang W, Shi Y, et al. MS-NAS: multi-scale neural architecture search for medical image segmentation [C]. In Proceedings of the Medical Image Computing and Computer Assisted Intervention, 2020: 388–397.
[19] Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis [J]. European Journal of Cancer, 2012, 48 (4): 441–446.
[20] Saydirasulovich S N, Mukhiddinov M, Djuraev O, et al. An improved wildfire smoke detection based on YOLOv8 and UAV images [J]. Sensors, 2023, 23 (20): 8374.
[21] Habib H, Amin R, Ahmed B, et al. Hybrid algorithms for brain tumor segmentation, classification, and feature extraction [J]. Journal of Ambient Intelligence and Humanized Computing, 2022: 1–22.
[22] Ghassemi N, Shoeibi A, Rouhani M. Deep neural network with generative adversarial networks-training for brain tumor classification based on MR images [J]. Biomedical Signal Processing and Control, 2020, 57: 101678.
[23] Tummala S, Kadry S, Bukhari S A C, et al. Classification of brain tumor from magnetic resonance imaging using vision transformers ensembling [J]. Current Oncology, 2022, 29 (10): 7498–7511.
[24] Aloraini M, Khan A, Aladhadh S, et al. Combining the transformer and convolution for effective brain tumor classification using MRI images [J]. Applied Sciences, 2023, 13 (6): 3680.
[25] Ferdous GJ, Sathi KA, Hossain MA, et al. LCDEiT: a linear complexity data-efficient image transformer for MRI brain tumor classification [J]. IEEE Access, 2023, 11: 20337–20350.
Patent document CN115620892A discloses a method and system for fusing multimodal MRI images of brain tumors, including: acquiring multimodal MRI data of brain tumors with multiple targets and preprocessing it; constructing a joint network for generating multimodal MRI fusion images and segmenting brain tumors, and performing feature extraction and feature fusion on the multimodal MRI images; constructing the relationship between regions of interest and background, and constraining the contrast between different groups of brain tumors and between brain tumors and normal brain tissue through a saliency loss function; using a deep supervision mechanism between U-shaped sub-networks to convolve the output features of the sub-networks to obtain the fused image; and discriminating the fused image until the loss function stabilizes, saving the final generation network model, and generating multimodal MRI fusion images of brain tumors.
[0016] This method links the tasks solely through a loss function, which limits the fusion model's ability to capture high-order semantic information. It overemphasizes domain differences between low-level visual tasks while neglecting the semantic information required by high-level visual tasks, so the fusion results can negatively affect downstream applications.
[0017] This problem urgently needs to be solved.
Summary of the Invention
[0018] To address the shortcomings of existing technologies, the purpose of this invention is to provide a task-driven method and system for multimodal MRI image fusion of brain tumors.
[0019] A task-driven multimodal MRI image fusion method for brain tumors, according to the present invention, includes:
Feature extraction step: Input the multimodal brain tumor MRI image into the generator, and extract its local features and global features through the parallel convolutional neural network encoding module and Transformer encoding module, respectively.
Feature fusion step: Fuse the local features and global features through a bidirectional connection feature fusion module, and reconstruct the fused image with a decoder.
Task-driven step: Input the fused image into a discriminator that integrates at least one downstream task recognition network. The downstream task recognition network performs a preset downstream task on the fused image and produces a task performance loss.
Joint optimization step: Construct a joint loss function and use it to optimize the generator and the discriminator, so that the generator produces a fused image suitable for the downstream task.
[0020] Preferably, in the feature extraction step, the local features of the multimodal brain tumor MRI image are expressed as follows: [formula not reproduced in source]
[0021] where the terms denote, respectively: the output features of the i-th convolutional neural network encoding module, i.e., the local features of the multimodal brain tumor MRI image; the downsampling operation; and the i-th Resblock module operation. The global features of the multimodal brain tumor MRI image are expressed as follows: [formula not reproduced in source]
[0022] where the terms denote, respectively: the output features of the i-th Transformer encoding module, i.e., the global features of the multimodal brain tumor MRI image; the position rearrangement operation; the number of multi-head attention modules; the Transformer encoding module; and the input features of the Transformer encoding module.
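The global branch above mentions a position rearrangement operation and multi-head attention, but the formula itself is not available. The following is a minimal sketch only, assuming the rearrangement flattens an H x W x C feature map into H*W tokens and assuming a single attention head with identity projections; the function names `to_tokens`, `to_feature_map`, and `self_attention` are hypothetical and are not taken from the patent.

```python
import math

def to_tokens(feature_map):
    """Flatten an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [list(pixel) for row in feature_map for pixel in row]

def to_feature_map(tokens, height, width):
    """Inverse rearrangement: rebuild the H x W x C layout from H*W tokens."""
    if len(tokens) != height * width:
        raise ValueError("token count does not match height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections
    (an assumption made for brevity; a real module would learn Q, K, V maps)."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        attended.append([sum(w / total * value[c] for w, value in zip(weights, tokens))
                         for c in range(dim)])
    return attended

# Toy feature map with H=2, W=3, C=2.
fmap = [[[1, 2], [3, 4], [5, 6]],
        [[7, 8], [9, 10], [11, 12]]]
tokens = to_tokens(fmap)
restored = to_feature_map(tokens, 2, 3)
attended = self_attention(tokens)
```

Every output token is a weighted mix of all input tokens, which is the sense in which the Transformer branch is described as capturing global rather than local features.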
[0023] Preferably, in the feature fusion step, fusing the local features and global features through the bidirectional connection feature fusion module includes:
Step A1: The fusion module fuses the local features and global features to obtain intermediate features;
Step A2: The intermediate features are weighted to obtain the enhanced fused features;
Step A3: The enhanced fused features are split again along the channel dimension into two features, which are weighted by a convolutional block and an MLP block respectively and fed back to the parallel convolutional neural network encoding module and Transformer encoding module, completing feature fusion and yielding a preliminary fused image. Both features lie in a feature space determined by the width, height, and number of channels of the input multimodal MRI.
The expression for the intermediate features is: [formula not reproduced in source]
[0024] where the terms denote, respectively: the intermediate features; and the feature concatenation operation. The expression for the enhanced fused features is: [formula not reproduced in source]
[0025] where the terms denote, respectively: the enhanced fused features; the element-wise addition operation; and the feature fusion operation.
The process of reconstructing the fused image via the decoder includes:
Step B1: Construct a decoder whose convolutional neural network has the same depth as the convolutional neural network in the generator;
Step B2: Based on the preliminary fused image, obtain the output of each convolutional neural network module through the decoder, then pass it through a 1×1 convolutional layer to obtain the final fused image.
The output of each convolutional neural network module is expressed as follows: [formula not reproduced in source]
[0026] where the terms denote, respectively: the output of the i-th convolutional neural network module; the upsampling operation; the Resblock module operation; and the input to the i-th convolutional neural network module. The input to the i-th convolutional neural network module is expressed as: [formula not reproduced in source]
[0027] where the terms denote, respectively: the lowest-level fused features obtained by sampling through the U-shaped network; and the input to the i-th convolutional neural network encoding module.
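Steps A1 to A3 above (concatenate, weight, split along the channel dimension, feed back) can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented module: the sigmoid channel gate, the residual feedback, and the scalar `cnn_gain` and `mlp_gain` (standing in for the convolutional block and MLP block) are all hypothetical choices, since the source formulas are not available.

```python
import math

def concat_channels(local, global_):
    """Step A1 (assumed form): concatenate per-pixel channel vectors."""
    return [l + g for l, g in zip(local, global_)]

def channel_gate(features):
    """Hypothetical weighting: sigmoid of each channel's spatial mean."""
    n_pixels = len(features)
    n_channels = len(features[0])
    means = [sum(p[c] for p in features) / n_pixels for c in range(n_channels)]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]

def enhance(intermediate):
    """Step A2 (assumed form): element-wise addition of the intermediate
    features and their gated copy."""
    gate = channel_gate(intermediate)
    return [[x + x * g for x, g in zip(p, gate)] for p in intermediate]

def split_and_feed_back(enhanced, local, global_, cnn_gain=1.0, mlp_gain=1.0):
    """Step A3 (assumed form): split by channel and add each part back to the
    branch it came from; the gains stand in for the conv and MLP blocks."""
    n_local = len(local[0])
    new_local = [[a + cnn_gain * b for a, b in zip(p, e[:n_local])]
                 for p, e in zip(local, enhanced)]
    new_global = [[a + mlp_gain * b for a, b in zip(p, e[n_local:])]
                  for p, e in zip(global_, enhanced)]
    return new_local, new_global

# Two pixels; the CNN branch has 2 channels, the Transformer branch has 1.
local = [[1.0, 2.0], [3.0, 4.0]]
global_ = [[0.5], [1.5]]
intermediate = concat_channels(local, global_)
new_local, new_global = split_and_feed_back(enhance(intermediate), local, global_)
```

The point of the split is that each branch keeps its own channel count while receiving information that originated in the other branch through the shared gate.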
[0028] Preferably, the task-driven step includes:
Step C1: Input the fused image into the tumor classification discriminator to obtain features;
Step C2: Obtain the tumor classification result based on these features;
Step C3: Input the fused image into the tumor segmentation discriminator to obtain the segmentation result.
The expression for the features is: [formula not reproduced in source]
[0029] where the terms denote, respectively: a 5×5 convolution; a 3×3 convolution; and the input to the tumor classification discriminator. The subscripts indicate that there are three 3×3 convolutions and eight 5×5 convolutions, and * denotes a product.
[0030] The expression for the tumor classification result is: [formula not reproduced in source]
[0031] where the terms denote, respectively: the tumor classification result; a 1×1 convolutional layer; and the max pooling operation. * denotes a product.
[0032] The expression for the segmentation result is: [formula not reproduced in source]
[0033] where the terms denote, respectively: the segmentation result, which lies in a space determined by the width, height, and number of channels of the mask; the number of convolution kernels; the activation function; and the deep supervision output of the tumor segmentation discriminator.
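The classification result above is described in terms of a 1×1 convolutional layer and a max pooling operation. As a rough sketch only, assuming the 1×1 convolution is applied first and a global max pool follows (the source does not fix the order), a classification head could look like this; `conv1x1`, `global_max_pool`, and `classify` are hypothetical names and the weights are made-up illustration values.

```python
def conv1x1(pixels, weights, bias):
    """A 1x1 convolution is the same linear map applied at every pixel."""
    return [[sum(w * x for w, x in zip(row, p)) + b
             for row, b in zip(weights, bias)]
            for p in pixels]

def global_max_pool(pixels):
    """Keep, for each output channel, its maximum over all pixels."""
    return [max(p[k] for p in pixels) for k in range(len(pixels[0]))]

def classify(pixels, weights, bias):
    """Return (predicted class index, per-class scores)."""
    scores = global_max_pool(conv1x1(pixels, weights, bias))
    return scores.index(max(scores)), scores

# Three pixels with two feature channels each; three tumor classes.
features = [[0.2, 0.9], [0.7, 0.1], [0.4, 0.4]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bias = [0.0, 0.0, 0.1]
label, scores = classify(features, weights, bias)
```

With these toy values the second class wins because one pixel responds strongly in the second feature channel, which illustrates how max pooling lets a small, localized response decide the class.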
[0034] Preferably, in the joint optimization step, the joint loss includes a generator loss and a discriminator loss. The generator loss is expressed as follows: [formula not reproduced in source]
[0035] where the terms denote, respectively: the generator loss; the saliency loss; the texture loss; and the similarity loss. The expression for the saliency loss is: [formula not reproduced in source]
[0036] where the terms denote, respectively: the channel index of the feature after an operation [not reproduced in source]; the total number of channels of the feature; a function [not reproduced in source]; the 1-norm operation; the deepest convolutional features of the generator encoder for the source image; and the lowest-level convolutional features of the discriminator for the fused image. The expression for the texture loss is: [formula not reproduced in source]
[0037] where both terms denote shallow features of the generator decoder for the source image. The expression for the similarity loss is: [formula not reproduced in source]
[0038] where the terms denote, respectively: the number of labels contained in the mask; the index of the label; the grayscale value of a pixel in the n-th modality and the grayscale value of that pixel in the fused image (distinguished by subscripts); the mean operation; the variance operation; the operation that computes the structural measurement index; and two constants. The discriminator loss is defined as the segmentation task loss of the fused image with multi-class segmentation labels, where the segmentation labels and the fused image are defined over the width and height of the mask. The expression for the segmentation task loss is: [formula not reproduced in source]
[0039] where the terms denote, respectively: the segmentation task loss, with "task" denoting the task itself; the number of sample images; the total number of categories of the classification task; the category, i.e., the index of the label; the category weight parameter; the x and y coordinates of a pixel; the true value of a pixel for a category; and the predicted probability of a pixel for that category.
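The similarity loss in the joint optimization step above is described through a mean, a variance, a structural measure, and two constants, which matches the shape of the standard SSIM index. The sketch below assumes that reading; it uses a single global window, the conventional constants for images scaled to [0, 1], and an unweighted sum for the generator loss. None of these choices is confirmed by the source, and the per-label masking mentioned in the text is omitted.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def ssim(xs, ys, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two flattened images in [0, 1].
    c1 and c2 are the usual (0.01)^2 and (0.03)^2 stabilizing constants."""
    mx, my = mean(xs), mean(ys)
    vx, vy = covariance(xs, xs), covariance(ys, ys)
    cxy = covariance(xs, ys)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def similarity_loss(modalities, fused):
    """Assumed form: average of (1 - SSIM) between each modality and the fused image."""
    return mean([1.0 - ssim(m, fused) for m in modalities])

def generator_loss(saliency, texture, similarity, weights=(1.0, 1.0, 1.0)):
    """Assumed form: weighted sum of the three terms named in the text."""
    return (weights[0] * saliency + weights[1] * texture
            + weights[2] * similarity)

# Toy flattened images standing in for two MRI modalities and a fused image.
t1 = [0.1, 0.4, 0.8, 0.3]
flair = [0.3, 0.8, 0.4, 0.1]
fused = [0.2, 0.6, 0.6, 0.2]
loss = similarity_loss([t1, flair], fused)
```

An image compared with itself gives an SSIM of 1 and a loss of 0, so the term only penalizes structure that the fused image fails to carry over from a source modality.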
[0040] A task-driven multimodal MRI image fusion system for brain tumors, according to the present invention, includes:
Feature extraction module: Inputs the multimodal brain tumor MRI image into the generator, and extracts its local features and global features through the parallel convolutional neural network encoding module and Transformer encoding module, respectively.
Feature fusion module: Fuses the local features and global features through a bidirectional connection feature fusion module, and reconstructs the fused image with a decoder.
Task-driven module: Inputs the fused image into a discriminator that integrates at least one downstream task recognition network. The downstream task recognition network performs a preset downstream task on the fused image and produces a task performance loss.
Joint optimization module: Constructs a joint loss function and uses it to optimize the generator and the discriminator, so that the generator produces a fused image suitable for the downstream task.
[0041] Preferably, in the feature extraction module, the local features of the multimodal brain tumor MRI image are expressed as follows: [formula not reproduced in source]
[0042] where the terms denote, respectively: the output features of the i-th convolutional neural network encoding module, i.e., the local features of the multimodal brain tumor MRI image; the downsampling operation; and the i-th Resblock module operation. The global features of the multimodal brain tumor MRI image are expressed as follows: [formula not reproduced in source]
[0043] where the terms denote, respectively: the output features of the i-th Transformer encoding module, i.e., the global features of the multimodal brain tumor MRI image; the position rearrangement operation; the number of multi-head attention modules; the Transformer encoding module; and the input features of the Transformer encoding module.
[0044] Preferably, in the feature fusion module, fusing the local features and global features through the bidirectional connection feature fusion module includes:
Module A1: The fusion module fuses the local features and global features to obtain intermediate features;
Module A2: Weights the intermediate features to obtain the enhanced fused features;
Module A3: Splits the enhanced fused features again along the channel dimension into two features, which are weighted by a convolutional block and an MLP block respectively and fed back to the parallel convolutional neural network encoding module and Transformer encoding module, completing feature fusion and yielding a preliminary fused image. Both features lie in a feature space determined by the width, height, and number of channels of the input multimodal MRI.
The expression for the intermediate features is: [formula not reproduced in source]
[0045] where the terms denote, respectively: the intermediate features; and the feature concatenation operation. The expression for the enhanced fused features is: [formula not reproduced in source]
[0046] where the terms denote, respectively: the enhanced fused features; the element-wise addition operation; and the feature fusion operation.
The process of reconstructing the fused image via the decoder includes:
Module B1: Constructs a decoder whose convolutional neural network has the same depth as the convolutional neural network in the generator;
Module B2: Based on the preliminary fused image, obtains the output of each convolutional neural network module through the decoder, then passes it through a 1×1 convolutional layer to obtain the final fused image.
The output of each convolutional neural network module is expressed as follows: [formula not reproduced in source]
[0047] where the terms denote, respectively: the output of the i-th convolutional neural network module; the upsampling operation; the Resblock module operation; and the input to the i-th convolutional neural network module. The input to the i-th convolutional neural network module is expressed as: [formula not reproduced in source]
[0048] where the terms denote, respectively: the lowest-level fused features obtained by sampling through the U-shaped network; and the input to the i-th convolutional neural network encoding module.
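The decoder described above combines upsampling, a Resblock operation, and inputs that draw on either the lowest-level fused features or the matching encoder stage. A minimal sketch of one such U-shaped decoding path follows, assuming nearest-neighbour upsampling, a toy residual block, and additive skip connections; these are illustrative assumptions, and `upsample2x`, `resblock`, `decoder_stage`, and `next_input` are hypothetical names.

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D grid (list of rows)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def resblock(grid, scale=0.5):
    """Toy residual block x + f(x); f is a fixed scaling that stands in
    for the learned convolutions."""
    return [[v + scale * v for v in row] for row in grid]

def decoder_stage(stage_input):
    """Assumed stage: residual block followed by upsampling."""
    return upsample2x(resblock(stage_input))

def next_input(previous_output, encoder_feature):
    """Assumed skip connection: element-wise sum with the encoder feature
    of matching resolution."""
    return [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(previous_output, encoder_feature)]

# The deepest stage starts from the lowest-level fused feature (1x1 here).
bottleneck = [[1.0]]
encoder_skip = [[0.5, 0.5], [0.5, 0.5]]
out1 = decoder_stage(bottleneck)                      # 2x2
out2 = decoder_stage(next_input(out1, encoder_skip))  # 4x4
```

Because every stage doubles the resolution, a decoder with the same depth as the encoder returns to the input size, after which the text applies a final 1×1 convolution.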
[0049] Preferably, the task-driven module includes:
Module C1: Inputs the fused image into the tumor classification discriminator to obtain features;
Module C2: Obtains the tumor classification result based on these features;
Module C3: Inputs the fused image into the tumor segmentation discriminator to obtain the segmentation result.
The expression for the features is: [formula not reproduced in source]
[0050] where the terms denote, respectively: a 5×5 convolution; a 3×3 convolution; and the input to the tumor classification discriminator. The subscripts indicate that there are three 3×3 convolutions and eight 5×5 convolutions. The expression for the tumor classification result is: [formula not reproduced in source]
[0051] where the terms denote, respectively: the tumor classification result; a 1×1 convolutional layer; and the max pooling operation. The expression for the segmentation result is: [formula not reproduced in source]
[0052] where the terms denote, respectively: the segmentation result, which lies in a space determined by the width, height, and number of channels of the mask; the number of convolution kernels; the activation function; and the deep supervision output of the tumor segmentation discriminator.
[0053] Preferably, in the joint optimization module, the joint loss includes generator loss and discriminator loss; The generator loss is expressed as follows:
[0054] where the symbols denote the generator loss, the saliency loss, the texture loss, and the similarity loss, respectively. The saliency loss is expressed as:
[0055] where, in the formula, the index is the channel number of the feature and runs up to the total number of channels of that feature; the operators are a function and the 1-norm operation; the two feature terms are the deepest convolutional features of the source image in the generator encoder and the lowest-level convolutional features of the fused image in the discriminator. The texture loss is expressed as:
[0056] where the two symbols both denote shallow features of the source image in the generator decoder. The similarity loss is expressed as:
[0057] where, in the formula, the symbols denote the number of labels contained in the mask and the ordinal number of the label; the subscripted terms are the grayscale value of a pixel in the n-th modality and the grayscale value of that pixel in the fused image; the operators denote the mean operation, the variance operation, and the computation of the structural similarity measure; and the remaining two symbols are constants. The discriminator loss is defined as the segmentation task loss of a fused image with multi-class segmentation labels, where the segmentation label has the width and height of the mask. The segmentation task loss is expressed as:
[0058] where, in the formula, the symbols denote: the segmentation task loss; the number of sample images; the total number of categories of the task; the category, i.e., the ordinal number of the label; the category weight parameter; the x and y coordinates of a pixel; the true value of the pixel for the category; and the predicted probability of the pixel for the category.
[0059] Compared with the prior art, the present invention has the following beneficial effects: 1. This invention fundamentally bridges the semantic gap between low-level and high-level visual tasks. By incorporating multi-task perceptual representations into the same domain, it simultaneously optimizes image fusion quality and downstream driving task performance. Different tasks are integrated into the discriminator to enable the generator to extract key features from the source image. End-to-end adaptive generation can improve the accuracy of downstream tasks by producing high-quality fused images.
[0060] 2. This invention constructs a task-driven multimodal MRI image fusion network model for brain tumors, realizing multimodal MRI image fusion. Furthermore, through a GAN network, different tasks are integrated into the discriminator to encourage the generator to produce fusion results that are more suitable for actual task applications, enabling further application of the fused images and improving the accuracy of downstream tasks.
[0061] 3. This invention updates TDFusion by combining the fusion loss and the task-driven loss, thus breaking through the accuracy bottleneck of current models.
Brief Description of the Drawings
[0062] Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings: Figure 1 is a schematic diagram of the TDFusion network structure provided by the present invention; Figure 2 is a schematic diagram of the structure of the bidirectional connection feature fusion module provided by the present invention.
Detailed Description of the Embodiments
[0063] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make several changes and improvements without departing from the concept of the present invention. These all fall within the protection scope of the present invention.
[0064] This invention constructs a task-driven multimodal MRI image fusion network model for brain tumors, achieving multimodal MRI image fusion. This model uses a GAN network to integrate different tasks into the discriminator, prompting the generator to produce fusion results more suitable for practical applications and enabling further applications of the fused images, thus improving the accuracy of downstream tasks.
[0065] A task-driven multimodal MRI image fusion method for brain tumors, provided by the present invention, includes:
Step 1: Construct a standardized multimodal MRI database for brain tumors from multiple clinical centers.
Step 2: Construct a task-driven multimodal MRI image fusion network model for brain tumors to achieve multimodal MRI image fusion.
Step 3: Use the multimodal MRI and the fused images to construct a multimodal MRI brain tumor segmentation model based on automatic network search, to improve the accuracy of tumor segmentation results.
Step 4: Use the multimodal MRI and the fused images to construct a brain tumor classification model based on fine-grained methods, to achieve accurate auxiliary classification of brain tumors.
Step 5: Develop an AI-assisted software system framework for brain tumors based on a client/server architecture, apply it in at least five clinical hospitals, and generate user reports.
[0066] Regarding the construction of a standardized multimodal MRI database for clinical multicenter brain tumors, as one example, an expert consensus can be developed by organizing radiologists and physicians based on the latest WHO recommendations and clinical guidelines. According to this consensus, 3500 real clinical multimodal MRI images of brain tumors from multiple hospitals will be collected, preprocessed, and labeled to construct a standardized multimodal MRI database for clinical multicenter brain tumors. The entire process consists of two stages: the first stage is data inclusion and standardized preprocessing, and the second stage is data labeling.
[0067] Firstly, data inclusion and standardized preprocessing. Based on the latest WHO recommendations and clinical guidelines, this invention organized radiologists and physicians to develop an expert consensus, determining multi-center equipment, imaging parameters, and acquisition parameters, and establishing inclusion and exclusion criteria, as well as annotation standards, reaching a consensus. Data collection was conducted according to the expert consensus, and the included data underwent standardized preprocessing. The main steps are as follows: a. Data desensitization. Sensitive information, such as personally identifiable information, in the original multimodal MRI image data of brain tumors was transformed according to desensitization rules to reliably protect sensitive privacy data. The personally identifiable information includes: name, ID number, patient number, and address.
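The desensitization step can be illustrated with a minimal sketch, assuming the image metadata is held as a plain dictionary. The field names (`name`, `id_number`, `patient_number`, `address`), the `desensitize` helper, and the salted-hash pseudonym are illustrative assumptions; the text specifies only that these four kinds of identifying information are transformed according to desensitization rules.

```python
import hashlib

# Hypothetical metadata keys for the four identifiers listed in the text.
SENSITIVE_FIELDS = ("name", "id_number", "patient_number", "address")


def desensitize(metadata, salt):
    """Return a copy of metadata with the identifying fields removed.

    A salted SHA-256 pseudonym of the patient number is kept so that scans
    of the same patient can still be grouped. This rule is an assumption,
    not one stated in the source.
    """
    cleaned = {k: v for k, v in metadata.items() if k not in SENSITIVE_FIELDS}
    patient_number = str(metadata.get("patient_number", ""))
    digest = hashlib.sha256((salt + patient_number).encode("utf-8")).hexdigest()
    cleaned["pseudonym"] = digest[:16]
    return cleaned


record = {
    "name": "EXAMPLE NAME",
    "id_number": "000000",
    "patient_number": "P-0001",
    "address": "EXAMPLE ADDRESS",
    "modality": "CE-T1WI",
}
clean_record = desensitize(record, salt="site-secret")
```

Non-identifying fields such as the modality pass through unchanged, and the same salt always yields the same pseudonym for a given patient number.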
[0068] b. Data cleaning. The collected raw data is reviewed and verified by professional physicians to remove duplicate information, correct errors, and improve data consistency.
[0069] c. Data resampling. All included data are resampled to a resolution of 256×256×16, with the spacing uniformly set to (1,1,1).
[0070] d. Standardized storage format. This invention aims to standardize the storage of multimodal MR images of brain tumors processed in the above steps into nii.gz format data, as shown in Table 1.
[0071] Table 1. Multimodal MRI image information to be collected
[0072] Secondly, the manual annotation process.
[0073] a. Region of interest (ROI) delineation of the lesion. ROIs are manually marked on each image by radiologists using the MITK software (http://mitk.org/wiki/MITK).
[0074] The physicians were blinded to the patients' clinical information. All tumors were labeled using two regions of interest (ROIs): the tumor area was defined as the enhancing region on CE-T1WI and FLAIR, while the edema area was defined as the region with no significant enhancement but abnormal tissue on T2WI. The labeling results were then reviewed by a radiologist, and tumor type classification labels were determined according to the consistency criteria.
[0075] b. Labeling. ① Multimodal MRI of brain tumors, with acquisition of each modality shown in Table 1; ② Gold standard for brain tumor segmentation: tumor area and peritumoral edema; ③ Tumor type classification labels.
[0076] The multimodal MRI of the brain tumor includes: CE-T1WI, T2-FLAIR, DWI, and ADC.
[0077] The above information was used as disease labels for multimodal MRI of tumors to construct a multidimensional information database for brain tumors. Brain tumor segmentation annotations were created by physicians and reviewed by experts. The corresponding segmentation annotations (.nii.gz files) and their labels for the multimodal MRI images were saved, and a CSV label file was generated and saved after conversion.
[0078] A task-driven multimodal MRI image fusion network model for brain tumors. This invention aims to enhance the complementary lesion tissue features between different tumor tissues and to generate fused images driven by different downstream tasks. To overcome the accuracy bottleneck of current models, a task-driven multimodal MRI image fusion network model for brain tumors, namely TDFusion, is proposed. As shown in Figure 1, the model first obtains the fusion result through a generator. Then, a task-driven discriminator is used to train the downstream task network on the fused data. Finally, TDFusion is updated using the fusion loss and the task-driven loss.
[0079] Third, the generator.
[0080] a. Generator encoder. The encoder is designed as a parallel CNN and Transformer encoding structure of a given depth, in which the two branches extract local information and supplement global information, respectively.
[0081] For each CNN encoding module, the input has dimensions given by the width, height, and number of channels of the multimodal MRI, respectively. The output features of each CNN encoding module can be expressed by the following formula:
[0082] where, in the formula, the operator denotes the operation of the corresponding Resblock module, and the encoder depth is set by this invention.
[0083] CNNs cannot effectively extract fine-grained features from medical images, whereas Transformers can model long-range interactions through self-attention.
[0084] Therefore, this invention constructs a parallel encoder based on Transformer to obtain more accurate fine-grained features of the global image. For each Transformer encoding module, the input has dimensions given by the width, height, and number of channels of the multimodal MRI, respectively, and the module outputs the corresponding encoded features.
[0085] Specifically, the Transformer encoding module first applies position embedding to its input and converts it into tokens, yielding the position-embedded features. These features then pass through a number of multi-head self-attention (MHSA) blocks, i.e., the Transformer module, to obtain the output of the l-th Transformer encoding block. The expression is:
[0086] where the symbols denote, respectively, the position rearrangement operation, the number of multi-head attention modules, the Transformer encoding module, and the input features of the Transformer encoding module.
[0087] Here, the input of each Transformer block is defined first, and its output is then given by:
[0088]
[0089] where the symbols denote, respectively: the element-wise addition operation; the feed-forward network, consisting of two MLP layers with an expansion ratio r and a GELU activation function between them; the layer normalization operation; and the multi-head self-attention module.
b. Generator decoder. To match the encoder's feature dimensions, this invention designs the generator's decoder as a CNN decoding structure of the same depth.
[0090] The input to each CNN module of this decoder consists of two parts: one part is the lowest-level fused features, which are sampled through the U-shaped network; the other part is the feature obtained from the corresponding encoder CNN module, whose dimensions are the width, height, and number of channels of the input multimodal MRI in the feature space.
[0091] The two parts are then merged via skip connections to form the input to each CNN module of the decoder.
[0092] Its output can then be calculated using the following formula:
[0093] Finally, the CNN decoder restores the features to the original image size, and a 1×1 convolutional layer outputs the final fused image.
[0094] Fourth, the task-based discriminator. Taking multimodal brain tumor MRI as the research object, this invention designs a discriminator based on three different tasks, comprising an image authenticity discrimination network, a tumor classification network, and a tumor segmentation network. The generated final fused image is first mapped to a multi-channel image through a convolutional layer. This invention sets the channel number for this mapping; the original image is a 4-modal MRI, and the spatial dimensions are the width and height of the input multimodal MRI.
[0095] Then, according to the discrimination network corresponding to each task, the multi-channel image is used as the discriminator input for feature extraction and completion of the corresponding task.
[0096] a. Tumor classification discriminator. As shown in Figure 1, this invention uses Mobilenetv3-S as the discriminator for the classification task. Its role is to constrain the generator to produce fused images that are more suitable for tumor classification.
[0097] The input to Mobilenetv3-S is the multi-channel image. Its dimensionality is first raised through a 3×3 convolutional layer, and features are then extracted through three 3×3 convolution blocks and eight 5×5 convolution blocks.
[0098]
[0099] Then, the final classification result is obtained from these features through a pooling layer and two 1×1 convolutional layers.
[0100] b. Tumor segmentation discriminator. As shown in Figure 1, this invention uses U2-Net as the discriminator for the segmentation task. Its role is to constrain the generator to produce a fused image more suitable for tumor tissue segmentation. The entire U2-Net network is a symmetrical encoder-decoder U-shaped structure, and each outer encoder module contains a U-shaped RSU module. As shown for RSU_i in Figure 2, green represents convolution + BN + ReLU, gray represents downsampling + convolution + BN + ReLU, and purple represents upsampling + convolution + BN + ReLU.
[0101] The network includes downsampling blocks and upsampling blocks, with the block count set to 5; downsampling and upsampling are implemented with Maxpool and linear interpolation, respectively. After convolution and downsampling, convolution and upsampling are performed to restore the feature map to the input size.
[0102] For each downsampling RSU module, the input has dimensions given by the length, width, and number of channels of the feature, and the output can be expressed by the following formula:
[0103] where the operator denotes the operation of the i-th RSU module, together with a boundary case for one index value; the dimensions are the length, width, and number of channels of the feature, respectively. The corresponding output of the upsampling RSU module is expressed by the following formula:
[0104] where the operator denotes the operation of each upsampling RSU module, and the dimensions are the length, width, and number of channels of the feature, respectively.
[0105] Furthermore, U2-Net effectively combines shallow and deep semantic information through deep supervision: the features of each upsampling module are resized to the original image size and then summed to obtain the deep supervision output that combines shallow and deep semantic information. The deep supervision output of each U2-Net layer is calculated by the following formula:
[0106] The final output of U2-Net can be expressed by the following formula:
[0107] where the dimensions are the width, height, and number of channels of the mask, respectively, and the remaining symbol is the number of convolution kernels.
[0108] Fifth, the bidirectional connection feature fusion module. As shown in Figure 2, the bidirectional connection feature fusion module is configured to fuse CNN features and Transformer features. The fused features are then decomposed and implicitly re-encoded into the parallel encoding structure to obtain better feature representations. Each feature fusion module takes CNN features and Transformer features as input. First, an operation restores the feature dimensions so that the two types of features have the same dimensions, and the two types of features are then merged using the concat operation to obtain intermediate features. The expression is:
[0109] where the intermediate features have dimensions given by the width, height, and number of channels of the input multimodal MRI.
[0110] The intermediate features are then weighted by a feature fusion module based on channel-spatial attention to obtain the enhanced fused features, as shown in the following formula:
[0111] where the first symbol is the element-wise addition operation and the second denotes the feature fusion calculation applied to the input, which is given by the following formula:
[0112] where the symbols denote, respectively: the input; the element-wise product operation; the channel weight matrix; the spatial weight matrix; the sigmoid activation; the max pooling operation; the global average pooling operation; the diagonal pooling operation; the multilayer perceptron (MLP) operation; and the 5×5 convolution operation. In other words, the fusion process can be divided into channel-weighted fusion and spatial-weighted fusion.
[0113] In the channel-weighted fusion stage, global max pooling and global average pooling are first applied to the input features to obtain the global max pooling result and the global average pooling result of this stage. The two results are then fed into a two-layer MLP for weighting.
[0114] The first layer has a number of neurons reduced by the reduction rate, which is set to 8, and uses the ReLU activation function; the second layer has its own specified number of neurons. Next, element-wise summation and sigmoid activation are applied to the features output by the MLP to generate the channel weight matrix.
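The channel-weighted stage described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the feature is a C×H×W nested list, the first MLP layer has C/r neurons and the second has C neurons (the layer sizes are not preserved in the text; only r = 8 is), and the names `channel_weights`, `w1`, and `w2` are hypothetical. The MLP weights are random placeholders, not trained parameters.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, v):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def channel_weights(feature, w1, w2):
    """Channel weights from a C x H x W feature given as nested lists.

    w1 has shape (C // r) x C and w2 has shape C x (C // r) (assumed sizes);
    the same two-layer MLP is applied to both pooled vectors.
    """
    pooled_max = [max(max(row) for row in ch) for ch in feature]
    pooled_avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    out_max = linear(w2, relu(linear(w1, pooled_max)))
    out_avg = linear(w2, relu(linear(w1, pooled_avg)))
    # Element-wise sum of the two MLP outputs, then sigmoid.
    return [sigmoid(a + b) for a, b in zip(out_max, out_avg)]


random.seed(0)
C, H, W, r = 16, 4, 4, 8  # r = 8 is the reduction rate given in the text
feature = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
weights = channel_weights(feature, w1, w2)  # one weight per channel
```

Each entry of `weights` scales one channel of the input feature when the element-wise product is taken.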
[0115] In the spatial-weighted fusion stage, global max pooling and global average pooling are first applied to the input features to obtain the global max pooling result and the global average pooling result of this stage. The two results are then concatenated along the channel dimension, reduced in dimensionality by a 5×5 convolution, and passed through an activation to generate the spatial weight matrix, whose dimensions are the width and height of the input features of the bidirectional connection feature fusion module.
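A minimal sketch of the spatial-weighted stage follows. It assumes that the two pooling operations are taken across the channel axis at every pixel (so that each yields an H×W map), that the 5×5 convolution uses zero padding of 2, and that the final activation is a sigmoid; none of these details is spelled out in the text. The function name `spatial_weights` and the kernel values are hypothetical.

```python
import math


def spatial_weights(feature, kernel):
    """Spatial weights (H x W) from a C x H x W feature given as nested lists.

    kernel has shape 2 x 5 x 5: one 5x5 filter per pooled map. Zero padding
    of 2 keeps the output at H x W.
    """
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    pooled_max = [[max(feature[c][y][x] for c in range(C)) for x in range(W)] for y in range(H)]
    pooled_avg = [[sum(feature[c][y][x] for c in range(C)) / C for x in range(W)] for y in range(H)]
    stacked = [pooled_max, pooled_avg]  # channel concat -> 2 x H x W
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            acc = 0.0
            for k in range(2):
                for dy in range(5):
                    for dx in range(5):
                        yy, xx = y + dy - 2, x + dx - 2
                        if 0 <= yy < H and 0 <= xx < W:
                            acc += kernel[k][dy][dx] * stacked[k][yy][xx]
            row.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid
        out.append(row)
    return out


# Toy 3 x 4 x 4 feature and a uniform averaging kernel (placeholder values).
toy_feature = [[[float(c + y + x) / 10.0 for x in range(4)] for y in range(4)] for c in range(3)]
avg_kernel = [[[1.0 / 50.0] * 5 for _ in range(5)] for _ in range(2)]
spatial_map = spatial_weights(toy_feature, avg_kernel)
```

The resulting map has one weight per pixel and is multiplied element-wise with every channel of the input feature.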
[0116] Finally, the channel weight matrix and the spatial weight matrix are each multiplied element-wise with the input features of this module, and the results are summed to obtain the final fused features.
[0117] After the fused features are obtained, to match the fused output with the CNN encoding and Transformer encoding, this invention re-divides them into two features along the channel dimension. The two features are then weighted by convolutional blocks and MLP blocks, respectively, and fed back to the parallel CNN and Transformer encoding modules to complete feature fusion, where CNN denotes the convolutional neural network; the feature dimensions are the width, height, and number of channels of the input multimodal MRI.
[0118] Sixth, the loss function. This invention realizes the adversarial generation process by minimizing the generator loss and maximizing the discriminator loss. The optimization objective of TBF-GAN is expressed by the following formula:
[0119] where the symbols denote, respectively: the operation of minimizing the generator loss function and maximizing the discriminator loss function; the generation loss between the original image and the generated fused image; the original image and the generated fused image themselves; the output of the discriminator; and the discrimination loss between the corresponding pair of terms.
[0120] In this process, the objective loss function to be optimized consists of two parts: a generator loss function that constrains the quality of the fused image, and a discriminator loss function that constrains the fused image to be more suitable for the tasks.
[0121] a. Generator loss function. It includes the saliency loss, the texture loss, and the similarity loss, and can be expressed by the following formula:
[0122] This invention uses the deepest convolutional features of the generator encoder and the lowest-level convolutional features of the discriminator to calculate the image saliency loss, as shown in the following formula:
[0123] where, in the formula, the index is the channel number of the feature and runs up to the total number of channels of that feature; the operators are a function and the 1-norm operation; the two feature terms are the deepest convolutional features of the source image in the generator encoder and the lowest-level convolutional features of the fused image in the discriminator.
[0124] This invention uses the shallow features of the generator encoder and the shallow features of the generator decoder to calculate the image texture loss, as shown in the following formula:
[0125] where the two symbols denote the shallow features of the source image in the generator encoder and decoder, respectively.
[0126] Because the structural similarity measure can compare differences in image intensity, contrast, and structure, this invention incorporates it into the generation loss function to constrain the structural similarity of the fused images, as shown in the following formula:
[0127] where, in the formula, the symbols denote the number of labels contained in the mask and the ordinal number of the label; the subscripted terms are the grayscale value of a pixel in the n-th modality and the grayscale value of that pixel in the fused image; the operators compute the mean, the variance, and the structural similarity measure; and the constants are small values added to prevent the denominator from approaching zero.
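To make the role of the mean, variance, and stabilizing constants concrete, the following sketch computes a structural similarity value from global image statistics and turns it into a loss. It is a simplification under stated assumptions: the standard SSIM definition is used with the commonly chosen constants (0.01·L)² and (0.03·L)², statistics are taken over the whole image rather than per label region or per window, and the loss is taken as the mean of 1 − SSIM over the source modalities. The exact aggregation in the original formula is not preserved in the text, and the names `ssim_global` and `similarity_loss` are hypothetical.

```python
def ssim_global(x, y, data_range=1.0):
    """Structural similarity of two equally long pixel lists (global statistics)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that keep the denominator away from zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def similarity_loss(modalities, fused):
    """Mean of (1 - SSIM) between each source modality and the fused image."""
    return sum(1.0 - ssim_global(m, fused) for m in modalities) / len(modalities)


fused_pixels = [0.1, 0.5, 0.9, 0.3]
modality_a = [0.1, 0.5, 0.9, 0.3]   # identical to the fused image
modality_b = [0.9, 0.1, 0.3, 0.5]   # structurally different
loss_value = similarity_loss([modality_a, modality_b], fused_pixels)
```

A modality identical to the fused image contributes essentially zero loss, while a structurally different one contributes a positive term.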
[0128] b. Discriminator loss function.
[0129] Classification task loss. This invention treats tumor classification as a multi-class classification task with labels [0,1,2,3]. Therefore, the cross-entropy loss is extended to the multi-class case, expressed by the following formula:
[0130] where, in the formula, the symbols denote: the sample size; the indicator function (0 or 1), which takes 1 if the true category of the sample equals c and 0 otherwise; the total number of categories of the task; the predicted probability that the sample belongs to category c; and c, the category, i.e., the ordinal number of the label.
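The multi-class cross-entropy described by these definitions can be sketched as follows, assuming the usual form in which the loss is the negative log of the predicted probability of the true class, averaged over the samples. The function name `cross_entropy` and the clipping value `eps` are illustrative choices, not taken from the source.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Multi-class cross-entropy.

    probs: N rows, each holding M predicted class probabilities.
    labels: N true class indices (e.g. values in [0, 1, 2, 3]).
    """
    total = 0.0
    for p_row, c_true in zip(probs, labels):
        # The indicator is 1 only for the true class, so only that term survives.
        total += -math.log(max(p_row[c_true], eps))
    return total / len(labels)


predictions = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
true_labels = [0, 2]
ce_value = cross_entropy(predictions, true_labels)
```

A confident correct prediction contributes a small term, while a uniform prediction over four classes contributes log 4.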
[0131] Segmentation task loss. In the segmentation network, this invention transforms the segmentation problem into a pixel classification problem. Because the masks corresponding to the segmentation are one-hot encoded into multiple channels, the segmentation network outputs, for each pixel and in each channel, the probability of that pixel being classified as foreground or background. Therefore, for a fused image with multi-class segmentation labels, this invention defines the segmentation task loss function using the following formula:
[0132] where, in the formula, the symbols denote: the number of sample images; the true value of a pixel for a category, which takes 1 if the pixel belongs to that category and 0 otherwise; the total number of categories, i.e., the number of labels contained in the mask; and the predicted probability of the pixel for the category.
[0133] In multimodal brain tumor MRI images, only a small portion represents lesion tissue, which leads to foreground/background imbalance during segmentation training and degrades performance. Inspired by Generalized Dice Loss, this invention uses class weight parameters in the segmentation loss to provide invariance to different label set properties, as shown in the following formula:
[0134] where, in the formula, the symbols denote the category, the width of the mask, and the height of the mask.
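The effect of class weights on an imbalanced mask can be illustrated with the sketch below. It assumes the weighting used by Generalized Dice Loss, where each class weight is the inverse of the squared number of true pixels of that class, and applies it to a pixel-wise cross-entropy averaged over the mask area. The exact weight formula and normalization of the original are not preserved in the text, so both are assumptions, and the names `class_weights` and `weighted_seg_loss` are hypothetical.

```python
import math


def class_weights(one_hot_mask, eps=1e-6):
    """One weight per class from a C x H x W one-hot mask (nested lists of 0/1)."""
    weights = []
    for channel in one_hot_mask:
        count = sum(sum(row) for row in channel)
        # Inverse squared class volume; eps guards against empty classes.
        weights.append(1.0 / (count * count + eps))
    return weights


def weighted_seg_loss(one_hot_mask, probs, eps=1e-12):
    """Class-weighted pixel-wise cross-entropy, averaged over the H x W pixels."""
    w = class_weights(one_hot_mask)
    C, H, W = len(one_hot_mask), len(one_hot_mask[0]), len(one_hot_mask[0][0])
    total = 0.0
    for c in range(C):
        for y in range(H):
            for x in range(W):
                if one_hot_mask[c][y][x]:
                    total -= w[c] * math.log(max(probs[c][y][x], eps))
    return total / (H * W)


# 2-class toy mask on a 2 x 2 image: three background pixels, one lesion pixel.
mask = [
    [[1, 1], [1, 0]],  # background channel
    [[0, 0], [0, 1]],  # lesion channel
]
pred = [
    [[0.9, 0.8], [0.7, 0.4]],
    [[0.1, 0.2], [0.3, 0.6]],
]
seg_weights = class_weights(mask)
seg_loss = weighted_seg_loss(mask, pred)
```

With this weighting, the single lesion pixel receives a much larger weight than each background pixel, which counteracts the dominance of the background class.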
[0135] The present invention also provides a task-driven multimodal MRI image fusion system for brain tumors. The task-driven multimodal MRI image fusion system for brain tumors can be implemented by executing the process steps of the task-driven multimodal MRI image fusion method for brain tumors. That is, those skilled in the art can understand the task-driven multimodal MRI image fusion method for brain tumors as a preferred embodiment of the task-driven multimodal MRI image fusion system for brain tumors.
[0136] A task-driven multimodal MRI image fusion system for brain tumors, according to the present invention, includes: Feature extraction module: Inputs multimodal brain tumor MRI images into the generator, and extracts local and global features from the multimodal brain tumor MRI images through parallel convolutional neural network encoding module and Transformer encoding module, respectively; Feature fusion module: Fuses the local and global features through a bidirectional connection feature fusion module, and reconstructs the fused image through a decoder; Task-driven module: Inputs the fused image into a discriminator, which integrates at least one downstream task recognition network. The downstream task recognition network performs a preset downstream task on the fused image and generates task performance loss; Joint optimization module: Constructs and optimizes the generator and the discriminator through a joint loss function, thereby generating a fused image suitable for the downstream task.
[0137] Those skilled in the art will understand that, besides implementing the system and its various devices, modules, and units provided by this invention in the form of purely computer-readable program code, the same functions can be achieved entirely through logical programming of the method steps, making the system and its various devices, modules, and units of this invention function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, the system and its various devices, modules, and units provided by this invention can be considered as a hardware component, and the devices, modules, and units included therein for implementing various functions can also be considered as structures within the hardware component; alternatively, the devices, modules, and units for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.
[0138] Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various changes or modifications within the scope of the claims, which do not affect the essence of the present invention. Unless otherwise specified, the embodiments and features described in this application can be arbitrarily combined with each other.
Claims
1. A task-driven brain tumor multi-modal MRI image fusion method, characterized in that it comprises: a feature extraction step: inputting the multimodal brain tumor MRI image into the generator, and extracting the local features and global features of the multimodal brain tumor MRI image through the parallel convolutional neural network encoding module and Transformer encoding module, respectively; a feature fusion step: fusing the local features and global features through a bidirectional connection feature fusion module, and reconstructing the fused image through a decoder; a task-driven step: inputting the fused image into a discriminator that integrates at least one downstream task recognition network, the downstream task recognition network performing a preset downstream task on the fused image and generating a task performance loss; a joint optimization step: constructing and optimizing the generator and the discriminator through a joint loss function, thereby enabling the generator to generate a fused image suitable for the downstream task.
2. The task-driven brain tumor multi-modal MRI image fusion method according to claim 1, characterized in that, in the feature extraction step, the local features of the multimodal brain tumor MRI image are expressed as follows: wherein the symbols denote the output features of each convolutional neural network encoding module, i.e., the local features of the multi-modal brain tumor MRI image, and the down-sampling operation; the global features of the multimodal brain tumor MRI image are expressed as follows: wherein the symbols denote, respectively, the output features of the Transformer encoding module, i.e., the global features of the multi-modal brain tumor MRI image; the position rearrangement operation; the number of multi-head attention modules; the Transformer encoding module; and the input features of the Transformer encoding module.
3. The task-driven brain tumor multi-modal MRI image fusion method according to claim 2, characterized in that, in the feature fusion step, fusing the local features and global features through the bidirectional connection feature fusion module includes: Step A1: fusing the local features and global features to obtain intermediate features; Step A2: weighting the intermediate features to obtain the enhanced fused features; Step A3: re-dividing the enhanced fused features into two features along the channel dimension, which are then weighted by convolutional blocks and MLP blocks, respectively, and fed back to the parallel convolutional neural network encoding module and Transformer encoding module to complete feature fusion and obtain a preliminary fused image, the feature dimensions being the width, height, and number of channels of the input multimodal MRI in the feature space. The intermediate features are expressed as: wherein the symbols denote the intermediate features and the feature concatenation operation. The enhanced fused features are expressed as: wherein the symbols denote the enhanced fused features, the element-wise addition operation, and the feature fusion operation. The process of reconstructing and generating the fused image via the decoder includes: Step B1: constructing a decoder, in which the depth of the convolutional neural network is the same as the depth of the convolutional neural network in the generator; Step B2: based on the preliminary fused image, obtaining the output of each convolutional neural network module through the decoder, and then passing it through a 1×1 convolutional layer to obtain the final fused image. The output of each convolutional neural network module is expressed as follows: wherein the symbols denote, respectively, the output of each convolutional neural network module, the upsampling operation, the Resblock module operation, and the input to each convolutional neural network module. The input to each convolutional neural network module is expressed as: wherein the symbols denote the lowest-level fused features obtained by sampling through the U-shaped network and the input to the corresponding encoding module of the convolutional neural network.
4. The task-driven multimodal MRI image fusion method for brain tumors according to claim 1, characterized in that the task-driven step includes:
Step C1: inputting the fused image into the tumor classification discriminator to obtain extracted features;
Step C2: obtaining a tumor classification result based on the extracted features;
Step C3: inputting the fused image into the tumor segmentation discriminator to obtain a segmentation result.
In the expression for the extracted features, the terms denote: a 5×5 convolution; a 3×3 convolution; and the input to the tumor classification discriminator; one subscript indicates that the number of 3×3 convolutions is three, and another subscript indicates that the number of 5×5 convolutions is eight.
In the expression for the tumor classification result, the terms denote: the tumor classification result; a 1×1 convolutional layer; and the max pooling operation.
In the expression for the segmentation result, the terms denote: the segmentation result, whose dimensions are given by the width, height, and number of channels of the mask; the number of convolution kernels; an activation function; and the deep-supervision output of the tumor segmentation discriminator.
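The classification branch in Step C2 of claim 4 combines a 1×1 convolutional layer with max pooling. The following Python sketch shows those two operations on toy data. It assumes that the 1×1 convolution is applied first and that pooling is global; neither the order nor the pooling size is stated in the text above, and `conv1x1`, `global_max_pool`, `classify`, and the example kernel are hypothetical names and values.

```python
def conv1x1(feat, kernel):
    """1x1 convolution: per pixel, each output channel is a weighted sum of input channels.

    feat:   list of C_in channels, each a flat list of pixel values.
    kernel: list of C_out rows, each holding C_in weights.
    """
    n_pixels = len(feat[0])
    return [
        [sum(w * feat[c][p] for c, w in enumerate(row)) for p in range(n_pixels)]
        for row in kernel
    ]


def global_max_pool(feat):
    """Reduce every channel to its maximum response."""
    return [max(channel) for channel in feat]


def classify(feat, kernel):
    """Return (class index, per-class scores) for a feature map."""
    scores = global_max_pool(conv1x1(feat, kernel))
    return scores.index(max(scores)), scores


features = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]   # toy features: 2 channels, 3 pixels
kernel = [[1.0, 0.0], [0.0, 3.0]]                # hypothetical 2-class 1x1 kernel
label, scores = classify(features, kernel)
```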
5. The task-driven multimodal MRI image fusion method for brain tumors according to claim 1, characterized in that, in the joint optimization step, the joint loss includes a generator loss and a discriminator loss.
In the expression for the generator loss, the terms denote: the generator loss; the saliency loss; the texture loss; and the similarity loss.
In the expression for the saliency loss, the terms denote: the index of a feature channel; the total number of channels of the feature; a function; the 1-norm operation; the deepest convolutional feature of the source images in the generator encoder; and the deepest convolutional feature of the fused image in the discriminator.
In the expression for the texture loss, two terms both denote shallow features of the source images in the generator decoder.
In the expression for the similarity loss, the terms denote: the number of labels contained in the mask; the index of the label; the grayscale value of a pixel in the n-th modality and the grayscale value of the same pixel in the fused image; the mean operation; the variance operation; the operation of computing the structural measure; and constants.
The discriminator loss is defined as the segmentation task loss of a fused image having multi-class segmentation labels; in the dimensions of the segmentation label and of the fused image, one symbol denotes the width of the mask and another denotes the height of the mask. In the expression for the segmentation task loss, the terms denote: the segmentation task loss, where the subscript "task" denotes the task; the number of sample images; the total number of categories of the classification task; the category, i.e., the index of the label; the category weight parameter; the x and y coordinates of a pixel; the true value of a pixel for a category; and the predicted probability of that pixel for the category.
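The quantities listed for the segmentation task loss in claim 5 (number of sample images, categories, category weights, pixel coordinates, true values, and predicted probabilities) are those of a class-weighted pixel-wise cross-entropy. The sketch below assumes that form and additionally assumes averaging over pixels and images; the exact normalization in the patent formula is not recoverable from the text, and `weighted_segmentation_loss` and `eps` are hypothetical names.

```python
import math


def weighted_segmentation_loss(targets, probs, class_weights, eps=1e-12):
    """Class-weighted pixel-wise cross-entropy, averaged over pixels and images.

    targets[n][k][y][x] is 1.0 if pixel (x, y) of image n belongs to class k, else 0.0.
    probs[n][k][y][x]   is the predicted probability of class k at that pixel.
    """
    n_images = len(targets)
    total = 0.0
    for target_img, prob_img in zip(targets, probs):
        height = len(target_img[0])
        width = len(target_img[0][0])
        img_loss = 0.0
        for k, weight in enumerate(class_weights):
            for y in range(height):
                for x in range(width):
                    t = target_img[k][y][x]
                    p = prob_img[k][y][x]
                    # eps guards against log(0) for confident wrong predictions.
                    img_loss -= weight * t * math.log(p + eps)
        total += img_loss / (width * height)
    return total / n_images


# One image, two classes, a 1x2 mask; the second class is weighted twice as much.
targets = [[[[1.0, 0.0]], [[0.0, 1.0]]]]
probs = [[[[0.5, 0.5]], [[0.5, 0.5]]]]
loss = weighted_segmentation_loss(targets, probs, class_weights=[1.0, 2.0])
```

Giving a larger weight to a rare class, such as a small tumor sub-region, makes errors on that class contribute more to the loss than errors on the dominant background.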
6. A task-driven multimodal MRI image fusion system for brain tumors, characterized by comprising:
a feature extraction module, which inputs multimodal brain tumor MRI images into the generator and extracts local features and global features of the multimodal brain tumor MRI images through a parallel convolutional neural network encoder and Transformer encoder, respectively;
a feature fusion module, which fuses the local features and the global features through a bidirectional connection feature fusion module and reconstructs the fused image through the decoder;
a task-driven module, which inputs the fused image into the discriminator, the discriminator integrating at least one downstream task recognition network that performs a preset downstream task on the fused image and generates a task performance loss;
a joint optimization module, which optimizes the generator and the discriminator through a joint loss function, thereby generating a fused image suitable for the downstream task.
7. The task-driven multimodal MRI image fusion system for brain tumors according to claim 6, characterized in that, in the feature extraction module, the local features of the multimodal brain tumor MRI images are expressed by a formula whose terms denote, respectively: the output feature of the i-th convolutional neural network encoding module, i.e., the local feature of the multimodal brain tumor MRI images; the down-sampling operation; and the operation of the i-th Resblock module.
The global features of the multimodal brain tumor MRI images are expressed by formulas whose terms denote, respectively: the output feature of the i-th Transformer encoding module, i.e., the global feature of the multimodal brain tumor MRI images; a position rearrangement operation; the number of multi-head attention modules; the Transformer encoding module; and the input feature of the Transformer encoding module.
8. The task-driven multimodal MRI image fusion system for brain tumors according to claim 7, characterized in that, in the feature fusion module, fusing the local features and the global features through the bidirectional connection feature fusion module includes:
Module A1: fusing the local features and the global features in the fusion module to obtain an intermediate feature;
Module A2: weighting the intermediate feature to obtain an enhanced fused feature;
Module A3: splitting the enhanced fused feature again into two features along the channel dimension, which are then weighted by a convolutional block and an MLP block, respectively, and fed back to the parallel convolutional neural network encoding module and Transformer encoding module, thereby completing the feature fusion and obtaining a preliminary fused image; each of the two features lies in a feature space whose dimensions are given in terms of the width, height, and number of channels of the input multimodal MRI images.
In the expression for the intermediate feature, the terms denote the intermediate feature and the feature concatenation operation. In the expression for the enhanced fused feature, the terms denote the enhanced fused feature, the element-wise addition operation, and the feature fusion operation.
Reconstructing and generating the fused image through the decoder includes:
Module B1: constructing a decoder, wherein the depth of the convolutional neural network in the decoder is the same as the depth of the convolutional neural network in the generator;
Module B2: based on the preliminary fused image, obtaining the output of each convolutional neural network module through the decoder, and then passing it through a 1×1 convolutional layer to obtain the final fused image.
In the expression for the output of a convolutional neural network module, the terms denote: the output of the i-th convolutional neural network module; the up-sampling operation; the operation of the Resblock module; and the input to the i-th convolutional neural network module. In the expression for the input to the i-th convolutional neural network module, the terms denote: the lowest-level fused feature obtained by down-sampling through the U-shaped network; and the input to the i-th convolutional neural network encoding module.
9. The task-driven multimodal MRI image fusion system for brain tumors according to claim 6, characterized in that the task-driven module includes:
Module C1: inputting the fused image into the tumor classification discriminator to obtain extracted features;
Module C2: obtaining a tumor classification result based on the extracted features;
Module C3: inputting the fused image into the tumor segmentation discriminator to obtain a segmentation result.
In the expression for the extracted features, the terms denote: a 5×5 convolution; a 3×3 convolution; and the input to the tumor classification discriminator; one subscript indicates that the number of 3×3 convolutions is three, and another subscript indicates that the number of 5×5 convolutions is eight.
In the expression for the tumor classification result, the terms denote: the tumor classification result; a 1×1 convolutional layer; and the max pooling operation.
In the expression for the segmentation result, the terms denote: the segmentation result, whose dimensions are given by the width, height, and number of channels of the mask; the number of convolution kernels; an activation function; and the deep-supervision output of the tumor segmentation discriminator.
10. The task-driven multimodal MRI image fusion system for brain tumors according to claim 6, characterized in that, in the joint optimization module, the joint loss includes a generator loss and a discriminator loss.
In the expression for the generator loss, the terms denote: the generator loss; the saliency loss; the texture loss; and the similarity loss.
In the expression for the saliency loss, the terms denote: the index of a feature channel; the total number of channels of the feature; a function; the 1-norm operation; the deepest convolutional feature of the source images in the generator encoder; and the deepest convolutional feature of the fused image in the discriminator.
In the expression for the texture loss, two terms both denote shallow features of the source images in the generator decoder.
In the expression for the similarity loss, the terms denote: the number of labels contained in the mask; the index of the label; the grayscale value of a pixel in the n-th modality and the grayscale value of the same pixel in the fused image; the mean operation; the variance operation; the operation of computing the structural measure; and constants.
The discriminator loss is defined as the segmentation task loss of a fused image having multi-class segmentation labels; in the dimensions of the segmentation label and of the fused image, one symbol denotes the width of the mask and another denotes the height of the mask. In the expression for the segmentation task loss, the terms denote: the segmentation task loss, where the subscript "task" denotes the task; the number of sample images; the total number of categories of the classification task; the category, i.e., the index of the label; the category weight parameter; the x and y coordinates of a pixel; the true value of a pixel for a category; and the predicted probability of that pixel for the category.
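The similarity loss in claims 5 and 10 is described through a mean operation, a variance operation, a structural measure, and constants, which matches the ingredients of the standard structural similarity index (SSIM). The Python sketch below assumes that the structural measure is SSIM computed globally over an image and that the loss averages 1 − SSIM over the source modalities. It ignores the per-label masking mentioned in the claims, and `ssim`, `similarity_loss`, and the constants `c1` and `c2` (the usual defaults for intensities in [0, 1]) are assumptions rather than values from the patent.

```python
def mean(values):
    """Arithmetic mean of a flat list of pixel values."""
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally sized pixel lists."""
    mu_a, mu_b = mean(a), mean(b)
    return sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / len(a)


def ssim(a, b, c1=1e-4, c2=9e-4):
    """Global SSIM between two equally sized images given as flat pixel lists."""
    mu_a, mu_b = mean(a), mean(b)
    var_a, var_b = covariance(a, a), covariance(b, b)
    cov_ab = covariance(a, b)
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def similarity_loss(modalities, fused):
    """Average of (1 - SSIM) between the fused image and each source modality."""
    return mean([1.0 - ssim(m, fused) for m in modalities])


# Toy grayscale data in [0, 1]: two source modalities and one fused image.
t1 = [0.1, 0.4, 0.8, 0.3]
t2 = [0.2, 0.5, 0.7, 0.1]
fused = [0.15, 0.45, 0.75, 0.2]
loss = similarity_loss([t1, t2], fused)
```

A fused image identical to every source modality gives a loss of zero under this sketch, and the loss grows as the fused image departs from the structure of the sources.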