A medical image cross-modality generation method and device

By employing multi-granularity convolution, frequency domain texture separation, and progressive decoding, combined with adversarial learning strategies, a cross-modal medical image generation model was constructed. This approach addresses the issues of lost pathological information and insufficient training samples, thereby improving the generation quality and stability.

CN115661287B · Active · Publication Date: 2026-06-19 · SHENZHEN INST OF ADVANCED TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN INST OF ADVANCED TECH
Filing Date: 2022-10-28
Publication Date: 2026-06-19


Abstract

This invention relates to the field of medical imaging, specifically to a method and apparatus for cross-modal generation of medical images. The invention proposes a multi-granularity convolution method to extract and encode features of pathological information at various scales in source modal medical images. It further proposes a frequency-domain texture separation method that transforms the target modal medical image into the wavelet domain for anisotropic texture separation and then uses multiple generators for multi-frequency parallel generation, thereby improving the generation quality of fine texture details in the target modal image and reducing the training difficulty of each individual generator. It also proposes a progressive decoding method that decodes progressively from high-dimensional features to low-dimensional features, which helps to refine the pathological information of the target modal image stably and gradually, with the aim of solving the problem of missing modalities in medical images.

Description

Technical Field

[0001] This invention relates to the field of medical imaging, and more specifically, to a method and apparatus for generating medical images across modalities.

Background Technology

[0002] In clinical diagnosis, a patient's medical imaging findings are the primary basis on which clinicians make accurate diagnoses. Medical images come in multiple modalities. Because the imaging principles differ, each modality emphasizes different pathological information about the same physiological tissue, and a single-modality image often cannot reflect the complex characteristics of a disease. Comprehensive analysis of multiple modalities, exploiting the characteristics of each and the complementary relationships among them, gives doctors more complete diagnostic information and improves diagnostic accuracy. For example, magnetic resonance imaging (MRI) exploits the fact that the energy released by atomic nuclei attenuates differently in different structural environments within a substance. By applying an external gradient magnetic field and detecting the emitted electromagnetic waves, the location and type of the atomic nuclei that make up the object can be determined, and an image of its internal structure can be drawn. X-ray computed tomography (CT) scans the body in cross-section with X-rays; the transmitted X-rays are received by a detector and converted into digital signals, which a computer then reconstructs into images. Positron emission tomography (PET) measures metabolic changes in brain tissue by injecting the patient with a radioactive tracer labeled with a positron-emitting nuclide and analyzing the metabolic processes in which the tracer participates. MR images provide more detailed soft-tissue imaging, CT images show bone tissue more clearly, and PET images can detect lesions at an early stage by revealing cellular metabolic activity. However, MRI requires a long scan time (approximately 30 minutes) to acquire high-resolution images, during which the patient must remain still; any movement can easily cause ghosting or blurring in the result. CT scans can cause radiation damage to the human body, and the injection of PET radiotracers also carries risks. Therefore, obtaining multiple modalities of medical images simultaneously is clinically challenging and costly.

[0003] A promising solution is to use artificial intelligence methods to synthesize medical images of a new modality computationally from a limited number of real image samples. In recent years, various image synthesis techniques have been proposed and widely applied in the medical field, and their role will grow with advances in hardware and the increasing volume of medical image data. Among these, Generative Adversarial Networks (GANs) have achieved great success in image generation. The adversarial strategy of GANs has inherent semi-supervised learning characteristics, which helps to address the problem of limited training samples in medical imaging. GANs can reconstruct corresponding medical images from Gaussian latent vectors; they are currently among the best-performing generative models, have gradually become a research hotspot in deep learning, and are beginning to be applied in the field of medical imaging.

[0004] One existing technique is a cross-modal medical image synthesis method based on a parallel generative network (Chinese Patent Application No.: CN201911232218.9), which synthesizes MR images from CT images. That method obtains a common vector feature space for the cross-modal medical images through preprocessing and constructs a training dataset for training the synthesizer. During synthesizer training, if the feature difference between the generated samples is within the allowable error range, a synthesized image is output; if it exceeds the allowable error range, two regularization terms are weighted and combined with the synthesizer's original loss function to guide further training and to improve the quality of whole-image synthesis for easily deformed anatomical regions.

[0005] Another medical image reconstruction method (Chinese Patent Application No.: CN202010565805.6) extracts feature codes from real image samples to obtain their feature code vectors, and reconstructs a first image from those vectors using an image reconstruction network. A second image is reconstructed from a first latent vector of the real image sample. An image discrimination network then discriminates between the first and second images, and the image reconstruction network is optimized according to the discrimination results.

[0006] A more recent technique is a deep learning-based method for converting and generating 3D abdominal medical images (Chinese Patent Application No.: 202110442107.1), which performs cross-modal conversion from abdominal MRI images to abdominal CT images. It includes the following steps: Step S1, performing 3D medical registration on the real MRI images and real CT images in an existing training set to obtain registered images; Step S2, inputting the registered images into a 3D deep learning model for image conversion in order to train it, obtaining a trained 3D deep learning model; Step S3, inputting the abdominal MRI image to be converted into the trained 3D deep learning model to obtain the corresponding CT image of the same location.

[0007] In summary, the existing technology has the following drawbacks:

[0008] 1. Existing technologies propose using only one modality of medical images to generate another modality of images, without providing a method for generating target modality medical images by using existing multiple source modalities of medical images for feature fusion encoding. This can easily lead to the loss of some pathological information in the reconstruction results of the target modality images.

[0009] 2. Existing technologies do not sufficiently extract and encode the features of the source modality. Since pathological information in medical images exists at multiple scales, the network structure used in existing technologies cannot fully extract and encode pathological information at each scale, which can easily lead to the loss of pathological information in the final reconstructed target modality image.

[0010] 3. Existing technologies do not fully account for the limited number of training samples available for medical images. Deep learning-based network models require large-scale datasets to train well, yet supervised training methods cannot make full use of unpaired medical images when training cross-modal medical image generation models.

Summary of the Invention

[0011] This invention provides a method and apparatus for generating cross-modal medical images, which at least solves the technical problem of missing modalities in existing medical images.

[0012] According to an embodiment of the present invention, a method for generating medical images across modalities is provided, comprising the following steps:

[0013] Multi-granularity convolution is used to extract and encode features of pathological information at various scales in the source modal medical images to generate target modal medical images.

[0014] A frequency domain texture separation method is used to convert the target modal medical image to the wavelet domain for anisotropic texture separation. Multiple generators then perform multi-frequency parallel generation, and the results are fused into a unified encoding vector.

[0015] A progressive decoding method is used to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features.

[0016] Furthermore, the method also includes:

[0017] A semi-supervised generative model for cross-modal medical images based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using an adversarial strategy is constructed. Using this model, a semi-supervised training method is designed with the semi-supervised characteristics of the adversarial learning strategy, and unpaired data is used for training to improve the model's performance.

[0018] Furthermore, the adversarial training discrimination module consists of a multi-layer convolutional neural network and employs a relative mean discrimination strategy for discrimination.

[0019] Furthermore, using multi-granularity convolution to extract and encode features of pathological information at various scales in the source modal medical images to generate target modal medical images includes:

[0020] A multi-granularity convolutional network structure is used to encode the source modalities of medical images from multiple sources, extracting shared structural and pathological information features among the multiple source modalities, and generating preliminary target modal images.

[0021] Furthermore, using the frequency domain texture separation method to convert the target modal medical image to the wavelet domain for anisotropic texture separation, and then using multiple generators to perform multi-frequency parallel generation and fusing the results into a unified encoding vector, includes:

[0022] The initially generated target modal medical image is converted to the wavelet domain for lossless frequency domain decomposition, extracting the image's contour, horizontal, vertical, and diagonal texture information, which is then fused with the features extracted by multi-granularity convolution to form a unified encoding vector.

[0023] Furthermore, a frequency domain texture separation scheme is adopted to decompose the initially generated target modal medical image into contour, horizontal, vertical, and diagonal texture information.

[0024] Furthermore, multiple sets of feature vectors are fused with the contour, horizontal, vertical, and diagonal texture information of the initially generated target modal image, so that the feature encoding vectors contain the texture information of the initial target modal image.

[0025] Furthermore, using the progressive decoding method to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features includes:

[0026] Multiple generators are used to decode the unified encoding vector from high-dimensional features to low-dimensional features.

[0027] Furthermore, decoding the unified encoding vector from high-dimensional features to low-dimensional features using multiple generators includes:

[0028] In the multi-granularity feature extraction process, high-dimensional features and low-dimensional features are encoded separately, forming multiple sets of feature encoding vectors from high-dimensional to low-dimensional.

[0029] Another embodiment of the present invention provides a medical image cross-modal generation device, comprising:

[0030] The feature extraction and encoding unit is used to extract and encode pathological information at various scales in the source modal medical images using multi-granularity convolution methods to generate target modal medical images.

[0031] The encoding vector fusion and transformation unit is used to convert the target modal medical image to the wavelet domain for anisotropic texture separation using the frequency domain texture separation method, and then to perform multi-frequency parallel generation with multiple generators and fuse the results into a unified encoding vector.

[0032] The progressive decoding unit for encoding vectors is used to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features using a progressive decoding method.

[0033] A storage medium storing program files capable of implementing any of the above-described medical image cross-modal generation methods.

[0034] A processor for running a program, wherein the program executes any of the above-mentioned methods for cross-modal generation of medical images.

[0035] The medical image cross-modal generation method and apparatus of this invention propose a multi-granularity convolution method to extract and encode features of pathological information at various scales in source modal medical images. Furthermore, a frequency domain texture separation method is proposed to transform the target modal medical image into the wavelet domain for anisotropic texture separation, followed by multi-frequency parallel generation using multiple generators. This improves the generation quality of fine texture details in the target modal image and reduces the training difficulty of each individual generator. A progressive decoding method is also proposed, decoding progressively from high-dimensional features to low-dimensional features, which helps to refine the pathological information of the target modal image stably and gradually, with the aim of solving the problem of missing modalities in medical images.

Attached Figure Description

[0036] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:

[0037] Figure 1 is a flowchart of the medical image cross-modal generation method of the present invention;

[0038] Figure 2 is a diagram of the cross-modal medical image generation model of the present invention;

[0039] Figure 3 is a diagram of the multi-granularity convolutional network structure of the present invention;

[0040] Figure 4 is a diagram of the feature encoding vector fusion of the present invention;

[0041] Figure 5 is a schematic diagram of the progressive decoder framework of the present invention;

[0042] Figure 6 is a block diagram of the medical image cross-modal generation device of the present invention.

Detailed Implementation

[0043] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0044] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0045] Example 1

[0046] According to an embodiment of the present invention, a method for cross-modal generation of medical images is provided. Referring to Figure 1, the method includes the following steps:

[0047] S101: Using multi-granularity convolution, feature extraction and encoding of pathological information at various scales in the source modal medical images are performed to generate target modal medical images.

[0048] S102: The target modal medical image is converted to the wavelet domain for anisotropic texture separation using the frequency domain texture separation method. Multiple generators then perform multi-frequency parallel generation, and the results are fused into a unified encoding vector.

[0049] S103: A progressive decoding method is used to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features.

[0050] The medical image cross-modal generation method in this invention proposes a multi-granularity convolution method to extract and encode features at various scales for pathological information in source modal medical images. Furthermore, it proposes a frequency domain texture separation method to transform the target modal medical image into the wavelet domain for anisotropic texture separation, and then uses multiple generators for multi-frequency parallel generation, thereby improving the generation quality of the target modal image in subtle texture details and reducing the training difficulty of a single generator. It also proposes a progressive decoding method, progressively decoding from high-dimensional features to low-dimensional features, which helps to stably and gradually refine the pathological information of the target modal image, aiming to solve the problem of missing modalities in medical images.

[0051] The method further includes:

[0052] A semi-supervised generative model for cross-modal medical images based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using an adversarial strategy is constructed. Using this model, a semi-supervised training method is designed with the semi-supervised characteristics of the adversarial learning strategy, and unpaired data is used for training to improve the model's performance.

[0053] The adversarial training discrimination module consists of a multi-layered convolutional neural network and uses a relative mean discrimination strategy for discrimination.

[0054] Specifically, using multi-granularity convolution to extract and encode features of pathological information at various scales in the source modal medical images to generate target modal medical images includes:

[0055] A multi-granularity convolutional network structure is used to encode the source modalities of medical images from multiple sources, extracting shared structural and pathological information features among the multiple source modalities, and generating preliminary target modal images.

[0056] Using the frequency domain texture separation method to convert the target modal medical image to the wavelet domain for anisotropic texture separation, and then using multiple generators to perform multi-frequency parallel generation and fusing the results into a unified encoding vector, includes:

[0057] The initially generated target modal medical image is converted to the wavelet domain for lossless frequency domain decomposition, extracting the image's contour, horizontal, vertical, and diagonal texture information, which is then fused with the features extracted by multi-granularity convolution to form a unified encoding vector.

[0058] The frequency domain texture separation scheme decomposes the initially generated target modal medical image into contour, horizontal, vertical, and diagonal texture information.

[0059] Multiple sets of feature vectors are then fused with the contour, horizontal, vertical, and diagonal texture information of the initially generated target modal image, so that the feature encoding vectors contain the texture information of the initial target modal image.

[0060] Using the progressive decoding method to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features includes:

[0061] Multiple generators are used to decode the unified encoding vector from high-dimensional features to low-dimensional features.

[0062] Decoding the unified encoding vector from high-dimensional features to low-dimensional features using multiple generators includes:

[0063] In the multi-granularity feature extraction process, high-dimensional features and low-dimensional features are encoded separately, forming multiple sets of feature encoding vectors from high-dimensional to low-dimensional.

[0064] The medical image cross-modal generation method of the present invention will be described in detail below with specific embodiments:

[0065] Because different modalities of medical images emphasize different pathological information about the same physiological tissue, a single-modality image often cannot reflect the complex characteristics of a disease. Comprehensive analysis of multiple modalities, exploiting the features of each and the complementary relationships among them, can therefore provide doctors with more comprehensive diagnostic information and improve the accuracy of disease diagnosis. However, acquiring multiple modalities of medical images for the same patient in clinical practice is costly and carries potential health risks. This invention proposes a cross-modal generation method for medical images based on an adversarial strategy and a two-stage feature encoding and decoding model. The method can generate medical images of another modality from one or more source modalities, with the aim of solving the problem of missing modalities in medical images.

[0066] This invention presents a semi-supervised generation model for cross-modal medical images based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using adversarial strategies. It utilizes the semi-supervised characteristics of adversarial learning strategies to design a semi-supervised training method, which can fully leverage unpaired data for training to improve model performance. This invention proposes a multi-granularity convolution method to extract and encode features at various scales of pathological information in source modal medical images. Furthermore, it proposes a frequency-domain texture separation method to transform the target modal medical image into the wavelet domain for anisotropic texture separation, and then uses multiple generators for multi-frequency parallel generation, thereby improving the generation quality of subtle texture details in the target modal image and reducing the training difficulty of individual generators. This invention also proposes a progressive decoding method, progressively decoding from high-dimensional features to low-dimensional features, which helps to stably and gradually refine the pathological information of the target modal image.

[0067] Specifically, for ease of explanation, this invention uses the generation of PET (target modal medical images) from MRI and CT (source modal images) as an example for illustration.

[0068] In the first stage of this invention, a multi-granularity convolutional network structure encodes multi-granularity features from multiple source modalities of medical images (MRI, CT), extracting the structural and pathological information shared among the source modalities and simultaneously generating a preliminary target modality image (PET). Next, the preliminary PET image is transformed into the wavelet domain for lossless frequency domain decomposition, which extracts the image's contour, horizontal, vertical, and diagonal texture information; this is then fused with the features extracted by the multi-granularity convolution to form a unified encoding vector. The multi-granularity convolution scheme helps to avoid information loss when extracting physiological structures and pathological information of different granularities from the source modalities, and strengthens the pathological feature information of the generated target modality image while keeping the pathological information consistent with the brain structural information. Frequency domain texture separation allows the texture information in each direction to be processed separately and generated in parallel in the target modality image, which addresses the problem of blurred texture details in the final generated image.
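The lossless decomposition into contour, horizontal, vertical, and diagonal components can be illustrated with a minimal pure-Python sketch. It assumes a single-level Haar wavelet, swapped in here only because it is the simplest orthogonal wavelet; it is not the approximately symmetric wavelet named in section 1.2. The function names (`haar_2d`, `inverse_haar_2d`) and the toy 4x4 image are hypothetical, and the image is assumed to have even height and width.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_1d(values):
    """Single-level orthonormal Haar transform of an even-length sequence."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    low = [(a + b) / SQRT2 for a, b in pairs]    # local averages
    high = [(a - b) / SQRT2 for a, b in pairs]   # local differences
    return low, high

def inverse_haar_1d(low, high):
    """Exact inverse of haar_1d."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def transform_columns(matrix):
    """Apply haar_1d to every column and return (low, high) matrices."""
    lows, highs = zip(*(haar_1d(column) for column in transpose(matrix)))
    return transpose(lows), transpose(highs)

def haar_2d(image):
    """1D transform on the rows, then on the columns of each result."""
    row_low, row_high = zip(*(haar_1d(row) for row in image))
    contour, horizontal = transform_columns(row_low)
    vertical, diagonal = transform_columns(row_high)
    return contour, horizontal, vertical, diagonal

def merge_columns(low, high):
    columns = [inverse_haar_1d(l, h)
               for l, h in zip(transpose(low), transpose(high))]
    return transpose(columns)

def inverse_haar_2d(contour, horizontal, vertical, diagonal):
    """Recombine the four subbands into the original image."""
    row_low = merge_columns(contour, horizontal)
    row_high = merge_columns(vertical, diagonal)
    return [inverse_haar_1d(l, h) for l, h in zip(row_low, row_high)]

# Toy image with vertical stripes: intensity changes only from column to column.
image = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
contour, horizontal, vertical, diagonal = haar_2d(image)
restored = inverse_haar_2d(contour, horizontal, vertical, diagonal)
```

In this toy case only the contour and vertical subbands are non-zero, and `restored` matches `image` up to floating-point rounding, which is the sense in which the decomposition is lossless.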

[0069] In the second stage of this invention, a frequency-division parallel progressive decoding structure is used. Four generators decode the feature encoding vectors from the first stage, proceeding from high-dimensional features to low-dimensional features, to generate a refined target modal image (PET). Using multiple generators in parallel enhances the model's ability to capture and generate high-frequency detail textures. The progressive decoding strategy facilitates pixel-level refinement, from high-dimensional contours to low-dimensional features, which addresses the blurriness of generated target modal images.

[0070] This invention employs a two-stage progressive generation method, which helps to gradually improve the quality of the generated target modality images (PET). For training, this invention proposes a semi-supervised method based on an adversarial strategy, which can encode and fuse prior information from multiple unpaired modal images and so helps to overcome the shortage of paired medical images for training. In summary, through a cross-modal medical image generation method based on progressive multi-granularity feature encoding and multi-frequency parallel decoding with an adversarial strategy, this invention mainly addresses the problem that certain modalities of medical images are missing, difficult to acquire, or costly to acquire.

[0071] Given the limited availability of paired medical images, this invention leverages the inherent semi-supervised nature of the adversarial learning strategy to design a semi-supervised training strategy. This allows training on both partially paired and unpaired multimodal medical images, improving the model's generation performance from source modal images to target modal images. In the stage that encodes the shared features of the source modal images (MRI, CT), this invention proposes a multi-granularity convolutional network structure to extract and encode features of physiological structures and pathological information at multiple scales. During multi-granularity feature extraction, high-dimensional and low-dimensional features are encoded separately, forming 18 sets of feature encoding vectors ordered from high to low dimension. A frequency-domain texture separation scheme then decomposes the initially generated target modal medical image (PET) into contour, horizontal, vertical, and diagonal texture information. Finally, the 18 sets of feature vectors are fused with this contour, horizontal, vertical, and diagonal texture information, so that the feature encoding vectors incorporate the texture information of the initial target modal image.

[0072] In the feature decoding stage, a progressive decoding network structure is adopted, and four generators are used to focus on the generation of contour, horizontal texture, vertical texture, and diagonal texture, respectively. Eighteen feature vectors that integrate contour, horizontal, vertical, and diagonal texture information are used as inputs and fed into the generators of the progressive network structure, which decode and refine the generation step by step from the high-dimensional feature vectors.
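The control flow of this decoding stage can be sketched as follows. This is only an illustration of the ordering: the toy "generator" merely adds a scaled average of each feature vector to every coefficient, where a real generator would be a convolutional network. The names `make_toy_generator`, `progressive_decode`, `recombine`, and `decode_image`, the per-band gains, and the 2x2 subband size are all hypothetical, and a single-level Haar inverse stands in for the inverse wavelet transform.

```python
BANDS = ("contour", "horizontal", "vertical", "diagonal")

def make_toy_generator(gain):
    """Return a toy decoding step (a stand-in for one generator layer)."""
    def step(subband, vector, index):
        # index is unused by this toy step; a real generator would use it
        # to select the layer that the vector controls.
        update = gain * sum(vector) / len(vector)
        return [[value + update for value in row] for row in subband]
    return step

def progressive_decode(vectors, subband, step):
    """Apply one decoding step per feature vector, first to last."""
    for index, vector in enumerate(vectors):
        subband = step(subband, vector, index)
    return subband

def recombine(bands):
    """Inverse single-level Haar transform of four equally sized subbands."""
    ll, lh, hl, hh = (bands[name] for name in BANDS)
    rows, cols = len(ll), len(ll[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            a, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            image[2 * r][2 * c] = (a + h + v + d) / 2
            image[2 * r][2 * c + 1] = (a + h - v - d) / 2
            image[2 * r + 1][2 * c] = (a - h + v - d) / 2
            image[2 * r + 1][2 * c + 1] = (a - h - v + d) / 2
    return image

def decode_image(vectors, gains, rows=2, cols=2):
    """Decode each band with its own generator, then recombine the bands."""
    bands = {}
    for name in BANDS:
        start = [[0.0] * cols for _ in range(rows)]
        bands[name] = progressive_decode(
            vectors, start, make_toy_generator(gains[name]))
    return recombine(bands)

# Eighteen toy feature vectors, fed in order from the first to the last.
vectors = [[1.0, 1.0] for _ in range(18)]
flat = decode_image(vectors, {"contour": 1.0, "horizontal": 0.0,
                              "vertical": 0.0, "diagonal": 0.0})
striped = decode_image(vectors, {"contour": 0.0, "horizontal": 0.0,
                                 "vertical": 1.0, "diagonal": 0.0})
```

With only the contour generator active the result is a uniform image, while activating only the vertical-texture generator yields columns of alternating sign, which shows how each generator contributes one directional component before recombination.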

[0073] To train the overall network model, this invention employs a wavelet loss in the wavelet frequency domain, a pixel-wise loss, a perceptual loss (LPIPS), and an adversarial loss based on the adversarial strategy. The discrimination module for adversarial training is a multi-layer convolutional neural network and employs a relative mean discrimination strategy: when a decoded generated image is judged as real, a real image is relatively judged as fake. The relative mean discrimination module jointly considers its judgments of generated and real data, estimating the probability that the input data is more real than the expected value over a random batch of samples of the opposite type. Introducing the relative discrimination strategy helps to stabilize the training of the generative adversarial network.
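How the four loss terms might be combined can be sketched as below. The source does not state the distance measure or the weights, so this sketch assumes a mean absolute difference for both the pixel-wise and wavelet terms, a single-level Haar decomposition for the wavelet domain, and a hypothetical `weights` dictionary. The perceptual and adversarial terms are passed in as precomputed numbers, since computing LPIPS requires a pretrained network.

```python
def haar_subbands(image):
    """Single-level Haar subbands of an even-sized image, as flat lists."""
    bands = {"contour": [], "horizontal": [], "vertical": [], "diagonal": []}
    for r in range(0, len(image), 2):
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            bands["contour"].append((a + b + d + e) / 2)
            bands["horizontal"].append((a + b - d - e) / 2)
            bands["vertical"].append((a - b + d - e) / 2)
            bands["diagonal"].append((a - b - d + e) / 2)
    return bands

def mean_abs_diff(u, v):
    return sum(abs(x - y) for x, y in zip(u, v)) / len(u)

def pixel_loss(generated, real):
    """Mean absolute difference over all pixels (assumed distance)."""
    flat_g = [p for row in generated for p in row]
    flat_r = [p for row in real for p in row]
    return mean_abs_diff(flat_g, flat_r)

def wavelet_loss(generated, real):
    """Sum of per-subband mean absolute differences (assumed distance)."""
    g, r = haar_subbands(generated), haar_subbands(real)
    return sum(mean_abs_diff(g[name], r[name]) for name in g)

def total_loss(generated, real, perceptual, adversarial, weights):
    """Weighted sum of the four terms; the weights are hypothetical."""
    return (weights["wavelet"] * wavelet_loss(generated, real)
            + weights["pixel"] * pixel_loss(generated, real)
            + weights["perceptual"] * perceptual
            + weights["adversarial"] * adversarial)

real = [[1.0, 1.0], [1.0, 1.0]]
generated = [[0.0, 0.0], [0.0, 0.0]]
weights = {"wavelet": 1.0, "pixel": 1.0, "perceptual": 1.0, "adversarial": 1.0}
loss = total_loss(generated, real, perceptual=0.5, adversarial=0.25,
                  weights=weights)
```

Keeping the wavelet term separate from the pixel term lets an error in one directional subband be penalized even when the average pixel error is small.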

[0074] For ease of explanation, this invention uses the generation of PET (target modality medical images) from MRI and CT (source modality images) as an example. The overall framework of this invention is shown in Figure 2 and specifically includes:

[0075] 1. Model Building

[0076] 1.1 Multi-granularity feature encoding module

[0077] Referring to Figure 3, this invention employs a pyramid-shaped multi-granularity convolutional network to extract and encode multi-granularity physiological tissue structure and pathological information from the source modal medical images (MRI and CT images). It uses 18 intermediate layers to represent the high-dimensional and low-dimensional features, encodes them into feature encoding vectors, and then transforms these into vectors of shape ...

[0078] 1.2 Frequency Texture Separation Module

[0079] Medical images are transformed from the image domain to the spatial frequency domain using an approximately symmetric compactly supported orthogonal wavelet transform. The mathematical expression of the one-dimensional wavelet transform model is as follows:

[0080]

[0081] Wherein, the scaling function is:

[0082]

[0083] Wavelet function:

[0084]

[0085] The inner products of the signal with the scaling function and with the wavelet function are the scaling coefficients and the wavelet coefficients, respectively. The two-dimensional wavelet transform is obtained by applying the one-dimensional transform to the rows and to the columns of the two-dimensional data.
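For orientation, the one-dimensional wavelet series expansion in its standard textbook form for an orthonormal wavelet is sketched below. This is an assumption about the general form of the model described in paragraphs [0079] to [0085]; the symbols $f$, $\varphi$, $\psi$, $c$, $d$, and the starting scale $j_0$ are chosen here for illustration and are not taken from this invention.

```latex
% Expansion of a signal f into scaling and wavelet components
f(x) = \sum_{k} c_{j_0}(k)\,\varphi_{j_0,k}(x)
     + \sum_{j=j_0}^{\infty} \sum_{k} d_{j}(k)\,\psi_{j,k}(x)

% Scaling function and wavelet function at scale j and shift k
\varphi_{j,k}(x) = 2^{j/2}\,\varphi\!\left(2^{j}x - k\right), \qquad
\psi_{j,k}(x) = 2^{j/2}\,\psi\!\left(2^{j}x - k\right)

% Scaling coefficients and wavelet coefficients as inner products
c_{j_0}(k) = \langle f,\ \varphi_{j_0,k} \rangle, \qquad
d_{j}(k) = \langle f,\ \psi_{j,k} \rangle
```

Under this form, the scaling coefficients carry the coarse contour of the signal and the wavelet coefficients carry detail at successively finer scales.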

[0086] 1.3 Feature Encoding Vector Fusion Module

[0087] In this invention, the feature encoding vectors and the wavelet-domain anisotropic texture encoding vectors are fused as shown in Figure 4: feature encoding vector fusion is performed by vector concatenation.

[0088] Figure 4 Feature encoding vector fusion

[0089] Here, the 18 feature vectors comprise 3 high-dimensional feature vectors, 4 intermediate-dimensional feature vectors, and 11 low-dimensional feature vectors.
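A minimal sketch of fusion by concatenation is given below. It assumes that each of the 18 feature encoding vectors is concatenated with the texture encoding vector of one sub-band, producing one set of 18 fused vectors per generator; the names `fuse_by_concatenation`, `split_groups`, `GROUP_SIZES` and `SUBBANDS`, the vector lengths, and the high-to-low ordering are hypothetical and are not specified in the source.

```python
from typing import Dict, List

Vector = List[float]

# Group sizes stated in the text: 3 high-, 4 intermediate-, 11 low-dimensional.
GROUP_SIZES = {"high": 3, "mid": 4, "low": 11}
# Sub-band labels: contour (A), horizontal (H), vertical (V), diagonal (D).
SUBBANDS = ("A", "H", "V", "D")


def fuse_by_concatenation(
    feature_vectors: List[Vector], texture_vectors: Dict[str, Vector]
) -> Dict[str, List[Vector]]:
    """Concatenate every feature vector with each sub-band's texture vector.

    Assumption: one fused set of 18 vectors is built per sub-band generator.
    """
    if len(feature_vectors) != sum(GROUP_SIZES.values()):
        raise ValueError("expected 18 feature encoding vectors")
    return {
        band: [list(vec) + list(texture_vectors[band]) for vec in feature_vectors]
        for band in SUBBANDS
    }


def split_groups(vectors: List[Vector]) -> Dict[str, List[Vector]]:
    """Split 18 ordered vectors into the high / mid / low groups."""
    groups: Dict[str, List[Vector]] = {}
    start = 0
    for name, size in GROUP_SIZES.items():
        groups[name] = vectors[start:start + size]
        start += size
    return groups
```

Concatenation keeps the shared features and the texture code side by side without mixing them, so each generator can learn how much weight to give either part.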

[0090] 1.4 Fusion Vector Progressive Decoding Module

[0091] See Figure 5. The decoding module of this invention employs a progressive decoding structure from high-dimensional features to low-dimensional fine-grained features. It uses multiple convolutional network structures to generate, by frequency division and in parallel, the contour information and the horizontal, vertical, and diagonal texture information. The inverse wavelet transform then recombines the frequency-division components to produce the target modal medical image. Each texture generator accepts 18 sets of feature vectors as input, which progressively control the generator in generating the target modal medical image from high to low dimensions.

[0092] 1.5 Relative Discrimination Module

[0093] This invention employs a discrimination module to relatively discriminate the quality of the generated target modal image (PET). This discrimination module is a multi-layer convolutional network structure. Relative discrimination involves using relative probabilities to distinguish between the real PET image and the generated PET image. The specific discrimination method is shown in the following equation:

[0094]

[0095]

[0096] Here, the two image symbols denote the real PET image and the generated PET image, respectively, and the remaining symbol denotes the output of the discrimination module.
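The relative discrimination described above resembles the widely used relativistic average formulation, in which each raw discriminator score is compared against the batch mean score of the opposite class before a sigmoid is applied. The sketch below follows that standard formulation as an assumption; it is not taken from the original equations, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def relativistic_average(
    c_real: List[float], c_fake: List[float]
) -> Tuple[List[float], List[float]]:
    """Return relative probabilities for raw discriminator scores.

    d_real[i]: probability that real sample i is more realistic than the
               average generated sample in the batch.
    d_fake[i]: probability that generated sample i is more realistic than
               the average real sample in the batch.
    """
    mean_real, mean_fake = mean(c_real), mean(c_fake)
    d_real = [sigmoid(c - mean_fake) for c in c_real]
    d_fake = [sigmoid(c - mean_real) for c in c_fake]
    return d_real, d_fake


def discriminator_loss(c_real: List[float], c_fake: List[float]) -> float:
    """Binary cross-entropy on the relative probabilities (assumed form)."""
    d_real, d_fake = relativistic_average(c_real, c_fake)
    eps = 1e-12  # guards against log(0)
    real_term = -mean([math.log(p + eps) for p in d_real])
    fake_term = -mean([math.log(1.0 - p + eps) for p in d_fake])
    return real_term + fake_term
```

Because every score is measured against the opposite class, raising the score of generated images necessarily lowers the relative score of real images, which is the coupling the text describes.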

[0097] 2. Model Training

[0098] 2.1 Data Input

[0099] Step a1: The training data includes a dataset of source modality images (such as MRI and CT); its notation records the modality of each image and the number of training samples.

[0100] Step a2: The training data includes a dataset of paired reference target modal images (such as PET); its notation records the number of training samples.

[0101] Step a3: Input the source modal medical image datasets (MRI and CT) into the multi-granularity feature encoding model to obtain the feature encoding vectors.

[0102] Step a4: Fuse the feature encoding vectors with the frequency-division texture vectors to obtain the fused encoding vectors.

[0103] Step a5: Feed the fused feature encoding vectors, as the control encoding, into the progressive decoding generators to control them in generating the target modal medical image (PET).

[0104] 2.2 Calculate the feature encoding vector

[0105] Step b1: Use a pyramid-shaped multi-granularity convolutional neural network to extract features from the multi-source modal medical images, and transform the features of each intermediate layer of the pyramid network into feature vectors.

[0106] Step b2: The multi-granularity convolutional network generates a preliminary target modal medical image.

[0107] Step b3: Transform the preliminary target modal medical image into the wavelet domain using the wavelet transform, separate the LL, LH, HL, and HH sub-bands (corresponding to contour information A, horizontal texture H, vertical texture V, and diagonal texture D, respectively), and obtain the feature encoding vector of the texture in each direction.

[0108] Step b4: Concatenate the shared feature encoding vectors with the texture feature encoding vectors to obtain the fused feature encoding vectors.
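The sub-band separation of step b3 can be illustrated with a single-level two-dimensional transform. The sketch below substitutes the Haar wavelet for the approximately symmetric compactly supported orthogonal wavelet named in the text, purely to keep the example short; the function names are hypothetical. It assumes an image with even height and width, and it labels as LH the band that is low-pass along rows and high-pass along columns, which is one common convention.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
_S = math.sqrt(2.0)


def _dwt1d(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform of an even-length sequence."""
    low = [(x[i] + x[i + 1]) / _S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / _S for i in range(0, len(x), 2)]
    return low, high


def _idwt1d(low: List[float], high: List[float]) -> List[float]:
    """Inverse of _dwt1d: interleave the reconstructed sample pairs."""
    out: List[float] = []
    for l, h in zip(low, high):
        out += [(l + h) / _S, (l - h) / _S]
    return out


def _transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def dwt2d(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Return (LL, LH, HL, HH); rows are transformed first, then columns."""
    rows = [_dwt1d(r) for r in img]
    lo = _transpose([r[0] for r in rows])  # row low-pass, stored by column
    hi = _transpose([r[1] for r in rows])  # row high-pass, stored by column
    lo_cols = [_dwt1d(c) for c in lo]
    hi_cols = [_dwt1d(c) for c in hi]
    ll = _transpose([c[0] for c in lo_cols])
    lh = _transpose([c[1] for c in lo_cols])
    hl = _transpose([c[0] for c in hi_cols])
    hh = _transpose([c[1] for c in hi_cols])
    return ll, lh, hl, hh


def idwt2d(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Recombine the four sub-bands into the original image."""
    lo = [_idwt1d(a, b) for a, b in zip(_transpose(ll), _transpose(lh))]
    hi = [_idwt1d(a, b) for a, b in zip(_transpose(hl), _transpose(hh))]
    lo, hi = _transpose(lo), _transpose(hi)
    return [_idwt1d(l, h) for l, h in zip(lo, hi)]
```

With this convention, an image made of horizontal stripes produces non-zero coefficients only in LL and LH, and applying `idwt2d` to the output of `dwt2d` recovers the input up to floating-point rounding, which is the lossless property the decomposition relies on.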

[0109] 2.3 Decoding and Generating Target Modal Images

[0110] Step c1: Input the fused feature encoding vectors into the generators to control them, so that the target modal image is generated progressively from high-dimensional features to low-dimensional features.

[0111] Step c2: The generators output the anisotropic texture components of the target modal image.

[0112] Step c3: Recombine the generated anisotropic texture components using the inverse wavelet transform to obtain the final target modal medical image.

[0113] 2.4 Discrimination by the Discrimination Module

[0114] Step d1: The discrimination module discriminates between the generated target modal image and the real target modal image and determines the relative realness of the two images. The calculation formula is as follows:

[0115]

[0116] Here, the two symbols denote the true distribution and the expected distribution, respectively.

[0117] 2.5 Loss Function Calculation and Parameter Optimization

[0118] Step e1: Calculate the global pixel-wise loss, whose symbols denote the true distribution, the expected distribution, and the slack variables.

[0119] Step e2: Calculate the perceptual loss.

[0120] Step e3: Calculate the wavelet loss.

[0121] Step e4: Calculate the overall loss function.
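Steps e1 to e4 can be sketched as a weighted sum of loss terms. The sketch below is an assumption throughout: it uses a mean absolute error for the pixel-wise and wavelet terms, takes the perceptual (LPIPS) and adversarial terms as numbers computed elsewhere, and uses hypothetical unit weights, because the source does not state the exact loss formulas or how the terms are balanced.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def l1_loss(pred: Matrix, target: Matrix) -> float:
    """Mean absolute error over all pixels (or coefficients)."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def wavelet_loss(
    pred_bands: Sequence[Matrix], target_bands: Sequence[Matrix]
) -> float:
    """Average L1 distance over corresponding sub-bands (LL, LH, HL, HH)."""
    per_band = [l1_loss(p, t) for p, t in zip(pred_bands, target_bands)]
    return sum(per_band) / len(per_band)


# Hypothetical weights; the source does not say how the terms are balanced.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "pixel": 1.0,
    "perceptual": 1.0,
    "wavelet": 1.0,
    "adversarial": 1.0,
}


def total_loss(
    terms: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS
) -> float:
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * value for name, value in terms.items())
```

Keeping each term separate until the final sum makes it easy to log the terms individually and to tune the weights when one term dominates training.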

[0122] The key innovations and points of protection of this invention are at least as follows:

[0123] 1. This invention proposes a multi-granularity convolutional structure to extract and encode features of physiological tissue structures and pathological information of multiple granularities in source modal medical images. This helps to extract shared physiological tissue and pathological information in source modal medical images (MRI, CT) more comprehensively and avoids ignoring fine-grained physiological tissue or pathological information.

[0124] 2. This invention proposes to convert the initially generated target medical image (PET) into the wavelet domain for frequency domain texture separation, decomposing it into four sub-bands carrying contour, horizontal, vertical, and diagonal texture information. This allows each sub-band to be processed by a dedicated generator, improves the model's sensitivity to the different textures, and enhances the quality of texture details in the generated target modal image.

[0125] 3. This invention proposes a shared feature coding vector and frequency domain texture coding vector fusion module, which integrates wavelet domain contour, horizontal, vertical and diagonal texture information in the shared feature coding vector, thereby increasing the amount of information in the target modal medical image generation control coding and greatly improving the accuracy of texture details in the target modal medical image.

[0126] 4. This invention proposes a multi-frequency parallel generation mechanism for cross-modal medical image decoding and generation. Multiple generators generate contour information, horizontal texture, vertical texture, and diagonal texture information of the target modality medical image, respectively. This allows each generator to focus on generating specific texture details, decoupling the generators' tasks in generating various texture details of the target modality medical image. Compared to a single generation module, this also simplifies the generation task of a single generator, helps reduce the training difficulty of the generator, and ultimately improves the quality of the generated target modality medical image.

[0127] 5. This invention proposes a progressive decoding mechanism to progressively decode the feature encoding vectors that fuse texture information from high-dimensional features to low-dimensional features. The proposed progressive decoding network structure progressively decodes the shared feature encoding vectors of multiple modal images, which can effectively improve the stability and accuracy of generating target modal medical images and solve the problem of artifacts and noise easily generated by generative adversarial networks.

[0128] 6. This invention proposes a semi-supervised mechanism with adversarial strategy characteristics for cross-modal medical image generation. Given the limited availability of paired multimodal medical images, the semi-supervised mechanism helps to better utilize unpaired multimodal medical images, extract features from each modality, and improve the model's feature extraction capabilities and the generation performance of target modal medical images.

[0129] Compared with the prior art, the present invention has at least the following advantages:

[0130] 1. This invention is a cross-modal medical image generation model based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using adversarial strategies. A semi-supervised training method is designed utilizing the semi-supervised nature of adversarial strategies. Compared to existing technologies, this method can fully utilize unpaired multi-modal medical image data for training, thereby improving model performance.

[0131] 2. This invention proposes a multi-granularity convolution method to extract and encode pathological-information features of different granularities and scales in source modal medical images. Compared with existing technologies, this method can better extract pathological information of various scales in source modal medical images and use it to control the generation of target modal medical images.

[0132] 3. This invention proposes a frequency-domain texture separation mechanism that transforms the target modal medical image into the wavelet domain for frequency-domain texture (contour, horizontal, vertical, and diagonal texture) separation, and then uses multiple generators for multi-frequency parallel generation. Compared with existing methods, this improves the generation quality of the target modal image on minute detail textures and reduces the training difficulty of a single generator.

[0133] 4. This invention proposes a progressive decoding mechanism, performing progressive decoding from high-dimensional features to low-dimensional features. Compared with existing technologies, this helps to stably and progressively refine the pathological information of the target modality image, improving the accuracy of physiological structure and pathological information in the generated medical images.

[0134] The technical solution of the present invention has been tested and proven to be feasible, and it can generate refined target modal images, thus solving the problem of blurred target modal images.

[0135] The cross-modal medical image generation method of this invention can also be applied to image generation tasks in other industry contexts; only the training dataset and the model parameters need to be adjusted during the training phase.

[0136] Example 2

[0137] According to another embodiment of the present invention, a medical image cross-modal generation device is provided (see Figure 6), which includes:

[0138] The feature extraction and encoding unit 201 is used to extract and encode features of various scales of pathological information in the source modal medical image using a multi-granularity convolution method to generate the target modal medical image.

[0139] The encoding vector fusion and transformation unit 202 is used to convert the target modal medical image to the wavelet domain for anisotropic texture separation using the frequency domain texture separation method, and then use multiple generators to perform multi-frequency parallel generation and fuse them into a unified encoding vector.

[0140] The progressive decoding unit 203 is used to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features using a progressive decoding method.

[0141] The medical image cross-modal generation device in this invention proposes a multi-granularity convolution method to extract and encode pathological information at various scales in source modal medical images. Furthermore, it proposes a frequency-domain texture separation method to transform the target modal medical image into the wavelet domain for anisotropic texture separation, and then uses multiple generators for multi-frequency parallel generation, thereby improving the generation quality of the target modal image in minute detail textures and reducing the training difficulty of a single generator. It also proposes a progressive decoding method, progressively decoding from high-dimensional features to low-dimensional features, which helps to stably and gradually refine the pathological information of the target modal image, aiming to solve the problem of missing modalities in medical images.

[0142] The medical image cross-modal generation device of the present invention will be described in detail below with reference to specific embodiments:

[0143] Because different modalities of medical images have different emphases on the pathological information of the prominent response of the same physiological tissue, single-modal medical images often cannot reflect the complex characteristics of diseases. Therefore, comprehensive analysis of multiple modalities of medical images, utilizing the features and complementary relationships of different modalities, can provide doctors with more comprehensive diagnostic information and improve the accuracy of disease diagnosis. However, obtaining multiple modalities of medical images for the same patient in clinical practice presents problems such as high cost and potential health risks. This invention proposes a cross-modal medical image generation device based on an adversarial strategy-based two-stage feature encoding and decoding model, which can generate another modality of medical image from one or more source modalities of medical images, aiming to solve the problem of missing modalities in medical images.

[0144] This invention presents a semi-supervised generation model for cross-modal medical images based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using adversarial strategies. It utilizes the semi-supervised characteristics of adversarial learning strategies to design a semi-supervised training method, which can fully leverage unpaired data for training to improve model performance. This invention proposes a multi-granularity convolution method to extract and encode features at various scales of pathological information in source modal medical images. Furthermore, it proposes a frequency-domain texture separation method to transform the target modal medical image into the wavelet domain for anisotropic texture separation, and then uses multiple generators for multi-frequency parallel generation, thereby improving the generation quality of subtle texture details in the target modal image and reducing the training difficulty of individual generators. This invention also proposes a progressive decoding method, progressively decoding from high-dimensional features to low-dimensional features, which helps to stably and gradually refine the pathological information of the target modal image.

[0145] Specifically, for ease of explanation, this invention uses the generation of PET (target modal medical images) from MRI and CT (source modal images) as an example for illustration.

[0146] In the first stage of this invention, a multi-granularity convolutional network structure is used to encode multi-granularity features from multiple source modalities of medical images (MRI, CT), extracting the structural and pathological information shared among the source modalities and simultaneously generating a preliminary target modality image (PET). Next, the preliminarily generated PET image is transformed into the wavelet domain for lossless frequency domain decomposition, extracting the image's contour, horizontal, vertical, and diagonal texture information, which is then fused with the features extracted by the multi-granularity convolution to form a unified encoding vector. The multi-granularity convolution scheme helps to solve the problem of information loss when extracting physiological structures and pathological information of various granularities from the source modalities, better enhancing the pathological feature information of the generated target modality image while ensuring consistency between pathological and brain structural information. Frequency domain texture separation makes it possible to process the texture information of each direction separately and to generate the texture information of the target modality image in parallel, solving the problem of blurred texture details in the final generated target modality image.

[0147] In the second stage of this invention, a frequency-division parallel progressive decoding structure is used. Four generators decode the feature encoding vectors from the first stage, from high-dimensional features to low-dimensional features, thereby generating a refined target modal image (PET). Using multiple generators in parallel enhances the model's ability to capture and generate high-frequency detail textures. The progressive decoding generation strategy facilitates pixel-by-pixel refinement from high-dimensional contours to low-dimensional features, resolving the issue of blurry generated target modal images.

[0148] This invention employs a two-stage progressive generation method, which helps to gradually improve the quality of generated target modality images (PET). In terms of training, this invention proposes a semi-supervised training method based on adversarial strategies, which can utilize prior information from multiple unpaired modal images for encoding and fusion, helping to solve the problem of insufficient training data samples for paired medical images. In summary, this invention mainly addresses the problems of missing and difficult acquisition of certain modalities in medical images, as well as high costs, through a cross-modal medical image generation method based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using adversarial strategies.

[0149] This invention leverages the inherent semi-supervised nature of adversarial learning strategies, together with the limited availability of paired medical images, to design a semi-supervised training strategy. This allows training on both partially paired and unpaired multimodal medical images, improving the model's generation performance from source modal images to target modal images. In the stage that encodes the shared features of the source modal images (MRI, CT), this invention proposes a multi-granularity convolutional network structure to extract and encode features of physiological structures and pathological information at multiple scales in the source modal medical images. During multi-granularity feature extraction, high-dimensional and low-dimensional features are encoded separately, forming 18 sets of feature encoding vectors from high to low dimensions. A frequency-domain texture separation scheme is then used to decompose the initially generated target modal medical image (PET) into contour, horizontal, vertical, and diagonal texture information. Finally, the 18 sets of feature vectors are fused with the contour, horizontal, vertical, and diagonal texture information of the initially generated target modal image, ensuring that the feature encoding vectors incorporate the texture information of the initial target modal image.

[0150] In the feature decoding stage, a progressive decoding network structure is adopted, and four generators are used to focus on the generation of contour, horizontal texture, vertical texture, and diagonal texture, respectively. Eighteen feature vectors that integrate contour, horizontal, vertical, and diagonal texture information are used as inputs and fed into the generators of the progressive network structure, which decode and refine the generation step by step from the high-dimensional feature vectors.

[0151] For training the overall network model, this invention employs a wavelet loss in the wavelet frequency domain, a pixel-wise loss, a perceptual loss (LPIPS), and an adversarial loss based on an adversarial strategy. The discrimination module for adversarial training is a multi-layer convolutional neural network and uses a relative average discrimination strategy: judging a decoded generated image to be real implies that a real image is, relatively, judged to be fake. The relative average discrimination module jointly considers the discrimination module's outputs for generated and real data, and estimates the probability that the input data is more realistic than the expected value over a random batch of samples of the opposite type. Introducing the relative discrimination strategy helps stabilize the training of the generative adversarial network.

[0152] For ease of explanation, this invention uses the generation of PET (target modality medical images) from MRI and CT (source modality images) as an example. The overall framework of this invention is shown in Figure 2 and specifically includes:

[0153] 1. Model Building

[0154] 1.1 Multi-granularity feature encoding module

[0155] See Figure 3. This invention employs a pyramid-shaped multi-granularity convolutional network to extract and encode multi-granularity physiological tissue structure and pathological information from source modal medical images (MRI and CT images). It uses the outputs of 18 intermediate layers to represent features from high-dimensional to low-dimensional, encodes them into feature encoding vectors, and then transforms them into vectors of shape ... .

[0156] 1.2 Frequency Texture Separation Module

[0157] Medical images are transformed from the image domain to the spatial frequency domain using an approximately symmetric compactly supported orthogonal wavelet transform. The mathematical expression of the one-dimensional wavelet transform model is as follows:

[0158]

[0159] Wherein, the scaling function is:

[0160]

[0161] Wavelet function:

[0162]

[0163] The two inner products are the scaling coefficients and the wavelet coefficients, respectively. The two-dimensional wavelet transform is obtained by applying the one-dimensional transform to the rows and then the columns of the two-dimensional data.

[0164] 1.3 Feature Encoding Vector Fusion Module

[0165] In this invention, the feature encoding vectors and the wavelet-domain anisotropic texture encoding vectors are fused as shown in Figure 4: feature encoding vector fusion is performed by vector concatenation.

[0166] Figure 4 Feature encoding vector fusion

[0167] Here, the 18 feature vectors comprise 3 high-dimensional feature vectors, 4 intermediate-dimensional feature vectors, and 11 low-dimensional feature vectors.

[0168] 1.4 Fusion Vector Progressive Decoding Module

[0169] See Figure 5. The decoding module of this invention employs a progressive decoding structure from high-dimensional features to low-dimensional fine-grained features. It uses multiple convolutional network structures to generate, by frequency division and in parallel, the contour information and the horizontal, vertical, and diagonal texture information. The inverse wavelet transform then recombines the frequency-division components to produce the target modal medical image. Each texture generator accepts 18 sets of feature vectors as input, which progressively control the generator in generating the target modal medical image from high to low dimensions.

[0170] 1.5 Relative Discrimination Module

[0171] This invention employs a discrimination module to relatively discriminate the quality of the generated target modal image (PET). This discrimination module is a multi-layer convolutional network structure. Relative discrimination involves using relative probabilities to distinguish between the real PET image and the generated PET image. The specific discrimination method is shown in the following equation:

[0172]

[0173]

[0174] Here, the two image symbols denote the real PET image and the generated PET image, respectively, and the remaining symbol denotes the output of the discrimination module.

[0175] 2. Model Training

[0176] 2.1 Data Input

[0177] Step a1: The training data includes a dataset of source modality images (such as MRI and CT); its notation records the modality of each image and the number of training samples.

[0178] Step a2: The training data includes a dataset of paired reference target modal images (such as PET); its notation records the number of training samples.

[0179] Step a3: Input the source modal medical image datasets (MRI and CT) into the multi-granularity feature encoding model to obtain the feature encoding vectors.

[0180] Step a4: Fuse the feature encoding vectors with the frequency-division texture vectors to obtain the fused encoding vectors.

[0181] Step a5: Feed the fused feature encoding vectors, as the control encoding, into the progressive decoding generators to control them in generating the target modal medical image (PET).

[0182] 2.2 Calculate the feature encoding vector

[0183] Step b1: Use a pyramid-shaped multi-granularity convolutional neural network to extract features from the multi-source modal medical images, and transform the features of each intermediate layer of the pyramid network into feature vectors.

[0184] Step b2: The multi-granularity convolutional network generates a preliminary target modal medical image.

[0185] Step b3: Transform the preliminary target modal medical image into the wavelet domain using the wavelet transform, separate the LL, LH, HL, and HH sub-bands (corresponding to contour information A, horizontal texture H, vertical texture V, and diagonal texture D, respectively), and obtain the feature encoding vector of the texture in each direction.

[0186] Step b4: Concatenate the shared feature encoding vectors with the texture feature encoding vectors to obtain the fused feature encoding vectors.

[0187] 2.3 Decoding and Generating Target Modal Images

[0188] Step c1: Input the fused feature encoding vectors into the generators to control them, so that the target modal image is generated progressively from high-dimensional features to low-dimensional features.

[0189] Step c2: The generators output the anisotropic texture components of the target modal image.

[0190] Step c3: Recombine the generated anisotropic texture components using the inverse wavelet transform to obtain the final target modal medical image.

[0191] 2.4 Discrimination by the Discrimination Module

[0192] Step d1: The discrimination module discriminates between the generated target modal image and the real target modal image and determines the relative realness of the two images. The calculation formula is as follows:

[0193]

[0194] Here, the two symbols denote the true distribution and the expected distribution, respectively.

[0195] 2.5 Loss Function Calculation and Parameter Optimization

[0196] Step e1: Calculate the global pixel-wise loss, whose symbols denote the true distribution, the expected distribution, and the slack variables.

[0197] Step e2: Calculate the perceptual loss.

[0198] Step e3: Calculate the wavelet loss.

[0199] Step e4: Calculate the overall loss function.

[0200] The key innovations and points of protection of this invention are at least as follows:

[0201] 1. This invention proposes a multi-granularity convolutional structure to extract and encode features of physiological tissue structures and pathological information of multiple granularities in source modal medical images. This helps to extract shared physiological tissue and pathological information in source modal medical images (MRI, CT) more comprehensively and avoids ignoring fine-grained physiological tissue or pathological information.

[0202] 2. This invention proposes to convert the initially generated target medical image (PET) into the wavelet domain for frequency domain texture separation, decomposing it into four sub-bands carrying contour, horizontal, vertical, and diagonal texture information. This allows each sub-band to be processed by a dedicated generator, improves the model's sensitivity to the different textures, and enhances the quality of texture details in the generated target modal image.

[0203] 3. This invention proposes a shared feature coding vector and frequency domain texture coding vector fusion module, which integrates wavelet domain contour, horizontal, vertical and diagonal texture information in the shared feature coding vector, thereby increasing the amount of information in the target modal medical image generation control coding and greatly improving the accuracy of texture details in the target modal medical image.

[0204] 4. This invention proposes a multi-frequency parallel generation mechanism for cross-modal medical image decoding and generation. Multiple generators generate contour information, horizontal texture, vertical texture, and diagonal texture information of the target modality medical image, respectively. This allows each generator to focus on generating specific texture details, decoupling the generators' tasks in generating various texture details of the target modality medical image. Compared to a single generation module, this also simplifies the generation task of a single generator, helps reduce the training difficulty of the generator, and ultimately improves the quality of the generated target modality medical image.

[0205] 5. This invention proposes a progressive decoding mechanism to progressively decode the feature encoding vectors that fuse texture information from high-dimensional features to low-dimensional features. The proposed progressive decoding network structure progressively decodes the shared feature encoding vectors of multiple modal images, which can effectively improve the stability and accuracy of generating target modal medical images and solve the problem of artifacts and noise easily generated by generative adversarial networks.

[0206] 6. This invention proposes a semi-supervised mechanism with adversarial strategy characteristics for cross-modal medical image generation. Given the limited availability of paired multimodal medical images, the semi-supervised mechanism helps to better utilize unpaired multimodal medical images, extract features from each modality, and improve the model's feature extraction performance and the generation performance of target modal medical images.

[0207] Compared with the prior art, the present invention has at least the following advantages:

[0208] 1. This invention is a cross-modal medical image generation model based on progressive multi-granularity feature encoding and multi-frequency parallel decoding using adversarial strategies. A semi-supervised training method is designed utilizing the semi-supervised nature of adversarial strategies. Compared to existing technologies, this method can fully utilize unpaired multi-modal medical image data for training, thereby improving model performance.

[0209] 2. This invention proposes a multi-granularity convolution method to extract and encode pathological-information features of different granularities and scales in source modal medical images. Compared with existing technologies, this method can better extract pathological information of various scales in source modal medical images and use it to control the generation of target modal medical images.

[0210] 3. This invention proposes a frequency-domain texture separation mechanism that transforms the target modal medical image into the wavelet domain for frequency-domain texture (contour, horizontal, vertical, and diagonal texture) separation, and then uses multiple generators for multi-frequency parallel generation. Compared with existing methods, this improves the generation quality of the target modal image on minute detail textures and reduces the training difficulty of a single generator.

[0211] 4. This invention proposes a progressive decoding mechanism, performing progressive decoding from high-dimensional features to low-dimensional features. Compared with existing technologies, this helps to stably and progressively refine the pathological information of the target modality image, improving the accuracy of physiological structure and pathological information in the generated medical images.
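The frequency-domain texture separation described in item 3 can be illustrated with a minimal Python sketch of a single-level two-dimensional wavelet decomposition. The text above does not name a specific wavelet, so the Haar wavelet is assumed here purely for illustration. The function names and the band names `contour`, `horizontal`, `vertical`, and `diagonal` are hypothetical labels for this example, and the assignment of "horizontal" to the top-minus-bottom difference is one common convention, not necessarily the one used by the invention.

```python
def haar_dwt2(image):
    """Single-level 2-D Haar decomposition of an even-sized 2-D list."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    names = ("contour", "horizontal", "vertical", "diagonal")
    bands = {n: [[0.0] * (w // 2) for _ in range(h // 2)] for n in names}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top row of block
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom row of block
            r, s = i // 2, j // 2
            bands["contour"][r][s] = (a + b + c + d) / 2     # low-frequency outline
            bands["horizontal"][r][s] = (a + b - c - d) / 2  # top minus bottom
            bands["vertical"][r][s] = (a - b + c - d) / 2    # left minus right
            bands["diagonal"][r][s] = (a - b - c + d) / 2    # diagonal difference
    return bands


def haar_idwt2(bands):
    """Inverse of haar_dwt2; rebuilds the image from the four sub-bands."""
    ll = bands["contour"]
    h2, w2 = len(ll), len(ll[0])
    image = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for r in range(h2):
        for s in range(w2):
            A = bands["contour"][r][s]
            H = bands["horizontal"][r][s]
            V = bands["vertical"][r][s]
            D = bands["diagonal"][r][s]
            image[2 * r][2 * s] = (A + H + V + D) / 2
            image[2 * r][2 * s + 1] = (A + H - V - D) / 2
            image[2 * r + 1][2 * s] = (A - H + V - D) / 2
            image[2 * r + 1][2 * s + 1] = (A - H - V + D) / 2
    return image


# Toy 4x4 "image" with assumed integer intensities 1..16.
image = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
bands = haar_dwt2(image)
restored = haar_idwt2(bands)
```

In this sketch the four sub-bands together hold exactly as many values as the input, and `haar_idwt2` recovers the toy image from them, which is the sense in which such a decomposition can be called lossless. In a multi-generator setting, each sub-band could be handled by a separate generator before the results are recombined.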

[0212] The technical solution of the present invention has been tested and proven to be feasible, and it can generate refined target modal images, thus solving the problem of blurred target modal images.

[0213] The cross-modal medical image generation method of this invention can also be applied to image generation tasks in other fields. To do so, only the training dataset and the model parameters need to be adjusted during the training phase.

[0214] Example 3

[0215] A storage medium storing program files capable of implementing any of the above-described medical image cross-modal generation methods.

[0216] Example 4

[0217] A processor for running a program, wherein the program executes any of the above-mentioned methods for cross-modal generation of medical images.

[0218] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0219] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0220] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division of units may be a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

[0221] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0222] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0223] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0224] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A medical image cross-modality generation method, characterized in that it includes the following steps: using multi-granularity convolution to extract and encode features of pathological information at various scales in the source modal medical images, so as to generate a target modal medical image; using a frequency domain texture separation method to convert the target modal medical image to the wavelet domain for anisotropic texture separation, then using multiple generators to perform multi-frequency parallel generation, and fusing the results into a unified encoding vector; and using a progressive decoding method to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features; wherein the step of using the frequency domain texture separation method to convert the target modal medical image to the wavelet domain for anisotropic texture separation, then using multiple generators to perform multi-frequency parallel generation, and fusing the results into a unified encoding vector includes: converting the initially generated target modal medical image to the wavelet domain for lossless frequency domain decomposition, extracting the contour, horizontal, vertical, and diagonal texture information of the image, and fusing this information with the features extracted by multi-granularity convolution to transform it into the unified encoding vector; and fusing multiple sets of feature vectors with the contour, horizontal, vertical, and diagonal texture information of the initially generated target modal image, so that the feature encoding vector contains the texture information of the initial target modal image.

2. The medical image cross-modality generation method of claim 1, wherein the method further includes: constructing a semi-supervised generative model for cross-modal medical images based on progressive multi-granularity feature encoding and multi-frequency parallel decoding with an adversarial strategy; and, using this model, designing a semi-supervised training method that exploits the semi-supervised characteristics of the adversarial learning strategy, in which unpaired data is used for training to improve the model's performance.

3. The medical image cross-modality generation method of claim 2, wherein the adversarial training discrimination module consists of a multi-layer convolutional neural network and uses a relative mean discrimination strategy for discrimination.

4. The medical image cross-modality generation method of claim 1, wherein the step of using multi-granularity convolution to extract and encode features of pathological information at various scales in the source modal medical images to generate the target modal medical image includes: using a multi-granularity convolutional network structure to encode medical images of multiple source modalities, extracting the structural and pathological information features shared among the multiple source modalities, and generating a preliminary target modal image.

5. The medical image cross-modality generation method of claim 1, wherein the step of using the progressive decoding method to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features includes: using multiple generators to decode the unified encoding vector from high-dimensional features to low-dimensional features.

6. The medical image cross-modality generation method of claim 5, wherein the step of using multiple generators to decode the unified encoding vector from high-dimensional features to low-dimensional features includes: in the multi-granularity feature extraction process, encoding high-dimensional features and low-dimensional features separately, forming multiple sets of feature encoding vectors from high-dimensional to low-dimensional.

7. A medical image cross-modality generation apparatus, characterized in that it includes: a feature extraction and encoding unit, used to extract and encode features of pathological information at various scales in the source modal medical images using a multi-granularity convolution method, so as to generate a target modal medical image; an encoding vector fusion and transformation unit, used to transform the target modal medical image into the wavelet domain for anisotropic texture separation using a frequency domain texture separation method, then to use multiple generators to perform multi-frequency parallel generation and fuse the results into a unified encoding vector, which specifically includes: transforming the initially generated target modal medical image into the wavelet domain for lossless frequency domain decomposition, extracting the contour, horizontal, vertical, and diagonal texture information of the image, and fusing this information with the features extracted by multi-granularity convolution to transform it into the unified encoding vector; and fusing multiple sets of feature vectors with the contour, horizontal, vertical, and diagonal texture information of the initially generated target modal image, so that the feature encoding vector contains the texture information of the initial target modal image; and a progressive decoding unit, used to progressively decode the unified encoding vector from high-dimensional features to low-dimensional features using a progressive decoding method.