A medical image cross-modality conversion method based on wavelet decomposition and parallel Schrödinger bridge
By employing wavelet decomposition and a parallel Schrödinger bridge method, the frequency components of MRI and CT images are explicitly processed, solving the problems of structural distortion and texture loss in image conversion in existing technologies. This achieves high-precision cross-modal conversion of medical images, making it suitable for clinical applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG NORMAL UNIV
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing medical image cross-modal conversion technologies fail to explicitly decouple and process different frequency components separately when processing MRI and CT images, resulting in structural distortion and loss of texture details in the converted images, affecting anatomical accuracy and diagnostic value. Furthermore, existing methods have high computational costs and unstable training, making it difficult to meet the needs of real-time clinical processing.
A cross-modal conversion method for medical images based on wavelet decomposition and parallel Schrödinger bridge is adopted. Wavelet decomposition is performed through a feature decomposition module, high-frequency and low-frequency features are processed separately using a parallel diffusion Schrödinger bridge module, and interactive fusion is performed through an attention fusion module to generate the target modality image.
It significantly improves image quality, preserves organ boundary integrity and key anatomical information, and generates highly accurate target modal images, meeting the accuracy and efficiency requirements of clinical applications.
Figure CN122243726A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image modality conversion technology, and in particular to a medical image cross-modality conversion method based on wavelet decomposition and parallel Schrödinger bridge.
Background Technology
[0002] Medical imaging plays a crucial role in clinical diagnosis and treatment planning, with MRI (magnetic resonance imaging) and CT (computed tomography) being two widely used and complementary modalities. Converting MRI images to CT images has significant clinical value in areas such as radiotherapy planning and surgical navigation, avoiding duplicate scans and reducing radiation exposure.
[0003] Existing MRI-to-CT image conversion techniques primarily rely on deep learning models. These include methods based on Generative Adversarial Networks (GANs), such as CycleGAN and Pix2Pix, which learn image style transfer through adversarial training between a generator and a discriminator: CycleGAN uses a cycle consistency loss to achieve bidirectional conversion of unpaired data, while Pix2Pix uses a conditional GAN to process paired data. They also include methods based on diffusion models, such as the Denoising Diffusion Probabilistic Model (DDPM) and Score-Based Generative Models (SGM), which generate images by progressively adding noise (forward process) and then denoising (reverse process).
[0004] The Diffusion Schrödinger Bridge (DSB), an extension of the diffusion model, is based on optimal transport theory and uses the Iterative Proportional Fitting (IPF) algorithm to find the optimal path connecting two distributions, demonstrating excellent performance in image generation tasks. SynDiff and I²SB are specific works applying diffusion and bridge models to image conversion: SynDiff uses an adversarial diffusion model, while I²SB applies the Schrödinger bridge concept to image restoration tasks.
[0005] Despite these advances, existing methods fall short in high-precision tasks such as medical image conversion, because the frequency-domain feature distributions of MRI and CT images differ significantly. CT images are sensitive to hard tissues such as bone and are rich in high-frequency details, while MRI excels at displaying soft tissues and contains more mid- and low-frequency information. Existing methods treat the image as a whole and fail to explicitly decouple and separately process these frequency components, which carry different physical meanings, leading to structural distortion and loss of texture details in the converted image. Because of this frequency ambiguity, existing methods are prone to blurring or distortion when converting complex anatomical structures (such as the boundaries of multiple abdominal organs), impairing the anatomical accuracy of the image and directly affecting its reliability in clinical applications such as precision radiotherapy. Existing models also struggle to preserve key anatomical information from the source image (MRI) during conversion, leaving a gap in normalized mutual information (NMI) between the converted CT image and the real CT image and reducing its diagnostic value. Finally, GAN models are prone to training instability and mode collapse, while traditional diffusion models and DSB methods typically have high computational costs and slow convergence, making them unsuitable for real-time or near-real-time clinical processing.
[0006] Therefore, providing a medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge that overcomes the difficulties of existing technologies is a problem that those skilled in the art urgently need to solve.
Summary of the Invention
[0007] In view of this, the present invention provides a medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge, which can significantly improve the quality and clinical applicability of medical image cross-modal conversion.
[0008] To achieve the above objectives, the present invention adopts the following technical solution: a medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge, including:
S1. Construct a medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge, including a feature decomposition module, a parallel diffusion Schrödinger bridge module, and an attention fusion module;
S2. Input the source modality medical image into the medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge; the feature decomposition module performs wavelet decomposition on the input source modality medical image to obtain the high-frequency and low-frequency features of the source modality medical image;
S3. The parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image;
S4. The attention fusion module performs interactive fusion of the high-frequency and low-frequency features of the target modality image to generate the final target modality image.
[0009] Optionally, in the above method, in S2, the feature decomposition module performs wavelet decomposition on the input source modality medical image, specifically as follows: for an input source modality medical image of a given size, the feature decomposition module applies one-dimensional Haar wavelet transforms along the row and column directions, obtaining one low-frequency component and three high-frequency components in the horizontal, vertical, and diagonal directions. The three high-frequency components are concatenated and, together with the low-frequency component, passed through convolution, activation, and batch normalization to obtain the high-frequency features and the low-frequency features of the source modality medical image. The high-frequency features capture the texture information of the image; the low-frequency features reflect its overall structure and outline.
[0010] Optionally, in the above method, in S3, the parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image, specifically as follows: the parallel diffusion Schrödinger bridge module contains two independent diffusion bridge models, which perform inter-modal probability path mapping on the high-frequency features and the low-frequency features of the source modality medical image, respectively, to obtain the corresponding high-frequency features and low-frequency features of the target modality image.
[0011] Optionally, in the above method, in S4, the attention fusion module interactively fuses the high-frequency and low-frequency features of the target modality image to generate the final target modality image, specifically as follows: the attention fusion module uses a cross-attention mechanism to calculate the attention weights between the high-frequency features and the low-frequency features of the target modality image, and the features are weighted and fused according to these weights to generate the final target modality image.
[0012] As can be seen from the above technical solutions, compared with the prior art, the present invention provides a medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge, which has the following beneficial effects. The present invention outperforms existing methods in pixel accuracy, structural fidelity, information retention, and visual realism, and significantly improves image quality. Through frequency separation and the parallel bridge model, it explicitly processes the structural and textural components of the image separately, generates a highly faithful target modality image, preserves the integrity of organ boundaries, and effectively solves the problems of structural distortion and texture loss. The cross-attention fusion mechanism dynamically and adaptively integrates features from different frequency bands, so that the key anatomical information of the source image is preserved during conversion while tissue texture conforming to the characteristics of the target modality is generated; this significantly improves the clinical diagnostic value of the generated image and realizes the intelligent retention and fusion of key diagnostic information. Using Haar wavelets for frequency decomposition provides a good balance between strong edge detection capability and computational efficiency. While achieving optimal image quality, the present invention maintains reasonable model complexity and computational overhead, and its inference speed can meet the needs of most non-real-time clinical application scenarios, demonstrating good potential for clinical use and wider adoption.
Description of the Drawings
[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0014] Figure 1 is a flowchart of the medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge provided by the present invention;
Figure 2 is a framework diagram of the medical image cross-modal conversion based on wavelet decomposition and parallel Schrödinger bridge provided by the present invention;
Figure 3 is a structural diagram of the feature decomposition module (FDM) in the medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge provided by the present invention;
Figure 4 is a structural diagram of the attention fusion module (AFM) in the medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge provided by the present invention;
Figure 5 is a visual comparison, on four datasets, of the results generated by the method of the present invention and by CycleGAN, Pix2Pix, SynDiff, I²SB, BBDM, SelfRDB, and other methods in a specific embodiment provided by the present invention;
Figure 6 shows the results of the parameter sensitivity analysis on the MMWHS dataset in a specific embodiment of the present invention.
Detailed Description
[0015] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0016] As shown in Figure 1, this invention discloses a medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge, comprising:
S1. Construct a medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge; as shown in Figure 2, the framework includes a Feature Decomposition Module (FDM), a Parallel Diffusion Schrödinger Bridge module (PDSB), and an Attention Fusion Module (AFM);
S2. Input the source modality medical image into the medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge; the feature decomposition module performs wavelet decomposition on the input source modality medical image to obtain the high-frequency and low-frequency features of the source modality medical image;
S3. The parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image;
S4. The attention fusion module performs interactive fusion of the high-frequency and low-frequency features of the target modality image to generate the final target modality image.
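The data flow of steps S2 to S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function `convert`, its parameter names, and the toy stand-ins below are hypothetical, and the real FDM, PDSB, and AFM are learned networks. The sketch only shows that the two frequency branches are mapped by separate models before fusion.

```python
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


def convert(
    source: Array,
    decompose: Callable[[Array], Tuple[Array, Array]],
    bridge_high: Callable[[Array], Array],
    bridge_low: Callable[[Array], Array],
    fuse: Callable[[Array, Array], Array],
) -> Array:
    """Run S2-S4: decompose, map each frequency branch separately, then fuse."""
    high_src, low_src = decompose(source)  # S2: feature decomposition
    high_tgt = bridge_high(high_src)       # S3: high-frequency bridge
    low_tgt = bridge_low(low_src)          # S3: independent low-frequency bridge
    return fuse(high_tgt, low_tgt)         # S4: fusion into the target image


def toy_decompose(img: Array) -> Tuple[Array, Array]:
    """Toy stand-in (assumption): the image mean is 'low', the residual is 'high'."""
    low = np.full_like(img, img.mean())
    return img - low, low


# Toy stand-ins for the two bridges and the fusion step (assumptions).
result = convert(
    np.array([[1.0, 3.0], [5.0, 7.0]]),
    toy_decompose,
    bridge_high=lambda h: 2.0 * h,
    bridge_low=lambda l: l + 1.0,
    fuse=lambda h, l: h + l,
)
```

Passing the modules as callables keeps the two bridges independent of each other, which mirrors the "two independent diffusion bridge models" wording of the description.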
[0017] Furthermore, in S2, the feature decomposition module performs wavelet decomposition on the input source modality medical image, specifically as follows: for an input source modality medical image of a given size, as shown in Figure 3, the feature decomposition module applies one-dimensional Haar wavelet transforms along the row and column directions, obtaining one low-frequency component and three high-frequency components in the horizontal, vertical, and diagonal directions. The three high-frequency components are concatenated and, together with the low-frequency component, passed through convolution, activation, and batch normalization to obtain the high-frequency features and the low-frequency features of the source modality medical image. The high-frequency features capture the texture information of the image; the low-frequency features reflect its overall structure and outline.
[0018] Furthermore, in the feature decomposition module, consider two-dimensional data of a given size. First, a one-dimensional Haar wavelet transform is performed on each row of the two-dimensional data: for the data of one row, the approximation coefficients and the detail coefficients are calculated. In this way, each row is decomposed into a low-frequency and a high-frequency component, and the entire image yields a low-frequency subband and a high-frequency subband in the row direction. Then, a one-dimensional Haar wavelet transform is performed along the column direction on each of these two row-direction subbands, and the corresponding approximation coefficients and detail coefficients are calculated for every column of the low-frequency subband and for every column of the high-frequency subband. Finally, the three high-frequency components are concatenated and, together with the low-frequency component, passed through convolution, activation, and batch normalization to obtain the final high- and low-frequency features.
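As an illustration of the row-then-column procedure, the following NumPy sketch performs one level of a two-dimensional Haar decomposition. The formulas of the original text did not survive extraction, so the orthonormal scaling by 1/√2 used here is an assumption (the standard textbook form), and the names `haar_1d`, `haar_2d`, `ll`, `lh`, `hl`, and `hh` are hypothetical. The subsequent convolution, activation, and batch normalization are not shown.

```python
import numpy as np


def haar_1d(x, axis):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)  # approximation (low-pass) coefficients
    detail = (even - odd) / np.sqrt(2.0)  # detail (high-pass) coefficients
    return np.moveaxis(approx, -1, axis), np.moveaxis(detail, -1, axis)


def haar_2d(image):
    """Transform every row first, then every column of both row subbands."""
    low_rows, high_rows = haar_1d(image, axis=1)  # row direction
    ll, lh = haar_1d(low_rows, axis=0)            # columns of the low subband
    hl, hh = haar_1d(high_rows, axis=0)           # columns of the high subband
    return ll, lh, hl, hh


image = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_2d(image)

# The three high-frequency subbands are stacked, as in the description.
# hh is the diagonal detail; which of lh/hl is called "horizontal" or
# "vertical" depends on the naming convention in use.
high = np.stack([lh, hl, hh], axis=0)
```

With this scaling the transform is orthonormal, so the four subbands together carry exactly the energy of the input, and each subband has half the height and width of the image.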
[0019] Furthermore, in S3, the parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image, specifically: the parallel diffusion Schrödinger bridge module contains two independent diffusion bridge models, which perform inter-modal probability path mapping on the high-frequency features and the low-frequency features of the source modality medical image, respectively, to obtain the corresponding high-frequency features and low-frequency features of the target modality image.
[0020] Furthermore, in the attention fusion module, the input high- and low-frequency features are first layer-normalized, and three learnable weight matrices are then used to linearly map the high- and low-frequency features to their query, key, and value representations.
[0021] The projections are obtained with the three learnable matrices. When calculating the attention weights of the high-frequency and low-frequency features, the query of the other frequency feature is used in the calculation. Convolutional operations are then performed to obtain the final output. Throughout the process, the high- and low-frequency features calculate attention based on their own information and on information about each other, and adjust their weights using information from the other feature. This allows the model to dynamically focus on the valuable parts of the features at different frequencies.
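The cross-attention step can be sketched in NumPy as follows. The expressions of the original text are missing, so this sketch assumes standard scaled dot-product attention, one set of projection matrices shared by both branches, and a final linear projection standing in for the convolution. All names and sizes (`w_q`, `w_k`, `w_v`, `w_out`, `n_tokens`, `dim`) are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token (row) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_tokens, dim = 6, 8  # hypothetical: flattened spatial positions x channels

# Layer-normalized high- and low-frequency features (random stand-ins).
f_high = layer_norm(rng.normal(size=(n_tokens, dim)))
f_low = layer_norm(rng.normal(size=(n_tokens, dim)))

# Three projection matrices (random here, learned in practice).
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
q_high, k_high, v_high = f_high @ w_q, f_high @ w_k, f_high @ w_v
q_low, k_low, v_low = f_low @ w_q, f_low @ w_k, f_low @ w_v

# Each branch's attention weights use the query of the other branch.
attn_high = softmax(q_low @ k_high.T / np.sqrt(dim))
attn_low = softmax(q_high @ k_low.T / np.sqrt(dim))

out_high = attn_high @ v_high
out_low = attn_low @ v_low

# Stand-in for the final convolution: concatenate and project linearly.
w_out = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)
fused = np.concatenate([out_high, out_low], axis=-1) @ w_out
```

Each row of `attn_high` and `attn_low` sums to one, so every output token is a convex combination of value vectors weighted by its relevance to the other frequency branch.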
[0022] Furthermore, in S4, the attention fusion module interactively fuses the high-frequency and low-frequency features of the target modality image to generate the final target modality image, specifically: as shown in Figure 4, the attention fusion module uses a cross-attention mechanism to calculate the attention weights between the high-frequency features and the low-frequency features of the target modality image, and the features are weighted and fused according to these weights to generate the final target modality image.
[0023] In one specific embodiment, a comprehensive evaluation was performed on public datasets covering multiple anatomical regions, including the brain, heart, abdomen, and head and neck. Tables 1, 2, 3, and 4 show the quantitative evaluation results (PSNR, SSIM, NMI, FID) on the four datasets for the method described in this invention (FreDec) and for CycleGAN, Pix2Pix, SynDiff, I²SB, BBDM, and SelfRDB. The best results are indicated in bold, and the second-best results are underlined.
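For readers unfamiliar with the pixel-level metric, the sketch below computes PSNR in its usual form, 10·log10(MAX² / MSE). The function name `psnr` and the intensity range of 1.0 are assumptions; the source does not state how images were normalized before evaluation.

```python
import numpy as np


def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB; `data_range` is the maximum intensity span."""
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


reference = np.zeros((4, 4))
generated = np.full((4, 4), 0.1)  # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, generated)  # about 20 dB
```

Because the scale is logarithmic, a difference of 0.42 dB between two methods corresponds to a mean squared error that is roughly 9% lower for the better method.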
[0024]
[0025] Table 2 Comparison of quantitative performance on the MMWHS (Heart) dataset
[0026] Table 3 Comparison of quantitative performance on the CHAOS (abdomen) dataset
[0027] Table 4 Comparison of quantitative performance on the HaN-Seg (Head and Neck) dataset
[0028] FreDec achieved the best performance on all evaluation metrics across the four datasets. Notably, on the most challenging abdominal (CHAOS) dataset, FreDec improved PSNR by approximately 0.42 dB over the second-best method, SelfRDB, improved NMI by 0.003, and reduced FID by 3.4, demonstrating its ability to handle complex anatomical structures and individual differences. FreDec ranked first in SSIM on all four datasets, reaching 0.922 on the HaN-Seg dataset. This indicates that the frequency decoupling strategy and the parallel DSB design effectively preserve and accurately transform the global anatomical structure of the source images. FreDec achieved the best NMI on all datasets, with a significant advantage on the abdominal and head and neck datasets. This validates that TSDM and BFM can effectively capture and preserve key diagnostic information in MRI and map it to the CT domain. FreDec also achieved the best PSNR on every dataset, indicating that its generated images are closer to real CT images at the pixel level. At the same time, the lowest FID value indicates that the images generated by FreDec are not only of higher quality but also closer to the distribution of real CT images in terms of diversity.
[0029] GAN-based methods (CycleGAN, Pix2Pix) perform reasonably well on SSIM, but perform poorly on PSNR and FID, reflecting pixel distortion and insufficient diversity in the generated images. Diffusion-based methods (SynDiff, SelfRDB) perform better overall, demonstrating the powerful generative capabilities of diffusion models. Bridge-based methods (I²SB, BBDM) are similar to our work, but FreDec surpasses them in all metrics by introducing frequency decoupling and bidirectional fusion, highlighting the advancement of the deep frequency decoupling paradigm.
[0030] As shown in Figure 5, the results generated by this invention are visually compared with those of other methods on four datasets. Each sub-image includes the generated CT image, the input MRI source image, and the real CT target image, with a magnified view of key regions provided in the second row.
(1) Brain (Harvard): the magnified area focuses on the region around the ventricles. The results of CycleGAN and Pix2Pix are obviously blurry, and the boundaries of the ventricles are unclear. I²SB is generally acceptable, but after magnification the edges are distorted and details are lost. SynDiff and SelfRDB are sharper, but the texture is slightly over-smoothed. The image generated by FreDec is closest to the real CT in both overall structure and local texture, with sharp ventricle edges and rich, natural texture in the surrounding tissues.
(2) Heart (MMWHS): the magnified area shows the structure of the heart valves. The results of the GAN-based methods are blurry, and the valve details are almost lost. I²SB and BBDM outline the general shape but lack sharpness. FreDec and SelfRDB both perform well, but FreDec renders the texture details of the valve tips and junctions better and is closer to the real CT.
(3) Abdomen (CHAOS): the magnified area covers the edge of the liver and the spine, an extremely challenging scene. Most baseline models (such as CycleGAN and I²SB) generate structures with severe distortion or blurring, and the liver boundary is irregular. SelfRDB improves on this, but the spine texture is blurred. FreDec preserves the smooth outline of the liver and the clear bone texture of the spine. Although there is still room for improvement in the extremely fine vascular structures, its fidelity far exceeds that of the other methods.
(4) Head and Neck (HaN-Seg): the magnified area shows the mandible and teeth region. All methods performed relatively well on this dataset because the anatomical structure is fixed; however, FreDec showed the highest visual fidelity and sharpness in the fine structure of the teeth, the clarity of the cortical bone, and the transition at the soft tissue-bone junction.
The visualization results consistently demonstrate that FreDec has a significant advantage in maintaining global anatomical consistency and restoring local high-frequency detail textures, and the resulting images have excellent realism and clinical reference value.
[0031] In another specific embodiment, to verify the effectiveness of each core component in the FreDec framework, we conducted a systematic ablation experiment on the MMWHS dataset, and the results are shown in Table 5.
[0032] Comparing Model A (baseline) with Models B, C, and D shows that introducing any form of frequency decomposition (DFT, DCT, Haar wavelet) brings significant performance improvements (PSNR improvement > 2 dB). This indicates that treating the image as a whole has fundamental limitations and that frequency decoupling is effective. Model D (Haar wavelet) significantly outperforms Model B (DFT) and Model C (DCT), with a PSNR increase of approximately 1.1 dB. This verifies the advantage of the wavelet transform in capturing the local spatial-frequency characteristics of medical images; its multi-resolution analysis capability is better suited to anatomical structures. Although DFT and DCT can also decompose frequencies, they are global transforms and cannot simultaneously preserve spatial information, resulting in inferior local structure preservation compared with wavelets. Comparing Model D with Model E, replacing simple feature concatenation with BFM while keeping the Haar decomposition and a single DSB improves PSNR by approximately 0.4 dB, which demonstrates that the proposed cross-attention fusion mechanism integrates structural and textural information more effectively and achieves synergistic enhancement between features. Comparing Model E with the complete FreDec model (Model F), splitting the single DSB into two parallel DSBs that process high-frequency and low-frequency features separately brings further gains (a PSNR improvement of 0.32 dB and an SSIM improvement of 0.031). This confirms the core hypothesis: high-frequency and low-frequency features follow distinct cross-modal mapping rules, so it is crucial to design dedicated nonlinear transformations for each of them.
[0033] Furthermore, we compared the effects of different wavelet basis functions within the complete framework, as shown in Table 6; the Haar wavelet achieved the best balance between performance and computational complexity.
[0034]
[0035] FreDec's performance is affected by several design choices; we conducted a detailed sensitivity study on the MMWHS dataset to guide hyperparameter selection and demonstrate the rationality of the default configuration. The results are shown in Figure 6.
(1) Number of Haar wavelet decomposition levels L: this determines how finely the input image is decomposed into frequency subbands. As shown in Figure 6(a), performance improves as L increases from 1 to 3. A shallow decomposition cannot fully separate global anatomical structure from local texture, which limits the benefit of dedicated processing. At L = 3 the model achieves an optimal balance: the low-frequency components stably encode the overall heart structure, while the high-frequency components capture sufficiently detailed texture. Increasing to L = 4 brings only a marginal gain (PSNR +0.07 dB) but increases the computational cost of the subsequent DSBs by 22%, because they must process a larger number of feature maps. L = 3 is therefore chosen as the default value, providing a favorable trade-off between fidelity and model complexity.
(2) Reconstruction loss weight: this weight balances the diffusion loss of the parallel DSBs against the pixel-level L1 reconstruction loss. Figure 6(b) shows its impact: both too small and too large a weight degrade performance. Too low a weight underestimates the importance of reconstruction, resulting in a high FID despite a reasonable PSNR, which indicates a mismatch in the high-level feature distribution. Too high a weight forces the model to overfit pixel-level matching and suppresses the generation diversity of the diffusion process, resulting in an overly smooth output and a low PSNR. At the selected intermediate value, PSNR, SSIM, and FID all reach their best performance.
(3) Computational efficiency analysis: Table 7 compares the computational cost of FreDec with other diffusion-based methods; all metrics were measured on an NVIDIA RTX 4090 GPU with an image size of 256×256.
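The role of the reconstruction loss weight can be illustrated with a short sketch. The exact training objective is not given in the text; the additive form below, with one diffusion loss per bridge plus a weighted L1 term, is an assumption, and the names `total_loss` and `recon_weight` are hypothetical.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute pixel error between generated and real images."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))


def total_loss(diff_high, diff_low, pred, target, recon_weight):
    """Assumed objective: both bridge losses plus a weighted L1 reconstruction term."""
    return diff_high + diff_low + recon_weight * l1_loss(pred, target)


pred = np.zeros((2, 2))
target = np.ones((2, 2))  # the L1 error is exactly 1.0
loss = total_loss(0.2, 0.3, pred, target, recon_weight=0.5)
```

In this form a weight near zero leaves the diffusion terms to dominate, while a very large weight lets the pixel-wise term dominate, which matches the two failure modes described for Figure 6(b).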
[0036]
[0037] The data reveals FreDec's nuanced efficiency characteristics; its parameter count is slightly higher than the efficient I²SB model, a direct result of using two parallel networks; however, its FLOPs are significantly lower than the large-scale SynDiff model. This indicates that while the parallel architecture increases the number of parameters, the computational cost per forward pass is well controlled. This is because each DSB processes the downsampled frequency components, which reduces the spatial dimension and computational intensity, making it more efficient than processing full-resolution images in a single complex model (such as SynDiff). Therefore, FreDec strikes a balance between the parametric efficiency of I²SB and the high capacity of SynDiff, while achieving the lowest FLOPs among competitive parametric models.
[0038] FreDec achieves a competitive inference time of 0.44 seconds per image; it is more than 36% faster than SynDiff, but slightly slower than the more streamlined I²SB; this is consistent with its architectural position; it carries more parameters for dedicated pathways than I²SB and requires slightly more computation, but its efficient design ensures that it does not approach SynDiff's slower inference; this speed is clinically feasible for non-real-time applications such as preoperative planning.
[0039] The most critical insight is FreDec's superior accuracy-efficiency tradeoff; it delivers significantly higher PSNR than both I²SB and SynDiff; FreDec is well-positioned in the joint space of performance and efficiency; it provides a significant accuracy improvement over I²SB at the cost of a small increase in inference time; compared to SynDiff, it offers both higher accuracy and faster inference; therefore, the modest increase in parameters and computational cost relative to the simplest baseline is directly justified by a disproportionate gain in translation fidelity; FreDec is optimized for scenarios where the highest possible image quality is desired and sub-second inference times are acceptable.
[0040] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually, and each embodiment focuses on its differences from the other embodiments. In particular, for device or system embodiments, since they are basically similar to the method embodiments, the description is relatively brief, and the relevant parts can be found in the description of the method embodiments. The device and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment. Those skilled in the art can understand and implement this without creative effort.
[0041] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for cross-modal conversion of medical images based on wavelet decomposition and parallel Schrödinger bridge, characterized in that it comprises:
S1. Construct a medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge, including a feature decomposition module, a parallel diffusion Schrödinger bridge module, and an attention fusion module;
S2. Input the source modality medical image into the medical image cross-modal conversion framework based on wavelet decomposition and parallel Schrödinger bridge; the feature decomposition module performs wavelet decomposition on the input source modality medical image to obtain the high-frequency and low-frequency features of the source modality medical image;
S3. The parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image;
S4. The attention fusion module performs interactive fusion of the high-frequency and low-frequency features of the target modality image to generate the final target modality image.
2. The medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge according to claim 1, characterized in that, in S2, the feature decomposition module performs wavelet decomposition on the input source modality medical image, specifically as follows: for an input source modality medical image of a given size, the feature decomposition module applies one-dimensional Haar wavelet transforms along the row and column directions, obtaining one low-frequency component and three high-frequency components in the horizontal, vertical, and diagonal directions; the three high-frequency components are concatenated and, together with the low-frequency component, passed through convolution, activation, and batch normalization to obtain the high-frequency features and the low-frequency features of the source modality medical image; the high-frequency features capture the texture information of the image, and the low-frequency features reflect its overall structure and outline.
3. The medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge according to claim 2, characterized in that, in S3, the parallel diffusion Schrödinger bridge module performs cross-modal mapping on the high-frequency and low-frequency features of the source modality medical image to generate the high-frequency and low-frequency features of the target modality image, specifically: the parallel diffusion Schrödinger bridge module contains two independent diffusion bridge models, which perform inter-modal probability path mapping on the high-frequency features and the low-frequency features of the source modality medical image, respectively, to obtain the corresponding high-frequency features and low-frequency features of the target modality image.
4. The medical image cross-modal conversion method based on wavelet decomposition and parallel Schrödinger bridge according to claim 3, characterized in that, in S4, the attention fusion module interactively fuses the high-frequency and low-frequency features of the target modality image to generate the final target modality image, specifically: the attention fusion module uses a cross-attention mechanism to calculate the attention weights between the high-frequency features and the low-frequency features of the target modality image, and the features are weighted and fused according to these weights to generate the final target modality image.