Method, system, device, and medium for synthesizing cross-modal medical images
By combining forward and backward diffusion with linear interpolation and distribution correction, the invention addresses the distribution drift problem in cross-modal medical image synthesis, generating high-fidelity synthesized images and improving their usability and accuracy in clinical applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patent (China)
- Current Assignee / Owner
- ANHUI PROVINCIAL HOSPITAL
- Filing Date
- 2026-02-04
- Publication Date
- 2026-06-16
AI Technical Summary
Existing cross-modal medical image synthesis methods have shortcomings in addressing the distribution drift problem, especially in non-uniform contrast areas such as pathological regions. This leads to intensity distribution deviations in the synthesized images, and the synthesis quality deteriorates under noise interference, making it difficult to meet clinical stability requirements.
By applying linear interpolation with progressive noise addition in the forward diffusion stage, and distribution correction with structural guidance in the backward diffusion stage, where an optimal transport algorithm aligns and corrects the distribution of the intermediate results, a high-fidelity synthesized image is generated.
Significant improvements have been achieved in global distribution, local structure, and noise suppression, enhancing the clinical usability of cross-modal conversion and the accuracy of synthetic images, thus meeting clinical precision requirements.

Figure CN121639839B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image processing technology, and in particular to a method, system, device, and medium for synthesizing cross-modal medical images.
Background Art
[0002] Medical image synthesis refers to the technical process of converting one medical imaging modality (such as MRI-T2) into another (such as MRI-PD or CT). This technology has significant value in clinical workflows, addressing issues such as missing multimodal imaging data, reducing examination costs, and minimizing patient radiation exposure. In recent years, deep learning-based methods, particularly generative adversarial networks and diffusion models, have become the mainstream technical solutions for medical image synthesis. However, existing methods still face the problem of intensity distribution drift in clinical applications. That is, current methods struggle to maintain the global statistical characteristics of the target modality during cross-modal conversion, especially in non-uniform contrast regions such as pathological areas, leading to intensity distribution deviations in the synthesized image. Therefore, there is a need to provide a method, system, device, and medium for cross-modal medical image synthesis.
Summary of the Invention
[0003] This invention provides a method, system, device, and medium for synthesizing cross-modal medical images, to solve the technical problem of distribution drift in images produced by existing cross-modal conversion methods.
[0004] This invention provides a method for synthesizing cross-modal medical images. The method includes: acquiring a source-modality medical image and a target-modality medical image of the same organ; inputting the target-modality and source-modality medical images into the forward diffusion branch of a diffusion model, linearly interpolating between them according to preset time steps, and generating a noisy image for each time step from the interpolation result; and inputting the source-modality medical image and the noisy images of all time steps into the backward diffusion branch of the diffusion model and denoising step by step, starting from the last noisy image, to generate a synthesized medical image. In this process, the i-th noisy image is processed as follows: the source-modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image are input into the backward diffusion branch to generate the intermediate result for the current noisy image; the intermediate result is distribution-aligned using an optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected according to that information to obtain the corrected intermediate result. For the last noisy image, the corrected intermediate result of its corresponding (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthesized image.
[0005] In one embodiment of the present invention, the step of acquiring source modal medical images and target modal medical images of the same organ includes: acquiring initial source modal medical images and initial target modal medical images of the same organ; performing spatial registration and intensity normalization processing on the initial source modal medical images and the initial target modal medical images to obtain standardized source modal medical images and target modal medical images.
[0006] In one embodiment of the present invention, the method further includes: inputting the source-modality medical image, the noisy image sequence, and a noise mask into the backward diffusion branch of the diffusion model, and denoising step by step, starting from the last noisy image, to generate a synthesized medical image, wherein the noise mask is generated by performing local signal-to-noise ratio detection on the source-modality medical image.
[0007] In one embodiment of the present invention, the step of inputting the source-modality medical image, the noisy image sequence, and the noise mask into the backward diffusion branch of the diffusion model and denoising step by step, starting from the last noisy image, to generate a synthesized medical image includes processing the i-th noisy image as follows: inputting the source-modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image into the backward diffusion branch to generate an intermediate result for the current noisy image; performing distribution alignment on the intermediate result based on the optimal transport algorithm and generating distribution correction information from the alignment result; and correcting the intermediate result according to the distribution correction information, under the constraint of the noise mask, to obtain a corrected intermediate result.
[0008] In one embodiment of the present invention, the step of correcting the intermediate result based on the distribution correction information under the constraint of the noise mask to obtain a corrected intermediate result includes: performing distribution consistency correction on the intermediate result according to the distribution correction information to obtain a first correction result; calculating the structural differences between the first correction result and the source modality medical image along the horizontal direction, vertical direction, and two diagonal directions at multiple preset resolution levels to obtain a structural change map for characterizing structural changes in different directions; performing weighted attenuation processing on the structural difference values corresponding to the noise regions in the structural change map based on the noise regions indicated by the noise mask to obtain a weighted structural change map; and performing structural correction on the first correction result based on the weighted structural change map to obtain the final corrected intermediate result.
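A minimal NumPy sketch of the structural change map in [0008], under stated assumptions: directional structure is taken as signed first differences along the horizontal, vertical, and two diagonal directions; the "multiple preset resolution levels" are produced by 2x2 average pooling, and the per-level maps are averaged after nearest-neighbour upsampling; noise-region attenuation multiplies the map by a hypothetical factor `keep` inside the mask. The function names and the value 0.3 are illustrative rather than taken from the patent, and the final structural correction step is omitted because its exact form is not specified here.

```python
import numpy as np


def directional_diffs(img):
    """Signed first differences, zero-padded to the input shape -> (4, H, W)."""
    out = np.zeros((4,) + img.shape)
    out[0, :, 1:] = img[:, 1:] - img[:, :-1]        # horizontal
    out[1, 1:, :] = img[1:, :] - img[:-1, :]        # vertical
    out[2, 1:, 1:] = img[1:, 1:] - img[:-1, :-1]    # main diagonal
    out[3, 1:, :-1] = img[1:, :-1] - img[:-1, 1:]   # anti-diagonal
    return out


def avg_pool2(img):
    """2x2 average pooling; assumes both sides are even."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def structural_change_map(corrected, source, levels=3):
    """Average over resolution levels of |D_d(corrected) - D_d(source)|.

    Each level's map is upsampled back to full resolution, so the input
    side lengths must be divisible by 2 ** (levels - 1).
    """
    total = np.zeros((4,) + corrected.shape)
    x, s = corrected.astype(float), source.astype(float)
    for level in range(levels):
        diff = np.abs(directional_diffs(x) - directional_diffs(s))
        factor = 2 ** level
        # nearest-neighbour upsampling of each directional map
        total += np.stack([np.kron(d, np.ones((factor, factor))) for d in diff])
        if level < levels - 1:
            x, s = avg_pool2(x), avg_pool2(s)
    return total / levels


def attenuate_noise_regions(change_map, noise_mask, keep=0.3):
    """Scale change values inside the noise mask by `keep` (hypothetical value)."""
    weight = np.where(noise_mask, keep, 1.0)
    return change_map * weight[None, :, :]
```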
[0009] In one embodiment of the present invention, the step of performing distribution consistency correction on the intermediate result according to the distribution correction information to obtain a first correction result includes: performing distribution consistency correction on the intermediate result according to the distribution correction information to obtain an initial first correction result; dynamically adjusting the gradient update magnitude for the i-th noisy image according to the difference between the initial first correction result and its posterior estimate; and performing a gradient update on the initial first correction result based on that difference to obtain the final first correction result, wherein the posterior estimate is generated from the i-th noisy image by the backward diffusion branch.
[0010] In one embodiment of the present invention, after generating the synthetic medical image, the method further includes: performing wavelet decomposition on the synthetic medical image and the source modality medical image respectively, and extracting synthetic global information and synthetic local information from the synthetic medical image and source modality global information and source modality local information from the source modality medical image respectively; performing modal appearance consistency constraint fusion on the synthetic global information and the source modality global information to obtain fused global information; constructing a saliency map based on the synthetic local information and the source modality local information; performing adaptive weight fusion on the synthetic local information and the source modality local information based on the saliency map to obtain fused local information; performing inverse wavelet transform reconstruction on the fused local information and the fused global information to obtain a wavelet fused image; and performing anatomical structure correction on the wavelet fused image based on the source modality medical image to obtain the final synthetic medical image.
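The wavelet post-fusion in [0010] can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it uses a single-level orthonormal Haar transform written directly in NumPy, models the appearance-consistency constraint on the low-frequency (global) band as a blend that stays close to the synthetic image after rescaling the source band to the synthetic band's mean and standard deviation, and uses coefficient magnitude as the saliency measure for the detail (local) bands. `global_weight` and all helper names are hypothetical, and the final anatomical-structure correction is not shown.

```python
import numpy as np


def haar2(img):
    """Single-level orthonormal 2-D Haar transform -> (LL, (D1, D2, D3)).

    Assumes even height and width.
    """
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2
    d1 = (a - b + c - d) / 2
    d2 = (a + b - c - d) / 2
    d3 = (a - b - c + d) / 2
    return ll, (d1, d2, d3)


def ihaar2(ll, details):
    """Exact inverse of haar2."""
    d1, d2, d3 = details
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (ll + d1 + d2 + d3) / 2
    out[0::2, 1::2] = (ll - d1 + d2 - d3) / 2
    out[1::2, 0::2] = (ll + d1 - d2 - d3) / 2
    out[1::2, 1::2] = (ll - d1 - d2 + d3) / 2
    return out


def match_stats(band, ref):
    """Rescale `band` to the mean and standard deviation of `ref`."""
    return (band - band.mean()) / (band.std() + 1e-8) * ref.std() + ref.mean()


def wavelet_fuse(synth, source, global_weight=0.8, eps=1e-8):
    ll_y, det_y = haar2(synth.astype(float))
    ll_s, det_s = haar2(source.astype(float))
    # Global band: stay close to the synthetic (target-modality) appearance.
    ll_f = global_weight * ll_y + (1 - global_weight) * match_stats(ll_s, ll_y)
    fused = []
    for dy, ds in zip(det_y, det_s):
        sal_y, sal_s = np.abs(dy), np.abs(ds)          # saliency = magnitude
        w = (sal_y + eps) / (sal_y + sal_s + 2 * eps)  # adaptive weight in (0, 1)
        fused.append(w * dy + (1 - w) * ds)
    return ihaar2(ll_f, tuple(fused))
```

Fusing an image with itself returns (numerically) the same image, which is a quick check that the transform pair and the weighting are consistent.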
[0011] This invention also provides a cross-modal medical image synthesis system, comprising: an image acquisition module for acquiring a source-modality medical image and a target-modality medical image of the same organ; a forward diffusion module for inputting the target-modality and source-modality medical images into the forward diffusion branch of a diffusion model, linearly interpolating between them according to preset time steps, and generating a noisy image for each time step from the interpolation result; and a backward diffusion module for inputting the source-modality medical image and the noisy images of all time steps into the backward diffusion branch of the diffusion model and denoising step by step, starting from the last noisy image, as follows: for the i-th noisy image, the source-modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image are input into the backward diffusion branch to generate an intermediate result for the current noisy image; the intermediate result is distribution-aligned based on the optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected according to that information to obtain a corrected intermediate result. For the last noisy image, the corrected intermediate result of its corresponding (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthesized image.
[0012] The present invention also provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the electronic device performs the method for synthesizing cross-modal medical images described above.
[0013] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a computer processor, causes the computer to perform any of the above-described methods for synthesizing cross-modal medical images.
[0014] The beneficial effects of this invention are as follows. The present invention proposes a method, system, device, and medium for synthesizing cross-modal medical images. In the forward diffusion stage, the source and target modality images are linearly interpolated and noise is progressively added, creating a continuous mapping between modality appearance and anatomical structure and thereby mitigating differences between the modality distributions. In the backward diffusion stage, noise is progressively removed under the joint guidance of the source modality image and the corrected result of the previous time step, maintaining structural consistency and inference stability throughout generation. Furthermore, a distribution alignment mechanism based on optimal transport corrects the statistical characteristics of the intermediate result at each time step, so that the generated image progressively approaches the intensity distribution of the target modality. Through this progressive correction and structural guidance, a synthesized medical image is obtained that is significantly superior to existing technologies in global distribution, local structure, and noise suppression. By combining forward diffusion with conditional backward diffusion, this invention achieves high-fidelity cross-modal medical image synthesis and improves the clinical usability of cross-modal conversion.
Brief Description of the Drawings
[0015] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0016] In the accompanying drawings:
[0017] Figure 1 is a schematic flowchart of a method for synthesizing cross-modal medical images according to an embodiment of the present invention;
[0018] Figure 2 is a structural block diagram of a cross-modal medical image synthesis system according to an embodiment of the present invention;
[0019] Figure 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
[0020] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments. Various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. In the absence of conflict, the following embodiments and features in the embodiments can be combined with each other.
[0021] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. The drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0022] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.
[0023] The processing of user personal information, including its collection, storage, use, processing, transmission, provision, and disclosure, in the technical solution of this invention complies with relevant laws and regulations and does not violate public order and good morals.
[0024] Research has revealed that although deep learning-based medical image synthesis technology has made significant progress in recent years, enabling applications such as multimodal image complementation, examination protocol optimization, and training data augmentation—for example, reducing the need for multiple examinations for patients, using conventional MRI sequences to synthesize images with specific contrast to reduce costs, and providing more paired data for model training—these methods still have significant drawbacks in real-world clinical settings. First, existing diffusion methods generally employ a static "one-time estimation" intermediate sample strategy, lacking a dynamic distribution calibration mechanism. This leads to deviations in the global grayscale distribution of the generated results from the target modality, particularly noticeable in areas with uneven contrast, such as pathological regions. Second, due to insufficient noise robustness, the synthesis quality significantly degrades when the input image contains noise interference (such as low-field MRI). Even with a noise standard deviation σ = 15, the Dice coefficient can decrease by as much as 30%–40%, failing to meet clinical stability requirements. Furthermore, in high-resolution 3D image synthesis, traditional diffusion models require numerous iterative steps (typically 50–100 steps), resulting in excessively long inference times and making them difficult to integrate into actual clinical workflows.
[0025] On the other hand, current medical image synthesis technology also has significant shortcomings in terms of clinical adaptability: First, clinical scenarios require extremely high structural accuracy; for example, tumor boundary errors need to be controlled within 1 mm, while existing methods struggle to guarantee sufficient anatomical fidelity. Second, its adaptability to complex situations (including lesion variations, image noise, and non-standard acquisition conditions) is limited, resulting in significant fluctuations in generated quality. Third, existing technologies are mostly still in the research prototype stage, lacking end-to-end solutions that can be directly embedded into clinical systems, including customized synthesis strategies for different departments or tasks, and interpretability analysis and quality assessment mechanisms for the synthesis results, making it difficult to promote and apply them in real-world medical scenarios.
[0026] To address the aforementioned problems, this invention provides a method for synthesizing cross-modal medical images. In the forward diffusion stage, the source and target modality images are linearly interpolated with progressive noise addition, creating a continuous mapping between modality appearance and anatomical structure and mitigating differences between the modality distributions. In the backward diffusion stage, noise is progressively removed under the joint guidance of the source modality image and the corrected result from the previous time step, maintaining structural consistency and inference stability during generation. Furthermore, a distribution alignment mechanism based on optimal transport corrects the statistical characteristics of the intermediate result at each time step, so that the generated image progressively approaches the intensity distribution of the target modality. Through this progressive correction and structural guidance, the resulting synthesized medical image significantly outperforms existing technologies in global distribution, local structure, and noise suppression. By combining forward diffusion with conditional backward diffusion, this invention achieves high-fidelity cross-modal medical image synthesis and improves the clinical usability of cross-modal conversion.
[0027] As shown in Figure 1, the method for synthesizing cross-modal medical images includes the following steps:
[0028] S100: Acquire source modal medical images and target modal medical images of the same organ.
[0029] Source modality medical images refer to images that provide anatomical structural information, used to provide prior structural information in subsequent generation processes, such as MRI images with high soft tissue contrast or CT images with clear bony structures. Target modality medical images refer to images with a specific intensity distribution and imaging style, used to provide intensity distribution information and imaging style reference for the target modality in subsequent diffusion models, such as MRI images with a certain contrast. It should be noted that both source and target modality medical images can be any clinically common type; there are no specific limitations, as long as the images correspond to the same organ structure of the same patient.
[0030] In an optional embodiment of the present invention, step S100 includes the following process: acquiring initial source modal medical images and initial target modal medical images of the same organ; performing spatial registration and intensity normalization processing on the initial source modal medical images and the initial target modal medical images to obtain standardized source modal medical images and target modal medical images.
[0031] To eliminate differences in spatial position, local deformation, and intensity distribution between medical images of different modalities, this invention performs spatial registration and intensity normalization separately on the acquired initial target-modality medical image $I_T^{0}$ and initial source-modality medical image $I_S^{0}$ before they are input into the diffusion model, producing standardized images that can be fed directly into the diffusion model. Specifically, a deep-learning-based non-rigid registration model is used: $I_T^{0}$ and $I_S^{0}$ are input into a pre-trained registration network, and the deformation field $\phi$ generated by the network is used to correct local deformation in $I_S^{0}$. The registration network may be based on the ... architecture (a 3D registration network) or on a similar deep-learning non-rigid registration model, without specific limitation. The source-modality medical image spatially aligned with the target modality is obtained from the deformation field, as shown in formula (1):
[0032] $$I_S^{r} = \mathcal{T}_{\phi}\left(I_S^{0}\right) \qquad (1)$$
[0033] where $I_S^{r}$ is the spatially aligned source-modality medical image and $\mathcal{T}_{\phi}(\cdot)$ denotes the spatial transformation of $I_S^{0}$ by the deformation field $\phi$. After this processing, the spatial coordinates of $I_S^{r}$ are consistent with those of $I_T^{0}$. Furthermore, to address the large differences in intensity distribution between modalities, this invention applies adaptive histogram equalization separately to $I_T^{0}$ and $I_S^{r}$ for contrast enhancement, yielding an enhanced target-modality image and an enhanced source-modality image. Specifically, taking $I_T^{0}$ and $I_S^{r}$ as input images, each image is divided into P local regions (pixel blocks) of a preset size (e.g., ...); for each local region, a local histogram and its cumulative distribution function (CDF) are computed, and the image is mapped as shown in formula (2):
[0034] $$g(x, y) = L_{\max} \cdot \mathrm{CDF}_p\big(f(x, y)\big) \qquad (2)$$
[0035] where $f(x, y)$ is the gray value of the image at pixel coordinate $(x, y)$, $\mathrm{CDF}_p(\cdot)$ is the cumulative distribution function of the local region $p$ containing that pixel, $L_{\max}$ is the preset maximum gray level (e.g., 255), and $g(x, y)$ is the gray value at $(x, y)$ after equalization.
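Formula (2) can be illustrated with a tile-wise histogram equalization in NumPy. The sketch assumes 8-bit input ($L_{\max}$ = 255), a hypothetical tile size, and models the contrast-limit threshold of [0036] with the common CLAHE convention of clipping each histogram bin at `clip_limit` times the mean bin count and redistributing the excess uniformly. Unlike full CLAHE, tiles are mapped independently, without bilinear blending between neighbouring tiles.

```python
import numpy as np


def tile_equalize(img, tile=8, levels=256, clip_limit=None):
    """Tile-wise equalization g(x, y) = L_max * CDF_p(f(x, y)).

    img: 2-D integer array with values in [0, levels - 1].
    tile: hypothetical tile side length.
    clip_limit: optional CLAHE-style limit, as a multiple of the mean bin
        count; the clipped excess is spread evenly over all bins.
    """
    out = np.empty_like(img)
    l_max = levels - 1
    h, w = img.shape
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            block = img[r:r + tile, c:c + tile]
            hist = np.bincount(block.ravel(), minlength=levels).astype(float)
            if clip_limit is not None:
                limit = clip_limit * block.size / levels
                excess = np.clip(hist - limit, 0, None).sum()
                hist = np.minimum(hist, limit) + excess / levels
            cdf = np.cumsum(hist) / hist.sum()   # normalized CDF in (0, 1]
            mapped = np.round(l_max * cdf[block])
            out[r:r + tile, c:c + tile] = mapped.astype(img.dtype)
    return out
```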
[0036] Furthermore, to avoid amplifying noise in low signal-to-noise-ratio regions through contrast enhancement, this invention also performs intensity normalization on the enhanced target-modality and source-modality images according to a preset contrast-limit threshold (e.g., 2.0) to suppress over-enhancement of local regions. Through the above spatial registration and intensity normalization, the standardized target-modality medical image $\hat{I}_T$ and the standardized source-modality medical image $\hat{I}_S$ are obtained as the final inputs to the diffusion model, as shown in formulas (3) and (4), respectively:
[0037] $$\hat{I}_T = \mathrm{Norm}\left(I_T^{0}\right) \qquad (3)$$
[0038] $$\hat{I}_S = \mathrm{Norm}\left(I_S^{0}\right) \qquad (4)$$
[0039] where $I_T^{0}$ and $I_S^{0}$ are the initial target-modality and source-modality images and $\mathrm{Norm}(\cdot)$ denotes the spatial registration and intensity normalization operations. The preprocessing is fully automated, with an average processing time of less than 2 minutes, meeting clinical real-time requirements. Note that the registration network is pre-trained; during training, it is optimized by minimizing the loss function shown in formula (5):
[0040] $$\mathcal{L}_{reg} = \mathcal{L}_{sim}\left(I_S^{r}, I_T^{0}\right) + \lambda\,\mathcal{L}_{smooth}(\phi) \qquad (5)$$
[0041] where $\mathcal{L}_{sim}$ is the image similarity loss, which measures the structural consistency between the spatially aligned source-modality image $I_S^{r}$ and the initial target-modality image $I_T^{0}$ and can be calculated using mutual information or mean squared error; $\mathcal{L}_{smooth}(\phi)$ is a smoothness constraint on the deformation field $\phi$ that ensures the continuity of local deformation; and $\lambda$ is a preset balance factor (e.g., 0.01).
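Formulas (1) and (5) can be sketched for a 2-D image with a dense displacement field. Assumptions: the deformation field $\phi$ is represented as per-pixel row and column offsets, warping uses bilinear sampling with border clamping, the similarity term is mean squared error (one of the two options named above), and the smoothness term is the mean squared forward difference of the field. The registration network itself is not modeled, and the function names are hypothetical.

```python
import numpy as np


def warp_bilinear(moving, disp):
    """Warp a 2-D image by a dense displacement field (sketch of formula (1)).

    disp has shape (2, H, W): disp[0] holds row offsets, disp[1] column
    offsets. Output pixel (y, x) samples `moving` at (y + dy, x + dx)
    bilinearly, with coordinates clamped to the image border.
    """
    h, w = moving.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(yy + disp[0], 0, h - 1)
    x = np.clip(xx + disp[1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * moving[y0, x0] + wx * moving[y0, x1]
    bottom = (1 - wx) * moving[y1, x0] + wx * moving[y1, x1]
    return (1 - wy) * top + wy * bottom


def smoothness(disp):
    """Mean squared forward difference of the displacement field."""
    return (np.diff(disp, axis=1) ** 2).mean() + (np.diff(disp, axis=2) ** 2).mean()


def registration_loss(moving, fixed, disp, lam=0.01):
    """L = L_sim(warped, fixed) + lam * L_smooth(disp), with MSE as L_sim."""
    warped = warp_bilinear(moving, disp)
    return np.mean((warped - fixed) ** 2) + lam * smoothness(disp)
```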
[0042] S200. Input the target modal medical image and the source modal medical image together into the forward diffusion branch of the diffusion model. Perform linear interpolation on the target modal medical image and the source modal medical image together according to a preset time step. Generate a noisy image for the corresponding time step based on the interpolation result.
[0043] After spatial registration and intensity normalization of the initial source-modality and target-modality images, the standardized source-modality image $\hat{I}_S$ and the standardized target-modality image $\hat{I}_T$ are obtained, and both are input into the forward diffusion branch of the diffusion model. During forward diffusion, multiple time steps are preset (e.g., time steps 1 to 10). Following a preset monotonically increasing noise schedule, at each time step $\hat{I}_S$ and $\hat{I}_T$ are linearly interpolated using time-dependent weighting coefficients, and random noise matching the noise scheduling parameter of the current time step is progressively added to the interpolation result. The target modality thus gradually forms noisy images with different noise intensities while remaining correlated with the source modality. The combination of linear interpolation and incremental noise addition progressively preserves and fuses cross-modal multi-scale features. The noisy images obtained at each time step are arranged in time-step order to form a noisy image sequence. In the monotonically increasing noise schedule, the noise scheduling parameter $\beta_i$ of the i-th time step is smaller than the parameter $\beta_{i+1}$ of the (i+1)-th time step (for example, increasing from ... to ...). Note that the initial target-modality image $I_T^{0}$ and the initial source-modality image $I_S^{0}$ can also be processed as above to obtain a synthesized medical image; however, to improve the quality of the final synthesized image, $\hat{I}_S$ and $\hat{I}_T$ are preferably input into the diffusion model.
[0044] Specifically, the following noise-addition process is performed for the i-th time step: the noisy image $y_{i-1}$ of the (i-1)-th time step, $\hat{I}_T$, and $\hat{I}_S$ are input into the forward diffusion branch, and the retention coefficient and cumulative retention coefficient of the current i-th time step are computed from the preset noise scheduling parameters, as shown in formulas (6) and (7):
[0045] $$\alpha_i = 1 - \beta_i \qquad (6)$$
[0046] $$\bar{\alpha}_i = \prod_{s=1}^{i} \alpha_s = \prod_{s=1}^{i} \left(1 - \beta_s\right) \qquad (7)$$
[0047] where $\alpha_i$ is the retention coefficient at the i-th time step, $\bar{\alpha}_i$ is the cumulative retention coefficient at the i-th time step, and $\beta_i$ and $\beta_s$ are the preset noise scheduling parameters of the i-th and s-th time steps, respectively. The weighting coefficients of the target and source modalities are generated from the cumulative retention coefficient, as shown in formulas (8) and (9):
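Formulas (6) and (7) amount to a cumulative product over the noise schedule. The sketch below assumes a linear schedule with hypothetical endpoints; the text only requires $\beta_i < \beta_{i+1}$.

```python
import numpy as np


def retention_coefficients(num_steps=10, beta_start=0.01, beta_end=0.2):
    """Noise schedule beta_i plus formulas (6) and (7).

    alpha_i     = 1 - beta_i                  (6)
    alpha_bar_i = prod_{s=1..i} alpha_s       (7)
    The linear schedule and its endpoints are hypothetical choices.
    """
    beta = np.linspace(beta_start, beta_end, num_steps)  # strictly increasing
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)                        # running product
    return beta, alpha, alpha_bar
```

Because every $\alpha_s$ lies in (0, 1), $\bar{\alpha}_i$ decreases monotonically, so later time steps retain less of the target image.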
[0048] $$w_{T,i} = \cdots \qquad (8)$$
[0049] $$w_{S,i} = \cdots \qquad (9)$$
[0050] where $w_{T,i}$ is the retention ratio of the standardized target-modality image $\hat{I}_T$ at the i-th time step and $w_{S,i}$ is the participation ratio of the standardized source-modality image $\hat{I}_S$ at the i-th time step, both derived from the cumulative retention coefficient $\bar{\alpha}_i$. $\hat{I}_T$ and $\hat{I}_S$ are linearly interpolated with these weights to obtain the base interpolated image $x_i$ at the i-th time step, as shown in formula (10):
[0051] $$x_i = w_{T,i}\,\hat{I}_T + w_{S,i}\,\hat{I}_S \qquad (10)$$
[0052] The noise standard deviation $\sigma_i$ at the current i-th time step is computed from the cumulative retention coefficient, as shown in formula (11):
[0053] $$\sigma_i = \cdots \qquad (11)$$
[0054] The base interpolated image $x_i$ is fused with the intermediate state $y_{i-1}$ of the previous time step according to the preset path preservation coefficient, giving the diffusion state shown in formula (12):
[0055] $$z_i = \cdots \qquad (12)$$
[0056] where $z_i$ is the diffusion state at the i-th time step and $\gamma_i$ is the preset path preservation coefficient at the i-th time step. Gaussian noise $\epsilon \sim \mathcal{N}(0, \mathbf{I})$, scaled by $\sigma_i$, is added to the diffusion state to form the noisy image $y_i$ at the i-th time step, as shown in formula (13):
[0057] $$y_i = z_i + \sigma_i\,\epsilon \qquad (13)$$
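The forward branch can be sketched as a loop over time steps that follows formulas (10) and (13). Because the bodies of formulas (8), (9), (11), and (12) are not recoverable from this text, the sketch substitutes clearly labeled assumptions: $w_T = \bar{\alpha}_i$ and $w_S = 1 - \bar{\alpha}_i$, $\sigma_i = \sqrt{1 - \bar{\alpha}_i}$ as in standard DDPM, a convex blend $z_i = \gamma_i y_{i-1} + (1 - \gamma_i) x_i$ for the path-preserving fusion, and $y_0$ set to the target image. None of these choices should be read as the patent's exact formulas.

```python
import numpy as np


def forward_diffusion(target, source, alpha_bar, gamma, rng):
    """Build the noisy image sequence y_1..y_T (formulas (10) and (13)).

    ASSUMED forms, not taken from the patent:
      w_T = alpha_bar_i, w_S = 1 - alpha_bar_i        # stand-in for (8), (9)
      sigma_i = sqrt(1 - alpha_bar_i)                  # stand-in for (11)
      z_i = gamma_i * y_{i-1} + (1 - gamma_i) * x_i    # stand-in for (12)
      y_0 = target image
    rng is a numpy.random.Generator.
    """
    target = target.astype(float)
    source = source.astype(float)
    y_prev = target
    sequence = []
    for ab, g in zip(alpha_bar, gamma):
        w_t, w_s = ab, 1.0 - ab
        x_i = w_t * target + w_s * source                      # (10)
        sigma = np.sqrt(1.0 - ab)
        z_i = g * y_prev + (1.0 - g) * x_i                     # path-preserving fusion
        y_i = z_i + sigma * rng.standard_normal(target.shape)  # (13)
        sequence.append(y_i)
        y_prev = y_i
    return sequence
```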
[0058] S300: Input the source-modality medical image and the noisy images of all time steps into the backward diffusion branch of the diffusion model, and denoise step by step, starting from the last noisy image, to generate a synthesized medical image. The i-th noisy image is processed as follows: the source-modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image are input into the backward diffusion branch to generate the intermediate result of the current noisy image; the intermediate result is distribution-aligned based on the optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected according to that information to obtain the corrected intermediate result. For the last noisy image, the corrected intermediate result of its corresponding (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthesized image.
[0059] Specifically, the spatially aligned source modal medical image and the noisy image sequence generated by the forward diffusion branch are input together into the backward diffusion branch, and denoising is performed in reverse time-step order, starting from the last noisy image. At the i-th time step, the spatially aligned source modal medical image, the noisy image at the current i-th time step, and the corrected intermediate result from the previous time step are input together into the backward diffusion branch to generate the intermediate result for the current i-th time step, which begins to restore the features of the standardized target modal medical image. To avoid the accumulation of bias caused by cross-modal distribution inconsistencies, this invention also performs distribution alignment on the intermediate result of the current i-th time step: distribution correction information is generated by evaluating the difference between the intermediate result and the standardized target modal medical image, and the intermediate result is corrected and updated accordingly so that its global grayscale statistics, regional contrast, and lesion appearance move closer to the true distribution of the target modality. The corrected intermediate result for the current i-th time step serves as one of the inputs to the denoising process at the next time step, achieving recursive optimization across time steps. When the iteration reaches the first time step of the noisy image sequence, its corrected intermediate result is used as the synthetic medical image generated by this invention. It should be noted that the initial source modal medical image and the noisy image sequence generated by the forward diffusion branch can also be input together into the backward diffusion branch for the above processing.
In that case, the features of the initial target modal medical image are gradually recovered during feature comparison, and distribution correction information is generated by evaluating the difference between the intermediate result and the initial target modal medical image. However, to improve the accuracy of the final synthesized image, the spatially aligned source modal medical image is preferably fed into the backward diffusion branch, with the standardized target modal medical image used as the comparison reference.
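The recursion in S300 and [0059] can be sketched as a plain loop. This is a minimal sketch, not the patented implementation: `generator` and `correct` are hypothetical placeholders standing in for the conditional generator and the optimal-transport-based correction, and the sequence is assumed to be ordered so that `noisy_seq[0]` is the first time step and `noisy_seq[-1]` is the last.

```python
import numpy as np


def backward_synthesis(source, noisy_seq, generator, correct, init_value=0.0):
    """Reverse-time loop in which each step consumes the previous corrected result.

    generator(source, x_i, prev_corrected, i) -> intermediate result  (hypothetical)
    correct(intermediate, x_i, i)             -> corrected result     (hypothetical)
    """
    # For the last noisy image, the "previous" corrected result is a preset value.
    prev = np.full_like(source, init_value, dtype=float)
    # Denoise in reverse time-step order, starting from the last noisy image.
    for i in range(len(noisy_seq) - 1, -1, -1):
        intermediate = generator(source, noisy_seq[i], prev, i)
        prev = correct(intermediate, noisy_seq[i], i)
    # The corrected result of the final iteration (first time step) is the synthetic image.
    return prev


# Toy usage: a generator that averages the noisy image with the previous result.
steps = []


def toy_generator(source, x_i, prev, i):
    steps.append(i)
    return 0.5 * (x_i + prev)


seq = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0)]
result = backward_synthesis(np.zeros((2, 2)), seq, toy_generator, lambda h, x, i: h)
```

The toy generator is only there to make the reverse ordering and the hand-off of each corrected result to the next step easy to check.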
[0060] In an optional embodiment of the present invention, the method further includes: inputting the source modal medical image, the noisy image sequence, and a noise mask into the backward diffusion branch of the diffusion model, and performing step-by-step denoising starting from the last noisy image to generate a synthetic medical image, wherein the noise mask is generated by performing local signal-to-noise ratio detection on the source modal medical image.
[0061] Specifically, the standardized source modal medical image is divided into several local regions. For each local region, the mean gray value and standard deviation are calculated, and the local signal-to-noise ratio of the region is calculated according to formula (14):
[0062] (14)
[0063] where the local signal-to-noise ratio of the k-th local region of the source modal medical image is computed from the mean gray value of the k-th local region and the standard deviation of its gray levels. When the local signal-to-noise ratio of the p-th local region of the standardized source modal medical image is less than a preset noise threshold (e.g., 10 dB), that region is considered a significant noise region, and the noise mask is generated accordingly. The mask indicates the regions where the strength of the structural constraint should be reduced during backward diffusion. The noise mask and the noisy images are input together into the backward diffusion branch so that accurate cross-modal synthetic images can be generated even under complex noise interference. It should be noted that the noise mask can be generated either from the standardized source modal medical image or from the initial source modal medical image; to improve the accuracy of subsequent image synthesis, the standardized source modal medical image is preferably used.
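A minimal NumPy sketch of the noise-mask step follows. Formula (14) is not reproduced in the text, so the decibel conversion `20 * log10(mean / std)` is an assumption, as are the square block size and the function name `local_snr_noise_mask`; only the 10 dB example threshold comes from the description above.

```python
import numpy as np


def local_snr_noise_mask(image, block=8, threshold_db=10.0, eps=1e-8):
    """Return a boolean mask that is True in blocks whose local SNR is below threshold_db.

    Assumption: local SNR (dB) = 20 * log10(mean / std), computed per square block.
    """
    image = np.asarray(image, dtype=float)
    mask = np.zeros(image.shape, dtype=bool)
    for r in range(0, image.shape[0], block):
        for c in range(0, image.shape[1], block):
            patch = image[r:r + block, c:c + block]
            snr_db = 20.0 * np.log10((abs(patch.mean()) + eps) / (patch.std() + eps))
            if snr_db < threshold_db:
                mask[r:r + block, c:c + block] = True  # significant noise region
    return mask
```

For example, a flat region has near-zero standard deviation and therefore a very high local SNR, while a region alternating between 0 and 2 has mean 1 and standard deviation 1 (0 dB) and is flagged.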
[0064] In an optional embodiment of the present invention, the step of inputting the source modal medical image, the noisy image sequence, and the noise mask into the backward diffusion branch of the diffusion model and performing step-by-step denoising starting from the last noisy image to generate a synthetic medical image includes processing the i-th noisy image as follows: inputting the source modal medical image, the i-th noisy image, and the corrected intermediate result corresponding to the (i-1)-th noisy image into the backward diffusion branch to generate an intermediate result for the current noisy image; performing distribution alignment on the intermediate result based on the optimal transport algorithm and generating distribution correction information from the alignment result; and, under the constraint of the noise mask, correcting the intermediate result based on the distribution correction information to obtain a corrected intermediate result.
[0065] In the backward diffusion phase, the spatially aligned source modal medical image, the noisy image sequence obtained from forward diffusion, and the noise mask generated from the source modality image are input together into the backward diffusion branch of the diffusion model. Denoising and reconstruction are performed progressively in reverse time-step order, starting from the last noisy image, and at each time step the image distribution of the target modality is gradually restored through multi-source information fusion. Specifically, for the i-th noisy image, the source modal medical image, the i-th noisy image, and the corrected intermediate result of the previous time step are input into the backward diffusion branch to generate the intermediate result of the current i-th time step. This intermediate result combines a preliminary denoising of the current i-th noisy image with the anatomical structure information of the source modal medical image, and its continuity is ensured by the use of the corrected intermediate result from the (i-1)-th time step. The intermediate result is then subjected to distribution alignment based on the optimal transport algorithm: by measuring the distance between the current intermediate result and the latent distribution of the target modality, distribution correction information is generated. To further improve stability under noise interference, the noise mask is introduced during distribution correction. It reduces the strength of the structural consistency constraint in noisy regions while keeping strong structure preservation in regions with a high signal-to-noise ratio, thereby avoiding over-enhancement or erroneous correction of noisy regions.
Based on the distribution correction information and the adaptive constraint formed by the noise mask, the current intermediate result is updated to obtain the corrected intermediate result of the i-th time step, which is then passed to the backward diffusion processing of the next time step. As the iteration proceeds through the time steps, the corrected intermediate result of the final step becomes the synthetic medical image generated by this invention.
[0066] Furthermore, in an optional embodiment of the present invention, under the constraint of the noise mask, the step of correcting the intermediate result based on the distribution correction information to obtain the corrected intermediate result includes:
[0067] S310. Perform distribution consistency correction on the intermediate result according to the distribution correction information to obtain the first correction result.
[0068] Because source and target modal medical images differ in grayscale statistics, intensity distribution, and texture, the intermediate results generated during backward diffusion tend to favor the features of the source modality. As a result, the final synthesized image deviates from the target modal medical image in grayscale range, contrast, and other characteristics; this is the distribution drift problem in cross-modal medical image conversion. To address it, this invention generates distribution correction information based on an optimal transport algorithm and updates the intermediate result of the current time step accordingly, so that its overall grayscale distribution and texture statistics more closely resemble the latent distribution of the target modal medical image. This reduces the distribution shift introduced during cross-modal conversion and yields the first correction result.
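To illustrate what optimal-transport alignment of gray-level statistics means, the following sketch uses the exact one-dimensional case. This is a swapped-in technique, not the learned correction described in this invention: for a convex cost in 1D, the optimal transport map between two intensity distributions is the monotone rearrangement, so each pixel keeps its rank and takes the matching quantile of a reference image. The function name and the use of a reference image in place of the target-modality latent distribution are assumptions.

```python
import numpy as np


def ot_intensity_align(intermediate, reference):
    """Monotone (1D optimal transport) mapping of gray values onto a reference distribution."""
    flat = np.ravel(intermediate).astype(float)
    ref_sorted = np.sort(np.ravel(reference).astype(float))
    n, m = flat.size, ref_sorted.size
    # Mid-point quantile levels for the ranked source pixels and the sorted reference.
    src_levels = (np.arange(n) + 0.5) / n
    ref_levels = (np.arange(m) + 0.5) / m
    targets = np.interp(src_levels, ref_levels, ref_sorted)
    # The k-th smallest source pixel receives the k-th reference quantile.
    order = np.argsort(flat, kind="stable")
    aligned = np.empty(n)
    aligned[order] = targets
    return aligned.reshape(np.shape(intermediate))


aligned = ot_intensity_align(np.array([[3.0, 1.0], [2.0, 0.0]]),
                             np.array([[10.0, 40.0], [30.0, 20.0]]))
```

Because ranks are preserved, the mapping changes the histogram without reordering brightness, which is the intuition behind correcting distribution drift without destroying structure.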
[0069] In an optional embodiment of the present invention, step S310 includes: performing distribution consistency correction on the intermediate result according to the distribution correction information to obtain an initial first correction result; dynamically adjusting the gradient update magnitude at the i-th time step according to the difference between the initial first correction result and the posterior estimate; and performing gradient update processing on the initial first correction result based on this difference to obtain the final first correction result, wherein the posterior estimate is generated by the backward diffusion branch from the i-th noisy image.
[0070] Specifically, during backward diffusion, distribution consistency correction is applied to the intermediate result generated at the current i-th time step based on the distribution correction information, so that the global grayscale statistics of the intermediate result move closer to the latent distribution of the target modality, yielding the initial first correction result. The posterior estimate generated by the backward diffusion branch at the i-th time step is then used to adaptively refine this initial first correction result, yielding the final first correction result.
[0072] Specifically, the backward diffusion branch of this invention includes a conditional generator and a time-aware latent network. Backward diffusion starts from the noisy image of the last time step T obtained by forward diffusion and proceeds step by step to generate the synthetic medical image. For the i-th time step, the source modal medical image, the noisy image at the current i-th time step, and the corrected result from the previous time step are input into the trained backward diffusion branch and processed by the conditional generator, which gives the intermediate result for the current time step according to formula (15):
[0073] (15)
[0074] Based on this, the trained time-aware latent network is used to measure the latent distribution difference between the noisy image and the intermediate result, and the distribution correction information for the current time step is calculated in real time following the construction of the optimal transport loss. Specifically, the noisy image at the i-th time step is fed into the time-aware latent network to generate a latent bias that approximates the Wasserstein distance; together with the difference term of the intermediate result, this latent bias forms the optimal transport loss, as shown in formula (16):
[0075] (16)
[0076] where the cost term is the distance between the noisy image at the current i-th time step and the intermediate result, an expectation is taken over these terms, the output of the time-aware latent network for the noisy image at the i-th time step is the distribution correction information, and the left-hand side is the optimal transport loss. Minimizing this loss yields the distribution correction information for the current time step.
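The time-aware latent network approximates a Wasserstein distance rather than computing it. For intuition only, the exact Wasserstein-1 distance between the gray-value distributions of two equal-size images can be computed in 1D by sorting; this sketch (function name hypothetical) is neither the network nor formula (16).

```python
import numpy as np


def wasserstein1_1d(a, b):
    """Empirical W1 distance between the gray-value distributions of two equal-size arrays.

    With equal sample counts in 1D, W1 is the mean absolute difference of sorted samples.
    """
    a_sorted = np.sort(np.ravel(a).astype(float))
    b_sorted = np.sort(np.ravel(b).astype(float))
    if a_sorted.size != b_sorted.size:
        raise ValueError("this sketch assumes equal pixel counts")
    return float(np.mean(np.abs(a_sorted - b_sorted)))
```

For unequal sample sizes or weighted samples, `scipy.stats.wasserstein_distance` computes the same 1D quantity and can serve as a cross-check.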
[0077] Based on the obtained distribution correction information, distribution consistency correction is applied to the intermediate result generated at the current i-th time step, so that the overall intensity distribution of the intermediate result converges to the latent statistical characteristics of the target modality, yielding the initial first correction result. The posterior estimate generated by the backward diffusion branch from the noisy image at the current i-th time step is then obtained, and the difference between the initial first correction result and the posterior estimate is calculated according to formula (17):
[0078] (17)
[0079] where the difference at the i-th time step is the distance between the initial first correction result and the posterior estimate. This difference serves as the basis for dynamically adjusting the residual update magnitude at the current time step: the update intensity is increased when the difference is large and decreased when it is small. An adaptive update along the gradient direction is thus performed according to the degree of difference, so that the corrected result converges along the optimal direction indicated by the posterior estimate, yielding the final first correction result at the i-th time step.
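The adaptive step described in [0079] can be sketched as follows. Formula (17) is not reproduced in the text, so the RMS distance, the clipped linear rule that maps the difference to a step size, and every parameter name (`base_step`, `d_ref`, `min_scale`, `max_scale`) are hypothetical choices that only reproduce the stated behavior: a larger difference produces a larger update toward the posterior estimate.

```python
import numpy as np


def adaptive_posterior_update(x_init, x_post, base_step=0.5, d_ref=1.0,
                              min_scale=0.1, max_scale=1.0):
    """Step from the initial first correction toward the posterior estimate.

    Hypothetical rule: step = base_step * clip(diff / d_ref, min_scale, max_scale).
    """
    x_init = np.asarray(x_init, dtype=float)
    x_post = np.asarray(x_post, dtype=float)
    # Difference at this time step; RMS distance is an assumed metric.
    diff = float(np.sqrt(np.mean((x_init - x_post) ** 2)))
    step = base_step * float(np.clip(diff / d_ref, min_scale, max_scale))
    # (x_init - x_post) is the gradient of 0.5 * ||x - x_post||^2 at x_init.
    return x_init - step * (x_init - x_post), diff
```

With `x_post` one unit away, the full step of 0.5 is taken; with `x_post` only 0.2 away, the step shrinks to 0.1.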
[0080] S320. At multiple preset resolution levels, the structural differences between the first correction result and the source modal medical image are calculated along the horizontal direction, the vertical direction, and the two diagonal directions, respectively, to obtain a structural change map that characterizes structural changes in different directions.
[0081] Specifically, spatial average pooling is applied to both the final first correction result at the i-th time step and the source modal medical image y at multiple preset resolution levels, and differentiable gradient operators are applied along the horizontal, vertical, and two diagonal directions at each scale. The structural change map is calculated according to formula (18):
[0082] (18)
[0083] where the left-hand side is the structural change map at the current scale and in the current direction, whose pixel values represent local structural differences (such as edge position offsets and local morphological changes) at that location along the corresponding direction; the two pooled terms are, respectively, the source modal medical image y and the final first correction result at the i-th time step after spatial average pooling at the current scale.
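The multi-scale, four-direction comparison of S320 can be sketched with average pooling and finite differences. Formula (18) is not reproduced in the text, so the assumed form here is the absolute difference between directional forward differences of the two pooled images; the scales (1, 2, 4), the edge-replication boundary handling, and all function names are illustrative assumptions.

```python
import numpy as np

# Direction offsets (row, column): horizontal, vertical, and the two diagonals.
DIRECTIONS = {"h": (0, 1), "v": (1, 0), "d1": (1, 1), "d2": (1, -1)}


def avg_pool(img, k):
    """Non-overlapping k x k average pooling (crops to a multiple of k)."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))


def directional_grad(img, dy, dx):
    """Forward difference img[r + dy, c + dx] - img[r, c] with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - img


def structural_change_maps(corrected, source, scales=(1, 2, 4)):
    """Map (scale, direction) -> |grad(pooled corrected) - grad(pooled source)|."""
    maps = {}
    for s in scales:
        a, b = avg_pool(corrected, s), avg_pool(source, s)
        for name, (dy, dx) in DIRECTIONS.items():
            maps[(s, name)] = np.abs(directional_grad(a, dy, dx) - directional_grad(b, dy, dx))
    return maps
```

Identical inputs give all-zero maps, and a vertical edge present only in the source appears in the horizontal-direction map but not in the vertical one.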
[0084] S330. Based on the noise region indicated by the noise mask, the structural difference values corresponding to the noise region in the structural change map are weighted and attenuated to obtain a weighted structural change map.
[0085] Since the noise mask represents regions of the source modal medical image that have a low signal-to-noise ratio (SNR) and are susceptible to random noise, gradient changes within these regions often arise not from actual anatomical differences but from spurious differences caused by noise fluctuations. Therefore, based on the low-SNR regions indicated by the noise mask, the corresponding structural difference values in the structural change map are attenuated by weighting, reducing the strength of the structural constraint in noisy regions while keeping the original weights in high-SNR regions. The weighted structural change map obtained in this way effectively suppresses noise-induced misjudgments, allowing the subsequent structural correction to focus on actual anatomical changes and thereby improving the stability and reliability of the synthesized image in noisy environments.
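A minimal sketch of the attenuation in S330, assuming the noise mask has already been resampled to the resolution of the structural change map (for coarser scales, max pooling of the mask is one reasonable choice). The attenuation factor `alpha` and the function name are hypothetical, since the description gives neither.

```python
import numpy as np


def attenuate_noise_regions(change_map, noise_mask, alpha=0.2):
    """Scale structural differences inside noise regions by alpha; keep weight 1 elsewhere."""
    weights = np.where(noise_mask, alpha, 1.0)
    return change_map * weights


weighted = attenuate_noise_regions(np.ones((2, 2)), np.array([[True, False], [False, True]]))
```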
[0086] S340. Based on the weighted structural change map, perform structural correction on the first correction result to obtain the final corrected intermediate result.
[0087] After obtaining the weighted structural change map, this invention further utilizes this map to guide the structural correction of the results generated at the current time step. Specifically, the weighted structural change map characterizes the intensity of structural differences after noise masking pixel by pixel, reflecting the local structural offset between the first correction result and the source modality medical image at different directions and scales. Based on this structural offset information, according to the offset direction indicated by the structural change map, the gradient, edges, and local textures of the corresponding pixel positions in the first correction result are adjusted in a targeted manner to gradually approach the true anatomical morphology of the source modality medical image.
[0088] In practice, structural correction is achieved by constructing a pixel-level gradient update field from the weighted structural change map and applying it to the first correction result, so that the anatomical structure is adaptively corrected according to the magnitude of the difference. Specifically, stronger correction is applied to regions with significant structural differences, while a weaker constraint is maintained in the low signal-to-noise ratio regions indicated by the noise mask to avoid spurious corrections caused by noise interference. In this way, the final corrected intermediate result for the current time step is obtained; on top of the distribution consistency correction, it satisfies the requirements of anatomical continuity and fine boundary detail, and provides a more accurate structural prior for backward diffusion at the next time step.
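The description does not specify how the pixel-level gradient update field is built, so the sketch below shows one plausible construction under stated assumptions: gradient descent on a weighted gradient-matching energy, where the per-direction weights play the role of the weighted structural change maps (assumed normalized to [0, 1]). Large weights pull the directional gradients of the correction result toward those of the source image, and near-zero weights in masked noise regions leave those pixels almost untouched. Periodic boundaries (`np.roll`) are used only to keep the adjoint operator exact and short; all names are hypothetical.

```python
import numpy as np

# Direction offsets (row, column): horizontal, vertical, and the two diagonals.
DIRS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def diff(img, dy, dx):
    """Periodic forward difference: img[r + dy, c + dx] - img[r, c]."""
    return np.roll(img, (-dy, -dx), axis=(0, 1)) - img


def diff_adjoint(res, dy, dx):
    """Adjoint of diff(): res[r - dy, c - dx] - res[r, c]."""
    return np.roll(res, (dy, dx), axis=(0, 1)) - res


def energy(x, source, weights):
    """E(x) = 0.5 * sum over directions and pixels of w * (D x - D source)**2."""
    return 0.5 * sum(
        np.sum(w * (diff(x, dy, dx) - diff(source, dy, dx)) ** 2)
        for (dy, dx), w in zip(DIRS, weights)
    )


def structural_correction(x, source, weights, step=0.05, iters=20):
    """Gradient descent on energy(); each accumulated grad is the pixel-level update field."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(x)
        for (dy, dx), w in zip(DIRS, weights):
            grad += diff_adjoint(w * (diff(x, dy, dx) - diff(source, dy, dx)), dy, dx)
        x -= step * grad
    return x
```

The fixed step of 0.05 is chosen for stability: with four directions and weights at most 1, the gradient of the energy is Lipschitz with constant at most 16, so any step below 2/16 decreases the energy.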
[0089] In an optional embodiment of the present invention, after generating the synthetic medical image, the method further includes: performing wavelet decomposition on the synthetic medical image and the source modality medical image respectively to obtain low-frequency sub-bands and high-frequency sub-bands; extracting synthetic global information and synthetic local information from the synthetic medical image and source modality global information and source modality local information from the source modality medical image respectively; fusing the synthetic global information and the source modality global information to obtain fused global information; constructing a saliency map based on the synthetic local information and the source modality local information; performing adaptive weighted fusion on the synthetic local information and the source modality local information based on the saliency map to obtain fused local information; performing inverse wavelet transform reconstruction on the fused local information and the fused global information to obtain a wavelet fused image; and performing anatomical structure correction on the wavelet fused image based on the source modality medical image to obtain the final synthetic medical image.
[0090] After cross-modal generation is complete, this invention further enhances the clinical readability and anatomical detail fidelity of the final image through frequency-domain fusion. Specifically, wavelet decomposition is applied to the synthetic medical image and the source modal medical image, splitting each into global information representing the overall imaging appearance (synthetic global information and source modality global information) and local information representing edges, textures, and anatomical details (synthetic local information and source modality local information). For the global information, this invention fuses the synthetic global information and the source modality global information under a modal appearance consistency constraint, so that the final global appearance combines the imaging characteristics of the target modality with the overall structural style of the source modality, yielding the fused global information. A saliency map is constructed from the synthetic local information and the source modality local information to identify the texture contribution of anatomical structures in local regions. Guided by the saliency map, this invention performs adaptive weighted fusion of the synthetic local information and the source modality local information, so that highly salient regions preferentially retain the true anatomical details of the source modality, yielding the fused local information. The fused global and local information are then reconstructed into a wavelet fused image by inverse wavelet transform, optimizing both global appearance and local structure. Finally, using the source modal medical image as an anatomical reference, this invention performs structural correction on the wavelet fused image to eliminate local hysteresis or fusion errors, obtaining the final synthetic medical image and achieving high-fidelity cross-modal medical image synthesis.
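The fusion pipeline in [0089] and [0090] can be sketched with a single-level 2D Haar transform. The Haar wavelet, the fixed global weight `w_global`, and the use of absolute coefficient magnitude as the saliency map are all assumptions, since the description names none of them; the final anatomical structure correction step is omitted, and all function names are hypothetical.

```python
import numpy as np


def haar2(img):
    """Single-level orthonormal 2D Haar transform; both image sides must be even."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency sub-band: global appearance
    lh = (a - b + c - d) / 2.0  # column-difference detail
    hl = (a + b - c - d) / 2.0  # row-difference detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def ihaar2(ll, highs):
    """Exact inverse of haar2()."""
    lh, hl, hh = highs
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def wavelet_fuse(synth, source, w_global=0.8, eps=1e-8):
    ll_s, highs_s = haar2(synth)
    ll_r, highs_r = haar2(source)
    # Global fusion: weighted average that favors the target-modality appearance.
    ll_f = w_global * ll_s + (1.0 - w_global) * ll_r
    fused = []
    for hs, hr in zip(highs_s, highs_r):
        # Saliency = coefficient magnitude; soft weights favor the more salient input.
        sal_s, sal_r = np.abs(hs), np.abs(hr)
        w = sal_s / (sal_s + sal_r + eps)
        fused.append(w * hs + (1.0 - w) * hr)
    return ihaar2(ll_f, tuple(fused))
```

Fusing an image with itself returns the image unchanged, which is a quick sanity check that the transform pair and the weights are consistent.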
The final synthesized medical image refers to a cross-modal generated image that has the appearance of the target modality and maintains the consistency of the anatomical structure of the source modality.
[0091] Furthermore, to improve the reliability of the anatomical structure in the final synthesized image, this invention introduces a deep learning-based anatomical structure correction network that automatically detects and repairs potential structural abnormalities in the synthesized medical image. Specifically, the synthesized medical image and the source modal medical image are fed into the anatomical structure correction network to generate an anatomically corrected medical image, which is used as the final synthetic medical image. The network employs a U-Net encoder-decoder architecture with an attention mechanism in the skip connections to enhance its response to edges and fine anatomical structures. During training, the Dice loss and the L1 pixel difference loss are combined by weighting to obtain the correction loss, as shown in formula (19):
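[0092] $\mathcal{L}_{\text{corr}} = \mathcal{L}_{\text{Dice}} + \lambda \lVert y - \hat{y} \rVert_1$ (19)
[0093] where $\mathcal{L}_{\text{corr}}$ is the correction loss, $\mathcal{L}_{\text{Dice}}$ is the Dice loss, $\lambda$ is a preset weighting coefficient (e.g., 0.1), $\lVert y - \hat{y} \rVert_1$ is the L1 pixel difference loss between $y$ and $\hat{y}$, $y$ is the source modal medical image, and $\hat{y}$ is the synthetic medical image generated by the anatomical structure correction network.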
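A NumPy sketch of the correction loss, assuming formula (19) has the form of a Dice loss plus a λ-weighted L1 term, with inputs scaled to [0, 1] so that a soft Dice coefficient is meaningful. Averaging the L1 term over pixels rather than summing it is a common normalization and an assumption here; the function names are hypothetical.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice coefficient for arrays with values in [0, 1]."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)


def correction_loss(y_hat, y, lam=0.1):
    """Dice loss plus lam times the mean absolute (L1) pixel difference."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return soft_dice_loss(y_hat, y) + lam * np.mean(np.abs(y - y_hat))
```

Identical inputs give a loss of zero; completely disjoint binary masks give a Dice loss close to 1 plus λ times the mean L1 difference.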
[0094] Experiments have verified that the correction network achieves an overall accuracy of 98.5% in structural repair tasks. Furthermore, to objectively evaluate the quality of the final synthesized images, this invention constructs a quality assessment subsystem. In addition to outputting commonly used metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), a deep learning quality assessment model based on clinical radiologist scoring criteria is also established. This model uses a ResNet-50 architecture and is fine-tuned through transfer learning for a regression task. It takes the synthesized medical image and its corresponding target modality medical image as input and outputs a quality score ranging from 0 to 1 to quantify the clinical usability of the image. Experimental results show that the correlation between this quality assessment model and expert scores reaches 0.91, reliably reflecting the clinical quality of medical images.
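PSNR, one of the standard metrics reported by the quality assessment subsystem, is defined as 10·log10(MAX² / MSE). The minimal sketch below assumes images are scaled so that `data_range` is the maximum possible intensity difference; scikit-image provides maintained implementations of both metrics as `skimage.metrics.peak_signal_noise_ratio` and `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of 20 dB.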
[0095] Furthermore, to ensure seamless integration of the synthesized results into clinical workflows, the output synthesized medical images conform to the DICOM standard and fully retain all original metadata of the input images, ensuring high compatibility with existing hospital picture archiving and communication systems (PACS) and radiology information systems (RIS). This original metadata includes, but is not limited to: patient identification information (such as name, patient ID, and date of birth), image acquisition parameters (such as imaging modality, scan sequence, slice thickness, pixel spacing, and resolution), and clinical annotation information (such as anatomical location, diagnostic notes, and acquisition timestamp). The metadata is automatically parsed and extracted from the input DICOM file and embedded into the corresponding fields of the output DICOM file when the synthesized medical image is generated, ensuring consistency and traceability between the synthesized and original images during clinical use.
[0096] Experiments demonstrate that the proposed dual-alignment diffusion bridge model has significant advantages in medical image synthesis. First, owing to the dual-alignment mechanism, on the IXI dataset task of synthesizing PD-weighted images from T2-weighted images, the model achieves a peak signal-to-noise ratio (PSNR) of 34.67 dB, an improvement of 1.45 dB over existing technologies, and a 42% improvement in boundary sharpness. The IXI dataset consists of multimodal brain MRI images of 600 healthy subjects, with each subject's data containing multiple scan sequences such as T1-weighted, T2-weighted, and PD-weighted images. Second, the optimized noise scheduling reduces the number of inference steps to 10, improving efficiency fivefold and making high-resolution 3D image synthesis clinically feasible. Third, the model maintains a structural similarity index (SSIM) of 93.2% even under high noise (σ=15), improving anti-interference capability by 60%. Finally, the parameterized design adapts to the needs of different clinical scenarios, significantly improving clinical applicability.
[0097] By combining optimal transport theory with a gradient constraint mechanism, the dual-alignment diffusion bridge framework proposed in this invention guarantees both global statistical consistency and local anatomical accuracy and keeps the synthesized medical images stable under complex conditions; it significantly outperforms existing technologies on multiple clinical indicators. Through the synergy of implicit distribution alignment guided by optimal transport and explicit structural alignment constrained by multi-scale gradients, this invention effectively solves key problems of existing technologies such as the modality gap and error accumulation, providing a reliable cross-modal image synthesis tool for precision medicine in clinical practice. Furthermore, this invention achieves high-quality image synthesis within 10 optimization steps using a monotonic noise scheduling algorithm.
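The forward branch that feeds the backward loop (claim 1) linearly interpolates between the target and source images and adds scheduled noise. The sketch below assumes a 10-step monotonic interpolation weight m_t = t/T and a Brownian-bridge-style variance 2s(m_t - m_t^2), which vanishes at both ends; neither formula is given in this section, and the function name and parameters are hypothetical.

```python
import numpy as np


def forward_bridge(target, source, num_steps=10, s=1.0, seed=0):
    """Noisy sequence x_1..x_T between the target (t = 0) and the source (t = T).

    Assumptions: m_t = t / T (monotonic) and variance delta_t = 2 * s * (m_t - m_t**2).
    """
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    rng = np.random.default_rng(seed)
    seq = []
    for t in range(1, num_steps + 1):
        m = t / num_steps
        delta = 2.0 * s * (m - m * m)
        mean = (1.0 - m) * target + m * source  # linear interpolation
        seq.append(mean + np.sqrt(delta) * rng.standard_normal(target.shape))
    return seq


seq = forward_bridge(np.zeros((4, 4)), np.ones((4, 4)), num_steps=10)
```

Because the variance is zero at t = T, the last element of the sequence equals the source image, while intermediate steps carry the most noise near the midpoint.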
[0098] As shown in Figure 2, the cross-modal medical image synthesis system includes an image acquisition module 110, a forward diffusion module 120, and a backward diffusion module 130. The image acquisition module 110 acquires source and target modal medical images of the same organ. The forward diffusion module 120 inputs the target and source modal medical images into the forward diffusion branch of the diffusion model, performs linear interpolation on them according to a preset time step, and generates a noisy image for the corresponding time step based on the interpolation result. The backward diffusion module 130 inputs the source modal medical image and the noisy images at each time step into the backward diffusion branch of the diffusion model and performs step-by-step denoising starting from the last noisy image to generate a synthetic medical image. For the i-th noisy image, the source modal medical image, the i-th noisy image, and the corrected intermediate result corresponding to the (i-1)-th noisy image are input into the backward diffusion branch to generate the intermediate result of the current noisy image; distribution alignment is performed on the intermediate result based on the optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected based on the distribution correction information to obtain the corrected intermediate result. For the last noisy image, the corrected intermediate result corresponding to its (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthetic medical image.
[0099] For specific limitations of the cross-modal medical image synthesis system, refer to the limitations of the cross-modal medical image synthesis method described above; they are not repeated here. Each module in the system can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
[0100] It should be noted that, in order to highlight the innovative aspects of this invention, this embodiment does not include modules that are not closely related to solving the technical problems proposed by this invention, but this does not mean that there are no other modules in this embodiment.
[0101] As shown in Figure 3, the electronic device 3 may include a memory 31, a processor 32, and a bus, and may also include a computer program stored in the memory 31 and executable on the processor 32, such as a cross-modal medical image synthesis program.
[0102] The memory 31 includes at least one type of readable storage medium, including flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 31 can be an internal storage unit of the electronic device 3, such as a portable hard drive. In other embodiments, the memory 31 can be an external storage device of the electronic device 3, such as a plug-in portable hard drive, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the electronic device 3. Furthermore, the memory 31 can include both internal and external storage units of the electronic device 3. The memory 31 can be used not only to store application software and various types of data installed on the electronic device 3, such as code for synthesizing cross-modal medical images, but also to temporarily store data that has been output or will be output.
[0103] In some embodiments, processor 32 may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits packaged with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. Processor 32 is the control unit of electronic device 3, connecting various components of the entire electronic device 3 via various interfaces and lines. It executes programs or modules stored in memory 31 (such as a cross-modal medical image synthesis program) and calls data stored in memory 31 to perform various functions of electronic device 3 and process data.
[0104] The processor 32 executes the operating system of the electronic device 3 and various installed applications. The processor 32 executes the applications to implement the steps in the above-described method for synthesizing cross-modal medical images.
[0105] For example, a computer program may be divided into one or more modules, one or more of which are stored in memory 31 and executed by processor 32 to complete this application. One or more modules may be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program in electronic device 3. For example, the computer program may be divided into an image acquisition module 110, a forward diffusion module 120, and a backward diffusion module 130.
[0106] The integrated units implemented as software functional modules described above can be stored in a computer-readable storage medium, which can be non-volatile or volatile. The software functional modules stored in the storage medium include several instructions to cause a computer device (which may be a personal computer, computer equipment, or network device, etc.) or processor to execute some functions of the cross-modal medical image synthesis methods of the various embodiments of this application.
[0107] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.
Claims
1. A method for synthesizing cross-modal medical images, characterized in that the method includes: acquiring a source modality medical image and a target modality medical image of the same organ; inputting the target modality medical image together with the source modality medical image into the forward diffusion branch of a diffusion model, linearly interpolating between the target modality medical image and the source modality medical image according to a preset time step, and generating the noisy image for the corresponding time step from the interpolation result, wherein random noise matching the noise scheduling parameter of the current time step is progressively added to the interpolation result, so that the target modality gradually forms noisy images of different noise intensities while remaining associated with the source modality; and inputting the source modality medical image and the noisy images of all time steps into the backward diffusion branch of the diffusion model and denoising step by step, starting from the last noisy image, to generate a synthetic medical image. Specifically, the i-th noisy image is processed as follows: the source modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image are input into the backward diffusion branch to generate the intermediate result of the current noisy image; the intermediate result is aligned using an optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected based on the distribution correction information to obtain the corrected intermediate result. For the last noisy image, the corrected intermediate result of the corresponding (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthetic image.
For the i-th time step, the source modality medical image $x_{src}$, the noisy image $x_i$ at the current i-th time step, and the corrected result $\tilde{x}_{i-1}$ from the previous time step are input into the trained backward diffusion branch, whose conditional generator $G$ outputs the intermediate result at the current time step, $\hat{x}_i = G(x_{src}, x_i, \tilde{x}_{i-1})$. A trained time-aware latent network measures the latent distribution discrepancy between the noisy image $x_i$ and the intermediate result $\hat{x}_i$, and the distribution correction information for the current time step is computed in real time from an optimal transport loss. This loss is formed by the difference between the distance term of $x_i$ and $\hat{x}_i$ and the output of the time-aware latent network for the noisy image at the i-th time step:
$$\mathcal{L}_{OT} = \mathbb{E}\left[c(x_i, \hat{x}_i) - \phi(x_i)\right]$$
where $c(x_i, \hat{x}_i)$ is the distance between the noisy image $x_i$ at the current i-th time step and the intermediate result $\hat{x}_i$; $\mathbb{E}$ denotes the expectation; $\phi$ is the time-aware latent network, and $\phi(x_i)$ is the distribution correction information it generates from the noisy image $x_i$ at the i-th time step; and $\mathcal{L}_{OT}$ is the optimal transport loss.
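The forward diffusion branch of claim 1 can be illustrated with a minimal NumPy sketch. It assumes a hypothetical interpolation weight alpha_t = t / T (target image at t = 0, source image at t = T) and a hypothetical noise standard deviation sigma_t = sigma_max * alpha_t; the claims specify only linear interpolation plus a per-step noise scheduling parameter, so both schedules and the names `forward_diffusion`, `sigma_max`, and `seed` are illustrative.

```python
import numpy as np

def forward_diffusion(x_tgt, x_src, T=4, sigma_max=0.5, seed=0):
    """Return the noisy images x_1..x_T (list index 0 holds x_1).

    Hypothetical schedules: interpolation weight alpha_t = t / T (target at
    t = 0, source at t = T) and noise std sigma_t = sigma_max * alpha_t.
    """
    rng = np.random.default_rng(seed)
    noisy = []
    for t in range(1, T + 1):
        alpha = t / T
        interp = (1.0 - alpha) * x_tgt + alpha * x_src    # linear interpolation
        sigma = sigma_max * alpha                         # noise grows with t
        noisy.append(interp + sigma * rng.standard_normal(np.shape(x_tgt)))
    return noisy

x_tgt = np.zeros((8, 8))   # stand-in target-modality image
x_src = np.ones((8, 8))    # stand-in source-modality image
seq = forward_diffusion(x_tgt, x_src)
print(len(seq), seq[-1].shape)  # 4 (8, 8)
```

Letting the noise grow with t mirrors the claim's noisy images of different intensities; a bridge-style schedule whose noise vanishes at both ends would be an equally plausible reading.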
2. The method for synthesizing cross-modal medical images according to claim 1, characterized in that the step of acquiring a source modality medical image and a target modality medical image of the same organ includes: acquiring an initial source modality medical image and an initial target modality medical image of the same organ; and performing spatial registration and intensity normalization on the initial source modality medical image and the initial target modality medical image to obtain the standardized source modality medical image and target modality medical image.
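The intensity normalization step of claim 2 can be sketched as percentile-clipped min-max scaling, assuming registration has already been performed by a separate tool. The claim does not name a normalization scheme, so the percentile bounds and the function name `normalize_intensity` are hypothetical.

```python
import numpy as np

def normalize_intensity(img, low_pct=0.5, high_pct=99.5):
    """Percentile-clipped min-max normalization to [0, 1].

    The percentile bounds are hypothetical defaults that limit the influence
    of a few extreme voxels; the claim does not specify a scheme.
    """
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:                       # constant image: nothing to rescale
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

# Registration is assumed to be done already; each image is normalized
# independently so both modalities share the same intensity range.
src = normalize_intensity(np.random.default_rng(0).normal(300.0, 50.0, (16, 16)))
print(src.min(), src.max())  # 0.0 1.0
```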
3. The method for synthesizing cross-modal medical images according to claim 1, characterized in that the method further includes: inputting the source modality medical image, the noisy image sequence, and a noise mask into the backward diffusion branch of the diffusion model, and denoising step by step starting from the last noisy image to generate a synthetic medical image, wherein the noise mask is obtained by performing local signal-to-noise ratio detection on the source modality medical image.
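Local signal-to-noise ratio detection, as used in claim 3 to build the noise mask, can be sketched with a sliding window: the local SNR is taken as |mean| / std inside each window, and pixels below a threshold are flagged as noisy. The window size, threshold, and the name `local_snr_mask` are assumptions; the claim does not fix the SNR estimator.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_snr_mask(img, win=5, snr_thresh=3.0, eps=1e-8):
    """Binary noise mask: 1 where the local SNR (|mean| / std) is below threshold.

    Window size and threshold are hypothetical; `win` should be odd.
    """
    img = np.asarray(img, dtype=float)
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (win, win))   # shape (H, W, win, win)
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    snr = np.abs(mean) / (std + eps)
    return (snr < snr_thresh).astype(np.float32)

# Left half flat (high SNR), right half strongly noisy (low SNR).
rng = np.random.default_rng(0)
demo = np.full((20, 20), 10.0)
demo[:, 10:] += rng.normal(0.0, 10.0, (20, 10))
mask = local_snr_mask(demo)
```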
4. The method for synthesizing cross-modal medical images according to claim 3, characterized in that the step of inputting the source modality medical image, the noisy image sequence, and the noise mask into the backward diffusion branch of the diffusion model and denoising step by step starting from the last noisy image to generate a synthetic medical image includes processing the i-th noisy image as follows: inputting the source modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image into the backward diffusion branch to generate the intermediate result of the current noisy image; aligning the intermediate result using an optimal transport algorithm and generating distribution correction information from the alignment result; and, under the constraint of the noise mask, correcting the intermediate result based on the distribution correction information to obtain the corrected intermediate result.
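One concrete way to realize "alignment based on optimal transport" for intensity distributions is the one-dimensional case: for a convex cost such as |a - b|^2, the optimal transport map between two empirical distributions is the monotone (rank-preserving) matching. The sketch below uses that map, treats the displacement as the distribution correction information, and damps it inside the noise mask. The reference samples, the damping rule with factor `lam`, and the function names are all hypothetical; the claims do not specify which OT solver or mask constraint is used.

```python
import numpy as np

def ot_align_1d(values, reference):
    """Map each intensity of `values` to the reference intensity of equal rank.

    For 1-D distributions and a convex cost such as |a - b|^2, this monotone
    (sorted) matching is the optimal transport map between the empirical
    distributions. Reference quantiles are resampled when the sizes differ.
    """
    flat = np.asarray(values, dtype=float).ravel()
    ref = np.quantile(np.asarray(reference, dtype=float).ravel(),
                      np.linspace(0.0, 1.0, flat.size))
    transported = np.empty_like(flat)
    transported[np.argsort(flat, kind="stable")] = ref   # k-th smallest -> k-th reference quantile
    return transported.reshape(np.shape(values))

def masked_correction(x_hat, reference, noise_mask, lam=0.5):
    """Apply the OT displacement, damped by `lam` inside the noisy region (hypothetical rule)."""
    x_hat = np.asarray(x_hat, dtype=float)
    correction = ot_align_1d(x_hat, reference) - x_hat     # distribution correction information
    return x_hat + (1.0 - lam * noise_mask) * correction
```

With `lam=0.5`, pixels flagged as noisy receive half of the transport displacement and all other pixels receive the full displacement.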
5. The method for synthesizing cross-modal medical images according to claim 4, characterized in that the step of correcting the intermediate result based on the distribution correction information under the constraint of the noise mask to obtain the corrected intermediate result includes: performing distribution consistency correction on the intermediate result based on the distribution correction information to obtain a first correction result; at each of multiple preset resolution levels, calculating the structural differences between the first correction result and the source modality medical image along the horizontal direction, the vertical direction, and the two diagonal directions, to obtain a structural change map that characterizes structural changes in different directions; weighting and attenuating, according to the noise region indicated by the noise mask, the structural difference values in the structural change map that correspond to the noise region, to obtain a weighted structural change map; and performing structural correction on the first correction result based on the weighted structural change map to obtain the standardized corrected intermediate result.
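The weighted structural change map of claim 5 can be sketched as follows: forward differences in the four named directions are compared between the first correction result and the source image at several resolutions (2x2 average pooling between levels), and the differences inside the noise region are scaled down. The number of levels, the attenuation factor `atten`, the pooling choice, and the function names are hypothetical, and the final structural correction step is not sketched because the claim does not describe it.

```python
import numpy as np

def directional_diffs(img):
    """Forward differences along horizontal, vertical, main- and anti-diagonal."""
    img = np.asarray(img, dtype=float)
    h, v, d1, d2 = (np.zeros_like(img) for _ in range(4))
    h[:, :-1] = img[:, 1:] - img[:, :-1]          # horizontal
    v[:-1, :] = img[1:, :] - img[:-1, :]          # vertical
    d1[:-1, :-1] = img[1:, 1:] - img[:-1, :-1]    # main diagonal
    d2[:-1, 1:] = img[1:, :-1] - img[:-1, 1:]     # anti-diagonal
    return np.stack([h, v, d1, d2])

def downsample2(img):
    """2x2 average pooling; an odd last row/column is cropped."""
    H, W = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:H, :W].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def structural_change_maps(x_corr, x_src, noise_mask, levels=3, atten=0.2):
    """Per-level |directional structure difference| maps, attenuated in noisy pixels.

    `levels` and `atten` are hypothetical; pixels with mask 1 keep `atten` of their value.
    """
    x_corr, x_src, mask = (np.asarray(a, dtype=float) for a in (x_corr, x_src, noise_mask))
    maps = []
    for _ in range(levels):
        diff = np.abs(directional_diffs(x_corr) - directional_diffs(x_src))   # shape (4, H, W)
        weight = 1.0 - (1.0 - atten) * mask
        maps.append(diff * weight)            # weight broadcasts over the 4 directions
        x_corr, x_src, mask = downsample2(x_corr), downsample2(x_src), downsample2(mask)
    return maps
```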
6. The method for synthesizing cross-modal medical images according to claim 5, characterized in that the step of performing distribution consistency correction on the intermediate result based on the distribution correction information to obtain the first correction result includes: performing distribution consistency correction on the intermediate result based on the distribution correction information to obtain an initial first correction result; and dynamically adjusting the gradient update magnitude for the i-th noisy image according to the difference between the initial first correction result and its posterior estimate, and applying a gradient update to the initial first correction result based on that difference to obtain the standardized first correction result, wherein the posterior estimate is generated by the backward diffusion branch for the i-th noisy image.
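The difference-driven gradient update of claim 6 can be illustrated as a single gradient step on 0.5 * ||x - x_post||^2, whose gradient at the initial first correction result is simply its difference from the posterior estimate. The claim says the update magnitude is adjusted dynamically from that difference but gives no formula, so the saturating step rule, `base_step`, `scale`, and the name `refine_with_posterior` are hypothetical.

```python
import numpy as np

def refine_with_posterior(x_init, x_post, base_step=0.5, scale=1.0):
    """One gradient step on 0.5 * ||x - x_post||^2 with a difference-dependent step size.

    The gradient at x_init is (x_init - x_post). The step rule below is
    hypothetical: it grows with the RMS difference and stays below base_step.
    """
    x_init = np.asarray(x_init, dtype=float)
    diff = x_init - np.asarray(x_post, dtype=float)
    rms = float(np.sqrt(np.mean(diff ** 2)))
    step = base_step * rms / (rms + scale)
    return x_init - step * diff

print(refine_with_posterior(np.ones(4), np.zeros(4)))  # [0.75 0.75 0.75 0.75]
```

When the correction already agrees with the posterior estimate, the step size is zero and the result is left unchanged.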
7. The method for synthesizing cross-modal medical images according to claim 1, characterized in that, after the synthetic medical image is generated, the method further includes: performing wavelet decomposition on the synthetic medical image and on the source modality medical image, respectively, to extract synthetic global information and synthetic local information from the synthetic medical image, and source modality global information and source modality local information from the source modality medical image; fusing the synthetic global information and the source modality global information under a modality appearance consistency constraint to obtain fused global information; constructing a saliency map from the synthetic local information and the source modality local information; adaptively weighting and fusing the synthetic local information and the source modality local information based on the saliency map to obtain fused local information; reconstructing the fused local information and the fused global information by inverse wavelet transform to obtain a wavelet-fused image; and performing anatomical structure correction on the wavelet-fused image based on the source modality medical image to obtain the standardized synthetic medical image.
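The wavelet post-processing of claim 7 can be sketched with a one-level orthonormal 2-D Haar transform: the low-frequency band stands in for global information and the three detail bands for local information. The fusion rules are assumptions, not the patented ones: the global band is kept mostly from the synthetic image (a simple stand-in for the modality appearance consistency constraint, weight `beta`), and each detail band is blended with a magnitude-based saliency weight. The anatomical structure correction step is not sketched, and all function names are hypothetical.

```python
import numpy as np

def haar2(x):
    """One-level orthonormal 2-D Haar transform; x must have even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2                   # global (low-frequency) information
    details = ((a - b + c - d) / 2,            # column-difference detail
               (a + b - c - d) / 2,            # row-difference detail
               (a - b - c + d) / 2)            # diagonal detail
    return ll, details

def ihaar2(ll, details):
    """Inverse of haar2."""
    lh, hl, hh = details
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def wavelet_fuse(syn, src, beta=0.1, eps=1e-8):
    """Hypothetical fusion rules: global band mostly from the synthetic image,
    detail bands weighted by a magnitude-based saliency map."""
    ll_s, det_s = haar2(np.asarray(syn, dtype=float))
    ll_r, det_r = haar2(np.asarray(src, dtype=float))
    ll_f = (1.0 - beta) * ll_s + beta * ll_r              # keep the target-modality appearance
    det_f = []
    for ds, dr in zip(det_s, det_r):
        w = np.abs(ds) / (np.abs(ds) + np.abs(dr) + eps)  # saliency-based weight in [0, 1]
        det_f.append(w * ds + (1.0 - w) * dr)
    return ihaar2(ll_f, tuple(det_f))
```

A production version would more likely use a multi-level transform from a wavelet library; the single Haar level keeps the sketch dependency-free.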
8. A cross-modal medical image synthesis system, characterized in that the system includes: an image acquisition module, configured to acquire a source modality medical image and a target modality medical image of the same organ; a forward diffusion module, configured to input the target modality medical image and the source modality medical image into the forward diffusion branch of a diffusion model, linearly interpolate between the target modality medical image and the source modality medical image according to a preset time step, and generate the noisy image for the corresponding time step from the interpolation result, wherein random noise matching the noise scheduling parameter of the current time step is progressively added to the interpolation result, so that the target modality gradually forms noisy images of different noise intensities while remaining associated with the source modality; and a backward diffusion module, configured to input the source modality medical image and the noisy images of all time steps into the backward diffusion branch of the diffusion model and denoise step by step, starting from the last noisy image, to generate a synthetic medical image. Specifically, the i-th noisy image is processed as follows: the source modality medical image, the i-th noisy image, and the corrected intermediate result of the (i-1)-th noisy image are input into the backward diffusion branch to generate the intermediate result of the current noisy image; the intermediate result is aligned using an optimal transport algorithm; distribution correction information is generated from the alignment result; and the intermediate result is corrected based on the distribution correction information to obtain the corrected intermediate result. For the last noisy image, the corrected intermediate result of the corresponding (i-1)-th noisy image takes a preset initial value, and the last corrected intermediate result is used as the synthetic image.
For the i-th time step, the source modality medical image $x_{src}$, the noisy image $x_i$ at the current i-th time step, and the corrected result $\tilde{x}_{i-1}$ from the previous time step are input into the trained backward diffusion branch, whose conditional generator $G$ outputs the intermediate result at the current time step, $\hat{x}_i = G(x_{src}, x_i, \tilde{x}_{i-1})$. A trained time-aware latent network measures the latent distribution discrepancy between the noisy image $x_i$ and the intermediate result $\hat{x}_i$, and the distribution correction information for the current time step is computed in real time from an optimal transport loss. This loss is formed by the difference between the distance term of $x_i$ and $\hat{x}_i$ and the output of the time-aware latent network for the noisy image at the i-th time step:
$$\mathcal{L}_{OT} = \mathbb{E}\left[c(x_i, \hat{x}_i) - \phi(x_i)\right]$$
where $c(x_i, \hat{x}_i)$ is the distance between the noisy image $x_i$ at the current i-th time step and the intermediate result $\hat{x}_i$; $\mathbb{E}$ denotes the expectation; $\phi$ is the time-aware latent network, and $\phi(x_i)$ is the distribution correction information it generates from the noisy image $x_i$ at the i-th time step; and $\mathcal{L}_{OT}$ is the optimal transport loss.
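The optimal transport loss used by the backward diffusion module combines the expected distance between the noisy image and the intermediate result with the output of the time-aware latent network. Assuming the two terms are combined as a difference, E[c(x_i, x_hat_i) - phi(x_i)], a minimal NumPy batch estimate is sketched below. The squared-difference cost, the batch layout, and `toy_phi` (a stand-in for the trained network) are hypothetical.

```python
import numpy as np

def ot_loss(x_noisy, x_hat, phi):
    """Monte Carlo estimate of E[c(x_i, x_hat_i) - phi(x_i)] over a batch.

    Assumptions: axis 0 indexes samples; c is the per-sample mean squared
    difference; phi is any callable returning one scalar per sample.
    """
    x_noisy = np.asarray(x_noisy, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    axes = tuple(range(1, x_noisy.ndim))
    cost = np.mean((x_noisy - x_hat) ** 2, axis=axes)     # c(x_i, x_hat_i) per sample
    return float(np.mean(cost - phi(x_noisy)))

def toy_phi(x):
    """Hypothetical stand-in for the time-aware latent network."""
    return 0.1 * x.reshape(x.shape[0], -1).mean(axis=1)

batch_noisy = np.ones((2, 4, 4))
batch_hat = np.ones((2, 4, 4))
loss = ot_loss(batch_noisy, batch_hat, toy_phi)   # zero cost, so loss = -mean(phi)
```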
9. An electronic device, characterized in that, The electronic device includes: One or more processors; A storage device for storing one or more programs, which, when executed by the one or more processors, cause the electronic device to implement the method for synthesizing cross-modal medical images as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, It stores a computer program that, when executed by the computer's processor, causes the computer to perform the method for synthesizing cross-modal medical images as described in any one of claims 1 to 7.