An image artifact repairing method and system based on a counterfactual diffusion model

By employing an image artifact restoration method based on a counterfactual diffusion model, texture and structure denoising networks are trained using artifact-free data. This solves the dependence on paired data in image artifact restoration, achieving efficient image quality improvement and cost reduction, and promoting the development of smart healthcare.

CN119941579B · Active · Publication Date: 2026-06-26 · XINXIANG MEDICAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XINXIANG MEDICAL UNIV
Filing Date
2025-01-08
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing image artifact restoration techniques rely on paired data, which makes it difficult to effectively handle real artifacts, and traditional methods have limited effectiveness when dealing with artifacts in complex structures.

Method used

An image artifact restoration method based on a counterfactual diffusion model is adopted. By acquiring multimodal data, performing data filtering and preprocessing, constructing a standardized dataset, and using a texture denoising network, a structure denoising network, and a discriminator network, artifact restoration is achieved by training only on artifact-free data.

Benefits of technology

It can effectively improve image quality, maintain high resolution and detail, reduce repetitive operations and costs in imaging examinations, promote the industrialization of intelligent image processing systems, improve the accuracy of disease diagnosis, and reduce the demand for medical resources.

✦ Generated by Eureka AI based on patent content.

Figure CN119941579B_ABST

Abstract

The application discloses an image artifact repairing method and system based on a counterfactual diffusion model, relates to the technical field of image processing, and comprises the following steps: acquiring multi-modal data and performing data screening on the multi-modal data; preprocessing the screened data and constructing a standardized data set; constructing an image artifact repairing model based on a counterfactual diffusion model; training the image artifact repairing model on the standardized data set; and repairing an image contaminated by artifacts with the trained image artifact repairing model. In the training stage, only artifact-free data is used, which removes the limitation that the lack of paired data places on current artifact repairing technology. In the repairing stage, the method is not limited to a single body part or a single modality: when images of different modalities are processed, the image quality after repairing is still guaranteed, the high resolution and key details of the image are retained, and the accuracy of the image is improved.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and more specifically to an image artifact repair method and system based on a counterfactual diffusion model.

Background Technology

[0002] Imaging is an essential component of modern medicine, playing a crucial role in assisting clinical diagnosis and treatment. It provides a clear and intuitive view of a patient's internal tissues and structures, helping doctors quickly locate and confirm lesions and formulate safer and more effective surgical plans. The quality of medical images is therefore extremely important. Although acquisition speed and accuracy continue to improve with hardware and technological upgrades, the complexity of the imaging process makes it susceptible to various factors that introduce noise and artifacts, which impairs the subsequent use of the images.

[0003] Traditional image artifact restoration methods typically rely on image processing techniques such as filtering, iterative reconstruction algorithms, projection interpolation, hybrid methods, and model correction. While these traditional methods perform well in certain specific scenarios and can improve image quality to some extent, they often depend on manually designed features and parameters, lacking the ability to automatically adapt to complex image content. Therefore, traditional methods have limited effectiveness in handling nonlinear and complex artifacts and are difficult to generalize to various imaging modes and artifact types.

[0004] In existing technologies, research on image artifact removal is primarily based on deep learning methods, which have demonstrated better performance and shorter runtime than traditional methods. However, most deep learning methods for image artifact restoration are based on supervised learning. Because paired artifact-free and artifact-corrupted images are difficult to obtain, these methods typically train the network on images with simulated artifacts, which makes them difficult to apply to the restoration of real-world artifact data. To overcome the limitations of deep learning methods based on synthetic data, deep learning methods using unpaired data have also been explored. Some methods treat image artifact removal as a translation between two image domains (the artifact-free domain and the artifact domain) and solve it with cycle-consistent generative adversarial networks. While they utilize real-world image artifact data, the performance of these algorithms is often limited by the lack of an explicit image artifact removal mechanism.

[0005] Therefore, how to provide an image artifact restoration method and system based on a counterfactual diffusion model that uses only artifact-free data during the training phase, overcomes the limitation that the lack of paired data places on current artifact restoration techniques, and effectively improves image quality is a problem that those skilled in the art urgently need to solve.

Summary of the Invention

[0006] In view of this, the present invention provides an image artifact restoration method and system based on a counterfactual diffusion model. During the training phase, only artifact-free data is used, overcoming the limitation of current artifact restoration techniques due to the lack of paired data, and effectively improving image quality. To achieve the above objectives, the present invention adopts the following technical solution:

[0007] An image artifact restoration method based on a counterfactual diffusion model includes:

[0008] Acquire multimodal data and perform data filtering on the multimodal data;

[0009] The filtered data is preprocessed to construct a standardized dataset;

[0010] An image artifact restoration model is constructed based on a counterfactual diffusion model;

[0011] The image artifact restoration model is trained using the standardized dataset;

[0012] The trained image artifact restoration model is used to repair artifacts in images contaminated by artifacts.

[0013] Optionally, the data filtering includes: using image annotation to ensure that the filtered image data is artifact-free.

[0014] Optionally, the preprocessing of the filtered data includes:

[0015] Data normalization is performed on data with different dimensions, modalities, and intensity values.

[0016] The 3D image is split along three axes, and images with extreme aspect ratios are discarded;

[0017] The distribution of input data is standardized through normalization and resizing operations.

[0018] Optionally, the image artifact restoration model includes a texture denoising network, a structure denoising network, and a discriminator network. Noise is gradually added to the original image through a forward diffusion process, and the image is gradually restored through a reverse denoising process. The texture denoising network adopts a UNet-based denoising network to restore the image by adding and removing noise from the input image. The structure denoising network is used to guide the texture denoising process, and the discriminator network is used to measure the semantic relevance between the structure denoising result and the texture denoising result.

[0019] Optionally, training the image artifact restoration model using a standardized dataset includes learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images.

[0020] Optionally, the step of learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images includes: inputting the selected artifact-free images into the training phase to train the texture denoising network, the structure denoising network, and the discriminator network; training the texture denoising network and the structure denoising network separately through a joint optimization strategy; during the forward pass of the texture denoising network, gradually injecting Gaussian noise into the artifact-free images; during the reverse pass, using the denoising results of the structure denoising network to guide the texture denoising network to predict noise and gradually reconstruct the image; and using the discriminator network as a supervisory mechanism to optimize the semantic consistency between the two.

[0021] Optionally, the training process of the discriminator network is as follows:

[0022] The discriminator network $D$ is trained to compute the semantic relevance score between the result $y_{t-1}$ of the structure denoising network and the result $x_{t-1}$ of the texture denoising network, and a discriminator loss $L_{dis}$ and a triplet loss $L_{tri}$ are used to optimize the discriminator network;

[0023] An adaptive resampling strategy is used during training. When the semantic relevance score between $y_{t-1}$ and $x_{t-1}$ is lower than a certain threshold $\Delta$, noise is first added to $x_{t-1}$ to generate a noised sample, which is then denoised to obtain an updated result. The updated result is then used to guide the texture denoising network in generating an updated version. Subsequently, the semantic relevance between the two updated results is assessed, and the above steps are repeated to adaptively adjust the semantic relevance between the original image and the structure.

[0024] Optionally, the artifact restoration of the image contaminated by artifacts using the trained image artifact restoration model includes: selectively performing denoising resampling only in the artifact regions to preserve the original texture and anatomical structure of the artifact-free regions to the greatest extent possible.

[0025] Optionally, the artifact repair includes:

[0026] The counterfactual probability is determined based on the objective of artifact restoration, and the counterfactual probability is re-determined based on the fact that the impact of the artifact is limited to a specific area while other areas remain unchanged.

[0027] During the inference phase, only the artifact regions are subjected to reverse-diffusion resampling, while the artifact-free regions retain their original texture. The input is the artifact image to be repaired and the corresponding artifact label image. The sum of the artifact-free regions at time t-1 and the artifact regions from the denoising resampling result at time t is used as the repair result at time t-1. The repair result at time t-1 is then used as the input for the next denoising step, i.e., from t-1 to t-2, so that the repair result obtained at each time step becomes the input for the next time step.

[0028] Optionally, an image artifact restoration system based on a counterfactual diffusion model includes:

[0029] Data acquisition module: used to acquire multimodal data and perform data filtering on the multimodal data;

[0030] Preprocessing module: Used to preprocess the filtered data and build a standardized dataset;

[0031] Model building module: used to build image artifact restoration models based on counterfactual diffusion models;

[0032] Model training module: used to train the image artifact restoration model using a standardized dataset;

[0033] Artifact Repair Module: Used to repair artifacts in images contaminated by artifacts using a trained image artifact repair model.

[0034] As can be seen from the above technical solution, compared with the prior art, the present invention discloses an image artifact repair method and system based on a counterfactual diffusion model, which has the following beneficial effects:

[0035] This invention proposes an image artifact repair method based on a counterfactual diffusion model, comprising: acquiring multimodal data and filtering the multimodal data; preprocessing the filtered data to construct a standardized dataset; constructing an image artifact repair model based on the counterfactual diffusion model; training the image artifact repair model using the standardized dataset; and repairing the artifact-contaminated image using the trained image artifact repair model.

(1) This invention addresses the repair of most imaging artifacts and is not limited to a single body part or a single modality. When processing images of different modalities, it still ensures the image quality after repair, maintaining the high resolution and detail of the image.

(2) This invention not only improves the technical performance of image repair but also brings economic benefits: it significantly reduces repetitive operations and repair steps in imaging examinations, and reduces the time and cost that hospitals spend on image repair. The application of this technology also drives the industrialization of the next generation of medical image processing systems and provides technical support for the commercialization of related technologies. In the fields of intelligent image processing equipment and remote medical diagnosis systems in particular, it is expected to drive continued market expansion.

(3) Artifact repair can effectively improve image quality, especially in medical images. After artifact repair, the key details in the image are preserved, which improves the accuracy of doctors' judgment of diseases and enables patients to learn of their condition in time, supporting the diagnosis and treatment of patients' diseases.

(4) The technology of this invention can promote the application and popularization of imaging artificial intelligence technology, promote the development of intelligent medical care, and reduce the demand for medical resources, which is especially significant in resource-scarce or remote areas.

Attached Figure Description

[0036] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0037] Figure 1 is a flowchart of the image artifact repair method based on a counterfactual diffusion model provided by the present invention.

[0038] Figure 2 is a schematic diagram of the training phase of the counterfactual diffusion model provided by the present invention.

[0039] Figure 3 is a schematic diagram of the inference phase of the counterfactual diffusion model provided by the present invention.

[0040] Figure 4 is a schematic diagram showing images before and after artifact repair provided by the present invention.

Detailed Implementation

[0041] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0042] This invention discloses an image artifact repair method based on a counterfactual diffusion model, comprising:

[0043] Acquire multimodal data and perform data filtering on the multimodal data;

[0044] The filtered data is preprocessed to construct a standardized dataset;

[0045] An image artifact restoration model is constructed based on a counterfactual diffusion model;

[0046] The image artifact restoration model is trained using the standardized dataset;

[0047] The trained image artifact restoration model is used to repair artifacts in images contaminated by artifacts.

[0048] Furthermore, the data filtering includes: using image annotation to ensure that the filtered image data is artifact-free.

[0049] Furthermore, the preprocessing of the filtered data includes:

[0050] Data normalization is performed on data with different dimensions, modalities, and intensity values.

[0051] The 3D image is split along three axes, and images with extreme aspect ratios are discarded;

[0052] The distribution of input data is standardized through normalization and resizing operations.

[0053] Furthermore, the image artifact restoration model includes a texture denoising network, a structure denoising network, and a discriminator network. It gradually adds noise to the original image through a forward diffusion process and gradually restores the image through a reverse denoising process. The texture denoising network adopts a UNet-based denoising network to restore the image by adding and removing noise from the input image. The structure denoising network is used to guide the texture denoising process, and the discriminator network is used to measure the semantic relevance between the structure denoising result and the texture denoising result.

[0054] Furthermore, training the image artifact restoration model using a standardized dataset includes learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images.

[0055] Furthermore, the method of learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images includes: inputting the selected artifact-free images into the training phase to train the texture denoising network, the structure denoising network, and the discriminator network; training the texture denoising network and the structure denoising network separately through a joint optimization strategy; during the forward process of the texture denoising network, gradually injecting Gaussian noise into the artifact-free images; during the reverse process, using the denoising results of the structure denoising network to guide the texture denoising network to predict noise and gradually reconstruct the image; and using the discriminator network as a supervisory mechanism to optimize the semantic consistency between the two.

[0056] Furthermore, the training process of the discriminator network is as follows:

[0057] The discriminator network $D$ is trained to compute the semantic relevance score between the result $y_{t-1}$ of the structure denoising network and the result $x_{t-1}$ of the texture denoising network, and a discriminator loss $L_{dis}$ and a triplet loss $L_{tri}$ are used to optimize the discriminator network;

[0058] An adaptive resampling strategy is used during training. When the semantic relevance score between $y_{t-1}$ and $x_{t-1}$ is lower than a certain threshold $\Delta$, noise is first added to $x_{t-1}$ to generate a noised sample, which is then denoised to obtain an updated result. The updated result is then used to guide the texture denoising network in generating an updated version. Subsequently, the semantic relevance between the two updated results is assessed, and the above steps are repeated to adaptively adjust the semantic relevance between the original image and the structure.

[0059] Furthermore, the artifact restoration of the image contaminated by artifacts using the trained image artifact restoration model includes: selectively performing denoising resampling only in the artifact regions to preserve the original texture and anatomical structure of the artifact-free regions to the greatest extent possible.

[0060] Furthermore, the artifact repair includes:

[0061] The counterfactual probability is determined based on the objective of artifact restoration, and the counterfactual probability is re-determined based on the fact that the impact of the artifact is limited to a specific area while other areas remain unchanged.

[0062] During the inference phase, only the artifact regions are subjected to reverse-diffusion resampling, while the artifact-free regions retain their original texture. The input is the artifact image to be repaired and the corresponding artifact label image. The sum of the artifact-free regions at time t-1 and the artifact regions from the denoising resampling result at time t is used as the repair result at time t-1. The repair result at time t-1 is then used as the input for the next denoising step, i.e., from t-1 to t-2, so that the repair result obtained at each time step becomes the input for the next time step.

[0063] In a specific implementation, an image artifact restoration system based on a counterfactual diffusion model includes:

[0064] Data acquisition module: used to acquire multimodal data and perform data filtering on the multimodal data;

[0065] Preprocessing module: Used to preprocess the filtered data and build a standardized dataset;

[0066] Model building module: used to build image artifact restoration models based on counterfactual diffusion models;

[0067] Model training module: used to train the image artifact restoration model using a standardized dataset;

[0068] Artifact Repair Module: Used to repair artifacts in images contaminated by artifacts using a trained image artifact repair model.

[0069] In a specific implementation, an image artifact repair method based on a counterfactual diffusion model includes the following steps:

[0070] S1. Data Acquisition Steps: Acquire a large amount of medical imaging data and filter it to obtain high-quality, artifact-free data;

[0071] S2. Data preprocessing steps: Normalize and process the collected multimodal data;

[0072] S3. Construct an image artifact restoration model based on a counterfactual diffusion model. The model consists of three parts: a texture denoising network, a structural denoising network to guide the texture denoising process, and a discriminator network to measure the semantic correlation between the structural denoising result and the texture denoising result.

[0073] S4. Training steps: Utilize artifact-free images to learn the ability to generate local anatomical representations from contextual information;

[0074] S5. Inference steps: Denoising resampling is selectively performed only in artifact regions to preserve the original texture and anatomical structure of artifact-free regions to the maximum extent.

[0075] Furthermore, S1 specifically includes the following steps:

[0076] S101. The first stage of constructing the dataset is to collect candidate images. Images are searched for on various websites, such as TCIA, OpenNeuro, Grand Challenge, Synapse, and GitHub. After searching open resources related to medicine, a large image dataset is obtained.

[0077] S102. Filter the data collected in S101. During the data filtering process, based on the annotation method of radiology experts, it is ensured that the filtered image data only contains artifact-free images, providing reliable training samples for subsequent artifact restoration tasks.

[0078] Furthermore, step S2 specifically includes the following steps:

[0079] S201. Normalize data with different dimensions, modalities, and intensity values. Voxel or pixel values that vary greatly across modalities and acquisition methods are normalized into a unified format: the original image is first normalized to [0, 1] and then multiplied by 255 to take the upper limit value.

[0080] S202. Split all 3D images along the three axes and discard images with extreme aspect ratios. Specifically, slices with the shortest side length less than half the longest side length are discarded to prevent the target area from becoming extremely blurry after resizing.

[0081] S203. By normalizing and adjusting the size, the distribution of input data is unified, reducing noise interference that may occur during the model learning process.
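
The preprocessing in S201 and S202 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the intensity step is read as min-max scaling followed by multiplication by 255, and the resizing in S203 is left out because the text gives no target size.

```python
import numpy as np

def normalize_intensity(volume: np.ndarray) -> np.ndarray:
    """Min-max normalize to [0, 1], then multiply by 255 (assumed reading of S201)."""
    volume = volume.astype(np.float64)
    lo, hi = volume.min(), volume.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo) * 255.0

def split_into_slices(volume: np.ndarray) -> list:
    """Split a 3D volume into 2D slices along each of its three axes (S202)."""
    slices = []
    for axis in range(3):
        for index in range(volume.shape[axis]):
            slices.append(np.take(volume, index, axis=axis))
    return slices

def has_extreme_aspect_ratio(slice_2d: np.ndarray) -> bool:
    """True when the shortest side is less than half the longest side (S202)."""
    shortest, longest = min(slice_2d.shape), max(slice_2d.shape)
    return shortest < 0.5 * longest

def preprocess(volume: np.ndarray) -> list:
    """Normalize a volume, slice it along three axes, and drop extreme slices."""
    scaled = normalize_intensity(volume)
    return [s for s in split_into_slices(scaled) if not has_extreme_aspect_ratio(s)]
```

For a volume of shape (4, 10, 10), only the four 10×10 slices taken along the first axis survive; the 4×10 slices taken along the other two axes are discarded because 4 is less than half of 10.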

[0082] Furthermore, S3 specifically includes the following steps:

[0083] S301. An image artifact restoration model based on a counterfactual diffusion model is constructed. The counterfactual diffusion model proposed in this invention is based on the theoretical framework of the diffusion model.

[0084] The diffusion model progressively adds noise to the original image through a forward diffusion process and then progressively restores the image through a reverse denoising process. During the forward diffusion process, the medical image $x_0$ undergoes $T$ time steps of noise perturbation, gradually becoming a Gaussian-distributed image, whose conditional probability distribution is described by the following formula:

[0085]

[0086] Where $\mathcal{N}$ represents the normal distribution, $I$ is the identity matrix, and $\alpha_t$ is a hyperparameter of the diffusion process. The associated parameter that evolves over time step $t$ is often referred to as the "cumulative noise coefficient." In the reverse denoising process, a denoising network $f_\theta(x_t \mid t)$ learns the conditional probability distribution $P(x_{t-1} \mid x_t)$. Progressively optimizing $P(x_{t-1} \mid x_t)$ allows the artifact regions to be repaired gradually.
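
The formula for the forward process is not reproduced in the text above. As a hedged reconstruction, assuming the standard denoising diffusion parameterisation (an assumption, not confirmed by the source) and writing $\bar{\alpha}_t$ for the cumulative noise coefficient, the conditional distribution would take the form:

```latex
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s
```

Under this form, $\bar{\alpha}_t$ shrinks toward 0 as $t$ approaches $T$, so $x_T$ is close to pure Gaussian noise, which matches the description of the image gradually becoming Gaussian-distributed.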

[0087] S302. The model based on counterfactual diffusion constructed in this invention comprises three parts and two stages. The three main parts of the model are a texture denoising network, a structure denoising network, and a discriminator network. The texture denoising network employs a UNet-based denoising network, achieving the purpose of restoration by adding and removing noise from the input image. The structure denoising network guides the texture denoising process, and the discriminator network measures the semantic relevance between the structure denoising result and the texture denoising result. The proposed structure-guided denoising network demonstrates superior ability in learning the anatomical representation of artifact-free images. The two main stages of the model are training and inference. In the training stage, the model learns the anatomical structure and contextual information of artifact-free images by training only on artifact-free images. In the inference stage, denoising resampling is selectively performed only in artifact regions to preserve the original texture and anatomical structure of artifact-free regions to the maximum extent.

[0088] Furthermore, step S4 specifically includes the following steps:

[0089] S401. This invention requires no paired artifact and artifact-free images during the training phase; model training can be completed solely on artifact-free images. Compared to the strong dependence on paired images in traditional methods, this invention significantly reduces the requirements for dataset construction while improving the model's generalization ability to complex artifacts in real-world scenes.

[0090] S402. During the training phase, input the artifact-free images selected in step S1, and then train the three networks mentioned in S3 respectively. The texture denoising network and the structure denoising network are trained respectively through a joint optimization strategy, and the discriminator network is used as a supervision mechanism to optimize the semantic consistency between the two.

[0091] S403. First, the texture denoising network is trained. In the forward pass, Gaussian noise is gradually injected into the artifact-free image. In the reverse pass, the denoising results of the structural denoising network are used to guide the texture denoising network to predict noise and gradually reconstruct the image. The structural denoising network introduced in this invention, guided by structure, improves the global semantic integrity and reliability of the restoration results.
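
The forward pass described in S403, in which Gaussian noise is gradually injected into an artifact-free image, can be sketched as below. This is an illustrative sketch only: it assumes the standard closed-form noising used in denoising diffusion models and a linear noise schedule with commonly used default values, none of which are specified in the text. The names `cumulative_alphas` and `add_noise` are hypothetical.

```python
import numpy as np

def cumulative_alphas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative noise coefficients for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample the noisy image x_t from x_0 in one shot; t is a 1-based step index."""
    eps = rng.standard_normal(x0.shape)  # Gaussian noise injected into the image
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps  # eps is the target a noise-predicting network would learn
```

A texture denoising network trained in this setting would receive `x_t` and `t` (plus the structure guidance) and be asked to predict `eps`.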

[0092] S404. The specific details of training the structure denoising network mentioned in S403 are as follows:

[0093] The structural denoising network takes the artifact-free image from S402 as input and progressively adds noise. During this noise addition, the semantics of the structure become increasingly sparse over time. Noise scheduling gradually degenerates the input image into a combination of structurally sparse edge maps and Gaussian noise. The noise-added result is then denoised using a denoising network, and this denoising result guides the texture denoising process at that time step. This invention addresses the shortcomings of traditional diffusion models in preserving global structure by introducing a structural denoising network.

[0094] S405. The details of training the discriminator D are as follows:

[0095] To ensure the effectiveness of the structure denoising network in guiding the texture denoising network, a discriminator network $D$ is trained according to S404 to calculate the semantic relevance score $D(y_{t-1}, x_{t-1}, t-1)$ (abbreviated as $D(y_{t-1}, x_{t-1})$) between the result $y_{t-1}$ of the structure denoising network and the result $x_{t-1}$ of the texture denoising network. This ensures the semantic relevance between structure and texture. Simultaneously, a discriminator loss $L_{dis}$ and a triplet loss $L_{tri}$ are used to optimize $D$.
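
The text names a triplet loss without giving its form. For orientation only, the sketch below shows the standard margin-based triplet loss on embedding vectors. How the anchor, positive, and negative samples are built from structure and texture results is not stated in the source, so the inputs here are hypothetical embeddings and the margin value is an assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss, averaged over the batch.

    Pulls each anchor toward its positive and pushes it away from its
    negative until the two distances differ by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

The loss is zero once every negative is farther from its anchor than the positive by at least the margin, so only poorly separated triplets contribute to the gradient.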

[0096] S406. An adaptive resampling strategy is employed during training, adjusting the semantic relevance between the original image and the structure based on the scores provided by the discriminator mentioned in S405. Specifically, when the semantic relevance score between $y_{t-1}$ and $x_{t-1}$ is lower than a certain threshold $\Delta$, noise is first added to $x_{t-1}$ to generate a noised sample, which is then denoised to obtain an updated result. The updated result is then used to guide the texture denoising network in generating an updated version. Subsequently, the semantic relevance between the two updated results is assessed, and the above steps are repeated to adaptively adjust the semantic relevance between them.
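
The control flow of the adaptive resampling strategy in S406 can be sketched as a threshold-driven loop. This is a sketch under stated assumptions: the four callables are hypothetical stand-ins for the discriminator and the trained networks, the exact data flow between them is only partly recoverable from the text, and the `max_rounds` cap is an added safeguard that the source does not mention.

```python
def adaptive_resample(x_prev, y_prev, relevance, add_noise, denoise,
                      guided_texture_step, threshold, max_rounds=10):
    """Repeat the resampling round while the relevance score stays below threshold.

    relevance(y, x)             -> semantic relevance score from the discriminator
    add_noise(x)                -> re-noised version of the step t-1 result
    denoise(noised)             -> updated guidance result
    guided_texture_step(n, y)   -> updated texture result guided by y
    """
    rounds = 0
    while relevance(y_prev, x_prev) < threshold and rounds < max_rounds:
        noised = add_noise(x_prev)                    # add noise to the current result
        y_prev = denoise(noised)                      # denoise to get the updated guidance
        x_prev = guided_texture_step(noised, y_prev)  # regenerate the texture result
        rounds += 1
    return x_prev, y_prev, rounds
```

The loop exits as soon as the score reaches the threshold, so well-aligned samples pass through without any extra denoising work.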

[0097] S407. The diffusion model has high computational cost. To reduce the computational cost of the diffusion model during inference, this invention employs a progressive distillation method during training, aiming to accelerate the inference process by reducing the denoising steps of the diffusion model. The specific implementation is as follows:

[0098] First, the denoising network $f_\theta(x_t \mid t, m)$, trained with $T$ denoising steps, serves as the teacher model, upon which a student model is constructed. The student model inherits the parameters of the teacher model as its initialization, but its number of denoising steps is reduced, which reduces the computational cost of each inference to half. By minimizing the distillation loss function, the output of the student model is made consistent with the output of the teacher model at time step $t-1$, thus completing the distillation training. The objective function for training is as follows:

[0099]

[0100] Where $x_t$ represents the noisy image at time step $t$, $L_{Distill}$ is the knowledge distillation loss, and $m$ is the label indicating the artifact region. By performing $n$ rounds of distillation on the student model (each round reducing the number of denoising steps relative to the previous round), a student model with significantly improved computational efficiency is obtained.
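
The effect of repeated distillation on the number of denoising steps can be illustrated with a small sketch. It assumes that each round halves the step count, which is inferred from the statement that the inference cost drops to half and is not given as a formula in the text. `distillation_schedule` is a hypothetical helper.

```python
def distillation_schedule(total_steps: int, rounds: int) -> list:
    """Return the denoising step count of the teacher and of each student.

    Entry 0 is the original teacher; entry k is the student after k rounds,
    where each round is assumed to halve the number of steps.
    """
    steps = [total_steps]
    for _ in range(rounds):
        if steps[-1] % 2 != 0:
            raise ValueError("step count must be even to be halved")
        steps.append(steps[-1] // 2)
    return steps
```

For example, `distillation_schedule(1000, 3)` returns `[1000, 500, 250, 125]`: after three rounds the student needs one eighth of the teacher's denoising steps.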

[0101] Furthermore, S5 specifically includes the following steps:

[0102] S501. The goal of artifact restoration is to restore the artifact-contaminated image $x_a$ to its ideal artifact-free state. This is a counterfactual state that is unobservable in reality. Its counterfactual probability can be expressed as:

[0103]

[0104] Where $do(C = c_f)$ is the intervention operator in causal inference, indicating that condition C is set to the artifact-free state through external intervention; $x_a$ represents an image contaminated with artifacts; $c_f$ represents the ideal artifact-free state; and the restored-image term represents the ideal image under artifact-free conditions.

[0105] Specifically, in artifact restoration tasks, the impact of artifacts is limited to specific areas, while other areas remain unchanged. Therefore, a label 'm' is introduced to indicate the artifact region, distinguishing it from unaffected areas. Specifically, the known image information can be represented as:

[0106] $X_{known} = x_a \odot (1 - m)$;

[0107] Where m is a binary image containing only 0s and 1s, where 1 represents a region with artifacts (i.e., the region that needs to be repaired), and 0 represents a region without artifacts (i.e., the region with normal anatomical structures that need to be preserved). ⊙ is a pixel-level multiplication operator. The counterfactual probability of repair in this case can be further expressed as:

[0108]

[0109] Here, M = m represents the artifact region label, specifying the application area of the do operator. Through counterfactual reasoning, images can be modeled under the assumption of an artifact-free state (i.e., an ideal state), thereby significantly reducing the impact of artifacts on subsequent image analysis tasks.
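The masking of the known image information can be sketched in a few lines of NumPy. This is only an illustration of the element-wise product $x_a \odot (1 - m)$; the function name `known_region` and the sample values are hypothetical.

```python
import numpy as np


def known_region(x_a: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Keep the artifact-free pixels of x_a and zero out the artifact region.

    m is a binary label image: 1 marks artifact pixels, 0 marks pixels to keep.
    """
    if x_a.shape != m.shape:
        raise ValueError("image and label must have the same shape")
    if not np.isin(m, (0, 1)).all():
        raise ValueError("label image must contain only 0 and 1")
    # Element-wise (pixel-level) product with the inverted label.
    return x_a * (1 - m)


x_a = np.array([[10, 20], [30, 40]])
m = np.array([[0, 1], [0, 0]])
x_known = known_region(x_a, m)
```

Only the pixel labeled 1 is removed from the known information; every other pixel is passed through unchanged.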

[0110] S502. During the inference phase, only the artifact regions are subjected to reverse-diffusion resampling, while the original texture of the artifact-free regions remains unchanged, thereby maximizing detail preservation and improving restoration efficiency. The input consists of the artifact image to be restored and the corresponding artifact marker image. Finally, the sum of the artifact-free region at time t-1 and the artifact region of the denoised resampling result at time t is taken as the repair result at time t-1.

[0111] S503. As described in S502, the inference stage first adds noise to the input image y to obtain the noise-added result at time t-1. The artifact-free region of the diffused image at time t-1 is then computed from this noise-added result.

[0112] S504. Next, the repair result at time t is denoised under the guidance of the denoising result from the structure denoising network, giving the denoised resampling result from t to t-1, from which the artifact region is extracted. Finally, this artifact region is added to the diffused artifact-free region at time t-1 obtained in S503 to obtain the restoration result at time t-1.

[0113] S505. The repair result obtained in S504 is used as the input for the next denoising step, i.e., from t-1 to t-2. In other words, the repair result obtained at each time step is used as the input for the next time step.
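The inference procedure in S502 to S505 can be sketched as a loop that composites two masked parts at every step. This is a minimal sketch under stated assumptions: the forward process is simplified to additive Gaussian noise with a hypothetical linear schedule (`noise_level`), and `denoise_step` is a placeholder for the structure-guided texture denoising network, whose real interface is not given in the text.

```python
import numpy as np


def noise_level(t: int, total_steps: int, sigma_max: float = 1.0) -> float:
    """Hypothetical linear noise schedule; the noise vanishes at t = 0."""
    return sigma_max * t / total_steps


def restore(x_a, m, denoise_step, total_steps, seed=0):
    """Resample only the artifact region (m == 1) and keep the rest of x_a."""
    rng = np.random.default_rng(seed)
    x_a = x_a.astype(np.float64)
    # Start from the input image noised to the last time step.
    y_t = x_a + noise_level(total_steps, total_steps) * rng.standard_normal(x_a.shape)
    for t in range(total_steps, 0, -1):
        # Artifact-free region: the input image diffused to noise level t-1.
        known = x_a + noise_level(t - 1, total_steps) * rng.standard_normal(x_a.shape)
        # Artifact region: the current result denoised from t to t-1.
        resampled = denoise_step(y_t, t)
        # The sum of both masked parts is the repair result at t-1,
        # which becomes the input of the next denoising step.
        y_t = (1 - m) * known + m * resampled
    return y_t


x_a = np.array([[1.0, 2.0], [3.0, 4.0]])
m = np.array([[0, 1], [0, 0]])
# Toy denoiser that always predicts the value 9.0, used only for illustration.
restored = restore(x_a, m, lambda y, t: np.full_like(y, 9.0), total_steps=4)
```

Because the assumed noise level is zero at the final step, the artifact-free pixels of the output equal those of the input, while the labeled pixel takes the value produced by the denoiser.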

[0114] In a specific implementation, an image artifact repair method based on a counterfactual diffusion model, as shown in Figure 1, includes the following specific steps:

[0115] Step S1: Data Acquisition: Acquire a large amount of medical imaging data and filter it to obtain high-quality, artifact-free data;

[0116] Step S2, Data Preprocessing: Normalize and preprocess the collected multimodal data;

[0117] Step S3: Model Construction: Construct an image artifact restoration model based on the counterfactual diffusion model;

[0118] Step S4, Training: Without paired data, the system learns to generate representations of local anatomical structures from contextual information using only artifact-free images;

[0119] Step S5, Inference: Perform denoising resampling selectively only in artifact regions to preserve the original texture and image style of artifact-free regions to the greatest extent possible.

[0120] In a specific embodiment, step S2 specifically includes the following steps:

[0121] Step S201: For 2D images, uniformly normalize the voxel or pixel values to a consistent image format. Specifically, the original image is first normalized to [0,1] using $x_{norm} = (x - x_{min}) / (x_{max} - x_{min})$ and then multiplied by 255 so that the upper limit value is 255, where $x_{min}$ and $x_{max}$ are the minimum and maximum values in the image, respectively.
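The normalization in step S201 amounts to min-max scaling followed by multiplication by 255. The sketch below illustrates this with NumPy; the function name `normalize_to_255` is hypothetical, and returning zeros for a constant image is an assumption made here to avoid division by zero, not something stated in the text.

```python
import numpy as np


def normalize_to_255(image: np.ndarray) -> np.ndarray:
    """Min-max normalize an image to [0, 1], then scale it to [0, 255]."""
    x = image.astype(np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant image carries no contrast, so map it to zeros.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min) * 255.0


scaled = normalize_to_255(np.array([[0, 5], [10, 20]]))
```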

[0122] Step S202: For 3D images, split all 3D images along three axes. For each axis of the 3D image, retain only the 30 slices located in the middle. Selecting slices from the middle usually captures the core area of the anatomical structure, avoids noise interference from the edge areas, and makes processing more efficient. At the same time, discard images with extreme aspect ratios. Specifically, slice images whose shortest side is less than half the length of their longest side are discarded to prevent the target area from becoming extremely blurry during subsequent scaling. Afterwards, scale all processed images to 256×256 and save them.
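The slice selection and aspect-ratio filter of step S202 can be sketched as follows. The helper names are hypothetical, and the final rescaling to 256×256 is left out because the text does not specify an interpolation method.

```python
import numpy as np


def middle_slices(volume: np.ndarray, axis: int, count: int = 30):
    """Return up to `count` 2D slices from the middle of `volume` along `axis`."""
    n = volume.shape[axis]
    count = min(count, n)
    start = (n - count) // 2
    return [np.take(volume, i, axis=axis) for i in range(start, start + count)]


def has_acceptable_aspect(slice_2d: np.ndarray) -> bool:
    """Reject slices whose shortest side is less than half the longest side."""
    shortest, longest = min(slice_2d.shape), max(slice_2d.shape)
    return shortest >= longest / 2


def extract_slices(volume: np.ndarray, count: int = 30):
    """Split a 3D volume along all three axes and keep the usable middle slices."""
    kept = []
    for axis in range(3):
        for slice_2d in middle_slices(volume, axis, count):
            if has_acceptable_aspect(slice_2d):
                kept.append(slice_2d)
    return kept


# A 40x100x100 volume: only the 100x100 slices along axis 0 pass the filter.
slices = extract_slices(np.zeros((40, 100, 100)))
```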

[0123] Step S203: For the artifact images used in the inference stage, the corresponding artifact regions need to be marked. The marked image is a binary image containing only 0 and 1. 1 represents that the current pixel is an artifact region, and 0 represents that the current pixel is an artifact-free region. The marking of artifacts is done by radiologists with rich experience.

[0124] In a specific embodiment, step S3 includes the following steps:

[0125] Step S301: An image artifact restoration model based on a counterfactual diffusion model is constructed. The counterfactual diffusion model proposed in this invention is based on the theoretical framework of the diffusion model. The diffusion model gradually adds noise to the original image through a forward diffusion process and gradually restores the image through a reverse denoising process.

[0126] Step S302, the image artifact restoration model based on the counterfactual diffusion model constructed in step S301, comprises three parts and two stages. The three main parts of the model are a texture denoising network, a structure denoising network, and a discriminator network. The two main stages of the model are training and inference. In the training stage, the model learns the anatomical structure and contextual information of artifact-free images by training only on artifact-free images. In the inference stage, denoising resampling is performed only in artifact regions to preserve the original texture and anatomical structure of artifact-free regions to the greatest extent possible.

[0127] Step S303: The texture denoising network is based on the UNet architecture, with a basic encoder-decoder structure. The input image passes through the encoder's convolutional layers and downsampling to extract high-dimensional information, and then through the decoder's upsampling layers to gradually restore the image's spatial resolution. The final output image size is consistent with the input image. In texture denoising, the network's basic channel count is set to 64, and the network depth is set to 4. The network extracts and restores features at multiple levels by combining the texture information of the input color image with conditional input. The encoding stage extracts texture features layer by layer, and the decoding stage restores the image's texture details layer by layer with the conditional input.

[0128] Step S304: The structural denoising network in this embodiment uses the same denoising network as the texture denoising network. In each layer, the input feature map undergoes multiple levels of convolution and pooling operations to extract high-dimensional features. At the same time, the network weights are adjusted through conditional input (edge map) so that the network can make full use of the edge features of the input image. The encoding stage extracts features layer by layer, and the decoding stage restores the spatial resolution of the image layer by layer, finally outputting the denoised original image consistent with the input.

[0129] Step S305: The discriminator network in this embodiment uses the same UNet architecture as the structure denoising network. The discriminator loss $L_{dis}$ and the triplet loss $L_{tri}$ are used together to optimize D.

[0130] In a specific implementation, step S4 includes the following steps:

[0131] Step S401: All algorithms in this embodiment are implemented on an NVIDIA GeForce RTX4090 GPU with 24GB of memory using PyTorch (version 1.13.0) and Python (version 3.8.20).

[0132] Step S402: In the embodiments disclosed herein, when training the texture denoising network, the structure denoising network, and the discriminator network, the Adam optimizer is used to accelerate the convergence process, with beta1 set to 0.9, beta2 set to 0.99, and the batch size set to 16. The learning rate is $1\times10^{-4}$, the learning rate decay factor is 0.5, the diffusion model time step T = 400, and the input image size is 256×256. The maximum noise intensity in the denoising stage is set to 30, and the minimum noise intensity is set to 0.005 to ensure that noise is almost completely eliminated in the later stages of the diffusion process, thereby generating a high-quality image.

[0133] Step S403: In the structural denoising stage of this embodiment, the input image is the image to be repaired, and the final state of the denoising is the edge map of the image to be repaired plus Gaussian noise. The edge map is obtained based on the input image using the Canny algorithm for edge detection. The Canny algorithm achieves edge extraction through gradient magnitude calculation and double-threshold edge connection, as shown in the following formula:

[0134]

[0135] Where the two gradient terms represent the image gradients in the x and y directions, respectively. Through non-maximum suppression and double thresholding, the final edge image $E_{edge}$ is generated.
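The gradient magnitude and double-threshold stages can be illustrated with a simplified NumPy sketch. It assumes the standard gradient magnitude $\sqrt{G_x^2 + G_y^2}$ with Sobel kernels, which is the usual choice for Canny but is not confirmed by the text, and it omits Gaussian smoothing, non-maximum suppression, and edge linking. All function names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Correlate a 2D image with a 3x3 kernel using edge padding."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def gradient_magnitude(image: np.ndarray) -> np.ndarray:
    """Combine the x and y gradients into a single magnitude map."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def double_threshold(magnitude: np.ndarray, low: float, high: float):
    """Split pixels into strong edges and weak edge candidates."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    return strong, weak


# A vertical step edge between columns 2 and 3.
image = np.zeros((5, 6))
image[:, 3:] = 1.0
magnitude = gradient_magnitude(image)
strong, weak = double_threshold(magnitude, low=1.0, high=3.0)
```

In practice a library routine such as OpenCV's Canny implementation would replace this sketch, since it also performs the suppression and linking stages.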

[0136] Step S404, the specific details of the structural denoising in step S403 are as follows:

[0137] The input data includes artifact-free images and their corresponding edge maps. The edge maps are extracted from the input images using the Canny algorithm and used as conditional input to guide the network's inpainting process. During training of the structure denoising network, the input image size is 256×256. After the initial convolutional layer, the feature map size remains 256×256, and the number of channels increases to 64. After the first downsampling operation, the feature map size becomes 128×128, and the number of channels increases to 128. After the second downsampling operation, the feature map size becomes 64×64, and the number of channels increases to 256. After the third downsampling operation, the feature map size becomes 32×32, and the number of channels increases to 512. The feature map maintains a size of 32×32 and 512 channels in the intermediate layers, and global features are further extracted through an attention module. In the upsampling stage, the feature map size is restored layer by layer, and multi-scale features are fused through skip connections, ultimately restoring the original resolution of 256×256. Mean squared error (MSE) is used as the loss function to minimize the difference between the generated image and the target image. The weight parameter of the conditional input is set to 0.1 to dynamically adjust the influence of the conditional input in structural denoising. As shown in Figure 2, the input for structural denoising is an artifact-free image. During the forward process, noise is gradually added to the input image. As noise is added, the structure of the input image becomes sparser, and finally the corresponding noise-added edge map $x_t$ is obtained. Next, the UNet denoising network is used to predict the noise, and the denoising result $x_{t-1}$ is obtained.
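The encoder feature-map sizes listed above follow a simple rule: each downsampling halves the spatial size and doubles the channel count. A small sketch makes the progression explicit; the function name `encoder_shapes` is hypothetical.

```python
from typing import List, Tuple


def encoder_shapes(size: int = 256, base_channels: int = 64,
                   downsamples: int = 3) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) after the first convolution and each downsampling."""
    shapes = [(base_channels, size, size)]
    channels, resolution = base_channels, size
    for _ in range(downsamples):
        channels *= 2      # channel count doubles at each level
        resolution //= 2   # spatial size halves at each level
        shapes.append((channels, resolution, resolution))
    return shapes


shapes = encoder_shapes()
```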

[0138] Step S405: During the training of the texture denoising network, the input data consists of an artifact-free image and the restoration result image from the structural denoising stage. The restoration result from the structural denoising stage guides texture denoising, enabling the model to better learn the anatomical structure of the artifact-free image during training. The weight parameter of the conditional input is set to 0.1 to dynamically adjust the influence of the conditional input on texture restoration. Mean squared error (MSE) is used as the loss function to minimize the error between the generated image and the target texture image. The changes in network structure and feature maps during the training stage of texture denoising are consistent with those of structural denoising in step S404. The specific details of texture denoising are shown in Figure 2: the input is an artifact-free image; noise is added to the image to obtain the noise-added result $y_t$; then the structure denoising result $x_{t-1}$ guides the texture denoising process, yielding the texture denoising result $y_{t-1}$.

[0139] Step S406: To ensure the semantic effectiveness of structure-guided texture denoising, this embodiment introduces a discriminator network D to calculate the semantic relevance score $D(y_{t-1}, x_{t-1})$ between structure and texture. During training, the discriminator takes the texture denoising result $y_{t-1}$ and the structure denoising result $x_{t-1}$ of the current time step as input to optimize the semantic relevance between the two.

[0140] Step S407: The discriminator network mentioned in step S406 adopts a convolutional neural network (CNN) structure combined with time-step embedding, and gradually extracts features through encoding layers, finally generating a semantic relevance score for the input images. Its input is the current time step t, the texture denoising result $y_{t-1}$, and the structural denoising result $x_{t-1}$ at the current time step. The discriminator mainly consists of three parts: a time-step embedding layer, convolutional encoding layers, and a feature standard deviation module. The time-step embedding module transforms the input time step t into a fixed-dimensional embedding vector $t_{embed}$, which is used as a conditional input. The embedding dimension is 128, and after activation function processing, the embedding participates in the convolutional layer calculations. Subsequently, multiple encoding layers extract features from the input image and the artifact-optimized image. The initial convolutional layer adjusts the number of channels of the input image to the base number of channels for the convolutional features. Each of the multi-level downsampling convolutional blocks contains multiple convolutional layers and, combined with the time-step embedding, progressively reduces the image resolution and extracts deeper features. The first downsampling layer increases the number of channels to 128, the second to 256, the third to 512, and the final layer maintains 512 channels. In the final stage of the discriminator, the feature standard deviation module enhances the perception of global features by calculating the standard deviation of the feature map and concatenating the standard deviation features into the final feature map, which improves the discriminator's discriminative ability.
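One possible reading of the feature standard deviation module is sketched below. The text does not say over which axis the standard deviation is taken, so this sketch assumes a minibatch-style statistic: the standard deviation across the batch is averaged to a single value and appended as one extra channel. The function name `append_stddev_channel` and this choice of axis are assumptions.

```python
import numpy as np


def append_stddev_channel(features: np.ndarray) -> np.ndarray:
    """Append one channel holding the mean batch-wise standard deviation.

    features has shape (batch, channels, height, width).
    """
    # Standard deviation of every feature value across the batch (assumed axis).
    std_per_location = features.std(axis=0)
    # Collapse to one global statistic shared by all positions.
    mean_std = std_per_location.mean()
    batch, _, height, width = features.shape
    std_channel = np.full((batch, 1, height, width), mean_std, dtype=features.dtype)
    # Concatenate along the channel axis so later layers can read the statistic.
    return np.concatenate([features, std_channel], axis=1)


features = np.zeros((2, 3, 4, 4))
features[1] = 2.0
augmented = append_stddev_channel(features)
```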

[0141] Step S408: The discriminator's loss function consists of two parts. The first is the discriminator loss:

[0142]

[0143] Where the two expectation terms represent the expected values over the variables $y_t$ and $y_{t-1}$, respectively; $y_{t-1}$ represents the texture denoising result at the current time step; $y_t$ represents the image before texture denoising; and $x_{t-1}$ represents the structural denoising result. The second part is the triplet loss:

[0144]

[0145] Where the composite term represents the combination of the artifact-free region before denoising and the artifact region after denoising, M represents the artifact region, α is a margin parameter used to control the convergence of the loss, and $\|\cdot\|_2$ represents the L2 norm. The total loss of the discriminator is:

[0146] $L_{total} = L_{dis} + \lambda_{tri} L_{tri}$;

[0147] Where $\lambda_{tri}$ is set to 1 to balance the two types of loss.
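The way the two loss terms combine can be sketched as follows. Since the exact formulas are not legible in the text, the triplet loss here is the common hinge form with a margin α and unsquared L2 distances, which is an assumption, as is the choice of anchor, positive, and negative inputs. The discriminator loss is passed in as a precomputed value.

```python
import numpy as np


def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, alpha: float) -> float:
    """Hinge-style triplet loss with margin alpha (assumed form)."""
    d_pos = np.linalg.norm(anchor - positive)  # L2 distance to the positive sample
    d_neg = np.linalg.norm(anchor - negative)  # L2 distance to the negative sample
    return float(max(0.0, d_pos - d_neg + alpha))


def total_loss(l_dis: float, l_tri: float, lambda_tri: float = 1.0) -> float:
    """Weighted sum of the discriminator loss and the triplet loss."""
    return l_dis + lambda_tri * l_tri


anchor = np.array([0.0, 0.0])
l_tri = triplet_loss(anchor, np.array([3.0, 4.0]), np.array([0.0, 1.0]), alpha=0.5)
l_total = total_loss(2.0, l_tri)
```

With the weight set to 1, both terms contribute equally to the gradient of the discriminator.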

[0148] Step S409: In this embodiment, an adaptive resampling strategy is used during training to adjust the semantic relevance between the texture and the structure based on the score provided by the discriminator mentioned in step S407. Specifically, when the semantic relevance score between $y_{t-1}$ and $x_{t-1}$ is lower than a threshold Δ, noise is first added to $x_{t-1}$ to generate a re-noised structure, which is then denoised by the structure denoising network to obtain an updated structure result. The updated structure result is then used to guide the texture denoising network in generating an updated texture result. Subsequently, the semantic relevance between the updated texture result and the updated structure result is assessed, and the above steps are repeated to adaptively adjust the semantic relevance between them.
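The control flow of the adaptive resampling strategy can be sketched with placeholder callables. All parameter names are hypothetical stand-ins for the networks and the discriminator, and the `max_rounds` cap is an assumption added here so that the loop always terminates; the text does not mention one.

```python
from typing import Any, Callable, Tuple


def adaptive_resample(
    x_prev: Any,
    y_prev: Any,
    add_noise: Callable[[Any], Any],
    denoise_structure: Callable[[Any], Any],
    denoise_texture: Callable[[Any, Any], Any],
    score: Callable[[Any, Any], float],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Any, Any, int]:
    """Repeat structure and texture resampling until the score reaches the threshold."""
    rounds = 0
    while score(y_prev, x_prev) < threshold and rounds < max_rounds:
        # Re-noise the structure result, then denoise it again.
        x_prev = denoise_structure(add_noise(x_prev))
        # Regenerate the texture result under the updated structure guidance.
        y_prev = denoise_texture(y_prev, x_prev)
        rounds += 1
    return x_prev, y_prev, rounds


# Toy scalar stand-ins: the texture moves halfway toward the structure each round.
x_new, y_new, used = adaptive_resample(
    x_prev=1.0,
    y_prev=0.0,
    add_noise=lambda x: x,
    denoise_structure=lambda x: x,
    denoise_texture=lambda y, x: (y + x) / 2,
    score=lambda y, x: 1 - abs(y - x),
    threshold=0.9,
    max_rounds=10,
)
```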

[0149] Step S410: To reduce the computational cost of the diffusion model during inference, this embodiment employs a progressive distillation method during training, aiming to accelerate the inference process by reducing the denoising steps of the diffusion model. The specific implementation is as follows:

[0150] First, the denoising network $f_\theta(x_t \mid t, c, m)$, trained with T denoising steps, serves as the teacher model, upon which a student model is constructed. The student model inherits the teacher model's parameters as its initialization, but its number of denoising steps is reduced to T/2, which halves the computational cost of each inference. By minimizing the distillation loss function, the output of the student model is made consistent with the output of the teacher model at time step t-1, thus completing the distillation training. The training objective function is as follows:

[0151]

[0152] Where $x_t$ represents the noisy image at time step t. By performing n rounds of distillation on the student model (each round reducing the number of denoising steps to half of the previous value), a student model with significantly improved computational efficiency is obtained.

[0153] Step S501: The diffusion model time step T = 400, and the input image size is 256×256. The maximum noise intensity in the denoising stage is set to 30, and the minimum noise intensity is set to 0.005 to ensure that the noise is almost completely eliminated in the later stage of the diffusion process, thereby generating a high-quality image.

[0154] Step S502: In the inference phase, the inference environment needs to be initialized first. This involves configuring the distributed training framework, determining the computing devices required for inference, and setting up GPU devices to ensure operational efficiency. The inference configuration file is read to obtain model parameters, file paths, and dataset settings. Subsequently, a path is created to save the results, ensuring that the results generated by inference can be stored according to the preset path.

[0155] Step S503, loading the test dataset and the labeled image dataset of artifacts, is a crucial step in the inference phase. The test dataset is used to evaluate the model's performance in real-world artifact removal tasks, while the labeled image dataset of artifacts provides annotation information for artifact regions in the images. The dataset loading mechanism supports the cyclic loading of labeled artifact images, ensuring a stable supply of labeled artifact image data during the inference phase.
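The cyclic loading of artifact label images can be sketched with the standard library. This assumes that the label set may be smaller than the test set and is simply repeated in order; the function name `pair_with_cycled_masks` and the file names are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator, Sequence, Tuple, TypeVar

ImageT = TypeVar("ImageT")
MaskT = TypeVar("MaskT")


def pair_with_cycled_masks(
    images: Iterable[ImageT], masks: Sequence[MaskT]
) -> Iterator[Tuple[ImageT, MaskT]]:
    """Pair every test image with a label image, restarting the labels when exhausted."""
    if not masks:
        raise ValueError("at least one artifact label image is required")
    # zip stops when the images run out, so the endless cycle is safe here.
    return zip(images, cycle(masks))


pairs = list(pair_with_cycled_masks(["img0", "img1", "img2"], ["mask0", "mask1"]))
```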

[0156] Step S504: The inference stage requires loading pre-trained models, specifically the texture denoising model, the structure denoising model, and the discriminator model. The weights of the pre-trained texture denoising and structure denoising models are loaded to ensure the models have the ability to recover texture details and structural information. The discriminator D is loaded to evaluate the semantic consistency of the generated results.

[0157] Step S505: During the inference phase, only the artifact region is subjected to reverse-diffusion resampling, while the artifact-free region retains its original texture, thereby maximizing detail preservation and improving restoration efficiency. The input consists of the artifact image to be restored and the corresponding artifact marker image. Finally, the sum of the artifact-free region at time t-1 and the artifact region of the denoised resampling result at time t is taken as the repair result at time t-1. Here m is a binary image containing only 0 and 1, where 1 indicates that the current pixel belongs to an artifact region (the region that needs to be repaired), 0 indicates that the current pixel belongs to an artifact-free region (the region of normal anatomical structure that needs to be preserved), and ⊙ is a pixel-level multiplication operator.

[0158] Step S506: As shown in Figure 3, the inference stage first adds noise to the input image y to obtain the noise-added result at time t-1. Next, the artifact-free region of the diffused image at time t-1 is computed from this noise-added result. The repair result at time t is then denoised under the guidance of the denoising result from the structure denoising network, giving the denoised resampling result from t to t-1, from which the artifact region is extracted. Finally, this artifact region is added to the previously obtained diffused artifact-free region at time t-1 to obtain the restoration result at time t-1.

[0159] Step S507: Finally, save the repaired image result to the specified file path.

[0160] In a specific embodiment, MRI scans of 100 patients are axially sectioned to obtain 2D slices. Only the middle 30 slices from each MRI are taken, resulting in a total of 3000 MRI slices as the training set. MRI scans of two more patients are then axially sectioned after adding motion artifacts, again with 30 slices taken from each, for a total of 60 slices as the test set.

[0161] The image artifact restoration model based on the counterfactual diffusion model constructed in step S3 is used for training to obtain a pre-trained model.

[0162] The pre-trained model is applied to the test set with motion artifacts using the steps in S5, yielding repaired, artifact-free data. Figure 4 shows a comparison before and after artifact repair by the present invention: the left side is an MRI with motion artifacts, and the right side is the repaired image. This embodiment can effectively improve image quality through artifact repair. After artifact repair, key details in the image are preserved, which improves the accuracy of doctors' disease assessments and allows patients' conditions to be detected in a timely manner, providing strong support for diagnosis and treatment.

[0163] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to the method section.

[0164] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An image artifact restoration method based on a counterfactual diffusion model, characterized in that it comprises: acquiring multimodal data and performing data filtering on the multimodal data; preprocessing the filtered data to construct a standardized dataset; constructing an image artifact restoration model based on a counterfactual diffusion model; training the image artifact restoration model using the standardized dataset, wherein the training includes: learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images; and restoring images contaminated with artifacts using the trained image artifact restoration model, wherein the artifact restoration includes: performing denoising resampling in the artifact region to preserve the original texture and anatomical structure of the artifact-free region.

2. The image artifact repair method based on a counterfactual diffusion model according to claim 1, characterized in that, The data filtering process includes: using image annotation to ensure that the filtered image data is artifact-free.

3. The image artifact repair method based on a counterfactual diffusion model according to claim 1, characterized in that, The preprocessing of the filtered data includes: Data normalization is performed on data with different dimensions, modalities, and intensity values. The 3D image is split along three axes, and images with extreme aspect ratios are discarded; The distribution of input data is standardized through normalization and resizing operations.

4. The image artifact repair method based on a counterfactual diffusion model according to claim 1, characterized in that, The image artifact restoration model includes a texture denoising network, a structure denoising network, and a discriminator network. It gradually adds noise to the original image through a forward diffusion process and gradually restores the image through a reverse denoising process. The texture denoising network adopts a UNet-based denoising network to restore the image by adding and removing noise from the input image. The structure denoising network is used to guide the texture denoising process, and the discriminator network is used to measure the semantic relevance between the structure denoising result and the texture denoising result.

5. The image artifact repair method based on a counterfactual diffusion model according to claim 1, characterized in that, The method of learning the anatomical structure and contextual information of artifact-free images by training only artifact-free images includes: inputting the selected artifact-free images into the training phase to train the texture denoising network, the structural denoising network, and the discriminator network; training the texture denoising network and the structural denoising network separately through a joint optimization strategy; gradually injecting Gaussian noise into the artifact-free images during the forward process of the texture denoising network; and using the denoising results of the structural denoising network to guide the texture denoising network to predict noise and gradually reconstruct the image during the reverse process. The discriminator network acts as a supervision mechanism to optimize the semantic consistency between the two.

6. The image artifact repair method based on a counterfactual diffusion model according to claim 5, characterized in that, the training process of the discriminator network is as follows: the discriminator network D is trained to calculate the semantic relevance score between the result $x_{t-1}$ of the structure denoising network and the result $y_{t-1}$ of the texture denoising network, while the discriminator loss $L_{dis}$ and the triplet loss $L_{tri}$ are used to optimize the discriminator network; an adaptive resampling strategy is used during training: when the semantic relevance score between $y_{t-1}$ and $x_{t-1}$ is lower than a threshold Δ, noise is first added to $x_{t-1}$ to generate a re-noised structure, which is then denoised by the structure denoising network to obtain an updated structure result; the updated structure result is then used to guide the texture denoising network in generating an updated texture result; subsequently, the semantic relevance between the updated texture result and the updated structure result is assessed, and the above steps are repeated to adaptively adjust the semantic relevance between them.

7. The image artifact repair method based on a counterfactual diffusion model according to claim 1, characterized in that, the artifact repair process includes: the counterfactual probability is determined based on the objective of artifact restoration, and the counterfactual probability is re-determined based on the fact that the impact of the artifact is limited to a specific area while other areas remain unchanged. During the inference phase, only the artifact regions are subjected to reverse-diffusion resampling, while the artifact-free regions retain their original texture. The input is the artifact image to be repaired and the corresponding artifact label image. The sum of the artifact-free regions at time t-1 and the artifact regions from the denoising resampling result at time t is used as the repair result at time t-1. The repair result at time t-1 is used as the input for the next denoising step, i.e., from time t-1 to time t-2. The repair result obtained at each time step is used as the input for the next time step.

8. An image artifact restoration system based on a counterfactual diffusion model, characterized in that it comprises: a data acquisition module, used to acquire multimodal data and perform data filtering on the multimodal data; a preprocessing module, used to preprocess the filtered data and build a standardized dataset; a model building module, used to build an image artifact restoration model based on a counterfactual diffusion model; a model training module, used to train the image artifact restoration model using the standardized dataset, wherein the training includes: learning the anatomical structure and contextual information of artifact-free images by training only on artifact-free images; and an artifact repair module, used to repair images contaminated by artifacts using the trained image artifact restoration model, wherein the artifact repair includes: performing denoising resampling in artifact regions to preserve the original texture and anatomical structure of artifact-free regions.