Metal artifact removal and soft tissue reconstruction of medical images and model training method
By combining a sinogram-aware self-attention module and a radial artifact removal model with generative adversarial network training, the method addresses rapid metal artifact removal and soft tissue reconstruction in medical imaging, with the aim of improving image quality and diagnostic reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- THE FIRST AFFILIATED HOSPITAL OF WENZHOU MEDICAL UNIV
- Filing Date
- 2026-03-11
- Publication Date
- 2026-06-12
[Figure: abstract drawing, CN122199745A]
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical imaging and, in particular but without limitation, to a method for removing metal artifacts and reconstructing soft tissue in medical images, and to a method for training the corresponding model.
Background Technology
[0002] In the field of medical imaging, especially in computed tomography (CT) scans, metal artifacts are image distortions caused by excessive attenuation of X-rays and photon scattering due to high-density implants (such as metal clips in brain aneurysm surgery). These artifacts manifest as streaks or distortions, severely degrading image quality, affecting the visibility of key neuroanatomical structures, and leading to diagnostic uncertainty and the risk of misdiagnosis.
[0003] In related technologies, artifacts are suppressed by filtering combined with iterative methods in the sinogram domain, and the image is reconstructed from the projection data. Although these methods suppress artifacts, they blur details and lose fine soft tissue structures. They are also computationally expensive, sensitive to the geometry and location of the metal, generalize poorly, and adapt badly to diverse clinical scenarios.
[0004] Therefore, how to remove artifacts from images quickly and accurately while adapting to diverse clinical scenarios has become an urgent problem to be solved.
Summary of the Invention
[0005] In view of this, embodiments of the present invention provide a method for removing metal artifacts and reconstructing soft tissue in medical images, which at least solves the problem that related technologies cannot quickly and accurately remove artifacts in images and cannot adapt to diverse clinical scenarios.
[0006] According to a first aspect of the present invention, a method for removing metal artifacts and reconstructing soft tissue in medical images is provided, comprising: inputting a first medical image carrying metal artifacts into an initial artifact removal model for feature extraction to obtain a first feature map; inputting the first feature map into a sinogram-aware self-attention module of the initial artifact removal model to obtain a target feature map, wherein the sinogram-aware self-attention module includes a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer; and obtaining a second medical image based on the target feature map, and inputting the second medical image into a radial artifact removal model to obtain a target medical image without metal artifacts, wherein the radial artifact removal model includes the sinogram-aware self-attention module, and the second medical image is the image after initial artifact removal and soft tissue reconstruction.
[0007] According to a second aspect of the present invention, a method for training a model for removing metal artifacts and reconstructing soft tissue in medical images is provided, comprising: inputting random noise into a first generator to be trained to obtain a first metal artifact image sample in the sinogram domain; fusing the first metal artifact image sample with a first medical image sample without metal artifacts to obtain a second medical image sample with metal artifacts, and inputting the second medical image sample into a second generator to be trained to obtain an initial third medical image sample without metal artifacts, wherein the initial artifact removal model to be trained consists of the second generator and a second discriminator, and the third medical image sample is the image sample after initial artifact removal and soft tissue reconstruction; calculating an adversarial loss and a perceptual loss based on the first medical image sample, the third medical image sample, and the second discriminator to be trained; adding noise to the third medical image sample to obtain a fourth medical image sample, and inputting the fourth medical image sample into a radial artifact removal model to be trained to obtain a fifth medical image sample without metal artifacts; obtaining a mean squared error loss, a total variation loss, and an information divergence from the third and fifth medical image samples; and obtaining the trained radial artifact removal model and the trained initial artifact removal model based on the adversarial loss, the perceptual loss, the mean squared error loss, the total variation loss, and the information divergence.
[0008] According to a third aspect of the present invention, a computer storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the methods described in the first and second aspects.
[0009] According to the scheme provided in the embodiments of the present invention, a first medical image carrying metal artifacts is input into an initial artifact removal model for feature extraction to obtain a first feature map; the first feature map is then input into a sinogram-aware self-attention module of the initial artifact removal model to obtain a target feature map, the sinogram-aware self-attention module including a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer; a second medical image is obtained based on the target feature map and input into a radial artifact removal model to obtain a target medical image without metal artifacts, wherein the radial artifact removal model includes the sinogram-aware self-attention module, and the second medical image is the image after initial artifact removal and soft tissue reconstruction. In this process, the sinogram-aware self-attention module converts the feature map into the sinogram domain by means of the Radon transform and exploits the fact that metal artifacts appear as high-intensity stripes in that domain, thereby locating and suppressing artifacts more accurately while preserving soft tissue details. The feature filtering layer focuses on central features to reduce the impact of boundary noise; this design targets the characteristics of metal artifacts, strengthening the model's attention to artifact regions while avoiding overprocessing of irrelevant areas. The frequency domain transformation layer, feature enhancement layer, spatial domain transformation layer, and Radon transform layer transform the input to the frequency domain, extract features, suppress high-frequency noise related to metal artifacts, and preserve low-frequency anatomical information. Furthermore, the Radon transform layer extracts projection slices from the frequency domain and generates sinograms using a one-dimensional inverse fast Fourier transform, which reduces computational complexity and improves processing efficiency.
Attached Figure Description
[0010] To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flowchart of a method for removing metal artifacts and reconstructing soft tissue in medical images, provided in an embodiment of the present invention.
Figure 2 is a flowchart of a sinogram-aware self-attention module provided in an embodiment of the present invention.
Figure 3 is a schematic diagram of the structure of an electronic device according to an embodiment of the present invention.
Detailed Implementation
[0011] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. The following embodiments are used to illustrate the present invention, but are not intended to limit the scope of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0012] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
[0013] It should be noted that the terms "first, second, and third" used in the embodiments of the present invention are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first, second, and third" can be interchanged in a specific order or sequence where permitted, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein.
[0014] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which these embodiments of the invention pertain. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.
[0015] Figure 1 is a flowchart of a method for removing metal artifacts and reconstructing soft tissue in medical images according to an embodiment of the present invention. The method can be executed by an electronic device, such as a computer or server.
[0016] As shown in Figure 1, the method for removing metal artifacts and reconstructing soft tissue in medical images includes: S101. Input the first medical image carrying metal artifacts into the initial artifact removal model for feature extraction to obtain the first feature map; and input the first feature map into the sinogram-aware self-attention module of the initial artifact removal model to obtain the target feature map; the sinogram-aware self-attention module includes a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer.
[0017] In embodiments of the present invention, the initial artifact removal model further includes a convolutional residual module, which comprises a convolutional layer, a batch normalization layer, and a leaky ReLU layer. The medical image carrying metal artifacts is input into the convolutional residual module for feature extraction to obtain a first feature map. The sinogram-aware self-attention module includes a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer. Specifically, the feature filtering layer selects features; the frequency domain transformation layer converts features from the spatial domain to the frequency domain; the feature enhancement layer extracts and enhances features; the spatial domain transformation layer converts features from the frequency domain back to the spatial domain; the Radon transform layer integrates features along different angles to convert them into a set of projection data; and the fusion layer fuses features. After the convolutional residual module outputs the first feature map, the first feature map is linearly transformed and then passed through the feature filtering layer, frequency domain transformation layer, feature enhancement layer, spatial domain transformation layer, Radon transform layer, and fusion layer to obtain the target feature map.
[0018] The initial artifact removal model primarily removes metal artifacts, motion artifacts, and partial volume effect artifacts from the first medical image, while also restoring soft tissue in the first medical image.
[0019] The initial artifact removal model can be an image translation model based on conditional generative adversarial networks, focusing on fine artifact removal and soft tissue restoration. It mainly consists of a generator and a discriminator. The generator adopts a U-Net architecture, where the encoder compresses the input image to a latent representation, and the decoder recovers a high-fidelity output through skip connections, ensuring the integrity of anatomical structures. The discriminator uses a local discriminator strategy to evaluate realism at the image patch level, enhancing sensitivity to subtle artifacts and anatomical features.
[0020] Specifically, the process of obtaining the target feature map using the initial artifact removal model is as follows: the first medical image carrying metal artifacts is first processed by three convolutional residual modules to extract features and obtain the first feature map. The first feature map is then input into the sinogram-aware self-attention module to obtain the target feature map.
[0021] S102. Obtain a second medical image based on the target feature map, and input the second medical image into the radial artifact removal model to obtain a target medical image without metal artifacts. The radial artifact removal model includes a sinogram-aware self-attention module, and the second medical image is the image after initial artifact removal and soft tissue reconstruction.
[0022] In embodiments of the present invention, the initial artifact removal model further includes convolutional layers, upsampling layers, and hyperbolic tangent function layers. The target feature map passes through multiple sinogram-aware self-attention modules, convolutional layers, upsampling layers, and hyperbolic tangent function layers to obtain the second medical image, that is, the image after initial artifact removal and soft tissue reconstruction. The radial artifact removal model also includes a sinogram-aware self-attention module, which works in the same way as the one in the initial artifact removal model. In addition, the radial artifact removal model includes downsampling layers, upsampling layers, and convolutional layers. After the second medical image passes through the sinogram-aware self-attention module, downsampling layers, upsampling layers, and convolutional layers, a target medical image without metal artifacts is obtained.
[0023] Radial artifacts refer to linear or stripe-like abnormal patterns radiating outwards from the center in medical images. The architecture of radial artifact removal models is mainly based on the U-Net framework. The encoder extracts high-level features stepwise through convolutional layers, and the decoder restores the original resolution through upsampling and skip connections.
[0024] Specifically, the target feature map is passed in turn through a convolutional residual module, a sinogram-aware self-attention module, a convolutional residual module, a sinogram-aware self-attention module, and a further convolutional residual module to obtain a third feature map. The third feature map is then passed through a convolutional layer, a sinogram-aware self-attention module, an upsampling layer, a sinogram-aware self-attention module, an upsampling layer, a sinogram-aware self-attention module, an upsampling layer, an upsampling layer, and a hyperbolic tangent function layer to obtain the second medical image after initial artifact removal and soft tissue reconstruction. The local discriminator evaluates this image during the subsequent training phase.
[0025] Specifically, the process of obtaining a target image free of metal artifacts using the radial artifact removal model is as follows. The second medical image is first processed to generate a fourth feature map X0-0. The fourth feature map is then downsampled by the downsampling layer of the first-stage pooling to reduce its spatial dimension, generating a fifth feature map. The fifth feature map is input into sinogram-aware self-attention module 1 to obtain a sixth feature map X1-0. The sixth feature map X1-0 is then processed in turn by the downsampling layer of the second-stage pooling, sinogram-aware self-attention module 2, the downsampling layer of the third-stage pooling, feature enhancement and fusion module 3, the downsampling layer of the fourth-stage pooling, and sinogram-aware self-attention module 4. The feature map output by sinogram-aware self-attention module 4 is then gradually restored to its original size through a series of upsampling layers. After each upsampling, the result is fused by addition with the feature maps from the previous stage (such as X3-0, X2-0, X1-0, X0-0) to retain detailed information at different levels, and each fused feature map is further processed by the corresponding convolutional layer. For example, the feature map output by sinogram-aware self-attention module 4 is upsampled to obtain feature map X3-1; feature maps X3-1 and X3-0 are fused and input into a convolutional layer; the output of that convolutional layer is upsampled to obtain feature map X2-1; and feature map X2-1, feature map X2-0, and feature map X2-2 obtained by the upsampling layer are fused and input into a convolutional layer again. This process is repeated until a target medical image without metal artifacts is obtained.
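The encoder and decoder flow of paragraph [0025] can be illustrated with a minimal NumPy sketch. It is not the claimed network: average pooling, nearest-neighbour upsampling, and the `stage` callable (a placeholder for the attention modules and convolutional layers) are assumptions made only to show how the maps X0-0 to X3-0 are stored and later fused by addition.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling on an (H, W) map with even H and W (assumed pooling type)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling (assumed upsampling type)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encode_decode(image, stage=lambda f: f, depth=4):
    """Keep one skip map per encoder stage, then upsample and fuse by addition."""
    skips = []
    feat = image                      # plays the role of X0-0
    for _ in range(depth):
        skips.append(feat)            # X0-0, X1-0, X2-0, X3-0
        feat = stage(downsample(feat))
    for skip in reversed(skips):
        # upsample, add the stored map of the same size, then "process"
        feat = stage(upsample(feat) + skip)
    return feat

restored = encode_decode(np.ones((16, 16)))
print(restored.shape)
```

With the identity `stage`, each decoder step simply adds one stored map, which makes the additive skip fusion easy to inspect; a real model would replace `stage` with learned layers.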
[0026] It can be understood that, in the embodiments of the present invention, a first medical image carrying metal artifacts is input into an initial artifact removal model for feature extraction to obtain a first feature map; the first feature map is then input into a sinogram-aware self-attention module of the initial artifact removal model to obtain a target feature map, the sinogram-aware self-attention module including a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer; a second medical image is obtained based on the target feature map and input into a radial artifact removal model to obtain a target medical image without metal artifacts, wherein the radial artifact removal model includes the sinogram-aware self-attention module, and the second medical image is the image after initial artifact removal and soft tissue reconstruction. In this process, the sinogram-aware self-attention module converts the feature map into the sinogram domain by means of the Radon transform and exploits the fact that metal artifacts appear as high-intensity stripes in that domain, thereby locating and suppressing artifacts more accurately while preserving soft tissue details. The feature filtering layer focuses on central features to reduce the influence of boundary noise; this design targets the characteristics of metal artifacts, strengthening the model's focus on artifact regions while avoiding overprocessing of irrelevant areas. The frequency domain transformation layer, feature enhancement layer, spatial domain transformation layer, and Radon transform layer transform the input to the frequency domain, extract features, suppress high-frequency noise associated with metal artifacts, and preserve low-frequency anatomical information. Projection slices are then extracted from the frequency domain and converted into sinograms using a one-dimensional inverse fast Fourier transform, which reduces computational complexity and improves processing efficiency.
[0027] In some embodiments of the present invention, inputting the first feature map into the sinogram-aware self-attention module of the initial artifact removal model in S101 to obtain the target feature map can be achieved through S1011 to S1013, as described in the following steps.
[0028] S1011. Input the first feature map into the sparse convolutional layer to obtain the second feature map, and perform a linear transformation on the second feature map to obtain the first query vector, the first key vector, and the first value vector.
[0029] S1012. Divide the first query vector and the first key vector into blocks to obtain multiple block-shaped second query vectors and second key vectors. Input the second query vectors and second key vectors into the feature filtering layer to obtain multiple circular third query vectors and third key vectors.
[0030] S1013. Input the third query vector and the third key vector into the frequency domain transformation layer to obtain the fourth query vector and the fourth key vector in the frequency domain; and obtain the target feature map based on the fourth query vector, the fourth key vector, the feature enhancement layer, the spatial domain transformation layer, the Radon transform layer and the fusion layer.
[0031] In some embodiments of the present invention, a first feature map is input into a sparse convolutional layer for feature extraction to obtain a second feature map. The second feature map is then linearly transformed to obtain a first query vector, a first key vector, and a first value vector. The first query vector is divided into blocks (up to 8 blocks) to obtain multiple block-shaped second query vectors. Simultaneously, the first key vector is also divided into blocks (up to 8 blocks) to obtain multiple block-shaped second key vectors. These multiple block-shaped second key vectors and second query vectors are input into a feature filtering layer to obtain multiple circular third query vectors and third key vectors, achieving the purpose of feature filtering. Further, these multiple circular third query vectors and third key vectors are input into a frequency domain transformation layer to obtain a fourth query vector and a fourth key vector transformed from the spatial domain to the frequency domain. Finally, the target feature map is obtained based on the fourth query vector, the fourth key vector, the feature enhancement layer, the spatial domain transformation layer, the Radon transform layer, and the fusion layer.
[0032] The sinogram-aware self-attention module also includes a linear transformation layer and a block partition layer for performing S1011 to S1012.
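The block partition and circular filtering of S1012 can be sketched as follows. This is an illustration under stated assumptions, not the patented layer: the 2 x 4 grid (giving eight blocks), the inscribed-circle mask, and the function names `split_into_blocks`, `circular_mask`, and `feature_filter` are all hypothetical, and the linear projection that produces the query and key maps is omitted.

```python
import numpy as np

def split_into_blocks(x, grid=(2, 4)):
    """Split an (H, W) map into grid[0] * grid[1] equal blocks (8 by default)."""
    gh, gw = grid
    h, w = x.shape
    bh, bw = h // gh, w // gw
    return (x[: gh * bh, : gw * bw]
            .reshape(gh, bh, gw, bw)
            .transpose(0, 2, 1, 3)       # bring the two grid axes together
            .reshape(gh * gw, bh, bw))

def circular_mask(bh, bw):
    """Boolean mask that keeps the disc inscribed in a (bh, bw) block."""
    yy, xx = np.ogrid[:bh, :bw]
    cy, cx = (bh - 1) / 2.0, (bw - 1) / 2.0
    radius = min(bh, bw) / 2.0
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def feature_filter(blocks):
    """Zero the corners of every block so that only central features remain."""
    return blocks * circular_mask(*blocks.shape[1:])

query = np.arange(16 * 32, dtype=float).reshape(16, 32)
circular_query = feature_filter(split_into_blocks(query))
print(circular_query.shape)
```

The same two functions would be applied to the key map; zeroing the block corners is one simple way to realise the stated goal of focusing on central features and reducing boundary noise.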
[0033] In some embodiments of the present invention, S1013 can be implemented by S201 to S204, as described in the following steps.
[0034] S201. Input the fourth query vector and the fourth key vector into the feature enhancement layer for feature extraction and enhancement to obtain the first feature and the second feature. Then, input the first feature and the second feature into the spatial domain transformation layer to obtain the fifth query vector and the fifth key vector in the spatial domain.
[0035] In some embodiments of the present invention, the fourth query vector and the fourth key vector are each input into the feature enhancement layer for feature extraction. Specifically, feature extraction can be performed by a convolutional neural network containing a 3×3 convolutional layer, a rectified linear unit (ReLU) activation function, and a 1×1 convolutional layer. The extracted features can be further processed by a frequency attention extractor to obtain the first feature and the second feature. The first feature and the second feature are then each input into the spatial domain transformation layer and subjected to a two-dimensional inverse fast Fourier transform to obtain the fifth query vector and the fifth key vector in the spatial domain.
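The round trip described in S1013 and S201 (spatial domain to frequency domain, enhancement, and back to the spatial domain) can be sketched with NumPy. The learned convolutions and the frequency attention extractor are not specified in enough detail to reproduce, so a fixed low-pass gate stands in for them here; `low_pass_gate`, its `cutoff`, and `enhance_in_frequency_domain` are hypothetical names and values.

```python
import numpy as np

def low_pass_gate(shape, cutoff=0.25):
    """Stand-in for learned enhancement: keep frequencies below `cutoff` cycles/sample."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(float)

def enhance_in_frequency_domain(block, gate):
    """2D FFT, element-wise weighting of the spectrum, then 2D inverse FFT."""
    spectrum = np.fft.fft2(block)             # frequency domain transformation
    enhanced = spectrum * gate                # placeholder for feature enhancement
    return np.fft.ifft2(enhanced).real        # spatial domain transformation

rng = np.random.default_rng(0)
block = rng.normal(size=(8, 8))
smoothed = enhance_in_frequency_domain(block, low_pass_gate(block.shape))
print(smoothed.shape)
```

A gate of all ones returns the input unchanged, and any gate that keeps the zero-frequency term preserves the block mean, which is consistent with the stated aim of retaining low-frequency anatomical information while attenuating high-frequency content.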
[0036] S202. Input the fifth query vector and the fifth key vector into the Radon transform layer respectively, and perform a two-dimensional fast Fourier transform to obtain the sixth query vector and the sixth key vector in the frequency domain.
[0037] S203. Extract frequency slices from the sixth query vector and the sixth key vector at different projection angles to obtain the seventh query vector and the seventh key vector in one-dimensional projection form. Perform a one-dimensional inverse fast Fourier transform on the seventh query vector and the seventh key vector to obtain the eighth query vector and the eighth key vector in the sinogram domain.
[0038] S204. Obtain the target feature map based on the eighth query vector, the eighth key vector, and the fusion layer.
[0039] In some embodiments of the present invention, the fifth query vector and the fifth key vector are input into the Radon transform layer and subjected to a two-dimensional fast Fourier transform to obtain a sixth query vector and a sixth key vector in the frequency domain. Frequency slices are then extracted from the sixth query vector and the sixth key vector along different projection angles, resulting in a seventh query vector and a seventh key vector in one-dimensional projection form. These are further subjected to a one-dimensional inverse fast Fourier transform to obtain an eighth query vector and an eighth key vector in the sinogram domain. Finally, the target feature map is obtained based on the eighth query vector, the eighth key vector, and the fusion layer.
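Steps S202 and S203 follow the pattern of the Fourier slice theorem: a line through the origin of the 2D spectrum, once inverse-transformed in 1D, gives the projection of the image at the corresponding angle. The sketch below is a simplified illustration assuming a square input and nearest-neighbour sampling of the spectrum; the function name and the sampling scheme are assumptions, and off-axis angles are only approximate.

```python
import numpy as np

def sinogram_via_fourier_slice(image, angles_deg):
    """Approximate projections of a square image via 2D FFT, slicing, and 1D inverse FFT."""
    n = image.shape[0]                          # assumes an n x n input
    spectrum = np.fft.fft2(image)               # S202: two-dimensional FFT
    freqs = np.fft.fftfreq(n) * n               # signed integer frequency indices
    rows = []
    for theta in np.deg2rad(angles_deg):
        # S203: sample the spectrum along a line through the origin at angle theta
        kx = np.rint(freqs * np.cos(theta)).astype(int) % n
        ky = np.rint(freqs * np.sin(theta)).astype(int) % n
        slice_1d = spectrum[ky, kx]             # nearest-neighbour frequency slice
        # 1D inverse FFT turns the slice into one row of the sinogram;
        # the small imaginary residue from sampling is discarded
        rows.append(np.fft.ifft(slice_1d).real)
    return np.stack(rows)

rng = np.random.default_rng(0)
image = rng.random((16, 16))
sinogram = sinogram_via_fourier_slice(image, [0.0, 45.0, 90.0])
print(sinogram.shape)
```

At 0 and 90 degrees the slice coincides with a row or column of the spectrum, so the result equals the column sums and row sums of the image exactly. This avoids explicit line integration, which is the efficiency argument made in the description.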
[0040] In some embodiments of the present invention, S204 can be implemented by S2041 to S2043, as described in the following steps.
[0041] S2041. Input the eighth query vector into the fusion layer for fusion to obtain the target query vector, and input the eighth key vector into the fusion layer for fusion to obtain the target key vector.
[0042] In some embodiments of the present invention, multiple circular eighth query vectors are input into a fusion layer for fusion to obtain the final target query vector. Simultaneously, multiple circular eighth key vectors are input into a fusion layer for fusion to obtain the target key vector.
[0043] S2042. Perform dot product, scaling, and normalization on the target query vector and target key vector to obtain the attention weights.
[0044] S2043. Reshape the attention weights to obtain the reshaped attention weights, and obtain the target feature map based on the reshaped attention weights and the first value vector.
[0045] In some embodiments of the present invention, the similarity between the target query vector and the target key vector is calculated as a dot product. To prevent the dot product from becoming so large that the softmax gradient vanishes, the result is scaled. The scaled scores are then converted into a probability distribution, i.e., the attention weights, by applying the softmax function. The attention weights are then reshaped to change their dimensions. Finally, the target feature map is obtained based on the reshaped attention weights and the first value vector.
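The dot product, scaling, and normalization of S2042 to S2043 correspond to standard scaled dot-product attention, sketched below in NumPy. The description only says that the scores are scaled; dividing by the square root of the feature dimension is the conventional choice and is an assumption here, and the reshaping step is omitted because the target shapes are not given.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Dot-product similarity, scaling, softmax normalization, weighted sum of values."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)     # scaling keeps the softmax out of saturation
    weights = softmax(scores)               # each row is a probability distribution
    return weights @ value, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))     # target query vectors
k = rng.normal(size=(6, 4))     # target key vectors
v = rng.normal(size=(6, 3))     # first value vectors
features, weights = attention(q, k, v)
print(features.shape, weights.sum(axis=1))
```

Without the scaling, large dot products push the softmax toward a one-hot output whose gradient is close to zero, which is the vanishing-gradient concern raised in paragraph [0045].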
[0046] As shown in Figure 2, which is a flowchart of a sinogram-aware self-attention module provided in an embodiment of the present invention, the first feature map is input into a sparse convolutional layer to obtain an intermediate feature map, and the intermediate feature map is input into another sparse convolutional layer to obtain a second feature map. The second feature map undergoes a linear transformation to obtain a first query vector, a first value vector, and a first key vector. The first query vector and the first key vector are then divided into blocks to obtain eight blocks of second query vectors (8 * query blocks) and eight blocks of second key vectors (8 * key blocks). The second query vectors and the second key vectors then undergo multi-layer processing. Specifically, they are used as input blocks and input into the feature filtering layer for filtering, resulting in multiple circular third query vectors and third key vectors. These circular third query vectors and third key vectors are then input into a frequency-enhanced Radon transform layer, which is a composite layer consisting of the frequency domain transformation layer, the feature enhancement layer, the spatial domain transformation layer, the Radon transform layer, and the fusion layer; processing by these layers in sequence yields the query vector and key vector in the sinogram domain (blue and green circles in the figure). Dot product, scaling, normalization, and reshaping are then performed to obtain the reshaped attention weights. The first value vector is weighted and summed using the reshaped attention weights, and the result is passed through a convolutional layer to obtain the target feature map.
[0047] The Sinogram-Aware Self-Attention (SASS) module enhances metal artifact suppression and soft tissue restoration in medical imaging, primarily by processing data in the sinogram domain. SASS transforms the feature map into the sinogram domain and uses the Radon transform to capture the characteristic that metal artifacts appear as high-intensity stripes in the sinogram domain, thus locating and suppressing artifacts more accurately while preserving soft tissue details. The SASS module reduces complexity by dividing the first query vector and the first key vector into 8*8 patches and computing attention in parallel on each patch. The accompanying complexity expression is written in terms of the image height, the image width, and the number of image blocks, with big-O notation describing the order of growth of the function. Furthermore, the SASS module extracts projection slices from the frequency domain and generates a sinogram using a one-dimensional inverse fast Fourier transform, further reducing complexity and improving computational efficiency. The SASS module transforms block-shaped query and key vectors into circular ones, focusing on central features and reducing the impact of boundary noise. This design targets the characteristics of metal artifacts, strengthening the model's focus on artifact regions while avoiding overprocessing of irrelevant areas.
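As a rough illustration of why block-wise attention is cheaper, the usual counting argument is given below. The symbols H (image height), W (image width), and P (number of equal-size blocks) are introduced here for illustration and are not necessarily those of the original expression; the estimate assumes one token per pixel and attention computed independently inside each block.

```latex
\underbrace{\mathcal{O}\!\big((HW)^{2}\big)}_{\text{attention over all } HW \text{ tokens}}
\quad\longrightarrow\quad
\underbrace{P \cdot \mathcal{O}\!\left(\Big(\tfrac{HW}{P}\Big)^{2}\right)}_{P \text{ blocks of } HW/P \text{ tokens}}
\;=\; \mathcal{O}\!\left(\frac{(HW)^{2}}{P}\right)
```

For example, with H = W = 64 and P = 8, global attention involves 4096² = 16,777,216 pairwise scores, whereas block-wise attention involves 8 × 512² = 2,097,152, a reduction by the factor P = 8.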
[0048] In an embodiment of the present invention, a method for training a model for removing metal artifacts and reconstructing soft tissue in medical images is proposed, which is implemented through the following steps.
[0049] S301. Input random noise into the first generator to be trained to obtain the first metal artifact image sample in the sinogram domain.
[0050] S302. The first metal artifact image sample and the first medical image sample without metal artifacts are fused to obtain a second medical image sample with metal artifacts. The second medical image sample is then input into the second generator to be trained to obtain an initial third medical image sample without metal artifacts. The initial artifact removal model to be trained consists of the second generator and the second discriminator. The third medical image sample is the image sample after initial artifact removal and soft tissue reconstruction.
[0051] S303. Calculate the adversarial loss and perceptual loss based on the first medical image sample, the third medical image sample, and the second discriminator to be trained.
[0052] In some embodiments of the present invention, the initial artifact removal model to be trained consists of a second generator and a second discriminator, the second discriminator being a local discriminator. Random noise is input into the first generator to be trained to generate a first metal artifact image sample in the sinogram domain. The first medical image sample without metal artifacts is a clean medical image sample. Fusing the generated artifacts, i.e., the first metal artifact image sample, with the first medical image sample yields a second medical image sample with metal artifacts. The second medical image sample is then input into the second generator for artifact removal to obtain an initial third medical image sample without metal artifacts. The third medical image sample and the first medical image sample are then input into the second discriminator to calculate the adversarial loss, and the perceptual loss is obtained by inputting the first medical image sample and the third medical image sample into a trained deep convolutional neural network.
[0053] Specifically, the first and third medical image samples are input into the second discriminator. After passing through two convolutional layers, a batch normalization layer, another convolutional layer, another batch normalization layer, a Leaky ReLU activation function layer, another convolutional layer, and a sigmoid function, a probability value between 0 and 1 is generated. The adversarial loss and perceptual loss are then calculated based on this probability value and used to train the second generator and the second discriminator, resulting in the trained initial artifact removal model. The second discriminator is a local discriminator; instead of judging the entire image as real or fake, it evaluates local regions of the image, which helps generate images with richer details.
[0054] S304. Add noise to the third medical image sample to obtain the fourth medical image sample, and input the fourth medical image sample into the radial artifact removal model to be trained to obtain the fifth medical image sample without metal artifacts.
[0055] S305. The mean squared error loss, total variation loss, and information divergence are obtained from the third and fifth medical image samples.
[0056] S306. The radial artifact removal model and the initial artifact removal model are obtained after training based on adversarial loss, perceptual loss, mean squared error loss, total variation loss and information divergence.
[0057] In an embodiment of the present invention, noise is first added to the third medical image sample to obtain a fourth medical image sample carrying noise. The fourth medical image sample is then input into the radial artifact removal model to be trained to obtain a fifth medical image sample without metal artifacts. The mean squared error loss and information divergence are obtained through the fifth medical image sample without metal artifacts and the third medical image sample. At the same time, the total variation loss is calculated through the fifth medical image sample. Finally, the trained radial artifact removal model and the initial artifact removal model are obtained based on the adversarial loss, perceptual loss, mean squared error loss, total variation loss and information divergence.
[0058] In an embodiment of the invention, when noise is added to the third medical image sample, striped radial noise and Gaussian noise are combined to simulate the characteristics of radial artifacts. Noise is added progressively over T=1000 time steps, prioritizing radial noise in the early steps and transitioning to Gaussian noise in the later steps, which enhances the model's ability to learn artifact-specific patterns. The training process involves a forward pass and a backward pass: the forward pass adds noise to the third medical image sample x_0 using a predefined variance schedule β_t, generating a noisy state x_t; the backward pass predicts the noise distribution using a U-Net, progressively denoising and restoring the image.
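The hand-over from radial to Gaussian noise across the T=1000 steps can be sketched as follows. The text fixes only the number of steps and the ordering of the two noise types; the linear variance schedule, the linear blending weight, the stripe shape, the closed-form forward step borrowed from standard diffusion models, and every function name below are assumptions made for illustration.

```python
import numpy as np

T = 1000  # number of diffusion steps stated in the text

def linear_beta_schedule(T, beta_start=1e-4, beta_end=2e-2):
    """Assumed linear variance schedule beta_t (not specified in the source)."""
    return np.linspace(beta_start, beta_end, T)

def radial_weight(t, T):
    """Assumed blending weight: 1 at the first step, 0 at the last step."""
    return 1.0 - t / (T - 1)

def radial_stripe_noise(shape, num_stripes, rng):
    """Hypothetical stripe noise: random-amplitude rays through the image centre."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    angle = np.arctan2(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    noise = np.zeros(shape)
    for theta in rng.uniform(-np.pi, np.pi, num_stripes):
        # Angular distance to the ray direction, wrapped to (-pi, pi].
        d = np.angle(np.exp(1j * (angle - theta)))
        noise += rng.normal() * np.exp(-(d / 0.05) ** 2)
    return noise / (noise.std() + 1e-8)  # normalise to roughly unit variance

def hybrid_noise(shape, t, T, rng):
    """Blend radial and Gaussian noise; radial dominates for small t."""
    w = radial_weight(t, T)
    radial = radial_stripe_noise(shape, num_stripes=12, rng=rng)
    gauss = rng.standard_normal(shape)
    return w * radial + (1.0 - w) * gauss

def q_sample(x0, t, betas, rng):
    """Forward step: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = hybrid_noise(x0.shape, t, len(betas), rng)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps
```

Calling `q_sample(x0, t, linear_beta_schedule(T), rng)` returns the noisy state together with the noise that a denoising network would be trained to predict.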
[0059] In some embodiments of the present invention, the training step S306 can be implemented by S3071 to S3072, as described in the following steps.
[0060] S3071. Calculate the first total loss based on the mean squared error loss, total variation loss and information divergence, and use the first total loss to adjust the radial artifact removal model to be trained until the trained radial artifact removal model is obtained.
[0061] In some embodiments of the present invention, the first total loss is calculated from the mean squared error loss, the total variation loss, and the information divergence using a first loss calculation formula. The first loss calculation formula is as follows: $$L_{1} = L_{MSE} + \lambda_{1} L_{TV} + \lambda_{2} L_{KL};$$ In the above formula, $L_{1}$ is the first total loss, $L_{MSE}$ is the mean squared error loss, $L_{TV}$ is the total variation loss, $L_{KL}$ is the information divergence loss, and $\lambda_{1}$ and $\lambda_{2}$ are the hyperparameters used during the training process.
[0062] Furthermore: $$L_{MSE} = \mathbb{E}_{q(x_t \mid x_0)}\left[\left\lVert x_0 - \hat{x}_\theta(x_t, t)\right\rVert^2\right];$$ In the above formula, $q(x_t \mid x_0)$ is the conditional probability distribution, $x_0$ is the third medical image sample at time $t = 0$, $\mathbb{E}_{q(x_t \mid x_0)}$ denotes the expected value over all possible $x_t$ under the conditional probability distribution, $x_t$ is the intermediate state of the diffusion process at time $t$ after noise has been added to the third medical image sample, i.e., the fourth medical image sample, and $\hat{x}_\theta(x_t, t)$ is the fifth medical image sample predicted by the radial artifact removal model at time $t$.
[0063] Furthermore: $$L_{TV} = \sum_{i,j}\left(\left|\hat{x}_{i+1,j} - \hat{x}_{i,j}\right| + \left|\hat{x}_{i,j+1} - \hat{x}_{i,j}\right|\right);$$ In the above formula, $\hat{x}_{i,j}$ is the pixel value in row $i$ and column $j$ of the fifth medical image sample, $\hat{x}_{i+1,j}$ is the value of the pixel below the pixel in row $i$ and column $j$, and $\hat{x}_{i,j+1}$ is the value of the pixel to the right of the pixel in row $i$ and column $j$.
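[0064] Furthermore: $$L_{KL} = D_{KL}\left(q(x_{t-1} \mid x_t, x_0)\,\middle\|\,p_\theta(x_{t-1} \mid x_t)\right);$$ $$D_{KL}(P\,\|\,Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)};$$ In the above formulas, $D_{KL}$ is the information divergence, $L_{KL}$ is the information divergence loss, $q(x_{t-1} \mid x_t, x_0)$ and $p_\theta(x_{t-1} \mid x_t)$ are conditional probability distributions, $P$ and $Q$ are two probability distributions, and $x_{t-1}$ represents a conditioned variable, which depends on $x_t$ and $x_0$.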
[0065] Furthermore, after calculating the first total loss, the radial artifact removal model to be trained is adjusted using the first total loss until the trained radial artifact removal model is obtained.
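As an illustration of how the three terms of the first total loss might be combined, the following NumPy sketch computes a mean squared error, an anisotropic total variation, and a divergence between two diagonal Gaussians, then forms a weighted sum. The weights `lam_tv` and `lam_kl`, the use of the mean rather than the sum in the squared error, the absolute-difference form of the total variation, and the Gaussian form of the divergence are assumptions chosen for this example, not values taken from the source.

```python
import numpy as np

def mse_loss(x_ref, x_pred):
    """Mean squared error between the reference and the predicted image."""
    return float(np.mean((x_ref - x_pred) ** 2))

def tv_loss(x):
    """Anisotropic total variation: absolute differences to the pixel
    below and to the pixel on the right, summed over the image."""
    below = np.abs(x[1:, :] - x[:-1, :]).sum()
    right = np.abs(x[:, 1:] - x[:, :-1]).sum()
    return float(below + right)

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians,
    summed over all elements (assumed form of the divergence term)."""
    term = np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    return float(0.5 * np.sum(term))

def first_total_loss(x_ref, x_pred, kl, lam_tv=1e-4, lam_kl=1e-2):
    """Weighted sum of the three terms; the weights are illustrative."""
    return mse_loss(x_ref, x_pred) + lam_tv * tv_loss(x_pred) + lam_kl * kl
```

Here `x_ref` plays the role of the third medical image sample and `x_pred` the fifth; the total variation is computed on the prediction only, matching the description that it is calculated through the fifth medical image sample.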
[0066] S3072. Calculate the second total loss based on the adversarial loss and the perceptual loss, and use the second total loss to adjust the initial artifact removal model to be trained until the trained initial artifact removal model is obtained.
[0067] In some embodiments of the present invention, the second total loss is calculated from the adversarial loss and the perceptual loss using a second total loss calculation formula. The second total loss calculation formula is as follows: $$L_{2} = L_{adv} + \lambda_{3} L_{perc};$$ In the above formula, $L_{2}$ is the second total loss, $L_{adv}$ is the adversarial loss, $L_{perc}$ is the perceptual loss, and $\lambda_{3}$ is the hyperparameter used during the training process.
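[0068] Furthermore: $$L_{adv} = \mathbb{E}_{y}\left[\log D(y)\right] + \mathbb{E}_{x_0}\left[\log\left(1 - D(x_0)\right)\right];$$ In the above formula, $L_{adv}$ includes the terms $\mathbb{E}_{y}[\cdot]$ and $\mathbb{E}_{x_0}[\cdot]$, $y$ is the first medical image sample, $x_0$ is the third medical image sample, $\mathbb{E}$ denotes the expected value, $D(y)$ is the discrimination probability of the second discriminator for $y$, and $D(x_0)$ is the discrimination probability of the second discriminator for $x_0$.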
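[0069] Furthermore: $$L_{perc} = \sum_{k} \frac{1}{N_k}\left\lVert \phi_k(y) - \phi_k(x_0)\right\rVert_2^2;$$ In the above formula, $y$ is the first medical image sample, $x_0$ is the third medical image sample, $N_k$ is the number of elements in the $k$-th layer feature map of the deep convolutional neural network, and $\phi_k$ is the $k$-th layer feature extraction function of the deep convolutional neural network.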
[0070] Furthermore, after calculating the second total loss, the initial artifact removal model to be trained is adjusted using the second total loss until the trained initial artifact removal model is obtained.
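The combination of adversarial and perceptual terms can likewise be sketched in NumPy. The sketch assumes that the local discriminator has already produced per-patch probabilities and that a pretrained network has already produced per-layer feature maps; both are passed in as plain arrays. The weight `lam`, the small constant `eps`, and the function names are hypothetical.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """log D(real) + log(1 - D(fake)), averaged over local patches.

    d_real, d_fake: discriminator probabilities in (0, 1) for the clean
    sample and for the generator output. The discriminator seeks to
    increase this value and the generator to decrease it.
    """
    real_term = np.mean(np.log(d_real + eps))
    fake_term = np.mean(np.log(1.0 - d_fake + eps))
    return float(real_term + fake_term)

def perceptual_loss(feats_ref, feats_out):
    """Sum over layers of the squared feature difference divided by the
    number of elements in that layer's feature map."""
    return float(sum(np.sum((a - b) ** 2) / a.size
                     for a, b in zip(feats_ref, feats_out)))

def second_total_loss(d_real, d_fake, feats_ref, feats_out, lam=0.1):
    """Adversarial term plus weighted perceptual term; lam is illustrative."""
    return adversarial_loss(d_real, d_fake) + lam * perceptual_loss(feats_ref, feats_out)
```

Because the discriminator is local, `d_real` and `d_fake` are maps of probabilities rather than single numbers, and averaging over the map gives every image region an equal say in the loss.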
[0071] In some embodiments of the present invention, S301 is followed by S401 to S402, which are described by the following steps.
[0072] S401. The collected image samples with metal artifacts are labeled to obtain the first image sample of the metal artifact region at the marked location; and the first image sample is transformed to obtain the second metal artifact image sample in the sinogram domain.
[0073] S402. Input the second metal artifact image sample and the first metal artifact image sample into the first discriminator to be trained to obtain the earth mover's distance loss, and use the earth mover's distance loss to train the first generator and the first discriminator to be trained until the trained first generator and the first discriminator are obtained.
[0074] In some embodiments of the present invention, the training of the first discriminator and the first generator specifically involves the following. Random noise is first processed through a layer containing a sparse convolution and a rectified linear unit (ReLU) activation function, then through a linear layer, a sequence of six identical modules (each module including a transposed convolution, a sinogram-aware self-attention module, and a LeakyReLU activation function), and a sparse convolution layer to generate a first metal artifact image sample in the sinogram domain. The acquired image sample with artifacts is labeled to obtain a first image sample of the metal artifact region at the marked location, and the first image sample is then transformed to obtain a second metal artifact image sample in the sinogram domain. The second metal artifact image sample and the first metal artifact image sample are input into the first discriminator, processed through a layer containing a sparse convolution and a ReLU activation function, then through three identical modules (each module including a sparse convolution layer, a normalization layer, and a ReLU activation function) and a hyperbolic tangent activation function layer, and a score is output. The earth mover's distance loss is calculated based on the score, and the first generator and the first discriminator are trained using the calculated earth mover's distance loss. To address instability in the training of the first discriminator and the first generator, a heterogeneous frequency update strategy is adopted: the first generator is updated more frequently in the early stage to accelerate learning, and the update frequency of the first discriminator is increased in the later stage to provide strict feedback, maintain adversarial balance, and prevent training collapse.
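The heterogeneous frequency update strategy can be pictured as a schedule that decides, for each training iteration, how many times each network is updated. The sketch below is one possible reading: the switch point at half of training and the 2:1 update ratio are assumptions, as is the function name `update_counts`; the source states only that the generator is favoured early and the discriminator late.

```python
def update_counts(step, total_steps, switch_fraction=0.5, fast=2, slow=1):
    """Return (generator_updates, discriminator_updates) for one iteration.

    Early in training the generator is updated more often; after the
    switch point the discriminator is updated more often. The switch
    point and the ratio are illustrative values.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    if step < switch_fraction * total_steps:
        return fast, slow
    return slow, fast
```

A training loop would call `update_counts(step, total_steps)` at the top of each iteration and run the corresponding number of optimizer steps for the first generator and the first discriminator.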
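[0075] The earth mover's distance loss is given by: $$L_{EM} = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(u, v) \sim \gamma}\left[d(u, v)\right];$$ In the above formula, $P_r$ represents the probability distribution of the second metal artifact image sample, $P_g$ represents the probability distribution of the first metal artifact image sample, $\Pi(P_r, P_g)$ is the set of joint distributions whose marginals are $P_r$ and $P_g$, $L_{EM}$ is the earth mover's distance loss, and $d$ is the distance function.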
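Two small pure-Python helpers illustrate the quantities involved. The first estimates the loss from discriminator scores, as is common when a discriminator is trained with an earth mover's distance objective; the second computes the exact earth mover's distance for two equal-size one-dimensional samples, where it reduces to the mean absolute difference of the sorted values. Both function names are hypothetical, and the score-based estimate is an assumption about how the scores are turned into a loss.

```python
def critic_em_loss(real_scores, fake_scores):
    """Score-based estimate: mean score on real samples minus mean score
    on generated samples (assumed use of the discriminator scores)."""
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_real - mean_fake

def emd_1d(a, b):
    """Exact earth mover's distance between two equal-size 1D samples,
    using absolute difference as the distance function."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(a), sorted(b))
    return sum(abs(x - y) for x, y in pairs) / len(a)
```

For example, shifting every value of a sample by one unit gives a distance of exactly one, while reordering a sample leaves the distance at zero.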
[0076] Figure 3 shows a structural schematic of an electronic device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
[0077] As shown in Figure 3, the electronic device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
[0078] The processor 502, communication interface 504, and memory 506 communicate with each other via communication bus 508.
[0079] Communication interface 504 is used to communicate with other electronic devices or servers.
[0080] The processor 502 is used to execute program 510, specifically the relevant steps in the above method embodiments.
[0081] Specifically, program 510 may include program code that includes computer operation instructions.
[0082] Processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The electronic device may include one or more processors of the same type, such as one or more CPUs; or it may include processors of different types, such as one or more CPUs and one or more ASICs.
[0083] Memory 506 is used to store program 510. Memory 506 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.
[0084] Specifically, program 510 can be used to cause processor 502 to perform the operations corresponding to the methods described in the above method embodiments.
[0085] The specific implementation of each step in program 510 can be found in the corresponding descriptions of the steps and units in the above method embodiments, and will not be repeated here. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the devices and modules described above can be referred to the corresponding process descriptions in the foregoing method embodiments, and will not be repeated here.
[0086] It should be noted that, depending on the implementation needs, the various components / steps described in the embodiments of the present invention can be broken down into more components / steps, or two or more components / steps or parts of the operation of components / steps can be combined into new components / steps to achieve the purpose of the embodiments of the present invention.
[0087] The methods described above according to embodiments of the present invention can be implemented in hardware or firmware, as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code originally stored on a remote recording medium or a non-transitory machine-readable medium, downloaded via a network, and subsequently stored on a local recording medium. Thus, the methods described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It is understood that the computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) capable of storing or receiving software or computer code, which, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Furthermore, when a general-purpose computer accesses code used to implement the methods shown herein, the execution of the code transforms the general-purpose computer into a dedicated computer for executing the methods shown herein.
[0088] Those skilled in the art will recognize that the units and method steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
[0089] The above embodiments are only used to illustrate the embodiments of the present invention, and are not intended to limit the embodiments of the present invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention. Therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the patent protection scope of the embodiments of the present invention should be defined by the claims.
Claims
1. A method for removing metal artifacts and reconstructing soft tissue in medical images, characterized in that the method includes: inputting a first medical image carrying metal artifacts into an initial artifact removal model for feature extraction to obtain a first feature map; inputting the first feature map into a sinogram-aware self-attention module of the initial artifact removal model to obtain a target feature map, wherein the sinogram-aware self-attention module includes a feature filtering layer, a frequency domain transformation layer, a feature enhancement layer, a spatial domain transformation layer, a Radon transform layer, and a fusion layer; and obtaining a second medical image based on the target feature map, and inputting the second medical image into a radial artifact removal model to obtain a target medical image without metal artifacts, wherein the radial artifact removal model includes the sinogram-aware self-attention module, and the second medical image is the image after initial artifact removal and soft tissue reconstruction.
2. The method according to claim 1, characterized in that the sinogram-aware self-attention module also includes sparse convolutional layers; the step of inputting the first feature map into the sinogram-aware self-attention module of the initial artifact removal model to obtain the target feature map includes: inputting the first feature map into the sparse convolutional layer to obtain a second feature map, and linearly transforming the second feature map to obtain a first query vector, a first key vector, and a first value vector; dividing the first query vector and the first key vector into blocks to obtain multiple block-shaped second query vectors and second key vectors; inputting the second query vectors and the second key vectors into the feature filtering layer to obtain multiple circular third query vectors and third key vectors; inputting the third query vector and the third key vector into the frequency domain transformation layer to obtain a fourth query vector and a fourth key vector in the frequency domain; and obtaining the target feature map based on the fourth query vector, the fourth key vector, the feature enhancement layer, the spatial domain transformation layer, the Radon transform layer, and the fusion layer.
3. The method according to claim 2, characterized in that the step of obtaining the target feature map based on the fourth query vector, the fourth key vector, the feature enhancement layer, the spatial domain transformation layer, the Radon transform layer, and the fusion layer includes: inputting the fourth query vector and the fourth key vector respectively into the feature enhancement layer for feature extraction and enhancement to obtain a first feature and a second feature; inputting the first feature and the second feature respectively into the spatial domain transformation layer to obtain a fifth query vector and a fifth key vector in the spatial domain; inputting the fifth query vector and the fifth key vector respectively into the Radon transform layer and performing a two-dimensional fast Fourier transform to obtain a sixth query vector and a sixth key vector in the frequency domain; extracting frequency slices from the sixth query vector and the sixth key vector at different projection angles to obtain a seventh query vector and a seventh key vector in one-dimensional projection form; performing a one-dimensional inverse fast Fourier transform on the seventh query vector and the seventh key vector to obtain an eighth query vector and an eighth key vector in the sinogram domain; and obtaining the target feature map based on the eighth query vector, the eighth key vector, and the fusion layer.
4. The method according to claim 3, characterized in that the step of obtaining the target feature map based on the eighth query vector, the eighth key vector, and the fusion layer includes: inputting the eighth query vector into the fusion layer for fusion to obtain a target query vector, and inputting the eighth key vector into the fusion layer for fusion to obtain a target key vector; performing dot product, scaling, and normalization on the target query vector and the target key vector to obtain attention weights; and reshaping the attention weights to obtain reshaped attention weights, and obtaining the target feature map based on the reshaped attention weights and the first value vector.
5. A method for training a model for removing metal artifacts and reconstructing soft tissue in medical images, characterized in that the method includes: inputting random noise into a first generator to be trained to obtain a first metal artifact image sample in the sinogram domain; fusing the first metal artifact image sample and a first medical image sample without metal artifacts to obtain a second medical image sample with metal artifacts, and inputting the second medical image sample into a second generator to be trained to obtain an initial third medical image sample without metal artifacts, wherein the initial artifact removal model to be trained consists of the second generator and a second discriminator, and the third medical image sample is an image sample after initial artifact removal and soft tissue reconstruction; calculating an adversarial loss and a perceptual loss based on the first medical image sample, the third medical image sample, and the second discriminator to be trained; adding noise to the third medical image sample to obtain a fourth medical image sample, and inputting the fourth medical image sample into a radial artifact removal model to be trained to obtain a fifth medical image sample without metal artifacts; obtaining a mean squared error loss, a total variation loss, and an information divergence from the third and fifth medical image samples; and obtaining the trained radial artifact removal model and the trained initial artifact removal model based on the adversarial loss, the perceptual loss, the mean squared error loss, the total variation loss, and the information divergence.
6. The method according to claim 5, characterized in that the step of obtaining the trained radial artifact removal model and the trained initial artifact removal model based on the adversarial loss, the perceptual loss, the mean squared error loss, the total variation loss, and the information divergence includes: calculating a first total loss based on the mean squared error loss, the total variation loss, and the information divergence, and using the first total loss to adjust the radial artifact removal model to be trained until the trained radial artifact removal model is obtained; and calculating a second total loss based on the adversarial loss and the perceptual loss, and using the second total loss to adjust the initial artifact removal model to be trained until the trained initial artifact removal model is obtained.
7. The method according to claim 5, characterized in that, after inputting random noise into the first generator to be trained to obtain the first metal artifact image sample in the sinogram domain, the method further includes: labeling the collected image samples with metal artifacts to obtain a first image sample of the metal artifact region at the marked location, and transforming the first image sample to obtain a second metal artifact image sample in the sinogram domain; and inputting the second metal artifact image sample and the first metal artifact image sample into a first discriminator to be trained to obtain an earth mover's distance loss, and training the first generator and the first discriminator to be trained using the earth mover's distance loss until the trained first generator and the trained first discriminator are obtained.