Self-supervised low-dose CT denoising method based on decomposition convolutional neural network
A self-supervised method based on a decomposition convolutional neural network addresses noise interference in low-dose CT images. It learns image decomposition from unlabeled data and generates clean images to support clinical diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHEAST UNIV
- Filing Date
- 2024-08-01
- Publication Date
- 2026-06-26
Figure CN118887127B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a self-supervised low-dose CT denoising method based on decomposed convolutional neural networks, belonging to the field of medical image processing technology.
Background Technology
[0002] Currently, the application of artificial intelligence technology in the medical field has received widespread attention and achieved significant progress. Utilizing technologies such as deep learning, computers can effectively remove noise and other undesirable phenomena from medical images while preserving important anatomical structural information. This provides assistance and suggestions for clinicians in diagnosis and treatment, and greatly improves the efficiency and accuracy of medical image diagnosis.
[0003] Low-dose computed tomography (LDCT), an important means of acquiring images at low radiation doses, has been widely used in clinical diagnosis. It significantly reduces the radiation dose received by patients, thereby substantially lowering the potential radiation risks associated with long-term medical imaging examinations. This is particularly important for patients who require repeated imaging examinations, such as cancer patients or patients with chronic diseases requiring long-term follow-up. Although low-dose CT effectively reduces the radiation risk for patients, its images are often affected by noise, which makes accurate image analysis and diagnosis more difficult. Currently, the most widely used publicly available image denoising methods can be divided into four categories: traditional methods, supervised learning methods, unsupervised learning methods, and self-supervised learning methods.
[0004] Traditional methods remove noise effectively but introduce new artifacts and distortions into the denoised images. They also depend on numerous parameters that must be fine-tuned manually and, most importantly, their slow processing speed makes them impractical. Deep learning-based artificial intelligence has played a significant role in medical image processing in recent years. Among deep learning approaches, supervised methods perform best, but they require spatially well-matched training pairs of LDCT and normal-dose CT (NDCT) images. This requirement makes them difficult to apply clinically: on one hand, scanning the same patient multiple times at normal and low dose is not advisable; on the other hand, unavoidable factors such as patient movement and organ motion cause inconsistencies between NDCT and LDCT scans. Unsupervised networks relax the dataset requirements to some extent, since they do not need NDCT images paired with the LDCT scans, but they still need high-quality unpaired NDCT data. Self-supervised methods have been widely used in natural image denoising, where they show good denoising performance and robustness, but the published literature on self-supervised LDCT denoising is limited. To address these issues, and considering the difficulty of obtaining a large number of accurate labels for low-dose CT images, a self-supervised denoising method based on unlabeled data is designed. This method is of significant value for low-dose CT denoising in image diagnosis and clinical decision-making.
Summary of the Invention
[0005] The problem this invention aims to solve is to address the difficulty of acquiring a large number of accurate labels for medical images and overcome the shortcomings of existing technologies by providing a self-supervised low-dose CT denoising method based on decomposition convolutional neural networks. This method decomposes LDCT images into clean images, signal-related noise images, and signal-independent noise images, guiding the network to learn image decomposition in unlabeled data. This separates noise-free CT images from LDCT images, providing technical support for clinicians in their diagnostic work.
[0006] To address the aforementioned technical problems, this invention provides a self-supervised low-dose CT denoising method based on decomposed convolutional neural networks, comprising the following steps:
[0007] (1) LDCT imaging data acquisition and storage: Acquire the patient's LDCT imaging data and store the data in DICOM format;
[0008] (2) Data preprocessing: The values of pixels outside the CT scan boundary in the LDCT imaging data are set to 0 and normalized within the range of [-1024, 3072], and then stored in NPY format;
[0009] (3) Deep learning denoising model training: The LDCT data obtained in step (2) is randomly divided into training set and test set, and then the data is input into the deep learning denoising model for training.
[0010] (4) Low-dose CT denoising: The clean image generator of the denoising model trained in step (3) is used to denoise the LDCT image data. Since the complete training model is not required, the algorithm effectively improves the efficiency of the testing phase.
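The preprocessing in step (2) can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: it assumes the range [-1024, 3072] is mapped to [0, 1] by min-max scaling, assumes pixels outside the scan boundary are identified by a Boolean mask, and skips DICOM reading by using a synthetic array. The names `preprocess_ldct`, `HU_MIN`, and `HU_MAX` are hypothetical.

```python
import io
import numpy as np

HU_MIN, HU_MAX = -1024.0, 3072.0  # value range stated in step (2)

def preprocess_ldct(pixels, outside_mask):
    """Zero the out-of-boundary pixels, then scale the stated range to [0, 1]."""
    img = np.asarray(pixels, dtype=np.float32).copy()
    img[outside_mask] = 0.0                      # pixels outside the scan boundary set to 0
    img = np.clip(img, HU_MIN, HU_MAX)           # keep values inside the stated range
    return (img - HU_MIN) / (HU_MAX - HU_MIN)    # assumed min-max normalization

# Synthetic 2x2 "slice" standing in for pixel data read from a DICOM file.
pixels = np.array([[-2000.0, -1024.0],
                   [0.0, 3072.0]])
outside = pixels < HU_MIN                        # assumption: padding values mark the outside region
normalized = preprocess_ldct(pixels, outside)

# Store in NPY format (an in-memory buffer is used here instead of a file path).
buffer = io.BytesIO()
np.save(buffer, normalized)
buffer.seek(0)
restored = np.load(buffer)
```

In practice the buffer would be replaced by a file path, and the mask would come from the scanner's field-of-view geometry rather than a threshold.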
[0011] Specifically, the deep learning denoising model in step (3) consists of two parts: a decomposition module and a recurrent module. The decomposition module consists of a clean image generator, an encoder, two decoders (decoder 1 and decoder 2), and a signal-guided attention module. The trainable edge detection module is implemented by multiplying four traditional edge detection operators in different directions (horizontal, vertical, and the two diagonal directions) by a learnable parameter to extract edge information from the image. The clean image generator is a dual-branch module. The main branch consists of eight convolutional blocks that extract feature information, and the auxiliary branch consists of the trainable edge detection module followed by seven convolutional blocks that extract edge information. The feature map of each auxiliary-branch layer is added to the feature map of the corresponding main-branch layer. Finally, the feature maps of the two branches are added together and passed through two convolutional blocks to produce a clean image with enhanced details. The encoder consists of ten convolutional blocks and produces deep feature maps. A signal-guided attention module, consisting of average pooling layers, max pooling layers, and a multilayer perceptron, is used between the encoder and decoder 1 to calculate the correlation between the signal-related noise and the signal. The two decoders have the same structure, each containing five convolutional blocks. Decoder 1 receives features from the signal-guided attention module, while decoder 2 receives features from the encoder; their outputs are the signal-related noise image and the signal-independent noise image, respectively. The decomposition module works as follows: the normalized LDCT data is first fed into the clean image generator to generate a noise-free image, and the noise-free image is then subtracted from the LDCT image to obtain the noise image. The noise image and the noise-free image are then processed by the encoder and decoders to obtain the signal-related noise image and the signal-independent noise image. The recurrent module randomly combines the three outputs of the decomposition module (noise-free image, signal-related noise image, and signal-independent noise image) to generate new samples, which are decomposed again by the decomposition module to improve the accuracy of the decomposition results and thus the model's denoising performance. The denoising model is trained with self-supervised loss functions, including the anisotropic total variation (ATV) loss and the distance loss.
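The trainable edge detection module described above can be illustrated with a short NumPy sketch. Several details are assumptions, since the text only says "four traditional edge detection operators": Sobel kernels are used as the fixed operators, each operator gets its own scalar scale, and the helper names `correlate3x3` and `trainable_edge_features` are hypothetical. In a real network the scales would be parameters updated by backpropagation.

```python
import numpy as np

# Assumed fixed operators: Sobel kernels for four directions.
EDGE_KERNELS = np.array([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # horizontal gradient
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # vertical gradient
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # first diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],    # second diagonal
], dtype=np.float32)

def correlate3x3(image, kernel):
    """Zero-padded 3x3 cross-correlation that keeps the input size."""
    image = np.asarray(image, dtype=np.float32)
    padded = np.pad(image, 1, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def trainable_edge_features(image, scales):
    """Apply each fixed operator multiplied by its learnable scale; returns (4, H, W)."""
    return np.stack([correlate3x3(image, s * k)
                     for s, k in zip(scales, EDGE_KERNELS)])

# A vertical step edge: left half 0, right half 1.
step = np.zeros((5, 6), dtype=np.float32)
step[:, 3:] = 1.0
scales = np.array([0.5, 1.0, 1.0, 1.0], dtype=np.float32)  # stand-ins for learned values
features = trainable_edge_features(step, scales)
```

On this step edge the horizontal-gradient channel responds at the edge while the vertical-gradient channel stays at zero in the interior, which is the directional selectivity the module relies on.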
[0012] In step (3), the input data of the deep learning denoising model is first processed by the clean image generator to obtain a clean image. The generator consists of two parallel branches (a main branch and an auxiliary branch). In the main branch, the input data is first processed by one convolutional block to extract a feature map F1 of dimension (W, H, C), where W, H, and C denote the width, height, and number of channels, respectively, with W = H = 128 and C = 64. High-dimensional features are then extracted by seven convolutional blocks in sequence, with the feature map dimensions unchanged. The only difference between the auxiliary branch and the main branch is that a trainable edge detection operator replaces the first convolutional layer of the main branch. The auxiliary branch adds the feature map output by each multi-scale module to the corresponding feature map of the main branch, thereby enhancing the edge information in each layer of feature maps. After the feature maps of the two branches are added together, two convolutional blocks reduce the number of channels to one and output the clean image. Each convolutional block consists of one convolutional layer and one ReLU activation function.
[0013] In step (3), the signal-guided attention module processes the two features extracted by the encoder to calculate the complex correlation between signal-related noise and the clean image. The specific process can be represented as follows:
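[0014] $\mathrm{Att}_{channel} = \sigma\big(\mathrm{MLP}(\mathrm{Mp}_{channel}(F_{clean})) + \mathrm{MLP}(\mathrm{Ap}_{channel}(F_{clean}))\big)$
[0015] $F_{channel} = \mathrm{Att}_{channel} \odot F_{noise}$
[0016] $\mathrm{Att}_{spatial} = \sigma\big(\mathrm{Conv}(\mathrm{Concat}(\mathrm{Mp}_{spatial}(F_{channel}), \mathrm{Ap}_{spatial}(F_{channel})))\big)$
[0017] $F = \mathrm{Att}_{spatial} \odot F_{noise}$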
[0018] Here, Conv denotes a convolutional block, Concat denotes feature concatenation, $\odot$ denotes element-wise multiplication, $\sigma(\cdot)$ denotes the sigmoid activation function, and MLP denotes a multilayer perceptron. The signal-guided attention module takes two inputs: the feature map $F_{clean}$, obtained by the encoder from the clean image output by the clean image generator, and the feature map $F_{noise}$, obtained by encoding the noise image that results from subtracting the clean image from the LDCT image. The module can be divided into two parts: a channel signal-guided attention module and a spatial signal-guided attention module. The channel part first applies a max pooling layer $\mathrm{Mp}_{channel}$ and an average pooling layer $\mathrm{Ap}_{channel}$ to compress the spatial dimensions of the clean image feature map $F_{clean}$, then passes the results through a multilayer perceptron and the sigmoid activation function $\sigma(\cdot)$ to obtain the channel attention map $\mathrm{Att}_{channel}$. $\mathrm{Att}_{channel}$ is then multiplied element-wise with $F_{noise}$ via broadcasting to aggregate channel features, giving the channel-aggregated feature $F_{channel}$. Next, the spatial part applies a max pooling layer $\mathrm{Mp}_{spatial}$ and an average pooling layer $\mathrm{Ap}_{spatial}$ to compress the channel dimension of $F_{channel}$. The two pooled maps are concatenated with cat(·) into a 2-channel map, and a convolutional layer conv(·) followed by the sigmoid activation function $\sigma(\cdot)$ reduces the number of channels back to 1, giving $\mathrm{Att}_{spatial}$. Finally, this map is multiplied element-wise with $F_{noise}$ via broadcasting to aggregate spatial features.
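The four attention equations can be traced with a small NumPy sketch. It is an illustration under stated assumptions rather than the patented network: the MLP is a two-layer perceptron with ReLU and hypothetical weights `w1` and `w2`, the convolutional block is reduced to a 1x1 convolution with hypothetical weights `conv_w`, and features use a (C, H, W) layout. The final product uses $F_{noise}$, following the equations as written.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def signal_guided_attention(f_clean, f_noise, w1, w2, conv_w):
    """f_clean, f_noise: (C, H, W); w1: (C_hidden, C); w2: (C, C_hidden); conv_w: (2,)."""
    def mlp(v):
        # Assumed two-layer perceptron with ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Channel attention: pool F_clean over the spatial dimensions.
    mp_channel = f_clean.max(axis=(1, 2))
    ap_channel = f_clean.mean(axis=(1, 2))
    att_channel = sigmoid(mlp(mp_channel) + mlp(ap_channel))      # shape (C,)
    f_channel = att_channel[:, None, None] * f_noise              # broadcast over H, W

    # Spatial attention: pool F_channel over the channel dimension.
    pooled = np.stack([f_channel.max(axis=0), f_channel.mean(axis=0)])  # (2, H, W)
    att_spatial = sigmoid(np.tensordot(conv_w, pooled, axes=1))   # 1x1 conv, 2 -> 1 channel
    return att_spatial[None, :, :] * f_noise                      # broadcast over C

rng = np.random.default_rng(0)
C, H, W, hidden = 8, 4, 4, 2
f_clean = rng.normal(size=(C, H, W))
f_noise = rng.normal(size=(C, H, W))
out = signal_guided_attention(
    f_clean, f_noise,
    w1=rng.normal(size=(hidden, C)),
    w2=rng.normal(size=(C, hidden)),
    conv_w=rng.normal(size=2),
)
```

Because both attention maps pass through a sigmoid, every output element is the corresponding element of `f_noise` scaled by a factor between 0 and 1.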
[0019] The recurrent module in step (3) randomly combines the three images output by the decomposition module and then decomposes the combination again, to ensure that the decomposition is accurate and that none of the three decomposed images contains information from the others. The recurrent module processes the decomposed images as follows:
[0020]
[0021] The LDCT image $I_n$ is passed through the decomposition module $\mathrm{Net}_{decom}(\cdot)$, which outputs three images as the decomposition result: a clean image, a signal-related noise image, and a signal-independent noise image. This completes the first decomposition. The three images from the first decomposition are linearly combined using coefficients [α, β, θ] to generate a new sample $I_{new}$. The new sample $I_{new}$ is then fed back into the decomposition module for a second decomposition, which yields the corresponding three outputs.
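One pass of the recurrent module can be sketched as follows. The sampling range for the coefficients, the function name `recurrent_step`, and the stand-in `toy_decompose` are all assumptions made for illustration; in the method itself the decomposition is performed by the trained decomposition module.

```python
import numpy as np

def recurrent_step(decompose, ldct, rng):
    """First decomposition, random linear recombination, second decomposition."""
    clean, noise_dep, noise_indep = decompose(ldct)           # first decomposition
    alpha, beta, theta = rng.uniform(0.5, 1.5, size=3)        # assumed sampling range
    i_new = alpha * clean + beta * noise_dep + theta * noise_indep
    # Components of the new sample that the second decomposition should reproduce.
    targets = (alpha * clean, beta * noise_dep, theta * noise_indep)
    outputs = decompose(i_new)                                # second decomposition
    return i_new, targets, outputs

def toy_decompose(image):
    """Stand-in for the decomposition module: splits an image into fixed fractions."""
    return 0.7 * image, 0.2 * image, 0.1 * image

rng = np.random.default_rng(0)
ldct = rng.uniform(0.0, 1.0, size=(4, 4))
i_new, targets, outputs = recurrent_step(toy_decompose, ldct, rng)
```

During training, the mismatch between `outputs` and `targets` would feed the part of the decomposition loss that checks the second decomposition against the components of the new sample.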
[0022] After passing through the decomposition and recurrent modules of the deep learning denoising model, the LDCT data input to the network is first decomposed into three images, which are then randomly recombined to generate a new sample $I_{new}$, and this sample is decomposed again by the decomposition module. For ease of description, the process by which the decomposition model decomposes an LDCT image into three images (a clean image, a signal-related noise image, and a signal-independent noise image) is given a shorthand notation. The total loss function $L_{all}$ is calculated as follows:
[0023]
[0024] Here, $L_{MSE}$ and $\|\cdot\|_2$ denote the mean squared error (MSE) loss function and the L2 norm, respectively; $|\cdot|$ denotes the absolute value; and max(·,·) returns the larger of its two arguments. $\lambda_{ATV}$ and $\lambda_{dis}$ are the weights of the loss terms, and m is a parameter used to control the distance between different sample pairs. h and w index the rows and columns of the image, M is the total number of pixels, and H is the number of rows in the image. The decomposition loss function $L_{decom}$ first ensures that the sum of the three images from the first decomposition is consistent with the input LDCT image, and also ensures that the components from the second decomposition are consistent with the components of the new sample, thereby achieving the basic decomposition function. The anisotropic total variation loss function $L_{ATV}$ is computed from image gradients: two gradient operators first calculate the gradients in the horizontal and vertical directions to detect edges and texture directions; the gradient magnitudes are then exponentially weighted to adaptively preserve edge information; finally, the resulting gradient magnitudes are averaged to obtain the mean gradient value of the image. The distance loss function $L_{dis}$ increases the distance between the decomposed clean image and the two noise images, so that the clean image does not contain information from either noise image.
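The exact loss formulas are not reproduced in this text, so the following NumPy sketch only illustrates the three ingredients named above, using assumed functional forms. The first-decomposition term is an MSE between the LDCT image and the sum of its components (the second-decomposition term is omitted). The ATV term weights each gradient magnitude by `exp(-|g|)`, which is one possible reading of "exponentially weighted". The distance term is a hinge with margin `m` on the L2 distance. All function names and default weights are hypothetical.

```python
import numpy as np

def decomposition_loss(ldct, clean, noise_dep, noise_indep):
    """MSE between the input and the sum of its three components."""
    return float(np.mean((ldct - (clean + noise_dep + noise_indep)) ** 2))

def atv_loss(image):
    """Assumed edge-aware anisotropic TV: gradients down-weighted by exp(-|g|)."""
    g_h = np.abs(np.diff(image, axis=1))   # horizontal gradient magnitudes
    g_v = np.abs(np.diff(image, axis=0))   # vertical gradient magnitudes
    return float(np.mean(g_h * np.exp(-g_h)) + np.mean(g_v * np.exp(-g_v)))

def distance_loss(clean, noise, m=1.0):
    """Assumed hinge form: penalize pairs whose L2 distance is below the margin m."""
    distance = np.sqrt(np.sum((clean - noise) ** 2))
    return float(max(0.0, m - distance))

def total_loss(ldct, clean, noise_dep, noise_indep, lam_atv=0.1, lam_dis=0.1, m=1.0):
    """Weighted sum of the three terms; the weights are placeholder values."""
    l_decom = decomposition_loss(ldct, clean, noise_dep, noise_indep)
    l_atv = atv_loss(clean)
    l_dis = distance_loss(clean, noise_dep, m) + distance_loss(clean, noise_indep, m)
    return l_decom + lam_atv * l_atv + lam_dis * l_dis

clean = np.ones((4, 4))
noise_dep = np.zeros((4, 4))
noise_indep = np.zeros((4, 4))
ldct = clean + noise_dep + noise_indep
loss = total_loss(ldct, clean, noise_dep, noise_indep)
```

For this toy input the components sum exactly to the input, the clean image is flat, and it lies farther than the margin from both noise images, so all three terms vanish.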
[0025] Compared to existing technologies, the advantages of this invention are as follows. This invention provides a self-supervised low-dose CT denoising method based on decomposed convolutional neural networks, which can learn rich image information using only LDCT data. It employs a detail-enhancing dual-branch convolutional structure to decompose out clean images with enhanced details. It uses an encoder-decoder approach to extract multi-scale features and embeds a signal-guided attention module to calculate the complex nonlinear relationship between clean images and signal-related noise images, thereby better decomposing noise images into signal-related noise images and signal-independent noise images. The model constructs a recurrent module to guide the decomposition module and improve its ability to decompose the features of different LDCT signals. On this basis, it constructs a self-supervised loss function that combines the anisotropic total variation loss and the distance loss into a joint loss function, which improves the final performance of the self-supervised deep learning denoising model, helps clinicians obtain clear views of patients' internal structures, and assists in formulating subsequent diagnosis and treatment plans. In the testing phase, only the trained clean image generator is used, which improves model efficiency and reduces resource consumption. This method addresses the practical difficulty of obtaining well-labeled medical images, overcomes the shortcomings of existing technologies, and provides technical support for clinicians in their diagnostic work.
Attached Figure Description
[0026] Figure 1 is a schematic diagram of the overall process of the present invention;
[0027] Figure 2 is a schematic diagram of the deep learning denoising network architecture of the present invention;
[0028] Figure 3 is a schematic diagram of the clean image generator structure in the deep learning denoising network;
[0029] Figure 4 is a schematic diagram of the trainable edge detection operator structure in the deep learning denoising network;
[0030] Figure 5 is a schematic diagram of the signal-guided attention module structure in the deep learning denoising network;
[0031] Figure 6 is a schematic diagram of the noise reduction result of the present invention.
Detailed Implementation
[0032] The embodiments of the present invention will now be described in further detail with reference to the accompanying drawings.
[0033] Example: As shown in Figure 1, the self-supervised low-dose CT denoising method based on decomposed convolutional neural networks includes the following steps:
[0034] (1) LDCT imaging data acquisition and storage: Acquire the patient's LDCT imaging data and store the data in DICOM format;
[0035] (2) Data preprocessing: The values of pixels outside the CT scan boundary in the LDCT imaging data are set to 0 and normalized within the range of [-1024, 3072], and then stored in NPY format;
[0036] (3) Deep learning denoising model training: The LDCT data obtained in step (2) is randomly divided into training set and test set, and then the data is input into the deep learning denoising model for training.
[0037] (4) Low-dose CT denoising: The clean image generator of the denoising model trained in step (3) is used to denoise the LDCT image data. The complete training model is not required, which effectively improves the efficiency of the testing phase.
[0038] As shown in Figure 2, the deep learning denoising model in step (3) consists of two parts: a decomposition module and a recurrent module. The decomposition module consists of a clean image generator, an encoder, two decoders (decoder 1 and decoder 2), and a signal-guided attention module. The trainable edge detection module is implemented by multiplying four traditional edge detection operators in different directions (horizontal, vertical, and the two diagonal directions) by a learnable parameter to extract edge information from the image. The clean image generator is a dual-branch module. The main branch consists of eight convolutional blocks that extract feature information, and the auxiliary branch consists of the trainable edge detection module followed by seven convolutional blocks that extract edge information. The feature map of each auxiliary-branch layer is added to the feature map of the corresponding main-branch layer. Finally, the feature maps of the two branches are added together and passed through two convolutional blocks to produce a clean image with enhanced details. The encoder consists of ten convolutional blocks and produces deep feature maps. A signal-guided attention module, consisting of average pooling layers, max pooling layers, and a multilayer perceptron, is used between the encoder and decoder 1 to calculate the correlation between the signal-related noise and the signal. The two decoders have the same structure, each containing five convolutional blocks. Decoder 1 receives features from the signal-guided attention module, while decoder 2 receives features from the encoder; their outputs are the signal-related noise image and the signal-independent noise image, respectively. The decomposition module works as follows: the normalized LDCT data is first fed into the clean image generator to generate a noise-free image, and the noise-free image is then subtracted from the LDCT image to obtain the noise image. The noise image and the noise-free image are then processed by the encoder and decoders to obtain the signal-related noise image and the signal-independent noise image. The recurrent module randomly combines the three outputs of the decomposition module (noise-free image, signal-related noise image, and signal-independent noise image) to generate new samples, which are decomposed again by the decomposition module to improve the accuracy of the decomposition results and thus the model's denoising performance. The denoising model is trained with self-supervised loss functions, including the anisotropic total variation (ATV) loss and the distance loss.
[0039] As shown in Figure 3, in step (3) the input data of the deep learning denoising model is first processed by the clean image generator to obtain a clean image. The generator consists of two parallel branches (a main branch and an auxiliary branch). In the main branch, the input data is first processed by one convolutional block to extract a feature map F1 of dimension (W, H, C), where W, H, and C denote the width, height, and number of channels, respectively, with W = H = 128 and C = 64. High-dimensional features are then extracted by seven convolutional blocks in sequence, with the feature map dimensions unchanged. As shown in Figures 3 and 4, the only difference between the auxiliary branch and the main branch is that the first convolutional layer of the main branch is replaced with a trainable edge detection operator. The auxiliary branch adds the feature map output by each multi-scale module to the corresponding feature map of the main branch, thereby enhancing the edge information of the image. After the feature maps of the two branches are added together, two convolutional blocks reduce the number of channels to one and output the clean image. Each convolutional block consists of one convolutional layer and one ReLU activation function.
[0040] As shown in Figure 5, the signal-guided attention module in step (3) processes the two features extracted by the encoder to calculate the complex correlation between signal-related noise and the clean image. The specific process can be represented as follows:
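[0041] $\mathrm{Att}_{channel} = \sigma\big(\mathrm{MLP}(\mathrm{Mp}_{channel}(F_{clean})) + \mathrm{MLP}(\mathrm{Ap}_{channel}(F_{clean}))\big)$
[0042] $F_{channel} = \mathrm{Att}_{channel} \odot F_{noise}$
[0043] $\mathrm{Att}_{spatial} = \sigma\big(\mathrm{Conv}(\mathrm{Concat}(\mathrm{Mp}_{spatial}(F_{channel}), \mathrm{Ap}_{spatial}(F_{channel})))\big)$
[0044] $F = \mathrm{Att}_{spatial} \odot F_{noise}$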
[0045] Here, Conv denotes a convolutional block, Concat denotes feature concatenation, $\odot$ denotes element-wise multiplication, $\sigma(\cdot)$ denotes the sigmoid activation function, and MLP denotes a multilayer perceptron. The signal-guided attention module takes two inputs: the feature map $F_{clean}$, obtained by the encoder from the clean image output by the clean image generator, and the feature map $F_{noise}$, obtained by encoding the noise image that results from subtracting the clean image from the LDCT image. The module can be divided into two parts: a channel signal-guided attention module and a spatial signal-guided attention module. The channel part first applies a max pooling layer $\mathrm{Mp}_{channel}$ and an average pooling layer $\mathrm{Ap}_{channel}$ to compress the spatial dimensions of the clean image feature map $F_{clean}$, then passes the results through a multilayer perceptron and the sigmoid activation function $\sigma(\cdot)$ to obtain the channel attention map $\mathrm{Att}_{channel}$. $\mathrm{Att}_{channel}$ is then multiplied element-wise with $F_{noise}$ via broadcasting to aggregate channel features, giving the channel-aggregated feature $F_{channel}$. Next, the spatial part applies a max pooling layer $\mathrm{Mp}_{spatial}$ and an average pooling layer $\mathrm{Ap}_{spatial}$ to compress the channel dimension of $F_{channel}$. The two pooled maps are concatenated with cat(·) into a 2-channel map, and a convolutional layer conv(·) followed by the sigmoid activation function $\sigma(\cdot)$ reduces the number of channels back to 1, giving $\mathrm{Att}_{spatial}$. Finally, this map is multiplied element-wise with $F_{noise}$ via broadcasting to aggregate spatial features.
[0046] As shown in Figure 2, in step (3) the recurrent module randomly combines the three images output by the decomposition module and then decomposes the combination again, to ensure that the decomposition is accurate and that none of the three decomposed images contains information from the others. The recurrent module processes the decomposed images as follows:
[0047]
[0048]
[0049] The LDCT image $I_n$ is passed through the decomposition module $\mathrm{Net}_{decom}(\cdot)$, which outputs three images as the decomposition result: a clean image, a signal-related noise image, and a signal-independent noise image. This completes the first decomposition. The three images from the first decomposition are linearly combined using coefficients [α, β, θ] to generate a new sample $I_{new}$. The new sample $I_{new}$ is then fed back into the decomposition module for a second decomposition, which yields the corresponding three outputs.
[0050] After passing through the decomposition and recurrent modules of the deep learning denoising model, the LDCT data input to the network is first decomposed into three images, which are then randomly recombined to generate a new sample $I_{new}$, and this sample is decomposed again by the decomposition module, as shown in Figure 2. For ease of description, the process by which the decomposition model decomposes an LDCT image into three images (a clean image, a signal-related noise image, and a signal-independent noise image) is given a shorthand notation. The total loss function $L_{all}$ is calculated as follows:
[0051]
[0052] Here, $L_{MSE}$ and $\|\cdot\|_2$ denote the mean squared error (MSE) loss function and the L2 norm, respectively; $|\cdot|$ denotes the absolute value; and max(·,·) returns the larger of its two arguments. $\lambda_{ATV}$ and $\lambda_{dis}$ are the weights of the loss terms, and m is a parameter used to control the distance between different sample pairs. h and w index the rows and columns of the image, M is the total number of pixels, and H is the number of rows in the image. The decomposition loss function $L_{decom}$ first ensures that the sum of the three images from the first decomposition is consistent with the input LDCT image, and also ensures that the components from the second decomposition are consistent with the components of the new sample, thereby achieving the basic decomposition function. The anisotropic total variation loss function $L_{ATV}$ is computed from image gradients: two gradient operators first calculate the gradients in the horizontal and vertical directions to detect edges and texture directions; the gradient magnitudes are then exponentially weighted to adaptively preserve edge information; finally, the resulting gradient magnitudes are averaged to obtain the mean gradient value of the image. The distance loss function $L_{dis}$ increases the distance between the decomposed clean image and the two noise images, so that the clean image does not contain information from either noise image.
[0053] The LDCT image data for testing is input, and the deep learning denoising model trained in step (3) is used to denoise it. The resulting denoised image is shown in Figure 6.
[0054] It should be noted that the above embodiments are not intended to limit the scope of protection of the present invention. Equivalent transformations or substitutions made based on the above technical solutions all fall within the scope of protection of the claims of the present invention.
Claims
1. A self-supervised low-dose CT denoising method based on decomposed convolutional neural networks, characterized in that the method includes the following steps: (1) LDCT imaging data acquisition and storage: acquire the patient's LDCT imaging data and store the data in DICOM format; (2) data preprocessing: the values of pixels outside the CT scan boundary in the LDCT imaging data are set to 0 and normalized within the range of [-1024, 3072], and then stored in NPY format; (3) deep learning denoising model training: the LDCT data obtained in step (2) is randomly divided into a training set and a test set, and the data is then input into the deep learning denoising model for training; (4) low-dose CT denoising: the LDCT image data is denoised using the clean image generator in the denoising model trained in step (3). The deep learning denoising model in step (3) consists of two parts: a decomposition module and a recurrent module. The decomposition module is composed of a clean image generator, an encoder, a decoder 1, a decoder 2, and a signal-guided attention module. The trainable edge detection module is implemented by multiplying four traditional edge detection operators in different directions by a learnable parameter to extract edge information of the image; the four directions are horizontal, vertical, and two different diagonal directions. The clean image generator is a dual-branch module. The main branch consists of eight convolutional blocks to extract feature information, and the auxiliary branch consists of a trainable edge detection module and seven convolutional blocks to extract edge information. The feature map of the auxiliary branch is added to the feature map of the corresponding main branch. Finally, the feature maps of the two branches are added together and then passed through two convolutional blocks to generate a clean image with enhanced details. The encoder consists of ten convolutional blocks to obtain deep feature maps. The signal-guided attention module is used between the encoder and decoder 1; it consists of an average pooling layer, a max pooling layer, and a multilayer perceptron, and calculates the correlation between the signal-related noise and the signal. The two decoders have the same structure, each containing five convolutional blocks. Decoder 1 receives features from the signal-guided attention module, while decoder 2 receives features from the encoder. The outputs of the two decoders are a signal-related noise image and a signal-independent noise image, respectively. The decomposition module works by first feeding the normalized LDCT data into the clean image generator to produce a noise-free image; the noise-free image is then subtracted from the LDCT image to obtain the noise image. The noise-free image and the noise image are then processed by the encoder and decoders to obtain the signal-related noise image and the signal-independent noise image, respectively. The recurrent module randomly combines the three outputs of the decomposition module (the noise-free image, the signal-related noise image, and the signal-independent noise image) to generate new samples. These samples are then decomposed again by the decomposition module to improve the accuracy of the decomposition results, thereby enhancing the model's denoising performance. The denoising model is trained by calculating self-supervised loss functions, including the anisotropic total variation loss function and the distance loss function.
2. The self-supervised low-dose CT denoising method based on decomposed convolutional neural networks according to claim 1, characterized in that: in the clean image generator, data is input into a generator consisting of a main branch and an auxiliary branch arranged in parallel. In the main branch, a feature map is first extracted from the network input using a single convolutional block. Its dimension is defined as W × H × C, where W, H, and C represent the width, height, and number of channels in the image space, respectively. High-dimensional features are then extracted through 7 convolutional blocks in sequence, while the dimension of the feature map remains unchanged. The only difference between the auxiliary branch and the main branch of the generator is that the initial convolutional layer of the main branch is replaced by a trainable edge detection operator. The auxiliary branch adds the feature map output by each of its modules to the feature map of the corresponding layer of the main branch, thereby enhancing the edge information in each layer's feature map. After the feature maps of the two branches are added together, the number of channels is reduced to a single channel through 2 convolutional blocks to output a clean image. Each convolutional block consists of 1 convolutional layer and a ReLU activation function.
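As a non-limiting illustration of the trainable edge detection operator that opens the auxiliary branch, a minimal NumPy sketch is given here. The claims specify four directional operators, each multiplied by a learnable parameter, but do not name the operators; the Sobel-style kernels, the edge padding, and the helper names `filter3x3` and `trainable_edge_features` are assumptions for this example, and the `scales` array stands in for parameters that training would update.

```python
import numpy as np

# Four fixed 3x3 direction kernels (Sobel-style; the kernel choice is an assumption).
KERNELS = np.array([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # horizontal gradient
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # vertical gradient
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # first diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],    # second diagonal
], dtype=np.float64)


def filter3x3(img, kernel):
    """Apply a 3x3 kernel (cross-correlation) with edge padding; keeps H x W."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def trainable_edge_features(img, scales):
    """Return a (4, H, W) stack: each fixed kernel times its learnable scale."""
    img = np.asarray(img, dtype=np.float64)
    return np.stack([filter3x3(img, s * k) for s, k in zip(scales, KERNELS)])


# Toy image with a single vertical edge (column values 0, 0, 1, 1).
image = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
scales = np.array([2.0, 1.0, 1.0, 1.0])  # stand-ins for learned parameters
features = trainable_edge_features(image, scales)
```

Because the kernels stay fixed and only the four scalars are learned, the operator adds very few parameters while still letting training adjust how strongly each direction contributes.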
3. The self-supervised low-dose CT denoising method based on decomposed convolutional neural networks according to claim 2, characterized in that: the feature processing flow of the signal-guided attention module uses convolutional blocks (Conv), feature concatenation (Concat), element-wise multiplication, the sigmoid activation function, and a multilayer perceptron. The inputs to the signal-guided attention module are the clean image (output by the clean image generator) and a feature map (obtained from the encoder). The feature map is obtained by encoding the noise image, which in turn is obtained by subtracting the clean image from the LDCT image. The signal-guided attention module is divided into two parts: a channel signal-guided attention module and a spatial signal-guided attention module. The channel signal-guided attention module first applies a max pooling layer and an average pooling layer to compress the spatial dimensions of the clean image feature map, and then passes the results through the multilayer perceptron and the sigmoid activation function to obtain the channel attention map. The channel attention map is then multiplied element-wise, via the broadcasting mechanism, with the feature map to aggregate channel features, yielding features aggregated from channel information. Next, the spatial signal-guided attention module applies a max pooling layer and an average pooling layer to compress the channel dimension of the clean image feature map and concatenates the two results, giving 2 channels; a convolutional layer and the sigmoid activation function then set the number of channels back to 1; finally, the result is multiplied element-wise, via the broadcasting mechanism, to aggregate spatial features.
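As a non-limiting illustration of the channel and spatial signal-guided attention described in this claim, a minimal NumPy sketch is given here. Several details are assumptions, because the formula is not reproduced in the text: the two pooled vectors share one two-layer perceptron and are summed before the sigmoid, a 1x1 channel mix stands in for the convolutional layer, and both attention maps are applied to the encoder's noise features. All function and weight names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(signal_feat, w1, w2):
    """Pool over space (max and mean), shared MLP, sigmoid -> shape (C, 1, 1)."""
    pooled_max = signal_feat.max(axis=(1, 2))   # (C,)
    pooled_avg = signal_feat.mean(axis=(1, 2))  # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)     # two-layer perceptron with ReLU

    return sigmoid(mlp(pooled_max) + mlp(pooled_avg))[:, None, None]


def spatial_attention(signal_feat, conv_w):
    """Pool over channels, concatenate to 2 channels, mix to 1 -> shape (1, H, W)."""
    stacked = np.stack([signal_feat.max(axis=0), signal_feat.mean(axis=0)])  # (2, H, W)
    mixed = np.tensordot(conv_w, stacked, axes=1)  # 1x1 convolution stand-in
    return sigmoid(mixed)[None, :, :]


def signal_guided_attention(signal_feat, noise_feat, w1, w2, conv_w):
    """Reweight noise features with attention maps computed from signal features."""
    out = noise_feat * channel_attention(signal_feat, w1, w2)  # broadcast over H, W
    return out * spatial_attention(signal_feat, conv_w)        # broadcast over C


rng = np.random.default_rng(0)
signal_feat = rng.normal(size=(4, 5, 5))  # features of the clean image (C, H, W)
noise_feat = rng.normal(size=(4, 5, 5))   # encoder features of the noise image
w1, w2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 2))  # random MLP weights
conv_w = rng.normal(size=2)                                 # random 1x1 conv weights
attended = signal_guided_attention(signal_feat, noise_feat, w1, w2, conv_w)
```

Both attention maps lie between 0 and 1, so in this sketch they can only attenuate the noise features, which is how the clean-image signal decides what decoder 1 sees.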
4. The self-supervised low-dose CT denoising method based on decomposed convolutional neural networks according to claim 2, characterized in that: the processing flow of the decomposed images in the recurrent module is as follows: the LDCT image is passed through the decomposition module, which outputs three images, i.e., the decomposition results: a clean image, an image of signal-related noise, and an image of signal-independent noise. This completes the first image decomposition. The three images from the first decomposition are then linearly combined using coefficients [α, β, θ] to generate a new sample. The new sample is then fed back into the decomposition module for further decomposition to obtain the corresponding three outputs: a clean image, an image of signal-related noise, and an image of signal-independent noise.
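As a non-limiting illustration of the recurrent module, a minimal NumPy sketch is given here. The linear combination with coefficients [α, β, θ] follows the claim; `toy_decompose` is a hypothetical stand-in for the trained decomposition module, and the function names are assumptions for this example.

```python
import numpy as np


def recombine(clean, noise_dep, noise_indep, coeffs):
    """Linearly combine the three decomposition outputs with [alpha, beta, theta]."""
    alpha, beta, theta = coeffs
    return alpha * clean + beta * noise_dep + theta * noise_indep


def recurrent_pass(ldct, decompose, coeffs):
    """Decompose, recombine into a new sample, and decompose the new sample again."""
    first = decompose(ldct)
    new_sample = recombine(*first, coeffs)
    second = decompose(new_sample)
    return first, new_sample, second


def toy_decompose(image):
    """Hypothetical stand-in for the trained decomposition module (fixed split)."""
    clean = 0.8 * image
    residual = image - clean
    return clean, 0.5 * residual, 0.5 * residual


ldct = np.full((2, 2), 10.0)
first, new_sample, second = recurrent_pass(ldct, toy_decompose, (1.0, 1.0, 1.0))
```

With all three coefficients equal to 1 the new sample reproduces the input, so the second decomposition can be compared against the first. Claim 1 states that the combination is random, but the text does not say how the coefficients are drawn, so the sketch takes them as an argument.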
5. The self-supervised low-dose CT denoising method based on decomposed convolutional neural networks according to claim 2, characterized in that: the anisotropic total variation loss function is computed from the predicted clean image using the traditional Sobel operators in the horizontal and vertical directions and an exponential function, evaluated at each pixel position (h, w). Anisotropic total variation is an image gradient calculation method. First, the two gradient operators are used to calculate the gradient of the image in the horizontal and vertical directions, detecting the edges and texture directions in the image. Then, the gradient magnitude is weighted exponentially to adaptively preserve edge information. Finally, the weighted gradient magnitudes are averaged to obtain the average gradient value of the image.
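As a non-limiting illustration of the three stages described in this claim (Sobel gradients, exponential weighting, averaging), a minimal NumPy sketch is given here. The exact formula is not reproduced in the text, so the weighting |g|·exp(-|g|), the edge padding, and the function names are assumptions for this example rather than the claimed expression.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # vertical-direction Sobel operator


def filter3x3(img, kernel):
    """Apply a 3x3 kernel (cross-correlation) with edge padding; keeps H x W."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def anisotropic_tv_loss(clean):
    """Mean of exponentially weighted horizontal and vertical gradient magnitudes."""
    clean = np.asarray(clean, dtype=np.float64)
    gx = np.abs(filter3x3(clean, SOBEL_X))
    gy = np.abs(filter3x3(clean, SOBEL_Y))
    # Assumed weighting: exp(-|g|) shrinks the penalty on strong gradients (edges),
    # so small noise-like gradients are penalized relatively more.
    weighted = gx * np.exp(-gx) + gy * np.exp(-gy)
    return float(weighted.mean())


flat = np.ones((4, 4))
speckled = flat.copy()
speckled[1, 2] += 0.1  # one small noise-like perturbation
loss_flat = anisotropic_tv_loss(flat)
loss_speckled = anisotropic_tv_loss(speckled)
```

A perfectly flat image gives zero loss, while the small perturbation raises it; the horizontal and vertical terms are kept separate rather than merged into a single gradient norm, which is what makes the measure anisotropic.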
6. The self-supervised low-dose CT denoising method based on decomposed convolutional neural networks according to claim 2, characterized in that: the distance loss function is computed from the predicted clean image, the predicted image of signal-related noise, and the predicted image of signal-independent noise, using a max function that returns the larger of its two arguments and the L2 norm, where m is a margin parameter used to control the distance between different sample pairs and h is the number of rows in the image. The distance loss function first receives two sets of dissimilar feature vectors and then uses the Euclidean distance to measure their similarity. Specifically, the Euclidean distance is subtracted from the margin m: if the difference is negative, the loss is zero; if it is positive, the loss is the square of the difference. This guides the model to maximize the distance between dissimilar samples, helping it learn the differences between samples and thereby improving its learning ability and generalization ability.
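As a non-limiting illustration of the margin rule described in this claim, a minimal NumPy sketch is given here. The formula is not reproduced in the text, so treating each image row as one feature vector, averaging over the h rows, and the name `distance_loss` are assumptions for this example; the zero-or-squared-difference rule follows the claim.

```python
import numpy as np


def distance_loss(a, b, margin):
    """Margin loss for a dissimilar pair: mean over rows of max(margin - d, 0)^2."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    row_dist = np.linalg.norm(a - b, axis=1)    # Euclidean distance per image row
    hinge = np.maximum(margin - row_dist, 0.0)  # zero once rows are >= margin apart
    return float(np.mean(hinge ** 2))


clean = np.zeros((4, 4))
noise_far = np.full((4, 4), 10.0)
loss_far = distance_loss(clean, noise_far, margin=1.0)  # rows are 20 apart
loss_same = distance_loss(clean, clean, margin=1.0)     # rows are 0 apart
```

The loss vanishes once the pair is at least m apart and is largest when the two images coincide, so minimizing it pushes the decomposed components away from each other up to the margin.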