Medical image colorization method based on improved cycle generative adversarial network

By using an improved Med-BiCycleGAN network, combined with semantic segmentation and edge detection, the problems of color inaccuracy and structural distortion in medical image colorization are solved, achieving high-quality medical image colorization suitable for situations where it is inconvenient to obtain paired data.

CN122199710A · Pending · Publication Date: 2026-06-12 · NANJING INST OF TECH +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: NANJING INST OF TECH
Filing Date: 2026-03-04
Publication Date: 2026-06-12


IPC: G06T11/40; G06T11/60; G06T5/60; G06T5/90; G06N3/045; G06N3/0455; G06N3/044; G06N3/0464; G06N3/0475; G06N3/094; G06V20/70; G06V10/26; G06V10/44; G06V10/42; G06V10/80; G06V10/82; G06N3/048

AI Tagging

Application Domain

Image enhancement Biological models


AI Technical Summary

Technical Problem

Existing cycle generative adversarial networks (CycleGANs) suffer from inaccurate color, color mixing, image noise, structural distortion, and semantic confusion in medical image colorization, and paired training data is difficult to acquire.

Method used

An improved Med-BiCycleGAN network is adopted, which combines semantic segmentation, edge detection, and an attention mechanism. It converts grayscale medical images to color images through a bidirectional cycle generative adversarial network. Multiple loss functions are used to optimize training: adversarial loss, cycle consistency loss, perceptual loss, edge loss, and TV loss.

Benefits of technology

It improves the medical semantic accuracy and key feature accuracy of medical images, reduces noise, avoids color mixing and structural distortion, expands the scope of application, and does not rely on paired datasets.

✦ Generated by Eureka AI based on patent content.

Figure CN122199710A_ABST

Abstract

The application provides a medical image colorization method based on an improved cycle generative adversarial network. The method acquires grayscale medical images and medical color images to obtain a preprocessed training dataset; constructs a medical bidirectional cycle generative adversarial network, namely the Med-BiCycleGAN network; initializes its network parameters; inputs the preprocessed training dataset into the network; calculates the total loss and gradients; and updates the weights, thresholds, and parameters of each layer to obtain a trained Med-BiCycleGAN network that performs medical image colorization. The method improves the medical semantic correctness of medical images, the accuracy of key features such as organ or lesion edges, and the readability of medical images.


Description

Technical Field

[0001] This invention relates to a method for colorizing medical images based on an improved cycle generative adversarial network, belonging to the field of computer vision.

Background Technology

[0002] Medical images such as CT and MRI scans are grayscale images of human anatomical structures that can be used for non-invasive clinical diagnosis. Only trained radiologists can visually identify and distinguish anatomical structures in such radiological imaging data. Numerous studies have applied deep learning to the colorization of natural images, but these techniques perform poorly on medical images. In recent years, deep learning has provided organ-based automatic structure recognition for color transfer in the medical field and has improved the visual presentation of human anatomy. Driven by deep learning, generative adversarial networks (GANs) have shown significant potential in medical image processing. Through adversarial training between a generator and a discriminator, GANs can learn complex data distributions and have been applied effectively to tasks such as image super-resolution reconstruction (e.g., enhancing low-resolution MRI), noise suppression (e.g., low-dose CT denoising), and missing-modality synthesis, alleviating data scarcity to some extent and improving image quality.

[0003] However, in practical applications, GANs still face problems such as training instability and susceptibility to noise, which limit their further application and development in the medical field. Supervised GANs rely on large-scale paired labeled data, which limits clinical use, while unsupervised methods reduce labeling requirements but are prone to structural distortion and blurred boundaries. Furthermore, the acquisition of medical image data is constrained by patient privacy regulations, ethical review norms, health risk concerns, and high examination costs, which limits data scale and in particular leads to a lack of paired grayscale-color image datasets. CycleGAN (cycle-consistent adversarial network) does not rely on paired data and achieves unsupervised cross-modal image transformation through a cycle consistency loss. It has been applied to tasks such as transformation between unregistered medical images of different modalities (e.g., MRI-to-CT transformation), effectively addressing the difficulty of acquiring paired data in medical scenarios. However, CycleGANs used for medical image colorization still suffer from inaccurate color or color mixing, image noise, structural distortion, and semantic confusion.

Summary of the Invention

[0004] The purpose of this invention is to provide a medical image colorization method based on an improved cycle generative adversarial network that solves the problems of inaccurate color or color mixing, excessive image noise, and low accuracy of medical semantics and key features in the existing technology.

[0005] The technical solution of this invention is:

[0006] A medical image colorization method based on an improved cycle generative adversarial network includes the following steps:

S1. Obtain grayscale medical images and medical color images to form a training dataset, and preprocess it to obtain a preprocessed training dataset.

S2. Construct a medical bidirectional cycle generative adversarial network, namely the Med-BiCycleGAN network. The Med-BiCycleGAN network includes a generator network and a discriminator network. The generator network introduces semantic segmentation, edge detection, and attention mechanisms to convert between the input grayscale medical images and medical color images, generating color images and grayscale images accordingly. The discriminator network includes a discriminator Dx and a discriminator DY. For an input grayscale medical image or a grayscale image generated by the generator network, Dx outputs the probability that the input is a true grayscale image; for an input medical color image or a color image generated by the generator network, DY outputs the probability that the input is a true color image.

S3. Initialize the network parameters of the Med-BiCycleGAN network.

S4. Input the preprocessed training dataset into the Med-BiCycleGAN network, compute the forward output of each branch and layer of the network, and calculate the total loss and gradients of the network, where the total loss includes adversarial loss, cycle consistency loss, perceptual loss, edge loss, and TV loss.

S5. Update the weights, thresholds, and parameters of each layer in the Med-BiCycleGAN network.

S6. Determine whether the Med-BiCycleGAN network meets the iteration termination condition. If not, return to step S4; otherwise, stop training and obtain the trained Med-BiCycleGAN network.

S7. Input the medical image to be converted into the trained Med-BiCycleGAN network to obtain a colorized image.

[0007] Further, in step S2, the generator network in the Med-BiCycleGAN network includes generator Gx, generator GY, a first semantic segmentation network, a first attention mechanism module, a first edge detection network, a second semantic segmentation network, a second attention mechanism module, a second edge detection network, a third semantic segmentation network, and a third edge detection network.

Generator Gx: generates a grayscale image X' from the input original color image Y.

Generator GY: generates a color image Y' from the input original grayscale image X.

First semantic segmentation network: labels the organ and tissue category of each pixel of the input original grayscale image X, to guide generator GY in assigning semantically appropriate colors, and outputs the first semantic mask feature map to the first attention mechanism module.

First attention mechanism module: assigns weights to the input color image Y' and the first semantic mask feature map and, after weight adjustment, outputs feature map 1 to generator GY.

First edge detection network: performs edge detection on the input original grayscale image X and outputs the first edge probability map.

Second semantic segmentation network: labels the organ and tissue category of each pixel of the input original color image Y, to guide generator Gx, and outputs the second semantic mask feature map to the second attention mechanism module.

Second attention mechanism module: assigns weights to the input grayscale image X' and the second semantic mask feature map and, after weight adjustment, outputs feature map 2 to generator Gx.

Second edge detection network: performs edge detection on the input original color image Y and outputs the second edge probability map.

Third semantic segmentation network: labels the organ and tissue category of each pixel of the input color image Y' and outputs the third semantic mask feature map.

Third edge detection network: performs edge detection on the input color image Y' and outputs the third edge probability map.

[0008] Furthermore, in step S3, the initialized parameters include: generator encoder depth, number of generator input-layer channels, number of residual blocks, discriminator network depth, number of discriminator input-layer channels, maximum number of training epochs, mini-batch size, learning rate, gradient decay factor, squared-gradient decay factor, model checkpoint frequency, cycle loss weights, number of test samples, and network loss weights.

[0009] Further, in step S4, the total loss $L_{Total}$ of the Med-BiCycleGAN network is calculated as:

$$L_{Total} = L_{adv}^{G_X} + L_{adv}^{G_Y} + \lambda_1 L_{cycle\text{-}semantic} + \lambda_2 L_{perceptual} + \lambda_3 L_{edge} + \lambda_4 L_{TV}$$

In the formula, $L_{adv}^{G_X}$ is the adversarial loss for generator Gx; $L_{adv}^{G_Y}$ is the adversarial loss for generator GY; $L_{cycle\text{-}semantic}$ is the pixel-feature fusion cycle consistency loss; $L_{perceptual}$ is the perceptual loss; $L_{edge}$ is the edge loss; $L_{TV}$ is the TV loss; $\lambda_1$ is the weighting coefficient of the cycle consistency loss; $\lambda_2$ is the weighting coefficient of the perceptual loss; $\lambda_3$ is the weighting coefficient of the edge loss; and $\lambda_4$ is the weighting coefficient of the TV loss.
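As a minimal sketch of how the weighted terms in [0009] combine, assuming each term has already been computed as a scalar. The function name `total_loss` and the example values and weights are illustrative assumptions, not values from the patent.

```python
def total_loss(adv_gx, adv_gy, cycle, perceptual, edge, tv,
               lam1, lam2, lam3, lam4):
    """Weighted sum of the six loss terms named in [0009]."""
    return (adv_gx + adv_gy
            + lam1 * cycle        # cycle consistency loss
            + lam2 * perceptual   # perceptual loss
            + lam3 * edge         # edge loss
            + lam4 * tv)          # TV loss


# Hypothetical scalar values and weights, purely for illustration.
example = total_loss(0.7, 0.6, 0.2, 0.1, 0.05, 0.01,
                     lam1=10.0, lam2=1.0, lam3=1.0, lam4=0.1)
```

The two adversarial terms enter unweighted here because the text names only four coefficients, one for each of the remaining terms.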

[0010] Furthermore, the adversarial loss $L_{adv}^{G_X}$ of generator Gx and the adversarial loss $L_{adv}^{G_Y}$ of generator GY are defined in terms of the following quantities: $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $D_Y(y)$ is the output of discriminator DY when the original color image Y is the input, and $D_Y(G_Y(x))$ is the output of discriminator DY when the output of generator GY is the input; $D_X(x)$ is the output of discriminator Dx when the original grayscale image X is the input, and $D_X(G_X(y))$ is the output of discriminator Dx when the output of generator Gx is the input.
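The formula for [0010] is not present in the extracted text. One form consistent with the quantities it defines is the standard cross-entropy GAN objective shown below. This is a sketch under that assumption; the text does not confirm whether the patent uses this form or another variant (for example a least-squares objective), and the symbol names $L_{adv}^{G_X}$ and $L_{adv}^{G_Y}$ are notation chosen here.

```latex
\begin{aligned}
L_{adv}^{G_Y} &= \mathbb{E}_{y}\big[\log D_Y(y)\big]
               + \mathbb{E}_{x}\big[\log\big(1 - D_Y(G_Y(x))\big)\big] \\
L_{adv}^{G_X} &= \mathbb{E}_{x}\big[\log D_X(x)\big]
               + \mathbb{E}_{y}\big[\log\big(1 - D_X(G_X(y))\big)\big]
\end{aligned}
```

Under this form each discriminator output is a probability, which matches the description of the discriminators in step S2.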

[0011] Furthermore, the pixel-feature fusion cycle consistency loss $L_{cycle\text{-}semantic}$ combines a pixel-level cycle consistency loss $L_{pixel}$ and a feature-level cycle consistency loss $L_{feature}$. In the formula, $G_X$ is the generator that transforms a Y-domain image into an X-domain image; $G_Y$ is the generator that transforms an X-domain image into a Y-domain image; $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $\lVert\cdot\rVert_1$ denotes the L1 loss function; $\lVert\cdot\rVert_2$ denotes the L2 loss function; and $\phi(\cdot)$ denotes the feature information of each domain image.
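The formula for [0011] is likewise missing from the extracted text. A plausible reading, assuming the L1 norm applies to the pixel-level term, the L2 norm to the feature-level term, and the two terms are summed without an extra weight, is sketched below. The names $L_{pixel}$, $L_{feature}$, and $\phi$ are notation chosen here, and the unweighted sum is an assumption.

```latex
\begin{aligned}
L_{cycle\text{-}semantic} &= L_{pixel} + L_{feature} \\
L_{pixel} &= \mathbb{E}_{x}\,\big\lVert G_X(G_Y(x)) - x \big\rVert_1
           + \mathbb{E}_{y}\,\big\lVert G_Y(G_X(y)) - y \big\rVert_1 \\
L_{feature} &= \mathbb{E}_{x}\,\big\lVert \phi(G_X(G_Y(x))) - \phi(x) \big\rVert_2
             + \mathbb{E}_{y}\,\big\lVert \phi(G_Y(G_X(y))) - \phi(y) \big\rVert_2
\end{aligned}
```

Each term compares an image with its reconstruction after a full round trip through both generators, which is what allows training without paired data.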

[0012] Furthermore, the perceptual loss is $L_{perceptual} = \alpha L_{Content} + \beta L_{Style}$, where $\alpha$ and $\beta$ are the weighting coefficients used to balance the content loss and the style loss. The content loss $L_{Content}$ is the mean squared error (MSE) between the layer-$l$ feature maps of the semantic segmentation network, computed between $F_l^{(2)}$, the second semantic mask feature map, and $F_l^{(3)}$, the third semantic mask feature map. The feature map dimension is $C_l \times H_l \times W_l$, where $C_l$ is the number of channels, $H_l$ is the feature map height, and $W_l$ is the feature map width. The style loss $L_{Style}$ measures style consistency using the difference between the Gram matrices of the layer-$l$ feature maps of the semantic segmentation network, again computed for the second and third semantic mask feature maps with the same dimensions.
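The content and style terms described in [0012] can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's exact formula: feature maps are represented as C rows of H*W flattened values, the Gram matrix is left unnormalized, the style term is averaged over the C*C Gram entries, and the function names and default weights are hypothetical.

```python
def gram(features):
    """Gram matrix of a feature map given as C rows of H*W values."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(c)] for i in range(c)]


def content_loss(f_ref, f_gen):
    """Mean squared error between two feature maps of equal shape."""
    n = sum(len(row) for row in f_ref)
    return sum((a - b) ** 2
               for r, g in zip(f_ref, f_gen)
               for a, b in zip(r, g)) / n


def style_loss(f_ref, f_gen):
    """Mean squared difference between the two Gram matrices."""
    g_ref, g_gen = gram(f_ref), gram(f_gen)
    c = len(g_ref)
    return sum((g_ref[i][j] - g_gen[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


def perceptual_loss(f_ref, f_gen, alpha=1.0, beta=1.0):
    """Weighted sum of content and style loss (weights are placeholders)."""
    return alpha * content_loss(f_ref, f_gen) + beta * style_loss(f_ref, f_gen)
```

The content term compares feature values position by position, while the Gram-based term compares channel correlations and is therefore insensitive to where in the image a feature occurs.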

[0013] Furthermore, the edge loss $L_{edge}$ combines two terms using the weight $\lambda$. The first term is the edge loss between the original color image Y and the color image Y' generated by generator GY; it is computed over all positions (i,j) from the pixel values of the second edge probability map and the pixel values of the third edge probability map. The second term is the edge loss between the original grayscale image X and the color image Y' generated by generator GY; it is computed over all positions (i,j) from the pixel values of the first edge probability map and the pixel values of the third edge probability map. In both terms, H is the image height and W is the image width.
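The per-pixel comparison of edge probability maps in [0013] can be illustrated as below. The exact distance and the way the weight combines the two terms are not recoverable from the text, so this sketch assumes a mean absolute difference and a convex combination `lam * a + (1 - lam) * b`; both choices and all function names are hypothetical.

```python
def edge_map_loss(p_ref, p_gen):
    """Mean absolute difference between two H x W edge probability maps."""
    h, w = len(p_ref), len(p_ref[0])
    return sum(abs(p_ref[i][j] - p_gen[i][j])
               for i in range(h) for j in range(w)) / (h * w)


def edge_loss(p_first, p_second, p_third, lam=0.5):
    """Combine the Y-vs-Y' term and the X-vs-Y' term with weight lam.

    p_first:  edge map of the original grayscale image X
    p_second: edge map of the original color image Y
    p_third:  edge map of the generated color image Y'
    """
    return (lam * edge_map_loss(p_second, p_third)
            + (1 - lam) * edge_map_loss(p_first, p_third))
```

The X-versus-Y' term is the one that ties the generated color image to the structure of its own grayscale source.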

[0014] Furthermore, the TV loss $L_{TV}$ is defined in terms of the following quantities: Y' is the color image generated by generator GY; $Y'_{c,i,j}$ is the pixel value in the i-th row and j-th column of the c-th channel of Y'; $Y'_{c,i+1,j}$ is the pixel value in the (i+1)-th row and j-th column of the c-th channel of Y'; $Y'_{c,i,j+1}$ is the pixel value in the i-th row and (j+1)-th column of the c-th channel of Y'; $M$ is a binary mask with a value of 0 for key regions and 1 for background regions, obtained from the organ or lesion contour information output by the first edge detection network and the first semantic segmentation network; H is the image height, W is the image width, and C is the number of image channels.
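A masked total-variation term built from the quantities in [0014] might look like the following sketch. It assumes absolute (anisotropic) differences and normalization by C*H*W; the patent's exact exponent and normalization are not recoverable from the text, and the function name is hypothetical.

```python
def masked_tv_loss(img, mask):
    """Masked total variation of img[c][i][j].

    mask[i][j] is 0 in key regions (no smoothing penalty) and 1 in
    background regions (smoothing penalty applied).
    """
    c_n, h, w = len(img), len(img[0]), len(img[0][0])
    total = 0.0
    for c in range(c_n):
        for i in range(h - 1):
            for j in range(w - 1):
                dv = img[c][i + 1][j] - img[c][i][j]  # vertical neighbor
                dh = img[c][i][j + 1] - img[c][i][j]  # horizontal neighbor
                total += mask[i][j] * (abs(dv) + abs(dh))
    return total / (c_n * h * w)
```

Because the mask is 0 in key regions, smoothing is applied only to the background, which leaves organ and lesion boundaries unpenalized.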

[0015] The beneficial effects of this invention are:

[0016] First, this medical image colorization method based on an improved cycle generative adversarial network uses a bidirectional cycle generative adversarial network as the basic network framework and incorporates a semantic segmentation network, an attention mechanism, and an edge detection network. This improves the medical semantic accuracy of medical images, the accuracy of key features such as organ or lesion edges, and the readability of medical images.

[0017] Second, the Med-BiCycleGAN-based medical image colorization method proposed in this invention relaxes the restrictions on the training dataset during network training, eliminating the need for a large training set of paired grayscale and color medical images. This further expands its application scope, making it suitable for situations where it is inconvenient to obtain paired grayscale-color medical images.

[0018] Third, this invention employs multiple loss functions: the perceptual loss ensures high-dimensional feature matching and improves the color fidelity of medical image colorization; the TV loss smooths medical images and reduces noise; and the edge loss constrains the edge structure of key features in medical images to avoid color bleeding and distortion.

Description of the Drawings

[0019] Figure 1 is a flowchart of the medical image colorization method based on an improved cycle generative adversarial network according to an embodiment of the present invention.

[0020] Figure 2 is a schematic diagram of the Med-BiCycleGAN network in the embodiment.

[0021] Figure 3 is a schematic diagram of the generator in the embodiment.

[0022] Figure 4 is a schematic diagram of the discriminator network in the embodiment.

[0023] Figure 5 is a schematic diagram of the U-Net framework used in the semantic segmentation network of the embodiment.

Detailed Implementation

[0024] The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0025] The embodiment provides a medical image colorization method based on an improved cycle generative adversarial network which, as shown in Figure 1, includes the following steps:

[0026] S1. Acquire grayscale medical images and medical color images to form a training dataset, and preprocess it to obtain the preprocessed training dataset.

[0027] In step S1, preprocessing includes noise reduction and normalization. The image dimensions and the number of channels in the training data are normalized. The color image (Y domain) has 3 channels, and the grayscale image (X domain) has 1 channel. To obtain a uniform number of channels, the number of channels in the grayscale image (X domain) is modified to 3, with each channel containing grayscale values.
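The channel unification described in [0027] can be sketched as follows, assuming a grayscale image stored as a nested list `gray[i][j]`. The function name is hypothetical, and the noise reduction and size normalization steps are omitted.

```python
def gray_to_three_channels(gray):
    """Replicate an H x W grayscale image into 3 identical channels.

    Returns a list indexed as out[c][i][j]; each channel is an
    independent copy of the grayscale values.
    """
    return [[row[:] for row in gray] for _ in range(3)]


x = gray_to_three_channels([[0, 128], [255, 64]])
```

Replicating the values lets the grayscale and color domains share the same input shape without adding any color information to the grayscale image.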

[0028] S2. Construct a medical bidirectional cycle generative adversarial network, namely the Med-BiCycleGAN network. The Med-BiCycleGAN network includes a generator network and a discriminator network. The generator network introduces semantic segmentation, edge detection, and attention mechanisms to convert between the input grayscale medical images and medical color images, generating color images and grayscale images accordingly. The discriminator network includes a discriminator Dx and a discriminator DY. For an input grayscale medical image or a grayscale image generated by the generator network, Dx outputs the probability that the input is a true grayscale image; for an input medical color image or a color image generated by the generator network, DY outputs the probability that the input is a true color image.

[0029] In step S2, the Med-BiCycleGAN network includes a generator Gx, a generator GY, a first attention mechanism module, a first semantic segmentation network, a first edge detection network, a second attention mechanism module, a second semantic segmentation network, a second edge detection network, a third semantic segmentation network, and a third edge detection network.

[0030] Generator Gx: generates a grayscale image X' from the input original color image Y.

[0031] Generator GY: generates a color image Y' from the input original grayscale image X.

[0032] First semantic segmentation network: labels the organ and tissue category of each pixel of the input original grayscale image X, to guide generator GY in assigning semantically appropriate colors, and outputs the first semantic mask feature map to the first attention mechanism module.

[0033] First attention mechanism module: assigns weights to the input color image Y' and the first semantic mask feature map and, after weight adjustment, outputs feature map 1 to generator GY.

[0034] First edge detection network: performs edge detection on the input original grayscale image X and outputs the first edge probability map.

[0035] Second semantic segmentation network: labels the organ and tissue category of each pixel of the input original color image Y, to guide generator Gx, and outputs the second semantic mask feature map to the second attention mechanism module.

[0036] Second attention mechanism module: assigns weights to the input grayscale image X' and the second semantic mask feature map and, after weight adjustment, outputs feature map 2 to generator Gx.

[0037] Second edge detection network: performs edge detection on the input original color image Y and outputs the second edge probability map.

[0038] Third semantic segmentation network: labels the organ and tissue category of each pixel of the input color image Y' and outputs the third semantic mask feature map.

[0039] Third edge detection network: performs edge detection on the input color image Y' and outputs the third edge probability map.

[0040] The Med-BiCycleGAN network is designed based on the Pix2PixGAN framework. Generator Gx and generator GY form a bidirectional cycle generator network: generator GY converts medical images from grayscale to color, and generator Gx converts them from color to grayscale, enabling mutual conversion between grayscale and color images. Each generator includes an encoder, residual blocks, and a decoder. The discriminator uses PatchGAN to obtain global and local features of the image.

[0041] As shown in Figure 2, the bidirectional cycle generative adversarial network (B-CycleGAN) consists of two unidirectional cycle generative adversarial networks and includes two generators (Gx and GY) and two discriminators (Dx and DY). Generator GY transforms the real X-domain image X into a Y-domain image Y', or transforms the X-domain image X' generated by Gx into a Y-domain image Y''. Generator Gx transforms the real Y-domain image Y into an X-domain image X', or transforms the Y-domain image Y' generated by GY into an X-domain image X''. The input to discriminator DY is either the real Y-domain image Y or the Y-domain image Y' generated by GY, and its output is the probability PY that the input image is a real Y-domain image. The input to discriminator Dx is either the real X-domain image X or the X-domain image X' generated by Gx, and its output is the probability PX that the input image is a real X-domain image. The function of this B-CycleGAN network is to realize the mutual conversion between X-domain images and Y-domain images, that is, between grayscale images and color images.

[0042] The original grayscale image X input to generator GY and the original color image Y input to generator Gx do not need to be paired. Likewise, the original grayscale image X input to generator GY and the original color image Y input to discriminator DY do not need to be paired, nor do the original color image Y input to generator Gx and the original grayscale image X input to discriminator Dx. Therefore, training the cycle generative adversarial network does not require paired grayscale and color images. This relaxes the restrictions on the training dataset and further expands the application scope, making the method suitable for situations where paired grayscale and color images are unavailable. Different adversarial loss functions are used depending on whether the training data is paired: when the training data is paired, a structural consistency loss is added to the adversarial loss function; when the training data is unpaired, a semantic loss is added to the adversarial loss function. The cycle consistency loss function ensures the performance of the generator even for unpaired images.

[0043] As shown in Figure 3, the generator is based on U-Net and follows a design of downsampling, upsampling, and skip connections. The classic U-Net is an encoder-decoder structure. To better address the vanishing gradient problem and the loss of semantic information, a residual module is added between the encoder and decoder. The encoder consists of three convolutional layers with data normalization and activation layers; the residual module consists of a convolutional layer, a data normalization layer, an activation layer, and an addition layer; the decoder consists of two transposed convolutional layers with data normalization and activation layers, followed by one transposed convolutional layer with an activation layer. The specific construction process is as follows:

[0044] The initial input image size is 512*512*3 (512 pixels wide and 512 pixels high, with 3 channels). The first convolutional layer, conv1, has the following parameters: 1 channel input, 64 channels output, 4*4 kernel size, padding=1, and stride=2. The formula for the image size after convolution is:

$$scale_{out} = \left\lfloor \frac{scale_{in} + 2 \times padding - kernel}{stride} \right\rfloor + 1$$

Here, `scale` represents the image height or width, `kernel` represents the height or width of the convolution kernel, `padding` represents the number of padding pixels in the height or width, and `stride` represents the convolutional stride. Through the first convolutional layer conv1, `scale` changes from 512 to 256, giving a 256*256 image, and the number of channels becomes 64. Next, a data normalization layer normalizes the image data, keeping the image size at 256*256 and the number of channels unchanged. Then the activation layer applies the ReLU function for non-linear activation, which alleviates the vanishing gradient problem and accelerates computation; the sparse output of ReLU also contributes to model regularization. The image size and number of channels remain unchanged in the activation layer.

[0045] Next, the image enters the second convolutional layer, conv2. The input parameters are: 64 channels for input, 128 channels for output, a 4x4 kernel size, padding=1, and stride=2. The resulting image size is 128x128 pixels with 128 channels. After passing through a data normalization layer and an activation layer, the image enters the third convolutional layer, conv3. The input parameters are: 128 channels for input, 256 channels for output, a 4x4 kernel size, padding=1, and stride=2. The resulting image size is 64x64 pixels with 256 channels.
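The encoder sizes in [0044] and [0045] can be checked with the standard convolution output-size formula, `floor((scale + 2*padding - kernel) / stride) + 1`. The sketch below applies it to the three encoder layers with the stated kernel, padding, and stride; the function names are illustrative.

```python
def conv_out(scale, kernel, padding, stride):
    """Output height or width of a convolution (floor division)."""
    return (scale + 2 * padding - kernel) // stride + 1


def trace_encoder(scale=512, channels=(64, 128, 256)):
    """Return (height, width, channels) after conv1, conv2, conv3."""
    shapes = []
    for out_ch in channels:
        scale = conv_out(scale, kernel=4, padding=1, stride=2)
        shapes.append((scale, scale, out_ch))
    return shapes


print(trace_encoder())
# [(256, 256, 64), (128, 128, 128), (64, 64, 256)]
```

Each 4*4, stride-2, padding-1 layer halves the spatial size exactly, which matches the 512 to 256 to 128 to 64 progression given in the text.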

[0046] Afterwards, the image passes through six residual blocks, each consisting of a convolutional layer, a data normalization layer, an activation layer, and an addition layer. The parameters of each convolutional layer are: 128 channels for input, 256 channels for output, a 3*3 kernel size, padding=1, and stride=1. The image size and number of channels remain unchanged.

[0047] The feature map then enters the decoder, which consists of three transposed convolutional layers whose operation is the reverse of the convolutions. Each transposed convolutional layer is followed by a data normalization layer and a ReLU activation layer, except the last, which has no data normalization layer and uses the tanh activation function.
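The decoder's size progression can be sketched with the standard transposed-convolution formula `(scale - 1) * stride - 2 * padding + kernel` (no output padding, no dilation). The text does not list the decoder's parameters, so this sketch assumes they mirror the encoder (kernel 4, stride 2, padding 1) and that the channel counts are 128, 64, and 3; all of these are assumptions.

```python
def deconv_out(scale, kernel, padding, stride):
    """Output height or width of a transposed convolution."""
    return (scale - 1) * stride - 2 * padding + kernel


def trace_decoder(scale=64, channels=(128, 64, 3)):
    """Return (height, width, channels) after each of three layers.

    The channel tuple is a hypothetical mirror of the encoder.
    """
    shapes = []
    for out_ch in channels:
        scale = deconv_out(scale, kernel=4, padding=1, stride=2)
        shapes.append((scale, scale, out_ch))
    return shapes


print(trace_decoder())
# [(128, 128, 128), (256, 256, 64), (512, 512, 3)]
```

Under these assumptions the decoder restores the 64*64 feature map to the 512*512 input resolution, consistent with the description of the decoder as the reverse of the encoder.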

[0048] As shown in Figure 4, discriminators Dx and DY use the same structure and settings. Taking the network structure of discriminator Dx as an example, the specific explanation is as follows:

[0049] The input layer receives either a real image of the target domain or a target-domain image generated by the generator. The input image size is 512*512 pixels with 3 channels. The image first enters the fourth convolutional layer, conv21, with the following parameters: 3 channels input, 64 channels output, 4*4 kernel size, padding=1, stride=2. The resulting image size is 256*256 pixels with 64 channels. After passing through a data normalization layer and an activation layer, it enters the fifth convolutional layer, conv22, with the following parameters: 64 channels input, 128 channels output, 4*4 kernel size, padding=1, stride=2. The resulting image size is 128*128 pixels with 128 channels. After passing through a data normalization layer and an activation layer, the image enters the sixth convolutional layer, conv23, with the following parameters: 128 channels input, 256 channels output, 4*4 kernel size, padding=1, stride=2. The resulting image size is 64*64 pixels with 256 channels. After passing through another data normalization layer and an activation layer, the image enters the final convolutional layer, with the following parameters: 256 channels input, 256 channels output, 1*1 kernel size, padding=0, stride=1.

[0050] As shown in Figure 5, the semantic segmentation network uses the U-Net framework. The input image is 512*512 pixels (512 pixels wide and 512 pixels high) with 3 channels. The first convolutional layer, convS1, has the following parameters: 1 input channel, 64 output channels, 3*3 kernel, padding=1, stride=1. After convolution, the image size becomes 510*510 pixels and the number of channels becomes 64. After ReLU activation, the image enters the second convolutional layer, convS2, which has the same parameters as convS1. After convolution, the image size becomes 508*508 pixels, while the number of channels remains 64. After ReLU activation, the image enters a 2*2 max pooling layer for downsampling, giving an image size of 254*254 pixels with 64 channels. Next, the image enters the third convolutional layer, convS3, whose parameters are: 64 input channels, 128 output channels, 3*3 kernel, padding=1, stride=1. After convolution, the image size becomes 252*252 and the number of channels becomes 128. After ReLU activation, the image enters the fourth convolutional layer, convS4, which has the same parameters as convS3. After convolution, the image size becomes 250*250, while the number of channels remains 128. After ReLU activation, a 2*2 max pooling layer downsamples the image to 125*125 with 128 channels. Then the image enters the fifth convolutional layer, convS5, whose parameters are: 128 input channels, 256 output channels, 3*3 kernel, padding=1, stride=1. After convolution, the image size becomes 123*123 and the number of channels becomes 256. After ReLU activation, the image enters the sixth convolutional layer, convS6, which has the same parameters as convS5. After convolution, the image size becomes 121*121, while the number of channels remains 256. After ReLU activation, a 2*2 max pooling layer downsamples the image to 60*60 with 256 channels. Next, the image enters the seventh convolutional layer, convS7, whose parameters are: 256 input channels, 512 output channels, 3*3 kernel, padding=1, stride=1. After convolution, the image size becomes 58*58 and the number of channels becomes 512. After ReLU activation, the image enters the eighth convolutional layer, convS8, which has the same parameters as convS7. After convolution, the image size becomes 56*56, while the number of channels remains 512. After ReLU activation, a 2*2 max pooling layer downsamples the image to 28*28 with 512 channels. After downsampling, the image passes through two convolutional layers, convS9 and convS10, each with a 3*3 kernel, padding=1, stride=1, and 1024 channels, and each followed by ReLU activation, resulting in a 24*24 image with 1024 channels. Upsampling is then performed in the reverse order of the downsampling path, ultimately yielding a 324*324*2 semantic segmentation output image.
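The size sequence above can be traced with a few lines of arithmetic. The listed sizes shrink by 2 pixels at every 3*3 convolution, which is the arithmetic of a convolution applied without padding, so the sketch below assumes that behaviour. It also assumes a decoder in which each stage doubles the size and then applies two 3*3 convolutions, as in the original U-Net layout; the text does not spell out the decoder, and the function names are hypothetical.

```python
def conv3x3_unpadded(size: int) -> int:
    """A 3*3, stride-1 convolution without padding removes one pixel per border."""
    return size - 2


def max_pool_2x2(size: int) -> int:
    """2*2 max pooling halves the size, dropping an odd trailing row/column."""
    return size // 2


def trace_unet(size: int = 512, depth: int = 4):
    """Return (encoder sizes before each pooling, bottleneck size, output size)."""
    encoder_sizes = []
    for _ in range(depth):
        size = conv3x3_unpadded(conv3x3_unpadded(size))
        encoder_sizes.append(size)
        size = max_pool_2x2(size)
    # Bottleneck: convS9 and convS10 in the text.
    size = conv3x3_unpadded(conv3x3_unpadded(size))
    bottleneck = size
    # Assumed decoder: 2x upsampling followed by two 3*3 convolutions per stage.
    for _ in range(depth):
        size = conv3x3_unpadded(conv3x3_unpadded(size * 2))
    return encoder_sizes, bottleneck, size


if __name__ == "__main__":
    print(trace_unet())
```

Under these assumptions the trace passes through 508, 250, 121 and 56 before each pooling step, reaches 24 at the bottleneck, and ends at 324, matching the sizes quoted in the text.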

[0051] In step S2, the bidirectional cycle generative adversarial network architecture is further extended. An attention mechanism and a semantic segmentation branch are introduced into the generator module. The segmentation network guides the generator to focus on the anatomical structures of medical images, resolving semantic confusion and enhancing structural-semantic consistency. The semantic segmentation network provides "semantic prior constraints" for the colorization process, allowing the model to identify the location and category of the different anatomical structures in the medical image, so that the colorization result is both visually realistic and consistent with the anatomical structure of the medical image (avoiding color confusion or misalignment). The semantic segmentation network adopts a U-Net structure. A spatial-channel parallel attention mechanism is introduced into the semantic segmentation auxiliary branch. First, channel attention emphasizes edge features; then spatial attention expands the receptive field and uses contextual information to generate an enhanced semantic feature map, which is fed into the colorization branch. This improves the accuracy of semantic segmentation and thereby the colorization result. The edge detection network employs a Convolution, Transformer, and Operator (CTO) network architecture. By combining a convolutional neural network (CNN), a Vision Transformer (ViT), and explicit boundary detection operations, it performs edge detection on both the original medical images and the images generated by the generator, achieving high-precision image segmentation while balancing accuracy and efficiency. It follows a standard encoder-decoder segmentation paradigm, in which the encoder network uses a CNN backbone to capture local semantic information and a lightweight ViT auxiliary network to integrate long-range dependencies. Boundary masks obtained through dedicated boundary detection operators such as Sobel are used as explicit supervision to guide the learning of the decoder, enhancing its boundary learning capability.
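As an illustration of the boundary masks mentioned above, the following sketch derives a binary mask from a grayscale image with the Sobel operator. It is a dependency-free approximation under stated assumptions, not the CTO network itself: the function name, the thresholding step, and the choice to leave the one-pixel border at 0 are all hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_boundary_mask(image, threshold):
    """Return a 0/1 mask marking pixels whose Sobel gradient magnitude
    reaches `threshold`. `image` is a list of rows of numbers; the outer
    one-pixel border is left at 0 because the 3*3 window does not fit there."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = gy = 0.0
            for a in range(3):
                for b in range(3):
                    pixel = image[i + a - 1][j + b - 1]
                    gx += SOBEL_X[a][b] * pixel
                    gy += SOBEL_Y[a][b] * pixel
            if math.hypot(gx, gy) >= threshold:
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    # A vertical step edge between the second and third columns.
    step = [[0, 0, 1, 1, 1] for _ in range(5)]
    for row in sobel_boundary_mask(step, threshold=1.0):
        print(row)
```

In a training setup of the kind described, such a mask would serve as the supervision target for the decoder's boundary predictions; how the patent weights or combines it is not stated here.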

[0052] S3. Initialize the network parameters of the Med-BiCycleGAN network.

[0053] In step S3, the initialized parameters include: generator encoder depth, number of generator input layer channels, number of residual blocks, discriminator network depth, number of discriminator input layer channels, maximum number of training epochs, mini-batch size, learning rate, gradient decay factor, squared gradient decay factor, model hold frequency, cycle loss weight, number of test samples, and network loss weights. In a specific example, the generator encoder depth is 3, the number of generator input layer channels is 64, the number of residual blocks is 6, the discriminator network depth is 4, the number of discriminator input layer channels is 64, the maximum number of training epochs is 1000, the mini-batch size is 1, the learning rate is 0.002, the gradient decay factor is 0.85, the squared gradient decay factor is 0.999, the model hold frequency is 100, the cycle loss weight is 10, the number of test samples is 50, and the network weight and threshold parameters are initialized randomly. The network loss weights are set as follows: the cycle consistency loss weight λ1 is 10 (the core constraint of network training); the perceptual loss weight λ2 is 0.4 (values between 0.1 and 0.5); the edge loss weight λ3 is 0.6 (values between 0.5 and 1); the TV loss weight λ4 is 0.03 (values between 0.01 and 0.5, balancing smoothness and detail).
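The example values above can be gathered into a single configuration object. This is a sketch for illustration only: the class name, field names and the range-checking helper are hypothetical, while the default values and the permitted ranges are those quoted in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MedBiCycleGANConfig:
    """Example hyperparameters from the text; all field names are hypothetical."""
    generator_encoder_depth: int = 3
    generator_input_channels: int = 64
    num_residual_blocks: int = 6
    discriminator_depth: int = 4
    discriminator_input_channels: int = 64
    max_epochs: int = 1000
    mini_batch_size: int = 1  # "minimum training batch size" in the text
    learning_rate: float = 0.002
    gradient_decay_factor: float = 0.85
    squared_gradient_decay_factor: float = 0.999
    model_hold_frequency: int = 100
    num_test_samples: int = 50
    lambda_cycle: float = 10.0       # λ1, also listed as the cycle loss weight
    lambda_perceptual: float = 0.4   # λ2
    lambda_edge: float = 0.6         # λ3
    lambda_tv: float = 0.03          # λ4

    def out_of_range(self):
        """Names of loss weights that fall outside the ranges given in the text."""
        ranges = {
            "lambda_perceptual": (0.1, 0.5),
            "lambda_edge": (0.5, 1.0),
            "lambda_tv": (0.01, 0.5),
        }
        return [
            name
            for name, (low, high) in ranges.items()
            if not low <= getattr(self, name) <= high
        ]


if __name__ == "__main__":
    config = MedBiCycleGANConfig()
    print(config.out_of_range())                          # []
    print(replace(config, lambda_tv=0.9).out_of_range())  # ['lambda_tv']
```

Freezing the dataclass keeps a training run from silently mutating its own settings; variants are created with `dataclasses.replace`.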

[0054] S4. Input the preprocessed training dataset into the Med-BiCycleGAN network, calculate the forward output of each branch and layer of the Med-BiCycleGAN medical bidirectional cycle generative adversarial network, and calculate the total loss and gradient of the Med-BiCycleGAN network. The total loss includes adversarial loss, cycle consistency loss, perceptual loss, edge loss and TV loss.

[0055] In step S4, the total loss $L_{Total}$ of the Med-BiCycleGAN network is calculated as:

$$L_{Total} = L_{adv}^{G_X} + L_{adv}^{G_Y} + \lambda_1 L_{cycle\text{-}semantic} + \lambda_2 L_{perceptual} + \lambda_3 L_{edge} + \lambda_4 L_{TV}$$

In the formula, $L_{adv}^{G_X}$ is the adversarial loss of generator $G_X$; $L_{adv}^{G_Y}$ is the adversarial loss of generator $G_Y$; $L_{cycle\text{-}semantic}$ is the pixel-feature fusion cycle consistency loss; $L_{perceptual}$ is the perceptual loss; $L_{edge}$ is the edge loss; $L_{TV}$ is the TV loss; $\lambda_1$ is the weighting coefficient of the cycle consistency loss; $\lambda_2$ is the weighting coefficient of the perceptual loss; $\lambda_3$ is the weighting coefficient of the edge loss; and $\lambda_4$ is the weighting coefficient of the TV loss.
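The weighting described above amounts to a plain weighted sum. The sketch below assumes that the two adversarial losses enter with weight 1 and that the remaining four terms are scaled by λ1 to λ4, which is what the coefficient definitions suggest; the function name and argument names are hypothetical, and the default weights are the example values given for step S3.

```python
def total_loss(adv_gx, adv_gy, cycle_semantic, perceptual, edge, tv,
               lambda1=10.0, lambda2=0.4, lambda3=0.6, lambda4=0.03):
    """Weighted sum of the individual loss terms (assumed unweighted adversarial terms)."""
    return (adv_gx + adv_gy
            + lambda1 * cycle_semantic
            + lambda2 * perceptual
            + lambda3 * edge
            + lambda4 * tv)


if __name__ == "__main__":
    # Arbitrary illustrative values, not results from the patent.
    print(total_loss(adv_gx=1.0, adv_gy=1.0, cycle_semantic=0.1,
                     perceptual=0.5, edge=0.5, tv=1.0))
```

With λ1 = 10, a cycle consistency term of 0.1 contributes as much to the total as an adversarial term of 1.0, which illustrates why the text calls it the core constraint of training.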

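[0056] The adversarial loss $L_{adv}^{G_X}$ of generator $G_X$ and the adversarial loss $L_{adv}^{G_Y}$ of generator $G_Y$ are:

[0057] $$L_{adv}^{G_X} = \mathbb{E}_{x}\left[\log D_X(x)\right] + \mathbb{E}_{y}\left[\log\left(1 - D_X(G_X(y))\right)\right]$$

[0058] $$L_{adv}^{G_Y} = \mathbb{E}_{y}\left[\log D_Y(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D_Y(G_Y(x))\right)\right]$$

[0059] In the formula, $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $D_Y(y)$ is the output of discriminator $D_Y$ when the original color image Y is the input, and $D_Y(G_Y(x))$ is the output of discriminator $D_Y$ when the output of generator $G_Y$ is the input; $D_X(x)$ is the output of discriminator $D_X$ when the original grayscale image X is the input, and $D_X(G_X(y))$ is the output of discriminator $D_X$ when the output of generator $G_X$ is the input.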
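The quantities defined above (an expectation over real images, an expectation over generated images, and the discriminator outputs for each) are the ingredients of the standard GAN objective. The sketch below assumes the common logarithmic form, with the discriminator output interpreted as a probability; the original formula images are not available, so this form, the function name and the `eps` guard are assumptions rather than the patent's stated definition.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Assumed log-form GAN objective: E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator probabilities for real images of the domain.
    d_fake: discriminator probabilities for generated images of the domain.
    eps guards against log(0).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    # An undecided discriminator outputs 0.5 everywhere.
    print(adversarial_loss([0.5, 0.5], [0.5, 0.5]))
```

In this form the discriminator tries to increase the value while the generator tries to decrease the second term. Many CycleGAN-style implementations substitute a least-squares objective for stability; nothing in the text indicates which variant is used here.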

[0060] The cycle consistency process must guarantee not only pixel and structure consistency but also medical semantic consistency: key semantic regions such as organs and lesions must not be lost or confused in the "grayscale → color → grayscale" and "color → grayscale → color" cycles, and invalid results that appear consistent at the pixel level but contain semantic errors must be avoided. To this end, a semantic loss is incorporated into the cycle consistency loss, giving the pixel-feature fusion cycle consistency loss $L_{cycle\text{-}semantic}$, which is expressed as follows.

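[0061] The pixel-feature fusion cycle consistency loss $L_{cycle\text{-}semantic}$ is:

$$L_{cycle\text{-}semantic} = L_{pixel} + L_{feature}$$

$$L_{pixel} = \mathbb{E}_{x}\left[\left\| G_X(G_Y(x)) - x \right\|_1\right] + \mathbb{E}_{y}\left[\left\| G_Y(G_X(y)) - y \right\|_1\right]$$

$$L_{feature} = \mathbb{E}_{x}\left[\left\| \phi(G_X(G_Y(x))) - \phi(x) \right\|_2\right] + \mathbb{E}_{y}\left[\left\| \phi(G_Y(G_X(y))) - \phi(y) \right\|_2\right]$$

[0062] In the formula, $L_{pixel}$ is the pixel-level cycle consistency loss; $L_{feature}$ is the feature-level cycle consistency loss; $G_X$ represents the generator that transforms a Y-domain image into an X-domain image; $G_Y$ represents the generator that transforms an X-domain image into a Y-domain image; $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $\|\cdot\|_1$ represents the L1 loss function; $\|\cdot\|_2$ represents the L2 loss function; and $\phi(\cdot)$ represents the feature information of each domain image.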
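A pixel-feature fusion cycle consistency loss of the kind described above can be sketched as an L1 term on the reconstructed images plus an L2 term on their features. The following is an illustration under assumptions: images are flattened lists of numbers, the generators and the feature extractor `phi` are passed in as plain callables, the two terms are added without an extra weight, and all names are hypothetical.

```python
import math


def l1_norm(a, b):
    """Sum of absolute differences between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def l2_norm(a, b):
    """Euclidean distance between two flattened feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cycle_semantic_loss(x_batch, y_batch, g_x, g_y, phi):
    """Pixel-level L1 plus feature-level L2 cycle consistency (assumed equal weighting).

    g_y maps X-domain images to the Y domain, g_x maps Y-domain images to the
    X domain, and phi extracts features from an image of either domain.
    """
    x_cycled = [g_x(g_y(x)) for x in x_batch]  # X -> Y -> X
    y_cycled = [g_y(g_x(y)) for y in y_batch]  # Y -> X -> Y
    pixel = (mean(l1_norm(c, x) for c, x in zip(x_cycled, x_batch))
             + mean(l1_norm(c, y) for c, y in zip(y_cycled, y_batch)))
    feature = (mean(l2_norm(phi(c), phi(x)) for c, x in zip(x_cycled, x_batch))
               + mean(l2_norm(phi(c), phi(y)) for c, y in zip(y_cycled, y_batch)))
    return pixel + feature


if __name__ == "__main__":
    identity = lambda image: list(image)
    brighten = lambda image: [value + 0.1 for value in image]
    print(cycle_semantic_loss([[0.0, 1.0]], [[0.5, 0.5]], identity, brighten, identity))
```

With perfectly inverse generators the loss is zero; any drift introduced by one generator and not undone by the other appears in both the pixel term and the feature term.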

[0063] For the perceptual loss $L_{perceptual}$, the high-level semantic features of the generated color image and of the original color image are obtained from the semantic segmentation network. The difference between the two is calculated in the feature space rather than at the pixel level, so that the generated result is more consistent with human visual perception (for example, in structural consistency and texture plausibility). The perceptual loss consists of a content loss and a style loss.

[0064] The perceptual loss $L_{perceptual}$ is:

[0065] $$L_{perceptual} = \alpha L_{Content} + \beta L_{Style}$$

[0066] In the formula, $\alpha$ and $\beta$ are the weighting coefficients used to balance the content loss and the style loss. The content loss $L_{Content}$ uses the mean squared error (MSE) to calculate the difference between the feature maps of the l-th layer of the second semantic segmentation network, which preserves structural information while avoiding interference from low-level pixel noise. The content loss $L_{Content}$ is expressed as:

$$L_{Content} = \frac{1}{C_l H_l W_l} \left\| F_l^{(2)} - F_l^{(3)} \right\|_2^2$$

In the formula, $F_l^{(2)}$ represents the second semantic mask feature map; $F_l^{(3)}$ represents the third semantic mask feature map; the feature map dimension is $C_l \times H_l \times W_l$, where $C_l$ is the number of channels, $H_l$ is the feature map height, and $W_l$ is the feature map width.

[0067] The style loss $L_{Style}$ measures style consistency using the difference between the Gram matrices of the feature maps at layer l of the semantic segmentation network, and is expressed as:

$$L_{Style} = \left\| \mathrm{Gram}\left(F_l^{(2)}\right) - \mathrm{Gram}\left(F_l^{(3)}\right) \right\|_F^2$$

In the formula, the Gram matrix is defined as:

$$\mathrm{Gram}(F_l)_{c,c'} = \frac{1}{C_l H_l W_l} \sum_{h=1}^{H_l} \sum_{w=1}^{W_l} F_l(c,h,w)\, F_l(c',h,w)$$

where $F_l^{(2)}$ represents the second semantic mask feature map; $F_l^{(3)}$ represents the third semantic mask feature map; the feature map dimension is $C_l \times H_l \times W_l$, where $C_l$ is the number of channels, $H_l$ is the feature map height, and $W_l$ is the feature map width.
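The content and style terms described above can be illustrated on small feature maps. The sketch assumes the widely used formulation in which the content loss is the mean squared error over all C*H*W feature values and the Gram matrix is normalized by C*H*W; the normalization, the function names and the α and β values in the example are assumptions. A feature map is represented as a list of C channels, each a flattened list of H*W values.

```python
def content_loss(feat_a, feat_b):
    """Mean squared error over all feature values."""
    count = sum(len(channel) for channel in feat_a)
    squared = sum(
        (a - b) ** 2
        for chan_a, chan_b in zip(feat_a, feat_b)
        for a, b in zip(chan_a, chan_b)
    )
    return squared / count


def gram_matrix(feat):
    """C x C matrix of channel inner products, normalized by C*H*W (assumed)."""
    channels = len(feat)
    norm = channels * len(feat[0])
    return [
        [sum(a * b for a, b in zip(feat[p], feat[q])) / norm for q in range(channels)]
        for p in range(channels)
    ]


def style_loss(feat_a, feat_b):
    """Squared Frobenius norm of the difference between the two Gram matrices."""
    gram_a, gram_b = gram_matrix(feat_a), gram_matrix(feat_b)
    return sum(
        (va - vb) ** 2
        for row_a, row_b in zip(gram_a, gram_b)
        for va, vb in zip(row_a, row_b)
    )


def perceptual_loss(feat_a, feat_b, alpha, beta):
    """alpha * content loss + beta * style loss."""
    return alpha * content_loss(feat_a, feat_b) + beta * style_loss(feat_a, feat_b)


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]  # 2 channels, 2 values each
    generated = [[0.0, 0.0], [0.0, 0.0]]
    print(perceptual_loss(reference, generated, alpha=0.5, beta=0.5))
```

The content term compares feature values position by position, whereas the Gram matrix discards spatial positions and keeps only channel co-activation statistics, which is why it is used as a texture or style descriptor.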

[0068] In medical images such as CT and MRI, edges often correspond to key clinical information such as organ boundaries and lesion contours. The edge loss constrains the model so that colorization does not destroy the edge structure of the original grayscale image, while ensuring that the color transitions at edges in the generated image are natural, without blurring or distortion. The edge loss $L_{edge}$ is described as follows.

[0069] The edge loss $L_{edge}$ combines two terms using a weight λ, which takes a value between 0.3 and 0.5: the edge loss $L_{edge}^{Y}$ between the original color image Y and the color image Y' generated by the generator $G_Y$, and the edge loss $L_{edge}^{X}$ between the original grayscale image X and the color image Y' generated by the generator $G_Y$. In $L_{edge}^{Y}$, $E^{(2)}_{i,j}$ represents the pixel value at (i,j) of the second edge probability map; $E^{(3)}_{i,j}$ represents the pixel value at (i,j) of the third edge probability map; ω represents the edge weight, which is set to 10; H represents the height of the image, and W represents the width of the image. In $L_{edge}^{X}$, $E^{(1)}_{i,j}$ represents the pixel value at (i,j) of the first edge probability map; $E^{(3)}_{i,j}$ represents the pixel value at (i,j) of the third edge probability map; ω represents the edge weight, which is set to 10; H represents the height of the image, and W represents the width of the image.
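The exact formulas for the edge loss are not recoverable from the text, so the following sketch is only one plausible reading. It assumes (1) a binary cross-entropy between edge probability maps in which the edge weight ω up-weights edge pixels of the reference map, and (2) a convex combination λ·(Y versus Y') + (1 − λ)·(X versus Y') of the two terms. Both choices, the function names and the `eps` clamp are assumptions, not the patent's stated definition.

```python
import math


def weighted_edge_bce(reference, predicted, omega=10.0, eps=1e-7):
    """Assumed edge term: cross-entropy averaged over H*W pixels, with
    reference edge pixels weighted by omega."""
    total, count = 0.0, 0
    for ref_row, pred_row in zip(reference, predicted):
        for ref, pred in zip(ref_row, pred_row):
            pred = min(max(pred, eps), 1.0 - eps)  # keep log() finite
            total -= omega * ref * math.log(pred) + (1.0 - ref) * math.log(1.0 - pred)
            count += 1
    return total / count


def edge_loss(edge_x, edge_y, edge_generated, lam=0.4, omega=10.0):
    """Assumed combination of the color-reference and grayscale-reference terms.

    edge_x: first edge probability map (original grayscale image X).
    edge_y: second edge probability map (original color image Y).
    edge_generated: third edge probability map (generated color image Y').
    """
    return (lam * weighted_edge_bce(edge_y, edge_generated, omega)
            + (1.0 - lam) * weighted_edge_bce(edge_x, edge_generated, omega))


if __name__ == "__main__":
    reference = [[1.0, 0.0]]
    print(edge_loss(reference, reference, [[0.9, 0.1]]))
    print(edge_loss(reference, reference, [[0.5, 0.5]]))
```

Under this reading, a missed edge pixel costs ω times as much as a false edge on a background pixel, which counteracts the fact that edge pixels are a small minority of any image.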

[0070] The main function of the TV loss function is to penalize the differences between adjacent pixels in an image, thereby making the image smoother and more continuous, and avoiding the generation of an overly noisy or discontinuous image. However, critical areas in medical images (such as lesions and organ boundaries) need to retain clear structure, while background areas can be smoothed appropriately. By introducing a spatial weight mask, the TV loss can apply smoothing constraints only to non-critical areas, avoiding damage to important structures.

[0071] The TV loss $L_{TV}$ is:

$$L_{TV} = \sum_{c=1}^{C} \sum_{i=1}^{H-1} \sum_{j=1}^{W-1} M_{i,j} \left( \left| Y'_{c,i+1,j} - Y'_{c,i,j} \right| + \left| Y'_{c,i,j+1} - Y'_{c,i,j} \right| \right)$$

[0072] In the formula, Y' is the color image generated by generator $G_Y$; $Y'_{c,i,j}$ represents the pixel value in the i-th row and j-th column of the c-th channel of the color image generated by generator $G_Y$; $Y'_{c,i+1,j}$ represents the pixel value in the (i+1)-th row and j-th column of the c-th channel of the color image generated by generator $G_Y$; $Y'_{c,i,j+1}$ represents the pixel value in the i-th row and (j+1)-th column of the c-th channel of the color image generated by generator $G_Y$; $M$ is a binary mask, with a mask value of 0 for key regions and 1 for background regions; the mask is obtained from the contour information of organs or lesions output by the first edge detection network and the first semantic segmentation network; H represents the height of the image, W represents the width of the image, and C represents the number of channels of the image.
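A masked TV loss of the kind described above can be sketched in a few lines. The sketch assumes absolute differences between each pixel and its lower and right neighbors, summed over all channels and over the positions that have both neighbors, with no further normalization; whether the patent uses absolute or squared differences is not recoverable from the text. The function name is hypothetical; the image is indexed as `image[c][i][j]` and the mask as `mask[i][j]`.

```python
def masked_tv_loss(image, mask):
    """Sum of mask-weighted absolute differences to the lower and right neighbors.

    mask[i][j] is 0 in key regions (no smoothing penalty) and 1 in background.
    """
    total = 0.0
    for channel in image:
        height, width = len(channel), len(channel[0])
        for i in range(height - 1):
            for j in range(width - 1):
                down = abs(channel[i + 1][j] - channel[i][j])
                right = abs(channel[i][j + 1] - channel[i][j])
                total += mask[i][j] * (down + right)
    return total


if __name__ == "__main__":
    # One channel with a single bright pixel in the center.
    image = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]
    background = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    key_center = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    print(masked_tv_loss(image, background))
    print(masked_tv_loss(image, key_center))
```

Setting the mask to 0 at the bright center pixel removes the penalty on the differences measured from that pixel, which is the mechanism by which key structures are exempted from smoothing.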

[0073] In step S4, a composite loss function is employed, integrating adversarial loss, semantically fused cycle consistency loss, perceptual loss, TV loss, and edge loss. Edge loss and adversarial loss ensure consistency of key edges between the medical image and the generated image. Perceptual loss guarantees high-dimensional feature matching and improves color fidelity; TV loss smooths the image and reduces noise; edge loss constrains edge structures, preventing color bleeding and distortion. This approach effectively addresses issues in medical image colorization, such as inaccurate color or color bleeding, image noise, structural distortion, and semantic confusion.

[0074] S5. Update the weight thresholds and parameters of each layer in the Med-BiCycleGAN network.

[0075] S6. Determine whether the Med-BiCycleGAN network meets the iteration termination condition. If not, proceed to step S4; otherwise, stop network training and obtain the trained Med-BiCycleGAN network.

[0076] In step S6, the iteration termination condition is determined by whether the total loss value is less than a set threshold or whether the number of training iterations has reached the maximum value.
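The loop formed by steps S4 to S6 and the termination condition above can be sketched as follows. The function name, the `train_step` callable (standing in for one pass of steps S4 and S5 that returns the total loss) and the returned reason strings are hypothetical.

```python
def train_until_done(train_step, loss_threshold, max_iterations):
    """Repeat training steps until the total loss drops below the threshold
    or the maximum number of iterations is reached.

    Returns (iterations run, last total loss, reason for stopping).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for iteration in range(1, max_iterations + 1):
        loss = train_step()  # S4 (forward pass, loss, gradient) and S5 (update)
        if loss < loss_threshold:  # S6, first condition
            return iteration, loss, "loss_below_threshold"
    return max_iterations, loss, "max_iterations_reached"  # S6, second condition


if __name__ == "__main__":
    # Stand-in for real training: a fixed sequence of total loss values.
    losses = iter([1.0, 0.5, 0.05, 0.01])
    print(train_until_done(lambda: next(losses), loss_threshold=0.1, max_iterations=10))
```

Checking the loss threshold before the iteration cap means a run that converges on its final permitted iteration is still reported as converged.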

[0077] S7. Input the medical image to be converted into the trained Med-BiCycleGAN network to obtain a colorized image.

[0078] This medical image colorization method based on an improved cycle generative adversarial network uses a bidirectional cycle generative adversarial network as the basic network framework and incorporates a semantic segmentation network, an attention mechanism, and an edge detection network. This improves the medical semantic accuracy of the colorized images, the accuracy of key features such as organ or lesion edges, and the readability of the medical images.

[0079] The Med-BiCycleGAN-based medical image colorization method proposed in this invention relaxes the restrictions on the training dataset during network training, eliminating the need for a large training set of paired grayscale and color medical images. This further expands its application scope and makes it suitable for situations where it is inconvenient to obtain paired grayscale-color medical images.

[0080] The improved Med-BiCycleGAN network proposed in this invention solves the problems of inaccurate color, image noise, structural distortion, and semantic confusion in existing medical image colorization methods.

[0081] This invention employs multiple loss functions, including perceptual loss to ensure high-dimensional feature matching and improve the color fidelity of medical image colorization; TV loss to smooth medical images and reduce noise; and edge loss to constrain the edge structure of key features in medical images and avoid color bleeding and distortion.

[0082] The above are merely preferred embodiments of the present invention and do not constitute any limitation on the present invention. Any equivalent substitutions or modifications made by those skilled in the art to the technical solutions and content disclosed in the present invention without departing from the scope of the present invention shall be deemed to have remained within the scope of protection of the present invention.

Claims

1. A medical image colorization method based on an improved cycle generative adversarial network, characterized in that it includes the following steps:
S1. Obtain grayscale medical images and color medical images to form a training dataset, and preprocess the training dataset to obtain a preprocessed training dataset;
S2. Construct a medical bidirectional cycle generative adversarial network, namely the Med-BiCycleGAN network. The Med-BiCycleGAN network includes a generator network and a discriminator network. The generator network introduces semantic segmentation, edge detection, and attention mechanisms to convert the input grayscale medical images and color medical images into each other, generating color images and grayscale images accordingly. The discriminator network includes a discriminator $D_X$ and a discriminator $D_Y$: the discriminator $D_X$ outputs, for the input grayscale medical image and the grayscale image generated by the generator network, the probability of being a true grayscale image; the discriminator $D_Y$ outputs, for the input color medical image and the color image generated by the generator network, the probability of being a true color image;
S3. Initialize the network parameters of the Med-BiCycleGAN network;
S4. Input the preprocessed training dataset into the Med-BiCycleGAN network, calculate the forward output of each branch and layer of the Med-BiCycleGAN medical bidirectional cycle generative adversarial network, and calculate the total loss and gradient of the Med-BiCycleGAN network, where the total loss includes adversarial loss, cycle consistency loss, perceptual loss, edge loss and TV loss;
S5. Update the weight thresholds and parameters of each layer in the Med-BiCycleGAN network;
S6. Determine whether the Med-BiCycleGAN network meets the iteration termination condition. If not, go to step S4; otherwise, stop network training and obtain the trained Med-BiCycleGAN network;
S7. Input the medical image to be converted into the trained Med-BiCycleGAN network to obtain a colorized image.

2. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 1, characterized in that: in step S2, the Med-BiCycleGAN network includes a generator $G_X$, a generator $G_Y$, a first semantic segmentation network, a first attention mechanism module, a first edge detection network, a second semantic segmentation network, a second attention mechanism module, a second edge detection network, a third semantic segmentation network, and a third edge detection network;
Generator $G_X$: used to generate a grayscale image X' from the input original color image Y;
Generator $G_Y$: used to generate a color image Y' from the input original grayscale image X;
First semantic segmentation network: labels the input original grayscale image X to guide the generator $G_Y$ in semantically assigning colors according to the organ and tissue categories to which the pixels belong, and outputs the first semantic mask feature map to the first attention mechanism module;
First attention mechanism module: assigns weights to the input color image Y' and the first semantic mask feature map, and after weight adjustment outputs feature map 1 to the generator $G_Y$;
First edge detection network: performs edge detection on the input original grayscale image X and outputs the first edge probability map;
Second semantic segmentation network: labels the input original color image Y to guide the generator $G_X$ in semantically assigning colors according to the organ and tissue categories to which the pixels belong, and outputs the second semantic mask feature map to the second attention mechanism module;
Second attention mechanism module: assigns weights to the input grayscale image X' and the second semantic mask feature map, and after weight adjustment outputs feature map 2 to the generator $G_X$;
Second edge detection network: performs edge detection on the input original color image Y and outputs the second edge probability map;
Third semantic segmentation network: labels the input color image Y' to guide the generator in semantically assigning colors according to the organ and tissue categories to which the pixels belong, and outputs the third semantic mask feature map;
Third edge detection network: performs edge detection on the input color image Y' and outputs the third edge probability map.

3. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 1, characterized in that: in step S3, the initialized parameters include: generator encoder depth, number of generator input layer channels, number of residual blocks, discriminator network depth, number of discriminator input layer channels, maximum number of training epochs, mini-batch size, learning rate, gradient decay factor, squared gradient decay factor, model hold frequency, cycle loss weight, number of test samples, and network loss weights.

4. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 2, characterized in that, in step S4, the total loss $L_{Total}$ of the Med-BiCycleGAN network is calculated as:

$$L_{Total} = L_{adv}^{G_X} + L_{adv}^{G_Y} + \lambda_1 L_{cycle\text{-}semantic} + \lambda_2 L_{perceptual} + \lambda_3 L_{edge} + \lambda_4 L_{TV}$$

In the formula, $L_{adv}^{G_X}$ is the adversarial loss of generator $G_X$; $L_{adv}^{G_Y}$ is the adversarial loss of generator $G_Y$; $L_{cycle\text{-}semantic}$ is the pixel-feature fusion cycle consistency loss; $L_{perceptual}$ is the perceptual loss; $L_{edge}$ is the edge loss; $L_{TV}$ is the TV loss; $\lambda_1$ is the weighting coefficient of the cycle consistency loss; $\lambda_2$ is the weighting coefficient of the perceptual loss; $\lambda_3$ is the weighting coefficient of the edge loss; and $\lambda_4$ is the weighting coefficient of the TV loss.

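5. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 4, characterized in that, the adversarial loss $L_{adv}^{G_X}$ of generator $G_X$ and the adversarial loss $L_{adv}^{G_Y}$ of generator $G_Y$ are:

$$L_{adv}^{G_X} = \mathbb{E}_{x}\left[\log D_X(x)\right] + \mathbb{E}_{y}\left[\log\left(1 - D_X(G_X(y))\right)\right]$$

$$L_{adv}^{G_Y} = \mathbb{E}_{y}\left[\log D_Y(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D_Y(G_Y(x))\right)\right]$$

In the formula, $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $D_Y(y)$ is the output of discriminator $D_Y$ when the original color image Y is the input, and $D_Y(G_Y(x))$ is the output of discriminator $D_Y$ when the output of generator $G_Y$ is the input; $D_X(x)$ is the output of discriminator $D_X$ when the original grayscale image X is the input, and $D_X(G_X(y))$ is the output of discriminator $D_X$ when the output of generator $G_X$ is the input.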

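6. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 4, characterized in that, the pixel-feature fusion cycle consistency loss $L_{cycle\text{-}semantic}$ is:

$$L_{cycle\text{-}semantic} = L_{pixel} + L_{feature}$$

$$L_{pixel} = \mathbb{E}_{x}\left[\left\| G_X(G_Y(x)) - x \right\|_1\right] + \mathbb{E}_{y}\left[\left\| G_Y(G_X(y)) - y \right\|_1\right]$$

$$L_{feature} = \mathbb{E}_{x}\left[\left\| \phi(G_X(G_Y(x))) - \phi(x) \right\|_2\right] + \mathbb{E}_{y}\left[\left\| \phi(G_Y(G_X(y))) - \phi(y) \right\|_2\right]$$

In the formula, $L_{pixel}$ is the pixel-level cycle consistency loss; $L_{feature}$ is the feature-level cycle consistency loss; $G_X$ represents the generator that transforms a Y-domain image into an X-domain image; $G_Y$ represents the generator that transforms an X-domain image into a Y-domain image; $\mathbb{E}_{x}$ is the mathematical expectation over the X domain; $\mathbb{E}_{y}$ is the mathematical expectation over the Y domain; $\|\cdot\|_1$ represents the L1 loss function; $\|\cdot\|_2$ represents the L2 loss function; and $\phi(\cdot)$ represents the feature information of each domain image.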

7. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 4, characterized in that, the perceptual loss $L_{perceptual}$ is:

$$L_{perceptual} = \alpha L_{Content} + \beta L_{Style}$$

In the formula, $\alpha$ and $\beta$ are the weighting coefficients used to balance the content loss and the style loss. The content loss $L_{Content}$ uses the mean squared error (MSE) to calculate the difference between the feature maps of the l-th layer of the second semantic segmentation network:

$$L_{Content} = \frac{1}{C_l H_l W_l} \left\| F_l^{(2)} - F_l^{(3)} \right\|_2^2$$

In the formula, $F_l^{(2)}$ represents the second semantic mask feature map; $F_l^{(3)}$ represents the third semantic mask feature map; the feature map dimension is $C_l \times H_l \times W_l$, where $C_l$ is the number of channels, $H_l$ is the feature map height, and $W_l$ is the feature map width. The style loss $L_{Style}$ measures style consistency using the difference between the Gram matrices of the feature maps at layer l of the semantic segmentation network, and is expressed as:

$$L_{Style} = \left\| \mathrm{Gram}\left(F_l^{(2)}\right) - \mathrm{Gram}\left(F_l^{(3)}\right) \right\|_F^2$$

In the formula, the Gram matrix is defined as:

$$\mathrm{Gram}(F_l)_{c,c'} = \frac{1}{C_l H_l W_l} \sum_{h=1}^{H_l} \sum_{w=1}^{W_l} F_l(c,h,w)\, F_l(c',h,w)$$

where $F_l^{(2)}$ represents the second semantic mask feature map; $F_l^{(3)}$ represents the third semantic mask feature map; the feature map dimension is $C_l \times H_l \times W_l$, where $C_l$ is the number of channels, $H_l$ is the feature map height, and $W_l$ is the feature map width.

8. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 4, characterized in that, the edge loss $L_{edge}$ combines two terms using a weight λ: the edge loss $L_{edge}^{Y}$ between the original color image Y and the color image Y' generated by the generator $G_Y$, and the edge loss $L_{edge}^{X}$ between the original grayscale image X and the color image Y' generated by the generator $G_Y$. In $L_{edge}^{Y}$, $E^{(2)}_{i,j}$ represents the pixel value at (i,j) of the second edge probability map; $E^{(3)}_{i,j}$ represents the pixel value at (i,j) of the third edge probability map; H represents the height of the image; and W represents the width of the image. In $L_{edge}^{X}$, $E^{(1)}_{i,j}$ represents the pixel value at (i,j) of the first edge probability map; $E^{(3)}_{i,j}$ represents the pixel value at (i,j) of the third edge probability map; H represents the height of the image; and W represents the width of the image.

9. The medical image colorization method based on an improved cycle generative adversarial network as described in claim 4, characterized in that, the TV loss $L_{TV}$ is:

$$L_{TV} = \sum_{c=1}^{C} \sum_{i=1}^{H-1} \sum_{j=1}^{W-1} M_{i,j} \left( \left| Y'_{c,i+1,j} - Y'_{c,i,j} \right| + \left| Y'_{c,i,j+1} - Y'_{c,i,j} \right| \right)$$

In the formula, Y' is the color image generated by generator $G_Y$; $Y'_{c,i,j}$ represents the pixel value in the i-th row and j-th column of the c-th channel of the color image generated by generator $G_Y$; $Y'_{c,i+1,j}$ represents the pixel value in the (i+1)-th row and j-th column of the c-th channel of the color image generated by generator $G_Y$; $Y'_{c,i,j+1}$ represents the pixel value in the i-th row and (j+1)-th column of the c-th channel of the color image generated by generator $G_Y$; $M$ is a binary mask, with a mask value of 0 for key regions and 1 for background regions; the mask is obtained from the contour information of organs or lesions output by the first edge detection network and the first semantic segmentation network; H represents the height of the image, W represents the width of the image, and C represents the number of channels of the image.