A coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning
The coronary artery and pericoronary adipose tissue segmentation system uses self-supervised learning to pre-train a model on unannotated data and combines it with traditional image processing algorithms. This reduces the dependence of existing approaches on large-scale labeled datasets, enabling efficient segmentation of coronary arteries and pericoronary adipose tissue at lower cost.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patent (China)
- Current Assignee / Owner
- HARBIN MEDICAL UNIVERSITY
- Filing Date
- 2024-12-04
- Publication Date
- 2026-06-26
AI Technical Summary
Existing supervised learning models rely on large-scale labeled datasets, which makes segmenting coronary arteries and pericoronary adipose tissue costly; when CCTA image annotations are scarce, efficient segmentation becomes difficult.
A self-supervised learning-based coronary artery and pericoronary adipose tissue segmentation system is developed, comprising modules for CCTA image acquisition, preprocessing, segmentation mask extraction, Transformer encoder, image reconstruction, and feature comparison. The system pre-trains the model on unannotated data through self-supervised learning and fine-tunes it on a small amount of annotated data, then combines it with traditional image processing algorithms for segmentation.
It reduces reliance on large-scale labeled datasets, lowers network training costs, and constructs an efficient segmentation framework that can be applied to the segmentation of coronary arteries and surrounding adipose tissue in clinical practice.
Figure CN119693391B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of medical image analysis technology and specifically relates to a coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning.
Background Technology
[0002] Coronary CT angiography (CCTA), as a non-invasive and information-rich diagnostic tool, has been widely used in the assessment of coronary artery lesions. Pericoronary adipose tissue (PCAT) attenuation is considered a novel biomarker associated with coronary artery inflammation and has been shown to be correlated with the severity of coronary artery lesions. PCAT attenuation assessed by CCTA is becoming a predictive indicator of acute coronary syndromes. Therefore, reliable automated coronary artery segmentation not only aids in anatomical assessment but also supports subsequent quantification of PCAT attenuation.
[0003] The development of deep learning has introduced a new paradigm for medical image segmentation. However, the dependence of supervised learning models on large-scale labeled datasets is a significant bottleneck, especially given the scarcity and high cost of expert annotations for CCTA images. Self-supervised learning, an emerging branch of unsupervised learning, extracts feature representations from unlabeled data, which are then fine-tuned on a small amount of task-specific annotated data. Therefore, how to efficiently segment coronary arteries and pericoronary adipose tissue using self-supervised learning is a pressing problem.
Summary of the Invention
[0004] The purpose of this invention is to address the problem that existing supervised learning models rely on large-scale labeled datasets, and to propose a self-supervised learning-based coronary artery and pericoronary adipose tissue segmentation system.
[0005] The technical solution adopted by this invention to solve the above-mentioned technical problems is: a coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning, the system comprising a CCTA image acquisition module, a CCTA image preprocessing module, a CCTA image segmentation mask extraction module, a Transformer encoder, an image reconstruction module, a feature comparison module, a coronary artery segmentation module, and a pericoronary adipose tissue segmentation module; wherein:
[0006] The CCTA image acquisition module is used to acquire a CCTA image training set, which includes labeled CCTA images and unlabeled CCTA images.
[0007] The CCTA image preprocessing module is used to preprocess each CCTA image in the acquired training set to obtain preprocessed CCTA images.
[0008] The CCTA image segmentation mask extraction module is used to extract the segmentation mask from the acquired labeled CCTA images, and obtain the segmentation mask for each labeled CCTA image.
[0009] The Transformer encoder, image reconstruction module, and feature comparison module are jointly pre-trained based on each pre-processed CCTA image;
[0010] The coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using the segmentation masks of each labeled CCTA image;
[0011] After preprocessing the CCTA image to be segmented using the CCTA image preprocessing module, the preprocessed CCTA image to be segmented is segmented into coronary arteries using the trained Transformer encoder and coronary artery segmentation module to obtain the coronary artery segmentation result of the CCTA image to be segmented.
[0012] The pericoronary adipose tissue segmentation module is used to process the coronary artery segmentation results of the CCTA image to be segmented, and obtain the pericoronary adipose tissue segmentation results of the CCTA image to be segmented.
[0013] Furthermore, the CCTA image preprocessing module crops and standardizes each CCTA image in the training set, unifying each CCTA image to the same size.
[0014] Furthermore, the Transformer encoder includes 12 encoding units connected in series;
[0015] For any preprocessed CCTA image, the preprocessed CCTA image is uniformly divided into non-overlapping patches, each patch having a size of P×P×P, where P is a predefined patch resolution.
[0016] The obtained patches are flattened and passed through a linear projection layer. The output of the linear projection layer is then passed through a position embedding layer, that is, the position embedding is performed on the output of the linear projection layer to obtain the position embedding result for each patch.
[0017] The position embedding results of all patches are then used as input to the Transformer encoder. Within the Transformer encoder, the position embedding results of all patches are sequentially passed through 12 cascaded coding units, and the feature maps output by the 3rd, 6th, 9th, and 12th coding units are stored. That is, for each patch, a set of feature maps is obtained.
[0018] Furthermore, each coding unit of the Transformer encoder includes a first normalization layer, a multi-head attention layer, a second normalization layer, and a multilayer perceptron;
[0019] Taking the first coding unit as an example:
[0020] Within the first coding unit, the input features first pass through the first normalization layer, and then the output of the first normalization layer is used as the input to the multi-head attention layer.
[0021] Then, the output of the multi-head attention layer is added to the input features of the first coding unit to obtain the sum A;
[0022] The summed result A then passes through a second normalization layer, and the output of the second normalization layer is used as the input of the multilayer perceptron.
[0023] Then, the output of the multilayer perceptron is added to A to obtain the summed result B, which is then used as the output of the first coding unit.
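The coding unit described above follows the common pre-norm Transformer layout (normalize, attend, add; normalize, MLP, add). A minimal NumPy sketch, assuming LayerNorm for both normalization layers (without learned scale/shift), a bias-free two-layer MLP with a GELU activation, and toy sizes (4 heads, embedding dimension 32); none of these choices is specified in the text:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean and unit variance (no learned affine).
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU (assumed activation for the MLP)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def multi_head_attention(x, w_qkv, w_out, num_heads):
    n, d = x.shape
    dh = d // num_heads
    q, k, v = [t.reshape(n, num_heads, dh).transpose(1, 0, 2)
               for t in np.split(x @ w_qkv, 3, axis=-1)]          # (heads, n, dh)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))        # (heads, n, n)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ w_out

def coding_unit(x, params, num_heads=4):
    # First normalization layer -> multi-head attention -> add unit input: sum A
    a = x + multi_head_attention(layer_norm(x), params["w_qkv"], params["w_out"], num_heads)
    # Second normalization layer -> MLP -> add A: sum B, the unit's output
    return a + gelu(layer_norm(a) @ params["w1"]) @ params["w2"]

def init_params(d, hidden, rng, scale=0.02):
    return {"w_qkv": rng.normal(0, scale, (d, 3 * d)),
            "w_out": rng.normal(0, scale, (d, d)),
            "w1": rng.normal(0, scale, (d, hidden)),
            "w2": rng.normal(0, scale, (hidden, d))}

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 32))        # 8 patch tokens, embedding dimension 32
out = coding_unit(tokens, init_params(32, 64, rng))
print(out.shape)                         # (8, 32)
```

Because both sub-layers are residual, setting every weight to zero turns the unit into an identity map, which is a convenient sanity check.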
[0024] Furthermore, the Transformer encoder, image reconstruction module, and feature comparison module are jointly pre-trained based on the pre-processed CCTA images. The specific pre-training process is as follows:
[0025] Step 1: Initialize the pre-training round counter t = 1;
[0026] Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch.
[0027] Step 3: For any patch, denote the patch as $z_i$, and take the set of feature maps corresponding to $z_i$, together with $z_i$ itself, as the input of the image reconstruction module;
[0028] The working process within the image reconstruction module is as follows:
[0029] (1) $z_i$ passes sequentially through the first convolutional layer, the first batch normalization (BN) layer, the first ReLU activation function layer, the second convolutional layer, the second BN layer, and the second ReLU activation function layer; that is:
[0030] $z_i$ first passes through the first convolutional layer, and the output of the first convolutional layer is used as the input of the first batch normalization (BN) layer;
[0031] The output of the first BN layer is used as the input of the first ReLU activation function layer;
[0032] The output of the first ReLU activation function layer is used as the input of the second convolutional layer;
[0033] Use the output of the second convolutional layer as the input of the second batch normalization layer;
[0034] The output of the second BN layer is used as the input of the second ReLU activation function layer, and the output of the second ReLU activation function layer is denoted as a;
[0035] (2) The feature map output by the third coding unit passes through the first deconvolution subunit, the second deconvolution subunit, and the third deconvolution subunit in sequence; that is:
[0036] The feature map output by the third coding unit first passes through the first deconvolution subunit, and the output of the first deconvolution subunit is used as the input of the second deconvolution subunit;
[0037] The output of the second deconvolution subunit is used as the input of the third deconvolution subunit;
[0038] Let b be the output of the third deconvolution subunit;
[0039] (3) The feature map output by the sixth coding unit passes through the fourth and fifth deconvolution subunits in sequence; that is:
[0040] The feature map output by the sixth coding unit first passes through the fourth deconvolution subunit, and the output of the fourth deconvolution subunit is used as the input of the fifth deconvolution subunit;
[0041] Let c be the output of the fifth deconvolution subunit;
[0042] (4) The feature map output by the ninth coding unit is passed through the sixth deconvolution subunit, and the output of the sixth deconvolution subunit is denoted as d;
[0043] (5) The feature map output by the twelfth coding unit is passed through the first upsampling unit, and the output of the first upsampling unit is denoted as e;
[0044] (6) After concatenating the outputs e and d, the concatenated result is passed sequentially through the third convolutional layer, the third batch normalization (BN) layer, the third ReLU activation function layer, the fourth convolutional layer, the fourth BN layer, and the fourth ReLU activation function layer, i.e.
[0045] The concatenated result first passes through the third convolutional layer, and the output of the third convolutional layer is used as the input of the third batch normalization (BN) layer.
[0046] The output of the third BN layer is used as the input of the third ReLU activation function layer;
[0047] The output of the third ReLU activation function layer is used as the input of the fourth convolutional layer;
[0048] Use the output of the fourth convolutional layer as the input of the fourth batch normalization layer;
[0049] Use the output of the fourth BN layer as the input of the fourth ReLU activation function layer;
[0050] The output of the fourth ReLU activation function layer is then passed through the second upsampling unit, and the output of the second upsampling unit is denoted as i'.
[0051] (7) After concatenating the outputs c and i', the concatenated result is passed sequentially through the fifth convolutional layer, the fifth batch normalization (BN) layer, the fifth ReLU activation function layer, the sixth convolutional layer, the sixth BN layer, and the sixth ReLU activation function layer, i.e.
[0052] The concatenated result first passes through the fifth convolutional layer, and the output of the fifth convolutional layer is used as the input of the fifth batch normalization (BN) layer.
[0053] Use the output of the fifth BN layer as the input of the fifth ReLU activation function layer;
[0054] The output of the fifth ReLU activation function layer is used as the input of the sixth convolutional layer;
[0055] Use the output of the sixth convolutional layer as the input of the sixth batch normalization layer;
[0056] Use the output of the sixth BN layer as the input of the sixth ReLU activation function layer;
[0057] The output of the sixth ReLU activation function layer is then passed through the third upsampling unit, and the output of the third upsampling unit is denoted as j'.
[0058] (8) After concatenating the outputs b and j', the concatenated result is passed sequentially through the seventh convolutional layer, the seventh batch normalization (BN) layer, the seventh ReLU activation function layer, the eighth convolutional layer, the eighth BN layer, and the eighth ReLU activation function layer, i.e.
[0059] The concatenated result first passes through the seventh convolutional layer, and the output of the seventh convolutional layer is used as the input of the seventh batch normalization (BN) layer.
[0060] Use the output of the seventh BN layer as the input of the seventh ReLU activation function layer;
[0061] Use the output of the seventh ReLU activation function layer as the input of the eighth convolutional layer;
[0062] Use the output of the eighth convolutional layer as the input of the eighth batch normalization layer;
[0063] Use the output of the eighth BN layer as the input of the eighth ReLU activation function layer;
[0064] The output of the eighth ReLU activation function layer is then passed through the fourth upsampling unit, and the output of the fourth upsampling unit is denoted as k.
[0065] (9) After concatenating the outputs k and a, the concatenated result is passed sequentially through the ninth convolutional layer, the ninth batch normalization (BN) layer, the ninth ReLU activation function layer, the tenth convolutional layer, the tenth BN layer, and the tenth ReLU activation function layer, i.e.
[0066] The concatenated result first passes through the ninth convolutional layer, and the output of the ninth convolutional layer is used as the input of the ninth batch normalization (BN) layer.
[0067] Use the output of the ninth BN layer as the input of the ninth ReLU activation function layer;
[0068] Use the output of the ninth ReLU activation function layer as the input of the tenth convolutional layer;
[0069] Use the output of the tenth convolutional layer as the input of the tenth batch normalization layer;
[0070] Use the output of the 10th BN layer as the input of the 10th ReLU activation function layer;
[0071] Then, the output of the 10th ReLU activation function layer is used as the input of the 11th convolutional layer, and the output of the 11th convolutional layer is taken as the reconstruction result of $z_i$;
[0072] Step 4: Take the set of feature maps corresponding to $z_i$ as the input of the feature comparison module. Within the feature comparison module, the feature map corresponding to $z_i$ passes sequentially through a global average pooling layer and a linear projection layer to obtain the feature embedding $z'_i$ of $z_i$;
[0073] Step 5: Calculate the total loss function L based on the reconstruction results and the feature embeddings $z'_i$, perform backpropagation based on L, and adjust the network parameters of the Transformer encoder, the image reconstruction module, and the feature comparison module;
[0074] Stop the pre-training of round t when the total loss function L converges, and then continue to execute step 6;
[0075] Step 6: Determine if the maximum number of pre-training rounds has been reached;
[0076] If so, the final pre-training result is obtained;
[0077] If not, re-patch each preprocessed CCTA image, set t = t + 1, and return to Step 2.
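Sub-steps (1) to (9) of Step 3 only fit together if every pair of concatenated tensors shares a spatial size. The bookkeeping below is a hypothetical check, assuming (as in UNETR-style decoders, not stated in the text) that each deconvolution subunit and upsampling unit doubles the side length and each convolution preserves it. Under that assumption the last concatenation, of k with a, only works when P = 2^4 = 16:

```python
def decoder_sides(patch=16, grid=1):
    """Side length of every tensor in the reconstruction module (hypothetical bookkeeping).

    grid: tokens per axis handed to the decoder (1 if a single patch z_i is decoded).
    Assumes each deconvolution subunit / upsampling unit doubles the side and
    each convolution keeps it.
    """
    def up(side, steps=1):
        return side * 2 ** steps

    s = {"a": grid * patch,      # (1) two conv-BN-ReLU blocks on z_i itself
         "b": up(grid, 3),       # (2) 3rd unit -> three deconvolution subunits
         "c": up(grid, 2),       # (3) 6th unit -> two deconvolution subunits
         "d": up(grid, 1),       # (4) 9th unit -> one deconvolution subunit
         "e": up(grid, 1)}       # (5) 12th unit -> first upsampling unit
    s["i'"] = up(s["d"])         # (6) concat(e, d) -> convs -> second upsampling unit
    s["j'"] = up(s["i'"])        # (7) concat(c, i') -> convs -> third upsampling unit
    s["k"] = up(s["j'"])         # (8) concat(b, j') -> convs -> fourth upsampling unit
    for left, right in [("e", "d"), ("c", "i'"), ("b", "j'"), ("k", "a")]:
        if s[left] != s[right]:  # every concatenation needs matching sizes
            raise ValueError(f"cannot concatenate {left} ({s[left]}) with {right} ({s[right]})")
    return s

print(decoder_sides(16))
# {'a': 16, 'b': 8, 'c': 4, 'd': 2, 'e': 2, "i'": 4, "j'": 8, 'k': 16}
```

With `decoder_sides(8)` the check raises, because four doublings cannot return an 8-voxel patch to its own size.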
[0078] Furthermore, the first deconvolution subunit includes a deconvolution layer, a convolution layer, a BN layer and a ReLU activation function layer in sequence. That is, in the first deconvolution subunit, the input feature map first passes through the deconvolution layer, and then the output of the deconvolution layer is used as the input of the convolution layer.
[0079] Use the output of the convolutional layer as the input of the BN layer;
[0080] Use the output of the BN layer as the input to the ReLU activation function layer;
[0081] The output of the ReLU activation function layer is used as the output of the first deconvolution subunit.
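The deconvolution subunit above can be sketched in NumPy as follows. Kernel sizes are not given in the text, so this assumes a 2×2×2 stride-2 deconvolution (each voxel expands into a 2×2×2 block) and a 3×3×3 convolution with zero padding 1; the BN layer here uses the statistics of the single input volume and no learned scale/shift, purely for illustration:

```python
import numpy as np

def deconv3d_k2s2(x, w):
    """Transposed 3D convolution, kernel 2, stride 2: each voxel becomes a 2x2x2 block.
    x: (C_in, D, H, W); w: (C_in, C_out, 2, 2, 2) -> (C_out, 2D, 2H, 2W)."""
    c_out = w.shape[1]
    _, d, h, wd = x.shape
    y = np.einsum("cdhw,coijk->odihjwk", x, w)
    return y.reshape(c_out, 2 * d, 2 * h, 2 * wd)

def conv3d_k3(x, w):
    """3x3x3 convolution (cross-correlation), stride 1, zero padding 1.
    w: (C_out, C_in, 3, 3, 3)."""
    _, d, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], d, h, wd))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                window = xp[:, i:i + d, j:j + h, k:k + wd]
                out += np.einsum("cdhw,oc->odhw", window, w[:, :, i, j, k])
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization using this one volume's statistics (illustration only)."""
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(1, 2, 3), keepdims=True) + eps)

def deconv_subunit(x, w_deconv, w_conv):
    y = deconv3d_k2s2(x, w_deconv)  # deconvolution layer
    y = conv3d_k3(y, w_conv)        # convolution layer
    y = batch_norm(y)               # BN layer
    return np.maximum(y, 0.0)       # ReLU activation function layer
```

In a real network these layers would come from a deep learning framework and BN would use mini-batch statistics with learned parameters.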
[0082] Furthermore, the specific process of obtaining the feature embedding in Step 4 is as follows:
[0083] $z'_i = h(g(f))$
[0084] where f is the feature map of $z_i$, g(f) is the feature vector output by the global average pooling layer, h denotes the linear projection layer, and $z'_i$ is the feature embedding output by the linear projection layer.
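The mapping $z'_i = h(g(f))$ is global average pooling followed by a linear layer. A minimal sketch, assuming a channels-first feature map of shape (C, ...) and a hypothetical projection weight `w_h` and bias `b_h`; which of the stored feature maps is pooled is not specified in the text:

```python
import numpy as np

def feature_embedding(f, w_h, b_h):
    """z'_i = h(g(f)).
    f: feature map with channels first, shape (C, ...); w_h: (d_out, C); b_h: (d_out,)."""
    g = f.reshape(f.shape[0], -1).mean(axis=1)  # g: global average pooling -> (C,)
    return w_h @ g + b_h                        # h: linear projection -> (d_out,)
```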
[0085] Furthermore, the total loss function L is:
[0086] $L = L_{recon} + L_{contrast}$
[0087] where $L_{recon}$ is the reconstruction loss and $L_{contrast}$ is the patch contrast loss;
[0088]
[0089] where M is the set of all patches and $\|\cdot\|_2$ is the L2 norm;
[0090]
[0091] where $z'_i$ is the feature embedding of $z_i$; $z^+$ is the feature embedding of another patch from the same CCTA image as $z_i$; $\mathrm{sim}(z_i, z^+)$ denotes the similarity between $z_i$ and $z^+$; $N(i)$ denotes the set of all patches other than $z_i$; $\mathrm{sim}(z'_i, z'_j)$ denotes the similarity between $z'_i$ and $z'_j$; and $z'_j$ is the feature embedding of $z_j$.
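The loss formulas of [0088] and [0090] are not reproduced in this text. The sketch below picks forms consistent with the stated definitions, all of them assumptions: $L_{recon}$ as the mean over M of the squared L2 distance between each patch and its reconstruction, and $L_{contrast}$ as an InfoNCE-style loss with cosine similarity as sim and a hypothetical temperature `tau = 0.1`, where the positive $z^+$ is another patch of the same CCTA image and the denominator runs over $N(i)$:

```python
import numpy as np

def reconstruction_loss(patches, recon):
    """Assumed form: mean over M of ||z_i - recon_i||_2^2. Shapes: (|M|, ...)."""
    diff = (patches - recon).reshape(len(patches), -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def patch_contrast_loss(emb, pos, tau=0.1):
    """Assumed InfoNCE form:
    -mean_i log( exp(sim(z'_i, z+)/tau) / sum_{j in N(i)} exp(sim(z'_i, z'_j)/tau) ).

    emb: (n, d) embeddings z'_i; pos[i]: index of the positive for patch i
    (another patch of the same CCTA image, pos[i] != i); N(i) = all j != i.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # sim = cosine similarity
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)                     # drop j == i from N(i)
    m = logits.max(axis=1, keepdims=True)                 # stable log-sum-exp
    log_denom = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    pos_logits = logits[np.arange(len(z)), pos]
    return float(np.mean(log_denom - pos_logits))

def total_loss(patches, recon, emb, pos, tau=0.1):
    """L = L_recon + L_contrast (unweighted sum, as in [0086])."""
    return reconstruction_loss(patches, recon) + patch_contrast_loss(emb, pos, tau)
```

Because the positive also appears in the denominator, each per-patch term is non-negative, and it reaches log(|N(i)|) when all embeddings are identical.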
[0092] Furthermore, the coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using the segmentation masks of each labeled CCTA image. The specific training process is as follows:
[0093] Step 1: Initialize the training round counter p = 1;
[0094] Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch.
[0095] Step 3: For any patch, denote it as $z_i$; take the set of feature maps corresponding to $z_i$, namely the feature maps output by the third, sixth, ninth, and twelfth coding units, together with $z_i$ itself, as the input of the coronary artery segmentation module;
[0096] Step 4: Calculate the segmentation loss L′ based on the coronary artery segmentation results predicted by the coronary artery segmentation module and the segmentation mask. Perform backpropagation based on the segmentation loss L′ and adjust the network parameters in the Transformer encoder and the coronary artery segmentation module until the segmentation loss L′ converges, then stop training in the p-th round.
[0097] The segmentation loss L′ is:
[0098] $L' = L_{CE} + L_{Dice}$
[0099] where $L_{CE}$ is the cross-entropy loss and $L_{Dice}$ is the Dice loss;
[0100]
[0101] where N is the set of all patches corresponding to the labeled CCTA image and K is the number of pixels in a patch; for the k-th pixel of $z_i$, the formula compares its ground-truth label in the segmentation mask with the corresponding prediction;
[0102]
[0103] Step 5: Determine if the maximum number of training rounds has been reached;
[0104] If so, the final training result is obtained;
[0105] If not, re-patch each preprocessed CCTA image, set p = p + 1, and return to Step 2.
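The segmentation loss $L' = L_{CE} + L_{Dice}$ of Step 4 can be sketched for the binary case (C = 1, coronary artery versus background). The exact formulas of [0100] and [0102] are not reproduced here, so this assumes the standard mean binary cross-entropy over all patches and pixels and the soft Dice loss $1 - 2\sum p\,y / (\sum p + \sum y)$; `eps` is a hypothetical smoothing constant:

```python
import numpy as np

def segmentation_loss(pred, target, eps=1e-6):
    """L' = L_CE + L_Dice for binary masks (assumed standard forms).

    pred: predicted foreground probabilities, shape (|N|, K);
    target: ground-truth labels (0/1) from the segmentation mask, same shape.
    Returns (L', L_CE, L_Dice).
    """
    p = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    ce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    inter = np.sum(pred * target)
    dice = 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return float(ce + dice), float(ce), float(dice)
```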
[0106] Furthermore, the working process of the pericoronary adipose tissue segmentation module is as follows:
[0107] Step 1: Extract the coronary artery centerline from the coronary artery foreground in the segmentation result using a skeletonization algorithm;
[0108] Step 2: Based on the extracted coronary artery centerline, draw the normal vector of the coronary artery centerline through each point on the coronary artery centerline. The obtained normal vector is the direction of coronary artery lumen expansion at the corresponding point on the coronary artery centerline. Then calculate the distance from each point along the normal vector to the wall of the coronary artery lumen, which is the coronary radius at each point on the coronary artery centerline.
[0109] Step 3: Dilate the coronary arteries so that, after dilation, the coronary radius at each point on the centerline is doubled;
[0110] Step 4: Smooth the dilated coronary artery region with a Gaussian filter, then apply threshold segmentation to the smoothed result: pixels with gray values in the range [-190 HU, -30 HU] are taken as pericoronary adipose tissue, which yields the pericoronary adipose tissue segmentation result.
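Steps 2 to 4 can be sketched in NumPy as below. Several details are interpretations, not statements from the text: the centerline from Step 1 is taken as a given ordered (n, 3) array of voxel indices (the skeletonization itself is omitted); the radius is measured along a single normal direction per point; the doubled-radius dilation is modeled as a union of balls of radius 2r; and the Gaussian filter is applied to the HU image (zero padding at the border) before thresholding inside the dilated region. Contrast-enhanced lumen voxels fall outside the fat window, so they drop out of the result automatically:

```python
import numpy as np

def normal_to(t):
    """Return a unit vector perpendicular to the unit tangent t."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    n = np.cross(t, helper)
    return n / np.linalg.norm(n)

def radii_along_normals(mask, centerline, step=0.25):
    """Step 2: march from each centerline point along one normal until leaving the lumen."""
    tangents = np.gradient(centerline.astype(float), axis=0)
    radii = []
    for p, t in zip(centerline, tangents):
        n = normal_to(t / np.linalg.norm(t))
        r = 0.0
        while True:
            q = np.round(p + (r + step) * n).astype(int)
            if np.any(q < 0) or np.any(q >= mask.shape) or not mask[tuple(q)]:
                break
            r += step
        radii.append(r)
    return np.array(radii)

def dilate_double(shape, centerline, radii):
    """Step 3: union of balls of radius 2r around every centerline point."""
    grid = np.indices(shape).reshape(3, -1).T
    region = np.zeros(len(grid), dtype=bool)
    for p, r in zip(centerline, radii):
        region |= np.sum((grid - p) ** 2, axis=1) <= (2 * r) ** 2
    return region.reshape(shape)

def gaussian_smooth(vol, sigma=1.0):
    """Separable Gaussian filter (zero padding at the volume border)."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    for axis in range(vol.ndim):
        vol = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, vol)
    return vol

def segment_pcat(ct_hu, vessel_mask, centerline, lo=-190.0, hi=-30.0, sigma=1.0):
    """Steps 2-4; centerline is assumed to be an ordered (n, 3) array of voxel indices."""
    radii = radii_along_normals(vessel_mask, centerline)
    region = dilate_double(ct_hu.shape, centerline, radii)
    smooth = gaussian_smooth(ct_hu.astype(float), sigma)
    return region & (smooth >= lo) & (smooth <= hi)
```

The brute-force ball union is only practical on small crops; a distance-transform-based dilation would be the usual choice for full-size volumes.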
[0111] The beneficial effects of this invention are:
[0112] This invention acquires CCTA images by coronary CT angiography and uses self-supervised learning to pre-train a coronary artery segmentation model on unannotated CCTA images, enabling the model to learn rich feature representations. The model is then fine-tuned on annotated data, which reduces reliance on large-scale labeled datasets. The pericoronary fat segmentation uses traditional image processing algorithms, which require no annotation and save network training costs. The invention thus constructs an efficient segmentation framework that addresses the high cost of existing coronary artery and pericoronary adipose tissue segmentation and enables wide application in clinical practice.
Attached Figure Description
[0113] Figure 1 is a framework diagram of the training process of the self-supervised learning-based coronary artery and pericoronary adipose tissue segmentation system according to the present invention;
[0114] Figure 2 is a schematic diagram of the Transformer encoder and the image reconstruction module of the present invention;
[0115] Figure 3 is a flowchart of the pericoronary adipose tissue segmentation algorithm.
Detailed Implementation
[0116] Specific Implementation Method One: This embodiment is described with reference to Figure 1. It describes a self-supervised learning-based coronary artery and pericoronary adipose tissue segmentation system. The system includes a CCTA image acquisition module, a CCTA image preprocessing module, a CCTA image segmentation mask extraction module, a Transformer encoder, an image reconstruction module, a feature comparison module, a coronary artery segmentation module, and a pericoronary adipose tissue segmentation module; wherein:
[0117] The CCTA image acquisition module is used to acquire a CCTA image training set, which includes labeled CCTA images and unlabeled CCTA images.
[0118] The CCTA image preprocessing module is used to preprocess each CCTA image in the acquired training set to obtain preprocessed CCTA images.
[0119] The CCTA image segmentation mask extraction module is used to extract the segmentation mask from the acquired labeled CCTA images, and obtain the segmentation mask for each labeled CCTA image.
[0120] For any labeled CCTA image, let $y_{gt} \in \mathbb{R}^{H \times W \times C}$ be the segmentation mask of that labeled CCTA image, where C is the number of foreground categories to be segmented in the CCTA image; C is set to 1 in the coronary artery segmentation task. For each pixel in the image, if the pixel belongs to the coronary artery, it is marked as 1 on the corresponding mask; otherwise it is marked as 0.
[0121] The Transformer encoder, image reconstruction module, and feature comparison module are jointly pre-trained based on each pre-processed CCTA image;
[0122] The coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using the segmentation masks of each labeled CCTA image;
[0123] After preprocessing the CCTA image to be segmented (which can be obtained using the CCTA image acquisition module) using the CCTA image preprocessing module, the preprocessed CCTA image to be segmented is then segmented into coronary arteries using the trained Transformer encoder and coronary artery segmentation module to obtain the coronary artery segmentation result of the CCTA image to be segmented.
[0124] The pericoronary adipose tissue segmentation module is used to process the coronary artery segmentation results of the CCTA image to be segmented, and obtain the pericoronary adipose tissue segmentation results of the CCTA image to be segmented.
[0125] Specific Implementation Method Two: This implementation method differs from Specific Implementation Method One in that the CCTA image preprocessing module crops and standardizes each CCTA image in the training set, unifying each CCTA image to the same size.
[0126] The other steps and parameters are the same as in Specific Implementation Method 1.
[0127] CCTA images are obtained using computed tomography (CT) technology. By sequentially cropping and standardizing the CCTA images, a set of CCTA images, each with a size of 512×512×256, can be obtained. Furthermore, depending on the actual needs, each CCTA image can be standardized to other sizes besides 512×512×256.
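The text does not say how cropping and standardization are done. A minimal sketch, assuming center cropping or symmetric padding to the target size (padding with a hypothetical value of -1000 HU, i.e. air) followed by z-score intensity standardization; the raw HU values presumably still need to be kept for the HU thresholding in the pericoronary adipose tissue module:

```python
import numpy as np

def crop_or_pad(vol, target, pad_value):
    """Center-crop or symmetrically pad each axis of vol to the target size."""
    out = vol
    for axis, size in enumerate(target):
        cur = out.shape[axis]
        if cur > size:                                    # center crop
            start = (cur - size) // 2
            out = np.take(out, np.arange(start, start + size), axis=axis)
        elif cur < size:                                  # pad both sides
            before = (size - cur) // 2
            widths = [(0, 0)] * out.ndim
            widths[axis] = (before, size - cur - before)
            out = np.pad(out, widths, constant_values=pad_value)
    return out

def preprocess(vol_hu, target=(512, 512, 256), pad_value=-1000.0):
    """Unify the size, then z-score standardize the intensities (assumed meaning)."""
    vol = crop_or_pad(vol_hu.astype(np.float32), target, pad_value)
    return (vol - vol.mean()) / (vol.std() + 1e-8)
```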
[0128] Specific Implementation Method 3: This implementation method differs from Specific Implementation Method 1 or 2 in that the Transformer encoder includes 12 encoding units connected in series.
[0129] For any preprocessed CCTA image, the preprocessed CCTA image is uniformly divided into non-overlapping patches (slices), and the size of each patch is P×P×P, where P is a predefined patch resolution.
[0130] The resulting patches are flattened and projected through a linear projection layer (i.e., the flattened result is projected into an embedding space to create a sequence $X \in \mathbb{R}^{N \times d}$, where N is the number of patches and d is the embedding dimension). The output of the linear projection layer is then passed through the position embedding layer (which is crucial for learning the relative positions of the input patches); that is, position embedding is applied to the output of the linear projection layer to obtain the position embedding result for each patch.
[0131] The position embedding results of all patches are then used as input to the Transformer encoder. Within the Transformer encoder, the position embedding results of all patches are sequentially passed through 12 cascaded coding units, and the feature maps output by the 3rd, 6th, 9th, and 12th coding units are stored. That is, for each patch, a set of feature maps is obtained.
[0132] Other steps and parameters are the same as in specific implementation method one or two.
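The patch partition, linear projection, and position embedding of [0129] and [0130] can be sketched as follows, assuming a single-channel volume, a learned additive position embedding (one row per patch position, as in ViT; the text does not specify the type), and toy sizes with random placeholder weights:

```python
import numpy as np

def patchify(vol, p):
    """Split a (D, H, W) volume into non-overlapping P x P x P patches, each flattened."""
    d, h, w = vol.shape
    if d % p or h % p or w % p:
        raise ValueError("each volume dimension must be divisible by P")
    x = vol.reshape(d // p, p, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)        # (D/P, H/P, W/P, P, P, P)
    return x.reshape(-1, p ** 3)             # (N, P^3), patches in raster order

def embed_patches(vol, p, w_proj, pos_emb):
    """Linear projection into R^{N x d}, then additive position embedding."""
    x = patchify(vol, p) @ w_proj            # X: (N, d)
    return x + pos_emb                       # pos_emb: (N, d), one row per patch position

rng = np.random.default_rng(0)
vol = rng.normal(size=(32, 32, 16))          # toy volume instead of 512 x 512 x 256
P, d = 16, 64
n = (32 // P) * (32 // P) * (16 // P)        # N = 4 patches
tokens = embed_patches(vol, P, rng.normal(0, 0.02, (P ** 3, d)), rng.normal(0, 0.02, (n, d)))
print(tokens.shape)                          # (4, 64)
```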
[0133] Specific Implementation Method Four: This implementation method differs from one of the specific implementation methods one to three in that each coding unit of the Transformer encoder includes a first normalization layer, a multi-head attention layer, a second normalization layer, and a multilayer perceptron.
[0134] Taking the first coding unit as an example:
[0135] Within the first coding unit, the input features first pass through the first normalization layer, and then the output of the first normalization layer is used as the input to the multi-head attention layer.
[0136] The output of the multi-head attention layer is then added to the input features of the first coding unit to obtain the sum A.
[0137] The summed result A then passes through a second normalization layer, and the output of the second normalization layer is used as the input of the multilayer perceptron.
[0138] The output of the multilayer perceptron is then added to A to obtain the sum B, which is then used as the output of the first coding unit.
[0139] The other steps and parameters are the same as those in one of the specific implementation methods one to three.
[0140] In this invention, the output of the first coding unit is used as the input of the second coding unit. The working process of the second coding unit is the same as that of the first coding unit. Then, the output of the second coding unit is used as the input of the third coding unit, and so on. The working process of each coding unit is the same. The feature maps output by each coding unit are stored, that is, a set of feature maps is obtained for each patch.
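The cascade described above (each unit's output feeds the next, with feature maps stored along the way) can be sketched independently of the unit internals. The default taps keep the outputs of the 3rd, 6th, 9th, and 12th coding units, which are the ones listed in [0131] and consumed by the image reconstruction module; passing `taps=range(1, 13)` stores every unit's output. The stand-in units in the usage line are placeholders, not real coding units:

```python
def run_encoder(tokens, units, taps=(3, 6, 9, 12)):
    """Pass tokens through cascaded coding units; keep the outputs of the tapped units.

    units: sequence of callables, one per coding unit (12 in this invention).
    taps: 1-based indices of units whose outputs are stored.
    """
    stored = {}
    x = tokens
    for idx, unit in enumerate(units, start=1):
        x = unit(x)                  # output of unit idx is the input of unit idx + 1
        if idx in taps:
            stored[idx] = x
    return stored

# Usage with stand-in units that just add 1 (placeholders for real coding units):
features = run_encoder(0, [lambda x: x + 1] * 12)
print(features)  # {3: 3, 6: 6, 9: 9, 12: 12}
```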
[0141] Specific Implementation Method Five: This embodiment is described with reference to Figure 2. It differs from one of Specific Implementation Methods One to Four in that the Transformer encoder, image reconstruction module, and feature comparison module are jointly pre-trained based on the preprocessed CCTA images. The specific pre-training process is as follows:
[0142] Step 1: Initialize the pre-training round counter t = 1;
[0143] Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch (i.e., each patch has a corresponding set of feature maps).
[0144] Step 3: For any patch, denote the patch as $z_i$, and take the set of feature maps corresponding to $z_i$, together with $z_i$ itself, as the input of the image reconstruction module;
[0145] The working process within the image reconstruction module is as follows:
[0146] (1) $z_i$ passes sequentially through the first convolutional layer, the first batch normalization (BN) layer, the first ReLU activation function layer, the second convolutional layer, the second BN layer, and the second ReLU activation function layer; that is:
[0147] $z_i$ first passes through the first convolutional layer, and the output of the first convolutional layer is used as the input of the first batch normalization (BN) layer;
[0148] The output of the first BN layer is used as the input of the first ReLU activation function layer;
[0149] The output of the first ReLU activation function layer is used as the input of the second convolutional layer;
[0150] Use the output of the second convolutional layer as the input of the second batch normalization layer;
[0151] The output of the second BN layer is used as the input of the second ReLU activation function layer, and the output of the second ReLU activation function layer is denoted as a;
[0152] (2) The feature map output by the third coding unit passes through the first deconvolution subunit, the second deconvolution subunit, and the third deconvolution subunit in sequence; that is:
[0153] The feature map output by the third coding unit first passes through the first deconvolution subunit, and the output of the first deconvolution subunit is used as the input of the second deconvolution subunit;
[0154] The output of the second deconvolution subunit is used as the input of the third deconvolution subunit;
[0155] Let b be the output of the third deconvolution subunit;
[0156] (3) The feature map output by the sixth coding unit passes through the fourth and fifth deconvolution subunits in sequence; that is:
[0157] The feature map output by the sixth coding unit first passes through the fourth deconvolution subunit, and the output of the fourth deconvolution subunit is used as the input of the fifth deconvolution subunit;
[0158] Let c be the output of the fifth deconvolution subunit;
[0159] (4) The feature map output by the ninth coding unit is passed through the sixth deconvolution subunit, and the output of the sixth deconvolution subunit is denoted as d;
[0160] (5) The feature map output by the twelfth coding unit is passed through the first upsampling unit (i.e., the deconvolution layer), and the output of the first upsampling unit is denoted as e;
[0161] (6) After concatenating the outputs e and d, the concatenated result is passed sequentially through the third convolutional layer, the third batch normalization (BN) layer, the third ReLU activation function layer, the fourth convolutional layer, the fourth BN layer, and the fourth ReLU activation function layer, i.e.
[0162] The concatenated result first passes through the third convolutional layer, and the output of the third convolutional layer is used as the input of the third batch normalization (BN) layer.
[0163] The output of the third BN layer is used as the input of the third ReLU activation function layer;
[0164] The output of the third ReLU activation function layer is used as the input of the fourth convolutional layer;
[0165] Use the output of the fourth convolutional layer as the input of the fourth batch normalization layer;
[0166] Use the output of the fourth BN layer as the input of the fourth ReLU activation function layer;
[0167] The output of the fourth ReLU activation function layer is then passed through the second upsampling unit, and the output of the second upsampling unit is denoted as i'.
[0168] (7) After concatenating the outputs c and i', the concatenated result is passed sequentially through the fifth convolutional layer, the fifth batch normalization (BN) layer, the fifth ReLU activation function layer, the sixth convolutional layer, the sixth BN layer, and the sixth ReLU activation function layer, i.e.
[0169] The concatenated result first passes through the fifth convolutional layer, and the output of the fifth convolutional layer is used as the input of the fifth batch normalization (BN) layer.
[0170] Use the output of the fifth BN layer as the input of the fifth ReLU activation function layer;
[0171] The output of the fifth ReLU activation function layer is used as the input of the sixth convolutional layer;
[0172] Use the output of the sixth convolutional layer as the input of the sixth batch normalization layer;
[0173] Use the output of the sixth BN layer as the input of the sixth ReLU activation function layer;
[0174] The output of the sixth ReLU activation function layer is then passed through the third upsampling unit, and the output of the third upsampling unit is denoted as j'.
[0175] (8) After concatenating the outputs b and j', the concatenated result is passed sequentially through the seventh convolutional layer, the seventh batch normalization (BN) layer, the seventh ReLU activation function layer, the eighth convolutional layer, the eighth BN layer, and the eighth ReLU activation function layer, i.e.
[0176] The concatenated result first passes through the seventh convolutional layer, and the output of the seventh convolutional layer is used as the input of the seventh batch normalization (BN) layer.
[0177] Use the output of the seventh BN layer as the input of the seventh ReLU activation function layer;
[0178] Use the output of the seventh ReLU activation function layer as the input of the eighth convolutional layer;
[0179] Use the output of the eighth convolutional layer as the input of the eighth batch normalization layer;
[0180] Use the output of the eighth BN layer as the input of the eighth ReLU activation function layer;
[0181] The output of the eighth ReLU activation function layer is then passed through the fourth upsampling unit, and the output of the fourth upsampling unit is denoted as k'.
[0182] (9) After concatenating the outputs k' and a, the concatenated result is passed sequentially through the ninth convolutional layer, the ninth batch normalization (BN) layer, the ninth ReLU activation function layer, the tenth convolutional layer, the tenth BN layer, and the tenth ReLU activation function layer, i.e.
[0183] The concatenated result first passes through the ninth convolutional layer, and the output of the ninth convolutional layer is used as the input of the ninth batch normalization (BN) layer.
[0184] Use the output of the ninth BN layer as the input of the ninth ReLU activation function layer;
[0185] Use the output of the ninth ReLU activation function layer as the input of the tenth convolutional layer;
[0186] Use the output of the tenth convolutional layer as the input of the tenth batch normalization layer;
[0187] Use the output of the tenth BN layer as the input of the tenth ReLU activation function layer;
[0188] Then, the output of the tenth ReLU activation function layer is used as the input of the eleventh convolutional layer, and the output of the eleventh convolutional layer is taken as the reconstruction result of $z_i$.
[0189] Step 4: Input the stored set of feature maps corresponding to $z_i$ into the feature comparison module. Within the feature comparison module, the feature map corresponding to $z_i$ passes sequentially through a global average pooling layer and a linear projection layer to obtain the feature embedding $z'_i$ of $z_i$;
[0190] Step 5: Calculate the total loss function $L$ based on the reconstruction results and the feature embedding $z'_i$, perform backpropagation based on $L$, and adjust the network parameters in the Transformer encoder, image reconstruction module, and feature comparison module.
[0191] When the total loss function $L$ converges, stop the pre-training of round t and proceed to Step 6;
[0192] Step 6: Determine whether the maximum number of pre-training rounds has been reached;
[0193] If so, the final pre-training result is obtained;
[0194] If not, re-patch each preprocessed CCTA image, set t = t + 1, and return to Step 2.
[0195] The other steps and parameters are the same as those in one of the specific implementation methods one to four.
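Steps (1) to (9) imply a fixed resolution bookkeeping between the encoder branches and the decoder. The sketch below checks it under assumptions the text does not state: every deconvolution subunit and upsampling unit doubles the spatial side length, every convolution block preserves it, and `grid` is the side of the token grid (1 for a single patch). The names `trace_decoder`, `grid` and `scale` are hypothetical. Under these assumptions the layout resembles a UNETR-style decoder, and the four concatenations only line up when the patch side is 16 pixels (four doublings).

```python
def trace_decoder(P=16, grid=1, scale=2):
    """Return the output side length and the side length at each concatenation.

    Assumes every deconvolution subunit / upsampling unit multiplies the side
    length by `scale` and every convolution block keeps it unchanged.
    """
    a = P * grid              # (1): conv blocks applied to z_i at full resolution
    b = grid * scale ** 3     # (2): 3rd-unit features through three deconv subunits
    c = grid * scale ** 2     # (3): 6th-unit features through two deconv subunits
    d = grid * scale          # (4): 9th-unit features through one deconv subunit
    e = grid * scale          # (5): 12th-unit features through the first upsampling unit
    steps = []

    def concat(x, y, name):
        # channel-wise concatenation requires equal spatial sizes
        if x != y:
            raise ValueError(f"{name}: side {x} does not match side {y}")
        steps.append((name, x))
        return x

    i_ = concat(e, d, "(6) e|d") * scale    # conv block, then 2nd upsampling -> i'
    j_ = concat(c, i_, "(7) c|i'") * scale  # conv block, then 3rd upsampling -> j'
    k_ = concat(b, j_, "(8) b|j'") * scale  # conv block, then 4th upsampling -> k'
    out = concat(k_, a, "(9) k'|a")         # conv block and 11th conv keep the size
    return out, steps


side, steps = trace_decoder()
print(side, steps)
```

With the defaults the concatenations in steps (6) to (9) happen at side lengths 2, 4, 8 and 16. Calling `trace_decoder(P=8)` raises `ValueError` at step (9), which shows why the patch resolution and the number of upsampling stages have to agree.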
[0196] Specific Implementation Method Six: This implementation method differs from Specific Implementation Methods One to Five in that the first deconvolution subunit includes a deconvolution layer, a convolution layer, a BN layer, and a ReLU activation function layer in sequence. That is, in the first deconvolution subunit, the input feature map first passes through the deconvolution layer, and then the output of the deconvolution layer is used as the input of the convolution layer.
[0197] Use the output of the convolutional layer as the input of the BN layer;
[0198] Use the output of the BN layer as the input to the ReLU activation function layer;
[0199] The output of the ReLU activation function layer is used as the output of the first deconvolution subunit.
[0200] The other steps and parameters are the same as those in one of the specific implementation methods one to five.
[0201] The structure and operation of the second, third, fourth, fifth, and sixth deconvolution subunits are the same as those of the first deconvolution subunit.
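A minimal NumPy sketch of one deconvolution subunit (deconvolution layer, convolution layer, BN layer, ReLU layer). It assumes a 2x2 transposed convolution with stride 2, a 3x3 convolution with zero padding, and BN without learned scale or shift computed on a batch of one; none of these hyperparameters are given in the text, and the helper names are hypothetical. Arrays are channel-first (C, H, W).

```python
import numpy as np


def deconv2x2(x, w):
    """Transposed convolution, kernel 2, stride 2 (assumed).

    x: (C_in, H, W); w: (C_in, C_out, 2, 2) -> (C_out, 2H, 2W).
    With stride equal to the kernel size the output windows do not overlap,
    so each input pixel is expanded into its own 2x2 block.
    """
    _, h, wd = x.shape
    out = np.einsum("chw,cokl->ohkwl", x, w)  # (C_out, H, 2, W, 2)
    return out.reshape(w.shape[1], 2 * h, 2 * wd)


def conv3x3(x, w):
    """3x3 convolution (cross-correlation), zero padding 1: (C_in,H,W) -> (C_out,H,W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def batch_norm(x, eps=1e-5):
    """Per-channel normalization with batch statistics (batch of one, gamma=1, beta=0)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def deconv_subunit(x, w_deconv, w_conv):
    """Deconvolution layer -> convolution layer -> BN layer -> ReLU layer."""
    return np.maximum(batch_norm(conv3x3(deconv2x2(x, w_deconv), w_conv)), 0.0)


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))  # hypothetical 8-channel 4x4 encoder feature map
out = deconv_subunit(feat, rng.normal(size=(8, 4, 2, 2)), rng.normal(size=(4, 4, 3, 3)))
print(out.shape)  # (4, 8, 8)
```

Chaining three such subunits, as for the third coding unit's output, would multiply the side length by 8 under the same stride assumption.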
[0202] Specific Implementation Method Seven: This implementation method differs from Specific Implementation Methods One to Six in that the specific process of step 4 is as follows:
[0203] $$z'_i = h(g(f))$$
[0204] where $f$ is the feature map of $z_i$, $g(f)$ is the feature vector output by the global average pooling layer, $h$ denotes the linear projection layer, and $z'_i$ is the feature embedding output by the linear projection layer.
[0205] The other steps and parameters are the same as those in one of the specific implementation methods one to six.
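The mapping $z'_i = h(g(f))$ can be written out directly. This sketch assumes a channel-first (C, H, W) feature map and a projection matrix `W` of shape (D, C) with bias `b`; both are hypothetical placeholders for learned parameters, and the function names are illustrative. For token-shaped Transformer features of shape (tokens, C), the pooling would average over the token axis instead.

```python
import numpy as np


def global_average_pool(f):
    """g(f): average a (C, H, W) feature map over its spatial axes -> (C,)."""
    return f.mean(axis=(1, 2))


def linear_projection(v, W, b):
    """h(v) = W v + b, with W of shape (D, C) and b of shape (D,)."""
    return W @ v + b


def feature_embedding(f, W, b):
    """z'_i = h(g(f)), where f is the feature map of patch z_i."""
    return linear_projection(global_average_pool(f), W, b)


f = np.stack([np.full((4, 4), float(c)) for c in range(3)])  # channel c is constant c
print(feature_embedding(f, np.eye(3), np.zeros(3)))  # [0. 1. 2.]
```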
[0206] Specific Implementation Method Eight: This implementation method differs from Specific Implementation Methods One through Seven in that the total loss function L is:
[0207] $$L = L_{recon} + L_{contrast}$$
[0208] where $L_{recon}$ is the reconstruction loss and $L_{contrast}$ is the patch contrast loss;
[0209]
[0210] where $M$ is the set of all patches and $\|\cdot\|_2$ denotes the L2 norm;
[0211]
[0212] where $z'_i$ is the feature embedding of $z_i$; $z^+$ is the feature embedding of another patch from the same CCTA image as $z_i$; $\mathrm{sim}(z'_i, z^+)$ denotes the similarity between $z'_i$ and $z^+$; $N(i)$ denotes the set of all patches except $z_i$; $\mathrm{sim}(z'_i, z'_j)$ denotes the similarity between $z'_i$ and $z'_j$; and $z'_j$ is the feature embedding of $z_j$.
[0213] The other steps and parameters are the same as those in any of the specific implementation methods one to seven.
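Because the formulas for $L_{recon}$ and $L_{contrast}$ are not reproduced in this text, the following is only one plausible reading of the definitions above: a mean squared L2 reconstruction error over the patch set $M$, and an InfoNCE-style contrast loss with cosine similarity as sim(·,·). The temperature `tau`, the choice of the first same-image patch as $z^+$, and all function names are assumptions for illustration.

```python
import numpy as np


def reconstruction_loss(patches, recon):
    """Mean over the patch set M of the squared L2 reconstruction error (assumed form)."""
    diff = (patches - recon).reshape(len(patches), -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def patch_contrast_loss(z, image_ids, tau=0.1):
    """InfoNCE-style patch contrast loss (assumed form).

    z: (n, D) embeddings z'_i; image_ids[i] identifies the CCTA image of patch i.
    The positive z+ is the first other patch from the same image, and the
    denominator runs over N(i), i.e. every patch except i.
    """
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    s = (zn @ zn.T) / tau  # cosine similarity scaled by the temperature
    losses = []
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]  # N(i)
        positives = [j for j in others if image_ids[j] == image_ids[i]]
        if not positives:
            continue  # no same-image patch available for this anchor
        logits = s[i, others]
        m = logits.max()
        log_denominator = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
        losses.append(log_denominator - s[i, positives[0]])
    return float(np.mean(losses)) if losses else 0.0


def total_loss(patches, recon, z, image_ids):
    """L = L_recon + L_contrast."""
    return reconstruction_loss(patches, recon) + patch_contrast_loss(z, image_ids)


z = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
aligned = patch_contrast_loss(z, [0, 0, 1, 1])     # same-image patches are similar
mismatched = patch_contrast_loss(z, [0, 1, 0, 1])  # same-image patches are dissimilar
print(aligned < mismatched)  # True
```

The contrast term is lowest when patches from the same CCTA image map to nearby embeddings, which is the behavior the patch contrast loss is meant to encourage.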
[0214] Specific Implementation Method Nine: This implementation method differs from Specific Implementation Methods One through Eight in that the coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using segmentation masks from each labeled CCTA image. The specific training process is as follows:
[0215] Step 1: Initialize the training round p = 1;
[0216] Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch.
[0217] Step 3: For any patch, denote it as $z_i$; take the set of feature maps corresponding to $z_i$ (the feature maps output by the third, sixth, ninth, and twelfth coding units), together with $z_i$ itself, as input to the coronary artery segmentation module;
[0218] Step 4: Calculate the segmentation loss L′ based on the coronary artery segmentation results predicted by the coronary artery segmentation module and the segmentation mask. Perform backpropagation based on the segmentation loss L′ and adjust the network parameters in the Transformer encoder and the coronary artery segmentation module until the segmentation loss L′ converges, then stop training in the p-th round.
[0219] The segmentation loss L′ is:
[0220] $$L' = L_{CE} + L_{Dice}$$
[0221] where $L_{CE}$ is the cross-entropy loss and $L_{Dice}$ is the Dice loss;
[0222]
[0223] where $N$ is the set of all patches corresponding to the labeled CCTA images, $K$ is the number of pixels in a patch, and the per-pixel terms denote, respectively, the ground-truth label of the $k$-th pixel of $z_i$ in the segmentation mask and the prediction for the $k$-th pixel of $z_i$;
[0224]
[0225] Step 5: Determine if the maximum number of training rounds has been reached;
[0226] If so, the final training result is obtained;
[0227] If not, re-patch each preprocessed CCTA image, set p = p + 1, and return to Step 2.
[0228] The other steps and parameters are the same as those in one of the specific implementation methods one to eight.
[0229] It should be noted that the structure of the coronary artery segmentation module is roughly the same as that of the image reconstruction module. Specifically, for the part before the tenth ReLU activation function layer, the structure of the coronary artery segmentation module is exactly the same as that of the image reconstruction module. The only difference is that in the coronary artery segmentation module, the output of the tenth ReLU activation function layer passes through the upsampling unit, the ReLU activation function layer and the softmax activation function layer in sequence, and the coronary artery segmentation result is output through the softmax activation function layer.
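The segmentation head described above (upsampling unit, ReLU, softmax) can be sketched as follows. It assumes the upsampling unit is a 2x2 transposed convolution with stride 2 and that there are two output classes (background and coronary artery); these assumptions, the channel counts and the function names are hypothetical. Because the ReLU comes before the softmax, as described, every class score entering the softmax is non-negative.

```python
import numpy as np


def upsample_unit(x, w):
    """Transposed convolution, kernel 2, stride 2 (assumed): (C_in,H,W) -> (C_out,2H,2W)."""
    _, h, wd = x.shape
    return np.einsum("chw,cokl->ohkwl", x, w).reshape(w.shape[1], 2 * h, 2 * wd)


def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract the max for stability
    return e / e.sum(axis=axis, keepdims=True)


def segmentation_head(feat, w_up):
    """Upsampling unit -> ReLU -> softmax over the class axis."""
    scores = np.maximum(upsample_unit(feat, w_up), 0.0)
    return softmax(scores, axis=0)


rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 8, 8))                               # hypothetical tenth-ReLU output
probs = segmentation_head(feat, rng.normal(size=(16, 2, 2, 2)))  # 2 classes
mask = probs.argmax(axis=0)                                       # per-pixel class label
print(probs.shape, mask.shape)  # (2, 16, 16) (16, 16)
```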
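[0230] Dice loss measures the overlap between the predicted segmentation and the ground-truth mask, so it is less sensitive than pixel-wise cross-entropy to the small fraction of pixels occupied by the coronary arteries. Combining cross-entropy loss with Dice loss can therefore help the segmentation model converge better during training and improve segmentation accuracy.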
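A sketch of $L' = L_{CE} + L_{Dice}$ for binary coronary artery segmentation, assuming the usual pixel-wise binary cross-entropy averaged over all $|N| \times K$ pixels and the soft Dice loss $1 - 2\sum yp / (\sum y + \sum p)$. The exact formulas are not reproduced in this text, so these forms, the smoothing constants and the function names are assumptions.

```python
import numpy as np


def cross_entropy_loss(p, y, eps=1e-7):
    """Binary cross-entropy averaged over all |N| * K pixels (assumed form).

    p: (n_patches, K) predicted foreground probabilities; y: same shape, labels in {0, 1}.
    """
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def dice_loss(p, y, eps=1e-6):
    """Soft Dice loss 1 - 2*sum(p*y) / (sum(p) + sum(y)) over all pixels (assumed form)."""
    inter = np.sum(p * y)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps))


def segmentation_loss(p, y):
    """L' = L_CE + L_Dice."""
    return cross_entropy_loss(p, y) + dice_loss(p, y)


y = np.array([[1.0, 0.0, 1.0, 0.0]])  # one patch with K = 4 pixels
print(segmentation_loss(y, y) < 1e-3, segmentation_loss(1 - y, y) > 1.0)  # True True
```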
[0231] Specific Implementation Method Ten: This embodiment is described below with reference to Figure 3. It differs from Specific Implementation Methods One through Nine in that the working process of the pericoronary adipose tissue segmentation module is as follows:
[0232] Step 1: Extract the coronary artery centerline from the coronary artery foreground in the segmentation result using a skeletonization algorithm;
[0233] Step 2: Based on the extracted coronary artery centerline, draw the normal vector of the coronary artery centerline through each point on the coronary artery centerline. The obtained normal vector is the direction of coronary artery lumen expansion at the corresponding point on the coronary artery centerline. Then calculate the distance from each point along the normal vector to the wall of the coronary artery lumen, which is the coronary radius at each point on the coronary artery centerline.
[0234] Step 3: Dilate the coronary arteries so that, at each point on the coronary artery centerline, the radius of the dilated coronary artery is double the original radius.
[0235] Step 4: Smooth the dilated coronary arteries with a Gaussian filtering algorithm, and then perform threshold segmentation on the smoothed result: pixels with gray values in the range [-190 HU, -30 HU] are taken as pericoronary adipose tissue pixels, which yields the pericoronary adipose tissue segmentation result.
[0236] The other steps and parameters are the same as those in any of the specific implementation methods one to nine.
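The four steps above can be prototyped on a 2D slice with NumPy alone. This is a simplified sketch, not the patented implementation: it assumes Zhang-Suen thinning as the skeletonization algorithm, estimates the centerline normal from the principal direction of nearby skeleton pixels, takes the radius as the mean wall distance along +n and -n, builds the dilated region as a union of discs of twice the local radius, smooths it with a Gaussian (sigma = 1, re-binarized at 0.5), and keeps pixels in [-190 HU, -30 HU]. All helper names and parameter values are hypothetical, and a real CCTA pipeline would work on 3D volumes.

```python
import numpy as np


def zhang_suen_skeleton(mask):
    """Step 1 (2D stand-in): Zhang-Suen thinning of a binary vessel mask."""
    img = mask.astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for sub in (0, 1):
            p = np.pad(img, 1)
            # neighbours P2..P9, clockwise from north
            n = [p[:-2, 1:-1], p[:-2, 2:], p[1:-1, 2:], p[2:, 2:],
                 p[2:, 1:-1], p[2:, :-2], p[1:-1, :-2], p[:-2, :-2]]
            count = sum(n)  # number of foreground neighbours
            trans = sum((n[k] == 0) & (n[(k + 1) % 8] == 1) for k in range(8))  # 0->1 transitions
            p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
            if sub == 0:
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            remove = (img == 1) & (count >= 2) & (count <= 6) & (trans == 1) & cond
            if remove.any():
                img[remove] = 0
                changed = True
    return img.astype(bool)


def normal_at(pts, idx, win=3.0):
    """Step 2a: unit normal from the principal direction of nearby centerline points."""
    near = pts[np.linalg.norm(pts - pts[idx], axis=1) <= win].astype(float)
    if len(near) < 2:
        return np.array([1.0, 0.0])
    _, _, vt = np.linalg.svd(near - near.mean(axis=0), full_matrices=False)
    t = vt[0]  # tangent estimate
    return np.array([-t[1], t[0]])


def radius_along_normal(mask, pt, nrm, step=0.5, max_r=50.0):
    """Step 2b: distance from a centerline point to the lumen wall, averaged over +/- normal."""
    dists = []
    for sign in (1.0, -1.0):
        r = 0.0
        while r < max_r:
            q = np.rint(pt + sign * (r + step) * nrm).astype(int)
            inside = 0 <= q[0] < mask.shape[0] and 0 <= q[1] < mask.shape[1]
            if not inside or not mask[q[0], q[1]]:
                break
            r += step
        dists.append(r)
    return float(np.mean(dists))


def dilate_by_radius(shape, pts, radii, factor=2.0):
    """Step 3: union of discs of radius factor * r around each centerline point."""
    yy, xx = np.indices(shape)
    region = np.zeros(shape, dtype=bool)
    for (y, x), r in zip(pts, radii):
        region |= (yy - y) ** 2 + (xx - x) ** 2 <= (factor * r) ** 2
    return region


def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian filter; each image side must exceed the kernel length."""
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    out = img.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="same")
    return out


def pcat_mask(hu, vessel_mask, sigma=1.0):
    """Steps 1-4: centerline, radii, doubled-radius region, smoothing, HU threshold."""
    pts = np.argwhere(zhang_suen_skeleton(vessel_mask))
    radii = [radius_along_normal(vessel_mask, p, normal_at(pts, i)) for i, p in enumerate(pts)]
    region = gaussian_smooth(dilate_by_radius(hu.shape, pts, radii), sigma) >= 0.5
    return region & (hu >= -190) & (hu <= -30)


hu = np.full((30, 50), -100.0)            # hypothetical fat-like background, in HU
vessel = np.zeros((30, 50), dtype=bool)
vessel[12:19, 5:45] = True                # 7-pixel-wide synthetic "artery"
hu[vessel] = 350.0                        # contrast-enhanced lumen
fat = pcat_mask(hu, vessel)
print(bool(fat.any()), bool((fat & vessel).any()))  # True False
```

On the synthetic example the fat mask is non-empty and excludes every lumen pixel, because the contrast-enhanced lumen (350 HU here) lies outside the fat window.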
[0237] The above examples of the present invention are merely illustrative of the computational model and process of the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is impossible to exhaustively list all possible implementations here. Any obvious variations or modifications derived from the technical solutions of the present invention are still within the scope of protection of the present invention.
Claims
1. A coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning, characterized in that the system includes a CCTA image acquisition module, a CCTA image preprocessing module, a CCTA image segmentation mask extraction module, a Transformer encoder, an image reconstruction module, a feature comparison module, a coronary artery segmentation module, and a pericoronary adipose tissue segmentation module; wherein: The CCTA image acquisition module is used to acquire a CCTA image training set, which includes labeled CCTA images and unlabeled CCTA images. The CCTA image preprocessing module is used to preprocess each CCTA image in the acquired training set to obtain preprocessed CCTA images. The CCTA image segmentation mask extraction module is used to extract the segmentation mask from the acquired labeled CCTA images, and obtain the segmentation mask for each labeled CCTA image. The Transformer encoder, image reconstruction module, and feature comparison module are jointly pre-trained based on the preprocessed CCTA images; the specific pre-training process is as follows: Step 1: Initialize the pre-training round t = 1; Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch. Step 3: For any patch, denote the patch as $z_i$, and take the stored set of feature maps corresponding to $z_i$, together with $z_i$, as input to the image reconstruction module; the working process within the image reconstruction module is as follows: (1) $z_i$ passes sequentially through the first convolutional layer, the first batch normalization (BN) layer, the first ReLU activation function layer, the second convolutional layer, the second BN layer, and the second ReLU activation function layer; that is:
$z_i$ first passes through the first convolutional layer, and the output of the first convolutional layer is used as the input of the first batch normalization (BN) layer. The output of the first BN layer is used as the input of the first ReLU activation function layer; The output of the first ReLU activation function layer is used as the input of the second convolutional layer; Use the output of the second convolutional layer as the input of the second batch normalization layer; The output of the second BN layer is used as the input of the second ReLU activation function layer, and the output of the second ReLU activation function layer is denoted as a; (2) The feature map output by the third coding unit passes through the first deconvolution subunit, the second deconvolution subunit, and the third deconvolution subunit in sequence; that is: The feature map output by the third coding unit first passes through the first deconvolution subunit, and the output of the first deconvolution subunit is used as the input of the second deconvolution subunit. The output of the second deconvolution subunit is used as the input of the third deconvolution subunit; Let b be the output of the third deconvolution subunit; (3) The feature map output by the sixth coding unit passes through the fourth and fifth deconvolution subunits in sequence; that is: The feature map output by the sixth coding unit first passes through the fourth deconvolution subunit, and the output of the fourth deconvolution subunit is used as the input of the fifth deconvolution subunit.
Let c be the output of the fifth deconvolution subunit; (4) The feature map output by the ninth coding unit is passed through the sixth deconvolution subunit, and the output of the sixth deconvolution subunit is denoted as d; (5) The feature map output by the twelfth coding unit is passed through the first upsampling unit, and the output of the first upsampling unit is denoted as e; (6) After concatenating the outputs e and d, the concatenated result is passed sequentially through the third convolutional layer, the third batch normalization (BN) layer, the third ReLU activation function layer, the fourth convolutional layer, the fourth BN layer, and the fourth ReLU activation function layer, i.e. The concatenated result first passes through the third convolutional layer, and the output of the third convolutional layer is used as the input of the third batch normalization (BN) layer. The output of the third BN layer is used as the input of the third ReLU activation function layer; The output of the third ReLU activation function layer is used as the input of the fourth convolutional layer; Use the output of the fourth convolutional layer as the input of the fourth batch normalization layer; Use the output of the fourth BN layer as the input of the fourth ReLU activation function layer; The output of the fourth ReLU activation function layer is then passed through the second upsampling unit, and the output of the second upsampling unit is denoted as i'. (7) After concatenating the output c and the output i', the concatenated result is passed sequentially through the fifth convolutional layer, the fifth batch normalization (BN) layer, the fifth ReLU activation function layer, the sixth convolutional layer, the sixth BN layer, and the sixth ReLU activation function layer, i.e. The concatenated result first passes through the fifth convolutional layer, and the output of the fifth convolutional layer is used as the input of the fifth batch normalization (BN) layer. 
Use the output of the fifth BN layer as the input of the fifth ReLU activation function layer; The output of the fifth ReLU activation function layer is used as the input of the sixth convolutional layer; Use the output of the sixth convolutional layer as the input of the sixth batch normalization layer; Use the output of the sixth BN layer as the input of the sixth ReLU activation function layer; The output of the sixth ReLU activation function layer is then passed through the third upsampling unit, and the output of the third upsampling unit is denoted as j'. (8) After concatenating the outputs b and j', the concatenated result is passed sequentially through the seventh convolutional layer, the seventh BN layer, the seventh ReLU activation function layer, the eighth convolutional layer, the eighth BN layer, and the eighth ReLU activation function layer, i.e. The concatenated result first passes through the seventh convolutional layer, and the output of the seventh convolutional layer is used as the input of the seventh batch normalization (BN) layer. Use the output of the seventh BN layer as the input of the seventh ReLU activation function layer; Use the output of the seventh ReLU activation function layer as the input of the eighth convolutional layer; Use the output of the eighth convolutional layer as the input of the eighth batch normalization layer; Use the output of the eighth BN layer as the input of the eighth ReLU activation function layer; The output of the eighth ReLU activation function layer is then passed through the fourth upsampling unit, and the output of the fourth upsampling unit is denoted as k'. (9) After concatenating the outputs k' and a, the concatenated result is passed sequentially through the ninth convolutional layer, the ninth BN layer, the ninth ReLU activation function layer, the tenth convolutional layer, the tenth BN layer, and the tenth ReLU activation function layer, i.e. 
The concatenated result first passes through the ninth convolutional layer, and the output of the ninth convolutional layer is used as the input of the ninth batch normalization (BN) layer. Use the output of the ninth BN layer as the input of the ninth ReLU activation function layer; Use the output of the ninth ReLU activation function layer as the input of the tenth convolutional layer; Use the output of the tenth convolutional layer as the input of the tenth batch normalization layer; Use the output of the tenth BN layer as the input of the tenth ReLU activation function layer; Then, the output of the tenth ReLU activation function layer is used as the input of the eleventh convolutional layer, and the output of the eleventh convolutional layer is taken as the reconstruction result of $z_i$; Step 4: Input the stored set of feature maps corresponding to $z_i$ into the feature comparison module. Within the feature comparison module, the feature maps corresponding to $z_i$ pass sequentially through a global average pooling layer and a linear projection layer to obtain the feature embedding $z'_i$ of $z_i$; Step 5: Calculate the total loss function $L$ based on the reconstruction results and the feature embedding $z'_i$, perform backpropagation according to $L$, and adjust the network parameters in the Transformer encoder, image reconstruction module, and feature comparison module; when $L$ converges, stop the pre-training of round t and proceed to Step 6; Step 6: Determine whether the maximum number of pre-training rounds has been reached; if so, the final pre-training result is obtained; if not, re-patch each preprocessed CCTA image, set t = t + 1, and return to Step 2.
The coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using the segmentation masks of each labeled CCTA image; After preprocessing the CCTA image to be segmented using the CCTA image preprocessing module, the preprocessed CCTA image to be segmented is segmented into coronary arteries using the trained Transformer encoder and coronary artery segmentation module to obtain the coronary artery segmentation result of the CCTA image to be segmented. The pericoronary adipose tissue segmentation module is used to process the coronary artery segmentation results of the CCTA image to be segmented, and obtain the pericoronary adipose tissue segmentation results of the CCTA image to be segmented.
2. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 1, characterized in that the CCTA image preprocessing module crops and standardizes each CCTA image in the training set, unifying all CCTA images to the same size.
3. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 2, characterized in that the Transformer encoder includes 12 coding units connected in series; For any preprocessed CCTA image, divide the preprocessed CCTA image evenly into non-overlapping patches, each patch having a size of [size missing], which is a predefined patch resolution; The obtained patches are flattened and passed through a linear projection layer. The output of the linear projection layer is then passed through a position embedding layer, that is, position embedding is applied to the output of the linear projection layer to obtain the position embedding result for each patch. The position embedding results of all patches are then used as input to the Transformer encoder. Within the Transformer encoder, the position embedding results of all patches pass sequentially through the 12 cascaded coding units, and the feature maps output by the 3rd, 6th, 9th, and 12th coding units are stored; that is, a set of feature maps is obtained for each patch.
4. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 3, characterized in that each coding unit of the Transformer encoder includes a first normalization layer, a multi-head attention layer, a second normalization layer, and a multilayer perceptron. Taking the first coding unit as an example: within the first coding unit, the input features first pass through the first normalization layer, and the output of the first normalization layer is used as the input to the multi-head attention layer. Then, the output of the multi-head attention layer is added to the input features of the first coding unit to obtain the sum A; A then passes through the second normalization layer, and the output of the second normalization layer is used as the input of the multilayer perceptron. Finally, the output of the multilayer perceptron is added to A to obtain the sum B, which is used as the output of the first coding unit.
5. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 4, characterized in that the first deconvolution subunit includes a deconvolution layer, a convolution layer, a BN layer and a ReLU activation function layer in sequence. That is, in the first deconvolution subunit, the input feature map first passes through the deconvolution layer, and then the output of the deconvolution layer is used as the input of the convolution layer; Use the output of the convolutional layer as the input of the BN layer; Use the output of the BN layer as the input to the ReLU activation function layer; The output of the ReLU activation function layer is used as the output of the first deconvolution subunit.
6. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 5, characterized in that the specific process of step 4 is as follows: $z'_i = h(g(f))$, where $f$ is the feature map of $z_i$, $g(f)$ is the feature vector output by the global average pooling layer, $h$ denotes the linear projection layer, and $z'_i$ is the feature embedding output by the linear projection layer.
7. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 6, characterized in that the total loss function $L$ is $L = L_{recon} + L_{contrast}$, where $L_{recon}$ is the reconstruction loss and $L_{contrast}$ is the patch contrast loss; in the reconstruction loss, $M$ is the set of all patches and $\|\cdot\|_2$ is the L2 norm; in the patch contrast loss, $z'_i$ is the feature embedding of $z_i$, $z^+$ is the feature embedding of another patch from the same CCTA image as $z_i$, $\mathrm{sim}(z'_i, z^+)$ denotes the similarity between $z'_i$ and $z^+$, $N(i)$ denotes the set of all patches except $z_i$, $\mathrm{sim}(z'_i, z'_j)$ denotes the similarity between $z'_i$ and $z'_j$, and $z'_j$ is the feature embedding of $z_j$.
8. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 7, characterized in that the coronary artery segmentation module and the pre-trained Transformer encoder are jointly trained using segmentation masks from each labeled CCTA image. The specific training process is as follows: Step 1: Initialize the training round p = 1; Step 2: Flatten the patches corresponding to each preprocessed CCTA image, and then pass each patch through a linear projection layer, a position embedding layer, and a Transformer encoder. The Transformer encoder outputs a set of feature maps for each patch. Step 3: For any patch, denote it as $z_i$; take the set of feature maps corresponding to $z_i$, which includes the feature maps output by the third, sixth, ninth, and twelfth coding units, together with $z_i$, as input to the coronary artery segmentation module; Step 4: Calculate the segmentation loss $L'$ based on the coronary artery segmentation results predicted by the coronary artery segmentation module and the segmentation mask; perform backpropagation according to $L'$ and adjust the network parameters in the Transformer encoder and the coronary artery segmentation module until $L'$ converges, then stop training in the p-th round. The segmentation loss $L'$ is $L' = L_{CE} + L_{Dice}$, where $L_{CE}$ is the cross-entropy loss and $L_{Dice}$ is the Dice loss; in these losses, $N$ is the set of all patches corresponding to the labeled CCTA images, $K$ is the number of pixels in a patch, and the per-pixel terms denote, respectively, the ground-truth label of the $k$-th pixel of $z_i$ in the segmentation mask and the prediction for the $k$-th pixel of $z_i$; Step 5: Determine whether the maximum number of training rounds has been reached; if so, the final training result is obtained; if not, re-patch each preprocessed CCTA image, set p = p + 1, and return to Step 2.
9. The coronary artery and pericoronary adipose tissue segmentation system based on self-supervised learning according to claim 8, characterized in that the working process of the pericoronary adipose tissue segmentation module is as follows: Step 1: Extract the coronary artery centerline from the coronary artery foreground in the segmentation result using a skeletonization algorithm; Step 2: Based on the extracted coronary artery centerline, draw the normal vector of the coronary artery centerline through each point on the coronary artery centerline. The obtained normal vector is the direction of coronary artery lumen expansion at the corresponding point on the coronary artery centerline. Then calculate the distance from each point along the normal vector to the wall of the coronary artery lumen, which is the coronary radius at each point on the coronary artery centerline. Step 3: Dilate the coronary arteries so that, at each point on the coronary artery centerline, the radius of the dilated coronary artery is double the original radius. Step 4: Smooth the dilated coronary arteries with a Gaussian filtering algorithm, and then perform threshold segmentation on the smoothed result: pixels with gray values in the range [-190 HU, -30 HU] are taken as pericoronary adipose tissue pixels, which yields the pericoronary adipose tissue segmentation result.
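Claims 3 and 4 describe a ViT-style encoder: non-overlapping patches are flattened, linearly projected, given position embeddings, and passed through 12 pre-norm coding units ($A = x + \mathrm{MHA}(\mathrm{LN}(x))$, $B = A + \mathrm{MLP}(\mathrm{LN}(A))$), with the outputs of units 3, 6, 9 and 12 kept. The NumPy sketch below is illustrative only: it uses a 2D image in place of a CCTA volume, random weights in place of trained ones, a ReLU MLP with hidden width 4d (GELU is the more common choice), and arbitrary values P = 16, d = 64 and 4 heads; these values and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights stand in for trained parameters


def layer_norm(x, eps=1e-5):
    """Normalization over the feature axis (no learned scale/shift)."""
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


def multi_head_attention(x, p, heads):
    n, d = x.shape

    def split(t):  # (n, d) -> (heads, n, d // heads)
        return t.reshape(n, heads, d // heads).transpose(1, 0, 2)

    q, k, v = split(x @ p["Wq"]), split(x @ p["Wk"]), split(x @ p["Wv"])
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d // heads))  # (heads, n, n)
    return (att @ v).transpose(1, 0, 2).reshape(n, d) @ p["Wo"]


def coding_unit(x, p, heads):
    a = x + multi_head_attention(layer_norm(x), p, heads)          # sum A
    return a + np.maximum(layer_norm(a) @ p["W1"], 0.0) @ p["W2"]  # sum B (ReLU MLP)


def new_unit_params(d, hidden):
    shapes = {"Wq": (d, d), "Wk": (d, d), "Wv": (d, d), "Wo": (d, d),
              "W1": (d, hidden), "W2": (hidden, d)}
    return {name: rng.normal(0.0, 0.02, s) for name, s in shapes.items()}


def patchify(img, P):
    """Split an (H, W) image into non-overlapping P x P patches, each flattened."""
    H, W = img.shape
    assert H % P == 0 and W % P == 0, "image must divide evenly into P x P patches"
    return img.reshape(H // P, P, W // P, P).transpose(0, 2, 1, 3).reshape(-1, P * P)


def encode(img, P=16, d=64, heads=4, n_units=12):
    tokens = patchify(img, P)
    x = tokens @ rng.normal(0.0, 0.02, (P * P, d))   # linear projection layer
    x = x + rng.normal(0.0, 0.02, (len(tokens), d))  # position embedding (learned in practice)
    stored = {}
    for idx in range(1, n_units + 1):
        x = coding_unit(x, new_unit_params(d, 4 * d), heads)
        if idx in (3, 6, 9, 12):
            stored[idx] = x  # feature maps kept for the decoder
    return stored


feats = encode(rng.normal(size=(64, 64)))
print(sorted(feats), feats[12].shape)  # [3, 6, 9, 12] (16, 64)
```

Each stored array has one row per patch, so for a 64 x 64 input split into 16 x 16 patches it can be reshaped to a 4 x 4 token grid before entering the deconvolution branches.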