Mixed self-supervised learning method for malignancy degree of single pulmonary nodule based on CT image

By employing a hybrid self-supervised learning method based on CT images, combining contrastive and generative self-supervised learning, the problems of large training data requirements and feature loss in the classification of pulmonary granulomatous nodules and solid lung adenocarcinoma by deep learning are solved, achieving more accurate lesion feature extraction and classification.

CN119027369B · Active · Publication Date: 2026-06-19 · NORTHEASTERN UNIV CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NORTHEASTERN UNIV CHINA
Filing Date: 2024-07-12
Publication Date: 2026-06-19

Application Information

Patent Timeline

12 Jul 2024: Application

19 Jun 2026: Publication (CN119027369B)

IPC: G06T7/00; G06V10/774; G06V10/764; G06V10/77; G06V10/40; G06V10/82; G06N3/0455; G06N3/0464; G06N3/0895

AI Tagging

Application Domain

Image analysis; Character and pattern recognition

Technology Topics

Pulmonary nodule; Coronal plane

AI Technical Summary

Technical Problem

Existing deep learning methods face challenges in distinguishing between pulmonary granulomatous nodules and solid lung adenocarcinoma, including the large requirement for training data, potential feature loss due to differences between natural images and medical imaging, and the model's inability to cope with subtle changes.

Method used

A hybrid self-supervised learning method based on CT images is adopted. Horizontal and coronal images are acquired, and an encoder, a momentum encoder, and a decoder are combined to perform feature extraction and pixel reconstruction. By drawing on the strengths of both contrastive and generative self-supervised learning, a classification model for pulmonary granulomatous nodules and solid lung adenocarcinoma is constructed.

Benefits of technology

It enhances the classification ability of pulmonary granulomatous nodules and solid lung adenocarcinoma, making full use of the contextual information of medical imaging to ensure thorough extraction of lesion features and maximize information.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

This application discloses a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images. The method includes: inputting a horizontal plane image of the pulmonary lesion into an encoder for feature extraction, obtaining a horizontal plane feature vector of the lesion; inputting a coronal plane image of the pulmonary lesion into a momentum encoder for feature extraction, obtaining a coronal plane feature vector of the lesion, and storing it in a dynamic dictionary; reconstructing the occluded portion of the horizontal plane image using the horizontal plane feature vector and the decoder to obtain a target image; calculating a loss value based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image; and iteratively training the encoder, momentum encoder, and decoder together according to the loss value until preset conditions are met, thus constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma. This application can distinguish between a single atypical pulmonary granulomatous nodule and solid pulmonary adenocarcinoma.

Description

Technical Field

[0001] This application relates to the field of self-supervised learning technology, and in particular to a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images.

Background Technology

[0002] Lung cancer is the leading cause of cancer death worldwide, posing a serious threat to human life and health. Pulmonary granulomatous nodules (PGNs), with their spiculated or lobulated appearance, often resemble solid lung adenocarcinoma (SLA) on CT scans, making differentiation difficult.

[0003] Computer-aided diagnosis (CAD), particularly enhanced by deep learning techniques, offers a novel approach to diagnosing lung diseases, such as differentiating between solitary granulomas and adenocarcinomas. However, a significant limitation when using deep learning in diagnosis is the need for training neural networks to acquire vast amounts of precisely labeled medical data. Obtaining this data requires substantial resources, including time and funding.

[0004] Currently, self-supervised learning is considered a potential solution to the challenge of requiring finely labeled medical data. Unlike traditional supervised learning, self-supervised learning utilizes unlabeled data, deriving proxy tasks from the data itself to train models without manual annotation. However, despite significant progress in self-supervised learning, the differences between natural images and medical imaging modalities still pose significant challenges to its application in the medical field. Medical images primarily come from radiographic, functional, magnetic resonance, and ultrasound imaging modalities, while natural images are mainly captured under ambient light. This difference carries over into application and algorithm design. One significant difference lies in the dimensionality and nature of the images; many medical images are 3D single-channel grayscale representations, while natural images are primarily 2D color images. As a result, important pathological features may be lost when deep learning architectures originally designed for natural images are applied to medical data. Furthermore, medical images of the same anatomical region or of similar physiological states are highly similar to one another, owing to the inherent similarity of human tissues. In natural images, subtle structural changes may be considered insignificant, but in medical images such subtle changes may indicate pathological change. Given the unique challenges posed by the differences between natural images and medical imaging, traditional neural networks often struggle to classify pulmonary granulomatous nodules and solid lung adenocarcinoma.

Summary of the Invention

[0005] In view of this, this application provides a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images. The main purpose is to differentiate between a single atypical pulmonary granulomatous nodule and solid pulmonary adenocarcinoma preoperatively using CT images and a self-supervised learning method.

[0006] According to a first aspect of this application, a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images is provided, the method comprising:

[0007] Obtain horizontal and coronal images of lung lesions;

[0008] The horizontal plane image is input into an encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion;

[0009] The coronal image is input into a momentum encoder for feature extraction to obtain the coronal feature vector of the lung lesion, and the coronal feature vector is stored in a dynamic dictionary;

[0010] Based on the horizontal plane feature vector and the decoder, pixel reconstruction is performed on the occluded part in the horizontal plane image to obtain the target image;

[0011] Based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image, the loss value is calculated;

[0012] Based on the loss value, the encoder, the momentum encoder, and the decoder are jointly iteratively trained until preset conditions are met, thereby constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0013] According to a second aspect of this application, a hybrid self-supervised learning device for determining the malignancy of a single pulmonary nodule based on CT images is provided, the device comprising:

[0014] The acquisition unit is used to acquire horizontal and coronal images of lung lesions;

[0015] The first extraction unit is used to input the horizontal plane image into the encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion;

[0016] The second extraction unit is used to input the coronal image into the momentum encoder for feature extraction, obtain the coronal feature vector of the lung lesion, and store the coronal feature vector in the dynamic dictionary;

[0017] The reconstruction unit is used to perform pixel reconstruction of the occluded part in the horizontal plane image based on the horizontal plane feature vector and the decoder to obtain the target image;

[0018] The calculation unit is used to calculate the loss value based on the horizontal plane feature vector, each coronal plane feature vector in the dynamic dictionary, and the target image;

[0019] The training unit is used to perform joint iterative training on the encoder, the momentum encoder and the decoder based on the loss value until a preset condition is met, thereby constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0020] According to a third aspect of this application, a storage medium is provided that stores a computer program thereon, which, when executed by a processor, implements the above-described hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images.

[0021] According to a fourth aspect of this application, an electronic device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor executes the program to implement the above-described hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images.

[0022] By employing the aforementioned technical solutions, this application provides a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images. By jointly training the encoder, momentum encoder, and decoder, it integrates the advantages of contrastive and generative self-supervised learning. This combination enables the model not only to extract features from medical images from a global perspective but also to examine distinctive image regions closely, thereby enhancing the discriminative ability of the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma. The proposed self-supervised learning method makes full use of the contextual information present in medical imaging. Compared with traditional 2D medical imaging methods, this application promotes the extraction of multifaceted lesion features, ensures the thoroughness of data representation, and maximizes the information content of the samples.

[0023] The above description is only an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, specific embodiments of this application are given below.

Attached Figure Description

[0024] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0025] Figure 1 shows a flowchart of a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images, as provided in an embodiment of this application;

[0026] Figure 2 shows lesion image (CT and pathology) samples of PGN and SLA in datasets 1 and 2 provided in this application, as well as schematic diagrams of excluded lesion images;

[0027] Figure 3 shows a schematic diagram of the MCMAE-NET network architecture provided in an embodiment of this application;

[0028] Figure 4 shows a flowchart of another hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images, as provided in an embodiment of this application;

[0029] Figure 5 shows a schematic diagram of the MCMAE-NET network structure and its detailed parameters, as provided in an embodiment of this application;

[0030] Figure 6 shows a schematic diagram of a hybrid self-supervised learning device for determining the malignancy of a single pulmonary nodule based on CT images, as provided in an embodiment of this application.

Detailed Implementation

[0031] The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present application can be combined with each other.

[0032] Given the unique challenges posed by the differences between natural images and medical imaging, traditional neural networks often struggle to classify pulmonary granulomatous nodules and solid lung adenocarcinoma.

[0033] To address the aforementioned issues, this invention proposes an innovative self-supervised learning network, MCMAE-NET, specifically designed for medical image classification, particularly focusing on distinguishing between pulmonary granulomatous nodules and solid lung adenocarcinoma in CT scans of lung cancer patients. The architecture of MCMAE-NET integrates several key components to achieve its superior performance. First and foremost, this invention curates sample pairs by extracting regions of interest (ROIs) from both the horizontal and coronal planes of each case, ensuring a comprehensive and diverse dataset. Subsequently, the MCMAE-NET model, as a hybrid of self-supervised learning strategies, effectively combines the advantages of contrastive and generative self-supervised learning. This integration enables MCMAE-NET to leverage its ability to distinguish feature differences between instances while also emphasizing feature variations within instances.

[0034] Based on the above-mentioned inventive concept, embodiments of the present invention provide a hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images. As shown in Figure 1, the method includes:

[0035] Step 101: Obtain horizontal and coronal images of the lung lesions.

[0036] The horizontal plane image is composed of horizontal plane slices, and the coronal plane image is composed of coronal plane slices. The slices in the horizontal plane image and the coronal plane image are randomly masked.

[0037] For embodiments of the present invention, when constructing the sample training set, it is necessary to pre-set the inclusion and exclusion criteria for the sample data. The inclusion criteria include: 1. Patients whose pathological diagnosis of pulmonary granulomatous lesions (tuberculous or fungal granulomas) or adenocarcinoma is confirmed by surgical resection or image-guided biopsy; 2. All patients underwent routine and enhanced CT scans using the same CT scanner and standardized reconstruction parameters within two weeks post-surgery; 3. Isolated solid SPNs, ranging in size from 7 to 30 mm, without calcification or fatty components, exhibiting spiculation, lobulation, or pleural indentation, and without associated atelectasis or lymphadenopathy; 4. Laboratory analysis of routine tumor markers (CEA, CA125, CA153) performed within one week prior to surgery, with positive thresholds set at >5 ng/ml, >35 ng/ml, and >25 ng/ml, respectively, according to the reference ranges of embodiments of the present invention. The exclusion criteria include: 1. Nodules with features highly suggestive of benign lesions, such as tuberculomas with caseous necrosis and cavitation, or the characteristic halo sign of fungal granulomas; 2. Individuals with a history of other malignancies or concurrent malignancies; 3. Cases where imaging was performed using different algorithms, different slice thicknesses, or different CT scanners. Some typical examples (including CT and pathological images) are shown in Figure 2.

[0038] Of these cases, 333 were collected from one hospital, including 105 cases of PGN and 228 cases of SLA. This specific dataset, designated as Dataset 1, served as the basis for model training, testing, and validation. Additionally, 161 cases were collected from other hospitals, including 67 cases of PGN and 94 cases of SLA. This external dataset, designated as Dataset 2, was used for external model validation to evaluate the model's generalization ability. CT images of all patients were acquired using a multi-detector CT system (AS+128-Slice; Siemens Healthineers, Germany). For Dataset 1, the CT scan parameters were set as follows: tube voltage of 120 kVp, mean tube current of 299.88 mAs with a standard deviation of ±134.62 mAs, and mean slice thickness of 2.31 mm with a standard deviation of ±0.71 mm. For dataset 2, the parameters include a tube voltage of 120 kVp, a mean tube current of 194.31 mAs with a standard deviation of ±116.13 mAs, and a mean slice thickness of 4.03 mm with a standard deviation of ±1.57 mm. All images from these datasets were exported in DICOM format and subsequently used for image feature extraction in the MCMAE-NET model.

[0039] In this embodiment of the invention, lesion size is defined as the maximum diameter of the tumor in the axial image. Spiculation is defined as a linear or pointed extension emanating from the edge of a nodule or mass and extending into the lung parenchyma without touching the pleural surface. Lobulation is characterized by a wavy, lobulated contour on a portion of the lesion surface, excluding the area adjacent to the pleura. Pleural indentation is defined as a linear structure extending from the tumor to the pleural surface. These morphological assessments provide important information for subsequent analysis and classification of PGN and SLA. The clinical features of dataset 1 are summarized in Table 1 of this embodiment of the invention.

[0040] Table 1

[0041]

[0042] * indicates that the result is significant. a and b represent the two-sample t-test and chi-square test, respectively.

[0043] Furthermore, to ensure the quality and consistency of the dataset during training and evaluation, embodiments of the present invention employ a comprehensive series of data processing and enhancement techniques. Preprocessing of CT images involves key steps such as resolution normalization, noise reduction, and contrast enhancement. These measures are crucial for ensuring dataset consistency and improving the quality of input data, and are essential for subsequent analysis stages.

[0044] First, given the differences in slice thickness across scans, all CT images are resampled to a uniform 1 mm slice thickness using trilinear interpolation, so that every image conforms to a consistent format for subsequent analysis. Next, intensity normalization is applied to mitigate intensity variations caused by differences in acquisition settings. This calibration standardizes Hounsfield units (HU) to establish a uniform intensity scale across all images: all CT data are set to a window width of 1400 HU and a window level of -500 HU. Additionally, to reduce interference from surrounding information and focus on the lesion, the embodiment of the invention selects the center point of each lesion as the anchor point and extracts a square ROI with sides of 50 mm, which exceeds the maximum diameter of the tumor. Multiple slices containing the tumor lesion are obtained from both the horizontal and coronal planes. The slices from the horizontal plane are arranged sequentially and stitched together to obtain a horizontally stitched image. Similarly, the slices from the coronal plane are arranged sequentially and stitched together to obtain a coronally stitched image. Finally, sample pairs are generated from the horizontally and coronally stitched images of the same lesion and serve as the training dataset for the model. This training dataset includes multiple sample pairs of different lesions. During model training, the horizontally and coronally stitched images in the training dataset are randomly masked to obtain the horizontal and coronal images of the lesion.
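The windowing and ROI steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it interprets the -500 HU setting as the window centre (level), assumes the volume has already been resampled to 1 mm spacing, and assumes the ROI lies fully inside the slice. The function names `apply_window` and `crop_square_roi` are hypothetical.

```python
import numpy as np

WINDOW_WIDTH = 1400.0  # HU, value given in the text
WINDOW_LEVEL = -500.0  # HU, interpreted here as the window centre (assumption)


def apply_window(hu, width=WINDOW_WIDTH, level=WINDOW_LEVEL):
    """Clip HU values to the display window and rescale them to [0, 1]."""
    lo = level - width / 2.0  # -1200 HU for the values above
    hi = level + width / 2.0  # +200 HU for the values above
    clipped = np.clip(np.asarray(hu, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def crop_square_roi(slice_2d, center_rc, side_mm=50.0, spacing_mm=1.0):
    """Crop a square ROI of side_mm centred on the lesion anchor (row, col).

    Assumes isotropic in-plane spacing and an ROI that fits inside the slice.
    """
    half = int(round(side_mm / spacing_mm / 2.0))
    r, c = center_rc
    return slice_2d[r - half:r + half, c - half:c + half]
```

With these settings the window spans -1200 HU to +200 HU, so air and soft tissue both stay within the normalized range.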

[0045] Based on this, step 101 specifically includes: obtaining horizontal and coronal sections of the lung lesion; stitching the horizontal and coronal sections together to obtain a horizontal stitched image and a coronal stitched image; and randomly masking the sections in the horizontal and coronal stitched images to obtain the horizontal image and the coronal image.

[0046] For example, 25 slices containing the tumor lesion are obtained from each of the horizontal and coronal planes. The slices from each plane are then arranged sequentially to generate two sets of 5x5 image slices. Each set of image slices is then stitched together and randomly masked to obtain the horizontal plane image X_q and the coronal plane image X_k.
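A minimal sketch of the 5x5 stitching and slice-level random masking is given below. The number of masked slices (`n_masked=15`) and the choice to zero out masked tiles are assumptions made for illustration; the text does not state a masking ratio or fill value. The names `stitch_grid` and `mask_random_tiles` are hypothetical.

```python
import numpy as np


def stitch_grid(slices, rows=5, cols=5):
    """Arrange rows*cols equally sized 2D slices, in order, into one montage."""
    slices = np.asarray(slices)
    n, h, w = slices.shape
    assert n == rows * cols, "expected exactly rows*cols slices"
    grid = slices.reshape(rows, cols, h, w)
    # (rows, h, cols, w) -> (rows*h, cols*w) keeps row-major slice order
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)


def mask_random_tiles(montage, rows=5, cols=5, n_masked=15, rng=None):
    """Zero out n_masked randomly chosen tiles; return image and keep-mask."""
    rng = np.random.default_rng() if rng is None else rng
    h = montage.shape[0] // rows
    w = montage.shape[1] // cols
    masked_idx = rng.choice(rows * cols, size=n_masked, replace=False)
    keep = np.ones(rows * cols, dtype=bool)
    keep[masked_idx] = False
    out = montage.copy()
    for i in masked_idx:
        r, c = divmod(int(i), cols)
        out[r * h:(r + 1) * h, c * w:(c + 1) * w] = 0
    return out, keep
```

Applying `stitch_grid` and then `mask_random_tiles` to the 25 horizontal slices would give one candidate X_q; the same calls on the coronal slices would give X_k.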

[0047] To mitigate the risk of overfitting during model training, this invention employs various data augmentation techniques to expand the dataset. This approach successfully increases the diversity of the dataset, thereby improving the model's generalization ability. In this invention, specific augmentation techniques such as rotation, horizontal flipping, and four-way translation centered on a central anchor point are implemented to generate additional valuable data. The combination of these techniques significantly increases the size of the dataset; Dataset 1 generates a total of 13,320 pairs of samples. Subsequently, Dataset 1 is used for self-supervised pre-training of the MCMAE-NET model. During the fine-tuning phase of the downstream task, training is performed using the unexpanded original dataset, with the data divided into training and test sets in an 8:2 ratio, and five-fold cross-validation is performed. Dataset 2 is used as an external validation set to independently evaluate the model's performance.
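The figures in this paragraph can be checked with a short sketch. Dividing the 13,320 augmented pairs by the 333 cases of Dataset 1 gives 40 pairs per case. The fold construction below is only one plausible reading of "8:2 ratio with five-fold cross-validation" (each fold holds out roughly 20% of cases); the text does not say how cases are assigned or whether the split is stratified, and `five_fold_splits` is a hypothetical helper.

```python
import random

N_CASES = 333      # Dataset 1 cases, from the text
N_PAIRS = 13320    # augmented sample pairs, from the text
PAIRS_PER_CASE = N_PAIRS // N_CASES  # 40 augmented pairs per case


def five_fold_splits(n_cases, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds of roughly 80/20."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test
```

For 333 cases this produces held-out folds of 66 or 67 cases, which is close to the stated 8:2 proportion.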

[0048] As shown in Figure 3, the overall architecture of the self-supervised learning network MCMAE-NET combines generative and contrastive self-supervised learning. It comprises three components: a data processing module, a generative self-supervised learning task module, and a contrastive self-supervised learning task module. The data processing module stitches horizontal and coronal slices together to obtain horizontal and coronal stitched images, which, after random occlusion, serve as input images for the MCMAE-NET network. The generative self-supervised learning task module includes an encoder and a decoder. The contrastive self-supervised learning task module includes a momentum encoder. These modules work together to improve the accuracy and efficiency of classifying PGN (pulmonary granulomatous nodules) and SLA (solid lung adenocarcinoma) in CT images.

[0049] Step 102: Input the horizontal plane image into the encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion.

[0050] As shown in Figure 3, after the encoder is initialized, the horizontal plane image X_q is input into the encoder for feature extraction, yielding the horizontal plane feature vector q of the lung lesion.

[0051] Step 103: Input the coronal image into the momentum encoder for feature extraction to obtain the coronal feature vector of the lung lesion, and store the coronal feature vector in the dynamic dictionary.

[0052] The dynamic dictionary stores coronal feature vectors for different lesions.

[0053] As shown in Figure 3, after the momentum encoder is initialized, the coronal plane image X_k is input into the momentum encoder for feature extraction to obtain the coronal plane feature vector k of the lung lesion, which is stored in the dynamic dictionary. The coronal plane feature vectors of different lesions are all stored in the dynamic dictionary.
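One common way to realize such a dynamic dictionary, used in MoCo-style training, is a fixed-size first-in-first-out queue of key vectors. The sketch below assumes that design; the queue size, the L2 normalization of keys, and the class name `DynamicDictionary` are illustrative assumptions, since the text does not specify how the dictionary is maintained.

```python
from collections import deque

import numpy as np


class DynamicDictionary:
    """FIFO store of coronal key vectors; the oldest keys are evicted first."""

    def __init__(self, max_size=4096):
        # max_size is a hypothetical capacity, not a value from the text
        self._keys = deque(maxlen=max_size)

    def enqueue(self, k):
        """Add one key vector, L2-normalized so dot products act as cosines."""
        k = np.asarray(k, dtype=np.float32)
        self._keys.append(k / (np.linalg.norm(k) + 1e-12))

    def as_matrix(self):
        """Return all stored keys as an array of shape (n_keys, dim)."""
        return np.stack(list(self._keys))

    def __len__(self):
        return len(self._keys)
```

Keys from earlier lesions then act as negatives for the current horizontal query, while the key from the same lesion acts as the positive.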

[0054] Step 104: Based on the horizontal plane feature vector and the decoder, perform pixel reconstruction on the occluded part in the horizontal plane image to obtain the target image.

[0055] In this embodiment of the invention, the mask token of the occluded slice in the horizontal plane image is combined with the horizontal plane feature vector and then input into the decoder to perform pixel reconstruction on the occluded slice to obtain the target image Target.

[0056] Step 105: Calculate the loss value based on the horizontal plane feature vector, each coronal plane feature vector in the dynamic dictionary, and the target image.

[0057] In this embodiment of the invention, to calculate the loss value, step 105 specifically includes: calculating the similarity loss based on the horizontal plane feature vector and each coronal plane feature vector in the dynamic dictionary; calculating the pixel loss based on the target image and the horizontal plane stitched image; and adding the similarity loss and the pixel loss based on a preset temperature coefficient to obtain the loss value.

[0058] The preset temperature coefficient is used to control the concentration level of the distribution and can be set according to actual business needs.

[0059] Specifically, horizontal feature vectors from the same lesion show high similarity to coronal feature vectors, while horizontal feature vectors from different lesions show low similarity to coronal feature vectors. Based on this, a similarity loss function can be constructed to calculate the similarity loss. Simultaneously, the pixel difference between the reconstructed target image and the horizontally stitched image is calculated to obtain the pixel loss. Then, based on a preset temperature coefficient, the similarity loss and pixel loss are added together to obtain the current loss value.
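A hedged sketch of how the two terms could be combined is shown below. It assumes an InfoNCE-style contrastive loss (as in MoCo), in which the temperature coefficient scales the similarity logits, and a mean-squared-error pixel loss restricted to the masked region (as in MAE). The text does not give the exact formulas or any weighting between the terms, so the unweighted sum and the function names are assumptions.

```python
import numpy as np


def info_nce(q, k_pos, k_neg, temperature=0.07):
    """Contrastive loss for one query, one positive key and N negative keys."""
    q = np.asarray(q, dtype=float)
    k_pos = np.asarray(k_pos, dtype=float)
    k_neg = np.asarray(k_neg, dtype=float)
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_neg = k_neg / np.linalg.norm(k_neg, axis=1, keepdims=True)
    # Positive logit first, then one logit per negative key
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))


def masked_pixel_loss(reconstruction, original, masked):
    """Mean squared error over masked pixels only (boolean mask)."""
    diff = (np.asarray(reconstruction, dtype=float) - np.asarray(original, dtype=float)) ** 2
    return float(diff[np.asarray(masked, dtype=bool)].mean())


def hybrid_loss(q, k_pos, k_neg, reconstruction, original, masked, temperature=0.07):
    """Unweighted sum of the similarity loss and the pixel loss (assumption)."""
    return info_nce(q, k_pos, k_neg, temperature) + masked_pixel_loss(
        reconstruction, original, masked
    )
```

A lower temperature sharpens the softmax over the dictionary keys, which is consistent with the description of the coefficient as controlling the concentration of the distribution.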

[0060] Step 106: Based on the loss value, perform joint iterative training on the encoder, the momentum encoder, and the decoder until the preset conditions are met, and construct a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0061] In this embodiment of the invention, when iteratively training the MCMAE-NET network, step 106 specifically includes: updating the parameters of the encoder and the decoder according to the loss value to obtain the updated encoder and the updated decoder; updating the parameters of the momentum encoder based on the parameters in the updated encoder to obtain the updated momentum encoder; repeating the iterative update process of the encoder, the momentum encoder and the decoder until a preset number of iterations is reached, and outputting the finally trained encoder; constructing the preset classification model of pulmonary granulomatous nodules and solid pulmonary adenocarcinoma based on the finally trained encoder and the fully connected layers.

[0062] Specifically, the momentum encoder performs a momentum update operation during the update process, meaning the parameters in the momentum encoder change along with the encoder. After updating the encoder, a portion of its updated parameters is used to modify the momentum encoder; for example, 1% of the encoder's updated parameters are used to modify the momentum encoder. This method ensures slow and controlled updates to the momentum encoder. When the loss value is minimized or the preset number of iterations is reached, iterative training stops, and the final trained encoder is output. A fully connected layer is then connected after the trained encoder for task classification. Finally, the trained encoder and the fully connected layer are fine-tuned to obtain the preset classification model for pulmonary granulomatous nodules and solid lung adenocarcinoma.
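The momentum update can be sketched as an exponential moving average of the encoder parameters. The sketch reads the "1%" example as a momentum coefficient of m = 0.99, which matches the usual MoCo update rule but is an interpretation of the text; parameters are represented as a plain dictionary of arrays, and `momentum_update` is a hypothetical name.

```python
import numpy as np


def momentum_update(momentum_params, encoder_params, m=0.99):
    """Blend encoder parameters into the momentum encoder.

    theta_k <- m * theta_k + (1 - m) * theta_q, so with m = 0.99 each step
    moves the momentum encoder 1% of the way towards the encoder.
    """
    for name, theta_q in encoder_params.items():
        momentum_params[name] = m * momentum_params[name] + (1.0 - m) * theta_q
    return momentum_params
```

Because only a small fraction of the encoder's state is transferred per step, the keys already stored in the dynamic dictionary remain approximately consistent with newly computed ones.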

[0063] This invention provides a hybrid self-supervised learning method for assessing the malignancy of single pulmonary nodules based on CT images. By training the encoder, momentum encoder, and decoder, it successfully integrates the advantages of contrastive and generative self-supervised learning. This synthesis enables the model to not only extract features from medical images from a global perspective but also to meticulously examine unique image regions, thereby enhancing the ability of pre-defined classification models for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma. This invention innovatively proposes a groundbreaking self-supervised learning method that fully utilizes the comprehensive contextual information present in medical imaging. Compared with traditional 2D medical imaging methods, this invention promotes the extraction of multifaceted lesion features, ensures the thoroughness of data representation, and maximizes the information content of the samples.

[0064] Furthermore, as a refinement and extension of the specific implementation of the above embodiments, and to fully illustrate the implementation of this embodiment, this embodiment also provides another hybrid self-supervised learning method based on CT images for determining the malignancy of a single pulmonary nodule. As shown in Figure 4, the method includes:

[0065] Step 201: Obtain horizontal and coronal images of the lung lesions.

[0066] In the field of self-supervised learning, this invention investigates the synergistic effect of contrastive self-supervised learning and generative self-supervised learning. Contrastive self-supervised learning, as the name suggests, reveals distinctive image features by comparing positive and negative instances in a high-dimensional space. Generative self-supervised learning, on the other hand, acquires valuable image information through image reconstruction. Masked autoencoders (MAE) tend to adopt a global perspective when considering images, while momentum contrast (MoCo) methods tend to examine distinctive image regions and strategically locate positive and negative instances.

[0067] MoCo is a model based on contrastive self-supervised learning that excels in various vision tasks. It employs a unique strategy involving a dynamic dictionary and momentum-based updates. This approach enables MoCo to efficiently learn robust and unique features by contrasting positive and negative data samples. MAE, on the other hand, is a model based on generative self-supervised learning that also performs well in various vision tasks. It leverages an asymmetric encoder-decoder architecture, and its success is largely attributed to the use of the Vanilla Vision Transformer, which extracts global features from the input data. This architecture allows MAE to efficiently reconstruct missing parts of the input, thereby learning comprehensive representations.

[0068] To optimize model training and enhance the capabilities of the Transformer, this embodiment of the invention introduces hybrid self-supervised learning as a key intermediate task in MCMAE-NET. This strategic addition enables the Transformer to explore the unique properties of features and the overall image context.

[0069] The detailed network architecture of the MCMAE-NET model is shown in Figure 5. The encoder and momentum encoder employ the ViT-large model, characterized by 24 stacked encoder blocks, a token vector length of 1024, and 16 heads in the multi-head attention mechanism. These modules process only the unmasked image slices. The image data processing flow of this scheme is described in general below. First, the image data is flattened into one-dimensional arrays. These arrays are then linearly transformed and combined with the positional embeddings of the original image, and a class token is prepended to the sequence. The encoder-decoder architecture is asymmetric: the encoder processes only unmasked slices to save computational resources, and the decoder is configured with 8 stacked decoder blocks and a token vector length of 512. The decoder processes both the unmasked tokens encoded by the encoder and the masked tokens. It is important to note that these masked tokens are not derived from the embedding transformations of the masked slices; instead, they are learnable and shared across all masked slices.

[0070] In traditional contrastive learning, the dictionary size is typically equal to the mini-batch size. However, this approach is often limited by GPU memory and computational power, preventing the use of large batch sizes. To address this limitation, the MoCo framework employs a queue-based mechanism to store the dictionary, which contains the feature vectors of the images, allowing for a larger dictionary size. Nevertheless, choosing an excessively large dictionary may hinder model convergence when the dataset size is not exceptionally large. Therefore, this embodiment of the invention chooses to set the dictionary size to 700, with each vector having a dimension of 128. Queue maintenance involves enqueuing the feature vectors of the latest batch of images and dequeuing the feature vectors of the earliest batch.
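
The queue maintenance described above can be illustrated with a minimal sketch. It assumes a plain first-in, first-out buffer; the class name `KeyQueue` and the toy key values are hypothetical and not taken from the patent. Because 700 is not a multiple of the batch size of 64, this sketch evicts the oldest keys one at a time rather than a whole batch at once.

```python
from collections import deque

DICT_SIZE = 700    # dictionary size stated in the text
FEATURE_DIM = 128  # key dimension stated in the text


class KeyQueue:
    """FIFO dictionary of key vectors (hypothetical helper for illustration)."""

    def __init__(self, max_size=DICT_SIZE):
        # A deque with maxlen drops the oldest entries automatically once full.
        self.keys = deque(maxlen=max_size)

    def enqueue_batch(self, batch_keys):
        for key in batch_keys:
            if len(key) != FEATURE_DIM:
                raise ValueError("unexpected key dimension")
            self.keys.append(list(key))

    def __len__(self):
        return len(self.keys)


queue = KeyQueue()
for step in range(12):  # 12 batches of 64 keys, 768 keys offered in total
    # Toy keys: every component equals the batch index, so age is easy to inspect.
    batch = [[float(step)] * FEATURE_DIM for _ in range(64)]
    queue.enqueue_batch(batch)
```

After the loop the queue holds exactly 700 keys, and all keys from the first batch have been displaced by newer ones.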

[0071] To keep the keys in the queue consistent, the momentum encoder associated with the dictionary must be updated gradually. This is achieved with a momentum-based update. Specifically, after each update of the encoder, the updated encoder parameters contribute only 1% to the new parameters of the momentum encoder, with the remaining 99% carried over from the momentum encoder's current parameters. This keeps updates to the momentum encoder slow and controlled, maintaining the stability and consistency of the keys in the queue.
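
As a rough illustration, the momentum update can be sketched as an exponential moving average. The sketch assumes that the 1% figure corresponds to a momentum coefficient of 0.99, as in the usual MoCo update rule; the function name and the toy parameter lists are hypothetical.

```python
MOMENTUM = 0.99  # assumed: keep 99% of the momentum encoder, take 1% from the encoder


def momentum_update(momentum_params, encoder_params, m=MOMENTUM):
    """Return the moving average m * theta_k + (1 - m) * theta_q for each parameter."""
    return [m * k + (1.0 - m) * q for k, q in zip(momentum_params, encoder_params)]


encoder = [1.0, -2.0, 0.5]          # toy encoder parameters after a gradient step
momentum_encoder = [0.0, 0.0, 0.0]  # toy momentum-encoder parameters

for _ in range(3):  # three training steps with an unchanged encoder
    momentum_encoder = momentum_update(momentum_encoder, encoder)
```

With a fixed encoder, the momentum encoder moves toward it by a factor of 1 - 0.99^n after n steps, which shows how slowly the keys drift.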

[0072] The above is the overall processing flow for image data. When specifically training the model, the process of obtaining the horizontal plane image and the coronal plane image is exactly the same as step 101, and will not be repeated here.

[0073] Step 202: Input the horizontal plane image into the encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion.

[0074] In this embodiment of the invention, the process in step 202 by which the encoder extracts the horizontal plane feature vector specifically includes: calculating the position embedding values of the unmasked slices in the horizontal plane image; extracting the embedding vector of the horizontal plane image using the convolutional layer in the encoder; superimposing the position embedding values of the unmasked slices with the embedding vector of the horizontal plane image to obtain the feature vector of the horizontal plane image; masking the feature vector of the horizontal plane image to obtain the masked feature vector of the horizontal plane image; adding the target embedding vector (class token) of the horizontal plane image; concatenating the target embedding vector with the masked feature vector of the horizontal plane image to obtain the target feature vector of the horizontal plane image; and processing the target feature vector of the horizontal plane image using the Transformer model in the encoder to obtain the horizontal plane feature vector of the lung lesion.

[0075] Specifically, as shown in Figure 5, the horizontal plane image is input into a two-dimensional convolutional layer for feature extraction and array transformation to obtain an embedding vector of size 25×1024. The embedding vector is then superimposed with the position embedding values of the unmasked slices to obtain the feature vector of the horizontal plane image, which is also of size 25×1024. The position embedding values are calculated with the following formula:

[0076]
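
Assuming the standard sinusoidal position encoding used with Transformer models, which is consistent with the symbols PE, pos, i, and d_model defined for this formula, the calculation can be written as follows. The base constant 10000 is the conventional choice and is an assumption here.

```latex
$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{\,2i/d_{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{\,2i/d_{model}}}\right)
$$
```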

[0077] Where PE(pos,2i) and PE(pos,2i+1) represent the position embedding values in the even (2i) and odd (2i+1) dimensions, respectively, pos represents a specific position in the sequence, and d_model refers to the embedding dimension of the model. This embodiment of the invention enhances the token embedding by integrating position information, thereby achieving a more detailed and comprehensive representation of image spatial relationships.

[0078] Next, the 25×1024 feature vector of the horizontal plane image is masked, leaving a masked feature vector of size 12×1024, which is then concatenated with the target embedding vector (class token) to obtain the target feature vector of size 13×1024. Finally, the target feature vector is passed through the L = 24 Transformer blocks and then normalized to obtain the horizontal plane feature vector of size 13×1024.
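
The token counts in this step can be traced with a small sketch. It assumes uniform random masking that keeps 12 of the 25 slice tokens; the helper `random_mask`, the fixed seed, and the stand-in embeddings are hypothetical, and the class token is a zero vector here although it would be learnable in practice.

```python
import random

NUM_SLICES = 25   # tokens per image, from the 25×1024 embedding
NUM_VISIBLE = 12  # tokens kept after masking, from the 12×1024 vector
EMBED_DIM = 1024


def random_mask(tokens, num_visible, rng):
    """Keep a random subset of tokens; return the kept tokens and their indices."""
    keep = sorted(rng.sample(range(len(tokens)), num_visible))
    return [tokens[i] for i in keep], keep


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = [[float(i)] * EMBED_DIM for i in range(NUM_SLICES)]  # stand-in embeddings
visible, keep_idx = random_mask(tokens, NUM_VISIBLE, rng)

class_token = [0.0] * EMBED_DIM          # placeholder for the learnable class token
encoder_input = [class_token] + visible  # 1 + 12 = 13 tokens of length 1024
masked_idx = [i for i in range(NUM_SLICES) if i not in keep_idx]
```

The encoder therefore sees 13 tokens, while the 13 masked positions are only reintroduced later, in the decoder.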

[0079] Step 203: Input the coronal image into the momentum encoder for feature extraction to obtain the coronal feature vector of the lung lesion, and store the coronal feature vector in the dynamic dictionary.

[0080] In this embodiment of the invention, the process in step 203 by which the momentum encoder extracts the coronal plane feature vector specifically includes: calculating the position embedding values of the unmasked slices in the coronal plane image; extracting the embedding vector of the coronal plane image using the convolutional layer in the momentum encoder; superimposing the position embedding values of the unmasked slices with the embedding vector of the coronal plane image to obtain the feature vector of the coronal plane image; masking the feature vector of the coronal plane image to obtain the masked feature vector of the coronal plane image; adding the target embedding vector (class token) of the coronal plane image; concatenating the target embedding vector with the masked feature vector of the coronal plane image to obtain the target feature vector of the coronal plane image; and processing the target feature vector of the coronal plane image using the Transformer model in the momentum encoder to obtain the coronal plane feature vector of the lung lesion.

[0081] Specifically, as shown in Figure 5, the coronal plane image is input into a two-dimensional convolutional layer for feature extraction and array transformation to obtain an embedding vector of size 25×1024. The embedding vector is then superimposed with the position embedding values of the unmasked slices to obtain the feature vector of the coronal plane image, also of size 25×1024. The formula for the position embedding values is the same as that in step 202.

[0082] Next, the 25×1024 feature vector of the coronal plane image is masked, leaving a masked feature vector of size 12×1024, which is then concatenated with the target embedding vector (class token) to obtain the target feature vector of size 13×1024. Finally, the target feature vector is passed through the L = 24 Transformer blocks and then normalized to obtain the coronal plane feature vector of size 13×1024.

[0083] Step 204: Input the horizontal plane feature vector and the coronal plane feature vector into their respective one-dimensional convolutional layers for dimensionality reduction to obtain the reduced horizontal plane feature vector and the reduced coronal plane feature vector.

[0084] The embodiments of the present invention introduce a feature flattening module, namely a one-dimensional convolutional layer, which can reduce the dimensionality of the extracted lesion features. This integration makes the combination of contrastive and generative self-supervised learning techniques more effective.

[0085] Step 205: Based on the reduced horizontal feature vector and the decoder, perform pixel reconstruction on the occluded part in the horizontal image to obtain the target image.

[0086] In this embodiment of the invention, the pixel reconstruction process in step 205 specifically includes: calculating the position embedding values of the occluded slices in the horizontal plane image; performing dimensionality reduction on the horizontal plane feature vector using the fully connected layer of the decoder to obtain a dimensionality-reduced horizontal plane feature vector; combining the mask tokens of the occluded slices in the horizontal plane image with the dimensionality-reduced horizontal plane feature vector to obtain a combined horizontal plane feature vector; superimposing the combined horizontal plane feature vector with the position embedding values of the occluded and unoccluded slices in the horizontal plane image to obtain a superimposed horizontal plane feature vector; and processing the superimposed horizontal plane feature vector using the Transformer model in the decoder to obtain the target image.

[0087] Specifically, as shown in Figure 5, the horizontal plane feature vector is input into a fully connected layer for dimensionality reduction, resulting in a 13×512 horizontal plane feature vector. This vector is then combined with the mask tokens of the occluded slices to obtain a combined horizontal plane feature vector of size 26×512. The combined vector is superimposed with the position embedding values to obtain a superimposed horizontal plane feature vector of size 26×512, which is input into the Transformer blocks for processing. Finally, after normalization, a fully connected layer, and removal of the target embedding vector, the target image of size 250×250×3 is obtained.
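
The sizes quoted in this step can be checked with simple arithmetic. The sketch below assumes that the 25 slices form a 5×5 grid, so that each slice covers 50×50 pixels of the 250×250 image; this layout is inferred from the stated sizes and is not given explicitly in the text. The function name is hypothetical.

```python
import math

IMAGE_SIZE = 250   # reconstructed image is 250×250×3
CHANNELS = 3
NUM_SLICES = 25    # tokens per image before masking
NUM_VISIBLE = 12   # tokens the encoder actually processes

GRID = math.isqrt(NUM_SLICES)    # assumed 5×5 layout of slices
SLICE_SIZE = IMAGE_SIZE // GRID  # assumed 50×50 pixels per slice


def decoder_sequence_length(num_slices, num_visible):
    """Class token + visible tokens + one mask token per occluded slice."""
    num_masked = num_slices - num_visible
    return 1 + num_visible + num_masked


seq_len = decoder_sequence_length(NUM_SLICES, NUM_VISIBLE)
values_per_slice = SLICE_SIZE * SLICE_SIZE * CHANNELS
# The class token is removed before the remaining tokens are reshaped into pixels.
total_values = (seq_len - 1) * values_per_slice
```

Under these assumptions the decoder sequence has 26 tokens, and the 25 remaining tokens account for exactly 250 × 250 × 3 pixel values.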

[0088] Step 206: Calculate the loss value based on the reduced horizontal feature vector, the reduced coronal feature vectors in the dynamic dictionary, and the target image.

[0089] In this embodiment of the invention, the horizontal plane feature vector q and the coronal plane feature vector k are dimensionally reduced through their respective one-dimensional convolutional layers to generate the reduced horizontal plane feature vector q⁺ and the reduced coronal plane feature vector k⁺. It should be noted that k⁺ is stored in a queue-like memory bank, which is updated continuously as new training batches arrive, with the newest batch replacing the oldest. This embodiment of the invention uses the InfoNCE loss function to measure the similarity between q⁺ and k⁺, which serves as the cornerstone of the contrastive task. The formula for the InfoNCE loss function is as follows:

[0090]
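
Assuming the standard InfoNCE form used by MoCo, with q⁺ as the query and a dictionary of K + 1 keys of which exactly one is positive, the loss can be written as follows. The symbols k⁺_pos (the positive key), k⁺_i (the i-th key in the dictionary), and K are notation chosen here for illustration.

```latex
$$
L_{InfoNCE} = -\log
\frac{\exp\!\left(q^{+} \cdot k^{+}_{pos} / \tau\right)}
     {\sum_{i=0}^{K} \exp\!\left(q^{+} \cdot k^{+}_{i} / \tau\right)}
$$
```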

[0091] Where the reduced horizontal plane feature vector q⁺ serves as the query, while each reduced coronal plane feature vector in the dynamic dictionary serves as a key. Among these keys, the one that originates from the same lesion as q⁺ forms the positive sample pair with q⁺, while the other keys are treated as negative samples for q⁺. Furthermore, τ is a temperature parameter that controls the concentration level of the distribution. By minimizing the InfoNCE loss, the model is trained to effectively distinguish between positive and negative sample pairs, thereby enhancing its ability to learn discriminative features from the data.

[0092] Furthermore, the decoder's task is to predict and reconstruct the pixel values of the occluded slice. Subsequently, the mean squared error (MSE) loss function is used to quantify the difference between the predicted pixel values and the original pixel values. The formula for the MSE loss function is as follows:

[0093]
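
Assuming the usual definition of mean squared error over the N occluded pixels, the loss can be written as follows. The symbols y_i (actual value of the i-th pixel) and ŷ_i (value predicted by the model) are notation chosen here for illustration.

```latex
$$
L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^{2}
$$
```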

[0094] Where N represents the total number of occluded pixels in the original image, and the two terms of the difference are the actual value of the i-th pixel and the value predicted by the model for that pixel. By minimizing the MSE loss, the model is trained to accurately reconstruct each pixel, thereby facilitating the learning of global features of the image.

[0095] Finally, the InfoNCE loss function and the MSE loss function are combined using a weighting coefficient λ to form the loss function for optimizing the model, from which the loss value is calculated. The formula for the loss function is as follows:

[0096] L = λ × L_MSE + (1 − λ) × L_InfoNCE

[0097] Wherein, λ is the weighting coefficient, which can be set according to actual business needs.
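
A minimal sketch of how the two losses might be combined is given below. It assumes the standard InfoNCE form with L2-normalised vectors and a temperature of 0.07 (the default used by MoCo, not a value stated here), and it uses λ = 0.5 purely as an example. All function names and toy inputs are hypothetical.

```python
import math


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query; vectors are assumed to be L2-normalised."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum - logits[0]


def mse(predicted, target):
    """Mean squared error between predicted and actual pixel values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def hybrid_loss(l_mse, l_info_nce, lam):
    """Weighted sum lam * L_MSE + (1 - lam) * L_InfoNCE."""
    return lam * l_mse + (1.0 - lam) * l_info_nce


q = [1.0, 0.0]                      # toy query
k_pos = [1.0, 0.0]                  # toy positive key (same lesion)
k_negs = [[0.0, 1.0], [-1.0, 0.0]]  # toy negative keys
contrastive = info_nce(q, k_pos, k_negs)
reconstruction = mse([0.5, 0.25], [1.0, 0.25])
total = hybrid_loss(reconstruction, contrastive, lam=0.5)
```

A larger λ shifts the optimisation toward pixel reconstruction, while a smaller λ emphasises the contrastive term.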

[0098] Step 207: Based on the loss value, perform joint iterative training on the encoder, the momentum encoder, and the decoder until the preset conditions are met, and construct a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0099] In this embodiment of the invention, the encoder and momentum encoder have the same dimensions. During training, the momentum encoder is updated by momentum based on the encoder's updates, meaning that its parameters follow the changes in the encoder's parameters. After training is complete, a fully connected layer is attached after the trained encoder for the classification task. Finally, the trained encoder and the fully connected layer are fine-tuned to obtain the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0100] For training, this embodiment of the invention used an NVIDIA GeForce RTX 4070 with 12 GB of memory. The MCMAE-NET model, together with all variants and comparison models, was trained and tested in PyTorch (version 1.7). All graphs were created using matplotlib in Python. Because the network in this embodiment is an evolution of MAE and MoCo, pre-trained model parameters cannot be loaded directly into MCMAE-NET, so training from scratch is required. The network parameters converged after several iterations.

[0101] During the pre-training phase, the image size was 250×250 and the batch size was set to 64. This embodiment of the invention uses the AdamW optimizer with betas = (0.9, 0.95), 100 training epochs, and an initial learning rate of 1×10⁻², which is scaled by a factor of 0.1 at the 120th and 160th epochs. In the fine-tuning phase of the downstream task, this embodiment uses the pre-trained encoder module and adds a two-class fully connected layer, keeping the same image size and batch size. Fine-tuning runs for 100 epochs with an initial learning rate of 1×10⁻³, which is scaled by a factor of 0.1 at the 40th and 70th epochs.
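
The fine-tuning schedule can be sketched as a step decay. The sketch assumes that the learning rate is multiplied by 0.1 at each milestone and that the change takes effect at the milestone epoch itself, counting epochs from 1; the function name is hypothetical. PyTorch's `MultiStepLR` scheduler implements a comparable rule with `milestones` and `gamma` arguments.

```python
def step_lr(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate for a 1-based epoch under step decay at the given milestones."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)


FINE_TUNE_EPOCHS = 100
schedule = [
    step_lr(epoch, 1e-3, milestones=(40, 70))
    for epoch in range(1, FINE_TUNE_EPOCHS + 1)
]
```

Under these assumptions the rate is 1×10⁻³ for epochs 1 to 39, 1×10⁻⁴ for epochs 40 to 69, and 1×10⁻⁵ from epoch 70 onward.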

[0102] Furthermore, this embodiment of the invention selects commonly used evaluation metrics to assess the model's performance. These specifically include the area under the curve (AUC) with a 95% confidence interval (95% CI), accuracy (ACC), sensitivity (SEN), and specificity (SPE). AUC quantifies the model's ability to distinguish between classes.

[0103]
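
Assuming the usual definition of the area under the ROC curve, where both rates are functions of the decision threshold t, AUC can be written as:

```latex
$$
AUC = \int_{0}^{1} TPR \; d(FPR)
$$
```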

[0104] The True Positive Rate (TPR) and False Positive Rate (FPR) vary with different thresholds t. The AUC is presented along with its 95% confidence interval, providing a statistical range indicating the interval within which the true AUC value is likely to be at a 95% confidence level. This metric enhances the interpretability and reliability of the AUC indicator.

[0105] ACC reflects the model's overall performance in correctly classifying positive and negative samples. SEN plays a crucial role in determining the model's ability to identify truly positive samples, which is essential to ensuring that no real cases are missed. Conversely, SPE assesses the model's ability to accurately identify negative samples, which is crucial for minimizing false positives.

[0106]
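
Using the standard confusion-matrix definitions, which match the symbols TP, TN, FP, and FN explained for this formula, the three metrics can be written as:

```latex
$$
ACC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
SEN = \frac{TP}{TP + FN}, \qquad
SPE = \frac{TN}{TN + FP}
$$
```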

[0107] Where TP represents a true positive, TN represents a true negative, FP represents a false positive, and FN represents a false negative.

[0108] In actual classification, the horizontal plane image of the lesion to be predicted is directly obtained and input into the preset classification model of pulmonary granulomatous nodules and solid pulmonary adenocarcinoma to distinguish whether the lesion belongs to pulmonary granulomatous nodules or solid pulmonary adenocarcinoma.

[0109] This invention provides another hybrid self-supervised learning method for assessing the malignancy of single pulmonary nodules based on CT images. By jointly training the encoder, momentum encoder, and decoder, it combines the advantages of contrastive and generative self-supervised learning. The resulting model extracts features from medical images from a global perspective and also examines distinctive image regions, which improves the discrimination ability of the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma. The proposed self-supervised learning method makes use of the contextual information present in medical imaging. Compared with traditional 2D medical imaging methods, it promotes the extraction of multifaceted lesion features, yields a more thorough data representation, and makes fuller use of the information in the samples.

[0110] Furthermore, as a specific implementation of the methods shown in Figure 1 and Figure 4, this embodiment provides a hybrid self-supervised learning device for determining the malignancy of a single pulmonary nodule based on CT images. As shown in Figure 6, the device includes: an acquisition unit 31, a first extraction unit 32, a second extraction unit 33, a reconstruction unit 34, a calculation unit 35, and a training unit 36.

[0111] The acquisition unit 31 can be used to acquire horizontal and coronal images of lung lesions.

[0112] The first extraction unit 32 can be used to input the horizontal plane image into the encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion.

[0113] The second extraction unit 33 can be used to input the coronal image into a momentum encoder for feature extraction, obtain the coronal feature vector of the lung lesion, and store the coronal feature vector in a dynamic dictionary.

[0114] The reconstruction unit 34 can be used to perform pixel reconstruction of the occluded part in the horizontal plane image based on the horizontal plane feature vector and the decoder to obtain the target image.

[0115] The calculation unit 35 can be used to calculate the loss value based on the horizontal plane feature vector, each coronal plane feature vector in the dynamic dictionary and the target image.

[0116] The training unit 36 can be used to perform joint iterative training on the encoder, the momentum encoder and the decoder based on the loss value until a preset condition is met, thereby constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

[0117] In some embodiments, the acquisition unit 31 may be specifically used to acquire horizontal and coronal slices of lung lesions; stitch the horizontal and coronal slices together to obtain a horizontally stitched image and a coronally stitched image; and randomly mask the slices in the horizontally stitched image and the coronally stitched image to obtain the horizontal image and the coronal image.

[0118] In some embodiments, the first extraction unit 32 may be specifically used to calculate the position embedding values of the unmasked slices in the horizontal plane image; extract the embedding vector of the horizontal plane image using the convolutional layer in the encoder; superimpose the position embedding values of the unmasked slices with the embedding vector of the horizontal plane image to obtain the feature vector of the horizontal plane image; mask the feature vector of the horizontal plane image to obtain the masked feature vector of the horizontal plane image; add the target embedding vector of the horizontal plane image; concatenate the target embedding vector with the masked feature vector of the horizontal plane image to obtain the target feature vector of the horizontal plane image; and process the target feature vector of the horizontal plane image using the Transformer model in the encoder to obtain the horizontal plane feature vector of the lung lesion.

[0119] In some embodiments, the second extraction unit 33 may be specifically used to calculate the position embedding values of the unmasked slices in the coronal plane image; extract the embedding vector of the coronal plane image using the convolutional layer in the momentum encoder; superimpose the position embedding values of the unmasked slices with the embedding vector of the coronal plane image to obtain the feature vector of the coronal plane image; mask the feature vector of the coronal plane image to obtain the masked feature vector of the coronal plane image; add the target embedding vector of the coronal plane image; concatenate the target embedding vector with the masked feature vector of the coronal plane image to obtain the target feature vector of the coronal plane image; and process the target feature vector of the coronal plane image using the Transformer model in the momentum encoder to obtain the coronal plane feature vector of the lung lesion.

[0120] In some embodiments, the device further includes a dimensionality reduction unit.

[0121] The dimensionality reduction unit can be used to input the horizontal plane feature vector and the coronal plane feature vector into their respective one-dimensional convolutional layers for dimensionality reduction, so as to obtain the reduced horizontal plane feature vector and the reduced coronal plane feature vector.

[0122] The reconstruction unit 34 can be specifically used to perform pixel reconstruction on the occluded part of the horizontal plane image based on the reduced horizontal plane feature vector and the decoder to obtain the target image.

[0123] The calculation unit 35 can be specifically used to calculate the loss value based on the reduced horizontal feature vector, the reduced coronal feature vectors in the dynamic dictionary, and the target image.

[0124] In some embodiments, the reconstruction unit 34 may further be used to calculate the position embedding values of the occluded slices in the horizontal plane image; perform dimensionality reduction on the horizontal plane feature vector using the fully connected layer of the decoder to obtain a dimensionality-reduced horizontal plane feature vector; combine the mask tokens of the occluded slices in the horizontal plane image with the dimensionality-reduced horizontal plane feature vector to obtain a combined horizontal plane feature vector; superimpose the combined horizontal plane feature vector with the position embedding values of the occluded and unoccluded slices in the horizontal plane image to obtain a superimposed horizontal plane feature vector; and process the superimposed horizontal plane feature vector using the Transformer model in the decoder to obtain the target image.

[0125] In some implementations, the calculation unit 35 may be specifically used to calculate similarity loss based on the horizontal plane feature vector and each coronal plane feature vector in the dynamic dictionary; calculate pixel loss based on the target image and the horizontal plane stitched image; and add the similarity loss and the pixel loss based on a preset temperature coefficient to obtain the loss value.

[0126] In some implementations, the training unit 36 may be specifically used to update the parameters of the encoder and the decoder according to the loss value to obtain an updated encoder and an updated decoder; update the parameters of the momentum encoder based on the parameters in the updated encoder to obtain an updated momentum encoder; repeat the iterative update process of the encoder, the momentum encoder and the decoder until a preset number of iterations is reached, and output the finally trained encoder; and construct the preset classification model of pulmonary granulomatous nodules and solid pulmonary adenocarcinoma based on the finally trained encoder and the fully connected layers.

[0127] It should be noted that other descriptions of the functional units involved in the hybrid self-supervised learning device for determining the malignancy of a single pulmonary nodule based on CT images provided in this embodiment of the invention can be found in the corresponding descriptions of Figure 1 and Figure 4, and will not be repeated here.

[0128] Based on the methods shown in Figure 1 and Figure 4, this embodiment correspondingly also provides a storage medium storing a computer program that, when executed by a processor, implements the hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images shown in Figure 1 and Figure 4.

[0129] Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as CD-ROM, USB flash drive, mobile hard drive, etc.) and includes several instructions to cause an electronic device (such as a personal computer, server, or network device, etc.) to execute the methods of various implementation scenarios of this application.

[0130] Based on the methods shown in Figure 1 and Figure 4 and the virtual device embodiment shown in Figure 6, and to achieve the above objectives, this application also provides an electronic device, which may specifically be a personal computer, tablet computer, server, or other network device. The device includes a storage medium and a processor; the storage medium stores a computer program, and the processor executes the computer program to implement the hybrid self-supervised learning method for determining the malignancy of a single pulmonary nodule based on CT images shown in Figure 1 and Figure 4.

[0131] Optionally, the aforementioned physical devices may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, etc. The user interface may include a display screen, input units such as a keyboard, etc., and optional user interfaces may also include USB interfaces, card reader interfaces, etc. The network interface may optionally include standard wired interfaces, wireless interfaces (such as Wi-Fi interfaces), etc.

[0132] Those skilled in the art will understand that the physical device structure provided in this embodiment does not constitute a limitation on the physical device, and may include more or fewer components, or combine certain components, or have different component arrangements.

[0133] The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the aforementioned physical device, supporting the operation of information processing programs and other software and / or programs. The network communication module is used to enable communication between the various components within the storage medium, as well as communication with other hardware and software in the information processing physical device.

[0134] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by software together with a necessary general-purpose hardware platform, or by hardware.

[0135] By jointly training the encoder, momentum encoder, and decoder, this invention combines the advantages of contrastive self-supervised learning and generative self-supervised learning. The resulting model extracts features from medical images from a global perspective and also examines distinctive image regions, which improves the discrimination ability of the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma. The proposed self-supervised learning method makes use of the contextual information present in medical imaging. Compared with traditional 2D medical imaging methods, it promotes the extraction of multifaceted lesion features, yields a more thorough data representation, and makes fuller use of the information in the samples.

[0136] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the modules or processes shown in the drawings are not necessarily essential for implementing this application. Those skilled in the art will understand that the modules in the apparatus of the embodiment can be distributed within the apparatus of the embodiment as described, or can be modified to be located in one or more apparatuses different from this embodiment. The modules of the above-described embodiment can be combined into one module, or further divided into multiple sub-modules.

[0137] The serial numbers in this application are for descriptive purposes only and do not represent the superiority or inferiority of any particular implementation scenario. The above disclosures are merely a few specific implementation scenarios of this application; however, this application is not limited thereto, and any variations conceived by those skilled in the art should fall within the protection scope of this application.

Claims

1. A hybrid self-supervised learning method for malignancy degree of single pulmonary nodule based on CT images, characterized in that, include: Obtain horizontal and coronal images of lung lesions; The horizontal plane image is input into an encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion; The coronal image is input into a momentum encoder for feature extraction to obtain the coronal feature vector of the lung lesion, and the coronal feature vector is stored in a dynamic dictionary; Based on the horizontal plane feature vector and the decoder, pixel reconstruction is performed on the occluded part in the horizontal plane image to obtain the target image; Based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image, the loss value is calculated; Based on the loss value, the encoder, the momentum encoder, and the decoder are jointly iteratively trained until preset conditions are met, thereby constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.

2. The method according to claim 1, characterized in that obtaining the horizontal plane image and the coronal plane image of the lung lesion comprises: obtaining horizontal plane slices and coronal plane slices of the lung lesion; stitching the horizontal plane slices and the coronal plane slices, respectively, to obtain a horizontal plane stitched image and a coronal plane stitched image; and randomly masking slices in the horizontal plane stitched image and the coronal plane stitched image to obtain the horizontal plane image and the coronal plane image, respectively.
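
The stitching and random masking of claim 2 can be sketched as below. This is an illustration under stated assumptions: slices are equally sized 2D lists, the grid layout (`cols`), the mask ratio, and the function names `stitch_slices` and `random_mask` are hypothetical and are not taken from the claims.

```python
import random


def stitch_slices(slices, cols):
    """Tile equally sized 2D slices into one image, row-major, `cols` slices per row.

    Assumes len(slices) is a multiple of cols so the result is rectangular.
    """
    height = len(slices[0])
    stitched = []
    for start in range(0, len(slices), cols):
        row_slices = slices[start:start + cols]
        for r in range(height):
            row = []
            for s in row_slices:
                row.extend(s[r])
            stitched.append(row)
    return stitched


def random_mask(num_slices, mask_ratio, rng):
    """Pick which slices are masked; return (masked indices, visible indices), both sorted."""
    num_masked = int(num_slices * mask_ratio)
    masked = sorted(rng.sample(range(num_slices), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_slices) if i not in masked_set]
    return masked, visible


# Four 2x2 slices, each filled with its own index, stitched into a 4x4 image.
slices = [[[i, i], [i, i]] for i in range(4)]
stitched = stitch_slices(slices, cols=2)
masked, visible = random_mask(num_slices=4, mask_ratio=0.5, rng=random.Random(0))
```

The same two functions would be applied separately to the horizontal plane slices and the coronal plane slices.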

3. The method according to claim 1, characterized in that inputting the horizontal plane image into the encoder for feature extraction to obtain the horizontal plane feature vector of the lung lesion comprises: calculating position embedding values of the unmasked slices in the horizontal plane image; extracting an embedding vector of the horizontal plane image using a convolutional layer in the encoder; superimposing the position embedding values of the unmasked slices in the horizontal plane image on the embedding vector of the horizontal plane image to obtain a feature vector of the horizontal plane image; masking the feature vector of the horizontal plane image to obtain a masked feature vector of the horizontal plane image; adding a target embedding vector of the horizontal plane image, and concatenating the target embedding vector of the horizontal plane image with the masked feature vector of the horizontal plane image to obtain a target feature vector of the horizontal plane image; and processing the target feature vector of the horizontal plane image using a Transformer model in the encoder to obtain the horizontal plane feature vector of the lung lesion; and/or inputting the coronal plane image into the momentum encoder for feature extraction to obtain the coronal plane feature vector of the lung lesion comprises: calculating position embedding values of the unmasked slices in the coronal plane image; extracting an embedding vector of the coronal plane image using a convolutional layer in the momentum encoder; superimposing the position embedding values of the unmasked slices in the coronal plane image on the embedding vector of the coronal plane image to obtain a feature vector of the coronal plane image; masking the feature vector of the coronal plane image to obtain a masked feature vector of the coronal plane image; adding a target embedding vector of the coronal plane image, and concatenating the target embedding vector of the coronal plane image with the masked feature vector of the coronal plane image to obtain a target feature vector of the coronal plane image; and processing the target feature vector of the coronal plane image using a Transformer model in the momentum encoder to obtain the coronal plane feature vector of the lung lesion.
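
The token assembly that precedes the Transformer model in claim 3 might look like the sketch below. It assumes that "superimposing" means element-wise addition, that masking amounts to keeping only the unmasked slices, and that the "target embedding vector" plays the role of a class token placed at the front of the sequence; these readings, and the name `build_encoder_tokens`, are assumptions. The convolutional embedding and the Transformer model itself are omitted.

```python
def build_encoder_tokens(patch_embeddings, position_embeddings, visible, class_token):
    """Assemble the token sequence passed to the encoder's Transformer model.

    patch_embeddings / position_embeddings: one vector (list of floats) per slice.
    visible: indices of the unmasked slices.
    class_token: assumed reading of the claim's "target embedding vector".
    """
    tokens = [list(class_token)]
    for i in visible:
        # Superimpose (element-wise add) the position embedding on the slice embedding.
        tokens.append([e + p for e, p in zip(patch_embeddings[i], position_embeddings[i])])
    return tokens


# Three slices with 2-dimensional embeddings; slice 1 is masked and therefore dropped.
patch_embeddings = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
position_embeddings = [[0.5, 0.5], [0.25, 0.25], [0.125, 0.125]]
encoder_tokens = build_encoder_tokens(
    patch_embeddings, position_embeddings, visible=[0, 2], class_token=[0.0, 0.0]
)
```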

4. The method according to claim 1, characterized in that the method further comprises: inputting the horizontal plane feature vector and the coronal plane feature vector into their respective one-dimensional convolutional layers for dimensionality reduction to obtain a reduced horizontal plane feature vector and a reduced coronal plane feature vector; wherein performing, based on the horizontal plane feature vector and the decoder, pixel reconstruction of the occluded part of the horizontal plane image to obtain the target image comprises: performing, based on the reduced horizontal plane feature vector and the decoder, pixel reconstruction of the occluded part of the horizontal plane image to obtain the target image; and wherein calculating the loss value based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image comprises: calculating the loss value based on the reduced horizontal plane feature vector, the reduced coronal plane feature vectors in the dynamic dictionary, and the target image.

5. The method according to claim 3, characterized in that performing, based on the horizontal plane feature vector and the decoder, pixel reconstruction of the occluded part of the horizontal plane image to obtain the target image comprises: calculating position embedding values of the occluded slices in the horizontal plane image; reducing the dimension of the horizontal plane feature vector using a fully connected layer of the decoder to obtain a reduced horizontal plane feature vector; combining mask tokens of the occluded slices in the horizontal plane image with the reduced horizontal plane feature vector to obtain a combined horizontal plane feature vector; superimposing the position embedding values of the occluded and unoccluded slices in the horizontal plane image on the combined horizontal plane feature vector to obtain a superimposed horizontal plane feature vector; and processing the superimposed horizontal plane feature vector using a Transformer model in the decoder to obtain the target image.
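
The combination of mask tokens with the encoded features in claim 5 can be illustrated as follows. The sketch assumes that each occluded slice receives the same mask token, that tokens are restored to their original slice order, and that superimposing means element-wise addition; the function name `build_decoder_tokens` is hypothetical, and the fully connected layer and Transformer model of the decoder are left out.

```python
def build_decoder_tokens(visible_tokens, visible, num_slices, mask_token, position_embeddings):
    """Restore the full slice sequence for the decoder.

    visible_tokens: encoded vectors of the unoccluded slices, in the order of `visible`.
    mask_token: shared placeholder vector used at every occluded position (assumption).
    """
    by_index = dict(zip(visible, visible_tokens))
    tokens = []
    for i in range(num_slices):
        base = by_index.get(i, mask_token)  # encoded feature, or mask token if occluded
        # Superimpose the position embedding for both occluded and unoccluded slices.
        tokens.append([b + p for b, p in zip(base, position_embeddings[i])])
    return tokens


# Slices 0 and 2 were visible to the encoder; slice 1 was occluded.
decoder_tokens = build_decoder_tokens(
    visible_tokens=[[1.0, 1.0], [2.0, 2.0]],
    visible=[0, 2],
    num_slices=3,
    mask_token=[0.0, 0.0],
    position_embeddings=[[0.5, 0.5], [0.25, 0.25], [0.125, 0.125]],
)
```

Because every position carries its own position embedding, the decoder can tell the occluded slices apart even though they share one mask token.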

6. The method according to claim 2, characterized in that calculating the loss value based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image comprises: calculating a similarity loss based on the horizontal plane feature vector and the coronal plane feature vectors in the dynamic dictionary; calculating a pixel loss based on the target image and the horizontal plane stitched image; and adding the similarity loss and the pixel loss together, based on a preset temperature coefficient, to obtain the loss value.
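
Claim 6 does not give the formulas for the two loss terms. The sketch below assumes an InfoNCE-style contrastive loss for the similarity term, with the temperature coefficient dividing the similarity scores, and a mean squared error for the pixel term, summed with equal weight; all function names are hypothetical, and the claimed method may use the temperature coefficient differently.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero for an all-zero vector
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature):
    """Contrastive loss: -log softmax of the positive similarity over all keys."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[0]


def pixel_mse(reconstructed, target):
    """Mean squared error between two flattened images of equal length."""
    return sum((r - t) ** 2 for r, t in zip(reconstructed, target)) / len(reconstructed)


def hybrid_loss(query, positive_key, negative_keys, reconstructed, target, temperature):
    # Equal-weight sum is an assumption; the claim only states that the terms are added.
    return info_nce(query, positive_key, negative_keys, temperature) + pixel_mse(reconstructed, target)


similarity = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
pixel = pixel_mse([1.0, 2.0], [1.0, 4.0])
total = hybrid_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], [1.0, 2.0], [1.0, 4.0], temperature=1.0)
```

In this reading, the horizontal plane feature vector is the query, the coronal plane feature vector of the same lesion is the positive key, and the other entries of the dynamic dictionary serve as negative keys.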

7. The method according to claim 1, characterized in that jointly and iteratively training the encoder, the momentum encoder, and the decoder based on the loss value until the preset condition is met, thereby constructing the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma, comprises: updating the encoder and the decoder based on the loss value to obtain an updated encoder and an updated decoder; updating the parameters of the momentum encoder based on the parameters of the updated encoder to obtain an updated momentum encoder; repeating the iterative update of the encoder, the momentum encoder, and the decoder until a preset number of iterations is reached, and then outputting the finally trained encoder; and constructing the preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma based on the finally trained encoder and fully connected layers.

8. A hybrid self-supervised learning device for the malignancy degree of a single pulmonary nodule based on CT images, characterized in that the device comprises: an acquisition unit, configured to acquire a horizontal plane image and a coronal plane image of a lung lesion; a first extraction unit, configured to input the horizontal plane image into an encoder for feature extraction to obtain a horizontal plane feature vector of the lung lesion; a second extraction unit, configured to input the coronal plane image into a momentum encoder for feature extraction to obtain a coronal plane feature vector of the lung lesion, and to store the coronal plane feature vector in a dynamic dictionary; a reconstruction unit, configured to perform, based on the horizontal plane feature vector and a decoder, pixel reconstruction of the occluded part of the horizontal plane image to obtain a target image; a calculation unit, configured to calculate a loss value based on the horizontal plane feature vector, the coronal plane feature vectors in the dynamic dictionary, and the target image; and a training unit, configured to jointly and iteratively train the encoder, the momentum encoder, and the decoder based on the loss value until a preset condition is met, thereby constructing a preset classification model for pulmonary granulomatous nodules and solid pulmonary adenocarcinoma.
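
Claim 7 states that the momentum encoder parameters are updated from the updated encoder parameters but does not give the rule. A common choice, assumed here, is an exponential moving average; the function name `momentum_update`, the flat parameter lists, and the coefficient value are hypothetical.

```python
def momentum_update(momentum_params, encoder_params, m=0.999):
    """Exponential moving average: theta_k <- m * theta_k + (1 - m) * theta_q.

    Both arguments are flat lists of floats of equal length; m close to 1 makes
    the momentum encoder change slowly (the value 0.999 is an assumption).
    """
    return [m * k + (1.0 - m) * q for k, q in zip(momentum_params, encoder_params)]


# One update step with an exaggerated coefficient so the effect is visible.
updated = momentum_update([1.0, 0.0], [0.0, 1.0], m=0.5)
```

Under this rule the momentum encoder receives no gradient of its own; it only tracks the encoder, which keeps the feature vectors stored in the dynamic dictionary consistent across iterations.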

9. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method according to any one of claims 1 to 7.

10. An electronic device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method according to any one of claims 1 to 7.