Self-supervised training method, apparatus, device and medium for a feature encoder

Lung nodule images are transformed and feature-encoded with a self-supervised training method. This removes the dependence of lung nodule feature-extraction accuracy on manual annotation and enables efficient lung nodule image retrieval.

CN118247198B · Active · Publication Date: 2026-06-12 · GUANGZHOU SHIYUAN ELECTRONICS CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: GUANGZHOU SHIYUAN ELECTRONICS CO LTD
Filing Date: 2022-12-21
Publication Date: 2026-06-12

Figure CN118247198B_ABST

Abstract

The application belongs to the technical field of image processing. It discloses a self-supervised training method, apparatus, device and medium for a feature encoder. The method comprises:
- performing a first image transformation on a lung nodule training image to obtain a first transformed lung nodule image;
- performing a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image;
- inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector;
- inputting the second transformed lung nodule image into a second feature encoder for feature encoding to obtain a second feature vector;
- predicting the first feature vector to obtain a predicted feature vector;
- calculating a contrastive loss function value from the predicted feature vector and the second feature vector;
- performing backpropagation according to the contrastive loss function value to update the parameters of the first feature encoder.

Training the first feature encoder does not require lung nodule images with manually annotated contours, and the extracted feature vectors have high accuracy.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a self-supervised training method, apparatus, device, and medium for a feature encoder.

Background Technology

[0002] In lung cancer diagnosis and treatment, computer-aided diagnosis based on CT lung image retrieval helps doctors quickly retrieve similar past cases for comparative analysis. This improves the accuracy and efficiency of diagnosis and treatment. Retrieval compares the similarity between features, sorts the candidate cases from highest to lowest similarity, and outputs the most similar cases for the doctor's reference. Features describe and distinguish images, so the quality of feature extraction is crucial to retrieval. Lung nodule features are currently extracted in one of two ways:
- Semantic features are extracted from lung nodule images through semantic description. However, the meaning of such features is difficult to standardize and lacks clear criteria.
- Image features are extracted from manually annotated lung nodule regions. However, the accuracy of feature extraction then depends heavily on the manually annotated contours.

[0003] In summary, existing technologies suffer from two problems. The meaning of semantic features of lung nodule images is difficult to unify, and the accuracy of lung nodule image feature extraction is strongly affected by manually annotated contours.

Summary of the Invention

[0004] The purpose of this application is to provide a self-supervised training method, apparatus, device, and medium for a feature encoder, which can solve the problem that the accuracy of extracting lung nodule image features is greatly affected by manually annotated contours in the existing technology.

[0005] To achieve the above objectives, in a first aspect, this application provides a self-supervised training method for a feature encoder, comprising:

[0006] Acquire training images of lung nodules, and perform a first image transformation on the training images of lung nodules to obtain first transformed lung nodule images;

[0007] The lung nodule training image is subjected to a second image transformation to obtain a second transformed lung nodule image;

[0008] The first transformed lung nodule image is input into the first feature encoder for feature encoding to obtain the first feature vector;

[0009] The second transformed lung nodule image is input into the second feature encoder for feature encoding to obtain the second feature vector;

[0010] The first feature vector is predicted to obtain the predicted feature vector;

[0011] Calculate the contrastive loss function value based on the predicted feature vector and the second feature vector;

[0012] Backpropagation is performed based on the contrastive loss function value to update the parameters of the first feature encoder, thus obtaining the updated parameters of the first feature encoder.

[0013] In a second aspect, this application provides a self-supervised training device for a feature encoder, comprising:

[0014] The first image transformation module is used to acquire lung nodule training images and perform a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image.

[0015] The second image transformation module is used to perform a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image.

[0016] The first feature encoding module is used to input the first transformed lung nodule image into the first feature encoder for feature encoding to obtain a first feature vector;

[0017] The second feature encoding module is used to input the second transformed lung nodule image into the second feature encoder for feature encoding to obtain the second feature vector.

[0018] The first feature vector prediction module is used to predict the first feature vector to obtain the predicted feature vector;

[0019] The contrast loss function value calculation module is used to calculate the contrast loss function value based on the predicted feature vector and the second feature vector.

[0020] The first feature encoder parameter update module is used to perform backpropagation based on the contrast loss function value to update the first feature encoder parameters and obtain the updated first feature encoder parameters.

[0021] This application also provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the self-supervised training method for a feature encoder described in any of the foregoing embodiments.

[0022] This application also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the self-supervised training method for a feature encoder described in any of the foregoing embodiments.

[0023] This application discloses a self-supervised training method for a feature encoder, comprising: acquiring lung nodule training images; performing a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image; performing a second image transformation on the lung nodule training images to obtain a second transformed lung nodule image. The specific transformation methods and / or specific transformation parameters of the first and second image transformations are different, and the image transformations can reflect the perturbations of the lung nodule training images without destroying or changing the visual structure of the lung nodules. The first transformed lung nodule image is input into a first feature encoder for feature encoding to obtain a first feature vector. The second transformed lung nodule image is input into a second feature encoder for feature encoding to obtain a second feature vector. The parameters of the first and second feature encoders are different, and the feature vector extracted from the transformed lung nodule images can better adapt to the lung nodule image retrieval task. The first feature vector is predicted to obtain a predicted feature vector. After prediction, the predicted feature vector and the second feature vector are not perfectly symmetrical. A contrastive loss function is calculated based on these two feature vectors. This contrastive loss function effectively reflects the differences between the first-transformed lung nodule image and the second-transformed lung nodule image. Backpropagation based on the contrastive loss function value has a fast convergence speed, allowing for rapid updating of the first feature encoder parameters. Training the first feature encoder does not require lung nodule images with manually annotated contours, and the extracted feature vectors have high accuracy.

Attached Figure Description

[0024] Figure 1 is a flowchart of a self-supervised training method for a feature encoder according to one embodiment.

[0025] Figure 2 is a flowchart of the process of determining the first parameter difference according to one embodiment.

[0026] Figure 3 is a schematic diagram of the process of filtering saved lung nodule images according to one embodiment.

[0027] Figure 4 is a schematic diagram of the process of inputting a first transformed lung nodule image into a first feature encoder for feature encoding according to one embodiment.

[0028] Figure 5 is a schematic diagram of the process of performing a first image transformation on a lung nodule training image according to one embodiment.

[0029] Figure 6 is a schematic block diagram of a self-supervised training device for a feature encoder according to one embodiment.

[0030] Figure 7 is a schematic block diagram of the structure of a computer device according to one embodiment.

[0031] The objectives, functional features, and advantages of this application are further explained below with reference to the embodiments and the accompanying drawings.

Detailed Implementation

[0032] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0033] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms.

It should be further understood that the term “comprising,” as used in this specification, indicates the presence of the stated features, integers, steps, operations, elements, modules, and / or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components, and / or groups thereof.

When an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or intermediate elements may be present. Furthermore, “connected” or “coupled” as used herein can include wireless connection or wireless coupling.

The term “and / or” as used herein includes any one of, and all combinations of, one or more of the associated listed items.

[0034] Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by one of ordinary skill in the art to which this invention pertains. Terms such as those defined in general dictionaries should be understood to have the meaning they carry in the context of the prior art. They should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[0035] In one embodiment, referring to Figure 1, which is a flowchart of the self-supervised training method for a feature encoder disclosed in this application, the method includes the following steps S1 to S7:

[0036] S1: Obtain lung nodule training images, and perform a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image.

[0037] The lung nodule training image is a three-dimensional image containing the lung nodule region. The lung nodule training image can be a CT image or an MRI image.

[0038] The step of performing a first image transformation on the lung nodule training image to obtain a first transformed lung nodule image includes:

[0039] The lung nodule training images are cropped at the center to obtain cropped images;

[0040] The center-cropped image is resampled to obtain a resampled image;

[0041] The resampled image is rotated to obtain a rotated image;

[0042] The rotated image is translated to obtain the first transformed lung nodule image.
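The four steps of the first image transformation can be sketched with NumPy as a chain of array operations. This is an illustration only, under several assumptions:
- Volumes are indexed (z, y, x).
- Resampling is nearest-neighbour.
- Rotation is restricted to multiples of 90 degrees in the axial (y, x) plane, so no interpolation is needed.
- Translation is an integer voxel shift with zero filling.

The function names and default sizes and shifts are hypothetical, not values stated in this application.

```python
import numpy as np

def center_crop(vol, size):
    """Cut a size**3 block centred on the volume centre (axes: z, y, x)."""
    starts = [(d - size) // 2 for d in vol.shape]
    return vol[tuple(slice(s, s + size) for s in starts)]

def resample_nearest(vol, out_size):
    """Nearest-neighbour resampling of each axis to out_size voxels (assumption)."""
    idx = [np.linspace(0, d - 1, out_size).round().astype(int) for d in vol.shape]
    return vol[np.ix_(*idx)]

def rotate_axial(vol, quarter_turns):
    """Rotate in the (y, x) plane by quarter_turns * 90 degrees (assumption)."""
    return np.rot90(vol, k=quarter_turns, axes=(1, 2))

def translate(vol, shift):
    """Shift by whole voxels along each axis, zero-filling vacated voxels."""
    out = np.zeros_like(vol)
    src, dst = [], []
    for s, d in zip(shift, vol.shape):
        s = max(-d, min(d, s))  # clamp so slices stay valid for large shifts
        if s >= 0:
            src.append(slice(0, d - s))
            dst.append(slice(s, d))
        else:
            src.append(slice(-s, d))
            dst.append(slice(0, d + s))
    out[tuple(dst)] = vol[tuple(src)]
    return out

def first_transform(vol, crop=128, out=64, quarter_turns=1, shift=(0, 2, -2)):
    """Crop -> resample -> rotate -> translate, in the order of [0039]-[0042]."""
    v = center_crop(vol, crop)
    v = resample_nearest(v, out)
    v = rotate_axial(v, quarter_turns)
    return translate(v, shift)
```

Arbitrary rotation angles would need an interpolating rotation, which is omitted here to keep the sketch dependency-free apart from NumPy.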

[0043] The first image transformation can ensure the visual similarity of the lung nodule regions and also ensure that the lung nodule regions are centered in the lung nodule training image.

[0044] Optionally, after obtaining the first transformed lung nodule image, Gaussian filtering is applied to the first transformed lung nodule image to remove image noise and improve the image quality of the first transformed lung nodule image.

[0045] The first image transformation reflects a perturbation of the lung nodule training image without destroying or changing the visual structure of the lung nodule.

[0046] S2: Perform a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image.

[0047] The second image transformation differs from the first image transformation in one of two ways:
- It includes only some (one or more) of the steps of center cropping, resampling, rotation, and translation.
- It includes all four steps, but with a different rotation angle and / or translation distance.

[0048] The second image transformation likewise reflects a perturbation of the lung nodule training image without destroying or changing the visual structure of the lung nodule.
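The two ways in which the second transformation may differ from the first can be illustrated with a small parameter sampler. This is a hypothetical sketch. The step names, the 50/50 choice between the two variants, and the sampling ranges for quarter turns and voxel shifts are assumptions made for illustration.

```python
import random

STEPS = ("center_crop", "resample", "rotate", "translate")

def sample_view_params(rng):
    """Hypothetical parameters for one view: all steps, random rotation/shift."""
    return {
        "steps": STEPS,
        "quarter_turns": rng.randrange(4),
        "shift": tuple(rng.randint(-4, 4) for _ in range(3)),
    }

def sample_two_views(rng, subset_prob=0.5):
    """Return (first, second) parameter sets that are guaranteed to differ."""
    first = sample_view_params(rng)
    if rng.random() < subset_prob:
        # Variant 1: the second view applies only a proper, non-empty subset of the steps
        k = rng.randint(1, len(STEPS) - 1)
        chosen = set(rng.sample(STEPS, k))
        second = dict(first, steps=tuple(s for s in STEPS if s in chosen))
    else:
        # Variant 2: all four steps, but a different rotation angle and/or shift
        second = sample_view_params(rng)
        while (second["quarter_turns"], second["shift"]) == (first["quarter_turns"], first["shift"]):
            second = sample_view_params(rng)
    return first, second
```

Passing an explicit `random.Random` instance keeps the view sampling reproducible across training runs.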

[0049] S3: Input the first transformed lung nodule image into the first feature encoder for feature encoding to obtain the first feature vector.

[0050] The first transformed lung nodule image is sequentially input into the first convolutional layer to the Mth convolutional layer of the first feature encoder to obtain the first convolution result, where M≥2;

[0051] The first convolution result is input into the pooling layer of the first feature encoder for global average pooling to obtain the first pooling result.

[0052] The first pooling result is input into the multilayer perceptron of the first feature encoder for mapping transformation to obtain the first feature vector.

[0053] The first feature encoder can be a ResNet-3D neural network or a ViT neural network. In this embodiment, a ResNet-3D-34 neural network is used as the first feature encoder.

[0054] The first feature vector is the image feature of the first transformed lung nodule image. Converting the first transformed lung nodule image into the first feature vector can reduce the amount of data, improve the computational efficiency, and better adapt to the lung nodule image retrieval task.

[0055] S4: Input the second transformed lung nodule image into the second feature encoder for feature encoding to obtain the second feature vector.

[0056] The second feature encoder has the same structure as the first feature encoder, but its parameters differ. The process of inputting the second transformed lung nodule image into the second feature encoder for feature encoding is the same as the process of inputting the first transformed lung nodule image into the first feature encoder for feature encoding, and will not be repeated here.

[0057] The second feature vector is the image feature of the second transformed lung nodule image. Converting the second transformed lung nodule image into the second feature vector can reduce the amount of data, improve the computational efficiency, and better adapt to the lung nodule image retrieval task.

[0058] S5: Predict the first feature vector to obtain the predicted feature vector.

[0059] The first feature vector is input into the first linear layer of the prediction network and linearly transformed to obtain the linearly transformed vector.

[0060] The linearly transformed vector is input into the regularization layer of the prediction network for regularization to obtain a regularized vector;

[0061] The regularized vector is input into the activation layer of the prediction network for activation to obtain an activation vector;

[0062] The activation vector is input into the second linear layer of the prediction network and linearly transformed to obtain the predicted feature vector.

[0063] The prediction network is a multilayer perceptron. It can have 3, 4, or another number of layers; this embodiment uses 4 layers.

[0064] The predicted feature vector and the second feature vector have the same dimension. After the first feature vector is predicted, the predicted feature vector and the second feature vector are not completely symmetrical.
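Paragraphs [0059]-[0063] leave the exact type of the regularization and activation layers open. The prediction network can be sketched in NumPy under the following assumptions:
- The regularization layer is batch normalization.
- The activation layer is ReLU.
- The hidden width is 256.
- Input and output are 64-dimensional (the dimension of the first feature vector given in [0120]).

All function names are hypothetical.

```python
import numpy as np

def linear_init(d_in, d_out, rng):
    """Random weights and zero bias for one linear layer."""
    return rng.normal(0.0, d_in ** -0.5, (d_in, d_out)), np.zeros(d_out)

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalisation without learnable scale/shift."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def predictor(z, params):
    """Linear -> normalisation -> activation -> Linear, mirroring [0059]-[0062]."""
    (w1, b1), (w2, b2) = params
    h = z @ w1 + b1           # first linear layer
    h = batch_norm(h)         # "regularization" layer (assumed BatchNorm)
    h = np.maximum(h, 0.0)    # activation layer (assumed ReLU)
    return h @ w2 + b2        # second linear layer -> predicted feature vector q

rng = np.random.default_rng(0)
params = (linear_init(64, 256, rng), linear_init(256, 64, rng))
z = rng.normal(size=(8, 64))  # a batch of 8 first feature vectors
q = predictor(z, params)      # same dimension as the second feature vector
```

Because only the first branch passes through this extra network, the two branches are no longer symmetric, which is the asymmetry described in [0064].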

[0065] S6: Calculate the comparison loss function value based on the predicted feature vector and the second feature vector.

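[0066] The contrastive loss function value is calculated as follows:

[0067]

$$L_{\text{contrastive}} = -\log \frac{\exp\left(q \cdot k_{+} / \tau\right)}{\sum_{i=1}^{K} \exp\left(q \cdot k_{i} / \tau\right)}$$

[0068] The symbols are defined as follows:
- $L_{\text{contrastive}}$ is the contrastive loss function value.
- log is the logarithmic function, and exp is the exponential function.
- $\tau$ is the first parameter of the contrastive loss function.
- $q$ is the predicted feature vector.
- $K$ is the total number of lung nodule training images, and $i$ indexes the lung nodule training images.
- $k_{+}$ is the second feature vector corresponding to the predicted feature vector, that is, the second feature vector of the same lung nodule training image. Together with $q$, it forms a positive sample pair.
- $k_{i}$ is the second feature vector of each of the other lung nodule training images. Each such $k_{i}$ forms a negative sample pair with $q$.

The sum in the denominator runs over all $K$ training images and therefore also contains the positive term.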
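For a single predicted vector, the InfoNCE-style contrastive loss can be sketched in plain Python as follows. This sketch assumes the standard form in which the positive key appears in the denominator alongside the negatives. The default τ = 0.07 is a common choice in the contrastive-learning literature, not a value given in this application. Many implementations also L2-normalize q and the keys before taking dot products, which this sketch leaves to the caller.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(q, keys, pos, tau=0.07):
    """-log( exp(q.k+/tau) / sum_i exp(q.k_i/tau) ) for one predicted vector q.

    keys: second feature vectors k_1..k_K of the K training images
    pos:  index in `keys` of k+, the second vector of q's own training image
    tau:  temperature ("first parameter"); 0.07 is an assumed default
    """
    logits = [dot(q, k) / tau for k in keys]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[pos] - log_denominator)
```

The loss falls as q moves toward its positive key k+ and away from the negative keys. In a deep-learning framework the same quantity is usually computed as a cross-entropy over the logits with the positive index as the target.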

[0069] S7: Perform backpropagation based on the contrast loss function value to update the parameters of the first feature encoder, and obtain the updated parameters of the first feature encoder.

[0070] The update of the first feature encoder parameters is controlled by training settings that include the learning rate and the maximum number of iterations.

[0071] This application discloses a self-supervised training method for a feature encoder, comprising: acquiring lung nodule training images; performing a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image; performing a second image transformation on the lung nodule training images to obtain a second transformed lung nodule image. The specific transformation methods and / or specific transformation parameters of the first and second image transformations are different, and the image transformations can reflect the perturbations of the lung nodule training images without destroying or changing the visual structure of the lung nodules. The first transformed lung nodule image is input into a first feature encoder for feature encoding to obtain a first feature vector. The second transformed lung nodule image is input into a second feature encoder for feature encoding to obtain a second feature vector. The parameters of the first and second feature encoders are different, and the feature vector extracted from the transformed lung nodule images can better adapt to the lung nodule image retrieval task. The first feature vector is predicted to obtain a predicted feature vector. After prediction, the predicted feature vector and the second feature vector are not perfectly symmetrical. A contrastive loss function is calculated based on these two feature vectors. This contrastive loss function effectively reflects the differences between the first-transformed lung nodule image and the second-transformed lung nodule image. Backpropagation based on the contrastive loss function value has a fast convergence speed, allowing for rapid updating of the first feature encoder parameters. Training the first feature encoder does not require lung nodule images with manually annotated contours, and the extracted feature vectors have high accuracy.

[0072] In one embodiment, referring to Figure 2, after the first feature encoder parameters are updated to obtain the updated first feature encoder parameters, the method further includes:

[0073] S81: Calculate the difference between the first feature encoder parameter and the preset first feature encoder parameter to obtain the first parameter difference.

[0074] The first feature encoder parameter may be smaller or larger than the preset first feature encoder parameter. The sign of the first parameter difference indicates which of the two is larger.

[0075] S82: Determine whether the absolute value of the first parameter difference is less than the first parameter difference threshold. If so, stop updating the first feature encoder parameters.

[0076] The absolute value of the first parameter difference can reflect the difference between the first feature encoder parameter and the preset first feature encoder parameter. Based on the absolute value of the first parameter difference, it can be determined whether to continue updating the first feature encoder parameter or stop updating the first feature encoder parameter.

[0077] When the absolute value of the first parameter difference is less than the first parameter difference threshold, it indicates that the first feature encoder parameters meet expectations, and the update of the first feature encoder parameters is stopped.

[0078] If the absolute value of the first parameter difference is greater than or equal to the first parameter difference threshold, then continue to update the parameters of the first feature encoder until the absolute value of the first parameter difference is less than the first parameter difference threshold.
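The stopping rule in S81-S82 can be sketched in plain Python. This sketch makes several assumptions:
- The parameters are a flat list of numbers compared element-wise.
- Training stops only when every absolute difference is below the threshold.
- A `max_iters` cap stands in for the maximum number of iterations mentioned in [0070].

The update callable `step` is a hypothetical stand-in for one backpropagation update of the first feature encoder.

```python
def first_parameter_difference(params, preset):
    """Element-wise difference between current and preset encoder parameters."""
    return [p - r for p, r in zip(params, preset)]

def should_stop(params, preset, threshold):
    """True when every |difference| is below the threshold (max-norm assumption)."""
    return all(abs(d) < threshold for d in first_parameter_difference(params, preset))

def train_until_close(params, preset, step, threshold, max_iters=1000):
    """Apply the hypothetical update `step` until the S82 test passes or the cap is hit."""
    for it in range(max_iters):
        if should_stop(params, preset, threshold):
            return params, it
        params = step(params)
    return params, max_iters
```

Usage: with a toy update that halves the gap to the preset on every call, starting from `[0.0, 4.0]` against a preset of `[1.0, 2.0]` and a threshold of 0.01, the loop stops after 8 updates.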

[0079] As described above, after updating the first feature encoder parameters and obtaining the updated first feature encoder parameters, the process further includes calculating the difference between the first feature encoder parameters and the preset first feature encoder parameters to obtain the first parameter difference. It is then determined whether the absolute value of the first parameter difference is less than a first parameter difference threshold; if so, updating the first feature encoder parameters is stopped. The absolute value of the first parameter difference reflects the difference between the first feature encoder parameters and the preset first feature encoder parameters, and the absolute value of the first parameter difference determines whether to continue updating the first feature encoder parameters or stop updating them.

[0080] In one embodiment, referring to Figure 3, after the updating of the first feature encoder parameters is stopped, the method further includes:

[0081] S83': Obtain the query image, input the query image into the first feature encoder for feature encoding, and obtain the third feature vector.

[0082] The query image is a 3D image of the case to be queried, and the query image contains the lung nodule region.

[0083] After updating the parameters of the first feature encoder, the query image is input into the trained first feature encoder for feature encoding to obtain the third feature vector.

[0084] S84': Calculate the similarity between the third feature vector and each stored feature vector in the feature database.

[0085] The feature database stores multiple stored feature vectors, each corresponding to a case.

[0086] The similarity between the third feature vector and each stored feature vector in the feature database is calculated. The similarity can be cosine similarity or other similarity. This application takes cosine similarity as an example.

[0087] The higher the cosine similarity between the third feature vector and a stored feature vector in the feature database, the more similar the two vectors are. Cosine similarity ranges from -1 to 1. It lies between 0 and 1 when neither vector has negative components.

[0088] Cosine similarity is calculated using the following formula:

[0089]

$$s_i = \frac{h \cdot k_i}{\lVert h \rVert \, \lVert k_i \rVert}$$

[0090] The symbols are defined as follows:
- $s_i$ is the i-th cosine similarity.
- $h$ is the third feature vector.
- $k_i$ is the i-th stored feature vector.
- "·" denotes the dot product.
- $\lVert h \rVert$ is the L2 norm of the third feature vector.
- $\lVert k_i \rVert$ is the L2 norm of the i-th stored feature vector.
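The cosine similarity formula translates directly into plain Python. The sketch below adds a zero-vector guard, which the application does not mention, because the formula is undefined when either norm is zero.

```python
import math

def cosine_similarity(h, k):
    """s_i = (h . k_i) / (||h|| * ||k_i||) for a query vector h and stored vector k_i."""
    norm_h = math.sqrt(sum(a * a for a in h))
    norm_k = math.sqrt(sum(b * b for b in k))
    if norm_h == 0 or norm_k == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(h, k)) / (norm_h * norm_k)
```

Identical directions give 1.0, orthogonal vectors give 0.0, and opposite directions give -1.0.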

[0091] S85': Sort all the similarities in descending order to obtain a similarity sequence.

[0092] All similarities are sorted in descending order, so the highest similarity appears first in the sequence.

[0093] S86': Select the top N similarities from the similarity sequence.

[0094] Among the first to Nth similarity scores, the first similarity score is the highest. The value of N ranges from 1 to 6, and preferably, N is set to 3.

[0095] S87': Use the N saved lung nodule images corresponding to the first N similarities as the search results.

[0096] One similarity score corresponds to one saved lung nodule image. For example, with N set to 3, the first similarity score corresponds to the 10th saved lung nodule image, the second similarity score corresponds to the 5th saved lung nodule image, and the third similarity score corresponds to the 8th saved lung nodule image. The 10th, 5th, and 8th saved lung nodule images are output sequentially as search results.
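Steps S85'-S87' amount to sorting the similarities in descending order and keeping the first N entries. The sketch below is a minimal version in plain Python. The similarity values in the usage are hypothetical, chosen only so that the ordering matches the example in [0096] (saved images 10, 5 and 8).

```python
def top_n_results(similarities, n=3):
    """Sort similarities in descending order and keep the first n entries.

    similarities: {saved_image_number: cosine_similarity}
    Returns a list of (saved_image_number, similarity) pairs, most similar first.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical similarities for 12 saved lung nodule images
sims = {i: 0.1 for i in range(1, 13)}
sims.update({10: 0.95, 5: 0.90, 8: 0.85})
results = [number for number, _ in top_n_results(sims, n=3)]  # [10, 5, 8]
```

For a large feature database, `heapq.nlargest(n, similarities.items(), key=lambda item: item[1])` returns the same top N without sorting every entry.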

[0097] Converting the query image into a third feature vector and directly calculating the similarity between the third feature vector and each stored feature vector can quickly filter out N saved lung nodule images with high similarity to the query image.

[0098] As described above, after stopping the updating of the first feature encoder parameters, the process further includes acquiring a query image, inputting the query image into the first feature encoder for feature encoding, and obtaining a third feature vector. The similarity between the third feature vector and each stored feature vector in the feature database is calculated. All similarities are sorted in descending order to obtain a similarity sequence. The top N similarities are selected from the similarity sequence, and the N saved lung nodule images corresponding to the top N similarities are used as the search results. Converting the query image into a third feature vector and directly calculating the similarity between the third feature vector and each stored feature vector allows for the rapid selection of N saved lung nodule images with high similarity to the query image.

[0099] In one embodiment, before calculating the similarity between the third feature vector and each stored feature vector in the feature database, the method further includes:

[0100] S71': Use the first feature encoder to extract lung nodule features for each case in the case database to obtain the stored feature vector.

[0101] The parameters of the first feature encoder have been updated. Using the first feature encoder, lung nodule features of each case in the case library can be directly extracted. The case library contains multiple cases, and each case can contain one or more lung nodule training images.

[0102] S72': Combining all the stored feature vectors into the feature database.

[0103] The stored feature vectors in the feature database can be directly used to calculate the similarity between the feature vector and the third feature vector corresponding to the query image. The feature database can be stored on a personal computer or on a server.
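Building the feature database (S71'-S72') can be sketched as follows. The sketch makes three simplifying assumptions:
- Each case contributes a single image.
- `encode` is any callable standing in for the trained first feature encoder.
- Stored vectors are L2-normalized once when the database is built, which is a design choice not stated in the application.

With pre-normalized vectors, the cosine similarity at query time reduces to a plain dot product. All names here are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]

def build_feature_database(cases, encode):
    """Map each case id to the normalised stored feature vector of its nodule image.

    cases:  {case_id: image}, one image per case (simplifying assumption)
    encode: stand-in for the trained first feature encoder, image -> vector
    """
    return {case_id: l2_normalize(encode(image)) for case_id, image in cases.items()}

def score(database, query_vector):
    """Cosine similarity of a query (third) feature vector against every stored vector."""
    h = l2_normalize(query_vector)
    return {cid: sum(a * b for a, b in zip(h, k)) for cid, k in database.items()}
```

Normalizing at build time moves the norm computation out of the per-query loop, which matters when the database is queried many times.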

[0104] As described above, before calculating the similarity between the third feature vector and each stored feature vector in the feature database, the process includes extracting lung nodule features for each case in the case database using a first feature encoder to obtain stored feature vectors. All stored feature vectors are then grouped into a feature database, which can be directly used to calculate the similarity between the stored feature vector and the third feature vector corresponding to the query image.

[0105] In one embodiment, the training process of the second feature encoder includes:

[0106] Obtain the second feature encoder parameters to be updated and the adjustment parameter;

[0107] The second feature encoder parameters to be updated are updated according to the following formula:

[0108]-[0109]

$$\theta'_2 = m\,\theta_2 + (1 - m)\,\theta_1$$

[0110] The symbols are defined as follows:
- $\theta'_2$ is the updated second feature encoder parameter.
- $\theta_2$ is the second feature encoder parameter to be updated.
- $\theta_1$ is the updated first feature encoder parameter.
- $m$ is the adjustment parameter.

[0111] The larger the value of m, the more weight the formula gives to the current second feature encoder parameters, so they change more slowly. The smaller the value of m, the faster they follow the updated first feature encoder parameters. At the first update, the second feature encoder parameters to be updated are the initial second feature encoder parameters, which are identical to the initial first feature encoder parameters.

[0112] As described above, the second feature encoder parameters are updated from the updated first feature encoder parameters, but they remain different from them. This ensures that the first feature vector extracted by the first feature encoder differs from the second feature vector extracted by the second feature encoder.
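The update formula $\theta'_2 = m\,\theta_2 + (1 - m)\,\theta_1$ is an exponential moving average of the first encoder's parameters. It can be sketched element-wise in plain Python, assuming the parameters are flattened into lists. The default m = 0.99 is an assumed illustrative value, not one given in the application. No gradient flows into the second encoder; it changes only through this formula.

```python
def momentum_update(theta2, theta1, m=0.99):
    """theta2' = m * theta2 + (1 - m) * theta1, applied element-wise.

    theta2: second feature encoder parameters to be updated
    theta1: updated first feature encoder parameters
    m:      adjustment parameter in [0, 1]; 0.99 is an assumed default
    """
    return [m * t2 + (1.0 - m) * t1 for t2, t1 in zip(theta2, theta1)]
```

With this formula, a larger m keeps $\theta'_2$ closer to its previous value $\theta_2$. For example, starting from 0.0 with a target of 1.0, m = 0.9 moves the parameter to about 0.1, while m = 0.99 moves it only to about 0.01. When both encoders start from identical parameters, the first update leaves them unchanged until the first encoder has moved.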

[0113] In one embodiment, referring to Figure 4, the step of inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector includes:

[0114] S31: The first transformed lung nodule image is sequentially input into the first convolutional layer to the Mth convolutional layer of the first feature encoder to obtain the first convolution result, where M≥2;

[0115] S32: Input the first convolution result into the pooling layer of the first feature encoder for global average pooling to obtain the first pooling result;

[0116] S33: Input the first pooling result into the multilayer perceptron of the first feature encoder for mapping transformation to obtain the first feature vector.

[0117] Preferably, M is set to 4, and the first to fourth convolutional layers are all three-dimensional convolutional layers. The first convolutional layer has 32 channels, the second convolutional layer has 64 channels, the third convolutional layer has 128 channels, and the fourth convolutional layer has 256 channels.

[0118] The more convolutional layers there are, the deeper the features that can be extracted from the first transformed lung nodule image.

[0119] Inputting the first convolution result into the pooling layer of the first feature encoder for global average pooling can reduce redundant features in the first convolution result and improve computational efficiency.

[0120] Preferably, the multilayer perceptron of the first feature encoder is a 4-layer perceptron. The first pooling result is input into the multilayer perceptron of the first feature encoder for mapping transformation to obtain the first feature vector, which has a dimension of 64.
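The encoder head described above (global average pooling of the 256-channel output of the fourth convolutional layer, followed by a 4-layer perceptron producing a 64-dimensional first feature vector) can be sketched in NumPy. The four 3D convolutional layers themselves are not shown, since they would normally be built with a deep-learning framework. The following details are assumptions for illustration:
- A random tensor stands in for the convolution output, with an assumed 4×4×4 spatial size.
- The hidden widths 256-256-128 are assumed.
- ReLU is placed between the MLP layers.

All names are hypothetical.

```python
import numpy as np

def global_average_pool(feature_map):
    """(batch, channels, D, H, W) -> (batch, channels) by averaging over D, H, W."""
    return feature_map.mean(axis=(2, 3, 4))

def make_mlp(widths, rng):
    """One (weight, bias) pair per consecutive pair of widths."""
    return [(rng.normal(0.0, w_in ** -0.5, (w_in, w_out)), np.zeros(w_out))
            for w_in, w_out in zip(widths[:-1], widths[1:])]

def mlp(x, layers):
    """Linear layers with ReLU in between and none after the last (assumption)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
head = make_mlp([256, 256, 256, 128, 64], rng)  # 4 linear layers; hidden widths assumed
conv_out = rng.normal(size=(2, 256, 4, 4, 4))   # stand-in for the 4th conv layer output
first_feature_vector = mlp(global_average_pool(conv_out), head)  # shape (2, 64)
```

Global average pooling collapses each channel to a single value regardless of spatial size. The MLP input width therefore depends only on the 256 channels of the last convolutional layer.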

[0121] As described above, the first transformed lung nodule image is input into the first feature encoder for feature encoding to obtain the first feature vector. This includes sequentially inputting the first transformed lung nodule image into the first convolutional layer to the Mth convolutional layer of the first feature encoder to obtain the first convolution result, where M≥2. The first convolution result is then input into the pooling layer of the first feature encoder for global average pooling to obtain the first pooling result. The first pooling result is then input into the multilayer perceptron of the first feature encoder for mapping transformation to obtain the first feature vector. Multiple convolutional layers can extract features at different levels from the first transformed lung nodule image, and global average pooling can reduce redundant features in the first convolution result, improving computational efficiency.
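The encoder structure in [0114] to [0120] can be sketched as a NumPy forward pass. This is a minimal sketch under stated assumptions, not the applicant's implementation: the 3×3×3 kernels, stride 2, zero padding, ReLU activations, random He-style initialisation and the perceptron hidden widths (256, 128, 128) are all assumptions; only the channel counts 32/64/128/256, the global average pooling, the 4-layer perceptron and the 64-dimensional output come from the text. `init_params`, `conv3d` and `encode` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

CHANNELS = [1, 32, 64, 128, 256]     # input channel, then conv layers 1..4 per [0117]
MLP_DIMS = [256, 256, 128, 128, 64]  # hidden widths assumed; 64-d output per [0120]


def init_params(seed=0):
    """Random (He-style) weights for 4 conv layers and a 4-layer perceptron."""
    rng = np.random.default_rng(seed)
    convs = [rng.standard_normal((c_out, c_in, 3, 3, 3)) * np.sqrt(2.0 / (27 * c_in))
             for c_in, c_out in zip(CHANNELS[:-1], CHANNELS[1:])]
    mlp = [rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
           for d_in, d_out in zip(MLP_DIMS[:-1], MLP_DIMS[1:])]
    return convs, mlp


def conv3d(x, w, stride=2):
    """3x3x3 cross-correlation with zero padding 1 (as in deep-learning frameworks).

    x: (C_in, D, H, W); w: (C_out, C_in, 3, 3, 3) -> (C_out, D', H', W').
    """
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (1, 1)))
    win = sliding_window_view(xp, (3, 3, 3), axis=(1, 2, 3))[:, ::stride, ::stride, ::stride]
    return np.einsum("cdhwijk,ocijk->odhw", win, w)


def encode(volume, params):
    """Map a (D, H, W) volume to a 64-d feature vector."""
    convs, mlp = params
    x = volume[np.newaxis]                 # add channel axis: (1, D, H, W)
    for w in convs:                        # S31: conv layers 1..M with M = 4
        x = np.maximum(conv3d(x, w), 0.0)  # ReLU after each conv (assumption)
    x = x.mean(axis=(1, 2, 3))             # S32: global average pooling -> (256,)
    for i, w in enumerate(mlp):            # S33: 4-layer perceptron -> (64,)
        x = x @ w
        if i < len(mlp) - 1:
            x = np.maximum(x, 0.0)
    return x
```

With a 64×64×64 first transformed image, the four stride-2 layers would leave a 256-channel 4×4×4 map that global average pooling collapses to a 256-vector before the perceptron. This pure-NumPy convolution is slow at that size; a framework layer such as PyTorch's `torch.nn.Conv3d` would normally be used for training.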

[0122] In one embodiment, referring to Figure 5, the step of performing a first image transformation on the lung nodule training image to obtain a first transformed lung nodule image includes:

[0123] S11: Perform center cropping on the lung nodule training image to obtain a center cropped image.

[0124] Obtain the three-dimensional image center of the lung nodule training image together with the crop length L, crop width W, and crop height H. Taking the image center as the origin, the crop spans −L/2 to +L/2 along the y-axis (length), −W/2 to +W/2 along the x-axis (width), and −H/2 to +H/2 along the z-axis (height).

[0125] For example, an acquired lung nodule training image of size 256×256×256 is center-cropped to 128×128×128.
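The center-cropping rule in [0124] amounts to keeping ±L/2, ±W/2 and ±H/2 voxels around the image center. A minimal NumPy sketch, assuming the volume is stored in (z, y, x) order and that L, W and H are whole voxel counts; `center_crop` is a hypothetical helper name:

```python
import numpy as np


def center_crop(volume, size):
    """Crop a (Z, Y, X) volume to size = (H, L, W) voxels around its center.

    Along each axis the crop covers [start, start + s), with start = (dim - s) // 2,
    i.e. s/2 voxels on either side of the axis center.
    """
    if any(s > d for s, d in zip(size, volume.shape)):
        raise ValueError("crop size exceeds volume size")
    starts = [(d - s) // 2 for d, s in zip(volume.shape, size)]
    return volume[tuple(slice(st, st + s) for st, s in zip(starts, size))]
```

For the example in [0125], `center_crop(volume, (128, 128, 128))` on a 256×256×256 volume keeps voxels 64 to 191 along each axis.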

[0126] S12: Resample the center-cropped image to obtain a resampled image.

[0127] Resampling may be upsampling or downsampling; this embodiment uses downsampling as an example.

[0128] For example, a center-cropped image of size 128×128×128 is downsampled by a factor of 1/2 to obtain a resampled image of size 64×64×64.
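The 1/2 downsampling in [0128] can be sketched as 2×2×2 block averaging. Averaging is an assumed choice, since the text only says "downsampling"; plain strided sampling (`volume[::2, ::2, ::2]`) or interpolation would give the same 64×64×64 output size. `downsample_half` is a hypothetical name.

```python
import numpy as np


def downsample_half(volume):
    """Halve each axis of a (Z, Y, X) volume by averaging non-overlapping 2x2x2 blocks."""
    z, y, x = (d // 2 * 2 for d in volume.shape)  # drop a trailing odd slice, if any
    v = volume[:z, :y, :x].astype(np.float32)
    return v.reshape(z // 2, 2, y // 2, 2, x // 2, 2).mean(axis=(1, 3, 5))
```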

[0129] S13: Rotate the resampled image to obtain a rotated image.

[0130] The rotation angle ranges from 0 to 180 degrees. Preferably, the resampled image is rotated by 60 degrees to obtain the rotated image.
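The rotation step in [0130] does not state the rotation axis or the interpolation method. The sketch below assumes an in-plane rotation about the z-axis through the volume center, nearest-neighbour sampling and zero fill; `rotate_z` is a hypothetical name. A library routine such as `scipy.ndimage.rotate` would typically be used instead.

```python
import numpy as np


def rotate_z(volume, angle_deg):
    """Rotate every z-slice of a (Z, Y, X) volume by angle_deg about the slice center.

    Inverse mapping: each output voxel looks up its nearest source voxel;
    voxels whose source falls outside the input are set to 0.
    """
    theta = np.deg2rad(angle_deg)
    _, ny, nx = volume.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    src_y = np.cos(theta) * (yy - cy) - np.sin(theta) * (xx - cx) + cy
    src_x = np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx
    sy, sx = np.rint(src_y).astype(int), np.rint(src_x).astype(int)
    valid = (sy >= 0) & (sy < ny) & (sx >= 0) & (sx < nx)
    out = np.zeros_like(volume)
    out[:, valid] = volume[:, sy[valid], sx[valid]]
    return out
```

Under this sketch, the preferred setting would be `rotate_z(resampled, 60)`.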

[0131] S14: Translate the rotated image to obtain the first transformed lung nodule image.

[0132] The rotated image is translated by a small amount, between 1 and 10 pixels. Preferably, the rotated image is translated by 5 pixels to obtain the first transformed lung nodule image.
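For the translation in [0132], a minimal sketch assuming integer voxel shifts with zero fill; the shift direction is not specified in the text. `translate` is a hypothetical name.

```python
import numpy as np


def translate(volume, shift):
    """Shift a (Z, Y, X) volume by integer offsets shift = (dz, dy, dx), zero-filling vacated voxels."""
    out = np.zeros_like(volume)
    if any(abs(s) >= n for s, n in zip(shift, volume.shape)):
        return out  # the whole volume is shifted out of view
    src, dst = [], []
    for s, n in zip(shift, volume.shape):
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

Under these assumptions, `translate(rotated, (0, 0, 5))` would apply the preferred 5-pixel shift along x.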

[0133] Optionally, after obtaining the first transformed lung nodule image, Gaussian filtering is applied to the first transformed lung nodule image to remove image noise and improve the image quality of the first transformed lung nodule image.
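The optional Gaussian filtering in [0133] can be sketched as a separable 3-D Gaussian. The standard deviation (sigma = 1 voxel), the 3-sigma kernel radius and edge-replicated borders are assumptions; `gaussian_kernel1d` and `gaussian_filter3d` are hypothetical names, and `scipy.ndimage.gaussian_filter` offers an equivalent library routine.

```python
import numpy as np


def gaussian_kernel1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_filter3d(volume, sigma=1.0):
    """Separable 3-D Gaussian smoothing of a (Z, Y, X) volume with edge-replicated borders."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    out = volume.astype(np.float64)
    for axis in range(3):
        n = out.shape[axis]
        pad = [(0, 0)] * 3
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # Weighted sum of shifted slices = 1-D convolution along this axis
        # (the kernel is symmetric, so convolution and correlation coincide).
        out = sum(w * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, w in enumerate(k))
    return out
```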

[0134] The first image transformation perturbs the lung nodule training image without destroying or altering the visual structure of the lung nodule. In the first transformed lung nodule image, the lung nodule region lies approximately at the image center, and the image has the same resolution in all directions.

[0135] As described above, a first image transformation is performed on the lung nodule training image to obtain a first transformed lung nodule image. This includes center-cropping the lung nodule training image to obtain a center-cropped image; resampling the center-cropped image to obtain a resampled image; rotating the resampled image to obtain a rotated image; and translating the rotated image to obtain the first transformed lung nodule image. The first image transformation perturbs the lung nodule training image without destroying or altering the visual structure of the lung nodules; the lung nodule region lies approximately at the image center of the first transformed lung nodule image, which has the same resolution in all directions.

[0136] In one embodiment, predicting the first feature vector to obtain a predicted feature vector includes:

[0137] S51: Input the first feature vector into the first linear layer of the prediction network for linear transformation to obtain a linearly transformed vector.

[0138] The first linear layer performs a linear transformation on the first feature vector.

[0139] S52: Input the linearly transformed vector into the regularization layer of the prediction network for regularization to obtain a regularized vector.

[0140] The regularization layer of the prediction network can use L1 regularization, L2 regularization, or Dropout regularization.

[0141] Regularizing the linearly transformed vector helps prevent overfitting.

[0142] S53: Input the regularized vector into the activation layer of the prediction network for activation to obtain an activation vector.

[0143] The activation layer of the prediction network can use either the ReLU activation function or the sigmoid activation function.

[0144] Applying the activation function to the regularized vector introduces nonlinearity into the prediction network.

[0145] S54: Input the activation vector into the second linear layer of the prediction network for linear transformation to obtain the predicted feature vector.

[0146] The second linear layer performs a linear transformation on the activation vector, and the resulting predicted feature vector can be adapted to various query-image retrieval tasks.

[0147] As described above, predicting the first feature vector to obtain a predicted feature vector involves inputting the first feature vector into the first linear layer of the prediction network for linear transformation to obtain a linearly transformed vector. The linearly transformed vector is then input into the regularization layer of the prediction network for regularization to obtain a regularized vector, and the regularized vector is input into the activation layer of the prediction network for activation to obtain an activation vector. Finally, the activation vector is input into the second linear layer of the prediction network for linear transformation to obtain the predicted feature vector, which can be adapted to various query-image retrieval tasks.
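The prediction network of S51 to S54 can be sketched as below. Assumptions: Dropout as the regularization layer and ReLU as the activation (both are options listed in the text), a hidden width of 256, and an output width of 64 so that the predicted feature vector matches the 64-dimensional second feature vector. `PredictionNetwork` is a hypothetical class, and only the forward pass is shown.

```python
import numpy as np


class PredictionNetwork:
    """Linear -> Dropout -> ReLU -> Linear (S51 to S54)."""

    def __init__(self, dim=64, hidden=256, p_drop=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((dim, hidden)) * np.sqrt(2.0 / dim)
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, dim)) * np.sqrt(1.0 / hidden)
        self.b2 = np.zeros(dim)
        self.p_drop = p_drop
        self.rng = rng

    def __call__(self, z, training=True):
        h = z @ self.w1 + self.b1                   # S51: first linear layer
        if training and self.p_drop > 0:            # S52: inverted dropout (training only)
            keep = self.rng.random(h.shape) >= self.p_drop
            h = h * keep / (1.0 - self.p_drop)
        h = np.maximum(h, 0.0)                      # S53: ReLU activation
        return h @ self.w2 + self.b2                # S54: second linear layer
```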

[0148] Referring to Figure 6, which is a schematic block diagram of a self-supervised training device for a feature encoder disclosed in this application, the device includes:

[0149] The first image transformation module 10 is used to acquire lung nodule training images and perform a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image.

[0150] The second image transformation module 20 is used to perform a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image.

[0151] The first feature encoding module 30 is used to input the first transformed lung nodule image into the first feature encoder for feature encoding to obtain a first feature vector.

[0152] The second feature encoding module 40 is used to input the second transformed lung nodule image into the second feature encoder for feature encoding to obtain the second feature vector.

[0153] The first feature vector prediction module 50 is used to predict the first feature vector to obtain a predicted feature vector;

[0154] The contrastive loss function value calculation module 60 is used to calculate the contrastive loss function value based on the predicted feature vector and the second feature vector;

[0155] The first feature encoder parameter update module 70 is used to perform backpropagation based on the contrastive loss function value to update the first feature encoder parameters and obtain the updated first feature encoder parameters.
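This section names a contrastive loss between the predicted feature vector and the second feature vector but does not give its formula here. As an assumption only, the sketch below uses the normalised regression loss of BYOL (Bootstrap Your Own Latent), which fits the same predictor plus momentum-encoder layout: 2 − 2·cos(p, z₂), averaged over a batch. `contrastive_loss` is a hypothetical name.

```python
import numpy as np


def contrastive_loss(pred, target, eps=1e-12):
    """Mean of 2 - 2*cos(pred_i, target_i) over a batch of row vectors.

    This equals the squared L2 distance between the L2-normalised vectors.
    `target` (the second feature vectors) is treated as a constant: the second
    encoder is not trained by backpropagation but by the parameter-averaging
    formula described in [0170] to [0173].
    """
    p = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(np.mean(2.0 - 2.0 * np.sum(p * t, axis=-1)))
```

Identical directions give a loss of 0, orthogonal vectors give 2, and opposite vectors give 4.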

[0156] In one embodiment, the self-supervised training apparatus for the feature encoder further includes:

[0157] The difference calculation module is used to calculate the difference between the first feature encoder parameter and the preset first feature encoder parameter to obtain the first parameter difference.

[0158] The stop update module is used to determine whether the absolute value of the first parameter difference is less than the first parameter difference threshold. If so, the update of the first feature encoder parameters is stopped.
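The stopping rule of [0157] and [0158] compares the updated first-encoder parameters with preset parameters. Because the parameters consist of many tensors, the sketch assumes that "absolute value of the difference" means the largest element-wise absolute difference; `should_stop` and its arguments are hypothetical.

```python
import numpy as np


def should_stop(theta1, theta_preset, threshold):
    """True when max |theta1 - theta_preset| over all parameter tensors is below threshold.

    theta1, theta_preset: lists of NumPy arrays with matching shapes.
    """
    max_diff = max(float(np.max(np.abs(a - b))) for a, b in zip(theta1, theta_preset))
    return max_diff < threshold
```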

[0159] In one embodiment, the self-supervised training apparatus for the feature encoder further includes:

[0160] The query image feature encoding module is used to acquire a query image, input the query image into the first feature encoder for feature encoding, and obtain a third feature vector;

[0161] The similarity calculation module is used to calculate the similarity between the third feature vector and each stored feature vector in the feature database;

[0162] The descending order sorting module is used to sort all the similarities in descending order to obtain a similarity sequence;

[0163] A similarity selection module is used to select the top N similarities from the similarity sequence;

[0164] The retrieval result determination module is used to take the N saved lung nodule images corresponding to the top N similarities as the retrieval results.
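The retrieval steps in [0160] to [0164] can be sketched as below. The similarity measure is not specified in this section, so cosine similarity is assumed; `retrieve_top_n` and its arguments are hypothetical names.

```python
import numpy as np


def retrieve_top_n(query_vec, stored_vecs, image_ids, n):
    """Return (image_id, similarity) pairs for the n stored vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    s = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
    sims = s @ q                              # cosine similarity to every stored vector
    order = np.argsort(-sims, kind="stable")  # descending similarity sequence
    return [(image_ids[i], float(sims[i])) for i in order[:n]]
```

Here `query_vec` plays the role of the third feature vector and `stored_vecs` the rows of the feature database; the returned list is the top-N part of the descending similarity sequence.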

[0165] In one embodiment, the self-supervised training apparatus for the feature encoder further includes:

[0166] The lung nodule feature extraction module is used to extract the lung nodule features of each case in the case database using the first feature encoder to obtain the stored feature vectors;

[0167] The feature database construction module is used to assemble all the stored feature vectors into the feature database.
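Building the feature database of [0166] and [0167] is a single encoding pass over the case database. A minimal sketch, assuming each case is keyed by an id and `encoder` is any callable returning a 1-D vector (for example, the trained first feature encoder); `build_feature_database` is a hypothetical name.

```python
import numpy as np


def build_feature_database(cases, encoder):
    """Encode every case once and stack the stored feature vectors into a matrix.

    cases: dict mapping a case id to its lung nodule volume.
    Returns (ids, feats), where feats[i] is the stored feature vector of ids[i].
    """
    ids = list(cases)
    feats = np.stack([np.asarray(encoder(cases[i]), dtype=np.float64) for i in ids])
    return ids, feats
```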

[0168] In one embodiment, the self-supervised training apparatus for the feature encoder further includes:

[0169] The parameter acquisition module is used to acquire the to-be-updated parameters of the second feature encoder and the adjustment parameter;

[0170] The second feature encoder parameter update module is used to update the second feature encoder parameters to be updated according to the following formula:

[0172] θ′2 = mθ2 + (1 − m)θ1;

[0173] Wherein, θ′2 is the updated second feature encoder parameter, θ2 is the second feature encoder parameter to be updated, θ1 is the updated first feature encoder parameter, and m is the adjustment parameter.
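The update rule θ′2 = mθ2 + (1 − m)θ1 is an exponential moving average of the first-encoder parameters. The sketch below applies it tensor by tensor. The default m = 0.99 is an assumption, since the text only calls m the adjustment parameter, and `momentum_update` is a hypothetical name.

```python
import numpy as np


def momentum_update(theta2, theta1, m=0.99):
    """Return theta2' = m * theta2 + (1 - m) * theta1 for each parameter tensor.

    theta2: to-be-updated second-encoder parameters (list of arrays).
    theta1: updated first-encoder parameters (list of arrays, same shapes).
    """
    return [m * t2 + (1.0 - m) * t1 for t2, t1 in zip(theta2, theta1)]
```

With m close to 1 the second encoder changes slowly, so its parameters stay different from those of the first encoder, consistent with [0112].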

[0174] In one embodiment, the first feature encoding module 30 further includes:

[0175] A convolutional unit is used to sequentially input the first transformed lung nodule image into the first to Mth convolutional layers of the first feature encoder to obtain the first convolution result, where M≥2;

[0176] A pooling unit is used to input the first convolution result into the pooling layer of the first feature encoder for global average pooling to obtain the first pooling result.

[0177] The mapping transformation unit is used to input the first pooling result into the multilayer perceptron network of the first feature encoder for mapping transformation to obtain the first feature vector.

[0178] In one embodiment, the first image transformation module 10 further includes:

[0179] The center cropping unit is used to perform center cropping on the lung nodule training image to obtain a center cropped image.

[0180] A resampling unit is used to resample the center-cropped image to obtain a resampled image;

[0181] A rotation unit is used to rotate the resampled image to obtain a rotated image;

[0182] A translation unit is used to translate the rotated image to obtain the first transformed lung nodule image.

[0183] In one embodiment, the first feature vector prediction module 50 further includes:

[0184] The first linear transformation unit is used to input the first feature vector into the first linear layer of the prediction network and perform a linear transformation to obtain a linear transformation vector.

[0185] The regularization unit is used to input the linear transformation vector into the regularization layer of the prediction network for regularization to obtain a regularized vector;

[0186] An activation unit is used to input the regularized vector into the activation layer of the prediction network for activation to obtain an activation vector.

[0187] The second linear transformation unit is used to input the activation vector into the second linear layer of the prediction network for linear transformation to obtain the predicted feature vector.

[0188] Referring to Figure 7, this application also provides a computer device, whose internal structure may be as shown in Figure 7. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores data such as the first and second feature vectors. The network interface is used to communicate with external terminals over a network connection. The computer device may further include an input device and a display screen. When executed by the processor, the computer program implements a self-supervised training method for a feature encoder, comprising the following steps: acquiring lung nodule training images; performing a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image; performing a second image transformation on the lung nodule training images to obtain a second transformed lung nodule image; inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector; inputting the second transformed lung nodule image into a second feature encoder for feature encoding to obtain a second feature vector; predicting the first feature vector to obtain a predicted feature vector; calculating a contrastive loss function value based on the predicted feature vector and the second feature vector; and performing backpropagation based on the contrastive loss function value to update the parameters of the first feature encoder, thereby obtaining the updated parameters of the first feature encoder.
Those skilled in the art will understand that the structure shown in Figure 7 is merely a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which this solution is applied.

[0189] One embodiment of this application also provides a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it implements a self-supervised training method for a feature encoder, including the following steps: acquiring lung nodule training images; performing a first image transformation on the lung nodule training images to obtain a first transformed lung nodule image; performing a second image transformation on the lung nodule training images to obtain a second transformed lung nodule image; inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector; inputting the second transformed lung nodule image into a second feature encoder for feature encoding to obtain a second feature vector; predicting the first feature vector to obtain a predicted feature vector; calculating a contrastive loss function value based on the predicted feature vector and the second feature vector; and performing backpropagation based on the contrastive loss function value to update the parameters of the first feature encoder, obtaining updated first feature encoder parameters. It is understood that the computer-readable storage medium in this embodiment can be a volatile readable storage medium or a non-volatile readable storage medium.

[0190] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may carry out the processes of the above method embodiments. Any references to memory, storage, databases, or other media provided in this application and in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0191] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.

[0192] The above description is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A self-supervised training method for a feature encoder, characterized in that the method includes: acquiring a lung nodule training image and performing a first image transformation on the lung nodule training image to obtain a first transformed lung nodule image; performing a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image; inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector; inputting the second transformed lung nodule image into a second feature encoder for feature encoding to obtain a second feature vector; predicting the first feature vector to obtain a predicted feature vector; calculating a contrastive loss function value based on the predicted feature vector and the second feature vector; and performing backpropagation based on the contrastive loss function value to update the parameters of the first feature encoder, thereby obtaining updated first feature encoder parameters.

2. The self-supervised training method for the feature encoder according to claim 1, characterized in that, after updating the first feature encoder parameters to obtain the updated first feature encoder parameters, the method further includes: calculating the difference between the first feature encoder parameters and preset first feature encoder parameters to obtain a first parameter difference; and determining whether the absolute value of the first parameter difference is less than a first parameter difference threshold and, if so, stopping the update of the first feature encoder parameters.

3. The self-supervised training method for the feature encoder according to claim 2, characterized in that, after stopping the update of the first feature encoder parameters, the method further includes: obtaining a query image and inputting the query image into the first feature encoder for feature encoding to obtain a third feature vector; calculating the similarity between the third feature vector and each stored feature vector in a feature database; sorting all the similarities in descending order to obtain a similarity sequence; selecting the top N similarities from the similarity sequence; and taking the N saved lung nodule images corresponding to the top N similarities as the retrieval results.

4. The self-supervised training method for the feature encoder according to claim 3, characterized in that, before calculating the similarity between the third feature vector and each stored feature vector in the feature database, the method further includes: extracting the lung nodule features of each case in a case database using the first feature encoder to obtain the stored feature vectors; and combining all the stored feature vectors to form the feature database.

5. The self-supervised training method for the feature encoder according to claim 1, characterized in that the training process of the second feature encoder includes: obtaining the to-be-updated parameters of the second feature encoder and an adjustment parameter; and updating the to-be-updated parameters of the second feature encoder according to the following formula: θ′2 = mθ2 + (1 − m)θ1; where θ′2 is the updated second feature encoder parameter, θ2 is the to-be-updated second feature encoder parameter, θ1 is the updated first feature encoder parameter, and m is the adjustment parameter.

6. The self-supervised training method for the feature encoder according to claim 1, characterized in that the step of inputting the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector includes: sequentially inputting the first transformed lung nodule image into the first to Mth convolutional layers of the first feature encoder to obtain a first convolution result, where M≥2; inputting the first convolution result into the pooling layer of the first feature encoder for global average pooling to obtain a first pooling result; and inputting the first pooling result into the multilayer perceptron network of the first feature encoder for mapping transformation to obtain the first feature vector.

7. The self-supervised training method for the feature encoder according to claim 1, characterized in that the step of performing a first image transformation on the lung nodule training image to obtain a first transformed lung nodule image includes: center-cropping the lung nodule training image to obtain a center-cropped image; resampling the center-cropped image to obtain a resampled image; rotating the resampled image to obtain a rotated image; and translating the rotated image to obtain the first transformed lung nodule image.

8. The self-supervised training method for the feature encoder according to claim 1, characterized in that the step of predicting the first feature vector to obtain the predicted feature vector includes: inputting the first feature vector into a first linear layer of a prediction network for linear transformation to obtain a linearly transformed vector; inputting the linearly transformed vector into a regularization layer of the prediction network for regularization to obtain a regularized vector; inputting the regularized vector into an activation layer of the prediction network for activation to obtain an activation vector; and inputting the activation vector into a second linear layer of the prediction network for linear transformation to obtain the predicted feature vector.

9. A self-supervised training device for a feature encoder, characterized in that it includes: a first image transformation module, used to acquire a lung nodule training image and perform a first image transformation on the lung nodule training image to obtain a first transformed lung nodule image; a second image transformation module, used to perform a second image transformation on the lung nodule training image to obtain a second transformed lung nodule image; a first feature encoding module, used to input the first transformed lung nodule image into a first feature encoder for feature encoding to obtain a first feature vector; a second feature encoding module, used to input the second transformed lung nodule image into a second feature encoder for feature encoding to obtain a second feature vector; a first feature vector prediction module, used to predict the first feature vector to obtain a predicted feature vector; a contrastive loss function value calculation module, used to calculate a contrastive loss function value based on the predicted feature vector and the second feature vector; and a first feature encoder parameter update module, used to perform backpropagation based on the contrastive loss function value to update the first feature encoder parameters and obtain the updated first feature encoder parameters.

10. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when executing the computer program, the processor implements the steps of the self-supervised training method for the feature encoder according to any one of claims 1 to 8.

11. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the self-supervised training method for the feature encoder according to any one of claims 1 to 8.