A method and device for identifying gestational diabetes mellitus based on fundus images

By training an unbiased classification model on fundus images with a multi-scale interactive attention module and a weighted cross-entropy loss based on curriculum learning, the invention addresses the accuracy and robustness problems of gestational diabetes identification, enabling rapid and effective screening and diagnosis of gestational diabetes.

CN117152517B · Active · Publication Date: 2026-06-30 · BEIJING AIRDOC TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING AIRDOC TECH CO LTD
Filing Date
2023-08-31
Publication Date
2026-06-30

Abstract

This invention discloses a method for identifying gestational diabetes mellitus based on fundus images. One implementation of this method includes: first, using labeled fundus images as training samples; acquiring first-scale and second-scale images corresponding to the training samples; the labels include at least a "normal" label and a "gestational diabetes mellitus" label; second, extracting features from the first-scale and second-scale images respectively to generate corresponding first and second feature representations; then, based on two different attention models, cross-weighting the first and second feature representations according to a cross-attention strategy to obtain weighted features; finally, using the true labels of the training samples as a reference, supervised model training is performed based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model. Therefore, gestational diabetes mellitus can be accurately identified and classified based on fundus images, thus providing effective advice to doctors in clinical practice.

Description

Technical Field

[0001] This invention belongs to the field of image processing technology, and in particular relates to a method and device for identifying gestational diabetes based on fundus images.

Background Technology

[0002] Gestational diabetes mellitus (GDM) refers to the development of high blood sugar levels during pregnancy in women who previously had no diabetes. GDM can lead to difficult labor and neonatal metabolic abnormalities, affecting the health of both the mother and the fetus. With improved living standards, changing dietary habits, and the implementation of the two-child policy, the proportion of older pregnant women is increasing, which further contributes to the rising incidence of GDM in China. The incidence of GDM is also increasing rapidly worldwide. Currently, there is no effective treatment for GDM; therefore, early screening and intervention for vulnerable groups are of great practical significance in preventing the occurrence and progression of the disease.

[0003] In recent years, with the development of technologies such as artificial intelligence and deep learning, many studies have attempted to apply deep learning technology to smart healthcare for clinical assisted screening and decision-making, achieving significant results, such as heart segmentation and prostate segmentation. In the field of ophthalmology, some deep learning models have also been explored, such as for diabetic retinopathy detection, retinal vessel segmentation, and diabetic macular edema classification. However, for the classification and identification of gestational diabetes mellitus, no literature has yet explored the application of deep learning models based on fundus images for this task. Fundus examination is simple and easy to perform, requiring no complex instruments, and is non-invasive and painless to patients, allowing for repeated and thorough examinations. Furthermore, fundus examination allows for dynamic observation of fundus changes, which is helpful in understanding the treatment effects and monitoring the progression of the disease in patients with gestational diabetes. Therefore, this invention proposes a novel method for identifying gestational diabetes mellitus based on fundus images, enabling efficient and rapid screening for gestational diabetes.

Summary of the Invention

[0004] To address the aforementioned problems in the existing technology, embodiments of the present invention provide a method and apparatus for identifying gestational diabetes based on fundus images, which can accurately identify and classify gestational diabetes based on fundus images, thereby providing effective advice to doctors in clinical practice.

[0005] According to a first aspect of the present invention, a method for identifying gestational diabetes mellitus based on fundus images is provided. The method includes: using labeled fundus images as training samples; acquiring a first-scale image and a second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes mellitus label; extracting features from the first-scale image and the second-scale image respectively to generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image; performing cross-weighting processing on the first feature representation and the second feature representation according to a cross-attention strategy based on two different attention models to obtain weighted features; and using the true labels of the training samples as a reference, performing supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model.

[0006] Optionally, the step of performing cross-weighting processing on the first feature representation and the second feature representation based on two different attention models and according to a cross-attention strategy to obtain weighted features includes: inputting the first feature representation into two different attention models to generate corresponding first weights and second weights; inputting the second feature representation into two different attention models to generate corresponding third weights and fourth weights; and performing cross-weighting processing on the first feature representation and the second feature representation using a cross-attention strategy based on the first weight, second weight, third weight, and fourth weight to obtain weighted features.

[0007] Optionally, the step of performing cross-weighting processing on the first feature representation and the second feature representation using a cross-attention strategy based on the first weight, the second weight, the third weight, and the fourth weight to obtain a weighted feature includes: applying the third weight and the fourth weight to the first feature representation respectively and then performing weighting processing to generate a first weighted feature; applying the first weight and the second weight to the second feature representation respectively and then performing weighting processing to generate a second weighted feature; and summing the first weighted feature and the second weighted feature to generate a weighted feature.

[0008] Optionally, the step of using the true labels of the training samples as a reference and performing supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model includes: using the true labels of the training samples as a reference and performing supervised model training based on the classification prediction results corresponding to the weighted features; during the model training process, determining the weights corresponding to the cross-entropy loss function according to the number of model iterations to obtain a weighted cross-entropy loss function; and optimizing the model based on several weighted cross-entropy loss functions to generate an unbiased classification model.

[0009] Optionally, determining the weights corresponding to the cross-entropy loss function based on the number of model iterations to obtain a weighted cross-entropy loss function includes: when the number of model iterations is less than E1, applying the same first weight to each sample to generate a first weighted cross-entropy loss function; when the number of model iterations is greater than E1 and less than E2, applying a second weight to the cross-entropy loss function to focus on categories with fewer samples to generate a second weighted cross-entropy loss function; and when the number of model iterations is greater than E2 and less than E, applying a third weight to the cross-entropy loss function to focus on training samples that are more difficult to identify to generate a third weighted cross-entropy loss function.

[0010] Optionally, the method further includes: performing data augmentation on the training samples to generate a training sample dataset.

[0011] Optionally, the method further includes: inputting the weighted features into the classification layer to obtain the classification prediction result corresponding to the weighted features.

[0012] According to a second aspect of the present invention, an apparatus for identifying gestational diabetes mellitus based on fundus images is also provided. The apparatus includes: an acquisition module, configured to use labeled fundus images as training samples; acquire a first-scale image and a second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes mellitus label; a generation module, configured to extract features from the first-scale image and the second-scale image respectively, generating a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image; a weighted processing module, configured to perform cross-weighted processing on the first feature representation and the second feature representation based on two different attention models and a cross-attention strategy to obtain weighted features; and a model training module, configured to use the real labels of the training samples as references and perform supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model.

[0013] According to a third aspect of the present invention, an electronic device is also provided, the electronic device comprising: one or more processors; and a memory for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to perform the method as described in the first aspect.

[0014] According to a fourth aspect of the present invention, a computer-readable medium is also provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in the first aspect.

[0015] This invention provides a method for identifying gestational diabetes mellitus based on fundus images. The method includes: first, using labeled fundus images as training samples; acquiring first-scale and second-scale images corresponding to the training samples; the labels include at least a normal label and a gestational diabetes mellitus label; second, extracting features from the first-scale and second-scale images respectively to generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image; then, based on two different attention models, cross-weighting the first and second feature representations according to a cross-attention strategy to obtain weighted features; finally, using the true labels of the training samples as a reference, performing supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model. By using two different attention models to learn channel and spatial attention weights simultaneously, this embodiment helps the model identify small, detailed lesion areas, makes full use of multi-scale contextual information to highlight the more informative regions for gestational diabetes mellitus identification, and thereby improves the accuracy of model identification.

Description of the Drawings

[0016] The following sections will describe some specific embodiments of the invention in detail by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar parts or portions. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:

[0017] Figure 1 is a schematic flowchart of a method for identifying gestational diabetes based on fundus images according to an embodiment of the present invention;

[0018] Figure 2 is a schematic flowchart of a method for identifying gestational diabetes based on fundus images according to another embodiment of the present invention;

[0019] Figure 3 is a schematic flowchart of a method for identifying gestational diabetes based on fundus images according to yet another embodiment of the present invention;

[0020] Figure 4 is a schematic diagram of the structure of a classification model in one embodiment of the present invention;

[0021] Figure 5 is a schematic diagram of the structure of two attention models in one embodiment of the present invention;

[0022] Figure 6 is a schematic diagram of the structure of a gestational diabetes identification device based on fundus images provided in an embodiment of the present invention.

Detailed Implementation

[0023] To make the objectives, features, and advantages of this invention more apparent and understandable, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0024] Since lesions are typically found only in small regions of fundus images, traditional deep learning models struggle to target these small areas effectively. To address this, this invention proposes a multi-scale interactive attention module that makes better use of contextual information at different scales, enhancing the model's ability to identify lesions in these small regions. In addition, fundus images of gestational diabetes typically make up only a small portion of the dataset, with the majority being normal, so the model is easily dominated by the large number of normal images during training; the diagnostic difficulty also varies across fundus images. Therefore, this invention employs a weighted cross-entropy loss based on curriculum learning to train an unbiased classification model, improving its robustness and enabling more accurate prediction of gestational diabetes based on fundus images.

[0025] Figure 1 is a flowchart illustrating a method for identifying gestational diabetes based on fundus images according to an embodiment of the present invention.

[0026] A method for identifying gestational diabetes based on fundus images, the method comprising at least the following steps:

[0027] S101: Use labeled fundus images as training samples, and obtain the first-scale image and the second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes label;

[0028] S102: Extract features from the first-scale image and the second-scale image respectively to generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image;

[0029] S103: Based on two different attention models, perform cross-weighting on the first feature representation and the second feature representation according to the cross-attention strategy to obtain weighted features;

[0030] S104: Using the true labels of the training samples as a reference, perform supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model.

[0031] In S101, all training samples are derived from clinical medical data and labeled by professional physicians using clinical diagnostic information. Based on the labeling results, the training samples are divided into two categories: one category contains fundus images labeled as normal, and the other contains fundus images labeled as gestational diabetes. Each training sample has a corresponding first-scale image and a second-scale image; for example, the first-scale image corresponding to a training sample is 256x256, and the second-scale image is 512x512.
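To illustrate how a pair of scales might be prepared for one training sample, here is a minimal pure-Python sketch. It assumes the second-scale image is the larger one and that the first-scale image is derived from it by 2×2 average pooling; the source does not state the resampling method, and the name `downsample_2x` is hypothetical.

```python
def downsample_2x(image):
    """Halve the height and width of a 2-D grayscale image by 2x2 average pooling.

    Assumption: the source does not say how the smaller scale is produced;
    average pooling is only one plausible choice.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# Toy 4x4 image standing in for a 512x512 second-scale input.
second_scale = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
# 2x2 result standing in for the 256x256 first-scale input.
first_scale = downsample_2x(second_scale)
```

A real pipeline would more likely resize with an image library; the sketch only shows that both scales come from the same labeled sample.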

[0032] In step S102, a neural network is used to extract features from the first-scale image, generating a first feature representation corresponding to the first-scale image; the neural network is then used to extract features from the second-scale image, generating a second feature representation corresponding to the second-scale image. Here, the neural network can be a ResNet model or a CNN model, etc.

[0033] In S103, the first feature representation is input into the spatial attention model and the channel attention model respectively to obtain two kinds of attention weights; the second feature representation is input into the spatial attention model and the channel attention model respectively to obtain two kinds of attention weights; the two kinds of attention weights corresponding to the first feature representation are applied to the second feature representation; and the two kinds of attention weights corresponding to the second feature representation are applied to the first feature representation; then the weighted first feature representation and the weighted second feature representation are summed to obtain a weighted feature.

[0034] In S104, a classification layer is used to process the weighted features to generate classification prediction results. The model compares the classification prediction results with the true labels of the training samples and outputs a weighted cross-entropy loss function. Based on several weighted cross-entropy loss functions, an unbiased classification model is generated.

[0035] Therefore, this embodiment uses two different attention models to learn channel and spatial attention weights simultaneously, which helps the model identify small, detailed lesion areas and makes full use of multi-scale contextual information to highlight the more informative regions for gestational diabetes identification, thereby improving the accuracy of model identification.

[0036] In a preferred embodiment of this invention, the method further includes: performing data augmentation processing on the training samples to generate a training sample dataset.

[0037] Specifically, data augmentation includes operations such as random horizontal and vertical flipping, random rotation, random cropping, and histogram equalization; thus, data augmentation can increase the amount of training samples and improve the generalization performance of the model.
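The augmentation operations listed above can be sketched in pure Python on small grayscale images stored as nested lists. This is an illustrative sketch only: the 0.5 application probability, the restriction of rotation to 90 degrees, and all function names are assumptions, not details given in the source.

```python
import random


def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (vertical flip)."""
    return img[::-1]


def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def equalize(img, levels=256):
    """Histogram equalization for integer gray levels in [0, levels)."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(v for v in cdf if v > 0)
    denom = max(len(flat) - cdf_min, 1)
    return [[round((cdf[p] - cdf_min) / denom * (levels - 1)) for p in row] for row in img]


def augment(img, rng):
    """Apply each flip or rotation with probability 0.5 (an assumed setting)."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    if rng.random() < 0.5:
        img = rot90(img)
    return img
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs.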

[0038] In a preferred embodiment, the method further includes: inputting the weighted features into a classification layer to obtain classification prediction results corresponding to the weighted features. Here, the classification prediction results have at least two categories, for example: a normal classification result and a gestational diabetes classification result.

[0039] Figure 2 is a flowchart illustrating a method for identifying gestational diabetes based on fundus images according to another embodiment of the present invention; Figure 4 is a schematic diagram of the structure of a classification model in one embodiment of the present invention.

[0040] S201: Use labeled fundus images as training samples, and obtain the first-scale image and the second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes label;

[0041] S202: Extract features from the first-scale image and the second-scale image respectively to generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image;

[0042] S203: Input the first feature representation into two different attention models to generate the corresponding first weight and second weight;

[0043] S204: Input the second feature representation into two different attention models to generate the corresponding third weight and fourth weight;

[0044] S205: Based on the first weight, second weight, third weight, and fourth weight, cross-weight the first feature representation and the second feature representation using a cross-attention strategy to obtain weighted features; input the weighted features into the classification layer, and output the classification prediction results corresponding to the weighted features;

[0045] S206: Using the true labels of the training samples as a reference, perform supervised model training based on the classification prediction results;

[0046] S207: During model training, determine the weights corresponding to the cross-entropy loss function based on the number of model iterations to obtain the weighted cross-entropy loss function;

[0047] S208: Optimize the model based on several weighted cross-entropy loss functions to generate an unbiased classification model.

[0048] The implementation of step S201 is similar to that of step S101 shown in Figure 1 and is not repeated here.

[0049] In steps S202 to S204, the first-scale image and the second-scale image are input into the classification model for feature extraction, generating a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image. The first feature representation is then input into the channel attention model and the spatial attention model, respectively, outputting a first weight and a second weight; the second feature representation is input into the channel attention model and the spatial attention model, outputting a third weight and a fourth weight. Thus, the attention weights corresponding to the two different scale images of the training samples can be obtained.

[0050] Here, the classification model is the ResNet-18 model. The ResNet model was proposed by Kaiming He of Microsoft Research Asia. Through residual learning, the ResNet-18 model alleviates the problem that learning efficiency drops and accuracy stops improving as the number of layers in a deep learning model increases. At the same time, the additional layers give the model better feature extraction capability and better classification performance.

[0051] As shown in Figure 4, the input images at a scale of 0.5× include both normal images and images with gestational diabetes mellitus (GDM); the input images at a scale of 1× also include both normal images and GDM images. Each GDM image is provided as both a 0.5× and a 1× scale input image. Similarly, each normal image is provided as both a 0.5× and a 1× scale input image.

[0052] Taking gestational diabetes mellitus images as an example: The 0.5× scale input image is first fed into the ResNet-18 model to obtain the first feature representation. This first feature representation is then fed into the attention module (which includes the spatial and channel attention models), outputting the corresponding spatial and channel attention weights. The 1× scale input image is first fed into the ResNet-18 model to obtain the second feature representation. This second feature representation is then fed into the attention module, outputting the corresponding spatial and channel attention weights. The spatial and channel attention weights corresponding to the 0.5× scale input image are applied to the second feature representation, and the spatial and channel attention weights corresponding to the 1× scale input image are applied to the first feature representation. Finally, the weighted first and second feature representations are summed and fed into the average pooling layer and the fully connected layer, outputting the prediction result. There are two possible prediction results: a normal image or a gestational diabetes mellitus image.
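The final stage described above (summing the two weighted features, average pooling, and a fully connected layer) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions: both weighted features are taken to have the same channel-by-height-by-width shape (the source does not say how the two scales are aligned), a softmax is added to turn the two logits into probabilities, and the names `classify`, `weights`, and `bias` are hypothetical.

```python
import math


def global_average_pool(feature):
    """Average each channel's H x W map down to a single number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def fully_connected(vector, weights, bias):
    """One logit per class; weights has shape [classes][channels]."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax (an assumed output activation)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(weighted_first, weighted_second, weights, bias):
    """Sum the two weighted features, pool, and score [normal, GDM]."""
    summed = [
        [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(weighted_first, weighted_second)
    ]
    return softmax(fully_connected(global_average_pool(summed), weights, bias))


# Toy features with 2 channels of size 2x2.
weighted_first = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
weighted_second = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
probs = classify(weighted_first, weighted_second, [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0])
```

The second entry of `probs` plays the role of the GDM probability; in a trained model the fully connected weights would be learned rather than set by hand.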

[0053] The prediction process for normal images is the same as that for gestational diabetes images, and will not be repeated here. During model training, both normal images and gestational diabetes images are input into the model simultaneously.

[0054] It should be noted that the classification model is not limited to the ResNet-18 model; it can also be other deep learning models.

[0055] In S205, for example, the third weight and the fourth weight are each applied to the first feature representation, and weighting is then performed to generate a first weighted feature; the first weight and the second weight are each applied to the second feature representation, and weighting is then performed to generate a second weighted feature; the first weighted feature and the second weighted feature are summed to generate the weighted feature.
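The cross-weighting in S205 can be sketched in pure Python. The sketch assumes that the channel weight and the spatial weight are combined by element-wise multiplication and that both feature representations share one shape; the source fixes neither detail, and the function names are hypothetical.

```python
def apply_attention(feature, channel_w, spatial_w):
    """Scale feature[c][i][j] by channel_w[c] * spatial_w[i][j].

    Assumption: the two weights are combined by multiplication; the source
    only says both are applied and does not fix the exact operation.
    """
    return [
        [[channel_w[c] * spatial_w[i][j] * v for j, v in enumerate(row)]
         for i, row in enumerate(ch)]
        for c, ch in enumerate(feature)
    ]


def cross_weight(first, second, w1, w2, w3, w4):
    """w1/w2: channel/spatial weights from `first`; w3/w4: from `second`."""
    first_weighted = apply_attention(first, w3, w4)    # weights from the second scale
    second_weighted = apply_attention(second, w1, w2)  # weights from the first scale
    return [
        [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(first_weighted, second_weighted)
    ]


# Toy features with 1 channel, height 1, width 2.
first = [[[1.0, 2.0]]]
second = [[[10.0, 20.0]]]
weighted = cross_weight(first, second, [0.5], [[1.0, 0.0]], [1.0], [[0.5, 0.5]])
```

The point of the crossing is visible in the argument order: each scale is reweighted by attention computed from the other scale, so context from one resolution decides what is emphasized at the other.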

[0056] In steps S206 to S208, during supervised model training, when the number of iterations is less than E1, the same first weight is applied to each sample, generating a first weighted cross-entropy loss function. When the number of iterations is greater than E1 and less than E2, a second weight is applied to the cross-entropy loss function to focus on categories with fewer samples, generating a second weighted cross-entropy loss function. When the number of iterations is greater than E2 and less than E, a third weight is applied to the cross-entropy loss function to focus on training samples with greater recognition difficulty, generating a third weighted cross-entropy loss function. Optimal model parameters are generated based on these weighted cross-entropy loss functions. The model is then optimized based on these optimal parameters to obtain an unbiased classification model. This allows different weights to be applied to the cross-entropy loss function based on the number of iterations during model training, enabling the model to focus more on samples with greater recognition difficulty and categories with fewer samples. This overcomes the technical problem in existing technologies where the model is easily dominated by a large number of normal images during training, leading to varying diagnostic difficulties for different fundus images, and improves the robustness of model training.
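The three-phase weighting can be sketched as a function of the iteration count. The concrete weight formulas here are assumptions chosen for illustration (inverse class frequency for the second phase, a focal-style term for the third); the source only states what each phase should emphasize. The handling of iterations exactly equal to E1 or E2 is also an assumption, since the source leaves those boundaries open.

```python
import math


def sample_weight(epoch, label, prob_true, class_counts, e1, e2, gamma=2.0):
    """Return the loss weight for one sample at a given iteration.

    Phase 1 (epoch < e1): the same weight for every sample.
    Phase 2 (e1 <= epoch < e2): inverse class frequency (assumed form).
    Phase 3 (epoch >= e2): focal-style (1 - p)^gamma term (assumed form).
    """
    if epoch < e1:
        return 1.0
    if epoch < e2:
        total = sum(class_counts)
        return total / (len(class_counts) * class_counts[label])
    return (1.0 - prob_true) ** gamma


def weighted_cross_entropy(epoch, labels, probs, class_counts, e1, e2):
    """Mean of weight * -log(p_true) over a batch."""
    losses = []
    for label, p in zip(labels, probs):
        p_true = p[label]
        w = sample_weight(epoch, label, p_true, class_counts, e1, e2)
        losses.append(-w * math.log(p_true))
    return sum(losses) / len(losses)


# 90 normal samples (label 0) and 10 GDM samples (label 1); E1 = 5, E2 = 10.
counts = [90, 10]
early = sample_weight(0, 1, 0.9, counts, 5, 10)   # uniform phase
middle = sample_weight(5, 1, 0.9, counts, 5, 10)  # rare class is up-weighted
late = sample_weight(10, 1, 0.5, counts, 5, 10)   # hard sample is emphasized
```

Ordering the phases from uniform, to class-balanced, to difficulty-aware follows the easy-to-hard progression that curriculum learning refers to.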

[0057] This embodiment trains an unbiased classification model with a weighted cross-entropy loss function based on curriculum learning, thus improving model robustness. Furthermore, because training and prediction are performed end to end, the unbiased classification model can be used to identify gestational diabetes mellitus quickly and conveniently, improving the accuracy of gestational diabetes identification.

[0058] The following section provides a detailed description of the gestational diabetes identification method based on fundus images provided in this embodiment, with specific applications as examples.

[0059] Figure 3 is a flowchart illustrating a method for identifying gestational diabetes based on fundus images according to another embodiment of the present invention. Figure 5 is a schematic diagram of the structure of two attention models in one embodiment of the present invention.

[0060] As shown in Figure 3, the method for identifying gestational diabetes mellitus based on fundus images includes at least the following steps: data acquisition and cleaning of fundus images of the target object to obtain a first output image; data augmentation of the first output image to obtain a training sample dataset; construction of a prediction model using the training sample dataset to obtain an unbiased classification model; and prediction on the fundus image to be tested using the unbiased classification model to output the GDM probability value.

[0061] The specific process is as follows:

[0062] S1: Use labeled fundus images as training samples; perform data augmentation on the training samples to generate a training sample dataset, wherein the training sample dataset includes several training samples; for any training sample, obtain the first-scale image and the second-scale image corresponding to the training sample; the labels include at least a normal label and a gestational diabetes label;

[0063] S2: Extract features from the first-scale image and the second-scale image respectively, and generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image;

[0064] S3: Input the first feature representation into two different attention models to generate the corresponding first and second weights; input the second feature representation into two different attention models to generate the corresponding third and fourth weights; apply the third and fourth weights to the first feature representation respectively and then perform weighted processing to generate the first weighted feature; apply the first and second weights to the second feature representation respectively and then perform weighted processing to generate the second weighted feature; sum the first weighted feature and the second weighted feature to generate the weighted feature;

[0065] S4: Input the weighted features into the classification layer and output the classification prediction results corresponding to the weighted features;

[0066] S5: Using the true labels of the training samples as a reference, perform supervised model training based on the classification prediction results corresponding to the weighted features;

[0067] S6: During model training, when the number of model iterations is less than E1, apply the same first weight to each sample to generate the first weighted cross-entropy loss function; when the number of model iterations is greater than E1 and less than E2, apply a second weight to the cross-entropy loss function to focus on the category with fewer samples, generating the second weighted cross-entropy loss function; when the number of model iterations is greater than E2 and less than E, apply a third weight to the cross-entropy loss function to focus on the training samples that are more difficult to identify, generating the third weighted cross-entropy loss function;

[0068] S7: Optimize the model based on the weighted cross-entropy loss function corresponding to each training sample to generate an unbiased classification model.

[0069] As shown in Figure 5, taking the first-scale image as an example: first, the first-scale image is input into the feature extraction network of the ResNet-18 model for feature extraction, yielding the first feature representation, a feature map of size H×W×C, where H and W are the spatial dimensions of the feature map and C is its number of channels; next, the first feature representation is input into the channel attention model and the spatial attention model respectively to obtain the first weight (i.e., the channel attention weight) and the second weight (i.e., the spatial attention weight); finally, the first weight and the second weight corresponding to the first-scale image are applied to the second feature representation to obtain the second weighted feature. For example, to obtain the channel attention weight, the first feature representation is input into a global average pooling layer (GAP) to obtain the pooled feature, which is then passed through two 1×1 convolutional layers (Conv) and the sigmoid activation function σ.

[0070] To obtain the spatial attention weight, the first feature representation is input into a cross-channel global average pooling layer (Channel-GAP) and then passed through the sigmoid activation function σ.
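The two attention branches described here can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the bottleneck width `R`, the ReLU between the two 1×1 convolutions, the omission of biases, and the random weights `w1` and `w2` are all assumptions made for illustration. On a pooled 1×1×C feature, a 1×1 convolution reduces to a matrix product, which is what the sketch uses.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (H, W, C) feature map. Returns channel weights of shape (C,)."""
    pooled = feat.mean(axis=(0, 1))        # GAP over H and W -> (C,)
    hidden = np.maximum(pooled @ w1, 0.0)  # first 1x1 conv; ReLU is an assumption
    return sigmoid(hidden @ w2)            # second 1x1 conv, then sigmoid

def spatial_attention(feat):
    """feat: (H, W, C) feature map. Returns spatial weights of shape (H, W)."""
    return sigmoid(feat.mean(axis=2))      # Channel-GAP (mean over C), then sigmoid

# Hypothetical sizes: a 7x7x512 map and a bottleneck of 32 channels.
rng = np.random.default_rng(0)
H, W, C, R = 7, 7, 512, 32
feat = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, R)) * 0.01    # stand-in for learned conv weights
w2 = rng.standard_normal((R, C)) * 0.01

ch = channel_attention(feat, w1, w2)
sp = spatial_attention(feat)
print(ch.shape, sp.shape)
```

Both outputs lie strictly between 0 and 1 because of the sigmoid, so they act as soft gates rather than hard selections.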

[0071] Similarly, a second feature representation is obtained based on the second-scale image. The second feature representation is then input into the channel attention model and the spatial attention model respectively to obtain the third weight and the fourth weight. Finally, the third weight and the fourth weight corresponding to the second-scale image are applied to the first feature representation and weighted to obtain the first weighted feature.

[0072] Here, the channel attention model introduces an attention mechanism at the channel level, adaptively reweighting each feature channel and suppressing channels with less information.

[0073] In this embodiment, fundus images at two different scales are input into a feature extraction network to obtain feature representations. These feature representations are then input into an attention module to obtain channel attention weights and spatial attention weights. Once both attention weights are obtained for one scale, a cross-attention strategy uses them to reweight the deep feature map of the other scale. After the feature maps at each scale have been reweighted, the weighted features from both scales are summed to obtain the final weighted feature. This invention therefore uses a multi-scale interactive attention module to exploit multi-scale contextual information and highlight the regions that are more informative for identifying gestational diabetes. The module includes both a channel attention model and a spatial attention model, so the model learns channel and spatial attention weights simultaneously. This helps retain small, detailed lesion regions in the input image and suppress less useful information, improving the accuracy of the unbiased classification model in identifying gestational diabetes.
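The cross-attention strategy can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact operation: the weights are assumed to be applied by element-wise multiplication, and both feature maps are assumed to share the same H×W×C shape so that weights computed at one scale broadcast onto the other. The function names are hypothetical.

```python
import numpy as np

def apply_attention(feat, channel_w, spatial_w):
    """Reweight feat (H, W, C) by channel weights (C,) and spatial weights (H, W)."""
    return feat * channel_w[None, None, :] * spatial_w[:, :, None]

def cross_weight(feat1, feat2, ch1, sp1, ch2, sp2):
    """Cross-weight two scales and sum them.

    ch1, sp1: first and second weights, computed from feat1.
    ch2, sp2: third and fourth weights, computed from feat2.
    """
    weighted1 = apply_attention(feat1, ch2, sp2)  # first feature, weights from scale 2
    weighted2 = apply_attention(feat2, ch1, sp1)  # second feature, weights from scale 1
    return weighted1 + weighted2                  # final weighted feature

# Tiny hand-checkable example with constant maps.
feat1 = np.ones((2, 2, 3))
feat2 = 2.0 * np.ones((2, 2, 3))
ch1, sp1 = np.full(3, 0.5), np.ones((2, 2))
ch2, sp2 = np.ones(3), np.full((2, 2), 0.5)

fused = cross_weight(feat1, feat2, ch1, sp1, ch2, sp2)
print(fused.shape, fused[0, 0])
```

In this example each entry of the first weighted feature is 1 × 1 × 0.5 = 0.5 and each entry of the second is 2 × 0.5 × 1 = 1.0, so every entry of the sum is 1.5.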

[0074] A robust unbiased classification model is trained using a weighted cross-entropy loss based on curriculum learning, which is given by Equation (1):

[0075]

[0076]

[0077] Where $w_i$ is the adaptive weight of the i-th sample, CE is the cross-entropy loss between the model's predicted output and the true label $y_i$, B is the number of samples, C is the number of classes, $N_c$ is the number of samples in class c, e is the current iteration number, E is the total number of iterations, and E1 and E2 mark the boundaries between the iteration stages. $f_c^e$ represents the model's performance on class c at iteration e, with F1 as the evaluation metric. In the first E1 iterations, the weight is 1, so all samples are treated equally. From E1 to E2, the model focuses more on classes with fewer samples because they have larger weights. From E2 to E, the diagnostic difficulty of each class of fundus images is assessed from the model's performance on that class, and classes with higher diagnostic difficulty receive greater attention because their $f_c^e$ is lower.
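The three-stage weighting can be sketched in NumPy as below. The exact weight formulas of Equation (1) are not reproduced in this text, so the forms used here are assumptions chosen only to match the described behavior: a weight of 1 before E1, an inverse-frequency weight between E1 and E2, and a weight of 2 minus the per-class F1 after E2. The boundary handling (`e < E1`, `E1 <= e < E2`), the averaging over the batch, and all function names are likewise hypothetical.

```python
import numpy as np

def per_class_f1(y_true, y_pred, num_classes):
    """F1 score of each class, used as the per-class performance measure."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom > 0 else 0.0
    return f1

def class_weights(e, E1, E2, class_counts, class_f1):
    """Per-class weight at iteration e (weight forms are assumptions)."""
    counts = np.asarray(class_counts, dtype=float)
    if e < E1:                                   # stage 1: all samples equal
        return np.ones_like(counts)
    if e < E2:                                   # stage 2: favor rare classes
        return counts.sum() / (len(counts) * counts)
    return 2.0 - np.asarray(class_f1, dtype=float)  # stage 3: favor low-F1 classes

def weighted_cross_entropy(probs, labels, weights):
    """Mean over the batch of w_i * CE for predicted probabilities probs (B, C)."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(weights[labels] * ce))

counts = [900, 100]                              # hypothetical class sizes
f1 = per_class_f1([0, 0, 0, 1, 1], [0, 0, 1, 1, 0], 2)
probs, labels = [[0.9, 0.1], [0.2, 0.8]], [0, 1]
for e in (0, 15, 25):                            # one iteration per stage
    w = class_weights(e, 10, 20, counts, f1)
    print(e, w, weighted_cross_entropy(probs, labels, w))
```

With these assumed forms, the minority class receives a weight of 5.0 in the second stage, and in the third stage the class with the lower F1 receives the larger weight.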

[0078] Therefore, training the unbiased classification model with a weighted cross-entropy loss based on curriculum learning improves its robustness, enabling it to predict gestational diabetes from fundus images more accurately.

[0079] Figure 6 is a schematic diagram of a gestational diabetes identification device based on fundus images provided in an embodiment of the present invention.

[0080] A device for identifying gestational diabetes based on fundus images, the device 600 comprising: an acquisition module 601, used to acquire labeled fundus images as training samples; acquire a first-scale image and a second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes label; a generation module 602, used to extract features from the first-scale image and the second-scale image respectively, generating a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image; a weighted processing module 603, used to perform cross-weighted processing on the first feature representation and the second feature representation based on two different attention models and a cross-attention strategy to obtain weighted features; and a model training module 604, used to perform supervised model training based on the classification prediction results corresponding to the weighted features, using the real labels of the training samples as a reference, to generate an unbiased classification model.

[0081] In a preferred embodiment, the weighted processing module includes: a first generation unit, configured to input the first feature representation into two different attention models to generate corresponding first weights and second weights; a second generation unit, configured to input the second feature representation into two different attention models to generate corresponding third weights and fourth weights; and a cross-weighting processing unit, configured to perform cross-weighting processing on the first feature representation and the second feature representation using a cross-attention strategy based on the first weight, second weight, third weight, and fourth weight to obtain weighted features.

[0082] In a preferred embodiment, the cross-weighting processing unit includes: a first generation subunit, configured to apply a third weight and a fourth weight to the first feature representation respectively and then perform weighting processing to generate a first weighted feature; a second generation subunit, configured to apply a first weight and a second weight to the second feature representation respectively and then perform weighting processing to generate a second weighted feature; and a weighting subunit, configured to sum the first weighted feature and the second weighted feature to generate a weighted feature.

[0083] In a preferred embodiment, the model training module includes: a model training unit, used to perform supervised model training based on the classification prediction results corresponding to the weighted features, using the true labels of the training samples as a reference; an acquisition unit, used to determine the weights corresponding to the cross-entropy loss function according to the number of model iterations during the model training process, and obtain a weighted cross-entropy loss function; and a generation unit, used to optimize the model based on several weighted cross-entropy loss functions to generate an unbiased classification model.

[0084] In a preferred embodiment, the acquisition unit includes: a first generation subunit, used to apply the same first weight to each sample and generate a first weighted cross-entropy loss function when the number of iterations of the model is less than E1; a second generation subunit, used to apply a second weight to the cross-entropy loss function to focus on the category with fewer samples when the number of iterations of the model is greater than E1 and less than E2, and generate a second weighted cross-entropy loss function; and a third generation subunit, used to apply a third weight to the cross-entropy loss function to focus on training samples that are more difficult to identify when the number of iterations of the model is greater than E2 and less than E, and generate a third weighted cross-entropy loss function.

[0085] In a preferred embodiment, the apparatus further includes a data preprocessing module for performing data augmentation processing on the training samples to generate a training sample dataset.

[0086] In a preferred embodiment, the device further includes an output module for inputting the weighted features into the classification layer to obtain the classification prediction result corresponding to the weighted features.
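As a rough illustration of the output module, the sketch below passes a weighted feature through a classification layer. The structure of that layer is not specified here, so global average pooling followed by a single linear layer and a softmax is an assumption, as are the names `classify`, `W_cls`, and `b_cls` and the random stand-in parameters.

```python
import numpy as np

def classify(weighted_feat, W_cls, b_cls):
    """Map a weighted feature (H, W, C) to class probabilities."""
    pooled = weighted_feat.mean(axis=(0, 1))  # global average pooling -> (C,)
    logits = pooled @ W_cls + b_cls           # linear layer -> (num_classes,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
weighted_feat = rng.standard_normal((7, 7, 512))  # hypothetical weighted feature
W_cls = rng.standard_normal((512, 2)) * 0.01      # two classes: normal, gestational diabetes
b_cls = np.zeros(2)

pred = classify(weighted_feat, W_cls, b_cls)
print(pred, int(pred.argmax()))
```

The index of the largest probability would serve as the classification prediction result that is compared against the true label during training.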

[0087] The above-described device can execute the gestational diabetes mellitus identification method based on fundus images provided in an embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the gestational diabetes mellitus identification method based on fundus images. Technical details not described in detail in this embodiment can be found in the gestational diabetes mellitus identification method based on fundus images provided in an embodiment of the present invention.

[0088] The present invention also provides an electronic device, comprising: a processor; a memory for storing executable instructions of the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method for identifying gestational diabetes based on fundus images according to the present invention.

[0089] In addition to the methods and apparatus described above, embodiments of this application may also be computer program products, which include computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of this application described in the "Exemplary Methods" section above.

[0090] The computer program product can be written in any combination of one or more programming languages to perform the operations of the embodiments of this application. The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.

[0091] Furthermore, embodiments of this application may also be computer-readable storage media storing computer program instructions thereon, which, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of this application described in the "Exemplary Methods" section above.

[0092] The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may, for example, include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0093] The basic principles of this application have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in this application are merely examples and not limitations, and should not be considered as essential features of each embodiment of this application. Furthermore, the specific details disclosed above are for illustrative and facilitative purposes only, and are not limitations. These details do not limit the application to the necessity of employing the aforementioned specific details for implementation.

[0094] The block diagrams of components, apparatuses, devices, and systems involved in this application are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as “comprising,” “including,” “having,” etc., are open-ended terms meaning “including but not limited to,” and are used interchangeably with that phrase. The terms “or” and “and” as used herein mean “and/or,” and are used interchangeably with it unless the context clearly indicates otherwise. The term “such as” as used herein means “such as but not limited to,” and is used interchangeably with it.

[0095] It should also be noted that in the apparatus, equipment, and methods of this application, the components or steps can be disassembled and/or recombined. These disassemblies and/or recombinations should be considered as equivalent solutions of this application.

[0096] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use this application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of this application. Therefore, this application is not intended to be limited to the aspects shown herein, but rather to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0097] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of this application to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations thereof.

[0098] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.

[0099] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0100] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for identifying gestational diabetes based on fundus images, characterized in that it comprises:
using labeled fundus images as training samples; obtaining a first-scale image and a second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes label;
performing feature extraction on the first-scale image and the second-scale image respectively to generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image;
based on two different attention models, cross-weighting the first feature representation and the second feature representation according to a cross-attention strategy to obtain weighted features;
using the true labels of the training samples as a reference, performing supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model;
wherein cross-weighting the first feature representation and the second feature representation based on two different attention models according to a cross-attention strategy to obtain weighted features includes:
inputting the first feature representation into the two different attention models to generate corresponding first and second weights;
inputting the second feature representation into the two different attention models to generate corresponding third and fourth weights;
based on the first weight, the second weight, the third weight, and the fourth weight, using a cross-attention strategy to perform cross-weighting on the first feature representation and the second feature representation to obtain weighted features;
and wherein performing cross-weighting on the first feature representation and the second feature representation using a cross-attention strategy based on the first weight, the second weight, the third weight, and the fourth weight to obtain weighted features includes:
applying the third weight and the fourth weight to the first feature representation respectively and then performing weighting processing to generate a first weighted feature;
applying the first weight and the second weight to the second feature representation respectively and then performing weighting processing to generate a second weighted feature;
summing the first weighted feature and the second weighted feature to generate the weighted feature.

2. The method according to claim 1, characterized in that the step of using the true labels of the training samples as a reference and performing supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model includes: using the true labels of the training samples as a reference, performing supervised model training based on the classification prediction results corresponding to the weighted features; during model training, determining the weights corresponding to the cross-entropy loss function according to the number of model iterations to obtain weighted cross-entropy loss functions; and optimizing the model based on the several weighted cross-entropy loss functions to generate the unbiased classification model.

3. The method according to claim 2, characterized in that determining the weights corresponding to the cross-entropy loss function according to the number of model iterations to obtain the weighted cross-entropy loss function includes: when the number of iterations of the model is less than E1, applying the same first weight to each sample to generate the first weighted cross-entropy loss function; when the number of iterations of the model is greater than E1 and less than E2, applying a second weight to the cross-entropy loss function to focus on the category with a smaller number of samples, thereby generating a second weighted cross-entropy loss function; when the number of iterations of the model is greater than E2 and less than E, applying a third weight to the cross-entropy loss function to focus on training samples that are more difficult to identify, thereby generating a third weighted cross-entropy loss function.

4. The method according to claim 1, characterized in that it further comprises: performing data augmentation processing on the training samples to generate a training sample dataset.

5. The method according to claim 1, characterized in that it further comprises: inputting the weighted features into the classification layer to obtain the classification prediction results corresponding to the weighted features.

6. A device for identifying gestational diabetes based on fundus images, characterized in that it comprises:
an acquisition module, used to take labeled fundus images as training samples and to obtain a first-scale image and a second-scale image corresponding to the training samples; the labels include at least a normal label and a gestational diabetes label;
a generation module, used to extract features from the first-scale image and the second-scale image respectively, and generate a first feature representation corresponding to the first-scale image and a second feature representation corresponding to the second-scale image;
a weighted processing module, used to perform cross-weighted processing on the first feature representation and the second feature representation based on two different attention models and according to a cross-attention strategy to obtain weighted features;
a model training module, used to take the true labels of the training samples as a reference and perform supervised model training based on the classification prediction results corresponding to the weighted features to generate an unbiased classification model;
wherein the weighted processing module includes: a first generation unit, used to input the first feature representation into two different attention models to generate corresponding first weights and second weights; a second generation unit, used to input the second feature representation into two different attention models to generate corresponding third weights and fourth weights; and a cross-weighting processing unit, used to perform cross-weighting processing on the first feature representation and the second feature representation according to the first weight, the second weight, the third weight, and the fourth weight, using a cross-attention strategy to obtain weighted features;
and wherein the cross-weighting processing unit includes: a first generation subunit, used to apply a third weight and a fourth weight to the first feature representation respectively and then perform weighting processing to generate a first weighted feature; a second generation subunit, used to apply a first weight and a second weight to the second feature representation respectively and then perform weighting processing to generate a second weighted feature; and a weighting subunit, used to sum the first weighted feature and the second weighted feature to generate a weighted feature.

7. An electronic device, comprising: a processor; and a memory used to store the processor's executable instructions; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the method as described in any one of claims 1-5.

8. A computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the method as described in any one of claims 1-5.