A method, apparatus, device and medium for grading diabetic retinopathy
By introducing ordered representations and reinforcement learning into the grading network model, the method addresses the serious (large-span) misjudgments of existing diabetic retinopathy grading methods and achieves higher prediction accuracy and clinical safety.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SHENZHEN UNIV
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep learning-based grading methods for diabetic retinopathy ignore the ordered distances between different grades, leading to serious misjudgments and failing to meet the safety requirements of clinical diagnosis.
Image patch embedding features are obtained through an initial grading network model, and ordered representations are added to them. An ordered inductive representation learning module introduces grade biases and attention weights. The pre-trained model is then further trained by reinforcement learning to obtain the grading network model. This optimization combines accuracy rewards, ordered consistency rewards, and asymmetric safety penalty rewards.
It improves the accuracy of predicting diabetic retinopathy grades, reduces the probability of misdiagnosis across large grade ranges, and meets the safety requirements of clinical diagnosis.

Figure CN122244031A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of medical artificial intelligence and, in particular, to a method, apparatus, device, and medium for grading diabetic retinopathy.
Background Technology
[0002] Diabetic retinopathy (DR) is a leading cause of irreversible blindness in working-age populations worldwide. In clinical practice, the severity of DR is classified into five progressive levels, in the following order: no DR (Grade 0) → mild non-proliferative DR (Grade 1) → moderate non-proliferative DR (Grade 2) → severe non-proliferative DR (Grade 3) → proliferative DR (Grade 4).
[0003] With the development of deep learning, automatic grading methods for diabetic retinopathy (DR) based on deep learning have been proposed, such as those using convolutional neural networks and Transformer-based models. However, existing methods generally treat DR grading as an unordered classification task and ignore the ordered distance between DR grades. As a result, a severe jump misjudgment (such as predicting grade 4 as grade 0) is penalized no more heavily than a minor deviation (such as predicting grade 4 as grade 3). The grading results therefore contain large-span misjudgments and cannot meet the safety requirements of clinical diagnosis.
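The following minimal Python sketch illustrates the problem using hypothetical probability values that are not taken from this application. With standard cross-entropy, only the probability assigned to the true grade matters. A minor deviation and a severe jump therefore receive exactly the same loss.

```python
import math


def cross_entropy(probs, label):
    """Unordered cross-entropy: depends only on the probability of the true grade."""
    return -math.log(probs[label])


TRUE_GRADE = 4  # proliferative DR

# Both hypothetical predictions give grade 4 a probability of 0.3, but the
# remaining mass sits on grade 3 (minor deviation) or grade 0 (severe jump).
minor_deviation = [0.0, 0.0, 0.0, 0.7, 0.3]
severe_jump = [0.7, 0.0, 0.0, 0.0, 0.3]

loss_minor = cross_entropy(minor_deviation, TRUE_GRADE)
loss_jump = cross_entropy(severe_jump, TRUE_GRADE)

# The argmax predictions are grade 3 and grade 0 respectively,
# yet the two losses are identical.
print(minor_deviation.index(max(minor_deviation)), loss_minor)
print(severe_jump.index(max(severe_jump)), loss_jump)
```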
[0004] Therefore, existing technologies still need to be improved and enhanced.
Summary of the Invention
[0005] The technical problem to be solved by this application is to provide a method, apparatus, device, and medium for grading diabetic retinopathy that address the above shortcomings of the prior art.
[0006] To solve the above technical problem, a first aspect of this application provides a method for grading diabetic retinopathy, which specifically includes the following steps.

- Obtaining, through an initial grading network model, an image patch embedding feature set corresponding to a training fundus image, and adding ordered representations to the image patch embedding features in the set to obtain image features corresponding to the training fundus image. The ordered representations are used to embed the progressive manifold of diabetic retinopathy severity into the latent space.
- Determining, based on the image features, a prediction probability vector corresponding to the training fundus image.
- Determining, based on the image features and through a preset classifier set, a prediction vector in the cumulative probability space corresponding to the training fundus image, and pre-training the initial grading network model based on the prediction probability vector and the prediction vector to obtain a pre-trained grading network model. The prediction vector in the cumulative probability space is used to learn the ordered progressive relationship between diabetic retinopathy grades.
- Using the pre-trained grading network model as a policy network and performing reinforcement learning on the policy network to obtain a grading network model. The rewards used in reinforcement learning include an accuracy reward, an ordered consistency reward, and an asymmetric safety penalty reward.
- Determining, by using the grading network model, the diabetic retinopathy grade of a fundus image to be graded.
[0007] In the above method, the initial grading network model includes a visual feature extraction module and an ordered inductive representation learning module. Obtaining the image features corresponding to the training fundus image through the initial grading network model specifically includes:

- extracting, by the visual feature extraction module, the image patch embedding feature set corresponding to the training fundus image;
- determining, by the ordered inductive representation learning module, a grade bias for each diabetic retinopathy grade based on a learnable interval, wherein the grade biases satisfy a monotonicity constraint;
- for each image patch embedding feature in the set, determining, by the ordered inductive representation learning module, the attention weight of that feature for each diabetic retinopathy grade based on the grade biases; and
- fusing, by the ordered inductive representation learning module, all image patch embedding features in the set according to the attention weights to determine the image features corresponding to the training fundus image.
[0008] In the above method, fusing all image patch embedding features according to the attention weights to determine the image features specifically includes two steps. First, for each diabetic retinopathy grade, the ordered inductive representation learning module weights all image patch embedding features in the set by their attention weights for that grade to obtain the layer embedding feature of that grade. Second, the mean of the layer embedding features of all grades, the maximum of those layer embedding features, and a learnable class vector are concatenated to obtain the image features corresponding to the training fundus image.
[0009] In the above method, performing reinforcement learning on the policy network to obtain the grading network model specifically includes:

- determining, by the policy network, the prediction probability vector corresponding to the training fundus image, and sampling from it to generate a group of sampled prediction probability vectors;
- obtaining the reward of each sampled prediction probability vector in the group, and using the rewards to determine the ordered risk advantage of each sampled prediction probability vector; and
- constructing an optimization objective from the ordered risk advantages and optimizing the policy network with this objective to obtain the grading network model.
[0010] In the above method, obtaining the reward of each sampled prediction probability vector in the group specifically includes the following, performed for each sampled prediction probability vector:

- determining the sampled predicted grade corresponding to the vector;
- determining the accuracy reward according to whether the sampled predicted grade equals the grade label of the training fundus image;
- determining the ordered consistency reward from the grade difference between the sampled predicted grade and the grade label;
- determining the asymmetric safety penalty reward from the grade difference and a penalty term applied when the sampled predicted grade is lower than the grade label; and
- determining the reward of the sampled prediction probability vector based on the accuracy reward, the ordered consistency reward, and the asymmetric safety penalty reward.
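A clinically guided reward of this kind can be sketched in Python as follows. The application does not give the exact reward formulas at this point. The linear distance term, the weights `w_acc`, `w_ord`, `w_safe`, and the `under_penalty` factor are therefore hypothetical choices for illustration only.

```python
def clinical_reward(pred_grade, label_grade, num_grades=5,
                    w_acc=1.0, w_ord=1.0, w_safe=1.0, under_penalty=0.5):
    """Hypothetical reward combining accuracy, ordered consistency and an
    asymmetric safety penalty; all weights are illustrative assumptions."""
    diff = pred_grade - label_grade  # negative means under-grading

    # Accuracy reward: 1 if the sampled grade equals the grade label.
    r_acc = 1.0 if diff == 0 else 0.0

    # Ordered consistency reward: decreases linearly with the grade distance,
    # from 1 (exact match) to 0 (maximum possible distance).
    r_ord = 1.0 - abs(diff) / (num_grades - 1)

    # Asymmetric safety penalty: applied only when the sampled grade is
    # lower than the label, and grows with the number of missed grades.
    r_safe = under_penalty * diff if diff < 0 else 0.0

    return w_acc * r_acc + w_ord * r_ord + w_safe * r_safe


# Under-grading by one grade is rewarded less than over-grading by one grade.
print(clinical_reward(0, 1), clinical_reward(1, 0))
```

The asymmetry comes entirely from the penalty term. An over-grade and an under-grade of the same size share the same ordered consistency reward, and only the under-grade is penalized further.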
[0011] In the above method, determining the ordered risk advantage of each sampled prediction probability vector from the rewards specifically includes three steps:

- computing the mean and standard deviation of the rewards over all sampled prediction probability vectors in the group;
- determining the reward difference of each sampled prediction probability vector as its reward minus the mean reward; and
- determining the ordered risk advantage of each sampled prediction probability vector from its reward difference and the standard deviation of the rewards.
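The group-relative normalization described above can be sketched as follows. Three details are assumptions of this sketch:

- the reward difference is divided by the standard deviation;
- the population standard deviation is used; and
- a small `eps` is added to avoid division by zero.

```python
import statistics


def ordered_risk_advantages(rewards, eps=1e-8):
    """Normalize each sampled reward against the statistics of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation (assumed)
    # Reward difference divided by the group standard deviation.
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for a hypothetical group of four sampled predictions.
print(ordered_risk_advantages([2.0, 0.25, 0.25, -2.0]))
```

Samples that beat their group average get a positive advantage and are reinforced. Samples below the average, such as a severe under-grade, get a negative advantage and are suppressed.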
[0012] In the above method, the optimization objective is expressed as:
$$\mathcal{J}(\theta)=\mathbb{E}_{x_m,\,o_{m,j}}\Big[\min\big(r_{m,j}(\theta)\,A_{m,j},\ \operatorname{clip}\big(r_{m,j}(\theta),\,1-\varepsilon,\,1+\varepsilon\big)\,A_{m,j}\big)-\beta\,D\big(\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big)\Big]$$
where:

- $\mathcal{J}(\theta)$ denotes the optimization objective;
- $x_m$ denotes the $m$-th training fundus image;
- $o_{m,j}$ denotes the $j$-th sampled prediction probability vector;
- $r_{m,j}(\theta)=\pi_\theta(o_{m,j}\mid x_m)/\pi_{\theta_{\mathrm{old}}}(o_{m,j}\mid x_m)$ denotes the ratio of the probability under the policy network at the current step to that under the policy network at the previous step;
- $A_{m,j}$ denotes the ordered risk advantage;
- $\varepsilon$ denotes the clipping hyperparameter;
- $\beta$ denotes the regularization strength;
- $\pi_\theta$ denotes the policy network and $\pi_{\mathrm{ref}}$ the reference policy network;
- $D(\cdot\|\cdot)$ denotes the divergence function;
- $\operatorname{clip}(\cdot)$ denotes the clipping (truncation) function;
- $\mathbb{E}$ denotes the expectation operator; and
- $\min(\cdot)$ takes the minimum value.
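For a single training image and its group of samples, the clipped surrogate objective can be sketched in plain Python as below. The following are assumptions of this sketch:

- log-probabilities of the sampled grades are passed in;
- the divergence is computed as a KL divergence between the current and reference predicted distributions; and
- `epsilon` and `beta` take the default values shown.

In practice the objective would be maximized by gradient ascent in a deep learning framework.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same grades."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def clipped_objective(logp_new, logp_old, advantages,
                      policy_probs, ref_probs, epsilon=0.2, beta=0.04):
    """Group-averaged clipped surrogate minus a divergence penalty (to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # current / previous policy probability
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    surrogate = total / len(advantages)
    return surrogate - beta * kl_divergence(policy_probs, ref_probs)


uniform = [0.2] * 5
print(clipped_objective([math.log(0.9)], [math.log(0.5)], [1.0], uniform, uniform))
```

Consider a probability ratio of 1.8. With a positive advantage, the clipped term caps the contribution at 1 + ε = 1.2. With a negative advantage, the more pessimistic unclipped term is kept. Either way, a single update cannot move the policy too far from the previous one.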
[0013] A second aspect of this application provides an apparatus for grading diabetic retinopathy, which specifically includes a construction module and a control module.

The construction module is configured to:

- obtain, through an initial grading network model, an image patch embedding feature set corresponding to a training fundus image, and add ordered representations to the image patch embedding features in the set to obtain image features corresponding to the training fundus image;
- determine, based on the image features, a prediction probability vector corresponding to the training fundus image;
- determine, based on the image features and through a preset classifier set, a prediction vector in the cumulative probability space corresponding to the training fundus image, and pre-train the initial grading network model based on the prediction probability vector and the prediction vector to obtain a pre-trained grading network model; and
- use the pre-trained grading network model as a policy network and perform reinforcement learning on it to obtain a grading network model.

Here, the ordered representations are used to embed the progressive manifold of diabetic retinopathy severity into the latent space. The prediction vector in the cumulative probability space is used to learn the ordered progressive relationship between diabetic retinopathy grades. The rewards used in reinforcement learning include an accuracy reward, an ordered consistency reward, and an asymmetric safety penalty reward.

The control module is configured to determine, by using the grading network model, the diabetic retinopathy grade of a fundus image to be graded.
[0014] A third aspect of this application provides a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement the steps in the grading method for diabetic retinopathy as described above.
[0015] A fourth aspect of this application provides a terminal device, which includes a processor and a memory. The memory stores a computer-readable program executable by the processor. When the processor executes the computer-readable program, the steps of any of the above methods for grading diabetic retinopathy are implemented.
[0016] Beneficial effects: 1. This application uses cumulative probability regularization to enforce probabilistic consistency across the severity thresholds, so that the grading network model learns the ordered progressive relationship between diabetic retinopathy grades. This prevents the model from penalizing a severe jump misjudgment (e.g., predicting grade 4 as grade 0) no more heavily than a minor deviation (e.g., predicting grade 4 as grade 3). The accuracy of diabetic retinopathy grade prediction is improved as a result.
[0017] 2. After obtaining the pre-trained grading network model, this application uses it as the policy network and further optimizes it through reinforcement learning. This strengthens the ordered grading constraints and improves the accuracy and clinical safety of diabetic retinopathy grading.
[0018] 3. During reinforcement learning on the pre-trained grading network model, this application constructs an ordered risk advantage from rewards comprising an accuracy reward, an ordered consistency reward, and an asymmetric safety penalty reward. It then uses the ordered risk advantage to construct the optimization objective, further improving the accuracy with which the model grades diabetic retinopathy.
[0019] 4. This application extracts image patch embedding features and then introduces monotonic grade biases into them to determine the image features of the training fundus images. This embeds the progressive manifold of DR severity directly into the latent space and helps the model learn the ordered progressive relationship between grades. As a result, it further reduces the probability of large-span misjudgments, improves the accuracy of the grading results, and better meets the safety requirements of clinical diagnosis.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a flowchart of the method for grading diabetic retinopathy provided in an embodiment of this application.
[0022] Figure 2 is a schematic flowchart illustrating the principle of constructing the grading network model.
[0023] Figure 3 is a diagram illustrating the grades of diabetic retinopathy.
[0024] Figure 4 is a schematic diagram of the experimental results.
[0025] Figure 5 is a schematic diagram of the apparatus for grading diabetic retinopathy provided in an embodiment of this application.
[0026] Figure 6 is a schematic block diagram of the terminal device provided in an embodiment of this application.
Detailed Implementation
[0027] This application provides a method, apparatus, device, and medium for grading diabetic retinopathy. To make the objectives, technical solutions, and effects of this application clearer and more explicit, the following detailed description is provided with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only for explaining this application and are not intended to limit this application.
[0028] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this application indicates the presence of the stated features, integers, steps, operations, elements, and/or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. When an element is described as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or intermediate elements may be present. Furthermore, “connected” or “coupled” as used herein may include wireless connection or wireless coupling. The term “and/or” as used herein includes any and all combinations of one or more of the associated listed items.
[0029] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. Terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art. They should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0030] It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution. The execution order of each process is determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
[0031] The content of this application is further explained below with reference to the accompanying drawings and the description of the embodiments.
[0032] This embodiment provides a method for grading diabetic retinopathy. As shown in Figure 1 and Figure 2, the method specifically includes:

S10. Obtaining, through the initial grading network model, the image patch embedding feature set corresponding to the training fundus image; adding ordered representations to the image patch embedding features in the set to obtain the image features corresponding to the training fundus image; and determining the prediction probability vector corresponding to the training fundus image based on the image features.
[0033] Specifically, the initial grading network model is a neural network model to be trained, used for feature extraction and probability prediction on the input training fundus images. The image patch embedding feature set is obtained by feature extraction from the training fundus image through the initial grading network model. The image features are then obtained by adding ordered representations to the image patch embedding features in the set. The ordered representations embed the progressive manifold of diabetic retinopathy severity into the latent space, so that the resulting image features encode the ordered progressive relationship between different diabetic retinopathy grades.
[0034] In one embodiment, the initial grading network model includes a feature extraction component and a classification component. The feature extraction component extracts the image features of the training fundus images. The classification component determines the prediction probability vector of each training fundus image based on the image features.

The feature extraction component may include only a visual feature extraction module (e.g., a Vision Transformer (ViT) backbone network). Alternatively, it may include a visual feature extraction module (e.g., a ViT backbone network) together with an ordered inductive representation learning module (e.g., an attention module). The visual feature extraction module extracts image patch embedding features from the training fundus images. The ordered inductive representation learning module adds ordered representations to these features to embed the progressive manifold of DR severity directly into the latent space. This enables the initial grading network model to learn the ordered progressive relationship between grades from the image features.

The classification component may include a classification module, such as a linear classification layer, which performs probability prediction to determine the prediction probability vector of the training fundus images. The ordered representations may be any of the following:

- the grade biases of the diabetic retinopathy grades;
- attention features determined from these grade biases; or
- the attention weights of the image patch embedding features determined from these grade biases.
[0035] In one embodiment, as shown in Figure 2, the feature extraction component includes a visual feature extraction module and an ordered inductive representation learning module. Correspondingly, the initial grading network model includes these two modules. Obtaining the image features corresponding to the training fundus image through the initial grading network model specifically includes:

S11. Extracting, by the visual feature extraction module, the image patch embedding feature set corresponding to the training fundus image.
S12. Determining, by the ordered inductive representation learning module, the grade bias of each diabetic retinopathy grade based on the learnable interval.
S13. For each image patch embedding feature in the set, determining, by the ordered inductive representation learning module, the attention weight of that feature for each diabetic retinopathy grade based on the grade biases.
S14. Fusing, by the ordered inductive representation learning module, all image patch embedding features in the set according to the attention weights to determine the image features corresponding to the training fundus image.
[0036] In step S11, each image patch embedding feature in the set corresponds to one image patch of the training fundus image and is obtained by visual feature extraction from the training fundus image. When the visual feature extraction module extracts the image patch embedding feature set, the training fundus image is divided into $N$ image patches, and visual features are extracted from each patch. This yields an image patch embedding feature set containing $N$ image patch embedding features, which can be expressed as $P=\{p_1, p_2, \dots, p_N\}$, where $P$ denotes the image patch embedding feature set, $p_i$ denotes the $i$-th image patch embedding feature, and $N$ denotes the number of image patch embedding features.
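Dividing the image into non-overlapping patches can be sketched as follows. Three details are assumptions: square patches, row-major ordering, and the 224 × 224 image with 16 × 16 patches used in the usage line (typical ViT settings, not stated in this application). In a ViT backbone each patch would then be flattened and linearly projected to obtain its image patch embedding feature. That step is not shown here.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (a list of rows) into non-overlapping square
    patches, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A 224 x 224 image with 16 x 16 patches yields 14 * 14 patches.
blank = [[0] * 224 for _ in range(224)]
print(len(split_into_patches(blank, 16)))
```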
[0037] In step S12, the learnable interval is non-negative, and accumulating learnable intervals ensures that the grade biases satisfy the monotonicity constraint. Specifically, the grade bias of each diabetic retinopathy grade is positively correlated with the severity of that grade: the higher the grade, the larger its grade bias. This introduces the progressive order of the grades directly into the feature space.
[0038] In one embodiment, the diabetic retinopathy grades are denoted as $0, 1, \dots, K-1$, where $K$ is the number of grades, and the grade bias of the minimum diabetic retinopathy grade can be preset. Specifically, as shown in Figure 3, diabetic retinopathy is clinically classified into five grades: Grade 0, Grade 1, Grade 2, Grade 3, and Grade 4 (i.e., $K=5$). The grade bias of the minimum grade (Grade 0) can be set to a default value. For each other grade, the learnable interval is then accumulated onto the minimum grade's bias according to the grade difference between that grade and the minimum grade, yielding that grade's bias.
[0039] For example, the grade bias of a diabetic retinopathy grade can be expressed as:
$$b_k = b_0 + k\,\Delta,\qquad \Delta \ge 0,$$
where $b_k$ denotes the grade bias of the $k$-th diabetic retinopathy grade, $b_0$ denotes the grade bias of the minimum diabetic retinopathy grade, and $\Delta$ denotes the learnable interval.
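Below is a minimal sketch of monotonic grade biases. It assumes the linear form $b_k = b_0 + k\Delta$ and enforces $\Delta \ge 0$ by applying a softplus to an unconstrained raw parameter. The softplus parameterization and the name `raw_interval` are assumptions, not details taken from this application.

```python
import math


def softplus(x):
    """Numerically stable softplus; always returns a non-negative value."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def grade_biases(b0, raw_interval, num_grades=5):
    """Grade biases b_k = b0 + k * delta with a non-negative learnable interval."""
    delta = softplus(raw_interval)  # learnable interval, constrained to >= 0
    return [b0 + k * delta for k in range(num_grades)]


print(grade_biases(-1.0, 0.3))
```

Whatever value the optimizer assigns to `raw_interval`, the resulting biases are non-decreasing in the grade index, so the monotonicity constraint holds by construction rather than through an extra loss term.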
[0040] It should be noted that in practical applications, other methods can also be used to determine the grade bias of diabetic retinopathy, as long as the grade biases of each diabetic retinopathy grade satisfy the monotonicity constraint.
[0041] In step S13, the attention weights serve as ordered features that characterize how much each image patch embedding feature contributes to each diabetic retinopathy grade. The closer the lesion features in an image patch are to the image features of a given grade, the higher that patch's attention weight for that grade. Determining the attention weights through the grade biases incorporates the ordered prior of DR severity into the attention computation. During feature fusion, the grading network model can then distinguish how much each image patch contributes to each grade, which better matches the clinical pattern of DR progression.
[0042] In one embodiment, the attention weight of an image patch embedding feature for each diabetic retinopathy grade can be expressed as:
$$\alpha_{i,k} = \sigma\!\left(\frac{W_a(p_i) + b_k}{\tau}\right),$$
where:

- $\alpha_{i,k}$ denotes the attention weight of the $i$-th image patch embedding feature for the $k$-th diabetic retinopathy grade;
- $\tau$ denotes the temperature coefficient;
- $W_a(\cdot)$ denotes a linear mapping;
- $\sigma(\cdot)$ denotes the activation function;
- $p_i$ denotes the $i$-th image patch embedding feature; and
- $b_k$ denotes the grade bias of the $k$-th diabetic retinopathy grade.
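The attention weights can be sketched as below, assuming $\alpha_{i,k} = \sigma\big((W_a(p_i) + b_k)/\tau\big)$. Two choices are assumptions of this sketch: the activation $\sigma$ is a sigmoid, and $W_a$ is a linear mapping that projects each patch embedding to a scalar score. The weight vector `w` stands in for that mapping.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attention_weights(patch_features, w, biases, tau=1.0):
    """Return alpha[i][k]: the weight of patch i for grade k."""
    alpha = []
    for p in patch_features:
        score = sum(wj * pj for wj, pj in zip(w, p))  # linear mapping to a scalar
        alpha.append([sigmoid((score + b) / tau) for b in biases])
    return alpha


patches = [[0.0, 5.0], [2.0, -1.0]]
print(attention_weights(patches, w=[1.0, 0.0], biases=[-1.0, 0.0, 1.0]))
```

Because the grade biases enter before the sigmoid, each bias shifts that grade's gating threshold. A per-grade constant would cancel out if the weights were instead a softmax over patches.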
[0043] In step S14, once the attention weights of the image patch embedding features for each diabetic retinopathy grade have been obtained, they are used to fuse the image patch embedding features into the image features. Specifically, for each diabetic retinopathy grade, the image patch embedding features are fused using their attention weights for that grade to obtain the layer embedding feature of that grade. These layer embedding features follow an ordered manifold of diabetic retinopathy severity. The layer embedding features of all grades are then fused to obtain the image features.
[0044] For example, fusing all image patch embedding features in the set according to the attention weights to determine the image features specifically includes two steps. First, for each diabetic retinopathy grade, the ordered inductive representation learning module weights all image patch embedding features by their attention weights for that grade to obtain the layer embedding feature of that grade. Second, the mean of the layer embedding features of all grades, the maximum of those layer embedding features, and a learnable class vector are concatenated to obtain the image features corresponding to the training fundus image.
[0045] Specifically, a layer embedding feature is a representation that integrates the information of all image patch embedding features with the ordered prior of diabetic retinopathy severity. Each layer embedding feature corresponds to one diabetic retinopathy grade and can be expressed as:
$$h_k = \sum_{i=1}^{N} \alpha_{i,k}\, p_i,$$
where $h_k$ denotes the layer embedding feature of the $k$-th diabetic retinopathy grade, $\alpha_{i,k}$ denotes the attention weight of the $i$-th image patch embedding feature for the $k$-th grade, and $p_i$ denotes the $i$-th image patch embedding feature.
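The weighted sum $h_k = \sum_{i=1}^{N} \alpha_{i,k} p_i$ can be sketched in plain Python as follows. Following the formula as written, no normalization of the weights is applied. This is an assumption, and a normalized variant would divide by $\sum_i \alpha_{i,k}$.

```python
def layer_embeddings(patch_features, alpha):
    """h[k] = sum_i alpha[i][k] * p_i for every grade k."""
    num_grades = len(alpha[0])
    dim = len(patch_features[0])
    h = [[0.0] * dim for _ in range(num_grades)]
    for p, weights in zip(patch_features, alpha):
        for k, a in enumerate(weights):
            for d in range(dim):
                h[k][d] += a * p[d]
    return h


print(layer_embeddings([[1.0, 0.0], [0.0, 2.0]], [[0.5, 1.0], [1.0, 0.0]]))
```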
[0046] Furthermore, after the layer embedding features of the DR severity grades are obtained, the following three components are concatenated:

- the mean of all layer embedding features;
- the maximum among them; and
- the learnable class vector.

This preserves both the overall distribution of features across grades and the most prominent lesion features, and it reflects the ordered progression of DR severity. It also ensures that the final image features serve the subsequent grading prediction task, with prior knowledge of DR grading built into feature extraction. The image features can be expressed as:
$$z = W_z\!\left(\left[\frac{1}{K}\sum_{k=0}^{K-1} h_k \;\Big\|\; \max_{k} h_k \;\Big\|\; c\right]\right),$$
where $z$ denotes the image features, $K$ denotes the number of diabetic retinopathy grades, $\|$ denotes the concatenation operation, $W_z(\cdot)$ denotes a linear mapping, $c$ denotes the learnable class vector, and $h_k$ denotes the layer embedding feature of the $k$-th diabetic retinopathy grade.
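A sketch of the final fusion step follows. Three details are assumptions for illustration: the concatenation order [mean; maximum; class vector], an element-wise maximum over grades, and a linear mapping represented as a plain matrix `projection`.

```python
def image_feature(layer_embs, class_vector, projection):
    """Concatenate the mean and element-wise max of the layer embeddings with
    a learnable class vector, then apply a linear mapping."""
    num_grades = len(layer_embs)
    dim = len(layer_embs[0])
    mean = [sum(h[d] for h in layer_embs) / num_grades for d in range(dim)]
    elem_max = [max(h[d] for h in layer_embs) for d in range(dim)]
    concat = mean + elem_max + list(class_vector)
    # Linear mapping: each row of `projection` produces one output feature.
    return [sum(w * x for w, x in zip(row, concat)) for row in projection]


identity = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
print(image_feature([[0.5, 2.0], [1.0, 0.0]], [0.1, 0.2], identity))
```

With an identity projection, the output is simply the concatenated vector. This makes it easy to check that the mean, maximum, and class-vector parts land where expected.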
[0047] S20. Based on the image features, determine the prediction vector in the cumulative probability space corresponding to the training fundus image through a preset classifier set. Then pre-train the initial grading network model based on the prediction probability vector and the prediction vector to obtain a pre-trained grading network model.
[0048] Specifically, the preset classifier set is a classification component independent of the initial grading network model. Its input is the image features, and its output is a prediction vector in the cumulative probability space, which is used to learn the ordered progression of diabetic retinopathy grades. The preset classifier set includes $K-1$ classifiers, which model the $K$ diabetic retinopathy grades to capture their ordered relationship.

The $k$-th classifier ($k = 1, \dots, K-1$) determines whether the predicted grade is greater than or equal to the $k$-th diabetic retinopathy grade. If it is, the prediction result of the $k$-th classifier is 1; otherwise, it is 0. Together, the $K-1$ classifiers therefore output a $(K-1)$-dimensional prediction vector containing the prediction result of each classifier.
[0049] Furthermore, for a diabetic retinopathy grade label $y\in\{0,1,\dots,K-1\}$, the cumulative probability labels of the $K-1$ classifiers are $\mathbf{t}=(t_1,t_2,\dots,t_{K-1})$ with $t_k=\mathbb{1}[y\ge k]$, where $\mathbf{t}$ denotes the cumulative probability labels of the $K-1$ classifiers and $t_k$ denotes the cumulative probability label of the $k$-th classifier.
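The cumulative label encoding of [0048]–[0049] can be illustrated with a short sketch. `cumulative_labels` follows directly from the definition $t_k=\mathbb{1}[y\ge k]$; `grade_from_cumulative` is a hypothetical decoding rule (counting classifiers whose output reaches a 0.5 threshold) that the source does not specify.

```python
from typing import List, Sequence


def cumulative_labels(grade: int, num_grades: int = 5) -> List[int]:
    """Cumulative labels t_k = 1[y >= k] for the K-1 classifiers (k = 1..K-1)."""
    if not 0 <= grade < num_grades:
        raise ValueError(f"grade must be in [0, {num_grades - 1}]")
    return [1 if grade >= k else 0 for k in range(1, num_grades)]


def grade_from_cumulative(s: Sequence[float], threshold: float = 0.5) -> int:
    """Hypothetical decoding rule: count the classifiers that answer 'grade >= k'."""
    return sum(1 for s_k in s if s_k >= threshold)


print(cumulative_labels(0))  # [0, 0, 0, 0]
print(cumulative_labels(3))  # [1, 1, 1, 0]
print(grade_from_cumulative([0.9, 0.8, 0.3, 0.1]))  # 2
```

Because the labels are monotone (a run of ones followed by zeros), predicting grade 4 for a grade-0 eye violates four binary targets rather than one, which is how the cumulative space encodes ordinal distance.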
[0050] Therefore, after the prediction probability vector and the prediction vector are obtained, a loss function can be constructed from them and used to pre-train the initial hierarchical network model, yielding a pre-trained hierarchical network model. The loss function can be expressed as: $\mathcal{L}=\mathcal{L}_{\mathrm{CE}}(\mathbf{p},\mathbf{q})+\frac{1}{K-1}\sum_{k=1}^{K-1}\mathcal{L}_{\mathrm{CE}}^{(k)}(s_k,t_k)+D_{\mathrm{KL}}\big(\Psi(\mathbf{s})\,\|\,\mathbf{p}\big)$, where $\mathcal{L}$ denotes the loss function; $\mathcal{L}_{\mathrm{CE}}$ denotes the cross-entropy loss term; the second term denotes the average of the (binary) cross-entropy loss terms $\mathcal{L}_{\mathrm{CE}}^{(k)}$ of the $K-1$ classifiers; $D_{\mathrm{KL}}$ denotes the KL divergence; $\Psi(\cdot)$ denotes the transformation of the prediction vector of the $K-1$ classifiers into a discrete probability distribution over the $K$ grades; $\mathbf{s}=(s_1,\dots,s_{K-1})$ denotes the prediction vector; $\mathbf{p}$ denotes the prediction probability vector; and $\mathbf{q}$ denotes the probability vector label determined from the diabetic retinopathy grade label.
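A minimal sketch of this pre-training loss for a single image is given below, assuming a one-hot label vector $\mathbf{q}$, the standard cumulative-to-discrete conversion for $\Psi$ (adjacent differences of $P(y\ge k)$, clipped at zero and renormalised), unit weights on all three terms, and the KL argument order $D_{\mathrm{KL}}(\Psi(\mathbf{s})\,\|\,\mathbf{p})$. All of these are assumptions; the source does not state them. A real implementation would compute the same quantities on tensors inside an autodiff framework.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guard against log(0); an implementation detail, not from the source


def cumulative_to_distribution(s: Sequence[float]) -> List[float]:
    """Psi: map K-1 cumulative probabilities s_k = P(y >= k) to K grade probabilities.

    P(y = 0) = 1 - s_1, P(y = k) = s_k - s_{k+1}, P(y = K-1) = s_{K-1}.
    Negative differences (non-monotone s) are clipped to 0, then renormalised.
    """
    ext = [1.0] + list(s) + [0.0]
    probs = [max(ext[k] - ext[k + 1], 0.0) for k in range(len(ext) - 1)]
    total = sum(probs)  # >= 1 because the unclipped differences telescope to 1
    return [p / total for p in probs]


def pretrain_loss(p: Sequence[float], s: Sequence[float], grade: int) -> float:
    """CE(p, one-hot) + mean binary CE over the K-1 classifiers + KL(Psi(s) || p)."""
    num_grades = len(p)
    if len(s) != num_grades - 1:
        raise ValueError("s must have K-1 entries")
    # Cross-entropy against the one-hot probability label q.
    ce = -math.log(max(p[grade], EPS))
    # Average binary cross-entropy of the K-1 cumulative classifiers.
    t = [1.0 if grade >= k else 0.0 for k in range(1, num_grades)]
    bce = sum(
        -(t_k * math.log(max(s_k, EPS)) + (1.0 - t_k) * math.log(max(1.0 - s_k, EPS)))
        for s_k, t_k in zip(s, t)
    ) / (num_grades - 1)
    # Consistency between the two heads; the argument order is an assumption.
    psi = cumulative_to_distribution(s)
    kl = sum(a * math.log(a / max(b, EPS)) for a, b in zip(psi, p) if a > 0)
    return ce + bce + kl


# Usage: uniform predictions for a grade-2 eye give a strictly positive loss.
print(pretrain_loss([0.2] * 5, [0.5] * 4, grade=2))
```

The KL term ties the two heads together: if the cumulative classifiers and the softmax head disagree about which grade is likely, the loss grows even when each head is individually close to its own label.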
[0051] S30. Use the pre-trained hierarchical network model as a policy network, and perform reinforcement learning on the policy network to obtain a hierarchical network model.
[0052] It is understandable that pre-training establishes a probabilistic representation for the pre-trained hierarchical network model. However, because the pre-training process aims at average accuracy, it is difficult for the pre-trained hierarchical network model to capture the inherent ordinal risk asymmetry in clinical practice. Therefore, after obtaining the pre-trained hierarchical network model, reinforcement learning is used to optimize it to directly embed clinical safety constraints into the reinforcement learning process, thereby achieving ordered risk alignment in the hierarchical network model.
[0053] Specifically, in the process of reinforcement learning on the policy network, a clinically guided reward function can be constructed first, then the reward can be determined using the clinically guided reward function, and an optimization objective can be determined based on the reward. Finally, the policy network can be optimized using the optimization objective.
[0054] In one embodiment, performing reinforcement learning on the policy network to obtain a hierarchical network model specifically includes: S31. The prediction probability vector corresponding to the training fundus image is determined through the policy network, and the prediction probability vector is sampled to generate a sampled prediction probability vector group. S32. Obtain the reward for each sampled prediction probability vector in the sampled prediction probability vector group, and use the reward to determine the ordered risk advantage of each sampled prediction probability vector; S33. Construct an optimization objective using the advantage of ordered risk, and use the optimization objective to optimize the policy network to obtain a hierarchical network model.
[0055] In step S31, the sampled prediction probability vector group includes several sampled prediction probability vectors. Each sampled prediction probability vector is drawn from the prediction probability vector through a multinomial distribution, and each corresponds to one grading result. The group can be generated as follows: after the prediction probability vector is obtained, a multinomial sampling strategy with temperature coefficient $\tau$ is applied to it $G$ times to generate a group of $G$ sampled prediction probability vectors. The group can be expressed as: $\mathcal{G}=\{\hat{\mathbf{y}}_1,\hat{\mathbf{y}}_2,\dots,\hat{\mathbf{y}}_G\}$, $\hat{\mathbf{y}}_g\sim\mathrm{Multinomial}\big(\sigma(\pi_\theta(x)/\tau)\big)$, where $\mathcal{G}$ denotes the sampled prediction probability vector group, $\hat{\mathbf{y}}_g$ denotes the $g$-th sampled prediction probability vector, $\pi_\theta(x)$ denotes the prediction probability vector determined by the policy network for the training fundus image $x$, $\tau$ denotes the temperature coefficient, $\sigma(\cdot)$ denotes the (softmax) activation function, and $\mathrm{Multinomial}(\cdot)$ denotes multinomial sampling.
[0056] In this embodiment, multinomial sampling with a temperature coefficient is applied to the prediction probability vector determined by the policy network. This forces the policy network to explore adjacent grades and exposes unsafe decision boundaries to the reward function. It also prevents the within-group reward variance from collapsing to zero when the policy network produces overconfident prediction probability vectors, which helps the reinforcement learning process converge stably and avoid local optima caused by insufficient exploration.
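The temperature-scaled multinomial sampling of [0055]–[0056] can be sketched as below. This is an illustration under stated assumptions: the policy output is treated as unnormalised scores (logits) before the temperature softmax, each draw is returned as a one-hot vector, and the function names and the group size in the usage line are hypothetical. `random.Random.choices` is Python's standard weighted sampler with replacement.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], tau: float) -> List[float]:
    """sigma(z / tau); a larger tau flattens the distribution over grades."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_group(
    policy_output: Sequence[float], tau: float, group_size: int, rng: random.Random
) -> List[List[float]]:
    """Draw G grades from Multinomial(sigma(output / tau)), each as a one-hot vector."""
    probs = softmax_with_temperature(policy_output, tau)
    grades = rng.choices(range(len(probs)), weights=probs, k=group_size)
    return [[1.0 if k == g else 0.0 for k in range(len(probs))] for g in grades]


# Usage: an overconfident grade-0 output still yields some neighbouring grades at tau = 1.5.
rng = random.Random(0)
group = sample_group([2.0, 1.0, 0.0, -1.0, -2.0], tau=1.5, group_size=8, rng=rng)
print([v.index(1.0) for v in group])
```

Raising $\tau$ spreads probability mass onto neighbouring grades, so the group is more likely to contain differing predictions and therefore differing rewards, which is the variance-preserving effect described in [0056].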
[0057] In step S32, the reward is determined jointly by the predicted grading result corresponding to the sampled predicted probability vector and the diabetic retinopathy grade label. This reward includes an accuracy reward, an ordered consistency reward, and an asymmetric safety penalty reward. The accuracy reward encourages consistency between the predicted grading result and the diabetic retinopathy grade label; the ordered consistency reward ensures the ordered nature of diabetic retinopathy severity; and the asymmetric safety penalty reward penalizes erroneous grading such as missing severe lesions or misclassifying severe cases as mild cases, thereby reducing the risk of underestimating clinically high-risk cases.
[0058] In one embodiment, obtaining the reward for each sampled prediction probability vector in the sampled prediction probability vector group specifically includes: for each sampled prediction probability vector in the group, determining the sampled prediction grade corresponding to that vector; determining the accuracy reward according to whether the sampled prediction grade is the same as the grade label of the training fundus image; determining the ordered consistency reward from the difference between the sampled prediction grade and the grade label; determining the asymmetric safety penalty reward from the grade difference and a penalty term that applies when the sampled prediction grade is lower than the grade label; and determining the reward of the sampled prediction probability vector based on the accuracy reward, the ordered consistency reward, and the asymmetric safety penalty reward.
[0059] Specifically, the accuracy reward can be expressed as $R^{\mathrm{acc}}_g=\mathbb{1}[\hat{y}_g=y_g]$, where $\mathbb{1}[\cdot]$ denotes the indicator function, $\hat{y}_g$ denotes the sampled prediction grade of the sampled prediction probability vector $\hat{\mathbf{y}}_g$, and $y_g$ denotes the corresponding grade label. The ordered consistency reward is negatively correlated with the difference between the sampled prediction grade and the grade label and can be expressed as $R^{\mathrm{ord}}_g=-\,d_g^{\alpha_1}$ with $d_g=|\hat{y}_g-y_g|$, where $\alpha_1$ denotes the first exponent coefficient and $-d_g^{\alpha_1}$ is a distance-based penalty. The asymmetric safety penalty reward penalizes high-risk grading behaviors such as missed diagnoses and severe misjudgments, meeting the clinical requirement of risk asymmetry, and can be expressed as $R^{\mathrm{safe}}_g=-\,\mathbb{1}[\hat{y}_g<y_g]\,d_g^{\alpha_2}$, where the indicator $\mathbb{1}[\hat{y}_g<y_g]$ makes severe missed diagnoses (underestimated grades) the primary target of the penalty and $\alpha_2$ denotes the second exponent coefficient.
[0060] Based on this, the reward of the sampled prediction probability vector can be expressed as $R_g=R^{\mathrm{acc}}_g+R^{\mathrm{ord}}_g+R^{\mathrm{safe}}_g$, where $R_g$ denotes the reward of the sampled prediction probability vector $\hat{\mathbf{y}}_g$.
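The clinically guided reward of [0058]–[0060] can be sketched as follows. The power-law form of the two distance terms, the unweighted sum, and the default exponents `alpha1 = 1.0` and `alpha2 = 2.0` are all hypothetical choices made for illustration; the source names the two exponent coefficients but gives neither their values nor the exact functional form.

```python
def clinical_reward(pred_grade: int, label: int, alpha1: float = 1.0, alpha2: float = 2.0) -> float:
    """R = R_acc + R_ord + R_safe for one sampled grade (assumed functional form).

    alpha1, alpha2: hypothetical first and second exponent coefficients.
    """
    d = abs(pred_grade - label)
    # Accuracy reward: 1 only for an exact match.
    r_acc = 1.0 if pred_grade == label else 0.0
    # Ordered consistency reward: more negative the further the grade is off.
    r_ord = -(d ** alpha1)
    # Asymmetric safety penalty: only applied when the grade is underestimated.
    r_safe = -(d ** alpha2) if pred_grade < label else 0.0
    return r_acc + r_ord + r_safe


# Usage: predicting 0 for a grade-4 eye is penalised far more than the reverse.
print(clinical_reward(0, 4), clinical_reward(4, 0))  # -20.0 -4.0
```

With $\alpha_2>1$ the penalty for underestimation grows faster than linearly in the grade gap, so missing proliferative DR costs much more than over-referring a healthy eye, which is the risk asymmetry the paragraph describes.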
[0061] Furthermore, in step S32, after the reward of each sampled prediction probability vector is obtained, the rewards can be normalized using the reward mean and reward variance of the sampled prediction probability vector group in order to measure the relative quality of each sampled prediction probability vector within the group, yielding the ordered risk advantage of each sampled prediction probability vector.
[0062] Based on this, the determination of the ordered risk advantage of each sampled prediction probability vector using the reward specifically includes: The mean and standard deviation of the reward for the sampled prediction probability vector group are obtained based on the rewards of all sampled prediction probability vectors. The reward difference for each sampled prediction probability vector is determined by using the reward of each sampled prediction probability vector and the mean of the rewards, and the ordered risk advantage of each sampled prediction probability vector is determined by using the reward difference of each sampled prediction probability vector and the standard deviation of the rewards.
[0063] Specifically, after the reward mean and reward standard deviation are obtained, the reward of each sampled prediction probability vector can be standardized with them. During standardization, a small numerical constant is added to the reward standard deviation to avoid division-by-zero errors when the standard deviation is zero (for example, when all samples in the group receive the same reward).
[0064] Based on this, the ordered risk advantage of the sampled prediction probability vector can be expressed as $A_g=\frac{R_g-\mu_R}{\sigma_R+\delta}$, where $A_g$ denotes the ordered risk advantage of the sampled prediction probability vector $\hat{\mathbf{y}}_g$, $R_g$ denotes its reward, $R_g-\mu_R$ denotes the reward difference with respect to the group reward mean $\mu_R$, $\sigma_R$ denotes the reward standard deviation, and $\delta$ denotes the numerical constant.
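The group normalisation of [0062]–[0064] is sketched below. It assumes the population standard deviation (`statistics.pstdev`) and a default `delta = 1e-8`; both are illustrative choices, since the source specifies neither the estimator nor the value of the numerical constant.

```python
import statistics
from typing import List, Sequence


def ordered_risk_advantages(rewards: Sequence[float], delta: float = 1e-8) -> List[float]:
    """A_g = (R_g - mean(R)) / (std(R) + delta) over one sampled group.

    delta: numerical constant that avoids division by zero when all rewards are equal.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)  # population std, an assumption
    return [(r - mu) / (sigma + delta) for r in rewards]


# Usage: rewards from a group of four samples for the same image.
print(ordered_risk_advantages([1.0, -1.0, -20.0, 1.0]))
```

Because the advantages are centred within each group, a sample is only reinforced if it is safer or more accurate than its siblings for the same image, independent of how hard that image is overall.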
[0065] It should be noted that, in practical applications, the reward of each sampled prediction probability vector can also be directly regarded as the ordered risk advantage of each sampled prediction probability vector.
[0066] Furthermore, in step S33, after the ordered risk advantage of each sampled prediction probability vector is obtained, a clipped (truncated) surrogate objective is used to maximize the expected advantage, which promotes monotonic improvement while avoiding destructive parameter updates. At the same time, the probability ratio between the current policy network and the previous policy network is incorporated into the optimization objective to prevent excessive changes in the policy parameters within a single update, ensuring stable training.
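[0067] Based on this, the optimization objective can be expressed as: $\mathcal{J}(\theta)=\mathbb{E}_{i,g}\Big[\min\big(r_{i,g}(\theta)A_{i,g},\ \mathrm{clip}\big(r_{i,g}(\theta),1-\epsilon,1+\epsilon\big)A_{i,g}\big)-\beta\,D_{\mathrm{KL}}\big(\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big)\Big]$ with $r_{i,g}(\theta)=\frac{\pi_\theta(\hat{\mathbf{y}}_{i,g}\mid x_i)}{\pi_{\theta_{\mathrm{old}}}(\hat{\mathbf{y}}_{i,g}\mid x_i)}$, where $\mathcal{J}(\theta)$ denotes the optimization objective; $x_i$ denotes the $i$-th training fundus image; $\hat{\mathbf{y}}_{i,g}$ denotes its $g$-th sampled prediction probability vector; $r_{i,g}(\theta)$ denotes the ratio of the probability assigned by the policy network at the current step to that assigned by the policy network $\pi_{\theta_{\mathrm{old}}}$ at the previous step; $A_{i,g}$ denotes the ordered risk advantage; $\epsilon$ denotes the clipping hyperparameter; $\beta$ denotes the regularization strength; $\pi_\theta$ denotes the policy network; $\pi_{\mathrm{ref}}$ denotes the reference policy network; $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the divergence function; $\mathrm{clip}(\cdot)$ denotes the truncation (clipping) function; $\mathbb{E}$ denotes the expectation operator; and $\min(\cdot)$ denotes the minimum function.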
[0068] Both the policy network and the reference policy network are initialized from the pre-trained hierarchical network model described above. The parameters of the policy network are updated during reinforcement learning, while the parameters of the reference policy network remain frozen. The reference policy network constrains the parameter updates of the policy network, preventing training instability caused by excessively large updates and keeping the optimization of the policy network stable across iterations. Through the dual constraints of the clipping function and the divergence regularization, this embodiment optimizes the policy network while maintaining training stability, and ultimately obtains a hierarchical network model capable of accurately grading diabetic retinopathy.
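The value of the objective in [0067] for one training image can be sketched as follows. The sketch only evaluates $\mathcal{J}$ and does not compute gradients; in practice the same expression would be written with tensors in an autodiff framework and maximised. Treating the KL term as the exact divergence between two categorical distributions over the $K$ grades, and the specific values of `eps` and `beta` in the usage line, are assumptions for illustration.

```python
import math
from typing import Sequence


def kl_categorical(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact KL(p || q) between two categorical distributions over the K grades."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def clipped_objective(
    p_new: Sequence[float],
    p_old: Sequence[float],
    p_ref: Sequence[float],
    sampled_grades: Sequence[int],
    advantages: Sequence[float],
    eps: float,
    beta: float,
) -> float:
    """Group average of min(r*A, clip(r, 1-eps, 1+eps)*A) minus beta * KL(pi_theta || pi_ref)."""
    terms = []
    for g, a in zip(sampled_grades, advantages):
        ratio = p_new[g] / p_old[g]  # r = pi_theta / pi_theta_old for the sampled grade
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * a, clipped * a))  # pessimistic (lower) bound
    surrogate = sum(terms) / len(terms)
    return surrogate - beta * kl_categorical(p_new, p_ref)


# Usage: a large probability increase on a good sample is capped at 1 + eps.
print(clipped_objective([0.8, 0.2], [0.5, 0.5], [0.8, 0.2], [0], [1.0], eps=0.2, beta=0.1))
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards pushing the ratio beyond $1\pm\epsilon$ in the favourable direction, while the KL term separately keeps the whole distribution close to the frozen reference model.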
[0069] It should also be noted that, in practical applications, sampling the prediction probability vector is not strictly necessary. Instead, the reward of the prediction probability vector itself can be determined with the reward computation described above for sampled prediction probability vectors, used directly as the ordered risk advantage of the prediction probability vector, and then used to construct the optimization objective. More generally, the reinforcement learning in this embodiment can be implemented with the proximal policy optimization (PPO) algorithm or with other PPO variants for deep reinforcement learning; this embodiment does not limit the specific algorithm.
[0070] S40. Use the hierarchical network model to determine the diabetic retinopathy grade of the fundus image to be graded.
[0071] Specifically, after the hierarchical network model is obtained, the fundus image to be graded is input into the model, and the model outputs its diabetic retinopathy grade.
[0072] In summary, this embodiment provides a grading method for diabetic retinopathy. The method introduces a strictly monotonic grade bias through ordered inductive representation learning, embedding the progressive manifold of DR severity directly into the latent space to obtain image features. It then uses these image features to determine the prediction probability vector and the prediction vector, and pre-trains the initial hierarchical network model on these vectors to enforce probabilistic consistency across severity thresholds. Finally, it introduces an asymmetric penalty for high-risk misclassification into the reward used in reinforcement learning, and forms a sampled prediction probability vector group by sampling the prediction probability vector, which forces the policy network to explore adjacent grades and exposes unsafe decision boundaries, improving accuracy while ensuring safety.
[0073] To further illustrate the effectiveness of the diabetic retinopathy grading method provided in this application, this application compares the diabetic retinopathy grading method provided in this application (denoted as RAO-RL) with five other diabetic retinopathy grading methods on four diabetic retinopathy datasets. The comparison results are shown in Table 1. The five diabetic retinopathy grading methods are Triple-DRNet, CLIP-DR, AOR-DR, CSFHANet, and LECNet. The four diabetic retinopathy datasets are APTOS, DDR, Messidor-2, and EyePACS. The evaluation metrics are QWK (Quadratic Weighted Kappa), F1 (F1-Score), and ACC (Accuracy).
[0074] Table 1 Comparison Results
[0075] As shown in Table 1, RAO-RL achieved the best results in QWK, F1, and ACC. The improvement in QWK was particularly significant, with gains of +3.1% (APTOS), +5.7% (DDR), +5.0% (Messidor-2), and +5.6% (EyePACS) across the various datasets compared to the second-best method. The improvements in F1 ranged from +2.0% to +8.5% across the datasets, and the improvements in ACC ranged from +3.4% to +6.1% across the datasets. Furthermore, the improvements in F1 and ACC were consistent with the gains in QWK. Therefore, RAO-RL demonstrates a reliable improvement over existing methods, supporting its effectiveness in DR grading in real-world clinical scenarios.
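QWK, the headline metric in Table 1, weights each disagreement by the squared grade distance, which is why it is sensitive to the large-range misgradings this method targets. The sketch below implements the standard definition (Cohen's kappa with quadratic weights); the source does not say how it computed QWK, and in practice `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], num_grades: int = 5) -> float:
    """Cohen's kappa with weights w_ij = (i - j)^2 / (K - 1)^2."""
    n = len(y_true)
    # Observed confusion matrix O[i][j]: true grade i, predicted grade j.
    observed: List[List[float]] = [[0.0] * num_grades for _ in range(num_grades)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_grades)) for j in range(num_grades)]
    num = den = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            w = (i - j) ** 2 / (num_grades - 1) ** 2
            num += w * observed[i][j]
            # Expected count under independence of true and predicted grades.
            den += w * hist_true[i] * hist_pred[j] / n
    if den == 0.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```

A prediction two grades off costs four times as much as an adjacent-grade error, so methods that keep their mistakes adjacent score higher QWK even at equal accuracy.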
[0076] Furthermore, to assess the clinical safety of RAO-RL, this application provides per-grade evaluation results of RAO-RL and the five existing methods, as shown in Figure 4. As shown in Figure 4(a), RAO-RL achieves the highest correct classification rate and high discriminative power on APTOS, particularly for diabetic retinopathy grades 1 to 4. Furthermore, RAO-RL significantly reduces misclassification into distant (non-adjacent) grades, the most clinically hazardous type of error, so that most misclassified samples still fall into adjacent grades; this makes it more acceptable in screening scenarios.
[0077] In assessing the clinical safety of RAO-RL, this application also uses the absolute underestimation rate (AUR) and the weighted underestimation rate (WUR) to evaluate diabetic retinopathy grades 1 to 4. As shown in Figure 4(b), RAO-RL consistently maintains very low error rates. Even for grade 4 diabetic retinopathy, RAO-RL significantly reduces the WUR, indicating that when misclassifications do occur, these cases are mainly assigned to the adjacent grade 3 rather than being incorrectly identified as normal or mild.
[0078] Based on the above grading method for diabetic retinopathy, this embodiment provides a grading device for diabetic retinopathy. As shown in Figure 5, the grading device specifically includes: a construction module 100, configured to obtain an image patch embedding feature set corresponding to the training fundus image through an initial hierarchical network model, and add ordered representations to the image patch embedding features in the image patch embedding feature set to obtain image features corresponding to the training fundus image; determine the prediction probability vector corresponding to the training fundus image based on the image features; determine the prediction vector of the cumulative probability space corresponding to the fundus image based on the image features through a preset classifier set, and pre-train the initial hierarchical network model based on the prediction probability vector and the prediction vector to obtain a pre-trained hierarchical network model; and use the pre-trained hierarchical network model as a policy network and perform reinforcement learning on the policy network to obtain a hierarchical network model; wherein the ordered representation is used to embed the progressive manifold of the severity of diabetic retinopathy into the latent space, the prediction vector of the cumulative probability space is used to learn the ordered progressive relationship of diabetic retinopathy grades, and the rewards used in reinforcement learning include accuracy rewards, ordered consistency rewards, and asymmetric safety penalty rewards; and a control module 200, configured to determine the diabetic retinopathy grade of the fundus image to be graded using the hierarchical network model.
[0079] Based on the above-described grading method for diabetic retinopathy, this embodiment provides a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement the steps in the grading method for diabetic retinopathy as described in the above embodiment.
[0080] Based on the above grading method for diabetic retinopathy, this application also provides a terminal device. As shown in Figure 6, it includes at least one processor 20, a display screen 21, and a memory 22, and may further include a communications interface 23 and a bus 24. The processor 20, display screen 21, memory 22, and communications interface 23 can communicate with each other via the bus 24. The display screen 21 is configured to display a preset user guide interface in the initial setup mode. The communications interface 23 can transmit information. The processor 20 can invoke logical instructions in the memory 22 to execute the methods described in the above embodiments.
[0081] Furthermore, the logical instructions in the aforementioned memory 22 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium.
[0082] The memory 22, as a computer-readable storage medium, can be configured to store software programs, computer-executable programs, such as program instructions or modules corresponding to the methods in the embodiments of this disclosure. The processor 20 executes functional applications and data processing by running the software programs, instructions, or modules stored in the memory 22, thereby implementing the methods in the above embodiments.
[0083] The memory 22 may include a program storage area and a data storage area. The program storage area may store the operating system and application programs required for at least one function; the data storage area may store data created based on the use of the terminal device. Furthermore, the memory 22 may include high-speed random access memory (RAM) and non-volatile memory. Examples include various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks, as well as transient storage media.
[0084] Furthermore, the specific process by which the processor loads and executes the instructions in the above storage medium and terminal device has been described in detail in the method above and will not be repeated here.
[0085] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A grading method for diabetic retinopathy, characterized in that the grading method specifically includes: obtaining, through an initial hierarchical network model, an image patch embedding feature set corresponding to a training fundus image, and adding ordered representations to the image patch embedding features in the image patch embedding feature set to obtain image features corresponding to the training fundus image, wherein the ordered representations are used to embed the progressive manifold of the severity of diabetic retinopathy into the latent space; determining, based on the image features, a prediction probability vector corresponding to the training fundus image; determining, based on the image features and using a preset classifier set, a prediction vector in the cumulative probability space corresponding to the fundus image; pre-training the initial hierarchical network model based on the prediction probability vector and the prediction vector to obtain a pre-trained hierarchical network model, wherein the prediction vector in the cumulative probability space is used to learn the ordered progressive relationship of diabetic retinopathy grades; using the pre-trained hierarchical network model as a policy network and performing reinforcement learning on the policy network to obtain a hierarchical network model, wherein the rewards used in reinforcement learning include an accuracy reward, an ordered consistency reward, and an asymmetric safety penalty reward; and determining, using the hierarchical network model, the diabetic retinopathy grade of the fundus image to be graded.
2. The grading method for diabetic retinopathy according to claim 1, characterized in that the initial hierarchical network model includes a visual feature extraction module and an ordered inductive representation learning module, and obtaining the image features corresponding to the training fundus image through the initial hierarchical network model specifically includes: extracting, by the visual feature extraction module, the image patch embedding feature set corresponding to the training fundus image; determining, by the ordered inductive representation learning module, a grade bias for each diabetic retinopathy grade based on learnable intervals, wherein the grade biases of the diabetic retinopathy grades satisfy a monotonicity constraint; for each image patch embedding feature in the image patch embedding feature set, determining, by the ordered inductive representation learning module, the attention weight of the image patch embedding feature at each diabetic retinopathy grade based on the grade biases; and fusing, by the ordered inductive representation learning module, all image patch embedding features in the image patch embedding feature set using the attention weights to determine the image features corresponding to the training fundus image.
3. The grading method for diabetic retinopathy according to claim 2, characterized in that, The step of using the ordered inductive representation learning module to fuse all image patch embedding features in the image patch embedding feature set based on the attention weights to determine the image features corresponding to the training fundus image specifically includes: For each diabetic retinopathy level, the ordered inductive representation learning module uses the attention weight of each image patch embedding feature at the diabetic retinopathy level to weight all image patch embedding features in the image patch embedding feature set to obtain the layer embedding feature corresponding to the diabetic retinopathy level. The mean of the layer embedding features corresponding to all diabetic retinopathy levels, the maximum layer embedding feature among all layer embedding features corresponding to all diabetic retinopathy levels, and the learnable class vector are concatenated to obtain the image features corresponding to the training fundus image.
4. The grading method for diabetic retinopathy according to claim 1, characterized in that performing reinforcement learning on the policy network to obtain the hierarchical network model specifically includes: determining, by the policy network, the prediction probability vector corresponding to the training fundus image, and sampling the prediction probability vector to generate a sampled prediction probability vector group; obtaining the reward of each sampled prediction probability vector in the sampled prediction probability vector group, and using the rewards to determine the ordered risk advantage of each sampled prediction probability vector; and constructing an optimization objective using the ordered risk advantages, and optimizing the policy network with the optimization objective to obtain the hierarchical network model.
5. The grading method for diabetic retinopathy according to claim 4, characterized in that obtaining the reward of each sampled prediction probability vector in the sampled prediction probability vector group specifically includes: for each sampled prediction probability vector in the group, determining the sampled prediction grade corresponding to the sampled prediction probability vector; determining the accuracy reward according to whether the sampled prediction grade is the same as the grade label corresponding to the training fundus image; determining the ordered consistency reward from the difference between the sampled prediction grade and the grade label; determining the asymmetric safety penalty reward from the grade difference and a penalty term that applies when the sampled prediction grade is lower than the grade label; and determining the reward of the sampled prediction probability vector based on the accuracy reward, the ordered consistency reward, and the asymmetric safety penalty reward.
6. The grading method for diabetic retinopathy according to claim 4, characterized in that, The method of using the reward to determine the ordered risk advantage of each sampled prediction probability vector specifically includes: The mean and standard deviation of the reward for the sampled prediction probability vector group are obtained based on the rewards of all sampled prediction probability vectors. The reward difference for each sampled prediction probability vector is determined by using the reward of each sampled prediction probability vector and the mean of the rewards, and the ordered risk advantage of each sampled prediction probability vector is determined by using the reward difference of each sampled prediction probability vector and the standard deviation of the rewards.
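7. The grading method for diabetic retinopathy according to claim 4, characterized in that the optimization objective is expressed as: $\mathcal{J}(\theta)=\mathbb{E}_{i,g}\Big[\min\big(r_{i,g}(\theta)A_{i,g},\ \mathrm{clip}\big(r_{i,g}(\theta),1-\epsilon,1+\epsilon\big)A_{i,g}\big)-\beta\,D_{\mathrm{KL}}\big(\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big)\Big]$ with $r_{i,g}(\theta)=\frac{\pi_\theta(\hat{\mathbf{y}}_{i,g}\mid x_i)}{\pi_{\theta_{\mathrm{old}}}(\hat{\mathbf{y}}_{i,g}\mid x_i)}$, where $\mathcal{J}(\theta)$ denotes the optimization objective; $x_i$ denotes the $i$-th training fundus image; $\hat{\mathbf{y}}_{i,g}$ denotes its $g$-th sampled prediction probability vector; $r_{i,g}(\theta)$ denotes the ratio of the probability assigned by the policy network at the current step to that assigned by the policy network $\pi_{\theta_{\mathrm{old}}}$ at the previous step; $A_{i,g}$ denotes the ordered risk advantage; $\epsilon$ denotes the clipping hyperparameter; $\beta$ denotes the regularization strength; $\pi_\theta$ denotes the policy network; $\pi_{\mathrm{ref}}$ denotes the reference policy network; $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ denotes the divergence function; $\mathrm{clip}(\cdot)$ denotes the truncation (clipping) function; $\mathbb{E}$ denotes the expectation operator; and $\min(\cdot)$ denotes the minimum function.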
8. A grading device for diabetic retinopathy, characterized in that, The aforementioned grading device for diabetic retinopathy specifically includes: A construction module is used to obtain an image patch embedding feature set corresponding to the training fundus image through an initial hierarchical network model, and add ordered representations to the image patch embedding features in the image patch embedding feature set to obtain image features corresponding to the training fundus image; determine the prediction probability vector corresponding to the training fundus image based on the image features; determine the prediction vector of the cumulative probability space corresponding to the fundus image based on the image features through a preset classifier set, and pre-train the initial hierarchical network model based on the prediction probability vector and the prediction vector to obtain a pre-trained hierarchical network model; use the pre-trained hierarchical network model as a policy network, and perform reinforcement learning on the policy network to obtain a hierarchical network model; wherein, the ordered representation is used to embed the progressive manifold of the severity of diabetic retinopathy into the latent space; the prediction vector of the cumulative probability space is used to learn the ordered progressive relationship of the diabetic retinopathy level, and the rewards used in reinforcement learning include accuracy rewards, ordered consistency rewards, and asymmetric safety penalty rewards; A control module is used to determine the diabetic retinopathy grade of the fundus image to be graded using the hierarchical network model.
9. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores one or more programs, which can be executed by one or more processors to implement the steps in the grading method for diabetic retinopathy as described in any one of claims 1-7.
10. A terminal device, characterized in that it comprises a processor and a memory, wherein the memory stores a computer-readable program executable by the processor, and when the processor executes the computer-readable program, the steps of the grading method for diabetic retinopathy according to any one of claims 1-7 are implemented.