Quality control method and system for data labeling of fundus images

By standardizing fundus images and evaluating the self-consistency and gold standard consistency of multi-doctor annotation results, the problem of low annotation accuracy of fundus image data was solved, achieving higher annotation accuracy and meeting the training requirements of machine learning models.

CN114693587B · Active · Publication Date: 2026-06-16 · SHENZHEN SIBRIGHT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN SIBRIGHT TECH CO LTD
Filing Date
2020-12-28
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In existing technologies, the accuracy of data annotation results for fundus images is low, making it difficult to meet the training requirements of machine learning models.

Method used

By acquiring multiple fundus images, performing standardized processing, and conducting preliminary screening, a target fundus image set is prepared. Self-consistency and gold standard consistency are calculated using the annotation results of multiple annotating doctors. Doctor annotation results that meet the preset conditions are obtained, and the annotation accuracy is improved by summarizing the results.

Benefits of technology

This improved the accuracy of fundus image data annotation, ensured the quality of training data, and met the training requirements of machine learning models.

✦ Generated by Eureka AI based on patent content.


Abstract

The disclosure describes a quality control method for data labeling of fundus images. It includes obtaining multiple fundus images; performing standardization processing on each fundus image to obtain multiple standardized fundus images; performing preliminary screening on the quality of each standardized fundus image to obtain multiple qualified fundus images; preparing a target fundus image set; labeling each image in the target fundus image set by multiple first labeling doctors to obtain multiple sets of doctor labeling results; calculating the self-consistency and gold standard consistency of the corresponding first labeling doctor based on the doctor labeling results, and obtaining the doctor labeling results of the first labeling doctor meeting the preset condition as the target labeling result; and summarizing multiple sets of target labeling results to obtain the final labeling result. According to the disclosure, a quality control method and system for data labeling of fundus images with high accuracy can be provided.

Description

Technical Field

[0001] This disclosure relates to a quality control method and a quality control system for data annotation of fundus images.

Background Technology

[0002] With the development of artificial intelligence technology, supervised learning techniques based on machine learning are being applied to more and more fields, especially in the field of medical imaging, where they have achieved great success. In supervised learning, machine learning models are trained using a training set consisting of training data (e.g., fundus images) and the labeled results of the training data (e.g., diabetic retinopathy staging). Therefore, the quality of the data labeling in the training data is crucial to the training of the model.

[0003] Currently, to improve the accuracy of training data annotation, professional annotators, such as ophthalmologists, are often involved in labeling the training data, and quality control methods are used to monitor the annotation results. For example, patent document CN110991486A discloses a method for quality control of collaborative image annotation. This method mixes gold standard data into each annotation package at a preset ratio in order to verify the annotation quality of any annotating user on any given package. In its multi-user fitting step, an image is distributed to multiple users and their annotation results are collected; once the repeated labels are obtained, the true label is derived from them. However, the accuracy of training data annotation still needs improvement.

Summary of the Invention

[0004] This disclosure is made in view of the above-mentioned situation, and its purpose is to provide a quality control method and quality control system for data annotation of fundus images with high accuracy.

[0005] To this end, the first aspect of this disclosure provides a quality control method for data annotation of fundus images, comprising: acquiring multiple fundus images; performing standardization processing on each of the fundus images to obtain multiple standardized fundus images; performing preliminary screening on the quality of each of the standardized fundus images to obtain multiple qualified fundus images; preparing a target fundus image set, the target fundus image set comprising a dataset to be calibrated including the multiple qualified fundus images, a gold standard dataset comprising a first preset number of gold standard fundus images with known correct annotation results, and a self-consistency judgment dataset consisting of at least one image from the dataset to be calibrated, wherein each image in the target fundus image set is a target fundus image; having multiple first annotating doctors annotate each image in the target fundus image set to obtain multiple sets of doctor annotation results, each doctor annotation result including at least one judgment result, and each judgment result including at least disease information indicating either no obvious abnormality or one disease; calculating, based on the doctor annotation results, the self-consistency and gold standard consistency of the corresponding first annotating doctor, and taking the doctor annotation results of each first annotating doctor who meets the preset condition as target annotation results, wherein the self-consistency is obtained by evaluating two sets of annotation results, namely the doctor annotation results for each image in the self-consistency judgment dataset and the doctor annotation results for the images in the dataset to be calibrated that coincide with the images in the self-consistency judgment dataset, with either set taken as the first set and the other as the second set, and the gold standard consistency is obtained by evaluating the correct annotation results in the gold standard dataset as the first set and the doctor annotation results for each image in the gold standard dataset as the second set; and summarizing multiple sets of target annotation results to obtain the final annotation result. In this case, the doctor annotation results of the first annotating doctors who meet the preset condition can be selected, based on the gold standard dataset and the self-consistency judgment dataset, as the target annotation results and then summarized. This improves the accuracy of data annotation for fundus images.

[0006] Furthermore, in the quality control method according to the first aspect of this disclosure, optionally, the preset condition is that the self-consistency is greater than the self-consistency threshold and the gold standard consistency is greater than the gold standard consistency threshold. Thus, the preset condition can be determined based on the self-consistency threshold and the gold standard consistency threshold.

[0007] Furthermore, in the quality control method disclosed in the first aspect, optionally, when the doctor annotation results of a first annotating doctor do not meet the preset condition, a second annotating doctor re-annotates each image in the target fundus image set, and this is repeated until a doctor annotation result that meets the preset condition is obtained as the target annotation result. Thus, the target annotation result can be obtained.

[0008] Furthermore, in the quality control method disclosed in the first aspect, optionally, the self-consistency is determined by using a quadratic weighted kappa coefficient to calculate each first annotating doctor's disease self-consistency for each of the diseases, and then weighting the disease self-consistency values to obtain the self-consistency of that first annotating doctor; the gold standard consistency is determined by using a quadratic weighted kappa coefficient to calculate each first annotating doctor's disease gold standard consistency for each of the diseases, and then weighting the disease gold standard consistency values to obtain the gold standard consistency of that first annotating doctor. Thus, the self-consistency and the gold standard consistency of each first annotating doctor can be calculated.
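The weighting step above can be sketched as follows. This is only an illustration: the disclosure does not state how the per-disease values are weighted, so a normalized weighted average is assumed here, and the function name, the disease keys, and the weight values are all hypothetical.

```python
def overall_consistency(per_disease_kappa, weights):
    """Combine per-disease kappa values into one score per doctor.

    Assumption: a normalized weighted average. The source only says the
    per-disease values are "weighted"; the exact scheme is not given.
    """
    total_weight = sum(weights[disease] for disease in per_disease_kappa)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        kappa * weights[disease] for disease, kappa in per_disease_kappa.items()
    )
    return weighted_sum / total_weight


# Hypothetical per-disease kappa values and weights for one doctor.
score = overall_consistency(
    {"diabetic_retinopathy": 0.8, "glaucoma": 0.6},
    {"diabetic_retinopathy": 3.0, "glaucoma": 1.0},
)
```

The same helper could serve for both self-consistency and gold standard consistency, since the two differ only in which pair of annotation sets produced the per-disease kappa values.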

[0009] Additionally, in the quality control method disclosed in the first aspect, optionally, the quadratic weighted kappa coefficient $\kappa$ is $\kappa = 1 - \frac{\sum_{i,j} W_{ij} X_{ij}}{\sum_{i,j} W_{ij} E_{ij}}$, where $W_{ij}$ represents the quadratic weighting coefficient, $X_{ij}$ represents the number of target fundus images for which the judgment result in the first set of annotation results is $i$ and the judgment result in the second set of annotation results is $j$, and $E_{ij}$ represents the expected number of target fundus images for which the judgment result in the first set of annotation results is $i$ and the judgment result in the second set of annotation results is $j$. Therefore, a consistency check can be performed on the first set of annotation results and the second set of annotation results.
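A minimal sketch of a quadratic weighted kappa computation follows, using the symbols W, X, and E from the paragraph above. Two points are assumptions rather than statements from the disclosure: the weight is taken as the commonly used form (i - j)^2 / (N - 1)^2 for N ordered levels, and the expected counts are taken from the products of the two sets' marginal totals. Judgment results are assumed to be encoded as integers 0..N-1.

```python
def quadratic_weighted_kappa(first, second, num_levels):
    """Quadratic weighted kappa between two equally long lists of integer labels."""
    if len(first) != len(second) or not first:
        raise ValueError("both sets must be non-empty and of equal length")
    if num_levels < 2:
        raise ValueError("at least two levels are required")
    if any(not 0 <= v < num_levels for v in list(first) + list(second)):
        raise ValueError("labels must lie in range(num_levels)")

    n = len(first)
    # X[i][j]: images judged i in the first set and j in the second set.
    X = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(first, second):
        X[a][b] += 1
    row_totals = [sum(row) for row in X]
    col_totals = [sum(X[i][j] for i in range(num_levels)) for j in range(num_levels)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            w = (i - j) ** 2 / (num_levels - 1) ** 2  # assumed weight form
            e = row_totals[i] * col_totals[j] / n      # expected count E[i][j]
            numerator += w * X[i][j]
            denominator += w * e
    if denominator == 0:
        raise ValueError("kappa is undefined when both sets use a single level")
    return 1.0 - numerator / denominator
```

For self-consistency, `first` and `second` would be one doctor's two readings of the repeated images; for gold standard consistency, `first` would be the known correct labels and `second` the doctor's labels.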

[0010] Furthermore, in the quality control method according to the first aspect of this disclosure, optionally, the self-consistency and gold standard consistency of physicians labeled with different thresholds are analyzed, and the self-consistency threshold and the gold standard consistency threshold are determined by anomaly detection. Thus, the self-consistency threshold and the gold standard consistency threshold can be determined.

[0011] Furthermore, in the quality control method according to the first aspect of this disclosure, optionally, the anomaly detection method involves obtaining the target self-consistency of doctors labeled with different thresholds and calculating the mean self-consistency μ0 and the variance of self-consistency σ0. Under the assumption that the target self-consistency follows a Gaussian distribution, the self-consistency threshold is μ0 - 1.96 × σ0. Additionally, the method involves obtaining the target gold standard consistency of doctors labeled with different thresholds and calculating the mean gold standard consistency μ1 and the variance of gold standard consistency σ1. Under the assumption that the target gold standard consistency follows a Gaussian distribution, the gold standard consistency threshold is μ1 - 1.96 × σ1. Thus, the self-consistency threshold and the gold standard consistency threshold can be determined.
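The threshold rule above, together with the preset condition that both consistencies exceed their thresholds, can be sketched as below. The text calls σ the variance, but the Gaussian rule μ - 1.96 × σ is normally applied with the standard deviation, so this sketch assumes the standard deviation (population form). The function names and the sample values are hypothetical.

```python
import statistics


def consistency_threshold(values, z=1.96):
    """Lower threshold mu - z * sigma, with sigma taken as the standard deviation.

    Assumption: sigma is the population standard deviation of the observed
    consistency values, which are treated as Gaussian.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return mu - z * sigma


def meets_preset_condition(self_kappa, gold_kappa, self_threshold, gold_threshold):
    """Both consistencies must be strictly greater than their thresholds."""
    return self_kappa > self_threshold and gold_kappa > gold_threshold


# Hypothetical consistency values used only to illustrate the call.
self_threshold = consistency_threshold([0.7, 0.8, 0.9])
```

Under the Gaussian assumption, roughly 2.5% of values would fall below a threshold set 1.96 standard deviations under the mean, which is why this rule acts as a simple anomaly detector.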

[0012] Furthermore, in the quality control method disclosed in the first aspect, optionally, the summarization involves comparing the annotation results of each target fundus image in multiple sets of target annotation results using an absolute majority voting method to determine the final annotation result of each target fundus image. If the final annotation result cannot be determined, the target fundus image is marked as a difficult fundus image. The difficult fundus images are then annotated and arbitrated to obtain the final annotation result. Thus, the final annotation result can be obtained based on the absolute majority voting method.
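Absolute majority voting, as described above, can be sketched in a few lines. The sketch assumes each annotation result is represented by a hashable value such as a string; the function name and the label strings are hypothetical, and returning `None` is used here to signal that the image should be treated as a difficult fundus image.

```python
from collections import Counter


def absolute_majority(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


final = absolute_majority(["diabetic retinopathy", "diabetic retinopathy", "normal"])
undecided = absolute_majority(["diabetic retinopathy", "normal", "glaucoma"])
```

An even split, such as two votes against two, is not an absolute majority, so it also yields `None` and would be sent to arbitration.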

[0013] In addition, in the quality control method according to the first aspect of this disclosure, optionally, the summarization involves comparing the annotation results of each target fundus image across the multiple sets of target annotation results. If the annotation results are consistent, that annotation result is taken as the final annotation result of the target fundus image. If the annotation results are inconsistent, and multiple annotation results include the same judgment result while only one annotation result includes a judgment result not identified in the other annotation results, the target fundus image is marked as a fundus image to be quality controlled; otherwise, the target fundus image is marked as a difficult fundus image. Quality control is performed on the fundus images to be quality controlled to obtain their final annotation results, and the difficult fundus images are annotated and arbitrated to obtain their final annotation results. In this case, by comparing the annotation results of each target fundus image across the multiple sets of target annotation results, the target fundus images can be divided into target fundus images with final annotation results, fundus images to be quality controlled, and difficult fundus images, and the final annotation result can be obtained.

[0014] Furthermore, in the quality control method according to the first aspect of this disclosure, optionally, the fundus image to be quality controlled is subjected to quality control. If it is determined that the unidentified determination result does not exist, then the same determination result is used as the final annotation result. If it is determined that the unidentified determination result exists, then the fundus image to be quality controlled is marked as a difficult fundus image, and the difficult fundus image is annotated and arbitrated to obtain the final annotation result. Thus, it is possible to divide the fundus image to be quality controlled into fundus images with final annotation results and difficult fundus images, and obtain the final annotation result.
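The three-way split described in the two preceding paragraphs can be sketched as follows. This is one possible reading of the rule, not the authoritative one: each annotation result is modeled as a set of judgment results, "the same judgment result" is taken to be the judgments shared by all annotators, and an image goes to quality control only when exactly one annotator reports something beyond that shared part. All names are hypothetical.

```python
def triage(annotation_results):
    """Classify one image as 'final', 'to_be_quality_controlled', or 'difficult'."""
    sets = [frozenset(result) for result in annotation_results]
    if not sets:
        raise ValueError("at least one annotation result is required")
    if len(set(sets)) == 1:
        return "final", sets[0]  # all annotators agree
    common = frozenset.intersection(*sets)
    # Annotators who reported a judgment that is not shared by everyone.
    with_extras = [s for s in sets if s - common]
    if common and len(with_extras) == 1:
        return "to_be_quality_controlled", common
    return "difficult", None


def resolve_quality_control(common, extra_finding_confirmed):
    """Outcome of the quality control review of the one unshared judgment."""
    if extra_finding_confirmed:
        return "difficult", None  # the finding exists: escalate to arbitration
    return "final", common        # the finding does not exist: keep the shared result


status, shared = triage([{"DR"}, {"DR"}, {"DR", "glaucoma"}])
```

In this example two annotators report only diabetic retinopathy and the third adds glaucoma, so the image is queued for quality control with the shared judgment as the provisional result.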

[0015] Furthermore, in the quality control method disclosed in the first aspect, optionally, an arbitrating doctor may annotate and arbitrate the difficult fundus images to obtain the final annotation result. This allows the final annotation result for the difficult fundus images to be obtained.

[0016] Furthermore, in the quality control method disclosed in the first aspect, the standardization process optionally includes at least one of the following: classifying the fundus images by patient, standardizing the naming format of the fundus images, filtering out non-fundus images, standardizing the image format of the fundus images, and standardizing the background of the fundus images. This enables the standardization of fundus images.

[0017] Furthermore, in the quality control method according to the first aspect of this disclosure, optionally, the preliminary screening includes dividing the standardized fundus images into at least two image quality levels: qualified and unqualified, wherein the qualified fundus images are the standardized fundus images with a qualified image quality level. Thus, the quality of standardized fundus images can be preliminarily screened to quickly obtain qualified fundus images.

[0018] Furthermore, in the quality control method disclosed in the first aspect, optionally, the annotation process further categorizes each image in the target fundus image set into five image quality levels: very good, good, average, poor, and extremely poor. In this case, the final annotation result can be determined by combining more detailed image quality levels.

[0019] Furthermore, in the quality control method disclosed in the first aspect, optionally, the disease includes at least one of diabetic retinopathy, hypertensive retinopathy, glaucoma, retinal vein occlusion, retinal artery occlusion, age-related macular degeneration, high myopic macular degeneration, retinal detachment, optic nerve disease, and congenital optic disc developmental abnormalities. Thus, at least one disease can be labeled.

[0020] Additionally, in the quality control method disclosed in the first aspect, optionally, the preset condition is $d_{\mathrm{self}} \le D$ and $d_{\mathrm{gold}} \le D$, where $d_{\mathrm{self}}$ is the self-evaluation index based on the self-consistency, $d_{\mathrm{gold}}$ is the gold standard evaluation index based on the gold standard consistency, and $D$ is the evaluation index threshold. The self-evaluation index $d_{\mathrm{self}}$ satisfies the formula $d_{\mathrm{self}} = |J_{\mathrm{self}} - \kappa_{\mathrm{self}}| / \kappa_{\mathrm{self}} \times 100\%$, where $J_{\mathrm{self}} = SE_{\mathrm{self}} + SP_{\mathrm{self}} - 1$, $SE_{\mathrm{self}}$ is the sensitivity of the first annotating doctor obtained from the two sets of annotation results used to assess the self-consistency, $SP_{\mathrm{self}}$ is the specificity of the first annotating doctor obtained from the same two sets of annotation results, and $\kappa_{\mathrm{self}}$ is the self-consistency of the first annotating doctor. The gold standard evaluation index $d_{\mathrm{gold}}$ satisfies the formula $d_{\mathrm{gold}} = |J_{\mathrm{gold}} - \kappa_{\mathrm{gold}}| / \kappa_{\mathrm{gold}} \times 100\%$, where $J_{\mathrm{gold}} = SE_{\mathrm{gold}} + SP_{\mathrm{gold}} - 1$, $SE_{\mathrm{gold}}$ is the sensitivity of the first annotating doctor obtained from the two sets of annotation results used to assess the gold standard consistency, $SP_{\mathrm{gold}}$ is the specificity of the first annotating doctor obtained from the same two sets of annotation results, and $\kappa_{\mathrm{gold}}$ is the gold standard consistency of the first annotating doctor. Therefore, the preset condition can be determined based on the evaluation index threshold.
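The evaluation index above is a relative difference between J = SE + SP - 1 and the kappa value, expressed as a percentage. A small sketch of the arithmetic follows; the function names and the example numbers (including the threshold D = 10) are hypothetical, and the division by kappa is kept exactly as in the formula, so kappa is assumed to be non-zero.

```python
def evaluation_index(sensitivity, specificity, kappa):
    """d = |J - kappa| / kappa * 100, with J = SE + SP - 1 (result in percent)."""
    if kappa == 0:
        raise ValueError("kappa must be non-zero for the index to be defined")
    j = sensitivity + specificity - 1
    return abs(j - kappa) / kappa * 100


def meets_index_condition(d_self, d_gold, threshold):
    """Preset condition: both indices are at most the threshold D."""
    return d_self <= threshold and d_gold <= threshold


# Hypothetical values: SE = 0.95, SP = 0.90, kappa = 0.80 gives J = 0.85.
d_self = evaluation_index(0.95, 0.90, 0.80)
accepted = meets_index_condition(d_self, d_gold=4.0, threshold=10.0)
```

With these numbers the index is about 6.25%, so a doctor whose gold standard index is also under the assumed threshold of 10% would be accepted.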

[0021] The second aspect of this disclosure provides a quality control system for data annotation of fundus images, comprising: an acquisition module for acquiring multiple fundus images; a standardization processing module for standardizing each of the fundus images to obtain multiple standardized fundus images; a preliminary screening module for performing preliminary screening on the quality of each of the standardized fundus images to obtain multiple qualified fundus images; a data preparation module for preparing a target fundus image set, the target fundus image set including a dataset to be calibrated comprising the multiple qualified fundus images, a gold standard dataset including a first preset number of gold standard fundus images with known correct annotation results, and a self-consistency judgment dataset consisting of at least one image from the dataset to be calibrated, wherein each image in the target fundus image set is a target fundus image; an annotation module for acquiring multiple sets of doctor annotation results from multiple first annotating doctors who annotate each image in the target fundus image set, each doctor annotation result including at least one judgment result, and each judgment result including at least disease information indicating either no obvious abnormality or one disease; an evaluation module for calculating, based on the doctor annotation results, the self-consistency and gold standard consistency of the corresponding first annotating doctor, and taking the doctor annotation results of each first annotating doctor who meets the preset condition as target annotation results, wherein the self-consistency is obtained by evaluating two sets of annotation results, namely the doctor annotation results for each image in the self-consistency judgment dataset and the doctor annotation results for the images in the dataset to be calibrated that coincide with the images in the self-consistency judgment dataset, and the gold standard consistency is obtained by evaluating the correct annotation results in the gold standard dataset as the first set of annotation results and the doctor annotation results for each image in the gold standard dataset as the second set of annotation results; and a summarization module for summarizing multiple sets of target annotation results to obtain the final annotation result. In this case, the doctor annotation results of the first annotating doctors who meet the preset condition can be selected, based on the gold standard dataset and the self-consistency judgment dataset, as the target annotation results and then summarized. This improves the accuracy of data annotation for fundus images.

[0022] According to this disclosure, a quality control method and quality control system for data annotation of fundus images with high accuracy can be provided.

Attached Figure Description

[0023] This disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings, in which:

[0024] Figure 1 is a diagram illustrating a use scenario of the quality control method for data annotation of fundus images involved in the examples of this disclosure.

[0025] Figure 2 is a flowchart illustrating the quality control method for data annotation of fundus images involved in the examples of this disclosure.

[0026] Figure 3 is a block diagram illustrating the target fundus image set involved in the examples of this disclosure.

[0027] Figure 4 is a flowchart illustrating the determination of the self-consistency threshold involved in the examples of this disclosure.

[0028] Figure 5 is a statistical chart illustrating the target self-consistency and the target gold standard consistency involved in the examples of this disclosure.

[0029] Figure 6 is a flowchart illustrating the summarization method involved in the examples of this disclosure.

[0030] Figure 7 is a flowchart illustrating the process of performing quality control on the fundus images to be quality controlled and obtaining the final annotation results, as involved in the examples of this disclosure.

[0031] Figure 8 is a block diagram illustrating the quality control system for data annotation of fundus images involved in the examples of this disclosure.

[0032] Explanation of main labels:

[0033] 100…Use scenario, 110…Human eye, 120…Acquisition device, 130…Doctor annotation results, 140…Target annotation results, 150…Final annotation results, A…First annotating doctor, A1…First annotating doctor, A2…First annotating doctor, A3…First annotating doctor, B…Second annotating doctor, 200…Target fundus image set, 210…Dataset to be calibrated, 220…Gold standard dataset, 230…Self-consistency judgment dataset, D1…First region, D2…Second region, D3…Third region, D4…Fourth region, 300…Quality control system, 310…Acquisition module, 320…Standardization processing module, 330…Preliminary screening module, 340…Data preparation module, 350…Annotation module, 360…Evaluation module, 370…Summarization module.

Detailed Implementation

[0034] The preferred embodiments of this disclosure are described in detail below with reference to the accompanying drawings. In the following description, the same reference numerals are used for the same components, and repeated descriptions are omitted. Furthermore, the drawings are merely schematic diagrams, and the proportions of the components or the shapes of the components may differ from actual figures. It should be noted that the terms "comprising" and "having," and any variations thereof, in this disclosure, do not necessarily limit the process, method, system, product, or apparatus to the explicitly listed steps or units, but may include or have other steps or units not explicitly listed or inherent to these processes, methods, products, or apparatuses. All methods described in this disclosure may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by the context.

[0035] Figure 1 is a diagram illustrating a use scenario of the quality control method for data annotation of fundus images according to examples of this disclosure. In some examples, the quality control method for data annotation of fundus images according to this disclosure (sometimes simply referred to as the quality control method) can be applied to, for example, the use scenario 100 shown in Figure 1. In the use scenario 100, first, multiple first annotating doctors A, such as three first annotating doctors A, can annotate multiple fundus images of multiple human eyes 110 to obtain doctor annotation results 130 (described later). Second, the doctor annotation results 130 that meet the preset condition (described later) can be used as target annotation results 140 (described later). For example, as shown in Figure 1, assuming that the doctor annotation results 130 of the first annotating doctors A2 and A3 meet the requirements of self-consistency and gold standard consistency, the doctor annotation results 130 of the first annotating doctors A2 and A3 can be used as the target annotation results 140. Finally, the target annotation results 140 can be summarized to obtain the final annotation result 150 (described later).

[0036] In some examples, the fundus of human eye 110 refers to the tissues in the posterior part of the eyeball, which may include the inner membrane of the eyeball, the retina, the macula, and blood vessels. In some examples, the fundus image of human eye 110 may be acquired by acquisition device 120. In some examples, acquisition device 120 may be, but is not limited to, a camera. The camera may be, for example, a color fundus camera.

[0037] In some examples, doctor annotation results 130 that do not meet the preset condition can be re-annotated by a second annotating doctor B to finally obtain the target annotation result 140. However, the examples disclosed herein are not limited to this; in other examples, doctor annotation results 130 that do not meet the preset condition can be filtered out. In some examples, there can be one or more second annotating doctors B. In some examples, if the doctor annotation result 130 produced by a second annotating doctor B still does not meet the preset condition, another second annotating doctor B can continue the re-annotation until a doctor annotation result 130 that meets the preset condition is obtained as the target annotation result 140. In some examples, the second annotating doctor B may be different from the first annotating doctor A. In some examples, the second annotating doctor B and the first annotating doctor A may be, but are not limited to, professional ophthalmologists or experienced physicians.

[0038] The quality control method involved in this disclosure is described in detail below with reference to the accompanying drawings. Figure 2 is a flowchart illustrating the quality control method for data annotation of fundus images involved in the examples of this disclosure. In some examples, as shown in Figure 2, the quality control method may include: acquiring multiple fundus images (step S110); standardizing each fundus image to obtain multiple standardized fundus images (step S120); performing preliminary screening on the quality of each standardized fundus image to obtain multiple qualified fundus images (step S130); preparing a target fundus image set comprising a dataset to be calibrated, a gold standard dataset, and a self-consistency judgment dataset (see Figure 3) (step S140); having multiple first annotating doctors annotate each image in the target fundus image set to obtain multiple sets of doctor annotation results (step S150); obtaining multiple sets of target annotation results from the doctor annotation results that meet the preset condition (step S160); and summarizing the multiple sets of target annotation results to obtain the final annotation result (step S170). In this case, the doctor annotation results of the first annotating doctors who meet the preset condition can be selected, based on the gold standard dataset and the self-consistency judgment dataset, as the target annotation results and then summarized. This improves the accuracy of fundus image data annotation.

[0039] In some examples, multiple fundus images can be acquired in step S110. In some examples, the fundus images can be color fundus images. Color fundus images can clearly present rich fundus information such as the optic disc, optic cup, macula, and blood vessels. Additionally, the fundus images can be images in RGB or grayscale mode, etc. In some examples, the fundus images can be fundus images acquired by the acquisition device 120. In other examples, the fundus images can be images pre-stored on a server. In some examples, the multiple fundus images can be, for example, 50,000 to 200,000 fundus images from cooperating hospitals after removing patient information.

[0040] In some examples, in step S120, the fundus images may be standardized to obtain multiple standardized fundus images. In some examples, the standardization process may include at least one of the following: grouping the fundus images by patient, standardizing the naming format of the fundus images, filtering out non-fundus images, standardizing the image format of the fundus images, and standardizing the background of the fundus images. In some examples, non-fundus images may include, but are not limited to, fundus mosaic images or anterior segment images. In some examples, non-fundus images may be images other than 45-degree fundus images centered on the optic disc and macula. In some examples, the naming format may be standardized by removing patient information from the fundus image names and unifying the names; for example, the names of the fundus images may be converted to hash values. In some examples, the image format of the fundus images may be unified (e.g., to JPG format). In some examples, the background of the fundus images may be unified (e.g., the fundus images may all be converted to a black background).
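The name standardization step can be illustrated with a short sketch that replaces an original file name with a hash-derived name. The disclosure only says that names may be converted to hash values; the choice of SHA-256, the optional salt, the 16-character truncation, and the function name are all assumptions made for this example. Converting the image data itself to JPG is not shown.

```python
import hashlib


def anonymized_name(original_name, salt="", extension=".jpg"):
    """Derive a file name from a hash so that it no longer exposes patient information.

    Assumptions: SHA-256, an optional project-specific salt, and a
    16-character prefix of the hex digest. None of these come from the source.
    """
    digest = hashlib.sha256((salt + original_name).encode("utf-8")).hexdigest()
    return digest[:16] + extension


# Hypothetical original file name containing patient details.
new_name = anonymized_name("ZhangSan_1970-01-01_left_eye.png", salt="project-secret")
```

The mapping is deterministic, so the same original name always yields the same standardized name, while a salt makes it harder to recover names by hashing guesses.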

[0041] In some examples, in step S130, the quality of each standardized fundus image can be initially screened to obtain multiple qualified fundus images.

[0042] In some examples, preliminary screening may include classifying standardized fundus images into at least two quality levels: acceptable and unacceptable. This allows for preliminary screening of the quality of standardized fundus images to quickly obtain acceptable fundus images.

[0043] In some examples, the quality of standardized fundus images can be assessed by multiple first-annotation physicians (A) to classify them into various image quality levels. In some examples, standardized fundus images can be graded based on factors affecting their quality. In some examples, factors affecting fundus image quality may include, but are not limited to, at least one of the following: the image capture location, the exposure of the fundus image, and the sharpness of the fundus image. For example, a standardized fundus image with a satisfactory quality level might be an image captured at the correct location, with moderate exposure, and good sharpness. In this case, the quality of the standardized fundus images is graded. This facilitates the acquisition of satisfactory fundus images.

[0044] However, the examples disclosed herein are not limited to this. In other examples, standardized fundus images can be further subdivided during the initial screening. For example, standardized fundus images can be divided into at least five image quality levels. In some examples, the five image quality levels may include very good, good, fair, poor, and very poor. In some examples, the image quality level may also include abnormalities in the imaged area (e.g., non-fundus images), no image, image acquisition technology problems, and other issues that prevent interpretation. In some examples, the image quality level may be acceptable, barely acceptable, and unacceptable.

[0045] In some examples, multiple qualified fundus images can be obtained in step S130. In some examples, the qualified fundus images can be the standardized fundus images whose image quality level is qualified. However, the examples of this disclosure are not limited to this. In other examples, where five image quality levels are used, the qualified fundus images can be the standardized fundus images graded very good, good, fair, or poor; those graded very good, good, or fair; or those graded very good or good. In other examples, the qualified fundus images can be the standardized fundus images graded acceptable or barely acceptable. Thus, qualified fundus images can be obtained.
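Selecting qualified images from graded ones amounts to choosing a cutoff on an ordered quality scale. The sketch below assumes the five-level scale mentioned in the text and a mapping from image identifier to assigned level; the function name, the default cutoff, and the sample identifiers are hypothetical.

```python
# Ordered from best to worst, following the five levels named in the text.
QUALITY_ORDER = ["very good", "good", "fair", "poor", "very poor"]


def select_qualified(graded_images, worst_accepted="fair"):
    """Keep images whose quality level is at least as good as `worst_accepted`."""
    cutoff = QUALITY_ORDER.index(worst_accepted)
    return [
        image_id
        for image_id, level in graded_images.items()
        if QUALITY_ORDER.index(level) <= cutoff
    ]


qualified = select_qualified(
    {"img_a": "very good", "img_b": "poor", "img_c": "fair"},
    worst_accepted="fair",
)
```

Moving `worst_accepted` to "good" or "poor" reproduces the stricter and looser variants listed in the paragraph above.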

[0046] Figure 3 is a block diagram illustrating the target fundus atlas involved in the example of this disclosure. As described above, the quality control method may include step S140 (see Figure 2). In some examples, in step S140, a target fundus image atlas 200 may be prepared. For example, as shown in Figure 3, the target fundus atlas 200 may include a dataset to be calibrated 210, a gold standard dataset 220, and a self-consistency judgment dataset 230.

[0047] In some examples, the dataset 210 to be calibrated may include multiple qualified fundus images. In some examples, the dataset 210 to be calibrated may include all qualified fundus images obtained in step S130. In some examples, the dataset 210 to be calibrated may include a subset of qualified fundus images obtained in step S130. In some examples, all qualified fundus images obtained in step S130 may be grouped and each group of fundus images may be considered as a dataset 210 to be calibrated. For example, the qualified fundus images obtained in step S130 may be grouped into sets of 80, 90, or 100 images.

[0048] Additionally, in some examples, the gold standard dataset 220 may include a first preset number of gold standard fundus images. The gold standard fundus images may be fundus images with known correct annotations. In some examples, the gold standard fundus images may be fundus images with known correct annotations from an annotation database. In some examples, the first preset number may be 5 to 20. For example, the first preset number may be 5, 10, 15, or 20, etc. However, the examples in this disclosure are not limited to these, and in other examples, the first preset number may be other values.

[0049] Additionally, in some examples, the self-consistency determination dataset 230 may consist of images from the dataset 210 to be calibrated. In some examples, the number of images in the self-consistency determination dataset 230 may be at least one. In some examples, the number of images in the self-consistency determination dataset 230 may be 5 to 20. For example, the number of images in the self-consistency determination dataset 230 may be 5, 10, 15, or 20, etc. However, the examples of this disclosure are not limited to this, and in other examples, the number of images in the self-consistency determination dataset 230 may be other values. In some examples, the number of images in the self-consistency determination dataset 230 may be less than the number of images in the dataset 210 to be calibrated. Thus, the images in the self-consistency determination dataset 230 may overlap with some images in the dataset 210 to be calibrated. Additionally, in some examples, each image of the target fundus atlas 200 may serve as a target fundus image.
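The preparation of the target fundus atlas described above can be sketched in Python as follows. This is only an illustrative sketch: the names `TargetAtlas` and `prepare_atlases`, the random sampling, and the default sizes (groups of 100, 10 gold standard images, 10 repeated images) are assumptions chosen from the ranges mentioned in the text, not a prescribed implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class TargetAtlas:
    """Hypothetical container for one target fundus atlas 200."""
    to_calibrate: list   # dataset to be calibrated 210
    gold_standard: list  # gold standard dataset 220
    self_check: list     # self-consistency judgment dataset 230


def prepare_atlases(qualified, gold_pool, group_size=100,
                    n_gold=10, n_self=10, seed=0):
    """Group qualified images and attach gold standard and repeated images."""
    rng = random.Random(seed)
    atlases = []
    for start in range(0, len(qualified), group_size):
        group = qualified[start:start + group_size]
        atlases.append(TargetAtlas(
            to_calibrate=group,
            # images with known correct annotations
            gold_standard=rng.sample(gold_pool, min(n_gold, len(gold_pool))),
            # repeats drawn from the group itself, so they overlap with 210
            self_check=rng.sample(group, min(n_self, len(group))),
        ))
    return atlases
```

Drawing the repeated images from the same group is what makes it possible to compare two annotations of the same image by the same doctor later on.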

[0050] In some examples, in step S150, multiple first annotation doctors A can annotate each image in the target fundus image set 200 to obtain multiple sets of doctor annotation results 130. For example, assuming three first annotation doctors A annotate the target fundus image set 200 respectively, the three first annotation doctors A can obtain three sets of doctor annotation results 130. In some examples, multiple first annotation doctors A can use an online annotation system to annotate each image in the target fundus image set 200. In some examples, the number of first annotation doctors A can be greater than or equal to three. For example, the number of first annotation doctors A can be 3, 5, 7, or 9, etc.

[0051] In some examples, the physician annotation result 130 for each image in the target fundus image atlas 200 may include at least one determination result. In some examples, the determination result may include disease information such as no obvious abnormalities or a disease. In some examples, if an image in the target fundus image atlas 200 does not contain any disease, the physician annotation result 130 for that image may be "no obvious abnormalities." In some examples, the physician annotation result 130 may be a determination result for multiple diseases. For example, the physician annotation result 130 may be diabetic retinopathy stage I and the presence of glaucoma.

[0052] In some examples, the doctor's annotation result 130 may include the image quality level and eye (e.g., left or right eye) of each image in the target fundus image set 200. In some examples, if the quality of each standardized fundus image was not further refined during the initial screening, the first annotator A may refine the quality of each image in the target fundus image set 200 during the annotation process. In other examples, if the quality of each standardized fundus image was further refined during the initial screening, the first annotator A may further refine the quality of each image in the target fundus image set 200 during the annotation process. See the description of refining the standardized fundus images for details. In this case, the final annotation result 150 can be determined by combining more detailed image quality levels. In some examples, the image quality level in the doctor's annotation result 130 may include the image quality level obtained from the initial screening and the image quality level obtained during the annotation process.

[0053] In some examples, the disease may include at least one of diabetic retinopathy, hypertensive retinopathy, glaucoma, retinal vein occlusion, retinal artery occlusion, age-related macular degeneration, high myopic macular degeneration, retinal detachment, optic nerve disease, and congenital optic disc developmental abnormalities. Thus, at least one disease can be labeled. However, the examples of this disclosure are not limited to this, and the quality control method of this disclosure can be readily extended to the quality control of data labeling for other diseases or data labeling in other fields. In some examples, disease information may be based on disease severity staging. For example, diabetic retinopathy can be staged as stage I, II, III, IV, V, and VI. In other examples, disease information may simply be the presence of a disease, such as the presence of glaucoma.

[0054] As described above, the quality control method may include step S160 (see Figure 2). In some examples, in step S160, multiple sets of target annotation results 140 can be obtained based on multiple sets of doctor annotation results 130 that meet preset conditions. In some examples, in step S160, the self-consistency and gold standard consistency of the corresponding first annotating doctor A can be calculated based on the doctor annotation results 130.

[0055] As described above, each image in the target fundus image set 200 can be used as a target fundus image. In some examples, self-consistency can be obtained by judging whether the doctor annotation results 130 obtained by each first annotator A annotating the same target fundus image twice are consistent. In some examples, higher self-consistency indicates that the annotation level of the first annotator A is more stable. Specifically, in some examples, when calculating self-consistency, two sets of annotation results can be obtained: the doctor annotation results 130 of each image in the self-consistency judgment dataset 230, and the doctor annotation results 130 of the images in the dataset 210 to be calibrated that are duplicates of the images in the self-consistency judgment dataset 230. In some examples, self-consistency can be obtained by taking either of the two sets of annotation results as the first set of annotation results and the other as the second set of annotation results, and then evaluating them using the self-consistency assessment method.

[0056] In some examples, the self-consistency assessment method can be to calculate the disease self-consistency of each first-labeling doctor A's judgment on each disease using a quadratic weighted kappa coefficient. In some examples, the quadratic weighted kappa coefficient κ for a single disease can be expressed as:

[0057] $$\kappa = 1 - \frac{\sum_{i,j} W_{ij} X_{ij}}{\sum_{i,j} W_{ij} E_{ij}}$$

[0058] Here, $W_{ij}$ can represent the quadratic weighting coefficient, $X_{ij}$ can represent the number of target fundus images where the judgment result in the first set of annotation results is i and the judgment result in the second set of annotation results is j, and $E_{ij}$ can represent the expected number of target fundus images where the judgment result is i in the first set of annotation results and j in the second set of annotation results. In some examples, when i is not equal to j, $E_{ij}$ can be zero. In some examples, the quadratic weighting coefficient $W_{ij}$ can be set as needed to highlight the importance of a particular judgment result. Therefore, it is possible to perform a consistency check on the first set of annotation results and the second set of annotation results.
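A quadratic weighted kappa of this kind can be sketched in Python as follows. The sketch makes two assumptions that the text does not fix: the weights take the common form W_ij = (i − j)² / (N − 1)² for N ordered judgment levels, and the expected counts E_ij are the products of the row and column totals divided by the number of images. The function name and its arguments are hypothetical.

```python
def quadratic_weighted_kappa(first, second, n_levels):
    """Quadratic weighted kappa between two sets of integer judgments.

    first, second: sequences of judgment levels in range(n_levels),
    one entry per target fundus image, in the same image order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("both sets must be non-empty and of equal length")
    if n_levels < 2:
        raise ValueError("at least two judgment levels are required")
    n = len(first)
    # X[i][j]: images judged i in the first set and j in the second set
    X = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(first, second):
        X[a][b] += 1
    row = [sum(X[i]) for i in range(n_levels)]
    col = [sum(X[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # assumed weight form
            e = row[i] * col[j] / n                 # assumed expected count
            num += w * X[i][j]
            den += w * e
    if den == 0:
        # every image received the same single level in both sets
        return 1.0
    return 1.0 - num / den
```

For example, `quadratic_weighted_kappa([0, 1, 2, 1, 0], [0, 1, 2, 1, 0], 3)` describes two identical annotation passes over five images with three severity levels.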

[0059] In some examples, the self-consistency assessment method can weight the self-consistency of each disease to calculate the self-consistency of each first-labeling physician A. For example, the weight of the self-consistency of diabetic retinopathy can be set to 1, and the weight of the self-consistency of other diseases can be set to 0.5. Thus, the self-consistency of each first-labeling physician A can be calculated based on the self-consistency assessment method. However, the examples disclosed herein are not limited to this; in other examples, self-consistency can be calculated in other ways.
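The weighting of per-disease consistency can be illustrated with a short Python sketch. The text gives the example weights (1 for diabetic retinopathy, 0.5 for other diseases) but does not state how the weighted values are combined; the normalized weighted mean used here is an assumption, and the function name is hypothetical.

```python
def doctor_consistency(per_disease_kappa, weights, default_weight=0.5):
    """Combine per-disease kappa values into one value per doctor.

    per_disease_kappa: dict mapping disease name to its kappa value.
    weights: dict of explicit weights; other diseases use default_weight.
    Assumption: the combination is a normalized weighted mean.
    """
    if not per_disease_kappa:
        raise ValueError("at least one disease is required")
    total = sum(weights.get(d, default_weight) for d in per_disease_kappa)
    weighted = sum(weights.get(d, default_weight) * k
                   for d, k in per_disease_kappa.items())
    return weighted / total
```

The same helper could be applied to per-disease gold standard consistency values.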

[0060] As described above, in step S160, the gold standard consistency of the corresponding first annotating doctor A can be calculated based on the doctor annotation results 130. Specifically, in some examples, when calculating gold standard consistency, the correct annotation results of the gold standard dataset 220 can be used as the first set of annotation results, that is, the annotation results of the gold standard fundus images can be used as the first set of annotation results, and the doctor annotation results 130 of each image in the gold standard dataset 220 can be used as the second set of annotation results. In some examples, gold standard consistency can be obtained by evaluating the first set of annotation results and the second set of annotation results using a gold standard consistency judgment and evaluation method.

[0061] In some examples, the gold standard consistency assessment method can be to calculate the disease gold standard consistency of each first-label physician A's diagnosis of each disease using a quadratic weighted kappa coefficient. In some examples, the gold standard consistency of each disease can be weighted to calculate the gold standard consistency of each first-label physician A. Thus, the gold standard consistency of each first-label physician A can be calculated based on the gold standard consistency assessment method. For details, please refer to the detailed description of the self-consistency assessment method. However, the examples in this disclosure are not limited to these; in other examples, gold standard consistency can be calculated using other methods.

[0062] In some examples, in step S160, the doctor annotation result 130 of the first annotating doctor A that meets the preset conditions can be obtained, and this doctor annotation result 130 can be used as the target annotation result 140. In some examples, the preset conditions can be $d_{\text{self}} \le D$ and $d_{\text{gold}} \le D$, where $d_{\text{self}}$ is a self-evaluation index based on self-consistency, $d_{\text{gold}}$ is a gold standard evaluation index based on gold standard consistency, and $D$ is the evaluation index threshold. In some examples, $D \le 5\%$. Therefore, preset conditions can be determined based on the evaluation index threshold.

[0063] In some examples, the self-evaluation index $d_{\text{self}}$ can satisfy the formula $d_{\text{self}} = |J_{\text{self}} - \kappa_{\text{self}}| / \kappa_{\text{self}} \times 100\%$, where $J_{\text{self}} = SE_{\text{self}} + SP_{\text{self}} - 1$, $SE_{\text{self}}$ is the sensitivity of the first annotating doctor A obtained from the two sets of annotation results used to assess self-consistency, $SP_{\text{self}}$ is the specificity of the first annotating doctor A obtained from the same two sets of annotation results, and $\kappa_{\text{self}}$ is the self-consistency of the first annotating doctor A. In some examples, either of the two sets of annotation results used to assess self-consistency can be used as the gold standard to evaluate the other set to obtain the sensitivity and specificity of the first annotating doctor A.

[0064] In some examples, the gold standard evaluation index $d_{\text{gold}}$ can satisfy the formula $d_{\text{gold}} = |J_{\text{gold}} - \kappa_{\text{gold}}| / \kappa_{\text{gold}} \times 100\%$, where $J_{\text{gold}} = SE_{\text{gold}} + SP_{\text{gold}} - 1$, $SE_{\text{gold}}$ is the sensitivity of the first annotating doctor A obtained from the two sets of annotation results used to assess gold standard consistency, $SP_{\text{gold}}$ is the specificity of the first annotating doctor A obtained from the same two sets of annotation results, and $\kappa_{\text{gold}}$ is the gold standard consistency of the first annotating doctor A. In some examples, the first set of annotation results used to assess gold standard consistency can be used as the gold standard to evaluate the second set of annotation results to obtain the sensitivity and specificity of the first annotating doctor A.
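The evaluation indices and the preset condition can be sketched in Python as follows. The sketch assumes a binary judgment per image (disease present or absent) so that sensitivity and specificity are well defined; the function names are hypothetical, and the kappa value is assumed to be computed separately.

```python
def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity of `predicted` against `reference`.

    Both arguments are sequences of 0/1 judgments in the same image order.
    """
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


def evaluation_index(reference, predicted, kappa):
    """d = |J - kappa| / kappa * 100, with J = SE + SP - 1 (in percent)."""
    if kappa == 0:
        raise ValueError("kappa must be non-zero")
    se, sp = sensitivity_specificity(reference, predicted)
    j = se + sp - 1
    return abs(j - kappa) / kappa * 100


def meets_preset_condition(d_self, d_gold, threshold=5.0):
    """Preset condition: both indices are at most the threshold D (percent)."""
    return d_self <= threshold and d_gold <= threshold
```

For the self-evaluation index, `reference` would be whichever of the two repeated annotation passes is treated as the gold standard; for the gold standard index, it would be the known correct annotations.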

[0065] Figure 4 is a flowchart illustrating the determination of the self-consistency threshold involved in the examples of this disclosure. Figure 5 is a statistical chart illustrating the target self-consistency and target gold standard consistency involved in the examples of this disclosure. Regions D1, D2, D3, and D4 are four regions in the statistical chart. In some examples, the preset conditions can be that self-consistency is greater than a self-consistency threshold and gold standard consistency is greater than a gold standard consistency threshold. In some examples, the target self-consistency and target gold standard consistency of annotating physicians of different qualification thresholds (for example, different years of experience) can be analyzed, and anomaly detection methods can be used to determine the self-consistency threshold and gold standard consistency threshold. Thus, the self-consistency threshold and gold standard consistency threshold can be determined.

[0066] In some examples, as shown in Figure 4, the process for determining the self-consistency threshold based on anomaly detection may include obtaining the target self-consistency of annotating physicians of different qualification thresholds (step S161), calculating the self-consistency mean μ0 and self-consistency variance σ0 (step S162), and calculating the self-consistency threshold based on the self-consistency mean μ0 and self-consistency variance σ0 (step S163). Thus, the self-consistency threshold can be determined.

[0067] In some examples, in step S161, the target self-consistency of annotating physicians of different qualification thresholds can be obtained and analyzed. For example, the target self-consistency of qualified physicians with 1 to 4 years of experience, physicians with 5 to 9 years of experience, and physicians with more than 10 years of experience can be analyzed. In some examples, target self-consistency and target gold standard consistency can be obtained simultaneously (described later). As an example of the statistical results, Figure 5 shows the target self-consistency and target gold standard consistency of annotating physicians with different years of experience. Circles represent target self-consistency and target gold standard consistency for physicians with 1 to 4 years of experience. Squares represent target self-consistency and target gold standard consistency for physicians with 5 to 9 years of experience. Triangles represent target self-consistency and target gold standard consistency for physicians with more than 10 years of experience. Regions D1, D2, D3, and D4 are the four regions in the statistical chart. As can be seen from Figure 5, the statistical results for physicians with 1 to 4 years of experience fall into the first region D1 and the fourth region D4, while the statistical results for physicians with 5 or more years of experience mainly fall into the second region D2.

[0068] In some examples, in step S162, the self-consistency mean μ0 and the self-consistency variance σ0 of the target self-consistency can be calculated.

[0069] In some examples, in step S163, the self-consistency threshold can be calculated based on the self-consistency mean μ0 and the self-consistency variance σ0. Specifically, in some examples, under the assumption that the target self-consistency follows a Gaussian distribution, the self-consistency threshold can be μ0 - 1.96 × σ0. In this case, the probability of an anomaly occurring is less than 2.5%. In some examples, the self-consistency threshold can be 0.7977.
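The threshold calculation of steps S162 and S163 can be sketched in Python as follows. The sketch assumes that σ0 is used as a standard deviation (the quantity that the 1.96 multiplier conventionally scales under a Gaussian assumption) and that it is computed as a population standard deviation; neither choice is stated in the text, and the function name is hypothetical.

```python
import statistics


def consistency_threshold(target_values, z=1.96):
    """Return mean - z * standard deviation of the observed consistencies.

    target_values: target self-consistency (or target gold standard
    consistency) values collected from the analyzed physicians.
    Assumption: sigma is the population standard deviation.
    """
    if len(target_values) < 2:
        raise ValueError("at least two values are needed")
    mu = statistics.fmean(target_values)
    sigma = statistics.pstdev(target_values)
    return mu - z * sigma


def is_below_threshold(value, threshold):
    """A consistency value below the threshold is treated as anomalous."""
    return value < threshold
```

The same helper applies unchanged to gold standard consistency values, yielding μ1 − 1.96 × σ1.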

[0070] In some examples, the process of determining the gold standard consistency threshold based on anomaly detection may include obtaining the target gold standard consistency of annotating physicians of different qualification thresholds, calculating the mean μ1 and variance σ1 of the target gold standard consistency, and calculating the gold standard consistency threshold based on the mean μ1 and variance σ1. This allows the determination of the gold standard consistency threshold. In some examples, under the assumption that the target gold standard consistency follows a Gaussian distribution, the gold standard consistency threshold may be μ1 - 1.96 × σ1. In this case, the probability of anomalies is less than 2.5%. In some examples, the gold standard consistency threshold may be 0.6235. For a detailed description of the process for determining the gold standard consistency threshold, refer to the process for determining the self-consistency threshold, which will not be repeated here.

[0071] However, the examples disclosed herein are not limited to these. In other examples, other anomaly detection methods may be used to determine the self-consistency threshold and the gold standard consistency threshold.

[0072] In some examples, in step S160, when the doctor annotation result 130 of a first annotating doctor A does not meet the preset conditions, a second annotating doctor B can re-annotate each image in the target fundus atlas 200. In some examples, re-annotation can be repeated until a doctor annotation result 130 that meets the preset conditions is obtained and used as the target annotation result 140. Thus, the target annotation result 140 can be obtained. In some examples, the second annotating doctor B may be different from the first annotating doctor A in step S150.
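The re-annotation loop can be sketched in Python as follows. The callables `annotate` and `meets_preset_condition` are hypothetical stand-ins for the annotation step and for the consistency check of step S160; the sketch only illustrates the control flow of trying successive annotators until one passes.

```python
def obtain_target_result(annotators, annotate, meets_preset_condition):
    """Try annotators in order until one result meets the preset conditions.

    annotators: ordered candidates (first annotator, then replacements).
    annotate: callable returning an annotation result for an annotator.
    meets_preset_condition: callable returning True if a result qualifies.
    Returns (annotator, result), or None if no candidate qualifies.
    """
    for annotator in annotators:
        result = annotate(annotator)
        if meets_preset_condition(result):
            return annotator, result
    return None
```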

[0073] Figure 6 is a flowchart illustrating the summarization method involved in the example of this disclosure. As described above, the quality control method may include step S170 (see Figure 2). In some examples, in step S170, multiple sets of target annotation results 140 can be summarized to obtain the final annotation result 150.

[0074] In some examples, a majority voting method can be used to compare the annotation results of each target fundus image in multiple sets of target annotation results 140 to determine the final annotation result 150 for each target fundus image. Specifically, when comparing the annotation results using the majority voting method, a judgment result can be included in the final annotation result 150 only if it is consistent across more than half of the annotation results (i.e., a majority of valid votes is required for acceptance). In some examples, if the final annotation result 150 cannot be determined (i.e., the number of valid votes is less than a majority), the target fundus image is marked as a difficult fundus image. In some examples, difficult fundus images can be annotated and arbitrated to obtain the final annotation result 150. Thus, the final annotation result 150 can be obtained based on the majority voting method.
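The majority voting step can be sketched in Python as follows. The sketch assumes each annotation result is a set of judgment strings and that votes are counted per judgment; the text does not fix this granularity, so it is one possible reading. The function name is hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Keep each judgment that appears in more than half of the annotations.

    annotations: list of sets of judgments, one set per target annotation
    result for the same target fundus image.
    Returns the set of accepted judgments, or None when no judgment reaches
    a majority (the image is then treated as a difficult fundus image).
    """
    counts = Counter(j for ann in annotations for j in set(ann))
    needed = len(annotations) / 2
    final = {j for j, c in counts.items() if c > needed}
    return final or None
```

With three annotators, a judgment therefore needs at least two votes to be accepted.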

[0075] In some examples, an arbitrating physician may annotate difficult fundus images to obtain an arbitration annotation result. In some examples, the arbitration annotation result may include at least one judgment result. In some examples, the arbitration annotation result may be used as the final annotation result 150.

[0076] However, the examples in this disclosure are not limited to these. In other examples, as shown in Figure 6, the summarization process of step S170 may include steps S171 to S179. In this case, by comparing the annotation results of each target fundus image in multiple sets of target annotation results 140, the target fundus images can be divided into target fundus images with final annotation results 150, fundus images to be quality controlled, and difficult fundus images, and the final annotation results 150 can be obtained.

[0077] In some examples, each target fundus image can be acquired in step S171. Specifically, in some examples, each target fundus image in the target fundus image atlas 200 can be sequentially traversed and compared in step S172.

[0078] In some examples, in step S172, the annotation results of each target fundus image obtained in step S171 in multiple sets of target annotation results 140 can be compared. For example, assuming there are three sets of target annotation results 140, then each target fundus image can have three annotation results from each set of target annotation results 140.

[0079] In some examples, in step S173, it can be determined whether the various annotation results are consistent. For example, the consistency of the three annotation results in step S172 above can be compared.

[0080] In some examples, if all annotation results are consistent, the process can proceed to step S174. In some examples, in step S174, the annotation result can be used as the final annotation result 150 of the target fundus image. In some examples, the annotation results can be considered consistent if the judgment results included in each annotation result are completely identical. For example, if all annotation results show no obvious abnormalities, the annotation results can be considered consistent. Another example is if all annotation results indicate stage I diabetic retinopathy and the presence of glaucoma, then the annotation results can be considered consistent.

[0081] In some examples, if the annotation results are inconsistent, the process can proceed to step S175. In some examples, in step S175, it can be determined whether all annotation results include the same judgment result and only one annotation result includes a judgment result that was not identified in the other annotation results. If so, the process can proceed to step S176; otherwise, the process can proceed to step S178.

[0082] For example, suppose the target fundus image has multiple annotation results, namely a first annotation result, a second annotation result, and a third annotation result, where the first annotation result is diabetic retinopathy stage I, the second annotation result is diabetic retinopathy stage I, and the third annotation result is diabetic retinopathy stage I and the presence of glaucoma. In this case, diabetic retinopathy stage I is the same judgment result included in all the annotation results. The presence of glaucoma is an unidentified judgment result, and only one annotation result includes the presence of glaucoma. However, this disclosure is not limited to this example. In other examples, other judgment conditions can be used for judgment. For example, the judgment condition for step S175 can be that all the annotation results include the same judgment result, and at least one annotation result includes a judgment result that is not identified in other annotation results.

[0083] In some examples, in step S176, the target fundus image can be marked as a fundus image to be quality controlled.

[0084] Figure 7 is a flowchart illustrating the process of performing quality control on a fundus image to be quality controlled and obtaining the final annotation result, as involved in the examples of this disclosure. In some examples, in step S177, the fundus image to be quality controlled can be quality controlled and the final annotation result 150 can be obtained. For example, as shown in Figure 7, the process of performing quality control on the fundus image to be quality controlled and obtaining the final annotation result may include steps S1771 to S1775.

[0085] In some examples, quality control can be performed on the fundus images to be quality controlled in step S1771. In some examples, quality control can be performed by a quality control physician to obtain quality control judgment results. In some examples, the unidentified judgment results in the fundus images to be quality controlled (e.g., the presence of glaucoma included in only one annotation result, as described in step S175) can be evaluated to obtain quality control judgment results (e.g., that the unidentified judgment result is present or that it is absent).

[0086] In some examples, in step S1772, it can be determined whether there is an unidentified judgment result based on the quality control judgment result of step S1771. If there is no unidentified result, the process can proceed to step S1773; otherwise, the process can proceed to step S1774.

[0087] In some examples, in step S1773, the same determination result can be used as the final annotation result 150. For example, suppose the multiple annotation results of the target fundus image are a first annotation result, a second annotation result, and a third annotation result, where the first annotation result is diabetic retinopathy stage I, the second annotation result is diabetic retinopathy stage I, and the third annotation result is diabetic retinopathy stage I and the presence of glaucoma. In this case, diabetic retinopathy stage I is the same determination result included in all the annotation results, and can be used as the final annotation result 150 of the target fundus image.

[0088] In some examples, in step S1774, the fundus images to be quality controlled can be marked as difficult fundus images.

[0089] In some examples, in step S1775, the difficult fundus image may be annotated and arbitrated to obtain a final annotation result 150. In some examples, the difficult fundus image may be annotated by an arbitrating physician to obtain an arbitration annotation result. In some examples, the arbitration annotation result may include at least one judgment result. In some examples, the arbitration annotation result may be used as the final annotation result 150. In some examples, the final annotation result 150 may be obtained based on multiple target annotation results 140 of the difficult fundus image, the quality control judgment result, and the arbitration annotation result.

[0090] As described above, the process of summarizing in step S170 may include step S178. In some examples, in step S178, the target fundus image may be marked as a difficult fundus image.

[0091] In some examples, in step S179, the difficult fundus image can be annotated and arbitrated to obtain a final annotation result 150. In some examples, the final annotation result 150 can be obtained based on multiple target annotation results 140 and the arbitration annotation result of the difficult fundus image. Thus, the final annotation result 150 of the difficult fundus image can be obtained. For details, please refer to the relevant description in step S1775.
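The branching of steps S171 to S179 can be sketched in Python as follows. This is an illustrative reading, not the definitive flow: it assumes each annotation result is a set of judgments, that an image failing the check of step S175 goes to arbitration (steps S178 and S179), and that quality control and arbitration are supplied as the hypothetical callables `qc_confirms_extra` and `arbitrate`.

```python
def summarize_image(annotations, qc_confirms_extra, arbitrate):
    """Return (final_judgments, route) for one target fundus image.

    annotations: non-empty list of sets of judgments.
    qc_confirms_extra(common, extra): True if the quality control physician
        finds that the judgment seen by only one annotator is present.
    arbitrate(annotations): judgments given by the arbitrating physician.
    """
    sets = [frozenset(a) for a in annotations]
    # S173/S174: all annotation results are identical
    if all(s == sets[0] for s in sets):
        return set(sets[0]), "agreed"
    common = frozenset.intersection(*sets)
    extras = [s - common for s in sets if s - common]
    # S175/S176: shared judgments plus extras from exactly one annotator
    if common and len(extras) == 1:
        # S1771/S1772: quality control of the extra judgment
        if not qc_confirms_extra(common, extras[0]):
            return set(common), "quality_controlled"  # S1773
        return set(arbitrate(annotations)), "arbitrated"  # S1774/S1775
    # S178/S179: difficult fundus image, resolved by arbitration
    return set(arbitrate(annotations)), "arbitrated"
```

The `route` string is only a convenience for reporting which path an image took.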

[0092] In some examples, statistical results can be obtained by analyzing the physician annotation results 130 of the target fundus atlas 200. In some examples, the statistical results may include the gold standard consistency re-annotation percentage. In some examples, the gold standard consistency re-annotation percentage can be the percentage of target fundus images in the target fundus atlas 200 that were re-annotated due to gold standard consistency failure. In some examples, the statistical results may include the self-consistency re-annotation percentage. In some examples, the self-consistency re-annotation percentage can be the percentage of target fundus images in the target fundus atlas 200 that were re-annotated due to self-consistency failure.

[0093] In some examples, quality control of the annotation process can be performed based on statistical results. For instance, if the percentage of re-annotations based on gold standard consistency exceeds a preset value, subsequent annotation tasks assigned to relevant annotators can be reduced or canceled.
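The re-annotation statistics and the resulting quality control decision can be sketched in Python as follows. The per-image reason labels (`"gold"`, `"self"`, or `None`) and the function names are hypothetical conventions introduced for illustration.

```python
def reannotation_percentages(reasons):
    """Percentages of images re-annotated for each failure reason.

    reasons: one entry per target fundus image: "gold" if re-annotated
    due to gold standard consistency failure, "self" if re-annotated due
    to self-consistency failure, or None if not re-annotated.
    """
    total = len(reasons)
    if total == 0:
        raise ValueError("no target fundus images")
    gold = sum(1 for r in reasons if r == "gold")
    self_fail = sum(1 for r in reasons if r == "self")
    return {"gold_pct": 100 * gold / total,
            "self_pct": 100 * self_fail / total}


def should_reduce_tasks(stats, preset_pct):
    """True if the gold standard re-annotation percentage exceeds the preset value."""
    return stats["gold_pct"] > preset_pct
```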

[0094] In some examples, a labeling report can be exported. In some examples, the labeling report may include at least one of the following: doctor labeling results 130, target labeling results 140, final labeling results 150, quality control judgment results, arbitration labeling results, and statistical results.

[0095] The following describes in detail, with reference to Figure 8, the quality control system 300 for data annotation of fundus images of this disclosure. The quality control system 300 for data annotation of fundus images in this disclosure may sometimes be simply referred to as "quality control system 300". The quality control system 300 is used to implement the aforementioned quality control method. Figure 8 is a block diagram illustrating the quality control system for data annotation of fundus images involved in the example of this disclosure.

[0096] In some examples, as shown in Figure 8, the quality control system 300 may include an acquisition module 310, a standardization processing module 320, a preliminary screening module 330, a data preparation module 340, an annotation module 350, an evaluation module 360, and a summary module 370. The acquisition module 310 can acquire multiple fundus images. The standardization processing module 320 can standardize each fundus image to obtain multiple standardized fundus images. The preliminary screening module 330 can perform preliminary screening of the quality of each standardized fundus image to obtain multiple qualified fundus images. The data preparation module 340 can prepare a target fundus image set 200, including a dataset to be calibrated 210, a gold standard dataset 220, and a self-consistency judgment dataset 230. The annotation module 350 can acquire multiple sets of doctor annotation results 130 produced by multiple first annotators A annotating each image in the target fundus image set 200. The evaluation module 360 can acquire multiple sets of target annotation results 140 based on the multiple sets of doctor annotation results 130 that meet preset conditions. The summary module 370 can be used to summarize multiple sets of target annotation results 140 to obtain a final annotation result 150. In this case, the doctor annotation result 130 of the first annotator A that meets the preset conditions can be obtained based on the gold standard dataset 220 and the self-consistency judgment dataset 230 and used as the target annotation result 140 for summarization. This can improve the accuracy of data annotation for fundus images.

[0097] In some examples, the fundus image in acquisition module 310 can be a color fundus image. A color fundus image can clearly present rich fundus information such as the optic disc, optic cup, macula, and blood vessels. For a detailed description, please refer to the relevant description of step S110, which will not be repeated here.

[0098] In some examples, within the standardization processing module 320, the standardization process may include at least one of the following: segmenting fundus images by patient, standardizing the naming format of fundus images, filtering out non-fundus images, standardizing the image format of fundus images, and standardizing the background of fundus images. This enables the standardization of fundus images. For a detailed description, please refer to the relevant description of step S120, which will not be repeated here.

[0099] In some examples, in the preliminary screening module 330, preliminary screening may include classifying the standardized fundus images into at least two image quality levels: qualified and unqualified. This allows the quality of the standardized fundus images to be screened quickly. In some examples, a qualified fundus image can be a standardized fundus image whose image quality level is qualified. However, the examples disclosed herein are not limited to this; in other examples, the standardized fundus images may be further subdivided during preliminary screening. For a detailed description, please refer to the relevant description of step S130, which will not be repeated here.

[0100] In some examples, in the data preparation module 340, the target fundus image set may include a dataset to be calibrated 210, a gold standard dataset 220, and a self-consistency judgment dataset 230. In some examples, the dataset to be calibrated 210 may include multiple qualified fundus images. In some examples, the gold standard dataset 220 may include a first preset number of gold standard fundus images. The gold standard fundus images may be fundus images with known correct annotation results. In some examples, the self-consistency judgment dataset 230 may consist of images from the dataset to be calibrated 210. In some examples, the number of images in the self-consistency judgment dataset 230 may be at least one. In some examples, each image in the target fundus image set 200 may serve as a target fundus image. For a detailed description, please refer to the relevant description of step S140, which will not be repeated here.
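The composition of the target fundus image set 200 described above can be sketched in code. This is a minimal illustration, not the disclosed implementation: the names `TargetImage` and `prepare_target_set`, the random selection of repeated images, and the final shuffle are all assumptions introduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetImage:
    image_id: str  # identifier of the underlying fundus image
    role: str      # "to_calibrate" (210), "gold" (220), or "self_check" (230)


def prepare_target_set(qualified_ids, gold_ids, n_self_check, seed=0):
    """Assemble datasets 210, 220 and 230 into one target image set."""
    if not 1 <= n_self_check <= len(qualified_ids):
        raise ValueError("n_self_check must be between 1 and len(qualified_ids)")
    rng = random.Random(seed)
    # Self-consistency images are repeats drawn from the dataset to be calibrated.
    repeats = rng.sample(list(qualified_ids), n_self_check)
    items = (
        [TargetImage(i, "to_calibrate") for i in qualified_ids]
        + [TargetImage(i, "gold") for i in gold_ids]
        + [TargetImage(i, "self_check") for i in repeats]
    )
    # Assumption (not stated in the disclosure): shuffle so the control
    # images are not grouped together when presented to an annotator.
    rng.shuffle(items)
    return items
```

With four qualified images, two gold standard images, and two repeats, the resulting set holds eight target fundus images.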

[0101] In some examples, in annotation module 350, the doctor's annotation result 130 for each image in the target fundus image set 200 may include at least one judgment result. In some examples, the judgment result may include disease information such as no obvious abnormalities or a disease. In some examples, the disease may include at least one of diabetic retinopathy, hypertensive retinopathy, glaucoma, retinal vein occlusion, retinal artery occlusion, age-related macular degeneration, high myopic macular degeneration, retinal detachment, optic nerve disease, and congenital optic disc developmental abnormalities. Thus, at least one disease can be annotated. In some examples, during annotation, each image in the target fundus image set may also be divided into five image quality levels: very good, good, fair, poor, and very poor. In this case, the final annotation result 150 can be determined by combining more detailed image quality levels. For a detailed description, please refer to the relevant description of step S150, which will not be repeated here.

[0102] In some examples, in the evaluation module 360, the doctor annotation result 130 of a first annotating doctor A that meets both the self-consistency requirement and the gold standard consistency requirement can be obtained and used as the target annotation result 140. In some examples, the preset condition can be that the self-consistency is greater than the self-consistency threshold and the gold standard consistency is greater than the gold standard consistency threshold. Thus, the preset condition can be determined based on the self-consistency threshold and the gold standard consistency threshold. In some examples, the preset condition can be $d_{\text{self}} \le D$ and $d_{\text{gold}} \le D$, where $d_{\text{self}}$ is a self-evaluation index based on self-consistency, $d_{\text{gold}}$ is a gold standard evaluation index based on gold standard consistency, and $D$ is the evaluation index threshold. In some examples, $D \le 5\%$. Thus, the preset condition can be determined based on the evaluation index threshold. In some examples, when the doctor annotation result 130 of a first annotating doctor A does not meet the preset condition, a second annotating doctor B can re-annotate each image in the target fundus image set 200 until a doctor annotation result 130 that meets the preset condition is obtained as the target annotation result 140. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.

[0103] In some examples, in the evaluation module 360, the self-evaluation index $d_{\text{self}}$ can satisfy the formula $d_{\text{self}} = \frac{|J_{\text{self}} - \kappa_{\text{self}}|}{\kappa_{\text{self}}} \times 100\%$, where $J_{\text{self}} = SE_{\text{self}} + SP_{\text{self}} - 1$, $SE_{\text{self}}$ is the sensitivity of the first annotating doctor A obtained from the two sets of annotation results used to assess self-consistency, $SP_{\text{self}}$ is the specificity of the first annotating doctor A obtained from the same two sets of annotation results, and $\kappa_{\text{self}}$ is the self-consistency of the first annotating doctor A. In some examples, either of the two sets of annotation results used to assess self-consistency can be treated as the gold standard for evaluating the other set, in order to obtain the sensitivity and specificity of the first annotating doctor A.

[0104] In some examples, in the evaluation module 360, the gold standard evaluation index $d_{\text{gold}}$ can satisfy the formula $d_{\text{gold}} = \frac{|J_{\text{gold}} - \kappa_{\text{gold}}|}{\kappa_{\text{gold}}} \times 100\%$, where $J_{\text{gold}} = SE_{\text{gold}} + SP_{\text{gold}} - 1$, $SE_{\text{gold}}$ is the sensitivity of the first annotating doctor A obtained from the two sets of annotation results used to assess gold standard consistency, $SP_{\text{gold}}$ is the specificity of the first annotating doctor A obtained from the same two sets of annotation results, and $\kappa_{\text{gold}}$ is the gold standard consistency of the first annotating doctor A. In some examples, the first set of annotation results used to assess gold standard consistency can be treated as the gold standard for evaluating the second set, in order to obtain the sensitivity and specificity of the first annotating doctor A.
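The evaluation index and the preset condition can be illustrated with a short sketch. It assumes binary labels for a single disease (1 = present, 0 = absent) and takes the consistency value kappa as an input; the function names, the 5% default, and the guard against non-positive kappa are assumptions made for the example rather than details taken from the disclosure.

```python
def sensitivity_specificity(reference, evaluated):
    """Return (SE, SP) for binary labels, 1 = disease present, 0 = absent."""
    pairs = list(zip(reference, evaluated))
    tp = sum(1 for r, e in pairs if r == 1 and e == 1)
    fn = sum(1 for r, e in pairs if r == 1 and e == 0)
    tn = sum(1 for r, e in pairs if r == 0 and e == 0)
    fp = sum(1 for r, e in pairs if r == 0 and e == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)


def evaluation_index(reference, evaluated, kappa):
    """d = |J - kappa| / kappa * 100 (percent), with J = SE + SP - 1."""
    if kappa <= 0:
        # Assumption: the index is only meaningful for positive consistency.
        raise ValueError("kappa must be positive")
    se, sp = sensitivity_specificity(reference, evaluated)
    j = se + sp - 1
    return abs(j - kappa) / kappa * 100.0


def meets_preset_condition(d_self, d_gold, threshold_percent=5.0):
    """Preset condition: d_self <= D and d_gold <= D."""
    return d_self <= threshold_percent and d_gold <= threshold_percent
```

For the self-evaluation index, `reference` and `evaluated` would be the two repeated readings by the same doctor; for the gold standard index, `reference` would be the known correct annotations.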

[0105] In some examples, when calculating self-consistency, the evaluation module 360 can obtain two sets of annotation results: the doctor annotation results 130 for each image in the self-consistency judgment dataset 230, and the doctor annotation results 130 for the images in the dataset to be calibrated 210 that duplicate the images in the self-consistency judgment dataset 230. In some examples, self-consistency can be obtained by taking either of the two sets as the first set of annotation results and the other as the second set of annotation results, and then evaluating them using the self-consistency judgment method. Thus, the self-consistency of each first annotating doctor A can be calculated based on the self-consistency judgment method. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.

[0106] In some examples, when calculating gold standard consistency, the evaluation module 360 can use the correct annotation results of the gold standard dataset 220 (that is, the known annotation results of the gold standard fundus images) as the first set of annotation results, and the doctor annotation results 130 for each image in the gold standard dataset 220 as the second set of annotation results. In some examples, gold standard consistency can be obtained by evaluating the first set and the second set of annotation results using the gold standard consistency judgment method. Thus, the gold standard consistency of each first annotating doctor A can be calculated based on the gold standard consistency judgment method. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.

[0107] In some examples, the self-consistency judgment method in the evaluation module 360 can be to calculate, using a quadratic weighted kappa coefficient, the disease self-consistency of each first annotating doctor A's judgment on each disease. In some examples, the quadratic weighted kappa coefficient $\kappa$ for a single disease can be:

[0108] $$\kappa = 1 - \frac{\sum_{i,j} W_{ij} X_{ij}}{\sum_{i,j} W_{ij} E_{ij}}$$

[0109] Here, $W_{ij}$ can represent the quadratic weighting coefficient, $X_{ij}$ can represent the number of target fundus images for which the judgment result is i in the first set of annotation results and j in the second set of annotation results, and $E_{ij}$ can represent the expected number of target fundus images for which the judgment result is i in the first set of annotation results and j in the second set of annotation results. In some examples, when i is not equal to j, $E_{ij}$ can be zero. In some examples, the quadratic weighting coefficient $W_{ij}$ can be set as needed to highlight the importance of a particular judgment result. In this way, the consistency between the first set of annotation results and the second set of annotation results can be assessed. In some examples, in the self-consistency judgment method, the disease self-consistency values can be weighted to calculate the self-consistency of each first annotating doctor A. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.
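A quadratic weighted kappa for one disease can be sketched as follows. The sketch assumes the commonly used form κ = 1 − ΣW·X / ΣW·E, the weights W_ij = (i − j)² / (N − 1)², and expected counts taken from the row and column totals of the observed table; the disclosure allows W_ij to be set as needed, so these particular choices and the function name are illustrative assumptions.

```python
def quadratic_weighted_kappa(first, second, n_classes):
    """Quadratic weighted kappa between two label sequences.

    Labels are integers in range(n_classes); n_classes must be >= 2.
    """
    n = len(first)
    if n == 0 or n != len(second) or n_classes < 2:
        raise ValueError("need two equal-length, non-empty label lists")
    # Observed counts X[i][j]: first set says i, second set says j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = 0.0
    den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2  # assumed W_ij
            expected = row[i] * col[j] / n                # assumed E_ij
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - num / den
```

Two identical label lists give a value of 1, and the value decreases as disagreements, especially distant ones, become more frequent.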

[0110] In some examples, the gold standard consistency judgment method in the evaluation module 360 can be to calculate, using a quadratic weighted kappa coefficient, the disease gold standard consistency of each first annotating doctor A's judgment on each disease. In some examples, the disease gold standard consistency values can be weighted to calculate the gold standard consistency of each first annotating doctor A. Thus, the gold standard consistency of each first annotating doctor A can be calculated based on the gold standard consistency judgment method. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.
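Combining the per-disease consistency values into a single value per doctor can be sketched as a weighted mean. The disclosure does not specify the weights or the normalization, so the equal-weight default, the division by the weight total, and the name `overall_consistency` are assumptions.

```python
def overall_consistency(per_disease_kappa, weights=None):
    """Weighted mean of per-disease kappa values for one doctor."""
    if not per_disease_kappa:
        raise ValueError("at least one disease is required")
    if weights is None:
        # Assumption: equal weight per disease when none are supplied.
        weights = {disease: 1.0 for disease in per_disease_kappa}
    total = sum(weights[d] for d in per_disease_kappa)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(per_disease_kappa[d] * weights[d] for d in per_disease_kappa) / total
```

The same helper applies to both self-consistency and gold standard consistency, since both are described as weighted combinations of per-disease values.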

[0111] In some examples, the evaluation module 360 can analyze the target self-consistency and target gold standard consistency of annotating doctors under different thresholds and determine the self-consistency threshold and the gold standard consistency threshold using anomaly detection. In some examples, anomaly detection for the self-consistency threshold can be performed by obtaining the target self-consistency of the annotating doctors under different thresholds and calculating the self-consistency mean μ0 and the self-consistency variance σ0. Under the assumption that the target self-consistency follows a Gaussian distribution, the self-consistency threshold can be μ0 - 1.96 × σ0. In some examples, the self-consistency threshold can be 0.7977. In some examples, anomaly detection for the gold standard consistency threshold can be performed by obtaining the target gold standard consistency of the annotating doctors under different thresholds and calculating the gold standard consistency mean μ1 and the gold standard consistency variance σ1. Under the assumption that the target gold standard consistency follows a Gaussian distribution, the gold standard consistency threshold can be μ1 - 1.96 × σ1. In some examples, the gold standard consistency threshold can be 0.6235. For a detailed description, please refer to the relevant description of step S160, which will not be repeated here.
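The threshold rule μ - 1.96 × σ can be sketched as below. One point is an assumption: the text calls σ a variance, but the sketch treats σ as the population standard deviation, which is the quantity the Gaussian factor 1.96 normally multiplies. The function name is also hypothetical.

```python
import statistics


def consistency_threshold(values, z=1.96):
    """Lower cut-off mu - z * sigma for a list of consistency values."""
    if len(values) < 2:
        raise ValueError("need at least two consistency values")
    mu = statistics.mean(values)
    # Assumption: sigma is the population standard deviation of the values.
    sigma = statistics.pstdev(values)
    return mu - z * sigma
```

A doctor whose consistency falls below the returned value would be treated as an outlier under the Gaussian assumption. The same function applies to self-consistency values and to gold standard consistency values.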

[0112] In some examples, a majority voting method can be used to compare the annotation results of each target fundus image across the multiple sets of target annotation results 140 to determine the final annotation result 150 for each target fundus image. Specifically, under the majority voting method, a judgment result can be included in the final annotation result 150 only if it appears in more than half of the annotation results (that is, a majority of valid votes is required for acceptance). In some examples, if the final annotation result 150 cannot be determined (that is, no majority of valid votes is obtained), the target fundus image is marked as a difficult fundus image. In some examples, difficult fundus images can be submitted to arbitration annotation to obtain the final annotation result 150. Thus, the final annotation result 150 can be obtained based on the majority voting method. For a detailed description, please refer to the relevant description of step S170, which will not be repeated here.
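The majority voting step can be sketched for a single target fundus image. Each doctor's annotation is modeled as a set of judgment labels; the function name and the use of `None` to flag a difficult fundus image are assumptions made for the example.

```python
from collections import Counter


def majority_vote(annotations):
    """Return the labels found in more than half of the annotations.

    annotations: list of sets of judgment labels, one set per doctor.
    Returns None when no label reaches a majority (difficult image).
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(label for ann in annotations for label in set(ann))
    needed = len(annotations) / 2
    final = {label for label, count in counts.items() if count > needed}
    return final if final else None
```

An image for which the function returns `None` would then go to arbitration, as described above.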

[0113] However, the examples disclosed herein are not limited to this. In other examples, the summary can compare the annotation results of each target fundus image across the multiple sets of target annotation results 140. In some examples, when the annotation results are consistent, that annotation result can be used as the final annotation result 150 of the target fundus image. In some examples, when the annotation results are inconsistent, if multiple annotation results include the same judgment result and only one annotation result includes a judgment result not identified in the other annotation results, the target fundus image can be marked as a fundus image to be quality controlled; otherwise, the target fundus image can be marked as a difficult fundus image. In this case, by comparing the annotation results of each target fundus image across the multiple sets of target annotation results 140, the target fundus images can be divided into target fundus images with a final annotation result 150, fundus images to be quality controlled, and difficult fundus images. In some examples, quality control can be performed on the fundus images to be quality controlled. In some examples, if quality control finds that the unidentified judgment result is not present, the shared judgment result can be used as the final annotation result 150. In some examples, if the unidentified judgment result is present, the fundus image to be quality controlled can be marked as a difficult fundus image. In some examples, difficult fundus images can be submitted to arbitration annotation to obtain the final annotation result 150. Thus, the final annotation result 150 of the difficult fundus images can be obtained. For a detailed description, please refer to the relevant description of step S170, which will not be repeated here.
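The three-way split described above can be sketched as follows. This is one possible reading of the rule: the shared judgment results are taken as the intersection of all annotations, and an image goes to quality control only when exactly one annotation contains anything beyond that shared core. The function name and the returned status strings are assumptions.

```python
def triage(annotations):
    """Classify one image as 'final', 'quality_control', or 'difficult'.

    annotations: list of sets of judgment labels, one set per doctor.
    Returns (status, labels); labels is None for difficult images.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    sets = [frozenset(a) for a in annotations]
    if len(set(sets)) == 1:
        return "final", set(sets[0])  # all doctors agree
    common = frozenset.intersection(*sets)
    # Annotations that contain a judgment outside the shared core.
    with_extra = [s for s in sets if s - common]
    if common and len(with_extra) == 1:
        return "quality_control", set(common)
    return "difficult", None
```

An image returned as `quality_control` carries the shared labels, which become final if the extra judgment is rejected during quality control.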

[0114] While the present disclosure has been specifically described above in conjunction with the accompanying drawings and examples, it is to be understood that the foregoing description does not limit the present disclosure in any way. Those skilled in the art can make modifications and variations to the present disclosure as needed without departing from its essential spirit and scope, and all such modifications and variations shall fall within the scope of the present disclosure.

Claims

1. A quality control method for data annotation of fundus images, characterized in that it includes: acquiring multiple fundus images; performing standardization processing on each fundus image to obtain multiple standardized fundus images; performing preliminary screening of the quality of each standardized fundus image to obtain multiple qualified fundus images; preparing a target fundus image set comprising a dataset to be calibrated, a gold standard dataset, and a self-consistency judgment dataset, wherein each image in the target fundus image set serves as a target fundus image, the dataset to be calibrated includes multiple qualified fundus images, the gold standard dataset includes a first preset number of gold standard fundus images with known correct annotation results, and the self-consistency judgment dataset consists of at least one image from the dataset to be calibrated; annotating, by multiple first annotating doctors, each image in the target fundus image set to obtain multiple sets of doctor annotation results, wherein each doctor annotation result includes at least one judgment result, and the judgment result includes at least disease information such as no obvious abnormalities or one disease; calculating, based on the doctor annotation results, the self-consistency and gold standard consistency of the corresponding first annotating doctor, and obtaining the doctor annotation results of the first annotating doctors that meet preset conditions as target annotation results, wherein the self-consistency is obtained by using either one of two sets of annotation results as the first set of annotation results and the other as the second set of annotation results and evaluating them using a self-consistency judgment method, the two sets of annotation results being the doctor annotation results of each image in the self-consistency judgment dataset and the doctor annotation results of the images in the dataset to be calibrated that duplicate the images in the self-consistency judgment dataset, and the gold standard consistency is obtained by using the correct annotation results of the gold standard dataset as the first set of annotation results and the doctor annotation results of each image in the gold standard dataset as the second set of annotation results and evaluating them using a gold standard consistency judgment method; and summarizing the multiple sets of target annotation results to obtain a final annotation result.

2. The quality control method according to claim 1, characterized in that: The preset conditions are that the self-consistency is greater than the self-consistency threshold and the gold standard consistency is greater than the gold standard consistency threshold.

3. The quality control method according to claim 1, characterized in that: If the doctor's annotation result of the first annotator does not meet the preset conditions, the second annotator will re-annotate each image in the target fundus image set until a doctor's annotation result that meets the preset conditions is obtained as the target annotation result.

4. The quality control method according to claim 1, characterized in that: The self-consistency judgment method is to calculate, using a quadratic weighted kappa coefficient, the disease self-consistency of each first annotating doctor's judgment on each of the diseases, and to weight the disease self-consistency of each of the diseases to calculate the self-consistency of each first annotating doctor. The gold standard consistency judgment method is to calculate, using a quadratic weighted kappa coefficient, the disease gold standard consistency of each first annotating doctor's judgment on each of the diseases, and to weight the disease gold standard consistency of each of the diseases to calculate the gold standard consistency of each first annotating doctor.

5. The quality control method according to claim 4, characterized in that: The quadratic weighted kappa coefficient is $\kappa = 1 - \frac{\sum_{i,j} W_{ij} X_{ij}}{\sum_{i,j} W_{ij} E_{ij}}$, where $W_{ij}$ denotes the quadratic weighting coefficient, $X_{ij}$ denotes the number of target fundus images for which the judgment result in the first set of annotation results is i and the judgment result in the second set of annotation results is j, and $E_{ij}$ denotes the expected number of target fundus images for which the judgment result in the first set of annotation results is i and the judgment result in the second set of annotation results is j.

6. The quality control method according to claim 2, characterized in that: The target self-consistency and target gold standard consistency of annotating doctors under different thresholds are analyzed, and anomaly detection is used to determine the self-consistency threshold and the gold standard consistency threshold.

7. The quality control method according to claim 6, characterized in that: The anomaly detection involves obtaining the target self-consistency of annotating doctors under different thresholds and calculating the self-consistency mean µ0 and the self-consistency variance σ0; under the assumption that the target self-consistency follows a Gaussian distribution, the self-consistency threshold is µ0 - 1.96 × σ0. The anomaly detection further involves obtaining the target gold standard consistency of annotating doctors under different thresholds and calculating the gold standard consistency mean µ1 and the gold standard consistency variance σ1; under the assumption that the target gold standard consistency follows a Gaussian distribution, the gold standard consistency threshold is µ1 - 1.96 × σ1.

8. The quality control method according to claim 1, characterized in that: The summarization involves using an absolute majority voting method to compare the annotation results of each target fundus image in the multiple sets of target annotation results to determine the final annotation result of each target fundus image. If the final annotation result cannot be determined, the target fundus image is marked as a difficult fundus image. The difficult fundus images are submitted to arbitration annotation to obtain the final annotation result.

9. The quality control method according to claim 1, characterized in that: The preliminary screening includes dividing the standardized fundus images into at least two image quality levels: qualified and unqualified. The qualified fundus images are the standardized fundus images with a qualified image quality level.

10. The quality control method according to claim 1, characterized in that: In the annotation, each image in the target fundus image set is also divided into five image quality levels: very good, good, fair, poor, and very poor.

11. The quality control method according to claim 1, characterized in that: The diseases mentioned include at least one of the following: diabetic retinopathy, hypertensive retinopathy, glaucoma, retinal vein occlusion, retinal artery occlusion, age-related macular degeneration, high myopic macular degeneration, retinal detachment, optic nerve disease, and congenital optic disc developmental abnormalities.

12. The quality control method according to claim 1, characterized in that: The preset conditions are $d_{\text{self}} \le D$ and $d_{\text{gold}} \le D$, where $d_{\text{self}}$ is a self-evaluation index based on the self-consistency, $d_{\text{gold}}$ is a gold standard evaluation index based on the gold standard consistency, and $D$ is the evaluation index threshold. The self-evaluation index $d_{\text{self}}$ satisfies the formula $d_{\text{self}} = \frac{|J_{\text{self}} - \kappa_{\text{self}}|}{\kappa_{\text{self}}} \times 100\%$, where $J_{\text{self}} = SE_{\text{self}} + SP_{\text{self}} - 1$, $SE_{\text{self}}$ is the sensitivity of the first annotating doctor obtained based on the two sets of annotation results used to assess the self-consistency, $SP_{\text{self}}$ is the specificity of the first annotating doctor obtained based on the two sets of annotation results used to assess the self-consistency, and $\kappa_{\text{self}}$ is the self-consistency of the first annotating doctor. The gold standard evaluation index $d_{\text{gold}}$ satisfies the formula $d_{\text{gold}} = \frac{|J_{\text{gold}} - \kappa_{\text{gold}}|}{\kappa_{\text{gold}}} \times 100\%$, where $J_{\text{gold}} = SE_{\text{gold}} + SP_{\text{gold}} - 1$, $SE_{\text{gold}}$ is the sensitivity of the first annotating doctor obtained based on the two sets of annotation results used to assess the gold standard consistency, $SP_{\text{gold}}$ is the specificity of the first annotating doctor obtained based on the two sets of annotation results used to assess the gold standard consistency, and $\kappa_{\text{gold}}$ is the gold standard consistency of the first annotating doctor.

13. A quality control system for data annotation of fundus images, characterized in that it includes: an acquisition module used to acquire multiple fundus images; a standardization processing module used to standardize each of the fundus images to obtain multiple standardized fundus images; a preliminary screening module used to perform preliminary screening of the quality of each of the standardized fundus images to obtain multiple qualified fundus images; a data preparation module used to prepare a target fundus image set, which includes a dataset to be calibrated, a gold standard dataset, and a self-consistency judgment dataset, wherein each image in the target fundus image set serves as a target fundus image, the dataset to be calibrated includes multiple qualified fundus images, the gold standard dataset includes a first preset number of gold standard fundus images with known correct annotation results, and the self-consistency judgment dataset consists of at least one image from the dataset to be calibrated; an annotation module used to acquire multiple sets of doctor annotation results produced by multiple first annotating doctors annotating each image in the target fundus image set, wherein each doctor annotation result includes at least one judgment result, and the judgment result includes at least disease information such as no obvious abnormalities or one disease; an evaluation module used to calculate, based on the doctor annotation results, the self-consistency and gold standard consistency of the corresponding first annotating doctor, and to obtain the doctor annotation results of the first annotating doctors that meet preset conditions as target annotation results, wherein the self-consistency is obtained by using either one of two sets of annotation results as the first set of annotation results and the other as the second set of annotation results and evaluating them using a self-consistency judgment method, the two sets of annotation results being the doctor annotation results of each image in the self-consistency judgment dataset and the doctor annotation results of the images in the dataset to be calibrated that duplicate the images in the self-consistency judgment dataset, and the gold standard consistency is obtained by using the correct annotation results of the gold standard dataset as the first set of annotation results and the doctor annotation results of each image in the gold standard dataset as the second set of annotation results and evaluating them using a gold standard consistency judgment method; and a summary module used to summarize the multiple sets of target annotation results to obtain a final annotation result.