Active learning methods, devices, electronic devices and readable storage media


CN114298304B · Active · Publication Date: 2026-06-30 · HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO LTD
Filing Date: 2021-12-29
Publication Date: 2026-06-30


Figure CN114298304B_ABST


Abstract

This application provides an active learning method, apparatus, electronic device, and readable storage medium. The active learning method includes: for any sample in an unlabeled sample set, augmenting the sample using a preset augmentation method to obtain N different augmented samples; using a query model to predict the N different augmented samples, obtaining the prediction result of the target instance in each augmented sample; for any target instance in the sample, determining the uncertainty of the target instance based on the prediction result of the target instance in each augmented sample; and determining whether the target instance uses a pseudo-label or needs manual annotation based on the uncertainty of the target instance. This method can reduce the annotation cost for annotators, improve the annotation efficiency of annotators, and reduce the annotation cost of the dataset.


Description

Technical Field

[0001] This application relates to the field of machine learning technology, and in particular to an active learning method, apparatus, electronic device, and readable storage medium.

Background

[0002] Supervised learning within the deep learning framework has achieved remarkable success in computer vision, thanks not only to increasingly sophisticated model architectures but also to the vast amounts of training data available. In practical applications, however, labeling training data is expensive in both time and manpower. Active learning techniques, which aim to train the best-performing model with the least labeling, have therefore gained increasing attention.

[0003] Active learning is important for dataset construction: it can make data collection more efficient, improve model performance, and reduce manual cost, and it is applied in many tasks (classification, object detection, segmentation, etc.).

[0004] Current active learning methods assume that every image costs the same to annotate, and do not account for the fact that the cost of manual annotation differs from image to image.

Summary of the Invention

[0005] In view of this, this application provides an active learning method, apparatus, electronic device, and readable storage medium to reduce the annotation cost of datasets.

[0006] Specifically, this application is implemented through the following technical solution:

[0007] According to a first aspect of the embodiments of this application, an active learning method is provided, comprising:

[0008] For any sample in the unlabeled sample set, the sample is augmented using a preset augmentation method to obtain N different augmented samples; the semantic information of these N different augmented samples is consistent.

[0009] The query model is used to predict the N different augmented samples, and the prediction results of the target instances in each augmented sample are obtained respectively.

[0010] For any target instance in the sample, the uncertainty of the target instance is determined based on the prediction results of the target instance in each augmented sample;

[0011] Based on the uncertainty of the target instance, determine whether the target instance should use a pseudo-label or requires manual labeling.

[0012] According to a second aspect of the embodiments of this application, an active learning device is provided, comprising:

[0013] The augmentation processing unit is used to augment any sample in the unlabeled sample set using a preset augmentation method to obtain N different augmented samples; the semantic information of the N different augmented samples is consistent.

[0014] The prediction unit is used to make predictions for the N different augmented samples using the query model, and obtain the prediction results of the target instance in each augmented sample.

[0015] The first determining unit is used to determine, for any target instance in the sample, the uncertainty of the target instance based on the prediction results of that target instance in each augmented sample.

[0016] The second determining unit is used to determine whether the target instance uses a pseudo-label or needs to be manually labeled based on the uncertainty of the target instance.

[0017] According to a third aspect of the present application, an electronic device is provided, including a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the method provided in the first aspect.

[0018] According to a fourth aspect of the embodiments of this application, a machine-readable storage medium is provided, wherein machine-executable instructions are stored therein, and when the machine-executable instructions are executed by a processor, the method provided in the first aspect is implemented.

[0019] The technical solution provided in this application can bring at least the following beneficial effects:

[0020] By using a preset augmentation method to augment the samples, multiple augmented samples are obtained. The inconsistency of the prediction results of the query model on the multiple augmented samples is used to determine the uncertainty of each target instance in the sample. Then, based on the uncertainty of the target instance, it can be determined whether the target instance uses pseudo-labels or needs manual annotation. By taking the target instance as the basic annotation unit and considering the annotation cost according to the target instance, the target instances are divided into target instances that use pseudo-labels (i.e., do not need manual annotation) or target instances that need manual annotation. This reduces the annotation cost for annotators, improves the annotation efficiency of annotators, and reduces the annotation cost of the dataset.

Brief Description of the Drawings

[0021] Figure 1 is a flowchart of an active learning method according to an exemplary embodiment of this application;

[0022] Figure 2 is a schematic diagram of the overall process of an active learning method in a specific application scenario according to an exemplary embodiment of this application;

[0023] Figure 3 is a schematic structural diagram of an active learning device according to an exemplary embodiment of this application;

[0024] Figure 4 is a schematic diagram of the hardware structure of an electronic device according to an exemplary embodiment of this application.

Detailed Description

[0025] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0026] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a”, “said”, and “the” used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.

[0027] To enable those skilled in the art to better understand the technical solutions provided in the embodiments of this application, and to make the above-mentioned objectives, features and advantages of the embodiments of this application more apparent and understandable, the technical solutions in the embodiments of this application will be further described in detail below with reference to the accompanying drawings.

[0028] Referring to Figure 1, which is a flowchart of an active learning method provided in an embodiment of this application, the active learning method may include the following steps:

[0029] Step S100: For any sample in the unlabeled sample set, augment the sample using a preset augmentation method to obtain N different augmented samples, where N ≥ 2 and the N augmented samples have consistent semantic information.

[0030] For example, the preset augmentation method may include, but is not limited to, one or more of the following: identity transformation, horizontal flipping, scaling, and Gaussian noise.

[0031] For example, the N different augmented samples may include the original sample (i.e., the original sample plus N-1 samples obtained by applying N-1 augmentation methods to it), or may exclude it (i.e., all N samples are obtained by applying N augmentation methods to the original sample).
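A minimal sketch of step S100, assuming a grayscale image stored as a list of pixel rows and using three of the augmentations listed in [0030] (identity, horizontal flip, Gaussian noise; scaling is omitted for brevity). The function names, noise level, and seed are hypothetical. Because the N views are later compared instance by instance, the sketch also includes a helper that maps a box predicted on the flipped view back to original coordinates; this mapping step is an assumption, not something the text spells out.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]  # hypothetical: grayscale image as a list of pixel rows
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def identity(img: Image) -> Image:
    # Copy rows so later edits to a view never touch the original sample.
    return [row[:] for row in img]


def horizontal_flip(img: Image) -> Image:
    return [row[::-1] for row in img]


def gaussian_noise(img: Image, sigma: float = 0.01, seed: int = 0) -> Image:
    # Small additive noise keeps the semantic content of the image unchanged.
    rng = random.Random(seed)
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in img]


def augment(img: Image, transforms: List[Callable[[Image], Image]]) -> List[Image]:
    """Produce N = len(transforms) semantically consistent views (N >= 2)."""
    if len(transforms) < 2:
        raise ValueError("at least two augmentations are required (N >= 2)")
    return [t(img) for t in transforms]


def unflip_box(box: Box, width: float) -> Box:
    # Map a box predicted on the horizontally flipped view back to the original frame.
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
views = augment(img, [identity, horizontal_flip, gaussian_noise])
```

Including `identity` keeps the original sample among the N views, which corresponds to the first option in [0031].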

[0032] Step S110: Use the query model to predict the N different augmented samples and obtain the prediction results of the target instances in each augmented sample.

[0033] Step S120: For any target instance in the sample, determine the uncertainty of the target instance based on the prediction results of that target instance in each augmented sample.

[0034] For example, each target to be detected in an image (such as one of the augmented samples above) corresponds to one target instance.

[0035] For example, the target to be detected may include, but is not limited to, people, vehicles, or animals.

[0036] In this embodiment of the application, because images differ in the number of target instances they contain, and therefore in annotation cost, the basic annotation unit is no longer the whole image but the individual target instance.

[0037] In this embodiment of the application, the uncertainty of the target instance within the sample is determined by the inconsistency of the prediction results of the query model for the same target instance in different augmented samples corresponding to the same sample.

[0038] It should be noted that the query model used to predict different augmented samples can be a pre-trained query model.

[0039] For example, if there are initially labeled samples, the query model can be pre-trained based on the labeled sample set; if there are initially no labeled samples, a portion can be randomly selected from the unlabeled sample set, manually labeled, and then the labeled samples can be used to pre-train the query model.

[0040] Step S130: Determine whether the target instance uses a pseudo-label or requires manual labeling based on the uncertainty of the target instance.

[0041] In this embodiment of the application, the uncertainty of the target instance can be characterized by the inconsistency of the prediction results of the same target instance in different augmented samples corresponding to the same sample using the same query model.

[0042] For example, the more the query model's predictions for the same target instance disagree across different augmented samples, the less reliable those predictions are, and the less the query model has learned about that data. Therefore, that data should be labeled first.

[0043] Accordingly, the uncertainty of the target instance can be used to determine whether the target instance should use a pseudo-label or require manual labeling.

[0044] For example, target instances with high uncertainty require manual labeling, while target instances with low uncertainty can use the model's predictions as pseudo-labels.

[0045] It can be seen that, in the method illustrated in Figure 1, samples are augmented using a preset augmentation method to obtain multiple augmented samples. The inconsistency of the query model's predictions across these augmented samples is used to determine the uncertainty of each target instance in the sample. Based on the uncertainty of the target instance, it can be determined whether the target instance should use a pseudo-label or require manual annotation. By using the target instance as the basic annotation unit and considering the annotation cost according to the target instance, the target instances are divided into target instances that use pseudo-labels (i.e., do not require manual annotation) or target instances that require manual annotation. This reduces the annotation cost for annotators, improves the annotation efficiency of annotators, and reduces the annotation cost of the dataset.

[0046] In some embodiments, step S130, determining whether the target instance uses a pseudo-label or requires manual labeling based on the uncertainty of the target instance, may include:

[0047] If the uncertainty of the target instance is greater than the first uncertainty threshold, then the target instance needs to be manually labeled.

[0048] If the uncertainty of the target instance is less than or equal to the first uncertainty threshold, then the target instance is determined to use a pseudo-label.

[0049] For example, since the higher the uncertainty of a target instance, the lower the reliability and accuracy of the query model's prediction for it, an uncertainty threshold (referred to as the first uncertainty threshold in this document) can be set according to requirements. Target instances with uncertainty greater than the first uncertainty threshold are manually labeled; target instances with uncertainty less than or equal to the first uncertainty threshold use pseudo-labels.

[0050] It should be noted that if the first uncertainty threshold is set too high, target instances with high uncertainty will be assigned pseudo-labels. Because the model's predictions for such instances are inaccurate, the query model trained on them will perform poorly. If the first uncertainty threshold is set too low, most target instances will be sent for manual annotation, resulting in low annotation efficiency and higher annotation cost. Therefore, the first uncertainty threshold can be set according to actual needs to balance annotation cost against model performance.
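The rule in [0047]-[0048] and the trade-off described above can be illustrated with a short sketch. The uncertainty values are made up for illustration, and `route_instance` and `manual_fraction` are hypothetical names.

```python
from typing import List


def route_instance(uncertainty: float, t1: float) -> str:
    # Strictly above the first threshold -> manual; otherwise use the model's pseudo-label.
    return "manual" if uncertainty > t1 else "pseudo"


def manual_fraction(uncertainties: List[float], t1: float) -> float:
    """Share of target instances that would be sent to annotators for threshold t1."""
    if not uncertainties:
        return 0.0
    manual = sum(1 for u in uncertainties if route_instance(u, t1) == "manual")
    return manual / len(uncertainties)


scores = [0.05, 0.1, 0.3, 0.6, 0.9]  # illustrative uncertainty values
for t1 in (0.0, 0.5, 1.0):
    print(t1, manual_fraction(scores, t1))
```

Lowering `t1` sends a larger share of instances to annotators (more cost, cleaner labels); raising it lets more instances through as pseudo-labels.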

[0051] In some embodiments, the query model described above is an object detection model.

[0052] In step S120, for any target instance in the sample, the uncertainty of the target instance is determined based on the prediction results of the target instance in each augmented sample, which may include:

[0053] Select any detection box in any one of the N different augmented samples and use it as a cluster center; then, based on the overlap between the detection boxes in the other augmented samples and this detection box, perform clustering to obtain the cluster corresponding to the cluster center.

[0054] Based on the detection boxes in the cluster, the uncertainty of the target instance corresponding to the cluster center is determined.

[0055] For example, in an object detection task, each object in a sample can be treated as a target instance.

[0056] The detection result of a sample is a set of one or more detection boxes. To calculate the inconsistency between the detection results of different augmented samples, the detection boxes in different augmented samples must first be associated with one another, that is, it must be determined which detection boxes belong to the same target. The inconsistency is then determined from the detection boxes that correspond to the same target across the augmented samples.

[0057] For example, based on the overlap of detection boxes in different augmented samples, the detection boxes in different augmented samples can be clustered to obtain the detection boxes in each augmented sample corresponding to the same target.

[0058] For example, any detection box in any augmented sample can be used as the cluster center, and detection boxes can be clustered based on the intersection over union (IoU) between the detection boxes in the other augmented samples and that detection box, so as to determine which detection box in each augmented sample corresponds to the same target instance.
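For reference, the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) in the same (original-image) coordinate frame:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 4 + 4 - 1 = 7
```

The two example boxes overlap in a 1×1 square, so IoU = 1/7 ≈ 0.14; with a threshold such as 0.5 they would not be placed in the same cluster.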

[0059] For example, a target instance can correspond to a cluster.

[0060] For example, the detection boxes in the augmented samples that correspond to the same target instance belong to the same cluster.

[0061] For example, the uncertainty of the target instance corresponding to a cluster can be determined based on the detection boxes in that cluster.

[0062] In one example, determining whether a target instance should use a pseudo-label or require manual annotation based on its uncertainty also includes:

[0063] Based on the uncertainty of the target instance, determine the type of the target instance;

[0064] The target instance types include the first type using pseudo-labels, the second type requiring partial annotation, and the third type requiring full manual annotation.

[0065] The uncertainty of the first type of target instance, the second type of target instance, and the third type of target instance increases sequentially.

[0066] For example, in order to further reduce the annotation cost while ensuring model performance, target instances that are determined to require manual annotation can be further divided according to uncertainty into those requiring full manual annotation, such as both location and category requiring manual annotation; and those requiring partial annotation, such as location or category requiring manual annotation.

[0067] Accordingly, based on the uncertainty of the target instance, the target instance is classified into multiple types.

[0068] For example, the types of target instances may include those that use pseudo-labels (referred to as the first type in this document), those that require partial annotation (referred to as the second type in this document), and those that require full manual annotation (referred to as the third type in this document).

[0069] For example, the uncertainty of the first type of target instance, the second type of target instance, and the third type of target instance increases sequentially.

[0070] For example, two uncertainty thresholds can be preset (such as a first uncertainty threshold and a second uncertainty threshold, where the first uncertainty threshold is less than the second uncertainty threshold). Target instances with uncertainty higher than the second uncertainty threshold are classified as type 3; target instances with uncertainty less than or equal to the second uncertainty threshold and greater than the first uncertainty threshold are classified as type 2; and target instances with uncertainty less than or equal to the first uncertainty threshold are classified as type 1.
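The two-threshold rule in [0070] maps directly onto a small function. A sketch, with hypothetical threshold values 0.2 and 0.6 chosen only for illustration:

```python
def instance_type(u: float, t1: float, t2: float) -> int:
    """Return 1 (pseudo-label), 2 (partial annotation) or 3 (full manual annotation)."""
    if not t1 < t2:
        raise ValueError("the first threshold must be smaller than the second")
    if u > t2:
        return 3  # full manual annotation, e.g. category and location
    if u > t1:
        return 2  # partial annotation, e.g. category only
    return 1      # use the query model's prediction as a pseudo-label


print([instance_type(u, 0.2, 0.6) for u in (0.1, 0.2, 0.4, 0.6, 0.9)])  # [1, 1, 2, 2, 3]
```

Values exactly equal to a threshold fall into the lower type, matching the "less than or equal to" wording in [0070].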

[0071] As an example, the category of a second-type target instance is labeled manually;

[0072] The category loss for the second type of target instance is used for supervised training of the target detection model.

[0073] That is, for a second-type target instance, only its category is labeled manually.

[0074] Because the predicted location of a second-type target instance is not reliable, when the target detection model is trained on the labeled samples, only the category loss of a second-type target instance, not its location loss, is used for supervised training, in order to improve model performance.

[0075] In some embodiments, the above query model is a segmentation model;

[0076] In step S120, for any target instance in the sample, the uncertainty of the target instance is determined based on the prediction results of the target instance in each augmented sample, which may include:

[0077] Using a pixel block of a preset size as the target instance, for any target instance in the sample, the uncertainty of the target instance is determined based on the prediction results of the pixel block at the same position in each augmented sample.

[0078] For example, in a segmentation task, a pixel block of preset size K×K, where K ≥ 2, can be used as the target instance.

[0079] For any target instance, the uncertainty of the target instance can be determined based on the prediction results of the pixel blocks at the same position in each augmented sample.

[0080] In one example, determining the uncertainty of any target instance in the sample based on the prediction results of target instances at the same location in each augmented sample may include:

[0081] For any pixel in the target instance, the uncertainty of the pixel is determined based on the classification results of the pixels at the same position in each augmented sample;

[0082] The uncertainty of the target instance is determined based on the average uncertainty of each pixel in the target instance.

[0083] For example, considering that the output of a segmentation task is the classification result for each pixel, it does not require the complex association of detection boxes as in a detection task. Different segmentation results can be associated based on the pixel's position.

[0084] For example, for any pixel in any target instance, the uncertainty of that pixel can be determined based on the classification results of pixels at the same position in each augmented sample.

[0085] For any target instance, the uncertainty of the target instance can be determined based on the average uncertainty of each pixel in the target instance.

[0086] In some embodiments, for a target instance that uses a pseudo-label, the weight of its loss in the supervised training of the query model is determined based on its uncertainty, where the weight is negatively correlated with the uncertainty of the target instance.

[0087] For example, considering that the higher the uncertainty of a target instance, the worse the reliability and accuracy of its pseudo-label, when using target instances with pseudo-labels for model training, the weights of the target instance loss for model training can be determined based on the uncertainty of the target instance.

[0088] For example, the higher the uncertainty of the target instance, the lower the weight of its loss used for model training. The specific implementation can be illustrated with specific examples below.
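The text only requires the weight to decrease as uncertainty increases; it does not fix a formula. One hypothetical choice is an exponential decay w = exp(-u / τ), where the temperature τ is an assumed tuning parameter:

```python
import math


def pseudo_label_weight(u: float, tau: float = 1.0) -> float:
    # Hypothetical weighting: equals 1 at zero uncertainty and decays toward 0.
    if u < 0 or tau <= 0:
        raise ValueError("uncertainty must be >= 0 and tau must be > 0")
    return math.exp(-u / tau)


weights = [pseudo_label_weight(u) for u in (0.0, 0.1, 0.5)]
```

Any monotonically decreasing function would satisfy the same negative correlation, for example 1 - u / t1 over the pseudo-label range u ≤ t1.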

[0089] To enable those skilled in the art to better understand the technical solutions provided in the embodiments of this application, the technical solutions provided in the embodiments of this application are described below in conjunction with specific embodiments.

[0090] In this embodiment, for tasks such as object detection and segmentation, the cost of manually annotating an image differs from image to image; for example, annotating an image with multiple objects takes longer than annotating an image with only one object. Therefore, target instances rather than whole images are used as the basic annotation unit, and annotation cost is reduced through multi-level annotation.

[0091] In this embodiment, the uncertainty of the target instance within a sample is determined by the inconsistency in the prediction results of the query model for the same target instance in different augmented samples corresponding to the same sample.

[0092] For example, augmentation methods for sample augmentation may include, but are not limited to: identity transformation, horizontal flipping, scaling, and Gaussian noise.

[0093] For example, target instances with high uncertainty require manual labeling, while target instances with low uncertainty use model predictions as pseudo-labels.

[0094] For example, target instances that require manual annotation can be divided, based on their uncertainty, into target instances that require full manual annotation and target instances that require partial annotation.

[0095] For example, since pseudo-labels are not always accurate, uncertainty can be used to weight them: pseudo-labels with lower uncertainty are more likely to be correct. When training the model, among the target instances using pseudo-labels, the smaller the uncertainty of a target instance, the greater its weight in model training.

[0096] Fully labeled, partially labeled, and pseudo-labeled target instances all participate in training the new query model.

[0097] Once all three types of target instances have participated in training, one round of active learning sample selection is complete.

[0098] For example, multiple rounds of active learning selection can be conducted according to actual needs; the overall process is shown in Figure 2.
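The multi-round process can be outlined as a loop. This is a sketch under stated assumptions: `score_fn` (returning per-instance uncertainties from the current query model) and `train_fn` are hypothetical callables, instance identifiers are assumed to stay stable across rounds (as with fixed K×K blocks in segmentation), human labels are kept once obtained, and pseudo-labels are regenerated every round. Per-round annotation budgets, which a real system would likely need, are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (sample id, target-instance id), hypothetical identifiers


def run_active_learning(
    pool: List[str],
    score_fn: Callable[[str], Dict[str, float]],
    train_fn: Callable[[Dict[Key, str], List[Tuple[Key, float]]], None],
    t1: float,
    t2: float,
    rounds: int,
) -> List[Tuple[int, int]]:
    manual: Dict[Key, str] = {}  # human labels persist across rounds
    history = []
    for _ in range(rounds):
        pseudo: List[Tuple[Key, float]] = []
        for sample in pool:
            for inst, u in score_fn(sample).items():
                key = (sample, inst)
                if key in manual:
                    continue  # already annotated by a person
                if u > t2:
                    manual[key] = "full"
                elif u > t1:
                    manual[key] = "partial"
                else:
                    pseudo.append((key, u))  # uncertainty kept for loss weighting
        # Full, partial and pseudo-labeled instances all train the new query model.
        train_fn(dict(manual), pseudo)
        history.append((len(manual), len(pseudo)))
    return history
```

Each iteration corresponds to one round of sample selection as described in [0097]; `history` records how many instances were human-labeled versus pseudo-labeled per round.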

[0099] The following sections will use object detection and segmentation as examples to explain the implementation details of the above process.

[0100] Example 1: Target Detection Task

[0101] 1.1 Scoring Function

[0102] The purpose of a scoring function is to evaluate the value of the target instances in a sample. Its basic principle is as follows: in object detection, if different semantics-preserving data augmentations are applied to the same data, multiple detection results are obtained. The inconsistency among these results characterizes the model's uncertainty about that data: the higher the inconsistency, the less reliable the object detection model's results for that data, and the less the model has learned about detecting it. Therefore, such data should be manually labeled first.

[0103] For example, for input data (i.e. samples in the unlabeled sample set), different data augmentation methods (such as flipping, Gaussian noise, etc.) can be used to obtain multiple augmented samples, and different detection results can be obtained by passing them through the same target detection model. Then, the inconsistency between these detection results is calculated to obtain the uncertainty of each target instance.

[0104] The detection result of a sample is a set of one or more detection boxes. When it contains multiple boxes, computing the inconsistency between different detection results requires first associating the boxes across results, that is, determining which boxes correspond to the same target instance; the inconsistency is then computed.

[0105] For example, suppose that augmenting a sample yields N augmented samples (including the original sample itself); the detection boxes of the same category across all detection results are then clustered.

[0106] For example, an unclustered detection box in one augmented sample is randomly selected as the cluster center, and the detection boxes in the other augmented samples are clustered with it. The clustering criterion is that the Intersection over Union (IoU) between detection boxes is greater than a preset threshold (e.g., 0.5), which yields the corresponding cluster. Suppose the k-th cluster contains $n_k$ detection boxes. Ideally, $n_k$ should equal the number N of data augmentations; for each cluster with $n_k < N$, $N - n_k$ detection boxes with a confidence of 0 are added.

[0107] Suppose there are K target instances in the sample. Then, according to the above method, K clusters can be obtained, and one target instance corresponds to one cluster.
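The clustering and padding in [0105]-[0107] can be sketched as follows. Assumptions: every box has already been mapped back to original image coordinates (e.g., un-flipped), each detection carries a class-confidence vector, at most one box per other view joins a cluster (the best-IoU match above the threshold), and views are scanned in order instead of picking the cluster center at random. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), original-image frame
Detection = Tuple[Box, int, List[float]]  # (box, category, confidence vector)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_detections(views: List[List[Detection]], thr: float = 0.5) -> List[List[List[float]]]:
    """Return one cluster per target instance, each holding exactly N confidence vectors."""
    n = len(views)
    used = [[False] * len(dets) for dets in views]
    clusters: List[List[List[float]]] = []
    for vi, dets in enumerate(views):
        for di, (box, cat, conf) in enumerate(dets):
            if used[vi][di]:
                continue
            used[vi][di] = True  # this box becomes a cluster center
            members = [conf]
            for vj in range(n):
                if vj == vi:
                    continue
                best, best_iou = -1, thr
                for dj, (box2, cat2, _) in enumerate(views[vj]):
                    if used[vj][dj] or cat2 != cat:
                        continue  # only unclustered boxes of the same category
                    overlap = iou(box, box2)
                    if overlap > best_iou:  # IoU must exceed the threshold
                        best, best_iou = dj, overlap
                if best >= 0:
                    used[vj][best] = True
                    members.append(views[vj][best][2])
            # Pad clusters with n_k < N using zero-confidence placeholder boxes.
            missing = n - len(members)
            members.extend([0.0] * len(conf) for _ in range(missing))
            clusters.append(members)
    return clusters


views = [
    [((0, 0, 10, 10), 0, [0.9, 0.1])],
    [((1, 1, 10, 10), 0, [0.8, 0.2])],
    [((50, 50, 60, 60), 1, [0.3, 0.7])],
]
clusters = cluster_detections(views)
```

In the example, the first two boxes overlap with IoU 0.81 and form one cluster padded with a single zero vector; the third box has a different category and forms its own cluster padded with two zero vectors.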

[0108] After obtaining all the clusters, the inconsistency within each cluster can be calculated respectively. For example, the variance or mutual information (MI for short) within each cluster can be calculated.

[0109] Taking mutual information as an example, for any cluster, the calculation method of the mutual information of this cluster can be:

[0110]

[0111] where represents the entropy calculation function, and p is the confidence vector of the detection box.
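The formula in [0110] is not legible in this copy, so it is not reproduced here. As an assumption, the sketch below uses a common definition of mutual information over N predictions: the entropy of the mean confidence vector minus the mean of the individual entropies, $\mathrm{MI} = H(\bar p) - \frac{1}{N}\sum_{n=1}^{N} H(p_n)$, where $\bar p$ is the mean of the N confidence vectors $p_n$. It is zero when all views agree and grows as they disagree. Zero-confidence padding vectors contribute zero entropy under the 0·log 0 = 0 convention.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    # Natural-log entropy; zero entries contribute 0 (0 * log 0 = 0 convention).
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(cluster: List[List[float]]) -> float:
    """Entropy of the mean prediction minus the mean entropy of the N predictions."""
    n = len(cluster)
    mean = [sum(col) / n for col in zip(*cluster)]
    return entropy(mean) - sum(entropy(p) for p in cluster) / n


agree = mutual_information([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
disagree = mutual_information([[1.0, 0.0], [0.0, 1.0]])
```

Three identical predictions give MI ≈ 0, while two confident but opposite predictions give MI = ln 2, the maximum for two classes and two views. The variance mentioned in [0108] could be substituted as the inconsistency measure.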

[0112] 1.2 Query Model Training

[0113] After the scoring function has given each target instance an uncertainty (which can be called its uncertainty score), the target instances can be divided, in descending order of uncertainty, into three parts: target instances that require full manual annotation, target instances that require partial annotation, and target instances that require no annotation (i.e., that use pseudo-labels).

[0114] For example, target instances with high uncertainty (e.g., greater than a preset second uncertainty threshold) require full manual annotation, meaning that both the category and the location of the target instance are labeled.

[0115] Target instances that require partial annotation are those with medium uncertainty (i.e., greater than the first uncertainty threshold and less than or equal to the second uncertainty threshold). These target instances require only partial manual annotation, such as labeling only their category.

[0116] Target instances with low uncertainty (e.g., less than or equal to a preset first uncertainty threshold) require no manual annotation, and the predicted results can be used as pseudo-labels.

[0117] When training the query model, all three parts of the target instances participate in training, and the loss function takes the following form:

[0118]

[0119] where Φ, Ψ, and Ω denote the fully labeled target instances, the partially labeled target instances, and the target instances using pseudo-labels, respectively; $L_{cls,i}$ is the category loss of the i-th target instance, and $L_{reg,i}$ is the detection box position loss of the i-th target instance.

[0120] For example, target instances using pseudo-labels can be weighted by their uncertainty MI: the greater the uncertainty, the less accurate the pseudo-label, and therefore the lower its weight in training. This makes training more robust to label noise and thus improves model performance.
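Because the loss formula in [0118] is also not legible in this copy, the following is only a plausible reconstruction from [0074], [0119] and [0120], not the formula from the filing: fully labeled instances (Φ) contribute category and position loss, partially labeled instances (Ψ) contribute category loss only, and pseudo-labeled instances (Ω) contribute both losses scaled by an uncertainty-dependent weight. The exponential weight and all names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceLoss:
    kind: str                 # "full" (Φ), "partial" (Ψ) or "pseudo" (Ω)
    cls_loss: float           # L_cls,i
    reg_loss: float           # L_reg,i
    uncertainty: float = 0.0  # MI score, used only for pseudo-labels


def total_detection_loss(items: List[InstanceLoss], tau: float = 1.0) -> float:
    total = 0.0
    for it in items:
        if it.kind == "full":
            total += it.cls_loss + it.reg_loss
        elif it.kind == "partial":
            total += it.cls_loss  # location not manually verified: category loss only
        elif it.kind == "pseudo":
            w = math.exp(-it.uncertainty / tau)  # hypothetical weight, decreasing in MI
            total += w * (it.cls_loss + it.reg_loss)
        else:
            raise ValueError(f"unknown instance kind: {it.kind}")
    return total


items = [
    InstanceLoss("full", 1.0, 2.0),
    InstanceLoss("partial", 0.5, 100.0),  # position loss ignored
    InstanceLoss("pseudo", 1.0, 1.0, uncertainty=0.0),
]
loss = total_detection_loss(items)  # 3.0 + 0.5 + 1.0 * 2.0 = 5.5
```

In a real training loop the per-instance losses would be tensors produced by the detector; plain floats are used here only to show how the three groups are combined.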

[0121] The above-mentioned multi-level annotation (full annotation, partial annotation, pseudo annotation) can greatly reduce the annotation cost for annotators while maintaining the performance of the model.

[0122] Example 2: Segmentation Task

[0123] For images to be labeled for the segmentation task, each image is divided into K×K pixel blocks, and each pixel block is treated as a target instance.

[0124] 2.1 Scoring Function

[0125] Since the output of a segmentation task is the classification result for each pixel, it does not require the complex association of detection bounding boxes as in a detection task. Associating different segmentation results with each other is relatively simple; they can be associated based on the pixel's position.

[0126] After association, the mutual information (MI) of each pixel is calculated using the MI formula described above; it represents the inconsistency of the predictions for that pixel. The uncertainty of each target instance is then obtained by averaging the inconsistencies of the pixels it contains.
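The per-pixel scoring and block averaging can be sketched in plain Python. The MI formula referred to above is not part of this excerpt, so the sketch assumes the common form "entropy of the mean prediction minus the mean entropy of the N predictions". It also assumes that any geometric augmentation has already been undone, so that `preds[n][y][x]` in every augmented sample refers to the same original pixel. All function names are hypothetical.

```python
import math


def entropy(p):
    # Shannon entropy (natural log) of one class-probability vector.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def pixel_mi(probs):
    """probs: N class-probability vectors for one pixel, one per augmented sample."""
    n = len(probs)
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    # Assumed MI form: H(mean prediction) - mean of H(each prediction).
    return entropy(mean) - sum(entropy(p) for p in probs) / n


def block_uncertainties(preds, k):
    """preds[n][y][x]: class probabilities of pixel (y, x) in augmented sample n.

    Returns {(block_row, block_col): uncertainty}, where each K x K block is one
    target instance and its uncertainty is the average MI of its pixels.
    Edge blocks smaller than K x K are averaged over the pixels they contain.
    """
    height, width = len(preds[0]), len(preds[0][0])
    result = {}
    for by in range(0, height, k):
        for bx in range(0, width, k):
            values = [
                pixel_mi([pred[y][x] for pred in preds])
                for y in range(by, min(by + k, height))
                for x in range(bx, min(bx + k, width))
            ]
            result[(by // k, bx // k)] = sum(values) / len(values)
    return result
```

A pixel on which all augmented predictions agree scores zero, even if each prediction is itself uncertain; only disagreement across augmentations raises the score, which matches the "inconsistency" reading in [0126].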

[0127] 2.2 Query Model

[0128] Ranked by uncertainty from largest to smallest, the target instances are divided into two groups: target instances that require full manual annotation (the more uncertain group) and target instances that require no annotation (i.e., use pseudo-labels).
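The ranking-based split in [0128] can be sketched as below. The text does not say where the cut between the two groups falls; this sketch assumes a hypothetical annotation budget `num_to_annotate` (a threshold on uncertainty would work equally well), and the function name is illustrative.

```python
from typing import Dict, List, Tuple


def split_by_uncertainty(
    uncertainties: Dict[str, float], num_to_annotate: int
) -> Tuple[List[str], List[str]]:
    """Return (instances for full manual annotation, instances using pseudo-labels)."""
    if num_to_annotate < 0:
        raise ValueError("num_to_annotate must be non-negative")
    # Sort instance ids from largest to smallest uncertainty.
    ranked = sorted(uncertainties, key=uncertainties.get, reverse=True)
    return ranked[:num_to_annotate], ranked[num_to_annotate:]
```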

[0129] When training the query model, both groups of target instances participate in training, and the loss function takes the following form:

[0130]

[0131] where Φ and Ω denote the sets of fully annotated target instances and target instances using pseudo-labels, respectively.

[0132] For example, the losses of target instances using pseudo-labels can be weighted by their uncertainty (MI). The greater the uncertainty, the less reliable the pseudo-label is likely to be, and therefore the lower its weight in training. This reduces the impact of noise in the pseudo-labels on model training and thus improves the model's performance.

[0133] The method provided in this application has been described above. The apparatus provided in this application is described below:

[0134] Figure 3 is a schematic structural diagram of an active learning apparatus provided in an embodiment of this application. As shown in Figure 3, the active learning apparatus may include:

[0135] The enhancement processing unit 310 is configured to augment any sample in the unlabeled sample set using a preset augmentation method to obtain N different augmented samples, where the semantic information of the N different augmented samples is consistent.

[0136] The prediction unit 320 is configured to run the query model on the N different augmented samples to obtain the prediction results of the target instances in each augmented sample.

[0137] The first determining unit 330 is configured to determine, for any target instance in the sample, the uncertainty of the target instance based on its prediction results in each augmented sample.

[0138] The second determining unit 340 is configured to determine, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or needs to be manually labeled.

[0139] In some embodiments, the second determining unit 340 determines whether the target instance uses a pseudo-label or requires manual labeling based on the uncertainty of the target instance, including:

[0140] If the uncertainty of the target instance is greater than the first uncertainty threshold, then the target instance needs to be manually labeled.

[0141] If the uncertainty of the target instance is less than or equal to the first uncertainty threshold, then the target instance is determined to use a pseudo-label.
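The cooperation of units 310 to 340, together with the single-threshold rule of [0140] and [0141], can be sketched as one function. Everything passed in is hypothetical: `augment(sample, i)` stands for the preset augmentation, `query_model` is assumed to return predictions keyed by an instance id that is already consistent across augmented samples (for detection, that association is the clustering step described below), and `uncertainty_fn` stands for the MI-based score.

```python
from typing import Any, Callable, Dict, List


def active_learning_step(
    sample: Any,
    augment: Callable[[Any, int], Any],            # hypothetical augmentation
    query_model: Callable[[Any], Dict[str, Any]],  # hypothetical: {instance_id: prediction}
    uncertainty_fn: Callable[[List[Any]], float],  # hypothetical uncertainty score
    first_threshold: float,
    n: int = 4,
) -> Dict[str, str]:
    if n < 2:
        raise ValueError("N must be at least 2")
    # Unit 310: N different, semantically consistent augmented samples.
    augmented = [augment(sample, i) for i in range(n)]
    # Unit 320: run the query model on every augmented sample.
    predictions = [query_model(a) for a in augmented]
    decisions: Dict[str, str] = {}
    for instance_id in predictions[0]:
        # Unit 330: collect this instance's prediction from each augmented sample.
        per_aug = [p[instance_id] for p in predictions if instance_id in p]
        u = uncertainty_fn(per_aug)
        # Unit 340: manual labeling above the first threshold, pseudo-label otherwise.
        decisions[instance_id] = "manual" if u > first_threshold else "pseudo"
    return decisions
```

In a toy run where instance "a" is predicted differently in each augmented sample and instance "b" identically, "a" is routed to manual labeling and "b" keeps its pseudo-label.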

[0142] In some embodiments, the query model is an object detection model;

[0143] For any target instance in the sample, the first determining unit 330 determines the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample, including:

[0144] Select any detection box from any one of the N different augmented samples and use it as the cluster center. Based on the overlap between the detection boxes in the other augmented samples and this detection box, perform clustering to obtain the cluster corresponding to the cluster center.

[0145] Based on the detection boxes in the cluster, the uncertainty of the target instance corresponding to the cluster center is determined.
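The clustering in [0144] can be sketched with intersection-over-union (IoU) as the overlap measure. The specifics are assumptions, not taken from the text: boxes are `(x1, y1, x2, y2)` tuples already mapped back to the original image coordinates, each other augmented sample contributes at most its single best-overlapping box, and the IoU threshold defaults to a hypothetical 0.5. The uncertainty of [0145] would then be computed from the returned cluster.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    # Intersection rectangle, clipped to zero size when the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_around(
    center: Box, other_samples: Sequence[Sequence[Box]], iou_threshold: float = 0.5
) -> List[Box]:
    """Cluster for one center box: the center plus, from each other augmented
    sample, the box that overlaps the center most (if its IoU is high enough)."""
    cluster = [center]
    for boxes in other_samples:
        best = max(boxes, key=lambda b: iou(center, b), default=None)
        if best is not None and iou(center, best) >= iou_threshold:
            cluster.append(best)
    return cluster
```

An augmented sample that contributes no box to the cluster is itself a sign of inconsistency; how the original method scores such misses is not stated in this excerpt.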

[0146] In some embodiments, the second determining unit 340 determines whether the target instance uses a pseudo-label or requires manual labeling based on the uncertainty of the target instance, and further includes:

[0147] Based on the uncertainty of the target instance, determine the type of the target instance;

[0148] The target instance types include the first type using pseudo-labels, the second type requiring partial annotation, and the third type requiring full manual annotation.

[0149] The uncertainty increases from the first type to the second type to the third type of target instance.

[0150] In some embodiments, the category of a second-type target instance is annotated manually.

[0151] The category loss of the second-type target instance is used for supervised training of the object detection model.

[0152] In some embodiments, the query model is a segmentation model;

[0153] For any target instance in the sample, the first determining unit 330 determines the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample, including:

[0154] A pixel block of a preset size is used as a target instance; for any target instance in the sample, the uncertainty of the target instance is determined based on the prediction results of the pixel blocks at the same position in each augmented sample.

[0155] In some embodiments, the first determining unit 330, for any target instance in the sample, determines the uncertainty of the target instance based on the prediction results of the target instances at the same location in each augmented sample, including:

[0156] For any pixel in the target instance, the uncertainty of the pixel is determined based on the classification results of the pixels at the same position in each augmented sample;

[0157] The uncertainty of the target instance is determined based on the average of the uncertainties of the pixels in the target instance.

[0158] In some embodiments, for a target instance using a pseudo-label, the weight applied to the target instance's loss in supervised training of the query model is determined based on the uncertainty of the target instance, where the weight is negatively correlated with the uncertainty of the target instance.

[0159] This application provides an electronic device including a processor and a memory, wherein the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the active learning method described above.

[0160] Figure 4 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this application. The electronic device may include a processor 401 and a memory 402 storing machine-executable instructions. The processor 401 and the memory 402 can communicate via a system bus 403. By reading and executing the machine-executable instructions in the memory 402 that correspond to the active learning logic, the processor 401 can perform the active learning method described above.

[0161] The memory 402 mentioned in this document can be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, etc. For example, machine-readable storage media can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, storage drives (such as hard disk drives), solid-state drives, any type of storage disk (such as optical discs, DVDs, etc.), or similar storage media, or combinations thereof.

[0162] In some embodiments, a machine-readable storage medium (for example, the memory 402 in Figure 4) stores machine-executable instructions that, when executed by a processor, implement the active learning method described above. For example, the storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, or an optical data storage device.

[0163] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0164] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. An active learning method, characterized by comprising: for any sample in an unlabeled sample set, augmenting the sample using a preset augmentation method to obtain N different augmented samples, wherein the semantic information of the N different augmented samples is consistent and N≥2; using a query model to predict the N different augmented samples to obtain the prediction results of the target instances in each augmented sample; for any target instance in the sample, determining the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample, wherein the uncertainty of the target instance is determined based on the inconsistency of the prediction results of the query model for the same target instance in different augmented samples corresponding to the same sample; the higher this inconsistency, the lower the reliability of the query model's prediction result for the target instance, and the greater the uncertainty of the target instance; and determining, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or requires manual annotation; wherein the query model is an object detection model, and determining, for any target instance in the sample, the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample comprises: selecting any detection box from any one of the N different augmented samples as the cluster center, and clustering based on the overlap between the detection boxes in the other augmented samples and this detection box, to obtain the cluster corresponding to the cluster center;
and determining, based on the detection boxes in the cluster, the uncertainty of the target instance corresponding to the cluster center.

2. The method according to claim 1, characterized in that determining, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or requires manual annotation comprises: if the uncertainty of the target instance is greater than a first uncertainty threshold, determining that the target instance needs to be manually labeled; if the uncertainty of the target instance is less than or equal to the first uncertainty threshold, determining that the target instance uses a pseudo-label.

3. The method according to claim 1, characterized in that determining, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or requires manual annotation further comprises: determining the type of the target instance based on its uncertainty, wherein the target instance types include a first type using pseudo-labels, a second type requiring partial annotation, and a third type requiring full manual annotation, and the uncertainty increases from the first type to the second type to the third type of target instance.

4. The method according to claim 3, characterized in that the category of a second-type target instance is annotated manually, and the category loss of the second-type target instance is used for supervised training of the object detection model.

5. The method according to claim 1, characterized in that, for a target instance using a pseudo-label, the weight applied to the target instance's loss in supervised training of the query model is determined based on the uncertainty of the target instance, wherein the weight is negatively correlated with the uncertainty of the target instance.

6. An active learning device, characterized by comprising: an enhancement processing unit, configured to augment any sample in an unlabeled sample set using a preset augmentation method to obtain N different augmented samples, wherein the semantic information of the N different augmented samples is consistent and N≥2; a prediction unit, configured to use a query model to predict the N different augmented samples and obtain the prediction results of the target instances in each augmented sample; a first determining unit, configured to determine, for any target instance in the sample, the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample, wherein the uncertainty of the target instance is determined based on the inconsistency of the query model's prediction results for the same target instance in different augmented samples corresponding to the same sample; the higher this inconsistency, the lower the reliability of the query model's prediction result for the target instance, and the greater the uncertainty of the target instance; and a second determining unit, configured to determine, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or needs to be manually labeled; wherein the query model is an object detection model, and the first determining unit determining, for any target instance in the sample, the uncertainty of the target instance based on the prediction results of the target instance in each augmented sample comprises: selecting any detection box from any one of the N different augmented samples as the cluster center, and clustering based on the overlap between the detection boxes in the other augmented samples and this detection box, to obtain the cluster corresponding to the cluster center;
and determining, based on the detection boxes in the cluster, the uncertainty of the target instance corresponding to the cluster center.

7. The apparatus according to claim 6, characterized in that the second determining unit determining, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or requires manual labeling comprises: if the uncertainty of the target instance is greater than a first uncertainty threshold, determining that the target instance needs to be manually labeled; if the uncertainty of the target instance is less than or equal to the first uncertainty threshold, determining that the target instance uses a pseudo-label; and/or, the second determining unit determining, based on the uncertainty of the target instance, whether the target instance uses a pseudo-label or requires manual labeling further comprises: determining the type of the target instance based on its uncertainty, wherein the target instance types include a first type using pseudo-labels, a second type requiring partial annotation, and a third type requiring full manual annotation, and the uncertainty increases from the first type to the second type to the third type of target instance; the category of a second-type target instance is annotated manually, and the category loss of the second-type target instance is used for supervised training of the object detection model; and/or, for a target instance using a pseudo-label, the weight applied to the target instance's loss in supervised training of the query model is determined based on the uncertainty of the target instance, wherein the weight is negatively correlated with the uncertainty of the target instance.

8. An electronic device, characterized by comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the method according to any one of claims 1-5.

9. A machine-readable storage medium, characterized in that, The machine-readable storage medium stores machine-executable instructions, which, when executed by a processor, implement the method as described in any one of claims 1-5.