Method, system and device for improving quality of classified learning data set and storage medium

A data set quality improvement technology, applied in the field of image classification, that addresses the problems of reduced data set size, degraded performance of the trained classifier, and increased data processing cost, with the effects of reducing the error level of the data set, improving generalization performance, and reducing the error rate.

Pending Publication Date: 2022-01-11
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0002] Neural networks have made great progress in image processing tasks such as image classification and recognition and object detection, thanks to their ability to discover complex structures in high-dimensional data. Solving these complex problems usually requires a large amount of labeled data. However, acquiring accurate and reliable large-scale labeled data sets is often very expensive and time-consuming. In recent years, crowdsourcing has gradually become the main way to obtain large-scale labeled data sets: data samples are distributed to a large number of non-professional annotators for labeling. However, annotators differ in ability and preference, so the completed labels contain errors. For example, annotating some biomedical images often requires professional knowledge, and non-professional annotators are very likely to mislabel such samples. The result is a data set with wrong labels, and the presence of wrong labels degrades the performance of the trained classifier.
[0003] Existing techniques exploit the memorization behavior of deep networks to screen out and remove erroneous data from the data set and thereby improve its quality. However, determining the error level is a challenge: experts are often asked to label a small part of the data set in order to estimate the error level of the entire data set, and involving experts increases the cost of data processing. In addition, removing the screened-out mislabeled data reduces the size of the data set, so that sufficient network training can no longer be guaranteed.

Examples

Embodiment 1

[0040] As shown in Figure 1, the present invention provides a method for improving the quality of a classification learning data set, comprising the following steps:

[0041] S1. Using a pre-designed update method to update the data set;

[0042] S2. In response to detecting that the proportion of clean labels in the data set does not increase, output the data set;

[0043] S3. In response to detecting that the proportion of clean labels in the data set increases, update the data set again using a pre-designed update method;
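Steps S1 to S3 describe an iterate-until-no-improvement loop. The following is a minimal Python sketch of that control flow only. The names `improve_dataset`, `update`, `clean_ratio` and the `max_rounds` safety cap are hypothetical and do not appear in the source, and the source does not state in this excerpt how the proportion of clean labels is detected, so the estimator is passed in as a function.

```python
from typing import Callable, List, Tuple

# (sample id, label) pairs; a stand-in for real image data (assumption).
Dataset = List[Tuple[str, int]]


def improve_dataset(
    dataset: Dataset,
    update: Callable[[Dataset], Dataset],
    clean_ratio: Callable[[Dataset], float],
    max_rounds: int = 100,
) -> Dataset:
    """Repeat the update until the clean-label ratio stops increasing."""
    best_ratio = clean_ratio(dataset)
    for _ in range(max_rounds):  # safety cap, not part of the source
        candidate = update(dataset)      # S1 / S3: apply the update method
        ratio = clean_ratio(candidate)
        if ratio <= best_ratio:          # S2: no increase, output the data set
            return candidate
        dataset, best_ratio = candidate, ratio
    return dataset
```

In practice `update` would wrap the pre-designed update method and `clean_ratio` would be some estimate of label cleanliness; both are left abstract here because the excerpt does not specify them.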

[0044] The pre-designed update method includes:

[0045] Obtain the error transition probability matrix of the label through the network output of the anchor sample;

[0046] Obtain the error rate and weight of the label according to the error transition probability matrix of the label, and obtain the weighted average error rate of the data set according to the error rate and weight of the label;

[0047] The data samples are sorted according...
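The first two steps of the update method ([0045] and [0046]) can be illustrated with a small Python sketch. The excerpt does not give the formulas, so everything below is an assumption: the anchor sample of class i is taken to be the sample with the highest predicted probability for class i, row i of the transition matrix is the network output on that anchor, the error rate of label i is one minus the diagonal entry, and the weight of label i is the fraction of samples carrying that label. All function names are hypothetical.

```python
from typing import List, Sequence


def estimate_transition_matrix(probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row i is the network output on the assumed anchor sample of class i."""
    num_classes = len(probs[0])
    matrix = []
    for i in range(num_classes):
        # Assumed anchor: the sample the network is most confident is class i.
        anchor = max(probs, key=lambda p: p[i])
        matrix.append(list(anchor))
    return matrix


def label_error_rates(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Assumed error rate of label i: probability mass off the diagonal of row i."""
    return [1.0 - matrix[i][i] for i in range(len(matrix))]


def label_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed weight of label c: fraction of samples currently labeled c."""
    return [list(labels).count(c) / len(labels) for c in range(num_classes)]


def weighted_average_error_rate(
    error_rates: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted average of the per-label error rates."""
    return sum(e * w for e, w in zip(error_rates, weights))
```

Here `probs` is a list of per-sample class probability vectors produced by a network trained on the noisy labels, and `labels` is the list of current (possibly noisy) integer labels.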

Embodiment 2

[0061] An embodiment of the present invention also provides a system for improving the quality of a classification learning data set, including:

[0062] Update module: used to update the dataset using a pre-designed update method;

[0063] Output module: used to output the data set in response to detecting that the proportion of clean labels in the data set does not increase;

[0064] Re-update module: used to update the data set again by using a pre-designed update method in response to detecting an increase in the proportion of clean labels in the data set.

Embodiment 3

[0066] An embodiment of the present invention also provides a device for improving the quality of a classification learning data set, including a processor and a storage medium;

[0067] The storage medium is used to store instructions;

[0068] The processor operates according to the instructions to perform the steps of the following method:

[0069] S1. Using a pre-designed update method to update the data set;

[0070] S2. In response to detecting that the proportion of clean labels in the data set does not increase, output the data set;

[0071] S3. In response to detecting that the proportion of clean labels in the data set increases, update the data set again using a pre-designed update method;

[0072] The pre-designed update method includes:

[0073] Obtain the error transition probability matrix of the label through the network output of the anchor sample;

[0074] Obtain the error rate and weight of the label according to the err...


Abstract

The invention discloses a method, system and device for improving the quality of a classified learning data set and a storage medium, and belongs to the technical field of image classification. The method comprises the following steps: updating a data set by utilizing a pre-designed updating method; outputting the data set in response to the detection that the proportion of the clean labels in the data set is not increased; in response to the detection that the proportion of the clean labels in the data set is increased, updating the data set by using the pre-designed updating method again, wherein the pre-designed updating method comprises the following steps: acquiring an error transition probability matrix of a label through network output of an anchor point sample; according to the error transition probability matrix of the label, obtaining an error rate and a weight of the label, and according to the error rate and the weight of the label, obtaining a weighted average error rate of the data set; sorting the data samples according to the probability of label labeling errors, screening out error label samples in combination with the weighted average error rate of the data set, correcting the labels of the error label samples by using the error transition probability matrix of the labels, and updating the data set.
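The final step in the abstract (sort by probability of labeling error, screen out suspected samples using the weighted average error rate, correct their labels with the transition matrix) can be sketched as follows. The abstract does not state the exact scoring or correction rule, so this sketch makes three labeled assumptions: the error score of a sample is one minus the network's probability for its current label, the number of suspects is the weighted average error rate times the data set size, and a suspect with current label y is relabeled to the other class i that maximizes `probs[k][i] * matrix[i][y]`. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def screen_and_correct(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    matrix: Sequence[Sequence[float]],
    avg_error_rate: float,
) -> Tuple[List[int], List[int]]:
    """Return (corrected labels, indices of samples treated as mislabeled).

    Assumes at least two classes; matrix[i][j] is read as the probability
    that a sample of true class i carries label j.
    """
    n = len(labels)
    num_classes = len(matrix)
    # Assumed error score: probability mass NOT placed on the current label.
    scores = [1.0 - probs[k][labels[k]] for k in range(n)]
    # Most suspicious samples first.
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    # Assumed cut-off: the weighted average error rate sets how many to flag.
    num_suspect = int(round(avg_error_rate * n))
    suspects = order[:num_suspect]

    corrected = list(labels)
    for k in suspects:
        y = labels[k]
        candidates = [i for i in range(num_classes) if i != y]
        # Assumed rule: favor classes the network supports for this sample
        # and that the matrix says are often mislabeled as y.
        corrected[k] = max(candidates, key=lambda i: probs[k][i] * matrix[i][y])
    return corrected, suspects
```

Relabeling the flagged samples instead of discarding them keeps the data set size unchanged, which matches the stated goal of avoiding the loss of training data.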

Description

Technical field

[0001] The invention relates to a method, system, device and storage medium for improving the quality of a classification learning data set, and belongs to the technical field of image classification.

Background

[0002] Neural networks have made great progress in image processing tasks such as image classification and recognition and object detection, thanks to their ability to discover complex structures in high-dimensional data. Solving these complex problems usually requires a large amount of labeled data. However, acquiring accurate and reliable large-scale labeled data sets is often very expensive and time-consuming. In recent years, crowdsourcing has gradually become the main way to obtain large-scale labeled data sets: data samples are distributed to a large number of non-professional annotators for labeling. However, annotators differ in ability and preference, and there will be errors in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/16
CPC: G06F17/16, G06F18/214, G06F18/24
Inventors: 王玉峰, 王学刚
Owner: NANJING UNIV OF POSTS & TELECOMM