Noise label correction method

A label noise correction technology, applied in the field of data mining, that can address problems such as corrected loss estimators being non-convex functions

Active Publication Date: 2019-10-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Natarajan proposed two methods to modify the loss function. The first method constructs an unbiased estimator of the correct distribution from the noise distribution, but the estimator may still be a non-convex function even if the original loss function is a convex function.

Embodiment Construction

[0049] With reference to Figure 1, the method uses a base classifier to classify the observed samples and estimate the noise rate in order to identify noisy label data. The process is as follows:

[0050] Step 1: with reference to Figure 2, use the base classifier to classify the observed samples, estimate the noise rate, and identify the noisy label data. The process is as follows:

[0051] Step 1.1: the base classifier clf is fitted on the samples with clf.fit(X, s) and yields the predicted probability g(x) = P(s=1|x) for each sample. Any existing classification algorithm can serve as the base classifier, as long as it can produce predicted probabilities for the samples.
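Step 1.1 can be illustrated with a minimal sketch. The class TinyLogisticRegression, its hyperparameters, and the toy data are assumptions made for illustration and do not come from the patent; any classifier exposing fit and a probability output could take its place. For simplicity, predict_proba here returns only P(s=1|x) as a flat list.

```python
import math


class TinyLogisticRegression:
    """Hypothetical stand-in base classifier (not from the patent)."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr = lr          # assumed learning rate
        self.epochs = epochs  # assumed number of gradient steps
        self.w = []
        self.b = 0.0

    @staticmethod
    def _sigmoid(z):
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def _score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X, s):
        # Batch gradient descent on the logistic loss, using observed labels s.
        n, d = len(X), len(X[0])
        self.w = [0.0] * d
        self.b = 0.0
        for _ in range(self.epochs):
            grad_w = [0.0] * d
            grad_b = 0.0
            for x, label in zip(X, s):
                err = self._sigmoid(self._score(x)) - label
                for j in range(d):
                    grad_w[j] += err * x[j]
                grad_b += err
            self.w = [wi - self.lr * gw / n for wi, gw in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, X):
        # g(x) = P(s=1 | x) for each sample.
        return [self._sigmoid(self._score(x)) for x in X]


# Toy data: one feature per sample; s holds observed, possibly noisy labels.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
s = [0, 0, 0, 1, 0, 1, 1, 1]

clf = TinyLogisticRegression().fit(X, s)
g = clf.predict_proba(X)
```

The list g then holds one predicted probability per sample, which is the quantity the later steps work with.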

[0052] The noise rate ρ_1 = P(s=0|y=1) is the probability that a sample whose true label is 1 is mislabeled as 0, that is, the proportion of samples with observed label 0 among the samples whose correct label is 1. The following variables represent the number of samples in each case: Indicates the sample whose obse...
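The definition of ρ_1 can be checked on a toy data set where the true labels happen to be known. This is only a sketch of the definition: in practice y is unobserved, and the patent's own estimation procedure is truncated in this text, so it is not reproduced here. The function name noise_rate_rho1 and the data are hypothetical.

```python
def noise_rate_rho1(y_true, s_observed):
    """Fraction of samples with true label 1 whose observed label is 0."""
    observed_for_positives = [s for y, s in zip(y_true, s_observed) if y == 1]
    if not observed_for_positives:
        raise ValueError("no samples with true label 1")
    flipped = sum(1 for s in observed_for_positives if s == 0)
    return flipped / len(observed_for_positives)


# Toy example (assumed data): four true positives, two of them mislabeled as 0.
y = [1, 1, 1, 1, 0, 0]
s = [1, 0, 1, 0, 0, 1]
rho1 = noise_rate_rho1(y, s)
```

Here two of the four samples with y = 1 carry the observed label 0, so rho1 evaluates to 0.5.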

Abstract

The invention provides a noise label relabeling method comprising the following steps: step 1, classifying the observed samples with a base classifier, estimating the noise rate, and identifying the noisy label data; and step 2, relabeling the noisy label samples with a base classifier to obtain a corrected, clean sample data set.
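The two steps can be sketched roughly as follows. The patent's actual identification rule is not visible in this text, so the sketch substitutes a simple assumed rule: a sample is flagged as noisy when the predicted probability g(x) = P(s=1|x) confidently contradicts its observed label, and flagged samples are flipped. The function relabel_noisy and the thresholds lower and upper are hypothetical.

```python
def relabel_noisy(s_observed, g, lower=0.2, upper=0.8):
    """Flip observed labels that the predicted probabilities confidently contradict.

    lower and upper are assumed thresholds, not values from the patent.
    Returns the corrected labels and the indices of the flagged samples.
    """
    corrected = []
    flagged = []
    for i, (s, p) in enumerate(zip(s_observed, g)):
        if s == 1 and p < lower:
            # Observed 1, but the classifier is confident the label is 0.
            flagged.append(i)
            corrected.append(0)
        elif s == 0 and p > upper:
            # Observed 0, but the classifier is confident the label is 1.
            flagged.append(i)
            corrected.append(1)
        else:
            corrected.append(s)
    return corrected, flagged


# Toy inputs (assumed): observed labels and predicted probabilities P(s=1|x).
s = [1, 1, 0, 0, 1, 0]
g = [0.95, 0.10, 0.05, 0.90, 0.60, 0.40]
corrected, flagged = relabel_noisy(s, g)
```

With these inputs, samples 1 and 3 are flagged and flipped, while the uncertain samples 4 and 5 keep their observed labels.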

Description

Technical field

[0001] The invention relates to data mining technology, in particular to a noise label correction method.

Background

[0002] Traditional supervised learning classification problems usually assume that the labels of the data set are complete, that is, every sample in the data set has a correct, noise-free label. In the real world, however, the randomness of the labeling process means that sample labels are easily corrupted by noise, which makes them inaccurate. How noisy data arises usually depends on how the data set was acquired. For example, when the original data is labeled, the information about a sample given to the labeler may be insufficient, causing the labeler to misclassify the sample; or the classification itself may be subjective; or the labeler may lack the professional knowledge needed to ensure correct classification. Various popular data labeling platforms are also one of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2415, G06F18/214
Inventor: 徐建, 余孟池, 张静
Owner: NANJING UNIV OF SCI & TECH