Label noise detection method based on multi-granularity relative density

A relative density and label noise detection technology, applied in the fields of instruments, character and pattern recognition, and computer components. It addresses the problems that existing methods do not make full use of the contrastive characteristics of label noise, that information quality is poor, and that little information is available. Its effects are a reduced risk of overfitting, lower time overhead, and good generalization ability.

Status: Inactive
Publication Date: 2020-05-19
CHONGQING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

Generally speaking, label noise may be more harmful than attribute noise. First, a sample may have multiple features, but it has only one label.
Second, while each feature has its own importance, labels always have a greater impact on learning.
In the medical field, test results are often unknown or incomplete, and the information described in medical language may be too limited to be of much use. This incomplete information may also lead to label noise.
In addition, in some cases the quality of the information is poor or its accuracy is uncertain. For example, a patient's answers during an illness may be inaccurate or incorrect, and a patient who is asked the same question repeatedly may give different answers, which can easily lead to label noise.
Second, human labeling itself may be wrong.
Furthermore, since collecting reliable labels is a time-consuming and expensive task, it is common to obtain labels from experienced professionals, but labels based on the experience of experts are less reliable.
Third, when labeling is an individual, subjective act, for example in medical image data analysis, some experts may change a label according to the actual situation, which may also introduce label noise.
Fourth, label noise can also simply come from data encoding or communication issues.
However, these anomaly detection methods are unsupervised methods that cannot take full advantage of the contrastive properties of label noise across different categories, and they handle complex data such as high-dimensional and imbalanced data poorly.

Examples

Embodiment Construction

[0029] The present invention is a label noise detection method based on multi-granularity relative density. How to deal with label noise has become a research hotspot in machine learning and is relevant to many practical application areas. Current mainstream approaches fall into two groups: noise accommodation and noise filtering. When the training data is polluted by label noise, an obvious solution is to clean the training data itself, in a manner similar to outlier or anomaly detection. This approach belongs to noise filtering.

[0030] In S1, the data set is divided into K clusters, and the improved relative density of each sample is calculated at that granularity. The improved relative density is defined as follows: first, the centroids of the positive samples and of the negative samples are calculated separately; then the distances from each sample to the centroid of its own class and to the centroid of the other class are calculated, and the ratio of these distances is used as the improved relative density a...
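
Written as a formula, the ratio described in S1 might look like the following. This is only a sketch: the symbols (C_k for the cluster containing sample x_i, C_k^y for the samples in that cluster carrying label y, c_k^y for their centroid, and the bar for the opposite label) are notation introduced here and do not appear in the patent text. Computing the centroids per cluster and using the Euclidean distance are assumptions. The orientation of the ratio (own-class distance in the numerator) is also an assumption, chosen so that larger values point to likely noise, which matches the thresholding step S3 in the abstract.

```latex
\rho_K(x_i) \;=\;
\frac{\lVert x_i - c_k^{\,y_i} \rVert_2}
     {\lVert x_i - c_k^{\,\bar{y}_i} \rVert_2},
\qquad
c_k^{\,y} \;=\; \frac{1}{\lvert C_k^{\,y} \rvert} \sum_{x_j \in C_k^{\,y}} x_j ,
\qquad x_i \in C_k .
```

Under this reading, a correctly labeled sample sits closer to its own class centroid and has a ratio below 1, while a mislabeled sample sits closer to the other class centroid and has a ratio above 1.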

Abstract

The invention discloses a label noise detection method based on multi-granularity relative density, and belongs to the field of data classification. The method comprises the following steps: S1, dividing the data set into K clusters using the KMeans algorithm, and calculating the improved relative density of each sample at that granularity, where the improved relative density is defined as follows: first, the centroids of the positive samples and of the negative samples are calculated separately; then the distances from each sample to the centroid of its own class and to the centroid of the other class are computed, and the ratio of these distances is taken as the improved relative density at that granularity; S2, changing the value of K, repeating step S1, and calculating the improved relative density of each sample at different granularities; and S3, treating a sample whose improved relative density exceeds a certain threshold as label noise. The method introduces granular computing into the improved relative density model and is more efficient than traditional methods.
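
Steps S1 to S3 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the 0/1 label encoding, the per-cluster centroids, the Euclidean distance, the orientation of the ratio, the handling of single-class clusters, and in particular the averaging of the ratios across granularities are all choices made here, since the text above does not specify them. It relies on NumPy and scikit-learn's `KMeans`.

```python
import numpy as np
from sklearn.cluster import KMeans


def improved_relative_density(X, y, n_clusters, random_state=0):
    """S1 (sketch): ratio of own-class to other-class centroid distance.

    Assumes y holds 0/1 labels and that centroids are computed inside each
    KMeans cluster. Samples in a cluster that contains only one class get NaN,
    because no contrast between classes is available there.
    """
    cluster_ids = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(X)
    density = np.full(len(X), np.nan)
    for k in range(n_clusters):
        in_cluster = cluster_ids == k
        pos = in_cluster & (y == 1)
        neg = in_cluster & (y == 0)
        if not pos.any() or not neg.any():
            continue
        c_pos = X[pos].mean(axis=0)
        c_neg = X[neg].mean(axis=0)
        d_pos = np.linalg.norm(X[in_cluster] - c_pos, axis=1)
        d_neg = np.linalg.norm(X[in_cluster] - c_neg, axis=1)
        is_pos = y[in_cluster] == 1
        d_same = np.where(is_pos, d_pos, d_neg)
        d_other = np.where(is_pos, d_neg, d_pos)
        # Small constant avoids division by zero (assumption, not in the source).
        density[in_cluster] = d_same / (d_other + 1e-12)
    return density


def detect_label_noise(X, y, k_values=(1, 2, 3), threshold=1.0):
    """S2 + S3 (sketch): repeat S1 for several K, then threshold.

    Averaging the valid ratios over the granularities is an assumption;
    the source only says samples above "a certain threshold" are noise.
    """
    per_k = np.vstack([improved_relative_density(X, y, k) for k in k_values])
    valid = ~np.isnan(per_k)
    counts = valid.sum(axis=0)
    sums = np.where(valid, per_k, 0.0).sum(axis=0)
    # Samples with no valid ratio at any granularity get score 0 (not flagged).
    score = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return score > threshold, score
```

Calling `detect_label_noise(X, y)` with a feature matrix `X` of shape `(n_samples, n_features)` and a 0/1 label vector `y` returns a boolean mask of suspected noisy samples together with the per-sample score. The threshold of 1.0 is a placeholder: under the assumed orientation it marks samples that lie closer to the other class's centroid than to their own.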

Description

Technical field

[0001] The invention relates to a label noise detection method based on multi-granularity relative density, which belongs to the field of data classification.

Background

[0002] Real-world data is always flawed, and the appearance of noisy data is the result of this flaw. Noise processing is an important task in machine learning. In classification problems, noise is mainly divided into two categories: attribute noise and label noise. Attribute noise is caused by errors in the process of inputting attributes, while label noise is caused by label pollution. In general, label noise may be more harmful than attribute noise. First, a sample may have multiple features, yet only one label exists. Second, while each feature has its unique importance, labels always have a greater impact on learning. The performance of the classifier is degraded by the presence of label noise, and the complexity of the model is also increased. In addition, there is also...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/24133
Inventor: 夏书银, 梁潇, 刘群, 王炳贵, 陈百云
Owner: CHONGQING UNIV OF POSTS & TELECOMM