A label query and change method based on active learning

A label query and correction technique in the field of active learning. It addresses labeling processes that cannot be effectively controlled, which lower the quality of training set samples and degrade the learning effect.

Status: Inactive. Publication date: 2019-03-29
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

With the continuous development of crowdsourcing technology, the cost of obtaining labels is gradually reduced. However, due to the inability to effectively control the labeling p...

Examples

Embodiment Construction

[0040] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0041] The technical solution by which the present invention addresses the problems described above is as follows:

[0042] Referring to Figure 1, which is a flow chart of an embodiment of the active-learning-based label noise query and modification system of the present invention, the main implementation process is:

[0043] Step S0: data collection and preprocessing.

[0044] Step S1: exploiting the differences in classification performance among different classifiers, multiple classifiers are trained using R rounds of K-fold cross-validation.
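As a rough illustration of Step S1, the sketch below runs R rounds of K-fold cross-validation and records, for every sample, the out-of-fold prediction of each classifier in each round. The two stand-in classifiers (nearest-centroid and 1-nearest-neighbour), the function names, and the toy data are assumptions made for this example; the patent text shown here does not name the classifiers it uses.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid_fit(X, y):
    """Stand-in 'weak' classifier (assumption): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn_fit(X, y):
    """Stand-in 'strong' classifier (assumption): 1-nearest-neighbour."""
    return lambda x: y[min(range(len(X)), key=lambda k: sq_dist(x, X[k]))]


def repeated_kfold_predictions(X, y, fitters, rounds=3, k=5, seed=0):
    """preds[i] holds one out-of-fold prediction per (round, classifier)."""
    rng = random.Random(seed)
    n = len(X)
    preds = [[] for _ in range(n)]
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)                      # new random split each round
        folds = [order[f::k] for f in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            X_tr = [X[i] for i in train]
            y_tr = [y[i] for i in train]
            for fit in fitters:
                predict = fit(X_tr, y_tr)       # train without the held-out fold
                for i in held_out:
                    preds[i].append(predict(X[i]))
    return preds


# Toy data: two well-separated groups; sample 4 carries a flipped label.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0], [5.3, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
preds = repeated_kfold_predictions(
    X, y, [nearest_centroid_fit, one_nn_fit], rounds=3, k=5)
print(preds[4])  # out-of-fold votes for the mislabeled sample
```

Because each sample is predicted only by models that never saw it during training, consistent disagreement between these predictions and the given label is the kind of signal the later steps can aggregate.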

[0045] Step S2: calculate the confidence matrix. To separate difficult samples and noisy samples from the normal data, we use the confidence matrix ...

Abstract

The invention relates to a label query and change method based on active learning, belonging to the field of active learning. The method comprises the following steps. First, multiple classification models are trained using 10-fold cross-validation with strong and weak classification algorithms. Next, the residual matrix and the confidence matrix are calculated, and the two matrices are concatenated horizontally into a single feature matrix. The K-Means algorithm is then used to cluster the feature matrix, and four types of clusters (A, B, C and D) are obtained according to a threshold. Finally, for the suspected noise clusters (types A and C), some samples are relabeled through active learning, and the final noise clusters are determined from the labeling results. The labels of samples in the noise clusters are corrected, and the modified A and C samples are merged with the B and D samples to form the final training sample set. The invention screens out label noise effectively, reduces the cost of manual labeling, and achieves higher classification accuracy at a very small active learning cost.
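The clustering and query steps of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the residual and confidence values are made-up toy numbers, the K-Means uses a deterministic farthest-point initialisation for reproducibility, the suspected cluster is picked by hand because the threshold rule that yields types A to D is not given in this excerpt, and `confirm_noise_cluster` with its majority-disagreement rule is a hypothetical helper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hconcat(A, B):
    """Horizontally concatenate two row-aligned matrices (lists of rows)."""
    return [a + b for a, b in zip(A, B)]


def kmeans(rows, k, iters=100):
    """Plain K-Means; farthest-point initialisation is an assumption."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(r, centers[c]))
                      for r in rows]
        if new_assign == assign:        # converged: assignments unchanged
            break
        assign = new_assign
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assign) if a == c]
            if members:                 # keep old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def confirm_noise_cluster(members, given, oracle, n_query, threshold=0.5, seed=0):
    """Hypothetical rule: query a few members and call the cluster noisy
    if the share of oracle labels that disagree exceeds the threshold."""
    rng = random.Random(seed)
    queried = rng.sample(members, min(n_query, len(members)))
    disagree = sum(oracle(i) != given[i] for i in queried)
    return disagree / len(queried) > threshold


# Toy matrices (assumed values): one column each, six samples.
residual = [[0.1], [0.2], [0.1], [0.9], [0.8], [0.9]]
confidence = [[0.9], [0.8], [0.9], [0.1], [0.2], [0.1]]
features = hconcat(residual, confidence)
assign = kmeans(features, k=2)

given = [0, 0, 0, 1, 1, 1]              # labels as collected
truth = [0, 0, 0, 0, 0, 0]              # what a human annotator would say
suspected = [1]                         # chosen by hand for this example
for c in suspected:
    members = [i for i, a in enumerate(assign) if a == c]
    noisy = confirm_noise_cluster(members, given, lambda i: truth[i], n_query=2)
    print(c, members, noisy)
```

Querying only a few samples per suspected cluster is what keeps the annotation cost low: the decision is made once per cluster rather than once per sample.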

Description

Technical field

[0001] The invention belongs to the field of active learning and relates to a label noise query system based on active learning.

Background technique

[0002] In traditional supervised learning, the learning algorithm trains the model using labeled samples as the training set. Generally speaking, the more samples in the training set and the higher their quality, the stronger the classification and generalization ability of the resulting model. In real tasks, however, a large number of samples are unlabeled, and labeled samples are relatively difficult to obtain. Labeling usually requires manual work by domain experts and therefore incurs high labor costs. The emergence of the crowdsourcing model has solved the problem of data labeling to a certain extent. Crowdsourcing platforms such as Amazon Mechanical Turk allow task publishers to publish various tasks (such as data annotation, filling out questionnaires, etc.) on the platform, and attract users from all o...

Application Information

IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/217, G06F18/214
Inventors: 袁龙, 李智星, 于洪
Owner CHONGQING UNIV OF POSTS & TELECOMM