Active learning classification method based on uncertainty and similarity measurement

The method applies uncertainty and similarity measurement to active learning classification. It addresses the failure of existing methods to account for overlapping information between samples, which causes information redundancy and raises labeling costs. Its effects are a smaller number of information-redundant samples and a lower data labeling cost, while the training effect is preserved.

Pending Publication Date: 2021-11-02
SOUTHWEST PETROLEUM UNIV

AI Technical Summary

Problems solved by technology

However, the existing active learning methods do not take into account the information overlap problem of the sample for the classifier, that is, the inf...


Embodiment Construction

[0038] The present invention is further described below in conjunction with the accompanying drawings and embodiments. Where no conflict arises, the embodiments in the present application and the technical features in those embodiments can be combined with each other. Unless otherwise specified, all technical and scientific terms used in this application have the meaning commonly understood by those of ordinary skill in the art to which this application belongs. In this disclosure, "comprises", "includes" and similar words mean that the elements or objects appearing before the word cover the elements or objects listed after the word and their equivalents, without excluding other elements or objects.

[0039] As shown in Figure 1, the present invention provides an active learning classification method based on uncertainty and similarity measure, comprising the fol...

Abstract

The invention discloses an active learning classification method based on uncertainty and similarity measurement. The method comprises the following steps:

S1: preprocess and vectorize the unlabeled classification data.
S2: cluster the data, select the most representative samples in each cluster, and label them manually; record the labeled samples as data set L and the remaining samples as set U.
S3: calculate a similarity metric value for each sample in U.
S4: use L to train a plurality of different machine learning models, and obtain the accuracy and the output value of each model.
S5: determine a weight value and an uncertainty degree for each model, and from them determine an uncertainty decision value.
S6: determine the diversified training sample with the maximum value, label it, add it to data set L, and remove it from set U to obtain an updated set U.
S7: repeat steps S3 to S6 until the accuracy of each model no longer changes, and obtain the final labeled data set L.

The method reduces the number of information-redundant samples and lowers the data labeling cost while preserving the training effect.
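The scoring in steps S3, S5 and S6 can be illustrated with a small sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: cosine similarity to the nearest labeled sample stands in for the similarity metric, accuracy-normalized weights and Shannon entropy stand in for the model weights and uncertainty degree, and the final score multiplies the uncertainty decision value by one minus the similarity. All function names and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def similarity_to_labeled(x, labeled):
    """S3 (assumed form): highest cosine similarity between x and any sample in L."""
    return max(cosine(x, l) for l in labeled)

def uncertainty_decision_value(model_probs, accuracies):
    """S5 (assumed form): accuracy-weighted sum of per-model entropies."""
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    return sum(w * entropy(p) for w, p in zip(weights, model_probs))

def select_sample(unlabeled, labeled, probs_per_sample, accuracies):
    """S6 (assumed form): index in U maximizing uncertainty * (1 - similarity)."""
    scores = [
        uncertainty_decision_value(probs_per_sample[i], accuracies)
        * (1.0 - similarity_to_labeled(x, labeled))
        for i, x in enumerate(unlabeled)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data (hypothetical): one labeled vector, three unlabeled vectors,
# and class probabilities from two models with accuracies 0.8 and 0.6.
labeled = [[1.0, 0.0]]
unlabeled = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs_per_sample = [
    [[0.5, 0.5], [0.5, 0.5]],  # very uncertain, but duplicates a labeled sample
    [[0.9, 0.1], [0.8, 0.2]],  # fairly confident, unlike anything in L
    [[0.5, 0.5], [0.6, 0.4]],  # uncertain, partly similar to L
]
accuracies = [0.8, 0.6]

best, scores = select_sample(unlabeled, labeled, probs_per_sample, accuracies)
print(best, [round(s, 3) for s in scores])
```

In this toy run the first unlabeled sample scores zero even though both models are maximally uncertain about it, because it duplicates a sample already in L. That is the redundancy the similarity term is meant to suppress. How the patent actually combines the two quantities is not stated in the abstract.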

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to an active learning classification method based on uncertainty and similarity measure.

Background

[0002] Analyzing data in a specific domain can help domain experts mine and discover useful domain knowledge. In much data-driven modeling work, labeled classification data in the domain is scarce, annotation is expensive, and annotators need strong domain knowledge, all of which greatly limits the breadth of domain exploration. Active learning is currently considered a very effective solution to these problems.

[0003] Active learning is a training data screening method for machine learning that automatically finds diverse data to label. Compared with fully manual selection, it takes only a fraction of the time to build a better data set and to complete data-driven modeling work efficiently. Active learn...

Application Information

IPC(8): G06F16/35, G06F40/194, G06F40/279, G06N20/00
CPC: G06F16/35, G06F40/194, G06F40/279, G06N20/00, Y02D10/00
Inventor: 刘智, 杨雅茹, 曾文丽, 张荣华, 杨根
Owner SOUTHWEST PETROLEUM UNIV