Multi-label classifier constructing method based on cost-sensitive active learning

Active learning and cost-sensitive techniques are applied to the field of multi-label classification. The method addresses the high cost of labeling samples, the growing number of iterations, and the resulting loss of learning efficiency, and thereby improves efficiency, reduces labeling cost, and improves robustness.

Active Publication Date: 2014-11-26
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

However, because each unlabeled sample may involve multiple labels, the above method makes labeling samples expensive. The inventors also found that different labels of a sample influence classifier performance to different degrees, so adding samples with the above method often fails to improve the classifier effectively, which increases the number of iterations and reduces learning efficiency.



Examples


Embodiment 1

[0037] Embodiment 1: A method for constructing a multi-label classifier based on cost-sensitive active learning, including the following:

[0038] This example uses the Diagnosis data set, which has 258 samples and 3 labels: Cold, LungCancer, and Cough. 30 samples, each with 3 labels (that is, 90 sample-label pairs), form the labeled set L; 158 samples form the unlabeled set U; and the remaining 70 samples form the test set. 3 sample-label pairs are selected in each iteration.

[0039] The misclassification cost of each label is set according to prior knowledge, as shown in the following table:

[0040]
        Cold    LungCancer    Cough
C11     0       0             0
C10     5       50            7
C01     1       1             1
C00     0       0             0
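The cost table can be turned into a per-pair risk score. The following is a minimal sketch, not the patent's own formula (the relevant steps are cut off in this text). It assumes that C10 is the cost of predicting 0 when the true value is 1 (a missed positive) and C01 the cost of predicting 1 when the true value is 0, which matches LungCancer carrying the largest C10. The function names `expected_cost` and `pair_risk` and the 0.5 threshold are hypothetical.

```python
# Cost table from the embodiment. Assumption: COSTS[label][(t, p)] is the cost
# of predicting p when the true value is t, so (1, 0) is C10 and (0, 1) is C01.
COSTS = {
    "Cold":       {(1, 1): 0, (1, 0): 5,  (0, 1): 1, (0, 0): 0},
    "LungCancer": {(1, 1): 0, (1, 0): 50, (0, 1): 1, (0, 0): 0},
    "Cough":      {(1, 1): 0, (1, 0): 7,  (0, 1): 1, (0, 0): 0},
}


def expected_cost(label, p_positive, predicted):
    """Expected cost of emitting `predicted` when P(true value = 1) = p_positive."""
    c = COSTS[label]
    return p_positive * c[(1, predicted)] + (1.0 - p_positive) * c[(0, predicted)]


def pair_risk(label, p_positive, threshold=0.5):
    """Risk of one sample-label pair under a plain thresholded prediction."""
    predicted = 1 if p_positive >= threshold else 0
    return expected_cost(label, p_positive, predicted)


# The same confidence of 0.4 is far riskier for LungCancer than for Cold.
risk_cancer = pair_risk("LungCancer", 0.4)
risk_cold = pair_risk("Cold", 0.4)
```

Under these assumptions, a pair whose label has a high C10 and whose predicted probability sits just below the threshold receives the highest risk, which is the kind of pair a cost-sensitive selector would query first.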

[0041] In this embodiment, BRkNN is used as the base classifier. An initial classifier model is trained on the labeled set L and serves as the current classifier.
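For readers unfamiliar with BRkNN, the sketch below illustrates the general idea: a single nearest-neighbour search is shared by all labels, and each label's confidence is the fraction of neighbours that carry it. The text does not give the neighbour count or the distance measure, so k = 3, Euclidean distance, the toy data, and the function names are all assumptions made for illustration.

```python
import math


def brknn_confidences(train_x, train_y, query, k=3):
    """Return, per label, the fraction of the k nearest neighbours carrying it.

    train_x: list of feature vectors; train_y: list of 0/1 label vectors.
    """
    # One neighbour search is shared by every label.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbours = order[:k]
    n_labels = len(train_y[0])
    return [sum(train_y[i][j] for i in neighbours) / len(neighbours)
            for j in range(n_labels)]


def brknn_predict(train_x, train_y, query, k=3):
    """Threshold each label confidence at 0.5 to get a 0/1 label vector."""
    return [1 if c >= 0.5 else 0
            for c in brknn_confidences(train_x, train_y, query, k)]


# Hypothetical toy data: 2 features, 3 labels (Cold, LungCancer, Cough).
X = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
Y = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1]]
conf = brknn_confidences(X, Y, [0.05, 0.05], k=3)
pred = brknn_predict(X, Y, [0.05, 0.05], k=3)
```

The per-label confidences can serve as the probability estimates needed to compute an expected misclassification cost for each sample-label pair.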

[0042] (1) Use the current classifier model to predict and...

Embodiment 2

[0056] Embodiment 2: As shown in Figure 1 and Figure 3, a method for constructing a multi-label classifier based on cost-sensitive active learning includes the following:

[0057] This embodiment uses the flags data set, which has 7 labels and 194 samples in total; 135 samples form the pool and 59 samples are used for testing. 210 sample-label pairs are randomly selected to train the initial classifier, and 35 sample-label pairs are selected in each iteration.

[0058] In this embodiment, BRkNN is used as the base algorithm to construct the initial classifier, which is trained using the sample pool to obtain the current classifier.

[0059] Use the current classifier to classify the test samples and obtain the predicted label values, calculate the expected misclassification cost of each sample-label pair, select the 35 highest-risk sample-label pairs for labeling, add them to the training set, retrain the classifier, and obtain an updated c...
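The selection step described above amounts to a top-k query over risk scores. A minimal sketch follows, assuming the expected misclassification cost of every candidate pair has already been computed; the function name `select_riskiest_pairs` and the toy scores are hypothetical, and the toy batch size is 3 rather than the 35 used in this embodiment.

```python
import heapq


def select_riskiest_pairs(risks, batch_size):
    """Pick the batch_size (sample, label) pairs with the largest risk.

    risks: dict mapping (sample_id, label) -> expected misclassification cost.
    """
    return heapq.nlargest(batch_size, risks, key=risks.get)


# Hypothetical risk scores for two candidate samples.
risks = {
    ("s1", "Cold"): 2.0, ("s1", "LungCancer"): 20.0, ("s1", "Cough"): 0.1,
    ("s2", "Cold"): 0.5, ("s2", "LungCancer"): 5.0, ("s2", "Cough"): 3.5,
}
chosen = select_riskiest_pairs(risks, 3)
```

Note that the chosen pairs can come from different samples, so only the labels that matter most are sent for annotation.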

Embodiment 3

[0068] On the six data sets birds, enron, genbase, medical, CAL500 and bibtex shown in the table below, the method of the present invention is compared and verified.

[0069] The methods for comparison are:

[0070] LCam: the label-based cost-sensitive active learning method of the present invention;

[0071] ECam: a sample-based cost-sensitive active learning method;

[0072] ERnd: sample-based random selection active learning method;

[0073] LRnd: a label-based random selection active learning method.
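The difference between label-based and sample-based selection can be illustrated with a small sketch. The text does not describe how ECam scores a sample, so the code assumes, purely for illustration, that a sample's score is the sum of its per-label risks and that every label of a chosen sample is annotated. The function names and toy scores are hypothetical.

```python
from collections import defaultdict


def label_based_batch(risks, n_pairs):
    """Label-level granularity: rank individual (sample, label) pairs."""
    return sorted(risks, key=risks.get, reverse=True)[:n_pairs]


def sample_based_batch(risks, n_samples):
    """Sample-level granularity: rank whole samples, then take all their pairs.

    Assumption: a sample's score is the sum of its per-label risks.
    """
    totals = defaultdict(float)
    for (sample, _label), risk in risks.items():
        totals[sample] += risk
    top = set(sorted(totals, key=totals.get, reverse=True)[:n_samples])
    return [pair for pair in risks if pair[0] in top]


# Hypothetical risk scores: 3 samples with 3 labels each.
risks = {
    ("s1", "Cold"): 2.0, ("s1", "LungCancer"): 20.0, ("s1", "Cough"): 0.1,
    ("s2", "Cold"): 0.5, ("s2", "LungCancer"): 5.0, ("s2", "Cough"): 3.5,
    ("s3", "Cold"): 0.2, ("s3", "LungCancer"): 0.3, ("s3", "Cough"): 0.1,
}
label_pairs = label_based_batch(risks, 3)    # 3 annotations
sample_pairs = sample_based_batch(risks, 1)  # also 3 annotations
label_total = sum(risks[p] for p in label_pairs)
sample_total = sum(risks[p] for p in sample_pairs)
```

For the same annotation budget of three labels, the label-based batch in this toy case covers more total risk than the sample-based batch, because it is not forced to spend annotations on low-risk labels of a high-risk sample.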

[0074] Table 1 Dataset Properties

[0075]
Name       Field      Number of samples    Number of labels
birds      audio      322                  19
enron      text       1702                 53
genbase    biology    662                  27
medical    text       978                  45
CAL500     music      502                  174
bibtex     text       7395                 159

[0076] Table 2 shows, at a cost ratio of C01 = 1, C10 = 2, the number of iterations required for cost-sensitive multi-label active learning methods based on samples and...


Abstract

The invention discloses a method for constructing a multi-label classifier based on cost-sensitive active learning. The method uses the specific label information of each sample directly: an initial classifier is trained on the labeled set; the current classifier then predicts on the unlabeled set; a certain number of sample-label pairs with the largest misclassification cost are selected from the unlabeled samples; their true label values are annotated and added to the labeled set; and the training sample set and the classifier are updated. The method reaches the target misclassification cost in a small number of iterations, which greatly improves learning efficiency. Because the sampling granularity is reduced to sample-label pairs, the cost of labeling samples drops substantially, and the effect is more pronounced in multi-label classification with a large number of labels.
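The procedure in the abstract can be outlined as a loop. This is a sketch under stated assumptions rather than the patented implementation: `train`, `risk_of`, and `oracle` are hypothetical caller-supplied callables standing in for the classifier training step, the misclassification-cost estimate, and the human annotator, and the stopping rule is simply a fixed number of rounds.

```python
def active_learning(labeled, pool, train, risk_of, oracle, batch_size, rounds):
    """Label-level active learning loop.

    labeled: dict (sample, label) -> true 0/1 value (the labeled set)
    pool:    set of unlabeled (sample, label) pairs
    train, risk_of, oracle: caller-supplied callables (hypothetical names)
    """
    model = train(labeled)  # initial classifier
    for _ in range(rounds):
        if not pool:
            break
        # Score every unlabeled pair with the current classifier.
        ranked = sorted(pool, key=lambda pair: risk_of(model, pair), reverse=True)
        for pair in ranked[:batch_size]:
            labeled[pair] = oracle(pair)  # obtain the true label value
            pool.remove(pair)
        model = train(labeled)  # update the classifier
    return model, labeled


# Toy stand-ins: fixed risk scores, an oracle that always answers 1,
# and a "model" that is just the size of the labeled set.
scores = {("a", 0): 0.9, ("a", 1): 0.1, ("b", 0): 0.5, ("b", 1): 0.7}
truth = {pair: 1 for pair in scores}
model, labeled = active_learning(
    labeled={}, pool=set(scores), train=lambda data: len(data),
    risk_of=lambda m, pair: scores[pair], oracle=truth.get,
    batch_size=2, rounds=1)
```

Passing the training, scoring, and annotation steps in as callables keeps the loop independent of any particular base classifier or cost table.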

Description

Technical field

[0001] The present invention relates to a method for constructing a multi-label classifier, and specifically to a cost-sensitive multi-label classification method.

Background art

[0002] With the development of information technology, multi-label data is becoming increasingly common, and applications of multi-label classification continue to grow, such as semantic annotation of images and video, functional genomics, and music genre classification.

[0003] In multi-label classification, people are mainly concerned with obtaining the highest accuracy. However, the highest accuracy does not always mean the best result at prediction time. For example, a medical diagnosis system diagnoses patients according to their pathological characteristics. It may diagnose a patient who does not have cancer as having cancer, or diagnose a patient who has cancer as not having it. The former costs money for re-examination, while the latter may delay the patient's treatment timing a...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F18/241
Inventor: 吴健, 赵世泉, 赵朋朋, 刘纯平, 崔志明
Owner: SUZHOU UNIV