Integrated transfer learning method for classification of unbalance samples

A transfer learning technique for unbalanced data in the field of machine learning. It addresses the decline in classification accuracy caused by class imbalance and by differences in sample importance, improving efficiency and accuracy and increasing the contribution rate of the minority (negative) samples.

Status: Inactive. Publication date: 2012-06-27
BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY
Cites: 2; Cited by: 53

AI Technical Summary

Problems solved by technology

However, in the real world the samples of the two classes can be extremely unbalanced in number, and they can also differ greatly in importance.
[0008] In addition, there are often a large amo...
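A toy calculation shows why plain accuracy is misleading on such data. The 95/5 split below is invented for illustration, and the minority class is taken to be the negative class (label 0), as the abstract describes; `accuracy` and `recall` are helper names introduced here.

```python
# Hypothetical toy labels: 95 majority-class samples (label 1), 5 minority-class samples (label 0).
y_true = [1] * 95 + [0] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [1] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of the samples of class `label` that were classified correctly."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


print(f"overall accuracy: {accuracy(y_true, y_pred):.2f}")  # 0.95
print(f"minority recall:  {recall(y_true, y_pred, 0):.2f}")  # 0.00
```

The classifier looks 95% accurate while never recognizing a single minority sample, which is the failure mode the method targets.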




Embodiment Construction

[0046] The steps of the integrated transfer learning method for unbalanced sample classification provided by the present invention (referred to as UBITLA) are as follows (refer to Figure 1):

[0047] 1. Input: the input data come from two sources, the transfer auxiliary data set $A$ and the target data set $O$. Part of each is drawn and mixed in proportion to form the training data set $C=\{(X_1,Y_1),(X_2,Y_2),\dots,(X_N,Y_N)\}$, where each $(X_i,Y_i)$, $i=1,2,\dots,N$, is a training sample consisting of a sample feature attribute vector and a sample class. The first $n$ samples of $C$ come from $A$ and the remaining $m$ samples come from $O$, with $n+m=N$. The predetermined number of iterations is $T$. Here $X_i\in X$, where $X$ is the input sample data; $X_i$ is the feature attribute vector of the sample, of dimension $q$; and $Y_i\in\{0,+1\}$ is the class label of the sample.
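Step 1 can be sketched as follows. This is a minimal sketch assuming uniform random sampling without replacement; the mixing proportions (`aux_ratio`, `target_ratio`) and the function name `build_training_set` are hypothetical, since the patent's actual proportions are not given here. It preserves the stated layout: the first $n$ entries of $C$ come from $A$, the last $m$ from $O$.

```python
import random


def build_training_set(aux, target, aux_ratio, target_ratio, seed=0):
    """Mix part of auxiliary set A and target set O into training set C.

    aux, target: lists of (x, y) pairs, x a length-q feature tuple, y in {0, 1}.
    aux_ratio, target_ratio: hypothetical fractions of A and O to draw.
    Returns (C, n, m): the first n samples of C come from A, the last m from O.
    """
    rng = random.Random(seed)
    n = round(len(aux) * aux_ratio)
    m = round(len(target) * target_ratio)
    part_a = rng.sample(aux, n)     # drawn from the auxiliary (old) data
    part_o = rng.sample(target, m)  # drawn from the target (new) data
    return part_a + part_o, n, m


# Toy data: the second feature marks the source (0.0 = A, 1.0 = O).
A = [((float(i), 0.0), i % 2) for i in range(100)]
O = [((float(i), 1.0), i % 2) for i in range(20)]
C, n, m = build_training_set(A, O, aux_ratio=0.5, target_ratio=0.5)
print(len(C), n, m)  # 60 50 10
```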

[0048] 2. Initialize sample weights:

[0049]

[0050] where ... is the ini...
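The initialization formula itself did not survive on this page. As a hedged illustration of the idea stated in the abstract (the rare negative samples receive larger initial weights), the sketch below uses a hypothetical class-balanced scheme in which each class receives an equal share of the total weight. Both the scheme and the name `init_weights` are assumptions, not the patent's formula.

```python
from collections import Counter


def init_weights(labels):
    """Hypothetical class-balanced initial weights (NOT the patent's formula).

    Each of the k classes receives 1/k of the total weight, shared equally
    among its samples, so every sample of a rarer class gets a larger weight.
    """
    counts = Counter(labels)
    k = len(counts)
    return [1.0 / (k * counts[y]) for y in labels]


labels = [1] * 8 + [0] * 2  # 8 positive samples, 2 negative samples
w = init_weights(labels)
print(w[0], w[-1], sum(w))  # 0.0625 0.25 1.0
```

Each negative sample starts with four times the weight of a positive one, while the weights still sum to 1.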



Abstract

The invention relates to an integrated transfer learning method for classifying unbalanced samples, comprising the following steps. During initialization, positive and negative samples are given different weights, so that the negative samples, which make up a small proportion of all samples but carry a large amount of information, receive large initial weights. In each training round, part of the samples is drawn at a set ratio and used as the training subset; after training, the classifier with the smallest error among several simple classifiers is selected as the weak classifier, and the training data set is adjusted by a dynamic redundant-data elimination algorithm. After T rounds of iteration, the resulting sequence of weak classifiers is combined into a strong classifier. The invention effectively uses the classification rules of old data to find the classification rules of new data with a similar distribution; in particular, it provides a new method for classifying unbalanced data. It ensures that the small number of negative samples still influence classification training, effectively raises the contribution rate of the negative samples, and improves classification efficiency and accuracy.
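The training loop in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the patent's algorithm: the simple classifiers are one-feature decision stumps, the subset is drawn uniformly at a fixed ratio, the reweighting is the standard AdaBoost update, the "dynamic redundant-data elimination" is replaced by a hypothetical rule that permanently drops auxiliary samples whose normalized weight falls below a threshold, and the strong classifier is an alpha-weighted vote. All names and parameters (`train`, `subset_ratio`, `drop_below`, ...) are hypothetical.

```python
import math
import random


def stump(x, feat, thr, sign):
    """Simple classifier: predict 1 if sign * (x[feat] - thr) > 0, else 0."""
    return 1 if sign * (x[feat] - thr) > 0 else 0


def best_stump(samples, weights):
    """Return (error, feat, thr, sign) of the stump with the smallest weighted error."""
    best = None
    for feat in range(len(samples[0][0])):
        # -inf lets the search also consider the two constant classifiers.
        for thr in [-math.inf] + sorted({x[feat] for x, _ in samples}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(samples, weights)
                          if stump(x, feat, thr, sign) != y)
                if best is None or err < best[0]:
                    best = (err, feat, thr, sign)
    return best


def train(C, n, weights, T, subset_ratio=0.8, drop_below=1e-3, seed=0):
    """Hypothetical sketch: C holds (x, y) pairs, y in {0, 1}; the first n come from A."""
    rng = random.Random(seed)
    data, w = list(C), list(weights)
    is_aux = [i < n for i in range(len(C))]
    ensemble = []
    for _ in range(T):
        # Draw this round's training subset at a fixed ratio.
        k = max(1, int(len(data) * subset_ratio))
        idx = rng.sample(range(len(data)), k)
        total = sum(w[i] for i in idx)
        err, feat, thr, sign = best_stump([data[i] for i in idx],
                                          [w[i] / total for i in idx])
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thr, sign))
        # AdaBoost update (assumption): boost misclassified, shrink correct samples.
        w = [v * math.exp(alpha if stump(x, feat, thr, sign) != y else -alpha)
             for v, (x, y) in zip(w, data)]
        s = sum(w)
        w = [v / s for v in w]
        # Stand-in elimination rule: drop low-weight auxiliary samples, keep all target ones.
        keep = [i for i in range(len(data)) if not is_aux[i] or w[i] >= drop_below]
        data = [data[i] for i in keep]
        is_aux = [is_aux[i] for i in keep]
        s = sum(w[i] for i in keep)
        w = [w[i] / s for i in keep]
    return ensemble


def predict(ensemble, x):
    """Strong classifier: alpha-weighted vote of the weak classifiers."""
    score = sum(a if stump(x, f, t, s) else -a for a, f, t, s in ensemble)
    return 1 if score > 0 else 0


A = [((float(i),), 0 if i < 10 else 1) for i in range(20)]         # auxiliary
O = [((float(i) + 0.5,), 0 if i < 10 else 1) for i in range(20)]   # target
C = A + O
model = train(C, n=len(A), weights=[1 / len(C)] * len(C), T=5)
print(predict(model, (2.0,)), predict(model, (15.0,)))  # 0 1
```

Dropping only auxiliary samples keeps every target sample in play, so the old data influence the model only while they remain informative.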

Description

Technical field
[0001] The invention belongs to the field of machine learning. For auxiliary training data that contain a large amount of redundant data and have unbalanced positive and negative samples, it proposes an improved integrated transfer learning algorithm that transfers these auxiliary training data to help classify the target data.
Background art
[0002] Transfer learning has been a hot topic in machine learning in recent years. Because new tasks have only a small amount of labeled data, it proposes effectively transferring outdated data to the new tasks: although the old and new data differ, some of the old data will certainly help with the new classification problem. To find these useful data, a small amount of already-classified new data is used to mine the valuable information in the old data. Finally, a more efficient classification model is trained from all the useful information in both parts of the data to realize k...
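The background idea, using a few labeled target samples to judge which old samples are useful, is what the well-known TrAdaBoost algorithm (Dai et al., 2007) does through reweighting. The function below sketches one TrAdaBoost reweighting step purely as a point of reference; it is not the patent's rule, and the name `tradaboost_update` and the clamp on the target error are assumptions of this sketch.

```python
import math


def tradaboost_update(weights, preds, labels, n, T):
    """One TrAdaBoost-style reweighting step (illustration, not the patent's rule).

    weights, preds, labels: lists over the mixed set C; preds/labels in {0, 1}.
    The first n entries (n >= 2 here) are auxiliary samples, the rest target samples.
    T is the total number of boosting rounds.
    """
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
    tgt = range(n, len(weights))
    tgt_w = sum(weights[i] for i in tgt)
    # Weighted error measured on the target samples only.
    eps = sum(weights[i] * abs(preds[i] - labels[i]) for i in tgt) / tgt_w
    eps = min(max(eps, 1e-10), 0.499)  # assumption: keep eps in (0, 1/2)
    beta_t = eps / (1.0 - eps)
    new_w = []
    for i, (w, p, y) in enumerate(zip(weights, preds, labels)):
        miss = abs(p - y)
        if i < n:
            new_w.append(w * beta ** miss)      # auxiliary: shrink if misclassified
        else:
            new_w.append(w * beta_t ** -miss)   # target: grow if misclassified
    return new_w


w = [0.2] * 5
labels = [1, 0, 1, 0, 0]
preds = [0, 0, 0, 0, 0]  # sample 0 (auxiliary) and sample 2 (target) are wrong
new = tradaboost_update(w, preds, labels, n=2, T=10)
print(new[0] < w[0], new[1] == w[1], new[2] > w[2], new[3] == w[3])  # True True True True
```

A misclassified old sample is treated as less relevant to the new task and loses weight, while a misclassified new sample gains weight, so later rounds focus on the target data.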


Application Information

IPC(8): G06N5/00
Inventors: 于重重 (Yu Chongchong), 谭励 (Tan Li), 田蕊 (Tian Rui), 刘宇 (Liu Yu), 吴子珺 (Wu Zijun)
Owner BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY