Re-sampling and cost-sensitive learning integrated unbalanced data integration and classification method

A cost-sensitive, data-integration technique, applied in instruments, character and pattern recognition, computer components, etc., that addresses problems such as ignoring the information in different test samples, information loss, and high sensitivity to outliers and noise points.

Inactive Publication Date: 2018-01-05
SOUTH CHINA UNIV OF TECH
0 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

[0004] Traditional unbalanced learning has limitations. First, many studies and experiments have shown that undersampling-based resampling improves classification performance more than oversampling, but undersampling discards part of the original data, and the discarded data is not all redundant.
Second, the effect of cost-sensitive learning is usu

Examples

Embodiment

[0079] This embodiment provides an ensemble (integration) classification method for unbalanced data that combines resampling technology and cost-sensitive learning. The flow chart is shown in Figure 1, and the method includes the following steps:

[0080] Step 1. Input training data set

[0081] Input an unbalanced data set X to be classified. Each row of X corresponds to a sample and each column to an attribute. X is randomly divided into a training set (66%) and a test set (34%).
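
The random 66% / 34% division described in this step could be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_train_test`, the fixed seed, and the toy data are assumptions made for the example and do not come from the patent text.

```python
import random

def split_train_test(samples, train_fraction=0.66, seed=0):
    """Randomly split a list of samples into a training list and a test list."""
    indices = list(range(len(samples)))
    rng = random.Random(seed)          # fixed seed only so the example is repeatable
    rng.shuffle(indices)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test

# Toy unbalanced data set: each row is (attribute vector, label),
# with roughly one positive sample for every nine negatives.
X = [([float(i), float(i % 3)], 1 if i % 10 == 0 else 0) for i in range(100)]
train, test = split_train_test(X)
```

A plain random split like this does not guarantee that the class ratio is preserved in both parts; whether the patent stratifies the split is not stated in the text above.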

[0082] Step 2. Calculate the relative density of the spatial distribution of training samples

[0083] Define the class with the larger sample size as the negative class, whose data points in the training set form $T_n = \{x_1, x_2, \ldots, x_l\}$, and the class with the smaller sample size as the positive class, whose data points in the training set form $T_p = \{x_{l+1}, x_{l+2}, \ldots, x_n\}$, where $l \gg n-l+1$;
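
The negative/positive convention above (majority class as negative, minority class as positive) could be applied to labelled training data roughly as follows. This is a sketch under assumptions: the helper name `split_by_class_size`, the `(attribute vector, label)` sample layout, and the toy labels are hypothetical and only serve the illustration.

```python
from collections import Counter

def split_by_class_size(samples):
    """Return (T_n, T_p): attribute vectors of the majority and minority class."""
    counts = Counter(label for _, label in samples)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common orders labels from the largest class to the smallest
    (neg_label, _), (pos_label, _) = counts.most_common(2)
    T_n = [x for x, label in samples if label == neg_label]
    T_p = [x for x, label in samples if label == pos_label]
    return T_n, T_p

# Toy training set: nine samples of class "a", one sample of class "b"
toy_train = [([0.1 * i], "a") for i in range(9)] + [([5.0], "b")]
T_n, T_p = split_by_class_size(toy_train)
```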

[0084] Starting from a particular data point $x_i$ in $T_n$, calculate its differe...

Abstract

The invention discloses an ensemble (integrated) classification method for unbalanced data that combines resampling technology and cost-sensitive learning. It relates to the field of ensemble learning in artificial intelligence and mainly addresses the problem, found in the prior art, of classifying unbalanced data using the complete data information. The steps of the method are: (1) input the training data set; (2) calculate the relative density of the sample space distribution; (3) resample to generate multiple subsets and train the base classifiers; (4) calculate the similarity matrix of the test samples; (5) use multi-objective optimization and integration to obtain prior results; (6) perform cost-sensitive learning prediction on the test set; (7) use KL divergence to optimize and fuse the results. The method designs a new sampling method to address the unbalanced data distribution; combines resampling technology with cost-sensitive learning to address incomplete information; and makes full use of the information in the test set itself to improve the ensemble performance of the classifier.
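
Two building blocks named in the steps above can be sketched in a few lines. The first function draws several balanced subsets from the majority class, as one possible reading of step (3); it uses uniform random undersampling as a stand-in, because the density-based sampling the patent designs is not fully described in this text. The second is the standard discrete KL divergence that step (7) refers to; how the patent applies it during fusion is not shown here. All function names and parameters are assumptions for illustration.

```python
import math
import random

def balanced_subsets(T_n, T_p, n_subsets=5, seed=0):
    """Build n_subsets training subsets, each pairing all positives with an
    equally sized random draw (without replacement) from the negatives."""
    rng = random.Random(seed)
    k = min(len(T_p), len(T_n))
    subsets = []
    for _ in range(n_subsets):
        sampled_neg = rng.sample(T_n, k)   # uniform undersampling (assumption)
        subsets.append((sampled_neg, list(T_p)))
    return subsets

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy usage: 20 negatives, 4 positives
negatives = [[float(i)] for i in range(20)]
positives = [[100.0 + i] for i in range(4)]
subsets = balanced_subsets(negatives, positives)
divergence = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Each subset would then be used to train one base classifier, so that different subsets cover different parts of the majority class.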

Description

Technical field

[0001] The invention relates to the field of computer artificial intelligence, and in particular to an ensemble (integrated) classification method for unbalanced data that combines resampling technology with cost-sensitive learning.

Background technique

[0002] Most standard machine learning algorithms proposed so far are designed on the assumption of a balanced data distribution or equal misclassification costs, so they are not suitable for data with an unbalanced class distribution. If a standard learning algorithm is applied directly to unbalanced data, the classification rules learned for the class with a small sample size are fewer and less reliable than those for the class with a large sample size.

[0003] Traditional unbalanced-learning classification methods fall mainly into two categories: data-level resampling technology, which corrects the imbalance of the training samples, and the design of cost-sensitive functions at the algorithm level to correct the advers...
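
As an illustration of the algorithm-level, cost-sensitive category mentioned in the background, the sketch below applies the standard minimum-expected-cost decision rule for two classes. The cost values and function names are hypothetical, and this is a generic textbook rule rather than the specific cost-sensitive function used by the patent.

```python
def cost_sensitive_predict(p_positive, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (positive) when predicting negative has the higher expected cost.

    p_positive: estimated probability that the sample is positive
    cost_fn:    cost of a false negative (assumed value)
    cost_fp:    cost of a false positive (assumed value)
    """
    expected_cost_if_negative = p_positive * cost_fn
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp
    return 1 if expected_cost_if_negative > expected_cost_if_positive else 0

def decision_threshold(cost_fn=5.0, cost_fp=1.0):
    """Probability above which the rule predicts positive."""
    return cost_fp / (cost_fp + cost_fn)
```

With equal costs the threshold is 0.5, which matches a standard classifier; making false negatives more expensive lowers the threshold, so minority-class samples are predicted positive at lower estimated probabilities.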

Application Information

IPC(8): G06K9/62
Inventors: 余志文, 温馨
Owner: SOUTH CHINA UNIV OF TECH