Virtual sample generation method based on kernel density estimation and Copula function

A virtual sample generation technique based on kernel density estimation, applied in the field of machine learning. It addresses problems such as classifier performance degradation on unbalanced classification tasks, and it improves the distribution of the different classes of data, the classification effect, and the generalization capability of classifiers.

Status: Inactive
Publication Date: 2019-08-09
BEIJING UNIV OF CHEM TECH
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

When traditional machine learning methods are applied to unbalanced classification problems, classifier performance often drops sharply and the resulting classifier is strongly biased.



Examples


Embodiment 1

[0054] The core requirement of unbalanced data classification is improving the classification of minority-class samples. Current methods for unbalanced data classification can be roughly divided into data-level methods, algorithm-level methods, and feature selection methods. Algorithm-level methods are mainly based on cost-sensitive learning, which raises the misclassification cost of minority-class data; commonly used algorithms include the AdaCost algorithm and the cost-sensitive decision tree classifier. Because algorithm-level improvements only increase the misclassification cost of the minority class, they do not fundamentally improve the classification of minority-class samples. Data-level improvement has gradually become a mainstream approach because it can greatly improve the performance of many classifiers on unbalanced data without modifying the cl...


Abstract

The invention discloses a virtual sample generation method based on kernel density estimation and a Copula function. The method includes: obtaining an original sample set and an original training set, and constructing an initial classification model from them; obtaining a probability density estimation function of the original sample set using kernel density estimation on the positive samples in the original sample set; obtaining the Copula model parameters by maximum likelihood estimation; constructing a joint density function of the positive samples from the Copula model parameters; obtaining a virtual sample set by resampling from the joint density function; and determining the number of virtual samples to generate from the difference between the number of negative samples and the number of positive samples in the original sample set. The technical scheme provided by the invention can effectively improve the distribution of the different classes of data in the original data set and improve the classification effect of various classifiers under unbalanced sample conditions, thereby improving the generalization capability of the classifiers.
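The resampling steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the source names only "a Copula function", so the Gaussian copula used here is an assumption, as are the Silverman rule-of-thumb bandwidth, the use of the sample correlation of the normal scores as a simple stand-in for the maximum likelihood estimate of the copula parameters, and the hypothetical function names. The classification-model steps are omitted.

```python
import math
from statistics import NormalDist

import numpy as np

# Elementwise helpers built on the standard library (no SciPy needed).
_erf = np.vectorize(math.erf, otypes=[float])
_norm_ppf = np.vectorize(NormalDist().inv_cdf, otypes=[float])


def norm_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def kde_cdf(points, data, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each datum."""
    return norm_cdf((points[:, None] - data[None, :]) / h).mean(axis=1)


def generate_virtual_samples(x_pos, n_neg, grid_size=400, seed=0):
    """Hypothetical helper: draw virtual positive (minority) samples.

    x_pos : (n_pos, n_dim) array of positive samples, no constant columns.
    n_neg : number of negative (majority) samples.
    """
    rng = np.random.default_rng(seed)
    x_pos = np.asarray(x_pos, dtype=float)
    n_pos, n_dim = x_pos.shape

    # Generation count = difference between the two class sizes.
    n_new = max(n_neg - n_pos, 0)
    if n_new == 0:
        return np.empty((0, n_dim))

    eps = 1e-6
    grids, cdfs = [], []
    z = np.empty_like(x_pos)
    for j in range(n_dim):
        col = x_pos[:, j]
        # Assumed bandwidth: Silverman's rule of thumb.
        h = 1.06 * col.std(ddof=1) * n_pos ** (-0.2)
        grid = np.linspace(col.min() - 4 * h, col.max() + 4 * h, grid_size)
        cdf = kde_cdf(grid, col, h)
        # Map each feature to (0, 1) with its KDE CDF, then to normal scores.
        u = np.clip(np.interp(col, grid, cdf), eps, 1 - eps)
        z[:, j] = _norm_ppf(u)
        grids.append(grid)
        cdfs.append(cdf)

    # Assumed Gaussian copula; correlation of normal scores as its parameter.
    corr = np.atleast_2d(np.corrcoef(z, rowvar=False))

    # Resample from the copula, then invert each marginal KDE CDF on its grid.
    z_new = rng.multivariate_normal(np.zeros(n_dim), corr, size=n_new)
    u_new = np.clip(norm_cdf(z_new), eps, 1 - eps)
    return np.column_stack(
        [np.interp(u_new[:, j], cdfs[j], grids[j]) for j in range(n_dim)]
    )
```

Appending the returned rows to the positive class gives a training set with equal class sizes. Inverting the marginal CDF by grid interpolation keeps the sketch short; a root finder on the exact KDE CDF would be more precise at higher cost.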

Description

Technical field

[0001] The present invention relates to the technical field of machine learning, in particular to a virtual sample generation method based on kernel density estimation and a Copula function.

Background technique

[0002] Pattern classification is one of the most basic forms of intelligence that humans are born with. Ever since people first tried to reproduce intelligence on computers, pattern classification has naturally been a main research issue. In recent years, with the continuous development of the computer field, more and more excellent classification algorithms have emerged, such as Decision Tree (DT), Support Vector Machine (SVM), and the k-Nearest Neighbor algorithm (kNN). These classification algorithms have greatly improved the level of computer pattern classification, and can approach or even reach human recognition levels in many fields. However, classification algorithms often have high requirements for training samples, usually requiring...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/24, G06F18/214
Inventor: 朱群雄, 王世雄, 徐圆, 贺彦林
Owner: BEIJING UNIV OF CHEM TECH