Virtual sample generation method based on kernel density estimation and Copula function

A technology involving kernel density estimation and virtual samples, applied in the field of machine learning. It addresses problems such as classifier performance degradation on unbalanced classification tasks. It improves classification performance and generalization capability by improving the distribution of the different classes of data.

Status: Inactive. Publication date: 2019-08-09

BEIJING UNIV OF CHEM TECH

Cites: 0. Cited by: 2.


AI Technical Summary

Problems solved by technology

When traditional machine learning methods are applied to unbalanced classification problems, classifier performance often drops sharply and the resulting classifier is strongly biased toward the majority class.
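A small worked example shows how overall accuracy can hide this bias. Assume a hypothetical 95:5 class split and a classifier that always predicts the majority class. The helper `accuracy_and_minority_recall` is illustrative only and is not part of the patent.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority=1):
    """Return overall accuracy and recall on the minority class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_idx = [i for i, t in enumerate(y_true) if t == minority]
    hits = sum(y_pred[i] == minority for i in minority_idx)
    return correct / len(y_true), hits / len(minority_idx)


# Hypothetical 95:5 split; the "classifier" always predicts majority class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
acc, rec = accuracy_and_minority_recall(y_true, y_pred)
print(acc, rec)  # 0.95 0.0
```

Accuracy looks high, yet not a single minority sample is recognized. Minority-class metrics are therefore the ones that matter for this problem.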



Examples


Embodiment 1

[0054] The core requirement of unbalanced data classification is improving the classification of minority-class samples. Current methods for unbalanced data classification can be roughly divided into data-level methods, algorithm-level methods, and feature-selection methods. Algorithm-level methods are mainly based on cost-sensitive learning, which raises the misclassification cost of minority-class data. Commonly used algorithms include AdaCost and cost-sensitive decision tree classifiers. Algorithm-level improvements only increase the classification cost of minority-class data, so they do not fundamentally improve the classification of minority-class samples. Data-level improvements have gradually become the mainstream approach because they can greatly improve the performance of many classifiers on unbalanced data without modifying the cl...
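The algorithm-level idea can be made concrete with a short sketch. Each class is assigned a misclassification cost inversely proportional to its frequency, using the n / (k · count) heuristic that scikit-learn applies for `class_weight="balanced"`. A cost-weighted error is then computed. This is generic cost-sensitive weighting, not AdaCost itself. The function names and the 95:5 data are hypothetical.

```python
from collections import Counter


def inverse_frequency_costs(labels):
    """Misclassification cost per class: n / (k * count), so rarer classes cost more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def cost_weighted_error(y_true, y_pred, costs):
    """Share of the total cost that falls on misclassified samples."""
    total = sum(costs[t] for t in y_true)
    wrong = sum(costs[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical 95:5 data and a classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
costs = inverse_frequency_costs(y_true)
print(costs[1])  # 10.0
print(round(cost_weighted_error(y_true, y_pred, costs), 3))  # 0.5
```

The plain error rate of this classifier is only 0.05, while the cost-weighted error is 0.5. Reweighting exposes the bias, but it leaves the minority-class data itself unchanged. This is the limitation the paragraph attributes to algorithm-level methods.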


Abstract

The invention discloses a virtual sample generation method based on kernel density estimation and a Copula function. The method proceeds in five steps:

- Obtain an original sample set and an original training set, and construct an initial classification model from them.
- Obtain a probability density estimation function by applying kernel density estimation to the positive samples in the original sample set.
- Obtain the Copula model parameters by maximum likelihood estimation.
- Construct a joint density function of the positive samples from the Copula model parameters, and obtain a virtual sample set by resampling from that joint density function.
- Set the number of virtual samples to generate according to the difference between the numbers of negative and positive samples in the original sample set.

The technical scheme provided by the invention can effectively improve the distribution of the different classes of data in the original data set. It also improves the classification performance of various classifiers on unbalanced samples, thereby improving their generalization capability.
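The resampling steps in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The abstract does not name the Copula family, the kernel, or the bandwidth rule. The code therefore assumes a Gaussian copula and Gaussian kernels with Silverman's rule-of-thumb bandwidth. It estimates the copula correlation from the normal scores of the KDE-based pseudo-observations, which is a common approximation to the full maximum-likelihood fit. The helper names (`kde_cdf`, `kde_ppf`, `generate_virtual_samples`) are hypothetical, and the initial classifier step is omitted. Only the generation count is taken directly from the abstract: the number of negative samples minus the number of positive samples.

```python
import numpy as np
from statistics import NormalDist

# Vectorized standard-normal CDF and inverse CDF (otypes set so empty arrays work).
_STD = NormalDist()
_norm_cdf = np.vectorize(_STD.cdf, otypes=[float])
_norm_ppf = np.vectorize(_STD.inv_cdf, otypes=[float])


def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth for a 1-D Gaussian KDE (assumed choice)."""
    return max(1.06 * x.std(ddof=1) * x.size ** (-1 / 5), 1e-12)


def kde_cdf(t, x, h):
    """CDF of a Gaussian KDE on samples x (bandwidth h), evaluated at points t."""
    t = np.atleast_1d(t)
    return _norm_cdf((t[:, None] - x[None, :]) / h).mean(axis=1)


def kde_ppf(u, x, h, grid_size=512):
    """Approximate inverse of kde_cdf by linear interpolation on a grid."""
    grid = np.linspace(x.min() - 5 * h, x.max() + 5 * h, grid_size)
    return np.interp(u, kde_cdf(grid, x, h), grid)


def generate_virtual_samples(X_pos, n_neg, rng=None):
    """Hypothetical helper: draw (n_neg - n_pos) virtual positive samples."""
    rng = np.random.default_rng(rng)
    X_pos = np.asarray(X_pos, dtype=float)
    n_pos, d = X_pos.shape
    n_new = max(n_neg - n_pos, 0)  # generation count taken from the abstract
    if n_new == 0:
        return np.empty((0, d))
    cols = [X_pos[:, j] for j in range(d)]
    hs = [silverman_bandwidth(c) for c in cols]

    # 1) KDE marginals turn each column into pseudo-observations in (0, 1).
    U = np.column_stack([kde_cdf(c, c, h) for c, h in zip(cols, hs)])
    U = np.clip(U, 1e-6, 1 - 1e-6)

    # 2) Gaussian-copula correlation from the normal scores of U
    #    (an approximation to the maximum-likelihood estimate).
    R = np.atleast_2d(np.corrcoef(_norm_ppf(U), rowvar=False))

    # 3) Sample the copula, then map back through each inverse KDE CDF.
    U_new = _norm_cdf(rng.multivariate_normal(np.zeros(d), R, size=n_new))
    return np.column_stack(
        [kde_ppf(U_new[:, j], c, h) for j, (c, h) in enumerate(zip(cols, hs))]
    )
```

Stacking the returned rows onto `X_pos` gives a positive class as large as the negative class, on which the classifiers can be retrained. Modeling the marginals with KDE and the dependence with a copula keeps the two concerns separate. A different copula family could be swapped into step 2 without touching the marginals.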

Description

Technical field

[0001] The present invention relates to the technical field of machine learning, in particular to a virtual sample generation method based on kernel density estimation and a Copula function.

Background art

[0002] Pattern classification is one of the most basic forms of intelligence that humans are born with. Ever since people first tried to exhibit intelligence on computers, pattern classification has naturally been a main research issue. In recent years, with the continuous development of computing, more and more excellent classification algorithms have emerged, such as Decision Tree (DT), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN). These algorithms have greatly raised the level of computer pattern classification and can approach or even reach human recognition performance in many fields. However, classification algorithms often place high demands on training samples, usually requiring...


Application Information


Patent type & authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/24, G06F18/214

Inventors: 朱群雄 (Zhu Qunxiong), 王世雄 (Wang Shixiong), 徐圆 (Xu Yuan), 贺彦林 (He Yanlin)

Owner: BEIJING UNIV OF CHEM TECH
