
Hybrid sampling method based on boundary samples

A hybrid sampling technique, applied in the fields of instruments, character and pattern recognition, and computer components. It addresses the poor classification performance of existing hybrid sampling methods, which rarely consider the role of boundary samples, and improves classification performance.

Pending Publication Date: 2021-09-28
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Existing hybrid sampling methods rarely consider the role of boundary samples, which results in poor classification performance.



Embodiment Construction

[0028] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0029] With reference to Figure 1, the hybrid sampling method based on boundary samples of the present invention comprises the following steps:

[0030] Step 1: Use the boundary point detection algorithm based on the coefficient of variation to divide the majority class samples into majority class boundary samples and majority class internal samples.

[0031] Step 1.1: Calculate the k-distance of each majority class sample point p, denoted k_dist(p), and obtain its local density according to the following formula (1):

[0032]

[0033] where N_k-dist(p) is the number of majority class sample points within the k-distance of the majority class sample point p;
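The two quantities named in Step 1.1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes Euclidean distance, assumes k_dist(p) is the distance from p to its k-th nearest other majority class sample, and assumes the count excludes p itself. The function name `k_distance_and_count` is hypothetical. Formula (1) for the local density is not reproduced in the text, so the sketch stops at the inputs to that formula.

```python
import math

def k_distance_and_count(points, k):
    """Return a list of (k_dist(p), N) pairs, one per point p.

    Assumptions (not stated in the source): Euclidean distance,
    k_dist(p) is the distance to the k-th nearest *other* point,
    and N counts the other points at distance <= k_dist(p).
    """
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    results = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dist = dists[k - 1]
        # Ties at the k-distance can make the count exceed k.
        count = sum(1 for d in dists if d <= k_dist)
        results.append((k_dist, count))
    return results
```

Because ties at the k-distance are counted, N can be larger than k, which is why it is computed separately rather than assumed equal to k.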

[0034] Step 1.2: Calculate the coefficient of variation according to the following formula (2), based on the obtained local density and the number of sample points within the k di...


Abstract

The invention relates to a hybrid sampling method based on boundary samples, which comprises the following steps. First, the majority class samples are divided into boundary samples and internal samples by a boundary point detection algorithm based on the coefficient of variation, and the minority class boundary samples are in turn determined from the majority class boundary samples. Second, k-means clustering is performed on the majority class samples and the minority class samples. For the minority class, the oversampling weight of each cluster is determined according to the density of boundary samples within the cluster; for the majority class, the sample point closest to the cluster center is selected for undersampling, so that the data reaches a balanced state. The method alleviates the class imbalance problem to a certain extent while preventing useful information from being lost during sampling.

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a hybrid sampling method based on boundary samples.

Background

[0002] Imbalanced data refers to a data set in which the classes are unevenly distributed. Generally, a class with a small number of samples is called a minority class, and a class with a large number of samples is called a majority class. In many practical applications, such as network intrusion detection, credit card transaction detection, medical diagnosis, and fault detection, the key is to identify minority class samples efficiently and accurately. However, analysis shows that the high accuracy of many traditional algorithms reflects only their recognition of majority class samples; they perform poorly at recognizing minority class samples.

[0003] In the solution to unbalanced data at the data level, due to certain defects in both oversampling and undersampling,...


Application Information

IPC(8): G06K9/62
CPC: G06F18/23213
Inventors: 张爽, 何云斌
Owner: HARBIN UNIV OF SCI & TECH