Improved SMOTE method and system based on adaptive neighborhood size

An adaptive-neighborhood, nearest-neighbor technique, applied in fields such as character and pattern recognition, that addresses problems such as the increase in false positives caused by unsafe synthetic samples

Pending Publication Date: 2020-12-11
BEIJING INSTITUTE OF PETROCHEMICAL TECHNOLOGY

AI Technical Summary

Problems solved by technology

The same problem occurs when the distribution of the minority class is irregular and complex; case 3 in Figure 1 is such a situation. Because the positive class forms a non-convex cluster, methods such as SMOTE that synthesize positive-class samples will generate synthetic samples in the negative-class region, and these synthetic samples increase false positives.
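The failure mode described above can be illustrated with a minimal sketch. It assumes hypothetical data (positive samples on a unit-radius ring, with the negative class taken to occupy the disc inside it) and uses the classic SMOTE interpolation step with k = 5 neighbors; none of these values come from the patent.

```python
import numpy as np

def smote_interpolate(x, neighbor, gap):
    """Classic SMOTE step: a point on the segment from x to neighbor (0 <= gap <= 1)."""
    return x + gap * (neighbor - x)

# Hypothetical data: positive (minority) samples lie on a unit-radius ring.
# The negative class is assumed to fill the disc inside the ring.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])

rng = np.random.default_rng(0)
synthetic = []
for x in ring:
    # Pick one of the k = 5 nearest minority neighbors, as SMOTE does.
    dists = np.linalg.norm(ring - x, axis=1)
    neighbor_ids = np.argsort(dists)[1:6]  # index 0 is x itself
    neighbor = ring[rng.choice(neighbor_ids)]
    synthetic.append(smote_interpolate(x, neighbor, rng.random()))
synthetic = np.array(synthetic)

# A chord of a circle lies inside the circle, so interpolated points
# move off the ring and into the region assumed to hold negatives.
radii = np.linalg.norm(synthetic, axis=1)
print("synthetic radii:", np.round(radii, 3))
```

Every synthetic point has radius at most 1, and any point interpolated strictly between two distinct ring samples falls inside the ring, which is where this toy setup places the negative class.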



Examples


Detailed Description of the Embodiments

[0063] To make the purpose, technical solution, and advantages of the present invention clearer, the technical solution is described in detail below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other implementations obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0064] The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.

[0065] Referring to Figure 2, an improved SMOTE method based on adaptive neighborhood size proposed by an embodiment of the present invention includes:

[0066] Step S1, input a training sample set of minority class samples;

[0067] Step S2, judging whether the serial number i of the current minority clas...


Abstract

The invention relates to an improved SMOTE method and system based on adaptive neighborhood size. The method uses a different neighborhood value for each minority-class sample and determines that value automatically by tracking the point at which precision drops within the hyper-rectangular region formed by a minority sample and a neighboring sample point. Synthetic data can be placed inside this hyper-rectangle, so random linear interpolation is not needed. Because the minority class is dominant within the hyper-rectangle, synthetic samples generated there are safer and more reasonable.
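The idea in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented procedure: the helper names (`in_box`, `adaptive_box`, `sample_in_box`), the `min_precision` threshold, the stopping rule, and the toy data are all assumptions, since the full steps are not available in this excerpt.

```python
import numpy as np

def in_box(points, lo, hi):
    """Boolean mask of the points inside the axis-aligned box [lo, hi]."""
    return np.all((points >= lo) & (points <= hi), axis=1)

def adaptive_box(x, minority, majority, min_precision=1.0):
    """Grow the neighborhood of x one minority neighbor at a time.

    Each candidate box is spanned by x and a neighbor as opposite corners.
    Returns the last box accepted before precision (the share of minority
    points among all points in the box) drops below min_precision.
    Assumes x is a row of `minority` with no exact duplicates.
    """
    order = np.argsort(np.linalg.norm(minority - x, axis=1))
    lo, hi = x.copy(), x.copy()
    for j in order[1:]:  # order[0] is x itself
        cand_lo = np.minimum(x, minority[j])
        cand_hi = np.maximum(x, minority[j])
        n_min = in_box(minority, cand_lo, cand_hi).sum()
        n_maj = in_box(majority, cand_lo, cand_hi).sum()
        if n_min / (n_min + n_maj) < min_precision:
            break  # precision dropped: stop growing the neighborhood
        lo, hi = cand_lo, cand_hi
    return lo, hi

def sample_in_box(lo, hi, n, rng):
    """Draw n synthetic points uniformly inside the box, with no interpolation."""
    return lo + rng.random((n, lo.size)) * (hi - lo)

# Hypothetical toy data, for illustration only.
minority = np.array([[0.0, 0.0], [1.0, 0.2], [3.0, 3.0]])
majority = np.array([[2.0, 2.0]])

lo, hi = adaptive_box(minority[0], minority, majority)
samples = sample_in_box(lo, hi, 5, np.random.default_rng(0))
print("box:", lo, hi)
```

In this toy case the box spanned by the first sample and its nearest neighbor contains no majority point and is accepted, while the box reaching the third minority sample would enclose the majority point at (2, 2), so growth stops there and the synthetic samples stay in the smaller, minority-only region.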

Description

Technical field

[0001] The invention relates to the technical field of machine learning and data mining, and in particular to an improved SMOTE method and system based on adaptive neighborhood size.

Background

[0002] With the rapid development of information technology, the amount of data accumulated in various industries is growing explosively, and the class imbalance problem has attracted widespread attention. It arises in many application fields, such as biomedical diagnosis, multimedia data classification, insurance fraud detection, spam and web page identification, and credit card and telecom fraud detection. In class-imbalanced problems, the recognition rate of minority-class data is the greater concern.

[0003] At present, class imbalance learning methods can be mainly divided into three levels: data, features, and algorithms. The main method at the data level is the sampling method based on data distribution. In recent years, ...


Application Information

IPC(8): G06K9/62
CPC: G06F18/24147, G06F18/214
Inventor 徐文星王芳吴文通王瑶安欣舒马瑞
Owner BEIJING INSTITUTE OF PETROCHEMICAL TECHNOLOGY