Oversampling method and device based on SMOTE algorithm and electronic equipment

An oversampling technique based on the SMOTE algorithm, applied in computing, computer components, and character and pattern recognition. It addresses problems such as reduced prediction accuracy, distorted analysis results, and blurred sample boundaries, with the effects of resolving data imbalance, optimizing the sampling method, and improving the sampling distribution.

Pending Publication Date: 2020-12-04
北京淇瑀信息科技有限公司
Cites: 3; Cited by: 9

AI Technical Summary

Problems solved by technology

The SMOTE algorithm amplifies data by computing distances between samples and synthesizing new samples, but the sampled points are linear interpolations, so the sampling distribution is not wide enough.
Because the existing method samples all samples indiscriminately based only on inter-sample distance, it ignores the data characteristics shared by samples of the same class. This can blur sample boundaries or even make them overlap after sampling, which reduces prediction accuracy and affects the analysis results.
Therefore, there is still much room for improvement in oversampling methods.
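
To make the "linear sampling" limitation concrete, here is a minimal sketch of standard SMOTE interpolation, assuming Euclidean distance and list-of-floats feature vectors. The function names (`smote_point`, `nearest_neighbors`, `smote`) and the parameters `k` and `seed` are illustrative and do not come from the patent text.

```python
import math
import random

def smote_point(x, neighbor, rng):
    """Return a synthetic point on the line segment between x and neighbor."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

def nearest_neighbors(x, samples, k):
    """Return the k samples closest to x (Euclidean), excluding x itself."""
    others = [s for s in samples if s is not x]
    return sorted(others, key=lambda s: math.dist(x, s))[:k]

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(x, minority, k))
        synthetic.append(smote_point(x, neighbor, rng))
    return synthetic
```

Every synthetic point is a convex combination of two existing minority samples, so new data can only fall on segments between existing points. This is the narrow sampling distribution the passage above refers to.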

Examples

Embodiment 1

[0044] An embodiment of the oversampling method based on the SMOTE algorithm of the present invention is described below with reference to Figures 1 to 3.

[0045] Figure 1 is a flowchart of the oversampling method based on the SMOTE algorithm of the present invention. As shown in Figure 1, the oversampling method includes the following steps.

[0046] Step S101, acquiring historical sample data sets, determining positive and negative samples and their corresponding quantities.

[0047] Step S102, determining the sample data of the majority class and the sample data of the minority class, and performing data vectorization processing.

[0048] Step S103, using an outlier detection method to screen target sample data from the minority class sample data set.

[0049] Step S104, based on the SMOTE algorithm, oversampling the target sample data to generate a specific amount of new sample data.

[0050] Step S105, according to the generated new sample data ...
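
The steps above can be sketched roughly as follows. This is a hedged illustration, not the patent's implementation: the visible text does not specify the outlier detection method, so a simple distance-to-centroid threshold (`z` standard deviations) stands in for it. It also assumes that the target samples are the minority samples not flagged as outliers, that the "specific amount" of new data is the gap between class sizes, and that S105 combines the new data with the original minority set as stated in the abstract. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter

def split_classes(samples, labels):
    """S101-S102: count each label, then split into majority and minority sets."""
    counts = Counter(labels)
    (major_label, _), (minor_label, _) = counts.most_common(2)
    majority = [s for s, y in zip(samples, labels) if y == major_label]
    minority = [s for s, y in zip(samples, labels) if y == minor_label]
    return majority, minority

def screen_targets(minority, z=2.0):
    """S103 (assumed rule): keep minority points not unusually far from the centroid."""
    dim = len(minority[0])
    centroid = [sum(p[d] for p in minority) / len(minority) for d in range(dim)]
    dists = [math.dist(p, centroid) for p in minority]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [p for p, d in zip(minority, dists) if d <= mean + z * std]

def oversample(targets, n_new, k=3, seed=0):
    """S104: SMOTE-style interpolation between a target and one of its k nearest targets."""
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        x = rng.choice(targets)
        others = sorted((t for t in targets if t is not x),
                        key=lambda t: math.dist(x, t))
        neighbor = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        new_points.append([a + lam * (b - a) for a, b in zip(x, neighbor)])
    return new_points

def amplify_minority(samples, labels, z=2.0, k=3, seed=0):
    """S101-S105: return the original minority set plus the synthetic samples."""
    majority, minority = split_classes(samples, labels)
    targets = screen_targets(minority, z)
    n_new = len(majority) - len(minority)  # assumption: balance the two classes
    new_points = oversample(targets, n_new, k, seed)
    return minority + new_points  # S105: original minority + new samples
```

Screening before interpolation means synthetic points are generated only between the retained target samples, which is one plausible way to avoid the blurred class boundaries described earlier. The sketch needs at least two target samples after screening.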

Embodiment 2

[0092] An apparatus embodiment of the present invention is described below, and the apparatus can be used to execute the method embodiment of the present invention. The details described in the device embodiments of the present invention should be regarded as supplements to the above method embodiments; details not disclosed in the device embodiments of the present invention can be implemented by referring to the above method embodiments.

[0093] Referring to Figures 4, 5 and 6, the present invention also provides a SMOTE algorithm-based oversampling device 400 for financial risk assessment or prediction, including: a data acquisition module 401 for acquiring historical sample data sets and determining positive and negative samples and their corresponding quantities; a determining module 402 for determining the majority class sample data and the minority class sample data and performing data vectorization processing; and a screening module 403, which is used to use t...

Abstract

The invention provides an oversampling method and device based on the SMOTE algorithm, and electronic equipment. The method comprises the following steps: acquiring a historical sample data set, and determining positive and negative samples and their corresponding numbers; determining majority class sample data and minority class sample data, and performing data vectorization processing; screening target sample data from the minority class sample data set by using an outlier detection method; oversampling the target sample data based on the SMOTE algorithm to generate a specific number of new samples; and obtaining an amplified minority class sample data set from the generated new sample data and the original minority class sample data. The method optimizes the sampling procedure while addressing data imbalance, improves the accuracy of model prediction, and effectively reduces the bias caused by data imbalance.

Description

Technical field

[0001] The present invention relates to the field of computer information processing, and in particular to an oversampling method, device and electronic equipment based on the SMOTE algorithm.

Background art

[0002] Class imbalance is a typical problem in classification tasks, mainly manifested as a large gap in the number of samples between two classes. In practice, many problems involve imbalanced classes, such as the identification of financial fraud and insurance fraud, and the identification of cancer in medicine. The main difficulty in classifying imbalanced data is that traditional machine learning methods assume balanced classes in the training set and are insensitive to skewed data distributions, so their predictions are biased toward the majority class. However, from the perspective of data mining, minority classes often carry more important and useful information; therefore, mining and predicting these f...

Application Information

IPC(8): G06K9/62, G06Q40/02
CPC: G06Q40/02, G06F18/241, G06F18/214
Inventor: 刘国旗
Owner: 北京淇瑀信息科技有限公司