Sample equalization method, apparatus and device, and storage medium

A sample equalization technology in the field of data processing. It addresses the amplification of noise-point influence during oversampling, reduces noise, and improves the training effect of classification models.

Pending Publication Date: 2022-05-17
AGRICULTURAL BANK OF CHINA
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0028] The ADASYN method considers the distribution of both the minority class and the majority class and generates more samples near the classification boundary. However, when all the neighbors of a minority class sample belong to the majority class, ADASYN assigns that sample the highest weight. Such a sample is often a noise point, so this weighting also amplifies the influence of noise points.
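The weighting problem in [0028] can be reproduced with a small sketch of the standard ADASYN weights. This is an illustration of ADASYN as usually described, not the method claimed here: for each minority sample, the ratio r_i is the fraction of its k nearest neighbours (over the whole data set) that belong to the majority class, and the ratios are normalised to sum to 1. The function name `adasyn_weights`, the toy data, and k = 3 are assumptions chosen for illustration.

```python
import numpy as np


def adasyn_weights(X, y, minority_label, k=5):
    """Standard ADASYN weighting (illustrative sketch, not the patented method).

    For each minority sample x_i, r_i = (majority points among its k nearest
    neighbours in the whole data set) / k; the r_i are normalised to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    ratios = np.empty(len(minority_idx))
    for n, i in enumerate(minority_idx):
        d = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance to every sample
        d[i] = np.inf                         # a sample is not its own neighbour
        neighbours = np.argsort(d)[:k]
        ratios[n] = np.mean(y[neighbours] != minority_label)
    total = ratios.sum()
    if total == 0:  # no minority sample touches the majority class
        return minority_idx, np.full(len(ratios), 1.0 / len(ratios))
    return minority_idx, ratios / total


# Four clustered minority points plus one isolated minority point (5, 5)
# sitting inside the majority class, i.e. a likely noise point.
X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5],
     [5.1, 5], [4.9, 5], [5, 5.1], [5, 4.9]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0]
idx, w = adasyn_weights(X, y, minority_label=1, k=3)
G = 10
g = np.rint(w * G).astype(int)  # ADASYN's per-sample generation counts
print(w)  # the isolated point receives all of the weight
print(g)  # so all 10 synthetic samples are generated around it
```

Here the four clustered minority points have only minority neighbours and receive weight 0, while the isolated point at (5, 5) receives weight 1. Every synthetic sample is therefore generated around the probable noise point, which is the amplification effect described above.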



Examples


Embodiment 1

[0050] Figure 1 is a flow chart of a sample equalization method provided by Embodiment 1 of the present invention. This embodiment applies to oversampling the training samples of a classification model. The method can be executed by a sample equalization device, which can be implemented in software and/or hardware.

[0051] As shown in Figure 1, the method includes the following steps:

[0052] Step 110: Divide the acquired initial sample set into a majority class sample set and a minority class sample set.

[0053] In practice, a classification model is generally trained on a certain amount of sample data. In this embodiment, the sample set used to train the classification model before sample equalization is called the initial sample set.

[0054] Among classification models used in the financial field, especially those used for risk control, r...
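Step 110 can be sketched as follows, assuming a labelled data set with exactly two classes in which the class with more samples is the majority class. The function name `split_initial_sample_set` and the "normal"/"abnormal" labels are hypothetical; the excerpt does not specify how the embodiment represents samples or labels.

```python
from collections import Counter


def split_initial_sample_set(samples, labels):
    """Step 110 sketch: split a two-class sample set into majority and minority sets.

    Returns (majority_label, majority_samples, minority_label, minority_samples).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    # most_common() lists classes from most to least frequent
    (maj_label, _), (min_label, _) = counts.most_common()
    majority = [s for s, lab in zip(samples, labels) if lab == maj_label]
    minority = [s for s, lab in zip(samples, labels) if lab == min_label]
    return maj_label, majority, min_label, minority


# Hypothetical risk-control style data: mostly normal records, one abnormal one.
samples = [[0.2, 1.0], [0.3, 0.9], [0.1, 1.1], [4.0, 0.2]]
labels = ["normal", "normal", "normal", "abnormal"]
maj_label, majority, min_label, minority = split_initial_sample_set(samples, labels)
```

If the two classes happen to be the same size, `Counter.most_common` keeps first-seen order, so the "majority" label is simply the one encountered first. A real implementation would likely skip equalization in that case.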

Embodiment 2

[0094] The sample equalization device provided in this embodiment of the present invention can execute the sample equalization method provided in any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method. Figure 2 is a structural block diagram of a sample equalization device provided in Embodiment 2 of the present invention. As shown in Figure 2, the device includes an initial sample division module 210, a generation total number determination module 220, a recognition difficulty determination module 230, a generation fraction determination module 240, and a sample equalization realization module 250.

[0095] The initial sample division module 210 is configured to divide the acquired initial sample set to obtain a majority class sample set and a minority class sample set.

[0096] The generation total number determination module 220 is configured to determine the total number of generated ...
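The exact rule used by module 220 is cut off in this excerpt. The abstract only says the total depends on the sizes of the majority and minority sets. A common convention, borrowed here from ADASYN as an assumption, is G = round((N_maj − N_min) · β) with a balance level β in [0, 1]. The function name `total_generation_number` and the parameter `beta` are hypothetical.

```python
def total_generation_number(n_majority: int, n_minority: int, beta: float = 1.0) -> int:
    """Hypothetical sketch of module 220: G = round((N_maj - N_min) * beta).

    beta = 1.0 generates enough samples to balance the two classes fully;
    smaller values leave part of the imbalance in place.
    """
    if n_minority > n_majority:
        raise ValueError("the minority set must not be larger than the majority set")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return int(round((n_majority - n_minority) * beta))


print(total_generation_number(950, 50))       # 900: full balance
print(total_generation_number(950, 50, 0.5))  # 450: close half of the gap
```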

Embodiment 3

[0121] Figure 3 is a structural block diagram of a computer device provided in Embodiment 3 of the present invention. As shown in Figure 3, the computer device includes a processor 310, a memory 320, an input device 330, and an output device 340. The computer device may contain one or more processors 310; Figure 3 takes one processor 310 as an example. The processor 310, memory 320, input device 330, and output device 340 may be connected by a bus or by other means; Figure 3 takes a bus connection as an example.

[0122] As a computer-readable storage medium, the memory 320 can store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the sample equalization method in the embodiments of the present invention (for example, the initial sample division module 210 in the sample equalization device, the generation total quantity determination module 220, the identification difficult...



Abstract

The invention discloses a sample equalization method, apparatus, device, and storage medium. The method comprises: dividing an acquired initial sample set into a majority class sample set and a minority class sample set; determining the total number of samples to generate from the number of samples in the majority class sample set and the number of samples in the minority class sample set; determining the identification difficulty of each minority class sample based on the spatial distribution of the samples in the minority class sample set; determining, from each identification difficulty together with the total generation number, the generation sub-number for each minority class sample; and using linear interpolation to generate that number of new samples for each minority class sample, thereby achieving sample equalization. The method effectively reduces noise during oversampling and generates more new samples where the classification boundary is ambiguous, so the classification model focuses more on learning at the classification boundary and its training effect improves.
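The last two steps of the abstract can be sketched as follows: distribute a total G across minority samples in proportion to their identification difficulty, then generate new samples by linear interpolation. This excerpt gives neither the difficulty formula nor the choice of interpolation partner. The sketch therefore takes difficulty scores as given input and assumes SMOTE-style interpolation, x_new = x_i + λ(x_nn − x_i) with λ ~ U(0, 1) and x_nn drawn from the k nearest minority neighbours of x_i. Both function names and the placeholder scores are hypothetical.

```python
import numpy as np


def allocate_generation_counts(difficulty, total):
    """Split `total` new samples across minority samples in proportion to their
    identification-difficulty scores (largest-remainder rounding, so the
    per-sample counts always sum exactly to `total`)."""
    d = np.asarray(difficulty, dtype=float)
    if d.ndim != 1 or np.any(d < 0) or d.sum() <= 0:
        raise ValueError("difficulty must be 1-D, non-negative, with a positive sum")
    share = d / d.sum() * total
    counts = np.floor(share).astype(int)
    leftover = int(total - counts.sum())
    # Give the remaining units to the samples with the largest fractional parts.
    order = np.argsort(-(share - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


def interpolate_new_samples(minority, counts, k=3, seed=0):
    """Create counts[i] samples x_new = x_i + lam * (x_nn - x_i), lam ~ U(0, 1),
    where x_nn is drawn from the k nearest minority neighbours of x_i."""
    X = np.asarray(minority, dtype=float)
    if len(X) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(X) - 1)
    rng = np.random.default_rng(seed)
    new = []
    for i, c in enumerate(counts):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude x_i itself from its neighbours
        neighbours = np.argsort(d)[:k]
        for _ in range(c):
            j = rng.choice(neighbours)
            lam = rng.random()
            new.append(X[i] + lam * (X[j] - X[i]))
    return np.array(new).reshape(-1, X.shape[1])


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
difficulty = [1.0, 1.0, 2.0]  # placeholder scores, not the patented measure
counts = allocate_generation_counts(difficulty, total=7)
new_samples = interpolate_new_samples(minority, counts, k=2)
print(counts)             # [2 2 3]
print(new_samples.shape)  # (7, 2)
```

Largest-remainder rounding is one reasonable way to keep the sub-numbers summing exactly to G. Plain rounding, as in the ADASYN sketch, can drift by a few samples. Because each new point lies on a segment between two minority samples, it stays inside the convex hull of the minority set.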

Description

Technical Field

[0001] Embodiments of the present invention relate to the technical field of data processing, and in particular to a sample equalization method, apparatus, device, and storage medium.

Background

[0002] In machine learning classification problems, sample imbalance often degrades the performance of classification models. In classification problems in fields such as finance, medicine, and intrusion detection, abnormal data are very scarce, and a classification model trained on the original data often fails to distinguish minority class samples well.

[0003] Taking the financial field as an example, sample imbalance is widespread in data mining practice. For example, most of the historical data obtained for risk control and intrusion detection are normal values, with very few abnormal samples. However, these abnormal samples contain more information than normal samples, so correctly identifyin...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06V10/774; G06V10/764; G06K9/62
CPC: G06F18/24; G06F18/214
Inventor: 刘毅然 (Liu Yiran)
Owner: AGRICULTURAL BANK OF CHINA