Optimal binning data processing method and system based on NSGA-II genetic algorithm

A technology combining a genetic algorithm with binning, applied in the fields of genetic models, manufacturing computing systems, and computing. It addresses problems such as the inability to guarantee WoE monotonicity, the inability to set binning constraints, and the difficulty of obtaining optimal segmentation results, thereby ensuring WoE monotonicity, reducing binning time, and achieving good binning results.

Active Publication Date: 2022-03-15
百融云创科技股份有限公司
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] The embodiments of the present application provide an optimal binning data processing method and system based on the NSGA-II genetic algorithm. They address shortcomings that common prior-art binning algorithms exhibit to varying degrees: binning constraints (such as the sample size of each bin or the upper and lower bounds on the number of bins) cannot be set; WoE monotonicity cannot be guaranteed after binning; binning quality is poor or binning is inefficient; and optimal segmentation results are difficult to obtain.
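The constraints listed in [0006] can be made concrete with a small feasibility check. The sketch below is not the patented method. It makes several assumptions:

- The data are binary-target data, summarised as per-bin event and non-event counts.
- WoE follows the convention ln(non-event share / event share). Conventions differ in sign.
- The constraint set is hypothetical: bounds on the bin count plus a minimum sample size per bin.

The names `woe_per_bin`, `is_monotonic` and `is_feasible` are illustrative.

```python
import math
from typing import List, Sequence


def woe_per_bin(events: Sequence[int], non_events: Sequence[int]) -> List[float]:
    # Assumed convention: WoE_i = ln(share of non-events in bin i / share of events in bin i).
    # Every bin must contain at least one event and one non-event.
    total_e, total_ne = sum(events), sum(non_events)
    return [math.log((ne / total_ne) / (e / total_e)) for e, ne in zip(events, non_events)]


def is_monotonic(values: Sequence[float]) -> bool:
    # True if the sequence is entirely non-decreasing or entirely non-increasing.
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def is_feasible(events: Sequence[int], non_events: Sequence[int],
                min_bin_size: int, min_bins: int, max_bins: int) -> bool:
    # Hypothetical constraint set: bin-count bounds, minimum samples per bin, monotonic WoE.
    sizes = [e + ne for e, ne in zip(events, non_events)]
    if not sizes or not (min_bins <= len(sizes) <= max_bins):
        return False
    if min(sizes) < min_bin_size:
        return False
    return is_monotonic(woe_per_bin(events, non_events))


print(is_feasible([10, 20, 30], [90, 80, 70], min_bin_size=50, min_bins=2, max_bins=5))  # True
```

Swapping the counts of the second and third bins breaks WoE monotonicity, and the same check then rejects the binning.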



Examples


Embodiment 1

[0031] As shown in Figure 1, an embodiment of the present application provides an optimal binning data processing method based on the NSGA-II genetic algorithm. The method comprises:

[0032] S100: Preprocessing the data samples to obtain a first data sample;

[0033] Specifically, the data samples are the data to be binned and may be any continuous data. In practice, a binning algorithm divides a series of target data according to a segmentation standard preset for the business need, and the influence of each bin on the business outcome is then evaluated.
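Binning by a preset segmentation standard amounts to mapping each value to the interval that contains it. This is a minimal sketch under two assumptions:

- The standard is given as a sorted list of cut points.
- Intervals are right-closed. Left-closed intervals are equally common.

`assign_bins` and `bin_counts` are hypothetical helpers.

```python
from bisect import bisect_left
from typing import List, Sequence


def assign_bins(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    # Right-closed intervals (an assumed convention):
    # bin 0 = (-inf, c0], bin 1 = (c0, c1], ..., bin k = (c_{k-1}, +inf).
    # cut_points must be sorted in ascending order.
    return [bisect_left(cut_points, v) for v in values]


def bin_counts(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    # Number of samples falling into each of the len(cut_points) + 1 bins.
    counts = [0] * (len(cut_points) + 1)
    for idx in assign_bins(values, cut_points):
        counts[idx] += 1
    return counts


print(assign_bins([1.0, 2.5, 3.0, 7.2], [2.5, 5.0]))  # [0, 0, 1, 2]
print(bin_counts([1.0, 2.5, 3.0, 7.2], [2.5, 5.0]))   # [2, 1, 1]
```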

[0034] The following example illustrates how the binning algorithm is applied in practice; it does not limit this application.

[0035] In order to study the influence of the average radius of breast glands on the prevalence of breast cancer, a medical research group has measured a series ...

Embodiment 2

[0150] Based on the same inventive concept as the optimal binning data processing method based on the NSGA-II genetic algorithm in the foregoing embodiment, and as shown in Figure 6, an embodiment of the present application provides an optimal binning data processing system based on the NSGA-II genetic algorithm. The system includes:

[0151] A first obtaining unit 11, configured to preprocess the data samples to obtain the first data sample;

[0152] A first processing unit 12, configured to pre-bin the first data sample according to a pre-binning rule to obtain n pre-bins;
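This excerpt does not specify the pre-binning rule. Purely for illustration, the sketch below assumes one common choice: equal-frequency (quantile) splitting, so that each of the n pre-bins holds roughly the same number of samples. `equal_frequency_cuts` is a hypothetical name.

```python
from typing import List, Sequence


def equal_frequency_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    # Candidate cut points so that each pre-bin holds roughly len(values) / n_bins samples,
    # using right-closed intervals (-inf, c0], (c0, c1], ...
    if n_bins < 2 or not values:
        return []
    data = sorted(values)
    size = len(data)
    cuts: List[float] = []
    for k in range(1, n_bins):
        idx = (k * size) // n_bins - 1   # position of the last sample in the k-th group
        if idx < 0:
            continue
        c = data[idx]
        if not cuts or c > cuts[-1]:     # tied values collapse duplicate cuts
            cuts.append(c)
    # A cut equal to the maximum would leave the last pre-bin empty.
    return [c for c in cuts if c < data[-1]]


print(equal_frequency_cuts(range(1, 11), 5))  # [2, 4, 6, 8]
```

Tied values can produce fewer than n pre-bins. In a formulation that only merges adjacent pre-bins, these cut points are the only candidates for the final segmentation.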

[0153] A second processing unit 13, configured to define a decision variable matrix according to the n pre-bins, the decision variable matrix comprising a lower triangular matrix of size n, where …;
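The exact definition of the decision variable matrix is lost from this excerpt. One reading is consistent with "a lower triangular matrix of size n" and is modelled on lower-triangular mixed-integer formulations of optimal binning; it is an assumption here. Under this reading, X is a 0/1 matrix in which X[i][j] = 1 (i ≥ j) means pre-bin j is merged into the final bin whose last pre-bin is i. The helpers `encode`, `decode` and `is_valid` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def encode(bin_ends: Sequence[int], n: int) -> Matrix:
    # X[i][j] = 1 (i >= j) means pre-bin j is merged into the bin whose last pre-bin is i.
    # bin_ends: strictly increasing 0-based indices of the last pre-bin of each final bin.
    if not bin_ends or bin_ends[-1] != n - 1:
        raise ValueError("the last bin must end at pre-bin n - 1")
    if any(b <= a for a, b in zip(bin_ends, bin_ends[1:])):
        raise ValueError("bin_ends must be strictly increasing")
    x = [[0] * n for _ in range(n)]
    start = 0
    for end in bin_ends:
        for j in range(start, end + 1):
            x[end][j] = 1
        start = end + 1
    return x


def decode(x: Matrix) -> List[int]:
    # A final bin ends at pre-bin i exactly when the diagonal entry X[i][i] is 1.
    return [i for i in range(len(x)) if x[i][i] == 1]


def is_valid(x: Matrix) -> bool:
    n = len(x)
    # Strictly upper-triangular entries must be zero.
    lower = all(x[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    # Every pre-bin j is assigned to exactly one final bin.
    assigned_once = all(sum(x[i][j] for i in range(n)) == 1 for j in range(n))
    # Ones in row i form a contiguous run ending on the diagonal (only adjacent pre-bins merge).
    contiguous = all(x[i][j] <= x[i][j + 1] for i in range(n) for j in range(i))
    return lower and assigned_once and contiguous


x = encode([1, 3, 4], n=5)   # final bins: pre-bins {0, 1}, {2, 3}, {4}
print(decode(x), is_valid(x))  # [1, 3, 4] True
```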

[0154] A second obtaining unit 14, configured to obtain the IV value of the decision variable matrix;
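Under the same assumed lower-triangular encoding, the IV of a candidate matrix can be computed in two steps. First, aggregate the per-pre-bin event and non-event counts into the final bins. Then apply the standard information value formula. This is a sketch, not the patent's definition. `merged_counts` and `information_value` are hypothetical. Each final bin is assumed to contain at least one event and one non-event; otherwise the logarithm is undefined.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def merged_counts(x: Matrix, events: Sequence[int],
                  non_events: Sequence[int]) -> Tuple[List[int], List[int]]:
    # Aggregate per-pre-bin counts into final bins; row i is an active bin when X[i][i] == 1.
    ev: List[int] = []
    ne: List[int] = []
    for i in range(len(x)):
        if x[i][i] == 1:
            ev.append(sum(x[i][j] * events[j] for j in range(i + 1)))
            ne.append(sum(x[i][j] * non_events[j] for j in range(i + 1)))
    return ev, ne


def information_value(x: Matrix, events: Sequence[int], non_events: Sequence[int]) -> float:
    # IV = sum over bins of (p_non_event - p_event) * ln(p_non_event / p_event).
    ev, ne = merged_counts(x, events, non_events)
    total_e, total_ne = sum(ev), sum(ne)
    iv = 0.0
    for e, n in zip(ev, ne):
        p_e, p_ne = e / total_e, n / total_ne
        iv += (p_ne - p_e) * math.log(p_ne / p_e)
    return iv


separate = [[1, 0], [0, 1]]   # two pre-bins kept as two bins
merged = [[0, 0], [1, 1]]     # both pre-bins merged into one bin
print(information_value(separate, [10, 30], [90, 70]))  # about 0.4219
print(information_value(merged, [10, 30], [90, 70]))    # 0.0
```

The IV is unaffected by which WoE sign convention is used, because both factors change sign together.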

[0155] The third processing unit 15, t...



Abstract

The invention provides an optimal binning data processing method and system based on the NSGA-II genetic algorithm. The method comprises: preprocessing a data sample to obtain a first data sample; pre-binning the first data sample according to a pre-binning rule to obtain n pre-bins; defining, according to the n pre-bins, a decision variable matrix comprising a lower triangular matrix of size n, and obtaining the IV value of the decision variable matrix; defining an objective function vector according to the decision variable matrix and the IV value; setting binning constraint conditions; performing multi-objective optimization on the pre-binned first data sample with the NSGA-II genetic algorithm, according to the binning constraint conditions and the objective function vector, to obtain a plurality of optimal solutions; obtaining an optimal segmentation point from the plurality of optimal solutions; and binning the pre-binned first data sample according to the optimal segmentation point.
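NSGA-II's multi-objective step relies on two ranking operators:

- Fast non-dominated sorting splits candidate solutions into Pareto fronts.
- Crowding distance prefers spread-out solutions within a front.

The sketch below implements only these two operators, for objective vectors that are to be minimised. An objective to be maximised, such as IV, would be negated. Selection, crossover and mutation are not shown, nor are the patent's actual objective function vector and constraint handling.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    # Minimisation: a is no worse than b in every objective and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: Sequence[Point]) -> List[List[int]]:
    # Returns Pareto fronts as lists of indices; fronts[0] is the non-dominated set.
    n = len(points)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    count = [0] * n                                        # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(points: Sequence[Point], front: List[int]) -> Dict[int, float]:
    # Larger distance = less crowded; boundary solutions get infinity so they are kept.
    if len(front) <= 2:
        return {i: float("inf") for i in front}
    dist = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        order = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(order) - 1):
            dist[order[k]] += (points[order[k + 1]][m] - points[order[k - 1]][m]) / (hi - lo)
    return dist


pts = [(1, 5), (2, 3), (3, 1), (3, 4), (4, 4)]
fronts = fast_non_dominated_sort(pts)
print(fronts)                             # [[0, 1, 2], [3], [4]]
print(crowding_distance(pts, fronts[0]))  # {0: inf, 1: 2.0, 2: inf}
```

In this toy example the first front (indices 0, 1 and 2) is a set of Pareto-optimal solutions. This excerpt does not describe how the patent picks the final segmentation point from such a set.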

Description

Technical field

[0001] The invention relates to the technical field of data binning algorithms, and in particular to an optimal binning data processing method and system based on the NSGA-II genetic algorithm.

Background

[0002] Binning is a feature engineering technique. Data are divided into different bins according to different rules, which can be understood as transforming continuous data into discrete data for modeling. Binning can reduce the impact of noise in the data and improve the robustness of the model. For example, in a financial scoring system, binning avoids the impact of extreme values on modeling. Discretizing continuous variables also facilitates feature derivation, since the discretized features can be combined directly through inner products to increase the feature dimension.

[0003] There are many binning methods. Relatively simple ones include equidistant binning and equal-frequency binning, and k-m...
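The equidistant binning mentioned in [0003] splits the value range into intervals of equal width. The minimal sketch below uses hypothetical function names and assumes right-closed intervals. It also shows why this method interacts badly with extreme values: a single outlier stretches the range, so almost all samples land in one bin.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_width_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    # Interior cut points splitting [min, max] into n_bins intervals of equal width.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]


def counts_per_bin(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    # Right-closed intervals (-inf, c0], (c0, c1], ..., (c_last, +inf).
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_left(cuts, v)] += 1
    return counts


data = [1, 2, 3, 4, 100]                 # one extreme value
cuts = equal_width_cuts(data, 2)
print(cuts, counts_per_bin(data, cuts))  # [50.5] [4, 1]
```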


Application Information

IPC (8): G06K9/62; G06N3/12
CPC: G06N3/126; G06F18/2111; G06F18/2113; Y02P90/30
Inventors: 刘凯 (Liu Kai), 张韶峰 (Zhang Shaofeng), 冯鑫 (Feng Xin)
Owner 百融云创科技股份有限公司