A continuous feature automatic binning algorithm based on similarity combination

A continuous-feature automatic binning algorithm based on similarity merging, applied to credit scorecard modeling. It addresses problems of existing binning practice, such as inconsistent monotonic trends in the default ratio and trend judgments that lack an objective basis, and has the effects of reducing time consumption and information loss, weakening subjective influence, and improving the expressive ability of variables.

Inactive Publication Date: 2019-06-04
杭州排列科技有限公司
0 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] For continuous variables, most institutions directly adopt equal-frequency or equal-width automatic binning, and then merge bins automatically or manually, keeping the number of bins below a certain threshold, until the default ratio across bins is monotonic. They then calculate the IV and use the features whose IV reaches a certain size as the variable set entering the regression model. Finally, the analyst judges from a subjective point of view, based on the actual business, whether the default ratio trend across the bins of each variable is consistent with business logic, adjusts the bins manually, and rebuilds the model accordingly. On the one hand, equal-frequency or equal-width binning is an unsupervised binning method, and merging bins on top of such a rough binning causes a large IV loss. On the other hand, merging bins with monotonicity of the default ratio as the only goal, in a loop of judging, merging, and judging again, has the following problems:
[0005] 1. The loop makes bin merging time-consuming;
[0006] 2. Because of the strict monotonicity constraint on the default ratio, merging bins this way often ends with only 2 bins;
[0007] 3. The IV obtained this way is not the best IV, so relatively more IV is lost;
[0008] 4. Different parameter settings may yield inconsistent monotonic trends in the default ratio. For the same variable, the default ratio may increase as the bin interval value increases under one setting and decrease under another;
[0009] 5. Analysts explain the monotonic trend of the default ratio from a subjective point of view. Different monotonic trends receive different explanations, which lack an objective basis and are not persuasive;
[0010] 6. Variables whose monotonic trend is disputed enter the scorecard model, which leads to poor model stability.
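To make the quantities in the passage above concrete, here is a minimal Python sketch of the per-bin default ratio, the monotonicity check, and the IV of a binned variable. The function names, the label convention (0 = good, 1 = default), and the requirement that every bin contains at least one good and one bad sample are assumptions made for illustration; the source does not give an implementation.

```python
import math

def bin_stats(values, labels, cuts):
    """Count [goods, bads] per bin. Bins are (-inf, c0], (c0, c1], ..., (ck, +inf).

    Assumption: labels are 0 (good) or 1 (default); cuts are sorted ascending.
    """
    counts = [[0, 0] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        idx = sum(v > c for c in cuts)  # number of cut points strictly below v
        counts[idx][y] += 1
    return counts

def default_rates(counts):
    """Default ratio (bads / total) for each non-empty bin."""
    return [b / (g + b) for g, b in counts]

def is_monotonic(rates):
    """True if the rates never decrease or never increase from bin to bin."""
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def information_value(counts):
    """IV = sum over bins of (bad_share - good_share) * ln(bad_share / good_share).

    Assumption: every bin holds at least one good and one bad sample,
    otherwise the logarithm is undefined and a ValueError is raised.
    """
    total_good = sum(g for g, _ in counts)
    total_bad = sum(b for _, b in counts)
    iv = 0.0
    for g, b in counts:
        if g == 0 or b == 0:
            raise ValueError("each bin needs at least one good and one bad sample")
        good_share = g / total_good
        bad_share = b / total_bad
        iv += (bad_share - good_share) * math.log(bad_share / good_share)
    return iv
```

Each IV term is non-negative, so merging bins can only keep the IV equal or lower it, which is the information loss the passage refers to.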


Examples

Embodiment Construction

[0031] The present invention will be further described below in conjunction with the examples.

[0032] The following examples are used to illustrate the present invention, but cannot be used to limit the protection scope of the present invention. The conditions in the embodiment can be further adjusted according to the specific conditions, and under the premise of the concept of the present invention, simple improvements to the method of the present invention all belong to the scope of protection of the present invention.

[0033] Referring to Figure 1, an automatic binning algorithm for continuous features based on similarity merging includes the following steps:

[0034] S1. Use a decision tree to initialize the binning of the original continuous variables in the modeling training set, obtaining the initial sequence of cut points cutlist_0. Decision tree binning is essentially a binary classification. Taking CART as an example, calculate the median of the adjacent el...
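The source text for step S1 is truncated, so the following is only a rough Python sketch of how CART-style initial cut points for a single feature might be obtained. It assumes candidate cuts are the midpoints between adjacent distinct sorted values, that splits are scored by the decrease in Gini impurity, and that labels are 0 (good) or 1 (default). The names `tree_cut_points`, `max_depth`, and `min_gain` are hypothetical and do not come from the patent.

```python
def gini(n_good, n_bad):
    """Gini impurity of a node with the given class counts."""
    n = n_good + n_bad
    if n == 0:
        return 0.0
    p = n_bad / n
    return 2.0 * p * (1.0 - p)

def best_split(pairs):
    """Return (cut, gain) for the best midpoint split of sorted (value, label) pairs.

    Returns None when all values are identical, so no split exists.
    """
    n = len(pairs)
    total_bad = sum(y for _, y in pairs)
    parent = gini(n - total_bad, total_bad)
    best = None
    left_bad = 0
    for i in range(1, n):
        left_bad += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        right_bad = total_bad - left_bad
        child = (i * gini(i - left_bad, left_bad)
                 + (n - i) * gini(n - i - right_bad, right_bad)) / n
        gain = parent - child
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best

def tree_cut_points(values, labels, max_depth=3, min_gain=1e-12):
    """Recursively split one feature and return the sorted list of cut points."""
    def recurse(pairs, depth):
        if depth == 0 or len(pairs) < 2:
            return []
        found = best_split(pairs)
        if found is None or found[1] <= min_gain:
            return []  # splitting further does not reduce impurity
        cut = found[0]
        left = [p for p in pairs if p[0] <= cut]
        right = [p for p in pairs if p[0] > cut]
        return recurse(left, depth - 1) + [cut] + recurse(right, depth - 1)
    return recurse(sorted(zip(values, labels)), max_depth)
```

For example, `tree_cut_points([1, 2, 3, 4], [0, 0, 1, 1])` yields a single cut at 2.5, which would play the role of cutlist_0 for that variable under these assumptions.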


Abstract

The invention discloses a continuous feature automatic binning algorithm based on similarity merging. The algorithm comprises modeling data, decision tree initial binning, 100 equal-frequency binning, linear trend judgment, trend + ChiMerge binning, IV, correlation and the like, and yields a final scorecard model. The whole process is implemented in Python. The algorithm greatly reduces the time consumption and information loss incurred in obtaining cut points that satisfy monotonicity, and reduces analysts' intervention in the binning. The model tests well for stability. Because bins are merged according to the trend of the data, the analyst's subjective influence on trend judgment is weakened, the default monotonicity shown by the binning result is better supported by the modeling data, the persuasiveness of the binning is enhanced, and the expressive ability of the variables is improved.
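The abstract names a linear trend judgment and a "trend + ChiMerge" step without giving details in this excerpt. The Python sketch below is one plausible reading, not the patent's procedure: the trend direction is taken as the sign of the least-squares slope of the default ratio against the bin index, and the adjacent pair of bins with the smallest chi-square statistic (the most similar pair, as in standard ChiMerge) is merged until the default ratios follow that direction. All function names are hypothetical, and each bin is assumed to be a non-empty `[goods, bads]` pair.

```python
def trend_direction(rates):
    """Sign of the least-squares slope of rate against bin index: 1, -1, or 0."""
    n = len(rates)
    mean_x = (n - 1) / 2.0
    mean_y = sum(rates) / n
    cov = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(rates))
    return (cov > 0) - (cov < 0)

def chi_square(bin_a, bin_b):
    """Pearson chi-square of the 2x2 table built from two [goods, bads] bins."""
    total = sum(bin_a) + sum(bin_b)
    col_totals = [bin_a[0] + bin_b[0], bin_a[1] + bin_b[1]]
    chi2 = 0.0
    for row in (bin_a, bin_b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def merge_most_similar(counts, cuts):
    """Merge the adjacent pair with the smallest chi-square; drop the cut between them."""
    scores = [chi_square(counts[i], counts[i + 1]) for i in range(len(counts) - 1)]
    i = scores.index(min(scores))
    merged = [counts[i][0] + counts[i + 1][0], counts[i][1] + counts[i + 1][1]]
    return counts[:i] + [merged] + counts[i + 2:], cuts[:i] + cuts[i + 1:]

def follows_trend(rates, direction):
    """True if rates are non-decreasing (direction > 0) or non-increasing otherwise."""
    pairs = list(zip(rates, rates[1:]))
    if direction > 0:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)

def merge_until_monotonic(counts, cuts):
    """Assumed loop: fix the trend on the fine bins, then merge until it holds."""
    rates = [b / (g + b) for g, b in counts]
    direction = trend_direction(rates)
    while len(counts) > 2 and not follows_trend(rates, direction):
        counts, cuts = merge_most_similar(counts, cuts)
        rates = [b / (g + b) for g, b in counts]
    return counts, cuts, direction
```

Fixing the direction once from the fine-grained bins, rather than re-deciding it after every merge, is the design choice that would keep the final trend tied to the data instead of to parameter settings or analyst judgment.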

Description

Technical field

[0001] The invention belongs to the technical field of personal credit risk assessment in consumer finance scenarios, and specifically relates to an automatic binning algorithm for continuous features based on similarity merging.

Background technique

[0002] The credit scorecard is a credit evaluation system that quantifies the borrower's repayment ability and willingness to repay based on the borrower's relevant information, such as identity, occupational characteristics, and income and expenditure. On the one hand, for applicants, the level of the credit score determines the quality of the credit service they enjoy; on the other hand, the score is an important basis for pricing and is closely related to the income of credit financial institutions. Therefore, the credit scorecard has become an important means for financial institutions to identify default customers effectively and quickly, improve credit income, and reduce risk losses.

[0003] Differ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06Q40/02, G06Q20/40
Inventor: 段兆阳, 王华瑞, 孙博
Owner: 杭州排列科技有限公司