Method for performing iterative modeling on unsaturated information

An iterative, confidence-based technique applied to the iterative modeling of unsaturated information. It can address problems such as insufficient data, machine learning methods that are not applicable, and noise and bias, and achieves high accuracy and efficiency.

Status: Inactive. Publication date: 2019-10-15
SICHUAN XW BANK CO LTD

AI Technical Summary

Problems solved by technology

[0006] 1. Although data expansion solves the problem of insufficient data, it also introduces noise and bias. The distribution of the new samples is not fully consistent with that of the original samples, so the resulting training set also differs from the sample distribution of the target domain on which the model makes predictions. This inconsistency biases the model and enlarges its prediction error.
[0007] 2. Transfer learning requires the main features of the target samples to be similar to those of the original training samples. It is currently used mainly in deep learning and does not apply to general machine learning methods.

Examples

Embodiment 1

[0061] (1) Original unsaturated data samples:

[0062] Positive samples: 1187

[0063] Negative samples: 35060

[0064] Using an existing GBDT (gradient boosting decision tree) modeling method, a model is trained on the data samples with unsaturated information, yielding the probability value P_i and the label of each data sample, as shown in Table 2:

[0065] Table 2:

[0066]
Serial number   P_i     Classification
1               0.013   0
2               0.058   1
3               0.030   0
4               0.062   0
5               0.004   0
6               0.223   1
7               0.151   1
8               0.037   0
…               …       …
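Step B selects its bounds by AUC, so it may help to see that metric computed on the rows visible here. The following is a minimal sketch. It assumes the standard pairwise (Mann-Whitney) definition of AUC, and the function name `auc` is illustrative. Only the eight rows of Table 2 shown in this excerpt are used, so the result says nothing about the full data set.

```python
from itertools import product


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: the fraction of (positive, negative) pairs
    in which the positive sample has the higher score; ties count as 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# The eight rows of Table 2 visible in this excerpt (serial numbers 1-8).
P = [0.013, 0.058, 0.030, 0.062, 0.004, 0.223, 0.151, 0.037]
y = [0, 1, 0, 0, 0, 1, 1, 0]

print(round(auc(P, y), 4))  # 0.9333: 14 of the 15 pairs are ordered correctly
```

The only misordered pair is sample 2 (positive, 0.058) against sample 4 (negative, 0.062).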

[0067] (2) Calculate the first confidence upper bound and the first confidence lower bound according to step B, as shown in Table 3:

[0068] Table 3:

[0069]

[0070]

[0071] The first confidence upper bound and first confidence lower bound that yield the maximum AUC are taken as the final confidence upper bound and final confidence lower bound.
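One way to picture step B is the minimal sketch below. It rests on three assumptions, none stated in this excerpt (whose Table 3 is missing):

- A sample is layered as confidently negative when P_i <= lower and as confidently positive when P_i >= upper.
- Each (lower, upper) pair drawn from the first confidence list is scored by the AUC of the samples outside the band.
- Ties go to the pair that keeps more samples outside the band.

The confidence list `[0.02, 0.05, 0.1, 0.2]` and the names `layer` and `search_bounds` are hypothetical.

```python
from itertools import combinations, product


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; a positive/negative tie counts 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def layer(probs, lower, upper):
    """Hypothetical layering rule: indices with P_i <= lower are confidently
    negative, indices with P_i >= upper are confidently positive."""
    neg = [i for i, p in enumerate(probs) if p <= lower]
    pos = [i for i, p in enumerate(probs) if p >= upper]
    return neg, pos


def search_bounds(probs, labels, confidence_list):
    """Try every (lower, upper) pair with lower < upper and keep the pair whose
    out-of-band samples give the highest AUC; ties prefer wider coverage."""
    best_pair, best_key = None, None
    for lower, upper in combinations(sorted(confidence_list), 2):
        neg, pos = layer(probs, lower, upper)
        kept = neg + pos
        kept_labels = [labels[i] for i in kept]
        if 0 not in kept_labels or 1 not in kept_labels:
            continue  # AUC is undefined without both classes
        key = (auc([probs[i] for i in kept], kept_labels), len(kept))
        if best_key is None or key > best_key:
            best_pair, best_key = (lower, upper), key
    return best_pair, best_key


# Eight visible rows of Table 2 and a hypothetical first confidence list.
P = [0.013, 0.058, 0.030, 0.062, 0.004, 0.223, 0.151, 0.037]
y = [0, 1, 0, 0, 0, 1, 1, 0]
print(search_bounds(P, y, [0.02, 0.05, 0.1, 0.2]))  # ((0.05, 0.1), (1.0, 6))
```

Scoring only the samples outside the band tends to reward narrow, extreme bounds. The sketch therefore breaks ties by coverage. The patent's actual criterion may differ.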

[0072] (3) According to the final confid...

Abstract

The invention relates to a method for performing iterative modeling on unsaturated information. The method comprises the following steps:

- A: training on the unsaturated data samples to obtain a probability value for each data sample.
- B: setting a first confidence list and layering the data samples according to the relationship between their probability values and the confidences in the list, to obtain a final confidence upper bound and a final confidence lower bound.
- C: layering again to obtain a training data set.
- D: predicting the probability values of the data samples outside the training data set, layering those samples according to the final confidence upper and lower bounds, and combining the layering result with the positive and negative samples in the training data set to form a new training data set.
- E: iterating steps B to D until the data samples outside the training data set can no longer be layered, which yields the final new training data set.

The invention provides a general-purpose model that can classify unsaturated information arising in a variety of settings, with relatively high accuracy and efficiency.
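Steps A to E read much like self-training (pseudo-labeling). The sketch below is one possible interpretation, not the patent's procedure. Its assumptions:

- The GBDT is replaced by a hypothetical 1-D k-nearest-neighbour scorer, `knn_fit_predict`.
- Step B is passed in as a callable, `choose_bounds`. The demo fixes it at (0.2, 0.6) instead of searching.
- A layered sample takes the label of its layer: 1 at or above the upper bound, 0 at or below the lower bound.

All names and the toy data are illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interfaces, not taken from the patent.
FitPredict = Callable[[List[float], List[int], List[float]], List[float]]
ChooseBounds = Callable[[Sequence[float], Sequence[int]], Tuple[float, float]]


def knn_fit_predict(train_x, train_y, query_x, k=3):
    """Stand-in for the GBDT: P(positive) = share of positive labels among the
    k nearest training points (1-D features, distance ties broken by index)."""
    probs = []
    for q in query_x:
        order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - q))
        nearest = order[:k]
        probs.append(sum(train_y[i] for i in nearest) / len(nearest))
    return probs


def layer(p: float, lower: float, upper: float) -> Optional[int]:
    """Assumed layering rule: 1 at/above the upper bound, 0 at/below the lower."""
    if p >= upper:
        return 1
    if p <= lower:
        return 0
    return None  # inside the band: cannot be layered yet


def layer_all(indices, probs, lower, upper) -> Dict[int, int]:
    """Layer each sample and keep only those that fall outside the band."""
    out = {}
    for i, p in zip(indices, probs):
        label = layer(p, lower, upper)
        if label is not None:
            out[i] = label
    return out


def iterative_model(x, y, fit_predict: FitPredict, choose_bounds: ChooseBounds,
                    max_rounds: int = 10) -> Dict[int, int]:
    """Steps A-E read as self-training; returns {sample index: label}."""
    probs = fit_predict(x, y, x)                              # step A
    lower, upper = choose_bounds(probs, y)                    # step B
    train = layer_all(range(len(x)), probs, lower, upper)     # step C
    for _ in range(max_rounds):                               # step E
        rest = [i for i in range(len(x)) if i not in train]
        if not rest or not train:
            break
        idx = sorted(train)
        tx, ty = [x[i] for i in idx], [train[i] for i in idx]
        rest_probs = fit_predict(tx, ty, [x[i] for i in rest])  # step D
        new = layer_all(rest, rest_probs, lower, upper)
        if not new:
            break  # the remaining samples cannot be layered again
        train.update(new)
        # Back to step B with the enlarged training set.
        idx = sorted(train)
        tx, ty = [x[i] for i in idx], [train[i] for i in idx]
        lower, upper = choose_bounds(fit_predict(tx, ty, tx), ty)
    return train


def fixed_bounds(probs, labels):
    """Hypothetical step B: fixed bounds instead of a real search."""
    return 0.2, 0.6


x = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 6.5, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 0, 0]
result = iterative_model(x, y, knn_fit_predict, fixed_bounds)
print(result)  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0, 8: 1}
```

In this toy run, sample 8 (observed label 0) falls inside the band after step A, with a score of 1/3. In step D the model is retrained on the layered samples only, and sample 8 then scores 2/3. It joins the training set as a positive, which mirrors the abstract's aim of recovering labels missing from unsaturated data.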

Description

Technical field

[0001] The invention relates to a modeling method based on the type of information sample, specifically a method for iteratively modeling unsaturated information.

Background art

[0002] In data mining, sample labels usually become available only after an observation period. When the time window is too short and the data volume is small, the labels obtained may fall short of the actual situation, or their confidence may be insufficient. Some samples are then hard to distinguish during modeling, because the prediction process lacks enough confidence to establish whether a sample is positive or negative. As a result, indicators such as the model's overall AUC (Area Under Curve) and KS (the evaluation indicator that measures how well the model separates positive from negative samples) remain low, and the model falls short of the ideal. The quality of...
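The KS indicator mentioned above is commonly computed as the largest gap between the true-positive rate and the false-positive rate over all score thresholds. Here is a minimal sketch. For illustration only, it is applied to the eight rows of Table 2 visible in this excerpt, and `ks_statistic` is a hypothetical name.

```python
def ks_statistic(scores, labels):
    """KS = max over score thresholds of |TPR - FPR|, i.e. the largest gap
    between the cumulative distributions of positive and negative scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs both positive and negative samples")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Move the threshold past every sample that shares the current score.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# The eight rows of Table 2 visible in this excerpt.
P = [0.013, 0.058, 0.030, 0.062, 0.004, 0.223, 0.151, 0.037]
y = [0, 1, 0, 0, 0, 1, 1, 0]
print(round(ks_statistic(P, y), 4))  # 0.8
```

The maximum is reached at the threshold 0.058. At that point all three positives are captured (TPR = 1.0) and only one of the five negatives is (FPR = 0.2).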

Application Information

IPC(8): G06F16/2458; G06K9/62
CPC: G06F16/2465; G06F18/2415
Inventors: 王张琦, 韩晗, 刘嵩, 刘宇超
Owner SICHUAN XW BANK CO LTD