Missing value interpolation method based on multi-method ensemble learning

An ensemble learning and missing value technology, applied in ensemble learning, special data processing applications, instruments, etc. It addresses three problems: missing data prevents data research from proceeding smoothly, eliminating incomplete variables loses model-related information, and simple interpolation reduces the accuracy of later prediction models.

Pending Publication Date: 2021-05-18
胡安民 +2

AI Technical Summary

Problems solved by technology

Most current models based on statistical methods or machine learning algorithms require complete data, so the existence of missing data prevents research from proceeding smoothly.
On the one hand, simply eliminating the variables with missing data discards information relevant to the model. On the other hand, simple interpolation directly affects the accuracy of the later prediction model.


Embodiment Construction

[0019] The above and other technical features and advantages of the present invention will be described in more detail below in conjunction with the accompanying drawings.

[0020] Random Forest (RF) is a classifier that uses multiple decision trees to train on and predict samples, with the decision trees independent of one another. Random Forest randomly selects training data with replacement, constructs a classifier from each sample, and finally combines the learners into one model to improve the overall effect. The random forest calculates the importance of each feature and sorts the features in descending order of importance. It then removes some features according to their importance to obtain a new feature set, ranks the importance again, removes some features again, and repeats this iteratively. Finally, based on the different feature sets obtained and their corresponding out-of-bag error rates, the feature set of the dependent variable is the feature set corresponding to the lowest o...
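
The elimination loop described in [0020] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `evaluate` callback, the `drop_fraction` and `min_features` parameters, and the toy feature names are all assumptions introduced here. In practice `evaluate` would fit a random forest on the given features and return the per-feature importances together with the out-of-bag error rate; the toy version below only mimics that behaviour so the loop can run on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: fits a random forest on the given features and
# returns (importance per feature, out-of-bag error rate).
Evaluate = Callable[[Sequence[str]], Tuple[Dict[str, float], float]]


def select_features(
    features: Sequence[str],
    evaluate: Evaluate,
    drop_fraction: float = 0.2,   # assumed share of features removed per round
    min_features: int = 1,        # assumed stopping point
) -> Tuple[List[str], float]:
    """Backward elimination: return the feature set with the lowest OOB error."""
    current = list(features)
    best_set, best_error = list(current), float("inf")
    while True:
        importances, oob_error = evaluate(current)
        if oob_error < best_error:
            best_set, best_error = list(current), oob_error
        if len(current) <= min_features:
            break
        # Rank by importance (descending), then drop the least important tail.
        ranked = sorted(current, key=lambda f: importances[f], reverse=True)
        n_drop = max(1, int(len(ranked) * drop_fraction))
        current = ranked[: max(min_features, len(ranked) - n_drop)]
    return best_set, best_error


def toy_evaluate(feats: Sequence[str]) -> Tuple[Dict[str, float], float]:
    """Stand-in for a real forest: two informative features, three noise ones."""
    true_importance = {"age": 0.5, "bmi": 0.3,
                       "noise1": 0.02, "noise2": 0.01, "noise3": 0.005}
    importances = {f: true_importance[f] for f in feats}
    # Error rises when an informative feature is dropped or noise is kept.
    lost_signal = sum(v for f, v in true_importance.items()
                      if f not in feats and v >= 0.1)
    noise_kept = sum(1 for f in feats if f.startswith("noise"))
    return importances, 0.10 + lost_signal + 0.01 * noise_kept


selected, error = select_features(
    ["age", "bmi", "noise1", "noise2", "noise3"], toy_evaluate
)
print(selected, error)
```

Keeping the best set seen so far, rather than the last one, matters because the final rounds may remove informative features and push the out-of-bag error back up.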

Abstract

The invention provides a missing value interpolation method based on multi-method ensemble learning. The method comprises the steps of: S1, extracting the original data; S2, screening part of the missing variables; S3, carrying out simple interpolation of the missing values; S4, screening the feature variables of the missing values; S5, carrying out 10-fold cross-validation; S6, carrying out multiple interpolation of the missing values; S7-S8, iteratively overwriting the original simple interpolation data; and S9, separately predicting the variables with large missing proportions. The method uses multiple methods to predict missing values, weakens the potential uncertainty that interpolated data introduces into the model as far as possible, makes maximum use of the real information in incomplete variables, and improves the accuracy and efficiency of missing data prediction.
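
Steps S3 and S6 to S8 can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the claimed method: the two predictors (`knn_predict` and `idw_predict`), the plain averaging of their outputs, the fixed `n_iter`, and the toy data are all choices made here for brevity. The feature screening of S4, the cross-validation of S5, and the separate handling of heavily missing variables in S9 are omitted. "Interpolation" in this document corresponds to what is more commonly called imputation.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Learner = Callable[[List[Vector], List[float], Vector], float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_x: List[Vector], train_y: List[float],
                query: Vector, k: int = 2) -> float:
    """Mean target of the k nearest training rows."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: _dist(p[0], query))[:k]
    return mean(y for _, y in nearest)


def idw_predict(train_x: List[Vector], train_y: List[float],
                query: Vector) -> float:
    """Inverse-distance-weighted mean over all training rows."""
    weights = [1.0 / (_dist(x, query) + 1e-9) for x in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def ensemble_impute(rows: List[List[Optional[float]]],
                    learners: Sequence[Learner] = (knn_predict, idw_predict),
                    n_iter: int = 5) -> List[List[float]]:
    """Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    # S3: simple fill with the mean of the observed values in each column.
    col_means = [mean(r[c] for r in rows if r[c] is not None)
                 for c in range(n_cols)]
    filled = [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)]
              for r in rows]
    for _ in range(n_iter):  # S7-S8: repeatedly overwrite the simple fills
        for c in range(n_cols):
            targets = [i for i in range(len(rows)) if missing[i][c]]
            if not targets:
                continue
            train = [i for i in range(len(rows)) if not missing[i][c]]
            others = [j for j in range(n_cols) if j != c]
            train_x = [[filled[i][j] for j in others] for i in train]
            train_y = [filled[i][c] for i in train]
            for i in targets:
                query = [filled[i][j] for j in others]
                # S6: combine several methods by averaging their predictions.
                filled[i][c] = mean(l(train_x, train_y, query) for l in learners)
    return filled


# Toy data: the second column is missing in the third row.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0], [5.0, 10.0]]
completed = ensemble_impute(rows)
print(completed[2])
```

Only cells that were missing in the original data are overwritten, so observed values are never altered by the iteration.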

Description

Technical field

[0001] The invention relates to the field of missing data filling, in particular to a missing value interpolation algorithm based on multi-method ensemble learning.

Background technique

[0002] At present, big data research is widely used in many fields, but some data are often missing in the actual data extraction process. Most current models based on statistical methods or machine learning algorithms require complete data, so the existence of missing data prevents research from proceeding smoothly. On the one hand, simply eliminating the variables with missing data discards information relevant to the model. On the other hand, simple interpolation directly affects the accuracy of the later prediction model.

[0003] Purpose of the invention: In order to solve the above-mentioned technical problems, try to weaken the potential uncertainty impact of interpol...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06K9/62, G06N20/20
CPC: G06F16/215, G06N20/20, G06F18/24323
Inventor: 胡安民, 吴超然, 李镇
Owner: 胡安民