Missing data completion method based on k plane regression

A missing-data completion technique, applicable to data mining, electrical digital data processing, special-purpose data processing applications and similar fields, that addresses the problem that the completion accuracy for segmented data needs to be improved.

Status: Inactive. Publication date: 2016-04-06
EAST CHINA UNIV OF SCI & TECH
Cites: 3. Cited by: 13.

AI Technical Summary

Problems solved by technology

Although existing completion methods have achieved good results, their completion accuracy for segmented data still needs to be improved.




Embodiment Construction

[0025] Step 1: Detect the missing data manually. The dimension that needs to be completed is used as the output, and the remaining data are used as the input.
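Step 1 can be sketched in Python as a minimal illustration, not the inventors' implementation. The patent detects missing data manually; this sketch instead assumes that missing entries are encoded as NaN, that only one dimension (`target_col`) is being completed, and that the remaining input columns are complete. The function name `split_for_completion` is hypothetical. Rows with an observed target form the training set; rows whose target is missing are the ones to be completed.

```python
import numpy as np

def split_for_completion(data, target_col):
    """Split a 2-D array into a training set and a set of rows to complete.

    Assumptions (hypothetical, not from the patent): missing values are NaN,
    only `target_col` has gaps, and all other columns are fully observed.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data[:, target_col])        # rows whose output is missing
    inputs = np.delete(data, target_col, axis=1)   # every other column is an input
    outputs = data[:, target_col]
    X_train, y_train = inputs[~missing], outputs[~missing]
    X_query = inputs[missing]                      # inputs of rows to be completed
    return X_train, y_train, X_query, missing
```

For example, with `data = [[1, 2, 3], [4, nan, 6], [7, 8, 9]]` and `target_col=1`, the training inputs are `[[1, 3], [7, 9]]` with outputs `[2, 8]`, and the single row to complete has inputs `[4, 6]`.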

[0026] Step 2: Initialize the parameters.

[0027] The maximum allowed error is the difference between the maximum and minimum values of the dimension to be completed, multiplied by a manually set coefficient α; here α = 0.1.
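The rule in [0027] amounts to ε = α · (max y − min y), where y is the dimension to be completed. A minimal Python sketch follows; `max_allowed_error` is a hypothetical name, NaN is assumed to mark missing values (which are ignored when taking the range), and the default α = 0.1 is the value stated above.

```python
import numpy as np

def max_allowed_error(target_values, alpha=0.1):
    """Maximum allowed completion error: alpha * (max - min) of the target dimension.

    Assumption of this sketch: missing entries are NaN and are skipped.
    """
    observed = np.asarray(target_values, dtype=float)
    return alpha * (np.nanmax(observed) - np.nanmin(observed))
```

For observed values ranging from 2 to 12, the allowed error is 0.1 × 10 = 1.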

[0028] Step 3: Reduce the dimensionality of the input data using PCA.

[0029] The covariance matrix C is obtained as shown in formula (1), where X is the input of the completion algorithm and m is the number of data items. The eigenvalues and corresponding eigenvectors of C are then computed, the eigenvectors are arranged into a matrix ordered by decreasing eigenvalue, and the first d columns form the matrix P. Y = XP is the data obtained after dimension...
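Formula (1) is not reproduced on this page. The sketch below assumes it is the usual sample covariance of mean-centred data, C = X^T X / m, with X stored as an m × n matrix (one row per data item); both the centring and that exact form are assumptions. `pca_reduce` is a hypothetical name; it uses `numpy.linalg.eigh` because C is symmetric, then keeps the eigenvectors of the d largest eigenvalues as the columns of P.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the m x n input matrix X onto its top-d principal directions.

    Assumption: formula (1) is C = Xc^T Xc / m with Xc the mean-centred X.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre each input dimension
    C = Xc.T @ Xc / m                      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    P = eigvecs[:, order[:d]]              # first d eigenvectors as columns of P
    Y = Xc @ P                             # reduced data, m x d
    return Y, P, mean
```

Rows to be completed would be projected with the same `mean` and `P`, i.e. `(X_query - mean) @ P`, so that training and query data share one reduced space.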


Abstract

The invention provides a new method for completing missing data in databases. The method comprises the following steps: (1) missing-value detection is carried out on a given data set; (2) the input variables are reduced in dimension: the correlation between input dimensions is analyzed, principal component analysis (PCA) is used to select correlated input dimensions, and a new input data set is formed; (3) the training set is partitioned into k parts: k-means clustering is applied to the input training set to obtain k classes of training data; (4) a k-plane regression function is built: the optimal regression coefficients and the geometric center of each plane are solved for, giving a regression fitting function; finally, a data completion test is carried out. Experiments show that the method is very effective: within the allowed error range, it yields a completed database of practical use. It can, to a certain degree, address the challenges that incomplete data poses for machine learning and data mining, and it advances big-data application technology.
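Steps (3) and (4) of the abstract, k-means partitioning followed by one regression plane per class, can be sketched as below. This is an illustrative reading, not the patented procedure. It assumes that the "geometric center" of a plane is the k-means centroid of its class, that each plane is an ordinary least-squares fit with an intercept, and that a row to be completed uses the plane whose center is nearest. All function names are hypothetical, and k-means is a plain Lloyd iteration written in NumPy rather than a particular library's implementation.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd k-means: partition the training inputs into k classes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(X, centers), centers

def _fit_plane(X, y):
    """Least-squares plane y ~ [X, 1] @ w (the last entry of w is the intercept)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def fit_k_planes(X, y, k, seed=0):
    """Fit one plane per k-means class; return centroids and plane coefficients."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    labels, centers = kmeans(X, k, seed=seed)
    global_w = _fit_plane(X, y)  # assumed fallback if a class ends up empty
    coefs = np.array([
        _fit_plane(X[labels == j], y[labels == j]) if np.any(labels == j) else global_w
        for j in range(k)
    ])
    return centers, coefs

def predict_k_planes(Xq, centers, coefs):
    """Complete each query row with the plane whose geometric center is nearest (assumed rule)."""
    Xq = np.asarray(Xq, dtype=float)
    idx = nearest_center(Xq, centers)
    A = np.hstack([Xq, np.ones((len(Xq), 1))])
    return np.sum(A * coefs[idx], axis=1)

def completion_rate(y_true, y_pred, max_error):
    """Fraction of completed values that fall within the maximum allowed error."""
    return float(np.mean(np.abs(np.asarray(y_true) - y_pred) <= max_error))
```

For the completion test, `completion_rate` compares completed values with held-out true values, using the maximum allowed error from Step 2 as the tolerance. On piecewise-linear data (for example y = 2x on one segment and y = 30 − x on a well-separated second segment), k = 2 planes recover each segment, which a single global plane cannot.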

Description

Technical field
[0001] The invention relates mainly to data mining technology, and in particular to a method for completing missing data based on K-plane regression.
Background
[0002] Ideally, every record in a data set should be complete. In the real world, however, incomplete and noisy data are common. In data mining and pattern recognition, missing data can have a very large impact: for example, they affect the correctness of the patterns extracted from the data set and the accuracy of the derived rules, which can lead to wrong data mining models. Moreover, the vast majority of current data mining algorithms cannot analyze or process data sets that contain missing data. If missing data are discarded directly instead of being analyzed and processed, a large amount of information is lost and bias is introduced, resulting in systematic differences between incomplete observation data a...


Application Information

Patent type & authority: Application (China)
IPC(8): G06K9/62; G06F17/30
CPC: G06F16/35; G06F2216/03; G06F18/23213
Inventors: 袁玉波, 阮彤, 邱文强, 汤伟, 赵婷婷, 高炬, 殷亦超
Owner EAST CHINA UNIV OF SCI & TECH