KNN-based improved missing data filling algorithm

A missing data filling algorithm, applied in data mining, digital data processing, and special data processing applications, that aims for accurate calculation results and wide applicability.

Inactive Publication Date: 2017-02-15
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: In order to solve the above technical problems, reduce the computational complexity of the missing data algorithm, improve the accuracy of



Examples


Embodiment Construction

[0049] The present invention will be further described below in conjunction with the accompanying drawings and the existing KNN algorithm.

[0050] Figure 1 shows the flow chart of an embodiment of the present invention, which comprises the following steps:

[0051] (1) Perform an attribute set reduction operation on the data sample set, and delete samples with little relevance

[0052] First, compare the samples that have missing values with the complete samples, and delete the samples that are not closely related to the missing samples. Then perform a further attribute reduction on the data set: improve the traditional reciprocal weighting method of multiple correlation coefficients, use the improved algorithm to calculate the importance of each attribute to the attributes with missing values, and delete the attributes that are only weakly associated with the key attributes. The result is a data sample set containing only the reduced attribute set. ...
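The attribute reduction in step (1) can be illustrated with a minimal sketch. The text above does not give the improved weighting formula, so this example substitutes a plain absolute Pearson correlation as the relevance score; it is a stand-in, not the patented method. The function name `reduce_attributes`, the `threshold` parameter, and its default value are assumptions made for illustration.

```python
import numpy as np

def reduce_attributes(X, target_col, threshold=0.3):
    """Keep attributes whose |Pearson r| with the target attribute is >= threshold.

    X          : 2-D array, rows are complete samples, columns are attributes.
    target_col : index of the attribute that has missing values in other samples.
    threshold  : hypothetical cut-off; the source does not specify one.

    NOTE: absolute Pearson correlation is a stand-in relevance score, not the
    improved multiple-correlation weighting described in the patent.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)      # attribute-by-attribute correlation matrix
    relevance = np.abs(corr[target_col])     # relevance of every attribute to the target
    keep = [j for j in range(X.shape[1])
            if j != target_col and relevance[j] >= threshold]
    return keep, relevance

# Example: column 0 is the attribute with missing values elsewhere.
X = np.array([
    [1.0,  2.0,  1.0],
    [2.0,  4.0, -1.0],
    [3.0,  6.0,  1.0],
    [4.0,  8.0, -1.0],
    [5.0, 10.0,  1.0],
    [6.0, 12.0, -1.0],
])
keep, relevance = reduce_attributes(X, target_col=0, threshold=0.5)
print(keep)  # indices of the attributes retained for the neighbor search
```

Attributes that fall below the cut-off are dropped before the neighbor search, which is how the reduction lowers the cost of the later distance calculations.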



Abstract

The invention provides a KNN-based improved missing data filling algorithm, which comprises the steps of (1) improving the traditional multiple correlation coefficient inverse weighting method, using the improved algorithm to calculate the importance of each attribute to the attribute containing missing values, deleting the attributes that have relatively small correlation with the key attribute, and thereby streamlining the attribute set to obtain a data sample set which contains only the streamlined attribute set; (2) using the Mahalanobis distance, which accounts for both the correlation between attributes and their variability, in combination with a grey correlation analysis method, which effectively predicts samples containing uncertain factors, to calculate the K adjacent samples of a missing sample; and (3) assigning entropy weight values to the attributes corresponding to the K samples according to the calculated K distance values and an entropy weight method, and then calculating a final filling value by combining the attribute values. The KNN-based improved missing data filling algorithm can reduce the computational complexity of the missing data algorithm, improve the accuracy of the adjacent sample values, and improve the estimation accuracy of the data filling value.
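Steps (2) and (3) can be sketched roughly as follows, under stated assumptions. The abstract does not give the formulas for combining the Mahalanobis distance with grey correlation analysis or for the entropy weights, so this sketch uses the Mahalanobis distance alone for the neighbor search and plain inverse-distance weights in place of entropy weights. The function name `mahalanobis_knn_fill` and its parameters are hypothetical.

```python
import numpy as np

def mahalanobis_knn_fill(complete, sample, missing_col, k=3):
    """Estimate sample[missing_col] from the k nearest complete samples.

    complete    : 2-D array of samples with no missing values.
    sample      : 1-D array with np.nan at position missing_col.
    missing_col : index of the attribute to fill.

    ASSUMPTIONS: neighbors are ranked by Mahalanobis distance only (no grey
    correlation term), and neighbors are weighted by inverse distance rather
    than by the entropy weight method named in the abstract.
    """
    complete = np.asarray(complete, dtype=float)
    sample = np.asarray(sample, dtype=float)
    obs = [j for j in range(complete.shape[1]) if j != missing_col]

    # Covariance of the observed attributes; the pseudo-inverse tolerates
    # a singular covariance matrix.
    cov = np.atleast_2d(np.cov(complete[:, obs], rowvar=False))
    cov_inv = np.linalg.pinv(cov)

    # Squared Mahalanobis distance of every complete sample to the target.
    diff = complete[:, obs] - sample[obs]
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    dist = np.sqrt(np.maximum(d2, 0.0))      # clip tiny negative round-off

    nearest = np.argsort(dist)[:k]           # indices of the k nearest samples
    weights = 1.0 / (dist[nearest] + 1e-12)  # inverse-distance stand-in weights
    weights /= weights.sum()
    value = float(weights @ complete[nearest, missing_col])
    return value, nearest

complete = np.array([
    [1.0, 2.0, 10.0],
    [2.0, 1.0, 20.0],
    [3.0, 4.0, 30.0],
    [9.0, 8.0, 90.0],
    [8.0, 9.0, 80.0],
])
sample = np.array([2.0, 1.0, np.nan])
value, nearest = mahalanobis_knn_fill(complete, sample, missing_col=2, k=3)
print(value, nearest)
```

Because the Mahalanobis distance rescales by the inverse covariance, attributes with large variance or strong mutual correlation do not dominate the neighbor ranking the way they can with Euclidean distance.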

Description

Technical field

[0001] The invention relates to the field of missing data filling, in particular to an improved missing data filling algorithm based on KNN.

Background technique

[0002] In practical applications, due to differences in data acquisition methods or data modeling, some of the obtained data is marked as "unknown" or left vacant because it does not fully conform to the previously defined format; such data is called incomplete data or missing values. Missing values are common in fields such as medicine, survey research, and industry. Inaccurate measurement methods, limitations of collection conditions, omissions in manual entry, and similar causes may lead to missing data, which can have very adverse effects on data mining work. For example, missing values may directly affect the accuracy of newly discovered patterns, leading to wrong mining models. In association rules, the unknown missing values interfere with the normal data distribution and affect t...


Application Information

IPC(8): G06F17/30, G06N3/08
CPC: G06F16/2358, G06F16/215, G06F2216/03, G06N3/084
Inventors: 谢强, 王振
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS