K-Nearest Neighbor (KNN) algorithm-based missing data filling method

A KNN-based missing data filling technique, applicable to electrical digital data processing, special data processing applications, computation, and related fields, that addresses the problem of large numbers of missing values in structured data.

Active Publication Date: 2017-09-22
MERIT DATA CO LTD

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a missing data filling method based on the K-nearest neighbor (KNN) algorithm, which involves missing data under the enterprise-le...

Examples

Embodiment Construction

[0053] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0054] Referring to Figure 1: Step 1, construct the target data (the data containing missing values) into a data format supported by the KNN model;

[0055] Step 2, data preprocessing: for each nominal (categorical) attribute column in the data set, apply label encoding; for each numerical attribute column in the data set, apply normalization or standardization;

[0056] Step 3. Calculate the Euclidean distance d between the target data...
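
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the helper names (`label_encode`, `min_max_normalize`, `euclidean_distance`) and the toy columns are hypothetical, min-max scaling is assumed as the normalization, and because the description of Step 3 is truncated in the source, the distance is simply computed between two preprocessed rows.

```python
import math


def label_encode(column):
    """Map each distinct nominal value to an integer code (first-seen order)."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def min_max_normalize(column):
    """Rescale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: one nominal column and one numerical column.
colors = ["red", "blue", "red", "green"]
heights = [150.0, 160.0, 170.0, 190.0]

color_codes, mapping = label_encode(colors)   # Step 2, nominal column
height_scaled = min_max_normalize(heights)    # Step 2, numerical column
rows = [list(pair) for pair in zip(color_codes, height_scaled)]

d = euclidean_distance(rows[0], rows[2])      # Step 3, rows 0 and 2
```

Note that integer label codes impose an arbitrary ordering on nominal values, so distances that involve encoded columns should be interpreted with care.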

Abstract

The invention discloses a KNN algorithm-based missing data filling method. The method comprises the following steps: automatically identifying the feature attributes that contain missing values in a data set; traversing those feature attributes and filling their missing values one attribute at a time using the KNN algorithm; while filling each feature attribute, iterating over the KNN parameter k to obtain a family of KNN models configured with different values of k; and selecting the optimal model according to an objective function and using that model to fill the missing data. The parameter k has a strong influence on the KNN algorithm, so applying an optimization strategy to the construction of the filling model can substantially improve model accuracy and, correspondingly, the quality of the filled data.
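
The workflow in the abstract can be sketched as below. This is an illustrative reading under stated assumptions, not the patent's actual procedure: all function names are hypothetical, every column is assumed numeric and already preprocessed, each incomplete row is assumed to lack only one value, at least two complete rows are assumed to exist, and leave-one-out mean squared error stands in for the objective function, which the source does not specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Mean target value of the k training rows nearest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error (the ASSUMED objective function)."""
    total = 0.0
    for i in range(len(xs)):
        pred = knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k)
        total += (pred - ys[i]) ** 2
    return total / len(xs)


def fill_missing(rows, k_candidates=(1, 2, 3)):
    """Fill None entries column by column, choosing k separately per column."""
    filled = [list(r) for r in rows]          # leave the input untouched
    n_cols = len(filled[0])
    chosen_k = {}
    for col in range(n_cols):
        # Detect whether this feature attribute has missing values.
        missing = [i for i, r in enumerate(filled) if r[col] is None]
        if not missing:
            continue
        features = [c for c in range(n_cols) if c != col]
        # Train on rows that are currently complete in every column.
        train = [r for r in filled if None not in r]
        xs = [[r[c] for c in features] for r in train]
        ys = [r[col] for r in train]
        # Build one model per candidate k and keep the lowest-error one.
        best_k = min(k_candidates, key=lambda k: loo_mse(xs, ys, k))
        chosen_k[col] = best_k
        for i in missing:
            query = [filled[i][c] for c in features]
            filled[i][col] = knn_predict(xs, ys, query, best_k)
    return filled, chosen_k


# Hypothetical toy data: the second column is missing in the last row.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [2.9, None]]
filled, chosen_k = fill_missing(data)
```

A nominal target column would need a majority vote among the neighbors and a classification-style objective instead of the mean and squared error used here.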

Description

Technical field

[0001] The present invention relates to the field of enterprise data governance, and more specifically to a method for filling missing data based on the K-nearest neighbor (KNN) algorithm, which addresses missing data under an enterprise-level data governance system.

Background technique

[0002] Data governance refers to the process of moving from using fragmented data to using unified master data, and from having little or no organizational and process governance to comprehensive data governance across the enterprise. Enterprise data governance aims to improve the quality of enterprise data; through the formulation of relevant processes, policies, standards, and technical means, it ensures the integrity, timeliness, accuracy, consistency, and security of enterprise data.

[0003] However, real-world data are complex and inevitably suffer from missing values. Data loss is the biggest problem in the integrity of ente...

Application Information

IPC(8): G06F17/30
CPC: G06F16/215, G06F16/2358
Inventor 程宏亮刘宏白朝旭饶思维张建
Owner MERIT DATA CO LTD