A method for completing and predicting an attribute missing data set based on generative adversarial network

A technology for completing missing data and predicting labels, applied in biological neural network models, neural learning methods, instruments, etc. It addresses the high computational complexity and difficult calculation of existing filling methods, and achieves good filling results, good prediction results, and a simple procedure.

Active Publication Date: 2019-01-08
SOUTH CHINA UNIV OF TECH
Cites 6; cited by 33

AI Technical Summary

Problems solved by technology

When enough data is available, these algorithms usually give more accurate filling results than the mean or median, but they have typical problems. Regression filling requires a significant linear relationship between attributes. Filling based on the EM algorithm has high computational complexity and easily falls into a local optimum. Filling based on k-nearest neighbors is simple to implement, but on large data sets the volume of computation and its very high complexity make the calculation difficult.



Examples


Embodiment Construction

[0026] The present invention will be further described below in conjunction with specific examples.

[0027] As shown in Figure 1, this example provides a method, based on a generative adversarial network, for completing and predicting data sets with missing attributes. The details are as follows:

[0028] 1) Data preprocessing: Different attributes have different data types, and each type needs its own processing. The data types involved fall into two main groups: continuous values and discrete values. Continuous values are normalized with min-max normalization; discrete values are first converted to one-hot encoding and then min-max normalized. All missing positions are filled with 0. In addition, the data set is divided into two parts: samples with missing attributes and samples without missing attributes.
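Step 1 can be sketched in plain Python as follows. This is an illustration under assumptions, not the patent's code: the table is a list of rows with `None` marking a missing entry, each column's kind (`"continuous"` or `"discrete"`) is assumed to be known in advance, and the function name `preprocess` is hypothetical. One-hot indicators already lie in [0, 1], so the sketch does not rescale them further.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Value = Optional[Union[float, str]]  # None marks a missing entry


def preprocess(
    rows: Sequence[Sequence[Value]], kinds: Sequence[str]
) -> Tuple[List[List[float]], List[Dict], List[int], List[int]]:
    """Hypothetical sketch of step 1: min-max for continuous, one-hot for discrete.

    Every missing entry becomes 0 in all encoded positions it occupies.
    Assumes each column has at least one observed value.
    """
    # Per-column statistics, computed from observed values only.
    stats: List[Dict] = []
    for j, kind in enumerate(kinds):
        observed = [r[j] for r in rows if r[j] is not None]
        if kind == "continuous":
            stats.append({"kind": kind, "min": min(observed), "max": max(observed)})
        else:
            stats.append({"kind": kind, "categories": sorted(set(observed))})

    encoded: List[List[float]] = []
    for r in rows:
        out: List[float] = []
        for value, s in zip(r, stats):
            if s["kind"] == "continuous":
                span = s["max"] - s["min"]
                if value is None:
                    out.append(0.0)  # missing -> 0
                else:
                    # A constant column maps to 0 to avoid division by zero.
                    out.append((value - s["min"]) / span if span else 0.0)
            else:
                # One-hot block; a missing category yields all zeros.
                out.extend(1.0 if value == c else 0.0 for c in s["categories"])
        encoded.append(out)

    # Split row indices into complete and incomplete samples.
    complete = [i for i, r in enumerate(rows) if all(v is not None for v in r)]
    incomplete = [i for i, r in enumerate(rows) if any(v is None for v in r)]
    return encoded, stats, complete, incomplete
```

For example, `preprocess([[1.0, "a"], [3.0, None], [None, "b"], [2.0, "b"]], ["continuous", "discrete"])` encodes the rows as `[0.0, 1.0, 0.0]`, `[1.0, 0.0, 0.0]`, `[0.0, 0.0, 1.0]`, `[0.5, 0.0, 1.0]` and splits them into complete rows `[0, 3]` and incomplete rows `[1, 2]`. The stored minimum and maximum are what a later step would need to undo the scaling.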

[0029] 2) Construct missing position encoding vectors: When filling data, the missing attribute positions of samples are als...
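The description of step 2 is truncated here. Because step 1 fills missing positions with 0, the encoded data alone cannot tell a missing value from an observed value that happens to be 0 (for instance, a column minimum after min-max scaling), which is why a separate missing-position vector is useful. The sketch below assumes, as in common GAN-based imputers, a binary mask with 1 for observed and 0 for missing, expanded so that every encoded column of an attribute gets the attribute's flag. The convention, the function name `missing_mask`, and the `widths` parameter are assumptions, not details from the patent.

```python
from typing import List, Optional, Sequence


def missing_mask(row: Sequence[Optional[object]], widths: Sequence[int]) -> List[int]:
    """Hypothetical 0/1 missing-position vector aligned with an encoded row.

    Assumed convention: 1 = observed, 0 = missing. widths[j] is the number of
    encoded columns attribute j occupies (1 if continuous, category count if one-hot).
    """
    if len(row) != len(widths):
        raise ValueError("row and widths must have the same length")
    mask: List[int] = []
    for value, width in zip(row, widths):
        # Every encoded column of a missing attribute gets the same flag.
        mask.extend([0 if value is None else 1] * width)
    return mask
```

For example, `missing_mask([0.0, None, "b"], [1, 2, 3])` returns `[1, 0, 0, 1, 1, 1]`: the observed `0.0` is flagged as present even though it is numerically identical to the fill value.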



Abstract

The invention discloses a method for completing and predicting data sets with missing attributes based on a generative adversarial network. The method comprises the following steps: 1) applying min-max normalization to the data and one-hot encoding to discrete attributes, with missing values set to 0; 2) using the data set to build a missing-position encoding vector for each sample; 3) constructing a generative adversarial network and an auxiliary prediction network for data filling and label prediction; 4) restoring the results to their values before min-max normalization, using each attribute's maximum and minimum values; 5) selecting suitable hyperparameters through testing. The method makes full use of the data distribution information and the label information in the data set. In addition, once training is complete, the auxiliary prediction network included in the method can take samples with missing attributes directly as input and output label predictions. The process is simple and achieves higher prediction accuracy.
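The excerpt does not specify the network architectures or loss functions for step 3. As an assumption, the sketch below borrows the per-entry formulation of GAIN (Yoon et al., 2018): a generator proposes values, observed entries are kept, and a discriminator tries to recover the missing-position vector from the filled sample. It then undoes min-max scaling as in step 4. GAIN's hint mechanism, the auxiliary prediction network, and all training code are omitted, so this is not the patent's method. The names `combine`, `discriminator_loss`, `generator_loss`, `denormalize`, and the weight `alpha` are hypothetical, and each function works on one sample given as plain lists.

```python
import math
from typing import List, Sequence


def combine(x: Sequence[float], m: Sequence[int], g: Sequence[float]) -> List[float]:
    """Filled sample: keep observed entries (m == 1), use generator output where m == 0."""
    return [mi * xi + (1 - mi) * gi for xi, mi, gi in zip(x, m, g)]


def discriminator_loss(d: Sequence[float], m: Sequence[int], eps: float = 1e-8) -> float:
    """Binary cross-entropy: d[i] is the predicted probability that entry i was observed."""
    total = 0.0
    for di, mi in zip(d, m):
        total += mi * math.log(di + eps) + (1 - mi) * math.log(1 - di + eps)
    return -total / len(d)


def generator_loss(
    d: Sequence[float], x: Sequence[float], m: Sequence[int], g: Sequence[float], alpha: float
) -> float:
    """Adversarial term on filled entries plus alpha * MSE on observed entries.

    alpha is a hypothetical weight, the kind of hyperparameter step 5 would tune.
    """
    filled = [di for di, mi in zip(d, m) if mi == 0]
    # Generator wants the discriminator to call filled entries "observed".
    adv = -sum(math.log(di + 1e-8) for di in filled) / len(filled) if filled else 0.0
    sq_err = [(xi - gi) ** 2 for xi, mi, gi in zip(x, m, g) if mi == 1]
    rec = sum(sq_err) / len(sq_err) if sq_err else 0.0
    return adv + alpha * rec


def denormalize(value: float, col_min: float, col_max: float) -> float:
    """Step 4: map a min-max scaled value back to the attribute's original range."""
    return value * (col_max - col_min) + col_min
```

In a real implementation these losses would be computed on mini-batches in a deep-learning framework and minimized alternately for the discriminator and generator; `denormalize` would then be applied to each continuous attribute of the filled data, using the minimum and maximum recorded during preprocessing.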

Description

Technical field

[0001] The present invention relates to the technical field of data preprocessing, and in particular to a method for completing and predicting data sets with missing attributes based on generative adversarial networks.

Background

[0002] Missing attributes occur widely across data sets and are usually caused by information loss during data collection or transmission. When samples in a data set lack one or more attributes, the accuracy of downstream prediction and classification models drops. How to complete these missing data, and how to use the information contained in samples with missing attributes to build a high-accuracy prediction model, is a key problem in data preprocessing.

[0003] Most statistical tools deal with missing attributes by deleting the rows and columns that correspond to the missing samples, or by using the column median or mean to fill in the missing ...


Application Information

Patent type and authority: Application (China)
IPC(8): G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/084; G06N3/045; G06F18/2148
Inventors: Zhao Yuelong (赵跃龙), Wang Yu (王禹)
Owner: SOUTH CHINA UNIV OF TECH