Distance-preserving and dimension-reducing method for data containing missing data

Applied in the field of data processing, this technology addresses three shortcomings of traditional dimensionality reduction: it cannot handle data with missing entries, it does not preserve distances, and it cannot effectively learn the nonlinear structure of high-dimensional data. The result is easier subsequent data processing and savings in processing time and space.

Pending Publication Date: 2020-02-28
HUNAN UNIV
0 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0011] Traditional dimensionality reduction algorithms in the prior art cannot reduce the dimensionality of data with missing entries, cannot effectively learn the complex nonlinear structure of high-dimensional data, and do not preserve distances during dimensionality reduction. To address these problems, the present invention provides a distance-preserving dimensionality reduction method for data containing missing data. An improved autoencoder performs dimensionality reduction in the presence of missing data and captures the nonlinear relationships in the high-dimensional data. The method then intervenes in the weight matrix of the improved autoencoder so that updates to the encoder weight matrix conform to the random projection property. The resulting trained autoencoder reduces the dimensionality of high-dimensional data with missing entries while preserving distances, which saves time and space in subsequent data processing.
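
To make the random projection property mentioned in [0011] concrete, here is a minimal sketch of a classical Gaussian random projection, assuming matrix entries drawn independently from N(0, 1/k). It only illustrates why such a matrix roughly preserves Euclidean distances. It is not the invention's training procedure or its weight constraint, which this excerpt does not spell out. The dimensions `d = 784` and `k = 256` and all function names are illustrative assumptions.

```python
import math
import random


def random_projection_matrix(k, d, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed form)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(W, x):
    """Map a d-dimensional vector x to k dimensions via W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


rng = random.Random(0)
d, k = 784, 256  # illustrative sizes, not taken from the patent text
W = random_projection_matrix(k, d, rng)

# Two synthetic high-dimensional points with entries in [0, 1).
x = [rng.random() for _ in range(d)]
y = [rng.random() for _ in range(d)]

# Ratio of the projected distance to the original distance.
ratio = euclidean(project(W, x), project(W, y)) / euclidean(x, y)
print(f"low-dim / high-dim distance ratio: {ratio:.3f}")
```

With this scaling the squared projected distance equals the squared original distance in expectation, so the ratio concentrates around 1 as `k` grows.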

Examples

Embodiment Construction

[0051] The technical solutions in the present invention are clearly and completely described below in combination with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0052] The MNIST data set is used as an example to illustrate the distance-preserving dimensionality reduction method for data with missing data proposed by the present invention. MNIST is a large handwritten digit database collected and organized by the National Institute of Standards and Technology. Its 70,000 pictures are divided into a training set and a test set. The test set contains 10,000 pictures, and the training set contains 60,00...
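
As a rough illustration of what data with missing entries could look like for MNIST-sized inputs, the sketch below builds a 0/1 missing-data indicator matrix over synthetic 784-dimensional samples (28 x 28 pixels per image). The random pixel values stand in for real MNIST images, and the missing rate of 0.3, the zero fill value, and the function names are hypothetical choices. The excerpt does not say how missing entries are produced in the embodiment.

```python
import random


def make_missing_mask(n_samples, n_features, missing_rate, rng):
    """Return a 0/1 matrix in which 0 marks an entry treated as missing."""
    return [
        [0 if rng.random() < missing_rate else 1 for _ in range(n_features)]
        for _ in range(n_samples)
    ]


def apply_mask(data, mask, fill=0.0):
    """Replace masked-out entries with a placeholder fill value."""
    return [
        [v if m else fill for v, m in zip(row, mask_row)]
        for row, mask_row in zip(data, mask)
    ]


rng = random.Random(42)
n_samples, n_features = 100, 28 * 28  # 784 pixels per MNIST-sized image

# Synthetic stand-in for image data; real MNIST is not loaded here.
data = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]

mask = make_missing_mask(n_samples, n_features, 0.3, rng)  # hypothetical rate
observed = apply_mask(data, mask)

observed_fraction = sum(map(sum, mask)) / (n_samples * n_features)
print(f"fraction of observed entries: {observed_fraction:.3f}")
```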

Abstract

The invention discloses a distance-preserving dimensionality reduction method for data containing missing data, and relates to the technical field of data processing. A missing-data matrix excludes the missing entries of the original data from the autoencoder's loss function, so the autoencoder can reduce the dimensionality of data with missing entries without being affected by them. The autoencoder's strong learning capability effectively captures the complex nonlinear relationships in the original data. In addition, the update of the encoder weight matrix is constrained in the loss function, which makes the dimensionality reduction distance-preserving: the low-dimensional data retains the distribution information of the original high-dimensional data to the greatest extent. This facilitates subsequent data processing and saves processing time and space.
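
The idea of keeping missing entries out of the loss can be sketched as a masked reconstruction error. The example below is a minimal sketch that assumes a mean squared error averaged over observed entries only. The exact loss form, the name `masked_mse`, and the toy values are assumptions, and the weight-matrix constraint described in the abstract is not modeled here.

```python
def masked_mse(x, x_hat, mask):
    """Mean squared error over observed entries only (mask value 1)."""
    total, count = 0.0, 0
    for row_x, row_hat, row_m in zip(x, x_hat, mask):
        for xv, hv, m in zip(row_x, row_hat, row_m):
            if m:  # entries with mask value 0 never contribute to the loss
                total += (xv - hv) ** 2
                count += 1
    return total / count if count else 0.0


# Toy data: missing entries are stored as 0.0 placeholders.
x = [[1.0, 0.0, 3.0], [0.0, 5.0, 6.0]]
mask = [[1, 0, 1], [0, 1, 1]]  # 1 = observed, 0 = missing

# A hypothetical reconstruction produced by some autoencoder.
x_hat = [[1.5, 9.0, 3.0], [7.0, 5.0, 5.0]]
loss = masked_mse(x, x_hat, mask)

# Changing the reconstruction only at missing positions leaves the loss unchanged.
x_hat_alt = [[1.5, -4.0, 3.0], [0.2, 5.0, 5.0]]
loss_alt = masked_mse(x, x_hat_alt, mask)
print(loss, loss_alt)
```

Because the masked positions are skipped, whatever placeholder fills a missing entry has no effect on the loss value or on any gradient derived from it.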

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to an autoencoder-based distance-preserving dimensionality reduction method for data containing missing data.

Background

[0002] With the advent of the big data era and the spread of electronic devices, large amounts of high-dimensional data are being generated. Analyzing and processing high-dimensional data directly usually incurs large time and space overhead. Dimensionality reduction algorithms, which map high-dimensional data into a low-dimensional space while retaining the information in the original data, are therefore increasingly favored. Applying dimensionality reduction algorithms such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) to high-dimensional data can greatly simplify subsequent data processing. However, most of the data gene...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/048, G06N3/045, G06F18/213
Inventor: 从银川, 谢鲲, 欧阳与点, 文吉刚
Owner: HUNAN UNIV