Data dimension reduction method and device

A data dimension reduction technology, applied in the field of data processing, that addresses problems of existing methods such as results that are less complete than the original samples, inability to scale to dimension reduction of large data sets, and information loss.

Status: Inactive
Publication Date: 2019-04-02
PETROCHINA CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing data dimensionality reduction methods fall into two categories: linear and nonlinear. The classic linear method is principal component analysis (PCA), which uses the K-L transform, the linear transform with the smallest distortion under the mean square error criterion, to map the original data set into the eigenvector space. However, the dimensionality reduction results are often somewhat ambiguous and less complete than the original samples, and principal components that have a small contribution rate but carry important information about differences between samples may be discarded outright, resulting in information loss. Representative nonlinear methods include kernel PCA, locally linear embedding (LLE), and isometric mapping (ISOMap). The dimensionality reduction effect of kernel PCA depends on the choice of kernel function, and both LLE and ISOMap assume that the data set has a manifold structure, so they cannot accommodate all types of data sets.
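The information-loss problem described for PCA can be illustrated with a small sketch. The toy data below is hypothetical (it is not from the patent): two classes differ only by a small offset along y while both spread widely along x, so the component with the small contribution rate is the one that separates them.

```python
import math

# Hypothetical toy data: two classes that differ only by a small offset in y,
# while both spread widely along x.
class_a = [(float(x), 0.1) for x in range(-5, 6)]
class_b = [(float(x), -0.1) for x in range(-5, 6)]
points = class_a + class_b

def mean(values):
    return sum(values) / len(values)

mx = mean([p[0] for p in points])
my = mean([p[1] for p in points])

# Entries of the 2x2 covariance matrix.
cxx = mean([(p[0] - mx) ** 2 for p in points])
cyy = mean([(p[1] - my) ** 2 for p in points])
cxy = mean([(p[0] - mx) * (p[1] - my) for p in points])

# Closed-form principal axis of a symmetric 2x2 matrix.
theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
pc1 = (math.cos(theta), math.sin(theta))    # largest-variance direction (kept)
pc2 = (-math.sin(theta), math.cos(theta))   # dropped when reducing to 1-D

def project(p, axis):
    """Coordinate of a centered point along a unit axis."""
    return (p[0] - mx) * axis[0] + (p[1] - my) * axis[1]

def variance_along(axis):
    return mean([project(p, axis) ** 2 for p in points])

def class_gap(axis):
    """Distance between the two class means after projection onto axis."""
    mean_a = mean([project(p, axis) for p in class_a])
    mean_b = mean([project(p, axis) for p in class_b])
    return abs(mean_a - mean_b)

contribution_pc1 = variance_along(pc1) / (cxx + cyy)
gap_kept = class_gap(pc1)
gap_discarded = class_gap(pc2)

print(f"contribution rate of PC1: {contribution_pc1:.3f}")
print(f"class gap along kept PC1: {gap_kept:.3f}")
print(f"class gap along discarded PC2: {gap_discarded:.3f}")
```

In this constructed case the first component carries about 99.9% of the variance, yet the class gap along it is essentially zero, while the discarded component holds the full gap of 0.2.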
[0004] The above methods share the following deficiencies: ① they all involve matrix operations and therefore cannot adapt to dimensionality reduction of large data sets; ② they cannot memorize the characteristics of the data set, so once new samples are added to the data set the result must be recalculated; ③ the dimensionality reduction results of some methods do not preserve well the distance relationships between global sample points in the high-dimensional data set.
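A rough back-of-the-envelope calculation shows why matrix-based methods struggle at scale. The sketch assumes a naive implementation that stores a dense N×N matrix of 64-bit floats (as a kernel matrix in kernel PCA or a pairwise distance matrix in ISOMap would be), and assumes 10 million samples of 100 dimensions each, figures in the range the background section mentions. The helper name is hypothetical.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Memory needed for a dense rows x cols matrix (8 bytes = one float64)."""
    return rows * cols * bytes_per_value

n_samples = 10_000_000   # assumed data set size
n_dims = 100             # assumed dimensionality of each sample point

data_bytes = dense_matrix_bytes(n_samples, n_dims)         # the data itself
pairwise_bytes = dense_matrix_bytes(n_samples, n_samples)  # an N x N matrix

print(f"data set:        {data_bytes / 1e9:.0f} GB")
print(f"pairwise matrix: {pairwise_bytes / 1e12:.0f} TB")
```

Under these assumptions the data set itself occupies 8 GB, while a single dense N×N matrix would need 800 TB, which is why approaches that avoid forming such matrices are attractive for large data sets.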


Examples

Detailed Description of Embodiments

[0039] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions are intended to explain the present invention, not to limit it.

[0040] Figure 1 is a schematic flowchart of a data dimensionality reduction method according to an embodiment of the present invention. As shown in Figure 1, the data dimensionality reduction method of some embodiments may include:

[0041] Step S110: Construct an initial neural network for data dimensionality reduction, and use a high-dimensional data set and a low-dimensional data set as the input and output of the initial neural network, respectively, where the dimensionality of the sample points in the high-dimensional data set is greater than the dimensiona...

Abstract

The invention provides a data dimension reduction method and device. The method comprises: constructing an initial neural network for data dimension reduction, with a high-dimensional data set and a low-dimensional data set as the input and output of the initial neural network, respectively, where the dimension of the sample points in the high-dimensional data set is larger than that of the sample points in the low-dimensional data set; constructing a neural network objective function based on the distance relationships between sample points of the high-dimensional data set and the distance relationships between sample points of the low-dimensional data set; optimizing and adjusting the parameters of the initial neural network according to the objective function; and performing dimension reduction on the data to be processed using the neural network after its parameters have been optimized and adjusted. The low-dimensional data set obtained through this scheme can preserve the global characteristics of the high-dimensional data set.
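The four steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the "network" is a single linear layer, the objective is the mean squared mismatch between high- and low-dimensional pairwise distances, and the optimizer is plain gradient descent with step halving. The source excerpt does not specify the architecture, objective form, or optimizer, so all function names and hyperparameters here are hypothetical.

```python
import math
import random

def project(W, x):
    """Forward pass of a one-layer linear network: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def stress(W, X):
    """Assumed objective: mean squared difference between pairwise
    distances in the input space and in the reduced space."""
    Y = [project(W, x) for x in X]
    total, pairs = 0.0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            total += (dist(Y[i], Y[j]) - dist(X[i], X[j])) ** 2
            pairs += 1
    return total / pairs

def stress_gradient(W, X, eps=1e-12):
    """Analytic gradient of stress with respect to W."""
    n, d_low, d_high = len(X), len(W), len(W[0])
    pairs = n * (n - 1) // 2
    G = [[0.0] * d_high for _ in range(d_low)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [a - b for a, b in zip(X[i], X[j])]
            y = project(W, diff)                  # W x_i - W x_j
            d_lo = math.sqrt(sum(v * v for v in y))
            d_hi = math.sqrt(sum(v * v for v in diff))
            scale = 2.0 * (d_lo - d_hi) / (max(d_lo, eps) * pairs)
            for r in range(d_low):
                for c in range(d_high):
                    G[r][c] += scale * y[r] * diff[c]
    return G

def train(X, d_low, steps=200, lr=0.05, seed=0):
    """Adjust W by gradient descent; halve the step if the loss would rise."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in X[0]] for _ in range(d_low)]
    loss = stress(W, X)
    history = [loss]
    for _ in range(steps):
        G = stress_gradient(W, X)
        cand = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]
        cand_loss = stress(cand, X)
        if cand_loss < loss:
            W, loss = cand, cand_loss
        else:
            lr *= 0.5
        history.append(loss)
    return W, history

# Hypothetical 5-dimensional data reduced to 2 dimensions.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
W, history = train(X, d_low=2)

# The trained parameters are reused on unseen data without retraining.
new_point = [0.1, -0.2, 0.3, 0.0, 0.5]
low = project(W, new_point)
print(f"stress before: {history[0]:.4f}, after: {history[-1]:.4f}")
print("reduced new point:", low)
```

Because the mapping lives in the trained parameters `W`, a new sample is reduced with a single forward pass rather than a recomputation over the whole data set. A practical network would likely add hidden layers and nonlinear activations and train on mini-batches of pairs, which this sketch omits for brevity.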

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a data dimensionality reduction method and device.

Background

[0002] Data dimensionality reduction refers to reducing a high-dimensional data set to a low-dimensional data set while ensuring that the generated low-dimensional data set retains the main information contained in the original high-dimensional data set. In practice, many data sets to be processed and analyzed are large in volume and high in dimensionality. For example, in seismic exploration, where reflected seismic waveforms are used to identify oil and gas development locations, the data sets to be processed and analyzed may contain up to tens of millions of sample points, and each sample point may have up to 100 dimensions. Data dimensionality reduction can reduce the time or space complexity of processing high-dimensional data sets, save the computat...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04
CPC: G06N3/047, G06N3/048, G06N3/045, G06F18/214
Inventor: 杨昊, 郑晓东, 李劲松, 魏超
Owner: PETROCHINA CO LTD