High-dimensional data processing method based on deep manifold transformation network

A high-dimensional data processing technology, applied in the field of data processing, that addresses problems such as information loss and misleading, error-prone conclusions. It resolves inconsistencies between geometric structures, is concise and easy to understand, and avoids collapsed or over-smoothed results.

Pending Publication Date: 2021-06-22

WESTLAKE UNIV


Problems solved by technology

t-SNE and UMAP have three defects: (1) they may destroy the geometric or topological structure of the original data during dimensionality reduction, resulting in loss of information; (2) they may produce a one-to-many mapping, in which the same sample point is mapped to multiple different values after dimensionality reduction; (3) they are both non-deep methods that directly optimize the embedding rather than network parameters, which makes them difficult to combine with existing deep learning techniques.

[0006] Data dimensionality reduction, clustering and visualization are three basic, closely related tasks in high-dimensional data analysis. However, these three tasks are generally completed independently. This not only degrades their performance but also easily introduces inconsistencies between the tasks, so the inherent geometric and topological information of high-dimensional data is never truly revealed, and it becomes easy to make mistakes and draw misleading conclusions during data analysis.

[0007] To sum up, no effective solution has yet been proposed for the above-mentioned problems of high-dimensional data analysis in the prior art.


Examples


Embodiment 1

[0034] Figure 1 is a model architecture diagram of a deep manifold transformation network according to an embodiment of the present application. As shown in Figure 1, the deep manifold transformation network includes an autoencoder comprising an encoder network and a decoder network, each of which includes a plurality of dense block layers.

[0035] In the encoder network, a first nonlinear transformation consisting of multiple dense block layers reduces the dimension of the input space to the latent space, and a second nonlinear transformation consisting of multiple dense block layers then reduces the dimension of the latent space to the embedding space. In the decoder network, a third nonlinear transformation consisting of multiple dense block layers restores the dimension of the latent space to the reconstruction space;

[0036] Calculate the reconstruction loss based on the input space and the reconstruction sp...
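The arrangement described in paragraphs [0034] to [0036] can be sketched as a NumPy forward pass. This is only an illustration under stated assumptions: the layer sizes, the leaky-ReLU activation, the random initialization, and the use of mean squared error as the reconstruction loss are not given in the text above, and the helper names `dense_block`, `init_layers` and `run` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, w, b):
    """One dense block: affine map followed by a leaky-ReLU (assumed activation)."""
    z = x @ w + b
    return np.where(z > 0, z, 0.01 * z)

def init_layers(sizes):
    """Random weights and zero biases for a stack of dense blocks."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def run(x, layers):
    """Apply a stack of dense blocks in order."""
    for w, b in layers:
        x = dense_block(x, w, b)
    return x

# Assumed sizes: 64-d input space, 16-d latent space, 2-d embedding space.
first = init_layers([64, 32, 16])    # first nonlinear transformation (encoder)
second = init_layers([16, 8, 2])     # second nonlinear transformation (encoder)
third = init_layers([16, 32, 64])    # third nonlinear transformation (decoder)

x = rng.normal(size=(100, 64))       # stand-in samples from the input space
latent = run(x, first)               # input space -> latent space
embedding = run(latent, second)      # latent space -> embedding space
reconstruction = run(latent, third)  # latent space -> reconstruction space

# Reconstruction loss between input and reconstruction (MSE is an assumption).
recon_loss = float(np.mean((x - reconstruction) ** 2))
```

Note that the decoder in this sketch starts from the latent space rather than the embedding space, following the wording of paragraph [0035].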

Embodiment 2

[0051] Based on the same idea, this application also proposes a high-dimensional data processing method based on a deep manifold transformation network. Referring to Figure 2, the method includes:

[0052] S201. Obtain an input space;

[0053] S202. Reduce the dimension of the input space to the latent space through the first nonlinear transformation, and reduce the dimension of the latent space to the embedding space through the second nonlinear transformation;

[0054] S203. Apply a bidirectional divergence loss between the input space and the latent space, and/or between the latent space and the embedding space, and/or between the input space and the embedding space, so that the first nonlinear transformation and the second nonlinear transformation preserve the structure of the data;

[0055] S204. Cluster the input-space data in the latent space, and visualize the dimensionally reduced data in...
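The loss in step S203 compares the structure of two spaces. The text above does not give its formula, so the following is a minimal sketch under an assumption: each space is summarized by row-normalized Gaussian pairwise similarities, and the "bidirectional" loss is taken to be the sum of the two Kullback-Leibler divergences between those similarity distributions. The function names and the `scale` parameter are hypothetical.

```python
import numpy as np

def pairwise_similarity(points, scale=1.0):
    """Row-normalized Gaussian similarities between all pairs of points."""
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    sim = np.exp(-d2 / (2.0 * scale ** 2))
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    return sim / sim.sum(axis=1, keepdims=True)

def bidirectional_divergence(p, q, eps=1e-12):
    """KL(p || q) + KL(q || p), averaged over rows (assumed form of the loss)."""
    p = p + eps
    q = q + eps
    forward = np.sum(p * np.log(p / q), axis=1)
    backward = np.sum(q * np.log(q / p), axis=1)
    return float(np.mean(forward + backward))

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 10))                  # stand-in input space
latent = x[:, :4]                              # stand-in for the first transformation

p_input = pairwise_similarity(x, scale=3.0)
p_latent = pairwise_similarity(latent, scale=3.0)
loss_input_latent = bidirectional_divergence(p_input, p_latent)
```

Because both directions are penalized, the loss is zero only when the two similarity structures agree, which is the sense in which a transformation trained against it would keep the structure unchanged. The same function could be applied to any of the three pairs of spaces named in S203.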

Embodiment 3

[0060] The new deep manifold transformation framework proposed in this application can also be used for downstream tasks such as classification and regression. Specifically, the data in the input space is classified by optimizing a cross-entropy loss function in the latent space, and/or regressed by optimizing a mean squared error loss function in the latent space and/or in the embedding space. The data points obtained by dimensionality reduction are then plotted in a coordinate system to realize data visualization.
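The two downstream losses named in paragraph [0060] can be illustrated with a short NumPy sketch. The linear heads `w_cls` and `w_reg`, the number of classes, and the random stand-in latent codes are assumptions made for the example; the text only states that cross entropy and mean squared error are evaluated in the latent (or embedding) space.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; labels are integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))

def mean_squared_error(pred, target):
    """Mean of squared differences over all elements."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16))        # stand-in latent codes for 8 samples
w_cls = rng.normal(0, 0.1, (16, 3))      # hypothetical classification head, 3 classes
w_reg = rng.normal(0, 0.1, (16, 1))      # hypothetical regression head
labels = rng.integers(0, 3, size=8)      # stand-in class labels
targets = rng.normal(size=(8, 1))        # stand-in regression targets

cls_loss = cross_entropy(latent @ w_cls, labels)
reg_loss = mean_squared_error(latent @ w_reg, targets)
```

In a full training setup either loss would presumably be added to the structure-preserving loss and minimized over the network parameters; how the terms are weighted is not stated here.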

[0061] It is worth noting that the deep manifold transformation network proposed in this application is a flexible and efficient framework. It solves the problem in existing algorithms of information loss caused by destruction of the geometric/topological structure of the original data during dimensionality reduction, clustering, and visualization. This framework can be combined with various existing classification, regression, and clusteri...


Abstract

The invention provides a high-dimensional data processing method based on a deep manifold transformation network. The method comprises the following steps: acquiring an input space; reducing the dimension of the input space to a latent space through a first nonlinear transformation, and reducing the dimension of the latent space to an embedding space through a second nonlinear transformation; applying a bidirectional divergence loss between the input space and the latent space, and/or between the latent space and the embedding space, and/or between the input space and the embedding space, so that the first nonlinear transformation and the second nonlinear transformation preserve the structure of the data; and clustering the input-space data in the latent space, and visualizing the dimension-reduced data in the embedding space. Applying the bidirectional divergence loss between any two layers ensures local smoothness of the network and prevents the final clustering effect from degrading because of information loss caused by damage to the geometric or topological structure of the original data during dimension reduction.
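The final step, clustering in the latent space, can be sketched as follows. The abstract does not say which clustering algorithm is applied, so plain k-means is used here purely as an assumption, and the latent codes are synthetic stand-ins rather than encoder outputs. The function names `assign` and `kmeans` are hypothetical.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point (squared Euclidean distance)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies the chosen rows, so `points` is never modified.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):                  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers

# Synthetic stand-in latent codes: two well-separated groups in a 4-d latent space.
rng = np.random.default_rng(1)
latent = np.concatenate([rng.normal(0.0, 0.1, (20, 4)),
                         rng.normal(5.0, 0.1, (20, 4))])
labels, centers = kmeans(latent, k=2)
```

The 2-d embedding produced by the second transformation could then be drawn as a scatter plot colored by `labels`, which corresponds to the visualization step.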

Description

Technical field

[0001] The present application relates to the technical field of data processing, in particular to a high-dimensional data processing method based on a deep manifold transformation network.

Background art

[0002] High-dimensional data analysis consists of three basic tasks: data dimensionality reduction, clustering, and visualization.

[0003] Classical clustering algorithms, such as K-means clustering, test the clustering effect for all candidate values of K, analyze the intra-class and inter-class similarity of the clustering results, and select the value of K that gives the best clustering effect among the many results. Similarly, clustering based on Gaussian mixture models and spectral clustering group data according to some distance or similarity measure defined in the high-dimensional input space. However, due to the inherent non-Euclidean characteristics of high-dimensional data, that is, the arrangement of non-Euclidean data is ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06N3/08

CPC: G06N3/088, G06F18/2321, G06F18/213

Inventor: 李子青, 吴立荣, 臧泽林

Owner: WESTLAKE UNIV
