Dimension reduction and correlation analysis method suitable for large-scale data

A dimensionality reduction and association analysis technology for large-scale data, applied in the fields of computer science and image processing. It addresses the insufficient speed and memory efficiency of existing methods, improving computing speed and making more efficient use of memory.

Inactive Publication Date: 2020-12-29
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

However, although the above methods have solved the application processing problem of massive data, they are still not fully uti




Embodiment Construction

[0036] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0037] As shown in Figure 1, a dimensionality reduction and association analysis method suitable for large-scale data includes the following steps:

[0038] Step 1, data initialization: collect the data sample sets $X$ ($M_1 \times N$) and $Y$ ($M_2 \times N$) as the required data sets. Here $M_1$ and $M_2$ denote the dimensions of data sets $X$ and $Y$ respectively, that is, each row of $X$ and $Y$ is an attribute of the data; $X = [x_1\ x_2\ \ldots\ x_N]$ and, similarly, $Y = [y_1\ y_2\ \ldots\ y_N]$. $N$ is the number of data samples, that is, each column vector (i.e. $x_i$ and $y_i$, $i = 1, 2, \ldots, N$) represent al...
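The layout described in Step 1 can be illustrated with a minimal NumPy sketch. The helper name `init_datasets` and the random sample values are hypothetical and are not taken from the patent; the only point shown is that attributes run along rows and samples along columns, and that both data sets share the same N.

```python
import numpy as np

def init_datasets(samples_x, samples_y):
    """Stack per-sample feature vectors as columns: X is M1 x N, Y is M2 x N."""
    X = np.column_stack(samples_x)
    Y = np.column_stack(samples_y)
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must contain the same number of samples N")
    return X, Y

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
N, M1, M2 = 5, 4, 3
samples_x = [rng.standard_normal(M1) for _ in range(N)]
samples_y = [rng.standard_normal(M2) for _ in range(N)]

X, Y = init_datasets(samples_x, samples_y)
print(X.shape, Y.shape)
```

Column i of `X` is the sample x_i and column i of `Y` is the paired sample y_i, so any batch of training samples is simply a slice of columns.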



Abstract

The invention discloses a dimension reduction and correlation analysis method suitable for large-scale data. The method comprises the following steps: projecting the high-dimensional data to the Fourier domain, which converts the eigenvector problem of linear correlation analysis into the problem of finding meaningful Fourier-domain bases; because the Fourier-domain bases are predefined and the eigenvalue distribution of the data is ordered, accelerating training by inputting the training samples in batches until the required Fourier bases are stable and ordered; and determining the number of Fourier bases and a projection matrix, then multiplying the projection matrix by the high-dimensional data set to obtain a low-dimensional data set so as to facilitate rapid processing of the data. Built on the fast Fourier transform and correlation analysis, the method can remove noise and redundant information from a high-dimensional data set, reduce unnecessary operations in data processing, and improve running speed and memory-use efficiency in dimension reduction calculations.
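The batch-wise selection of predefined Fourier bases described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the scoring rule (accumulated spectral energy per basis), the stopping rule (the top-k index set repeating across two consecutive batches), and the names `fourier_basis` and `select_bases` are all hypothetical, and the sketch handles a single data set rather than the correlated pair X and Y.

```python
import numpy as np

def fourier_basis(m):
    """Unitary DFT matrix of size m x m; row k is the k-th predefined basis."""
    return np.fft.fft(np.eye(m), norm="ortho")

def select_bases(X, k, batch_size):
    """Accumulate per-basis energy batch by batch; stop once the top-k set repeats.

    ASSUMPTION: energy is used as the ranking score; the source does not
    state the actual criterion.
    """
    m, n = X.shape
    F = fourier_basis(m)
    energy = np.zeros(m)
    previous = None
    for start in range(0, n, batch_size):
        batch = X[:, start:start + batch_size]
        energy += np.sum(np.abs(F @ batch) ** 2, axis=1)
        top = np.sort(np.argsort(energy)[::-1][:k])
        if previous is not None and np.array_equal(top, previous):
            break  # selected bases unchanged between consecutive batches
        previous = top
    return F[top, :], top

# Synthetic data: energy concentrated at frequencies 1 and 3, plus weak noise.
rng = np.random.default_rng(0)
m, n, k = 16, 200, 4
t = np.arange(m)
signal = (np.cos(2 * np.pi * 1 * t / m)[:, None] * rng.standard_normal(n)
          + np.sin(2 * np.pi * 3 * t / m)[:, None] * rng.standard_normal(n))
X = signal + 0.01 * rng.standard_normal((m, n))

P, idx = select_bases(X, k, batch_size=20)  # P is the k x m projection matrix
Z = P @ X                                   # low-dimensional (complex) data, k x n
print(idx, Z.shape)
```

Because the bases are fixed in advance, no eigendecomposition is needed; only a running score per basis is kept in memory, which is the property the abstract relies on for speed and memory efficiency.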

Description

Technical field

[0001] The invention belongs to the field of computer science and image processing technology, in particular to a dimensionality reduction and association analysis method suitable for large-scale data.

Background technique

[0002] Traditional data processing methods have been unable to effectively analyze massive data. At the same time, with the continuous increase of data dimensions generated by big data processing and cloud computing, in many fields of research and applications, it is usually necessary to observe data containing multiple variables, collect a large amount of data, and then analyze and find patterns. Multivariate large data sets will undoubtedly provide rich information for research and application, but also increase the workload of data collection to a certain extent.

[0003] Canonical Correlation Analysis (CCA) is one of the most commonly used algorithms for mining data association relationships. It is also a dimensionality reduction tec...
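For reference, the classical CCA mentioned in the background can be written as a whitening step followed by a singular value decomposition. The sketch below is the standard textbook formulation, not the method of this invention; the function name `cca`, the small regularization term `reg`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def cca(X, Y, k, reg=1e-8):
    """Classical CCA on column-sample matrices X (M1 x N) and Y (M2 x N)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]
    Cxx = Xc @ Xc.T / (n - 1) + reg * np.eye(X.shape[0])
    Cyy = Yc @ Yc.T / (n - 1) + reg * np.eye(Y.shape[0])
    Cxy = Xc @ Yc.T / (n - 1)
    # Whiten with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T)
    Wx = np.linalg.solve(Lx.T, U[:, :k])      # projection directions for X
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])   # projection directions for Y
    return Wx, Wy, s[:k]                      # s holds canonical correlations

# Synthetic pair of data sets sharing one latent variable z.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = np.vstack([z + 0.01 * rng.standard_normal(n),
               rng.standard_normal(n),
               rng.standard_normal(n)])
Y = np.vstack([rng.standard_normal(n),
               -z + 0.01 * rng.standard_normal(n)])

Wx, Wy, s = cca(X, Y, k=2)
corr = np.corrcoef(Wx[:, 0] @ X, Wy[:, 0] @ Y)[0, 1]
print(s, corr)
```

The covariance matrices, their factorizations, and the SVD all scale with the data dimensions M1 and M2, which is the cost that grows quickly for large-scale data.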


Application Information

IPC(8): G06F17/14, G06F17/16
CPC: G06F17/14, G06F17/142, G06F17/16
Inventors: 沈项军, 徐兆瑞
Owner: JIANGSU UNIV