Multi-level dimension reduction method for high-dimensional database

A multi-level dimension reduction technique for databases, applied in the field of data processing. It addresses problems such as the complexity and high operating cost of machine learning algorithms, and achieves dynamic dimensionality reduction, overcomes dependencies, and runs with high operating efficiency.

Pending Publication Date: 2022-07-01
南京开特信息科技有限公司

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is to provide a multi-level dimensionality reduction method for high-dimensional databases that overcomes the complexity and high operating cost of machine learning algorithms while retaining the original index attributes to the greatest extent possible, with high operating efficiency and strong operability.

Examples

Embodiment 1

[0046] As shown in Figure 1, a multi-level dimensionality reduction method for a high-dimensional database includes the following steps:

[0047] Step 1: Obtain a sample data set containing multi-dimensional indicators, and preprocess the sample data set;

[0048] Step 2: Standardize the preprocessed sample data set to make the data of each indicator dimensionless;

[0049] Step 3: Perform the first dimension reduction on the sample data set, deleting data that contains insufficient information, to reduce the computation required by subsequent operations and improve calculation speed;

[0050] Step 4: Use the key influence index sorting method to perform a second dimension reduction on the sample data set obtained from the first dimension reduction, to reduce collinearity problems in the subsequent steps;

[0051] Step 5: Perform a third dimension reduction on the sample data set after the second dimension reduction based on the improved p...
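
Steps 2 and 3 above can be illustrated with a minimal sketch. The z-score in `standardize` follows the usual subtract-the-mean, divide-by-the-standard-deviation convention. The listing does not say how "insufficient information" is measured, so the criterion in `has_enough_information` (drop an indicator that is mostly missing or constant) is purely an assumption, as are all function names, the `max_missing_ratio` threshold, and the toy data.

```python
from statistics import mean, pstdev


def standardize(values):
    """Step 2: z-score one indicator. Missing entries (None) are left as None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to standardize
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        # Constant indicator: no spread to scale by, so map every value to 0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]


def has_enough_information(values, max_missing_ratio=0.3):
    """Step 3 (ASSUMED criterion, not from the source): keep an indicator
    only if it is not mostly missing and is not constant."""
    present = [v for v in values if v is not None]
    missing_ratio = 1 - len(present) / len(values)
    return missing_ratio <= max_missing_ratio and len(set(present)) > 1


def first_level_reduction(dataset, max_missing_ratio=0.3):
    """Standardize every indicator, then drop the low-information ones."""
    standardized = {name: standardize(col) for name, col in dataset.items()}
    return {
        name: col
        for name, col in standardized.items()
        if has_enough_information(col, max_missing_ratio)
    }


# Hypothetical toy data: three indicators, four samples each.
dataset = {
    "revenue": [10.0, 12.0, 14.0, 16.0],
    "region_code": [1.0, 1.0, 1.0, 1.0],      # constant
    "legacy_field": [None, None, None, 5.0],  # mostly missing
}
reduced = first_level_reduction(dataset)
print(sorted(reduced))
```

In this toy run only `revenue` survives, because the other two indicators are constant or mostly missing. A real implementation would substitute whatever information measure the full patent text specifies.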

Embodiment 2

[0070] Embodiment 2 differs from Embodiment 1 in that a dynamic time dimension is added to the indicator data set.

[0071] Specifically, as shown in Figure 2, the core steps of the invention are as follows:

[0072] 1. Import the database and preprocess it, checking whether any data indicators are abnormal (for example, garbled characters). If abnormalities are found, remove the abnormal data or abnormal indicators.

[0073] 2. After confirming that the data contains no abnormalities, standardize all index data to make it dimensionless, using the conventional method of subtracting the mean and dividing by the standard deviation.

[0074] 3. Sample weight update. A time penalty factor is introduced to update the sample weights at different time points.

[0075] Take an indicator dataset as an example. The indicator dataset contains m indicators. In a time unit, n sample data are newly generated, and at ...
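
The sample weight update in step 3 can be sketched as follows. The listing does not give the form of the time penalty factor, so the exponential decay used here is an assumption chosen only for illustration; the names `time_decay_weights`, `weighted_mean`, and `penalty` are hypothetical.

```python
import math


def time_decay_weights(ages, penalty=0.5):
    """Weight each sample by exp(-penalty * age) and normalize to sum to 1.

    ASSUMPTION: exponential decay is one common choice of time penalty;
    the source does not specify the actual formula.
    `ages` counts time units since each sample was generated (0 = newest).
    """
    raw = [math.exp(-penalty * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Mean of one indicator under the (already normalized) sample weights."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical example: three samples of one indicator, oldest first.
ages = [2, 1, 0]
values = [10.0, 12.0, 20.0]
weights = time_decay_weights(ages)
recent_biased_mean = weighted_mean(values, weights)
print(weights, recent_biased_mean)
```

With these weights, newer samples contribute more to any weighted statistic, so the weighted mean sits closer to the most recent value than the plain average does.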

Abstract

The invention discloses a multi-level dimension reduction method for a high-dimensional database, comprising the following steps: (1) acquiring a sample data set containing multi-dimensional indexes and preprocessing it; (2) standardizing the data so that each index is dimensionless; (3) performing a first dimension reduction that deletes data containing insufficient information, reducing the computation required by subsequent operations and improving calculation speed; (4) performing a second dimension reduction on the sample data set using a key influence index sorting method, reducing collinearity problems in subsequent steps; and (5) performing a third dimension reduction using an improved principal component analysis method, without changing the index attributes, to obtain the final sample data set. The method overcomes the relative complexity and high operating cost of machine learning algorithms, retains the original index attributes to the greatest extent, and offers high operating efficiency and strong operability.

Description

Technical field

[0001] The invention belongs to the field of data processing, in particular to a multi-level dimension reduction method for a high-dimensional database.

Background technique

[0002] In the era of big data, the number of data indicators has increased dramatically. Usually, these indicators contain a large amount of irrelevant and redundant information, which will greatly increase the storage cost and query cost of the database.

[0003] In terms of dimensionality reduction of high-dimensional data, the existing technical methods can be summarized into two categories:

[0004] The first category is dimensionality reduction based on transformation methods. The advantage of dimensionality reduction based on transformation methods is that high-dimensional data can be directly reduced to several dimensions or even 1 dimension through mathematical transformation, and the dimensionality reduction speed is fast. The disadvantage of this method is that the data The i...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/2453
CPC: G06F16/215, G06F16/2453
Inventor: 沈克勤, 王伟
Owner: 南京开特信息科技有限公司