Multi-level dimension reduction method for high-dimensional database

A multi-level dimension reduction technique for databases, applied in the field of data processing, addresses the complexity and high operating cost of machine learning algorithms. It achieves dynamic dimensionality reduction, overcomes dependencies, and runs with high operating efficiency.

Pending Publication Date: 2022-07-01

南京开特信息科技有限公司

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is to provide a multi-level dimensionality reduction method for high-dimensional databases that overcomes the complexity and high operating cost of machine learning algorithms while retaining the original index attributes as far as possible, with high operating efficiency and strong operability.

Examples

Embodiment 1

[0046] As shown in Figure 1, a multi-level dimensionality reduction method for a high-dimensional database includes the following steps:

[0047] Step 1: Obtain a sample data set containing multi-dimensional indicators, and preprocess the sample data set;

[0048] Step 2: Standardize the preprocessed sample data set to make the data of each indicator dimensionless;

[0049] Step 3: Perform the first dimension reduction on the sample data set, deleting data that contain insufficient information, to reduce the calculation amount of subsequent operations and improve calculation speed;

[0050] Step 4: Use the key influence index sorting method to perform a second dimension reduction on the sample data set obtained from the first dimension reduction, to reduce collinearity problems in the subsequent steps;

[0051] Step 5: Perform a third dimension reduction on the sample data set after the second dimension reduction based on the improved p...
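
The first two reduction levels in the steps above might be sketched as follows. The source does not state what counts as "insufficient information" or how the key influence index sorting method ranks indicators, so both criteria here are assumptions: level one drops indicators dominated by a single value, and level two ranks indicators by absolute Pearson correlation with a hypothetical `target` series and then skips any indicator nearly collinear with one already kept. The names `first_reduction`, `second_reduction`, `max_share`, and `max_corr` are illustrative only.

```python
from collections import Counter
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def first_reduction(columns, max_share=0.9):
    """Level 1 (assumed criterion): drop indicators dominated by one repeated value."""
    kept = {}
    for name, values in columns.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) <= max_share:
            kept[name] = values
    return kept


def second_reduction(columns, target, max_corr=0.95):
    """Level 2 (assumed criterion): rank by |correlation with target|, then
    keep an indicator only if it is not nearly collinear with one already kept."""
    ranked = sorted(columns, key=lambda n: abs(pearson(columns[n], target)), reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return {name: columns[name] for name in kept}


# Made-up indicator columns for illustration.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_copy": [2.0, 4.0, 6.0, 8.0, 10.1],  # nearly collinear with "a"
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "flat": [7.0, 7.0, 7.0, 7.0, 7.0],     # a single repeated value
}
target = [3.0, 5.0, 7.0, 9.0, 11.0]        # hypothetical outcome series

level1 = first_reduction(data)
level2 = second_reduction(level1, target)
print(list(level1), list(level2))
```

In this toy data the constant indicator is removed at level one and the near-duplicate at level two. The thresholds would need tuning for a real database.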

Embodiment 2

[0070] Embodiment 2 differs from Embodiment 1 in that a dynamic time dimension is added to the indicator data set.

[0071] Specifically, as shown in Figure 2, the core steps of the invention are as follows:

[0072] 1. Import the database and preprocess it, checking whether any data indicators are abnormal (for example, garbled characters). If abnormalities are found, the abnormal data or abnormal indicators must be eliminated.

[0073] 2. After confirming that there is no abnormality in the data, standardize all index data to make the data dimensionless: standardize the index data by the traditional method of subtracting the mean and dividing by the standard deviation.
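
The standardization in paragraph [0073] is the usual z-score. A minimal sketch using only the Python standard library follows. The source does not say whether the population or the sample standard deviation is used, so the population form (`pstdev`) is an assumption, and the function name `standardize` is illustrative.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each indicator (column): subtract its mean, divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population form: an assumption
    if any(s == 0 for s in stds):
        raise ValueError("a constant indicator cannot be standardized")
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Two indicators on very different scales (made-up values).
samples = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
z = standardize(samples)
for row in z:
    print(row)
```

After this step both columns have mean 0 and standard deviation 1, so indicators measured in different units become directly comparable.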

[0074] 3. Sample weight update. A time penalty factor is introduced to update the sample weights at different time points.

[0075] Take an indicator dataset as an example. The indicator dataset contains m indicators. In each time unit, n new samples are generated, and at ...
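
One possible reading of the time penalty factor in paragraph [0074] is an exponential decay on sample age. The source text is truncated before it gives the actual update formula, so the form `penalty ** age`, the normalization to a sum of 1, and the names `time_decay_weights` and `penalty` are all assumptions made for illustration.

```python
def time_decay_weights(ages, penalty=0.9):
    """Weight each sample by penalty ** age, then normalize so the weights sum to 1.

    ages[i] is how many time units ago sample i was generated (0 = newest).
    The exponential form and the default 0.9 are assumptions, not from the source.
    """
    if not 0.0 < penalty <= 1.0:
        raise ValueError("penalty must be in (0, 1]")
    raw = [penalty ** age for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Mean of values under normalized weights."""
    return sum(v * w for v, w in zip(values, weights))


# Three samples of one indicator, oldest first (made-up values).
ages = [2, 1, 0]
values = [10.0, 20.0, 30.0]
weights = time_decay_weights(ages, penalty=0.5)
print(weights, weighted_mean(values, weights))
```

With a penalty below 1, newer samples dominate weighted statistics, which is one way the indicator set could follow the data as it changes over time.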

Abstract

The invention discloses a multi-level dimension reduction method for a high-dimensional database, comprising the following steps: (1) acquiring a sample data set containing multi-dimensional indicators and preprocessing it; (2) standardizing the data so that each indicator is dimensionless; (3) performing a first dimension reduction that deletes data containing insufficient information, reducing the computation required by subsequent operations and improving calculation speed; (4) performing a second dimension reduction on the sample data set using a key influence index sorting method, reducing collinearity problems in subsequent steps; and (5) performing a third dimension reduction on the sample data set, based on an improved principal component analysis method and without changing the index attributes, to obtain the final sample data set. The method overcomes the relative complexity and high operating cost of machine learning algorithms, retains the original index attributes to the greatest extent, and offers high operating efficiency and strong operability.
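
The abstract does not explain how its improved principal component analysis keeps the index attributes unchanged. One common way to use PCA for selection rather than transformation is to rank the original indicators by the size of their loadings on the first principal component and keep the top few. The sketch below illustrates that general idea only. It is not the patented method, and the names `correlation_matrix`, `first_component`, and `select_by_loading` are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Correlation matrix of equal-length, non-constant columns."""
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration.

    Starts from the all-ones vector, so it assumes that vector is not
    orthogonal to the leading eigenvector.
    """
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


def select_by_loading(named_columns, keep):
    """Return the `keep` original indicator names with the largest |loading|."""
    names = list(named_columns)
    loadings = first_component(correlation_matrix([named_columns[n] for n in names]))
    by_name = dict(zip(names, loadings))
    return sorted(names, key=lambda n: abs(by_name[n]), reverse=True)[:keep]


# Made-up indicators: x and y move together, z is only weakly related.
indicators = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1],
    "z": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
selected = select_by_loading(indicators, keep=2)
print(selected)
```

Because the output is a subset of the original columns, each retained indicator keeps its original meaning, unlike the transformed components of ordinary PCA.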

Description

Technical field

[0001] The invention belongs to the field of data processing, and in particular relates to a multi-level dimension reduction method for a high-dimensional database.

Background

[0002] In the era of big data, the number of data indicators has increased dramatically. These indicators usually contain a large amount of irrelevant and redundant information, which greatly increases the storage cost and query cost of the database.

[0003] For dimensionality reduction of high-dimensional data, the existing technical methods fall into two categories:

[0004] The first category is dimensionality reduction based on transformation methods. Its advantage is that high-dimensional data can be reduced directly to a few dimensions, or even to one dimension, through mathematical transformation, and the reduction is fast. The disadvantage of this method is that the data The i...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/215, G06F16/2453

CPC: G06F16/215, G06F16/2453

Inventors: 沈克勤, 王伟

Owner: 南京开特信息科技有限公司
