Method for quickly classifying massive database tables

A rapid-classification database technique, applied in the database field, that addresses the time-consuming, labor-intensive, and costly nature of manual classification. It offers fast processing, rich field features, and a reduced manual workload.

Pending Publication Date: 2020-05-05
中国长峰机电技术研究设计院
Cites: 8, cited by: 4

AI Technical Summary

Problems solved by technology

[0003] Existing data classification methods rely on personnel working from database design documents, database table structure notes, and similar material. They depend heavily on human experience, and each piece of metadata has to be confirmed one by one, which is time-consuming and laborious. When the number of data types and the data volume are massive, the labor cost is very high.

Examples

Embodiment Construction

[0029] To make the purpose, content, and advantages of the present invention clearer, specific implementations of the invention are described in further detail below with reference to the accompanying drawings and embodiments.

[0030] Figure 1 is a flow chart of the method of the present invention for fast classification of massive database tables. As shown in Figure 1, the invention first obtains the key attributes of each table by calculating the mutual information entropy. It then constructs a feature vector for each selected attribute from metadata such as the attribute field type (mainly character and numeric types) and a summary of the data content. A machine-learning clustering algorithm clusters the key attributes, the cluster centers are labeled, and the result forms a training set used to train a classification algorithm. The trained classification algorithm is then applied to the classification of other attribute...
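The first step above, selecting key attributes by mutual information, can be sketched as follows. The excerpt does not say how the mutual information scores are turned into a choice of key attributes, so this minimal sketch assumes a column is scored by the sum of its mutual information with every other column of the same table. The function names and the toy `orders` table are hypothetical and are not taken from the patent.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equally long columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_key_attributes(table):
    """Rank columns by summed mutual information with the other columns.

    `table` maps column name -> list of cell values (all the same length).
    The scoring rule is an assumption made for this sketch.
    """
    scores = {}
    for name, col in table.items():
        scores[name] = sum(
            mutual_information(col, other)
            for other_name, other in table.items()
            if other_name != name
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy table, used only to exercise the functions.
orders = {
    "customer_id": ["c1", "c2", "c1", "c3", "c2", "c3"],
    "customer_name": ["Ann", "Bob", "Ann", "Cy", "Bob", "Cy"],
    "status": ["open", "open", "open", "open", "open", "open"],
}
ranking = rank_key_attributes(orders)
```

In this toy table the constant `status` column shares no information with the other columns and ranks last, while `customer_id` and `customer_name` determine each other and score highest. On real tables, continuous numeric columns would probably need to be binned before counting values.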

Abstract

The invention relates to a method for quickly classifying massive database tables. The method comprises the steps of: calculating the mutual information entropy to obtain the key attributes of each table; constructing a feature vector for each selected attribute from the metadata of the attribute field type and a summary of the data content; clustering the key attributes with a machine-learning clustering algorithm; labeling the cluster centers to form a training set; training a classification algorithm; applying the trained classification algorithm to classify the remaining attributes; sampling and checking the classification results and using them to optimize the classification algorithm in turn; and outputting the categories of all database table attribute fields. According to the method, field feature vectors are constructed by combining database field metadata with field content, the key fields of the database under analysis are clustered, data domain labels (tags) are assigned, a training set is built, and a classification algorithm reflecting industry characteristics is trained, which reduces the manual processing workload.
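The feature-vector, clustering, labeling, and classification steps in the abstract can be illustrated with a small standard-library sketch. The patent excerpt names neither the features nor the algorithms, so everything here is an assumption: four hand-picked features per column, k-means with deterministic farthest-point seeding as the clustering algorithm, and a nearest-centroid rule as the classification algorithm. All field names, data, and tags are hypothetical.

```python
def field_features(values):
    """Map one column to a numeric vector (illustrative feature set)."""
    texts = [str(v) for v in values]
    n = len(texts)
    chars = "".join(texts)
    return [
        sum(isinstance(v, (int, float)) for v in values) / n,    # numeric cells
        (sum(len(t) for t in texts) / n) / 32.0,                 # scaled mean length
        len(set(texts)) / n,                                     # distinct values
        sum(ch.isdigit() for ch in chars) / max(len(chars), 1),  # digit characters
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


# Hypothetical key fields (already selected in the earlier step).
key_fields = {
    "order_amount": [19.9, 5.0, 120.5, 33.0],
    "quantity": [2, 1, 2, 3],
    "customer_name": ["Ann Lee", "Bob Ray", "Cy Moss", "Dee Fox"],
    "product_name": ["desk lamp", "usb hub", "monitor", "keyboard"],
}
centers = kmeans([field_features(col) for col in key_fields.values()], k=2)

# Stand-in for the manual step: a person inspects one representative
# field per cluster and tags that cluster. The tags are made up.
tags = {
    nearest(field_features(key_fields["order_amount"]), centers): "amount",
    nearest(field_features(key_fields["customer_name"]), centers): "name",
}


def classify(values):
    """Nearest-centroid classifier built from the tagged cluster centers."""
    return tags[nearest(field_features(values), centers)]


remaining = {
    "discount": [0.1, 0.25, 0.0],
    "supplier_name": ["Acme Ltd", "Bolt Co", "Core Inc"],
}
predicted = {name: classify(col) for name, col in remaining.items()}
```

Here `predicted` assigns `discount` to the `amount` cluster and `supplier_name` to the `name` cluster. The sampling check and the feedback into the classifier described in the abstract are not shown; in a sketch like this they would amount to reviewing a sample of `predicted`, correcting wrong tags, and refitting the centers with the corrected fields included.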

Description

Technical field

[0001] The invention relates to database technology, in particular to a method for quickly classifying massive database tables.

Background

[0002] In data warehouse construction, data cataloging and cleaning consume a great deal of manpower and material resources, and one of the most important tasks is classifying database tables. Classifying and labeling the database tables, identifying the data domain each field belongs to (for example, whether the field represents customers, products, quantities, or amounts), and establishing a data catalog helps fill in missing metadata, supports the formulation of data quality rules and the discovery of data quality problems, and allows follow-up data governance and improvement to be carried out in a targeted manner.

[0003] Existing data classification methods need to be carried out by personnel based on database design documents, database table structure notes, etc., which largely rely on...

Application Information

IPC(8): G06F16/28
CPC: G06F16/285, Y02D10/00
Inventor: 王衍祺, 王楠, 孟庆磊, 毛俐旻
Owner: 中国长峰机电技术研究设计院