Intelligent data lineage ("blood relationship") tracing method and device based on cluster analysis

A cluster analysis and data technology, applied in the field of big data, that addresses problems such as the inability to complete lineage tracking, the impact on data performance, and the inability to process data lineage, thereby improving accuracy and efficiency.

Active Publication Date: 2019-08-02
中电科嘉兴新型智慧城市科技发展有限公司 +1

AI Technical Summary

Problems solved by technology

By design, the database plug-in method can only record the lineage of data within the same database; it cannot track lineage across databases, let alone across database types (for example, data converted from MySQL to Oracle).
Moreover, the database plug-in method requires the data lineage to be recorded at the same time as the data is generated. If the record fails when the ...

Examples

Example Embodiment

[0021] According to one or more embodiments, as shown in Figure 1, a method for intelligently tracing data lineage based on cluster analysis includes the following steps:

[0022] Step 1: Read the table structures and data, and derive the data features of each field by means of data engineering. The specific method is as follows:

[0023] Step 1.1: Parse the original data into structured sample data that describes each field's characteristics, including field type, field length, field content pattern, etc.

[0024] Step 1.2: Combine the existing features in the sample data to form high-dimensional features.

[0025] Step 1.3: Analyze the high-dimensional features, derive new dimensions, and rank the new dimensions by influence.

[0026] Step 1.4: Reduce the dimensionality of the sample data according to the new dimensions, using the smallest number of dimensions for which the distortion rate of the sample data stays below the set value.

[0027] Step 1.5: Normalize the sample data in the new dimensions.
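
As a rough illustration of Steps 1.1 and 1.5 only, the following Python sketch derives a few numeric features per field and scales them to [0, 1]. The specific features (mean length, maximum length, digit ratio, pattern diversity), the min-max scaling, the helper names, and the sample fields are all assumptions made for this example; the source does not specify them. The feature combination and dimensionality reduction of Steps 1.2 to 1.4 are not shown.

```python
import re
from statistics import mean


def content_pattern(value: str) -> str:
    """Collapse a value into a coarse pattern: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def field_features(values: list[str]) -> list[float]:
    """Step 1.1 (sketch): numeric features describing one field (column)."""
    lengths = [len(v) for v in values]
    digit_ratio = mean(
        sum(c.isdigit() for c in v) / max(len(v), 1) for v in values
    )
    pattern_diversity = len({content_pattern(v) for v in values}) / len(values)
    return [mean(lengths), max(lengths), digit_ratio, pattern_diversity]


def min_max_normalize(rows: list[list[float]]) -> list[list[float]]:
    """Step 1.5 (sketch): scale every feature dimension to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical sample: field name -> sampled values of that field.
fields = {
    "orders.phone": ["13800001111", "13900002222"],
    "crm.mobile": ["13800001111", "13700003333"],
    "orders.note": ["ok", "call back later"],
}
raw = {name: field_features(vals) for name, vals in fields.items()}
normalized = dict(zip(raw, min_max_normalize(list(raw.values()))))
```

In this toy sample the two phone-number fields end up with identical normalized feature vectors, which is the kind of similarity the later clustering step would rely on.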

[0028] Step ...

Abstract

An intelligent data lineage ("blood relationship") tracing method based on cluster analysis comprises the following steps: step 1, reading the table structures and data, and forming the data features of all fields by means of data engineering; step 2, learning the data samples with a cluster analysis algorithm from machine learning, taking each field as a unit and the set of field data features as the features; step 3, repeatedly executing the cluster analysis of step 2 until the optimal number of classes and the optimal classification are found; step 4, under the optimal classification, automatically judging the data fields in the same class to be fields that may have a lineage relationship; step 5, for each lineage relationship, inferring its direction from the order of the creation times of the tables it points to, that is, inferring which field is the source and which is the target, and, if both fields of the relationship come from the same table, marking the relationship as invalid; and step 6, calculating the table lineage from the valid field lineage relationships.
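
Steps 4 to 6 can be sketched in Python as follows, assuming the cluster assignments from steps 2 and 3 are already available. The function names, the `table.field` naming convention, the ISO date strings, and the sample data are hypothetical and chosen only for illustration. Treating the earlier-created table as the source follows the abstract; how ties in creation time are resolved is not stated in the source, so the tie handling here is an arbitrary assumption.

```python
from itertools import combinations


def table_of(field: str) -> str:
    """Assumed naming convention: everything before the last dot is the table."""
    return field.rsplit(".", 1)[0]


def field_lineage(clusters: dict, table_created: dict) -> set:
    """Steps 4-5 (sketch): pair fields in the same cluster and orient each pair."""
    edges = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            table_a, table_b = table_of(a), table_of(b)
            if table_a == table_b:
                continue  # both fields in the same table: invalid relationship
            # The field in the earlier-created table is taken as the source.
            # Ties fall to `a` here; the source does not define tie handling.
            if table_created[table_a] <= table_created[table_b]:
                edges.add((a, b))
            else:
                edges.add((b, a))
    return edges


def table_lineage(field_edges: set) -> set:
    """Step 6 (sketch): lift valid field-level edges to table-level edges."""
    return {(table_of(src), table_of(dst)) for src, dst in field_edges}


# Hypothetical input: cluster id -> fields, and table -> creation date (ISO).
clusters = {
    0: ["src.phone", "dw.mobile", "dw.phone_backup"],
    1: ["src.note"],
}
table_created = {"src": "2019-01-01", "dw": "2019-03-01"}

field_edges = field_lineage(clusters, table_created)
table_edges = table_lineage(field_edges)
```

With this input, the pair `dw.mobile` / `dw.phone_backup` is discarded as a same-table relationship, and the remaining edges point from the `src` table to the `dw` table because `src` was created first.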

Description

Technical field

[0001] The invention belongs to the technical field of big data, and in particular relates to a method and device for intelligent tracing of data lineage based on cluster analysis.

Background

[0002] With the development and popularization of big data and machine learning technology, the amount of data used, managed and generated by data analysis software keeps increasing, as does its dependence on the format, content and quantity of that data. Before a data analysis system runs, it needs to perform various extraction, cleaning, conversion and desensitization operations on the data. The complexity of these operations means that data processing involves many procedures, long pipelines and complicated methods. It is therefore necessary to trace the lineage of the data in order to judge its credibility, analyze its influence, and locate and handle the sources of erroneous data. Therefore, e...

Application Information

IPC(8): G06F16/2458, G06F16/22, G06F16/28
CPC: G06F16/2462, G06F16/2282, G06F16/285
Inventor 王鹏陈昊于会游姜玉峰滕姿李栋杜浩饶定远唐丽娜靳翼闵圣捷陈丽婷童昊许亚洋
Owner 中电科嘉兴新型智慧城市科技发展有限公司