Method and device for determining data consanguinity based on structural data

A structured-data and data lineage technology, applied in the field of data processing, that addresses problems such as time-consuming manual searching, high labor costs, and the inability to analyze field-level dependencies.

Inactive Publication Date: 2019-02-12
RAJAX NETWORK & TECHNOLOGY (SHANGHAI) CO LTD

AI Technical Summary

Problems solved by technology

Existing data lineage schemes can analyze this dependency relationship. Parsing the select statement yields the source table names t_order and t_order_logistics_info, and parsing the insert statement yields the target table name t_order_shop_all_daily, which together give the table-to-table relationship. However, existing technical solutions cannot determine which fields of the merchant's monthly order table depend on the order table and the logistics information table.
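The table-level analysis described above can be illustrated with a minimal sketch. It is not the patent's implementation: it uses regular expressions instead of a real SQL parser, and the function name `table_level_lineage`, the column names, and the join condition are assumptions made up for the example. Only the three table names come from the text.

```python
import re

def table_level_lineage(sql: str) -> dict:
    """Return the target table and source tables of one INSERT ... SELECT statement.

    Regex-based simplification: it only recognizes physical table names
    that directly follow INSERT INTO/OVERWRITE, FROM, or JOIN.
    """
    target = re.search(
        r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", sql, re.I
    )
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.I)
    return {
        "target": target.group(1) if target else None,
        "sources": sorted(set(sources)),
    }

# Hypothetical statement; the columns are invented for illustration.
sql = """
INSERT INTO t_order_shop_all_daily
SELECT o.shop_id, COUNT(*) AS order_cnt
FROM t_order o
JOIN t_order_logistics_info l ON o.order_id = l.order_id
GROUP BY o.shop_id
"""

lineage = table_level_lineage(sql)
print(lineage)
```

The result links tables to tables only. It says nothing about which source field feeds which target field, which is the gap the text goes on to describe.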
[0005] Consider a scenario in which a table T no longer meets current requirements because of flaws in its original design, so its structure must be adjusted and field c may be deleted. The table has already been running in the production environment for some time, however, and is an upstream dependency of many tables. It is therefore necessary to determine which fields of which tables depend on field c of table T. Existing data lineage implementations can only obtain table-level dependencies and are powerless at the field level, so the only option is to filter and search manually. If the system contains thousands of tables, this consumes a great deal of labor, and errors and omissions are unavoidable.
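To make the field-deletion scenario concrete, the sketch below assumes that field-level lineage is already available as a list of edges, each edge being a pair of (table, field) tuples meaning "the second field is derived from the first". The edge list, the tables A and B, and the function name `downstream_fields` are hypothetical. The traversal is a plain breadth-first search, not a technique taken from the patent.

```python
from collections import defaultdict, deque

def downstream_fields(edges, start):
    """Return every (table, field) that depends, directly or transitively, on start."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical field-level lineage edges: (source field) -> (derived field).
edges = [
    (("T", "c"), ("A", "x")),
    (("A", "x"), ("B", "y")),
    (("T", "d"), ("A", "z")),
]

affected = downstream_fields(edges, ("T", "c"))
print(sorted(affected))
```

With edges like these, deleting T.c would affect A.x and B.y but not A.z, a distinction that table-level lineage alone cannot make.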
[0006] Consider another scenario in which analysts use SQL every day to query various indicators on a big data platform. If every query returned within seconds, the user experience would be very good, but some queries inevitably take a long time. To improve the efficiency of data output, the system's table structure needs to be optimized according to users' usage habits. This requires counting the tables and fields that appear in users' SQL to obtain table heat and field heat; the tables and fields that users query most frequently are the first to examine and optimize. The existing technology cannot meet this need.
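The heat statistics mentioned above could be computed roughly as follows. This sketch assumes each user query has already been parsed into a list of (table, field) references; that input format, the field names, and the function name `heat` are assumptions for illustration. Counting each table and field at most once per query is also a design choice of the sketch, not something the text specifies.

```python
from collections import Counter

def heat(parsed_queries):
    """Count in how many queries each table and each (table, field) appears."""
    table_heat, field_heat = Counter(), Counter()
    for refs in parsed_queries:
        # Use sets so a table or field is counted at most once per query.
        table_heat.update({table for table, _ in refs})
        field_heat.update(set(refs))
    return table_heat, field_heat

# Hypothetical parsed queries: one list of (table, field) references per query.
parsed_queries = [
    [("t_order", "shop_id"), ("t_order", "amount")],
    [("t_order", "shop_id")],
    [("t_order_logistics_info", "status")],
]

table_heat, field_heat = heat(parsed_queries)
print(table_heat.most_common(1))
print(field_heat.most_common(1))
```

Sorting by these counts gives a candidate order in which to review tables and fields for optimization.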
[0007] With the rapid development of the Internet, the data generated by network applications is growing explosively. How to manage big data production effectively and make the data interpretable has become an urgent problem. However, existing data lineage schemes for data production can only analyze at table-level granularity and therefore cannot support fine-grained data management. A method that analyzes data lineage at field-level granularity based on structured data is therefore urgently needed.

Examples

Embodiment Construction

[0078] Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.

[0079] In the present disclosure, terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.

[0080] It should also be noted that, where no conflict arises, the embodiments in the present disclosure and the features of those embodiments can be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings...

Abstract

The present invention provides a method and device for determining data lineage (consanguinity) based on structured data. The method comprises: parsing a select statement to obtain a source abstract syntax tree, and organizing the table information and field information obtained by traversing the source abstract syntax tree into a source list, layer by layer; parsing an insert statement to obtain a target abstract syntax tree, and organizing the table information and field information obtained by traversing the target abstract syntax tree into a target list, layer by layer; traversing the source list to obtain source table information and traversing the target list to obtain target table information, thereby obtaining the table-granularity data lineage; and extracting the target field information of the target table from the target list, then, starting from the first layer of the source list, searching layer by layer for the source field in a source table that has the same name as the target field, and, once the source table to which that source field belongs no longer comes from a subquery, determining that source field to be the source field that has a lineage relationship with the target field. The scheme can resolve data lineage at field-level granularity based on structured data.
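The layer-by-layer field lookup in the abstract can be sketched as below. This is a simplified reading under stated assumptions, not the patented implementation: the source list is modeled as a list of layers, each layer a list of `SourceTable` entries, and a field is matched by name only (aliases, renames, and expressions are ignored). The class `SourceTable`, the function `resolve_field`, and all field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SourceTable:
    name: str            # physical table name, or the alias of a subquery
    fields: list         # field names this table exposes at this layer
    from_subquery: bool  # True if this "table" is itself a subquery

def resolve_field(target_field, source_layers):
    """Walk the layered source list and return (table, field) pairs
    whose table is a physical table rather than a subquery."""
    results = []
    for layer in source_layers:
        descend = False
        for table in layer:
            if target_field in table.fields:
                if table.from_subquery:
                    descend = True  # keep looking one layer deeper
                else:
                    results.append((table.name, target_field))
        if not descend:
            break
    return results

# Hypothetical layers for:
#   SELECT a.shop_id, l.logistics_status
#   FROM (SELECT shop_id, order_id FROM t_order) a
#   JOIN t_order_logistics_info l ON a.order_id = l.order_id
source_layers = [
    [
        SourceTable("a", ["shop_id", "order_id"], True),
        SourceTable("t_order_logistics_info", ["order_id", "logistics_status"], False),
    ],
    [SourceTable("t_order", ["shop_id", "order_id"], False)],
]

print(resolve_field("shop_id", source_layers))
print(resolve_field("logistics_status", source_layers))
```

In this example, shop_id is first found in the subquery a, so the search descends one layer and stops at the physical table t_order, while logistics_status resolves in the first layer.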

Description

Technical field

[0001] The present disclosure relates to the technical field of data processing, and in particular to a data lineage determination method, device, electronic device, and computer-readable storage medium based on structured data.

Background

[0002] There is currently no unified definition of data lineage; it can be roughly understood as the chain through which data is produced. Data lineage describes which tables a table depends on and how the fields in the table are generated, and can go further to describe which fields in other tables those fields depend on. Through data lineage, the upstream and downstream dependencies of data production can be known. Data lineage is mainly used in the field of big data. As background, consider the overall production process of big data, which is generally divided into four layers: data source, production, warehouse, and data application. The data source is...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/28, G06F16/242, G06F17/27
CPC: G06F40/253
Inventor: 梁福坤, 张传凯, 刘海宇
Owner: RAJAX NETWORK & TECHNOLOGY (SHANGHAI) CO LTD