Data lineage full-link analysis method and system based on graph technology

By transforming data sources into graph data and utilizing graph theory algorithms, the problems of high computational load and complexity in traditional data lineage storage are solved, enabling end-to-end display and efficient data analysis, and reducing data governance costs.

CN116303548B · Active · Publication Date: 2026-06-26 · PING AN BANK CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: PING AN BANK CO LTD
Filing Date: 2023-04-07
Publication Date: 2026-06-26

Abstract

The application provides a graph-based data lineage full-link analysis method, comprising: converting data in a data source into graph data according to a preset structure, wherein the graph data comprises at least one vertex, and the vertex comprises a vertex parameter and an edge parameter; obtaining an analysis result according to a preset algorithm based on the vertex parameter and the edge parameter; and forming an analysis report according to the analysis result. The preset algorithm includes, but is not limited to, a minimum spanning tree algorithm, a DFS algorithm, a degree centrality algorithm, and a Louvain algorithm. The application further provides a system and a computer device. The technical scheme addresses the problem that existing data lineage analysis cannot be performed intelligently and efficiently.

Description

Technical Field

[0001] This invention relates to the field of financial technology, and in particular to a data lineage analysis method, system, and computer equipment based on graph technology.

Background Technology

[0002] Traditionally, data lineage relationships are stored in structured databases, with data attributes and relationships stored in separate tables. Each row in the relationship table only stores upstream and downstream relationships. However, for large-scale data warehouses, the volume of relational data is enormous. Furthermore, lineage relationships have a unique multi-source characteristic, meaning the same data can have multiple sources. This significantly increases the complexity of data relationships, especially field-level lineages, leading to an exponential increase in computational load during lineage analysis and placing immense pressure on database performance. Therefore, traditional data governance platforms only display upstream and downstream relationships initially, generating tasks for multi-level lineage query analysis only when needed. This approach makes it difficult to identify data architecture problems at a global level, potentially leading to issues such as duplicate processing and circular calls, and significantly increasing the costs of data tracing, analysis, and development.

Summary of the Invention

[0003] This invention provides a data lineage analysis method, system, and computer equipment based on graph technology, which can display data lineage relationships across the entire chain and thus perform data analysis.

[0004] In a first aspect, embodiments of the present invention provide a graph-based data lineage end-to-end analysis method, the graph-based data lineage end-to-end analysis method comprising:

[0005] The data in the data source is transformed into graph data according to a preset structure, wherein the graph data includes at least one vertex, and the vertex includes vertex parameters and edge parameters;

[0006] The vertex parameters and edge parameters are analyzed using a preset algorithm to obtain the analysis results; and

[0007] An analysis report is generated based on the analysis results.

[0008] Preferably, the edge parameters include edge weights, where the edge weights are the parameters of the connection line between two vertices. Obtaining the analysis results from the vertex parameters and the edge parameters using a preset algorithm further includes:

[0009] Select any two vertices as calculation points;

[0010] A plurality of connection lines are obtained based on the vertex parameters and / or edge parameters corresponding to the calculation points, wherein each connection line includes a plurality of the vertices;

[0011] Calculate the sum of the weights of the corresponding edge weights in each of the aforementioned connection lines;

[0012] The connection path with the smallest sum of weights is compared and selected as the shortest path; and

[0013] The analysis results are generated based on the shortest path.

[0014] Preferably, obtaining the analysis results of the vertex parameters and the edge parameters according to a preset algorithm further includes:

[0015] Select any one of the vertexes as the initial point according to the vertex parameters;

[0016] The initial point is assigned to the link and an access identifier is set for the initial point;

[0017] Mark the initial point as the return point;

[0018] The existence of a connection point is determined based on the edge parameters, wherein the connection point is a vertex that does not have the access identifier and has a connection relationship with the return point;

[0019] When the connection point exists, the connection point is included in the link and an access identifier is set for the connection point;

[0020] Replace the return point with the connection point and determine, based on the new return point, whether to acquire a connection point again;

[0021] When the connection point does not exist, the link is used as the full link of the initial point;

[0022] Sequentially obtain all the full links of the vertices; and

[0023] Form the analysis results based on the full links.

[0024] Preferably, the preset algorithm includes a modularity calculation formula, and obtaining the analysis results of the vertex parameters and the edge parameters according to the preset algorithm further includes:

[0025] Each vertex is assigned a number of community identifiers, wherein the community identifiers include the current community and neighboring communities;

[0026] Calculate the modularity increment of the vertex when it leaves the current community and moves into the neighboring community according to the modularity calculation formula;

[0027] The vertex is moved from the current community to the neighboring community with the largest modularity increment and merged to form a new current community;

[0028] The target community is obtained through iterative processing based on preset values; and

[0029] Form the analysis results based on the target community.

[0030] Preferably, the preset algorithm includes a degree centrality formula, and obtaining the analysis results of the vertex parameters and edge parameters according to the preset algorithm further includes:

[0031] Calculate the centrality of any given vertex using the vertex parameters and / or edge parameters according to the degree centrality formula; and

[0032] The analysis results are generated based on the centrality value.

[0033] Preferably, the data includes table data and field data, and converting the data in the data source into graph data according to a preset structure includes:

[0034] Transform the table data and field data into vertices in the graph data; and

[0035] The relationship between the table data and the field data is transformed into the connection relationship between vertices in the graph data; and

[0036] Save the vertices to the graph database.

[0037] Preferably, converting the relationship between the table data and the field data into the connection relationship between vertices in the graph data includes:

[0038] Obtain the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data; and

[0039] The relationships between different table data, the relationships between different field data, and the relationships between table data and field data are transformed into connection relationships between vertices.

[0040] In a second aspect, embodiments of the present invention provide a computer device, the computer device comprising:

[0041] Memory, used to store program instructions; and

[0042] A processor for executing the program instructions to implement the graph-based data lineage end-to-end analysis method as described in any of the above descriptions.

[0043] In a third aspect, embodiments of the present invention provide a graph data analysis system, wherein the graph data analysis system includes:

[0044] The conversion module is used to convert data from the data source into graph data according to a preset structure. The graph data includes at least one vertex, and the vertex includes vertex parameters and edge parameters.

[0045] The analysis module is used to obtain analysis results from the vertex parameters and edge parameters according to a preset algorithm; and

[0046] The generation module is used to generate an analysis report based on the analysis results.

[0047] Preferably, the data includes table data and field data, and the conversion module includes:

[0048] The first conversion module is used to convert the table data and the field data into vertices in the graph data;

[0049] The second conversion module is used to convert the relationship between the table data and the field data into the connection relationship between vertices in the graph data; and

[0050] The saving module is used to save the vertices to the graph database.

[0051] The aforementioned graph-based data lineage analysis method, system, and computer equipment transfer data from a data source into a graph database in a corresponding format. The transformed graph data includes at least one vertex, with vertex parameters and edge parameters. Analysis results are obtained from the vertex and edge parameters using a preset algorithm. An analysis report is then generated based on these results. Data lineage relationships include data attributes such as tables, fields, and importance levels, linked together through processing methods and function calls. Graph analysis technology has a natural advantage in handling complex relationships. Graph analysis can establish an interpretable, end-to-end lineage graph, revealing the hierarchical number of lineages and analyzing the complexity of the data architecture, serving as an important tool for data governance.

Description of the Drawings

[0052] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.

[0053] Figure 1 is a flowchart illustrating the steps of the graph-based data lineage analysis method provided in this embodiment of the invention.

[0054] Figure 2 is a flowchart of the first sub-step of the method shown in Figure 1.

[0055] Figure 3 is a flowchart of the second sub-step of the method shown in Figure 1.

[0056] Figure 4 is a flowchart of the third sub-step of the method shown in Figure 1.

[0057] Figure 5 is a flowchart of the fourth sub-step of the method shown in Figure 1.

[0058] Figure 6 is a flowchart of the fifth sub-step of the method shown in Figure 1.

[0059] Figure 7 is a schematic diagram of the internal structure of the graph data analysis system provided in an embodiment of the present invention.

[0060] Figure 8 is a schematic diagram of the internal structure of the graph data analysis system provided in Figure 7.

[0061] Figure 9 is a schematic diagram of the internal structure of the conversion module provided in Figure 8.

[0062] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0063] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without inventive effort are within the scope of protection of this invention.

[0064] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; in other words, the described embodiments can be implemented in a sequence other than that illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are non-exclusive; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0065] It should be noted that the descriptions involving "first," "second," etc., in this invention are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, features defined with "first" or "second" may explicitly or implicitly include one or more of that feature. Furthermore, the technical solutions of the various embodiments can be combined with each other, but this must be based on the ability of those skilled in the art to implement them. If the combination of technical solutions is contradictory or impossible to implement, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed by this invention.

[0066] Please refer to Figure 1, which is a flowchart of the steps of the graph-based data lineage analysis method provided in the embodiments of the present invention.

[0067] The graph-based data lineage analysis method includes the following steps.

[0068] Step S101: Convert the data in the data source into graph data according to the preset structure.

[0069] Specifically, the graph data includes at least one vertex, and each vertex includes vertex parameters and edge parameters. Current structured data entities are generally categorized into databases, tables, and fields. Tables and fields are selected as vertices in the graph database. There are three types of relationships between two vertex entities: between tables, between fields, and between tables and fields. Depending on the data source and the graph database, there are different data access schemes, which will not be elaborated upon here.

[0070] Step S102: Obtain the analysis results of the vertex parameters and the edge parameters according to a preset algorithm.

[0071] Step S103: Generate an analysis report based on the analysis results.

[0072] In summary, the data from the data source is transformed into graph data according to different data structures and then imported into the graph database. The graph data is based on graph theory. According to the preset structure, the entities in the data source are represented by tables and fields as vertices, and the relationships between tables, fields, and between tables and fields are used as the connection relationships between vertices. Then, the corresponding analysis results are obtained through preset algorithms, and the analysis results are summarized to form an analysis report.
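
The three steps S101 to S103 can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the function names `convert`, `analyze`, and `report`, the edge-list input format, and the placeholder algorithm `downstream_count` are all hypothetical and do not come from the embodiments, which leave the data access scheme and report format open.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]            # (upstream name, downstream name), assumed input format
Graph = Dict[str, List[str]]      # vertex -> list of downstream vertices


def convert(edges: List[Edge]) -> Graph:
    """S101: turn lineage records into graph data (vertices plus connections)."""
    graph: Graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)
        graph.setdefault(downstream, [])
    return graph


def analyze(graph: Graph, algorithm: Callable[[Graph], Dict[str, int]]) -> Dict[str, int]:
    """S102: apply a preset algorithm to the graph data."""
    return algorithm(graph)


def report(results: Dict[str, int]) -> str:
    """S103: summarise the analysis results as plain text."""
    return "\n".join(f"{name}: {value}" for name, value in sorted(results.items()))


def downstream_count(graph: Graph) -> Dict[str, int]:
    """Placeholder preset algorithm: number of direct downstream vertices."""
    return {vertex: len(targets) for vertex, targets in graph.items()}


records = [("ods_orders", "dw_orders"), ("dw_orders", "rpt_sales")]
text = report(analyze(convert(records), downstream_count))
print(text)
```

Passing the algorithm in as a parameter mirrors the way the method treats the preset algorithm as interchangeable.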

[0073] Please refer to Figure 2, which is a flowchart of the first sub-step of the graph-based data lineage analysis method provided in this embodiment of the invention. The edge parameters include edge weights, which are the parameters of the connection line between two vertices. Step S102 includes the following steps.

[0074] Step S202: Select any two vertices as calculation points.

[0075] Specifically, two vertices are arbitrarily selected from the graph data as calculation points.

[0076] Step S204: Obtain several connection lines based on the vertex parameters and / or edge parameters corresponding to the calculation points.

[0077] Specifically, the two calculation points may be connected through other vertices, and those vertices may in turn be connected to one another. Each connection line therefore includes several of the aforementioned vertices. For example, there are two connection lines between calculation point A and calculation point D: A to B to C to D, and A to E to D.

[0078] Step S206: Calculate the sum of the weights of the corresponding edge weights in each of the connection lines.

[0079] Specifically, for example, if there is a connection line A to B to C to D between calculation point A and calculation point D, with edge AB having edge weight a, edge BC having edge weight b, and edge CD having edge weight c, then the total weight of the connection line ABCD is a + b + c.

[0080] Step S208: Compare and obtain the connection line with the smallest sum of weights as the shortest path.

[0081] Specifically, for example, if the total weight of ABCD is 6 and the total weight of AED is 5, then the total weight of AED is smaller than that of ABCD, which means that AED is the shortest path between points A and D.

[0082] Step S210: Form the analysis results based on the shortest path.

[0083] Specifically, the shortest path can represent the fastest transformation relationship or the fastest path between two pieces of data, that is, it can reflect whether there is an optimized reference path.

[0084] In summary, this preset algorithm is the minimum spanning tree algorithm. It obtains any two points as calculation points, obtains the vertices and connecting lines between the two calculation points by calculating the vertex parameters and edge parameters of the calculation points, determines the shortest path by calculating the sum of the weights in the connecting lines, and finally generates an analysis report based on the shortest path.
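
Steps S202 to S208 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the text names the algorithm a minimum spanning tree algorithm, while the steps as described select the minimum-weight connection line between two vertices, which is what the code does. The individual edge weights are assumptions chosen so that ABCD totals 6 and AED totals 5, as in the example above; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # vertex -> {neighbor: edge weight}


def connection_lines(graph: Graph, start: str, end: str) -> List[List[str]]:
    """S204: enumerate every connection line (simple path) from start to end."""
    lines: List[List[str]] = []

    def walk(vertex: str, path: List[str]) -> None:
        if vertex == end:
            lines.append(path)
            return
        for neighbor in graph.get(vertex, {}):
            if neighbor not in path:  # a line never revisits a vertex
                walk(neighbor, path + [neighbor])

    walk(start, [start])
    return lines


def line_weight(graph: Graph, line: List[str]) -> float:
    """S206: sum the edge weights along one connection line."""
    return sum(graph[a][b] for a, b in zip(line, line[1:]))


def shortest_path(graph: Graph, start: str, end: str) -> Tuple[List[str], float]:
    """S208: pick the connection line with the smallest total weight."""
    lines = connection_lines(graph, start, end)
    if not lines:
        raise ValueError(f"no connection line from {start} to {end}")
    best = min(lines, key=lambda line: line_weight(graph, line))
    return best, line_weight(graph, best)


# Assumed weights: A-B-C-D sums to 1 + 2 + 3 = 6, A-E-D sums to 2 + 3 = 5.
graph: Graph = {
    "A": {"B": 1, "E": 2},
    "B": {"C": 2},
    "C": {"D": 3},
    "E": {"D": 3},
    "D": {},
}
print(shortest_path(graph, "A", "D"))
```

Enumerating every connection line follows the described steps literally but grows exponentially with graph size; for large lineage graphs with non-negative weights, an algorithm such as Dijkstra's would normally be preferred.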

[0085] Please refer to Figure 3, which is a flowchart of the second sub-step of the graph-based data lineage analysis method provided in this embodiment of the invention. Step S102 further includes the following steps.

[0086] Step S302: Select any one of the vertices as the initial point according to the vertex parameters.

[0087] Step S304: Assign the initial point to the link and set an access identifier for the initial point.

[0088] Step S306: Mark the initial point as the return point.

[0089] Step S308: Determine whether there is a connection point based on the edge parameters. If the connection point exists, proceed to step S310; if the connection point does not exist, proceed to step S314.

[0090] Specifically, the connection point is a vertex that has no access identifier but is connected to the return point. That is, it is a vertex that is connected to the return point and has not been visited before.

[0091] Step S310: Assign the connection point to the link and set an access identifier for the connection point.

[0092] Step S312: Replace the return point with the connection point and determine, based on the new return point, whether to acquire a connection point again.

[0093] Specifically, this involves continuously acquiring surrounding vertices that are connected but have not been visited until no such vertex exists.

[0094] Step S314: Use the link as the full link of the initial point.

[0095] Step S316: Sequentially obtain all full links of the vertices.

[0096] Step S318: Form the analysis results based on the full links.

[0097] In summary, this preset algorithm is a depth-first search algorithm. It takes any vertex as the initial point, sets an access identifier for it, and marks it as the return point. It then takes any unvisited vertex connected to the return point as the connection point, sets an access identifier for the connection point, and makes it the new return point. This process is repeated until no unvisited vertex connected to the return point remains. The depth-first search algorithm can find the source or the end of a given piece of data, and all the links obtained can be used as the basis for data tracing analysis.
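
The traversal in steps S302 to S316 can be sketched as an iterative depth-first search. This is a sketch under assumptions: the steps do not spell out what happens when a return point has no unvisited neighbor but earlier vertices still do, so the code assumes the standard depth-first behaviour of backtracking to the previous return point. The table names and the functions `full_link` and `all_full_links` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # vertex -> vertices it is connected to


def full_link(graph: Graph, initial: str) -> List[str]:
    """Collect the full link of one initial point in depth-first visit order."""
    link = [initial]        # S304: the initial point joins the link
    visited = {initial}     # access identifiers
    stack = [initial]       # S306: top of the stack is the current return point
    while stack:
        return_point = stack[-1]
        # S308: look for a connected vertex without an access identifier
        connection = next(
            (n for n in graph.get(return_point, []) if n not in visited), None
        )
        if connection is None:
            stack.pop()     # assumed backtracking to the previous return point
            continue
        link.append(connection)    # S310: add the connection point to the link
        visited.add(connection)
        stack.append(connection)   # S312: it becomes the new return point
    return link                    # S314: nothing left to visit


def all_full_links(graph: Graph) -> Dict[str, List[str]]:
    """S316: obtain the full link of every vertex in turn."""
    return {vertex: full_link(graph, vertex) for vertex in graph}


lineage: Graph = {
    "src_orders": ["stg_orders"],
    "stg_orders": ["dw_orders", "dw_customers"],
    "dw_orders": ["rpt_sales"],
    "dw_customers": ["rpt_sales"],
    "rpt_sales": [],
}
print(full_link(lineage, "src_orders"))
```

Running the same function on a graph with reversed edges would trace a vertex back to its sources rather than forward to its consumers.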

[0098] Please refer to Figure 4, which is a flowchart of the third sub-step of the graph-based data lineage analysis method provided in this embodiment of the invention. The preset algorithm includes a modularity calculation formula, and step S102 further includes the following steps.

[0099] Step S402: Assign community identifiers to each of the vertices.

[0100] Specifically, each vertex within the selected area is marked with a community identifier. The community identifiers include the current community and neighboring communities. In the initial stage, each vertex forms its own current community, and the communities of the other vertices are neighboring communities relative to it.

[0101] Step S404: Calculate the modularity increment of the vertex when it leaves the current community and moves into the neighboring community according to the modularity calculation formula.

[0102] Specifically, the modularity increment of the current vertex when it moves from the current community to another neighboring community is calculated using the modularity calculation formula based on the edge parameters of the vertex. That is, the degree of integration when it moves in.

[0103] Step S406: Move the vertex from the current community to the neighboring community with the largest modularity increment to form a new current community.

[0104] Specifically, moving a vertex from the current community to the neighboring community with the largest modularity increment and merging it into a new current community means that two vertices with a high degree of integration can be considered to form a new current community.

[0105] Step S408: Obtain the target community through iterative processing according to preset values.

[0106] Specifically, by continuously absorbing nearby vertices into the current community according to preset values, a target community is eventually formed.

[0107] Step S410: Form the analysis results based on the target community.

[0108] Specifically, the Louvain algorithm is used. The Louvain algorithm is a community detection method that can find groups with clustering characteristics in a graph.

[0109] In summary, by constructing vertices into communities and continuously merging communities using the modularity calculation formula, and setting a predetermined number of iterations, the community fit is satisfied without losing the fit of vertices within a community. This allows us to identify clusters with agglomerative characteristics in the graph. These clusters are characterized by strong internal connections and sparse external connections with other points. Discovering these clusters within the data's lineage allows us to further investigate commonalities among the data, such as whether there are repeated calls or multiple processing of fields, and to consolidate the data with repeated processing.
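
Steps S402 to S408 correspond to the local-moving phase of the Louvain algorithm, which can be sketched as below. This is a simplified illustration under assumptions: it recomputes the whole modularity for every candidate move instead of using the incremental formula, it omits the community-aggregation phase of the full Louvain algorithm, and it treats `max_passes` as the preset value that bounds the iteration. The vertex names and function names are hypothetical.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # undirected: graph[u][v] == graph[v][u]


def modularity(graph: Graph, community: Dict[str, int]) -> float:
    """Q = sum over communities of inside/2m - (total/2m)^2."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    inside: Dict[int, float] = {}  # edge weight inside a community, counted twice
    total: Dict[int, float] = {}   # sum of vertex degrees in a community
    for u, nbrs in graph.items():
        cu = community[u]
        total[cu] = total.get(cu, 0.0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if community[v] == cu:
                inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(graph: Graph, max_passes: int = 10) -> Dict[str, int]:
    community = {v: i for i, v in enumerate(graph)}  # S402: one community per vertex
    for _ in range(max_passes):                      # S408: bounded iteration
        moved = False
        for v in graph:
            current = community[v]
            base = modularity(graph, community)
            best_gain, best_target = 0.0, current
            neighbors = sorted({community[n] for n in graph[v]} - {current})
            for target in neighbors:                 # S404: modularity increment
                community[v] = target
                gain = modularity(graph, community) - base
                if gain > best_gain + 1e-12:
                    best_gain, best_target = gain, target
            community[v] = best_target               # S406: move to the best community
            moved = moved or best_target != current
        if not moved:
            break
    return community


def link(graph: Graph, u: str, v: str, w: float = 1.0) -> None:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w


# Two tightly connected groups of tables joined by a single edge.
g: Graph = {}
for u, v in [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]:
    link(g, u, v)
result = local_moving(g)
print(result)
```

On this example the two triangles end up as two communities, which is the kind of cluster with dense internal and sparse external connections that the summary describes.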

[0110] Please refer to Figure 5, which is a flowchart of the fourth sub-step of the graph-based data lineage analysis method provided in this embodiment of the invention. The preset algorithm includes a degree centrality formula, and step S102 further includes the following steps.

[0111] Step S502: Calculate the centrality of the vertex according to the degree centrality formula by taking the vertex parameters and / or edge parameters corresponding to any vertex.

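[0112] Specifically, the degree centrality of a vertex can be defined as $C_D(v_i) = d_i = \sum_j A_{ij}$, where $v_i$ is the $i$-th vertex, $d_i$ is its degree, and $A_{ij}$ is the entry of the adjacency matrix that is nonzero when an edge connects $v_i$ and $v_j$.

[0113] Step S504: Form the analysis results based on the centrality value.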

[0114] In summary, in a graph network, the value of a vertex depends on its position within the network; the more central the vertex, the greater its value. In data lineage, a more central vertex indicates more upstream and downstream connections, more frequent references, and therefore, its corresponding data and tasks should be more important, deserving more resources to ensure data stability and quality. The degree centrality formula can be used to calculate the vertex's position within the network, thus determining its value.
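
The degree centrality calculation in step S502 amounts to a row sum of the adjacency matrix. The sketch below assumes an unweighted, undirected graph (so the matrix is symmetric and contains only 0 and 1); the table names and the function name are hypothetical.

```python
from typing import Dict, List


def degree_centrality(vertices: List[str], adjacency: List[List[int]]) -> Dict[str, int]:
    """Return the degree of each vertex as the sum of its adjacency-matrix row."""
    return {vertex: sum(row) for vertex, row in zip(vertices, adjacency)}


vertices = ["ods_orders", "dw_orders", "rpt_sales", "rpt_finance"]
adjacency = [
    [0, 1, 0, 0],  # ods_orders is connected to dw_orders
    [1, 0, 1, 1],  # dw_orders is connected to the other three vertices
    [0, 1, 0, 0],  # rpt_sales is connected to dw_orders
    [0, 1, 0, 0],  # rpt_finance is connected to dw_orders
]

centrality = degree_centrality(vertices, adjacency)
most_central = max(centrality, key=centrality.get)
print(centrality, most_central)
```

Here `dw_orders` has the most upstream and downstream connections, so by the reasoning above it would be the vertex deserving the most attention.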

[0115] Please refer to Figure 6, which is a flowchart of the fifth sub-step of the graph-based data lineage analysis method provided in this embodiment of the invention. The data includes table data and field data, and step S101 includes the following steps.

[0116] Step S602: Convert the table data and the field data into vertices in the graph data.

[0117] Step S604: Convert the relationship between the table data and the field data into the connection relationship between vertices in the graph data.

[0118] Specifically, the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data are obtained; and the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data are transformed into connection relationships between the vertices.

[0119] Step S606: Save the vertex to the graph database.

[0120] In summary, by treating the data from the data source as vertices in the graph data and the relationships between the table data and field data as edges in the graph data, a graph database can be effectively constructed, ensuring that the data is valuable without being overly complex.
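
The conversion in steps S602 to S606 can be sketched as follows. Everything here is an assumption made for illustration: the four-column record format, the relation labels, and the in-memory `LineageGraph` class, which merely stands in for a real graph database.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumed record format: (source table, source field, target table, target field)
Record = Tuple[str, str, str, str]


@dataclass
class LineageGraph:
    """In-memory stand-in for a graph database."""
    vertices: Set[Tuple[str, str]] = field(default_factory=set)   # (kind, name)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)  # (relation, src, dst)


def to_graph(records: List[Record]) -> LineageGraph:
    graph = LineageGraph()
    for src_table, src_field, dst_table, dst_field in records:
        src_f = f"{src_table}.{src_field}"
        dst_f = f"{dst_table}.{dst_field}"
        # S602: tables and fields both become vertices
        graph.vertices.update({
            ("table", src_table), ("table", dst_table),
            ("field", src_f), ("field", dst_f),
        })
        # S604: the three relationship types become connections between vertices
        graph.edges.add(("table_to_table", src_table, dst_table))
        graph.edges.add(("field_to_field", src_f, dst_f))
        graph.edges.add(("table_has_field", src_table, src_f))
        graph.edges.add(("table_has_field", dst_table, dst_f))
    return graph  # S606: a real system would persist this to the graph database


records: List[Record] = [
    ("ods_orders", "amount", "dw_orders", "total_amount"),
    ("ods_orders", "order_id", "dw_orders", "order_id"),
]
graph = to_graph(records)
print(len(graph.vertices), len(graph.edges))
```

Using sets means that a table referenced by many field-level records is stored as a single vertex, and the table-to-table connection is stored once.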

[0121] Please refer to Figure 9, which is a schematic diagram of the internal structure of a computer device provided in an embodiment of this application. The computer device 10 includes a memory 11 and a processor 12. The memory 11 is used to store program instructions, and the processor 12 is used to execute program instructions to implement the above-described graph-based data lineage analysis method.

[0122] In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, used to run program instructions stored in the memory 11.

[0123] The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of a computer device, such as a hard disk. In other embodiments, the memory 11 may be an external storage device of a computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., provided on the computer device. Furthermore, the memory 11 may include both internal and external storage units of the computer device. The memory 11 can be used not only to store application software and various types of data installed on the computer device, such as code for graph-based data lineage analysis, but also to temporarily store data that has been output or will be output.

[0124] Please refer to Figure 7, which is a schematic diagram of the internal structure of the graph data analysis system provided in an embodiment of the present invention. The present invention provides a graph data analysis system 1000, which includes a conversion module 101, an analysis module 102, and a generation module 103.

[0125] The conversion module 101 is used to convert data in the data source into graph data according to a preset structure, wherein the graph data includes at least one vertex, and the vertex includes vertex parameters and edge parameters.

[0126] The analysis module 102 is used to obtain analysis results from the vertex parameters and the edge parameters according to a preset algorithm.

[0127] The generation module 103 is used to generate an analysis report based on the analysis results.

[0128] Please refer to Figure 8, which is a schematic diagram of the internal structure of the conversion module provided in an embodiment of the present invention. The data includes table data and field data, and the conversion module 101 includes a first conversion module 201, a second conversion module 202, and a storage module 203.

[0129] The first conversion module 201 is used to convert the table data and the field data into vertices in the graph data.

[0130] The second conversion module 202 is used to convert the relationship between the table data and the field data into the connection relationship between vertices in the graph data.

[0131] Specifically, the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data are obtained; and the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data are transformed into connection relationships between the vertices.

[0132] The storage module 203 is used to save the vertices to the graph database.

[0133] In the above embodiments, data from the data source is transferred into a graph database in a corresponding form. The transformed graph data includes at least one vertex, and each vertex includes vertex parameters and edge parameters. Analysis results are obtained from the vertex parameters and edge parameters according to a preset algorithm. An analysis report is generated based on the analysis results. Data lineage relationships include data attributes such as tables, fields, and importance levels, which are linked together through processing methods, function calls, etc. Graph analysis technology has a natural advantage in handling complex relationships. Graph analysis can establish an interpretable, end-to-end lineage graph, showing the number of levels of lineage across the entire chain, and analyzing the complexity of the data architecture.

[0134] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

[0135] The above-listed embodiments are merely preferred embodiments of the present invention and should not be construed as limiting the scope of the present invention. Therefore, any equivalent variations made in accordance with the claims of the present invention are still within the scope of the present invention.

Claims

1. A data lineage analysis method based on graph technology, characterized in that the graph-based data lineage analysis method includes: transforming the data in the data source into graph data according to a preset structure, wherein the graph data includes at least one vertex, and the vertex includes vertex parameters and edge parameters; obtaining analysis results from the vertex parameters and edge parameters according to a preset algorithm; and generating an analysis report based on the analysis results; wherein the preset algorithm includes at least a depth-first search algorithm, and obtaining the analysis results from the vertex parameters and edge parameters according to the preset algorithm further includes: selecting any one of the vertices as the initial point according to the vertex parameters; assigning the initial point to the link and setting an access identifier for the initial point; marking the initial point as the return point; determining, based on the edge parameters, whether a connection point exists, wherein the connection point is a vertex that does not have the access identifier and has a connection relationship with the return point; when the connection point exists, including the connection point in the link and setting an access identifier for the connection point; replacing the return point with the connection point and determining, based on the new return point, whether to acquire a connection point again; when the connection point does not exist, using the link as the full link of the initial point; sequentially obtaining the full links of all the vertices; and forming the analysis results based on the full links.

2. The data lineage analysis method based on graph technology as described in claim 1, characterized in that the edge parameters include edge weights, an edge weight being a parameter of the connection line between two vertices, and obtaining the analysis results from the vertex parameters and edge parameters according to the preset algorithm further includes:
selecting any two vertices as calculation points;
obtaining a plurality of connection paths based on the vertex parameters and/or edge parameters corresponding to the calculation points, wherein each connection path includes a plurality of connection lines;
calculating, for each connection path, the sum of the edge weights of its connection lines;
comparing the sums and selecting the connection path with the smallest sum of weights as the shortest path; and
forming the analysis results based on the shortest path.
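The comparison of weight sums in claim 2 can be illustrated with the following minimal Python sketch. It enumerates every simple path between the two calculation points and keeps the one with the smallest total weight, which mirrors the claim wording literally; the nested-dictionary graph layout, the function names, and the sample weights are assumptions for this example. For large graphs an algorithm such as Dijkstra's would normally be preferred over exhaustive enumeration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]   # graph[u][v] = weight of the line u -> v

def simple_paths(graph: Graph, start: str, end: str,
                 path: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Yield every path from start to end that visits no vertex twice."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in graph.get(start, {}):
        if nxt not in path:
            yield from simple_paths(graph, nxt, end, path)

def path_weight(graph: Graph, path: List[str]) -> float:
    """Sum the edge weights of the connection lines along one path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def shortest_path(graph: Graph, start: str, end: str) -> Optional[Tuple[List[str], float]]:
    """Return (path, total weight) for the lightest path, or None if unreachable."""
    candidates = list(simple_paths(graph, start, end))
    if not candidates:
        return None
    best = min(candidates, key=lambda p: path_weight(graph, p))
    return best, path_weight(graph, best)

# Hypothetical weighted graph with three candidate paths from A to D.
weighted = {
    "A": {"B": 1, "C": 2, "D": 10},
    "B": {"D": 5},
    "C": {"D": 1},
}
result = shortest_path(weighted, "A", "D")
```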

3. The data lineage analysis method based on graph technology as described in claim 1, characterized in that the preset algorithm includes a modularity calculation formula, and obtaining the analysis results from the vertex parameters and edge parameters according to the preset algorithm further includes:
assigning community identifiers to each vertex, wherein the community identifiers include the current community and neighboring communities;
calculating, according to the modularity calculation formula, the modularity increment of the vertex when it leaves the current community and moves into a neighboring community;
moving the vertex from the current community to the neighboring community with the largest modularity increment and merging it to form a new current community;
obtaining a target community through iterative processing based on a preset value; and
forming the analysis results based on the target community.
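The vertex-moving step of claim 3 can be sketched as below. This is a simplified illustration, not the patented implementation: it assumes the standard Newman modularity for an undirected weighted graph, Q = Σ_c [ L_c/m − (d_c/2m)² ], where L_c is the weight of edges inside community c, d_c is the total degree of c, and m is the total edge weight. The modularity increment is obtained by recomputing Q before and after a trial move, which is slow but easy to verify. Only the local-move phase is shown; the community-merging (aggregation) phase of the Louvain algorithm is omitted. The names `modularity`, `local_move`, and `max_rounds` are assumptions, with `max_rounds` standing in for the preset value that bounds the iteration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]   # undirected edge (u, v, weight), no self-loops

def modularity(edges: List[Edge], community: Dict[str, str]) -> float:
    """Newman modularity of a partition (assumed formula, see lead-in)."""
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)   # weight of edges inside each community
    degree = defaultdict(float)   # total degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def local_move(edges: List[Edge], community: Dict[str, str],
               max_rounds: int = 10) -> Dict[str, str]:
    """Repeatedly move each vertex to the neighboring community with the largest gain."""
    community = dict(community)
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    for _ in range(max_rounds):
        moved = False
        for vertex in sorted(community):
            base = modularity(edges, community)
            best_gain, best_target = 0.0, community[vertex]
            targets = {community[n] for n in neighbors[vertex]} - {community[vertex]}
            for target in sorted(targets):
                trial = {**community, vertex: target}
                gain = modularity(edges, trial) - base   # modularity increment
                if gain > best_gain + 1e-12:
                    best_gain, best_target = gain, target
            if best_target != community[vertex]:
                community[vertex] = best_target
                moved = True
        if not moved:
            break
    return community

# Hypothetical graph: two triangles joined by a single bridge edge c-d.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
         ("c", "d", 1.0)]
start = {v: v for v in "abcdef"}          # every vertex starts in its own community
result = local_move(edges, start)
```

On this example the two triangles end up as two separate communities, which is the grouping one would expect for tightly coupled sets of tables linked by a single dependency.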

4. The data lineage analysis method based on graph technology as described in claim 1, characterized in that the preset algorithm includes a degree centrality formula, and obtaining the analysis results from the vertex parameters and edge parameters according to the preset algorithm further includes:
calculating the centrality of any given vertex from the vertex parameters and/or edge parameters according to the degree centrality formula; and
forming the analysis results based on the centrality.
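A minimal sketch of the centrality calculation in claim 4 follows. It assumes the common normalized definition of degree centrality, deg(v) / (n − 1), where n is the number of vertices; the claim itself does not spell out the formula, so this choice, the function name, and the sample table names are assumptions.

```python
from typing import Dict, List, Tuple

def degree_centrality(edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality: number of incident edges divided by (n - 1)."""
    vertices = sorted({v for edge in edges for v in edge})
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(vertices)
    if n <= 1:
        return {v: 0.0 for v in vertices}
    return {v: degree[v] / (n - 1) for v in vertices}

# Hypothetical lineage in which one table feeds three reports.
lineage_edges = [
    ("src.crm", "dw.customer"),
    ("dw.customer", "rpt.a"),
    ("dw.customer", "rpt.b"),
    ("dw.customer", "rpt.c"),
]
scores = degree_centrality(lineage_edges)
hub = max(scores, key=scores.get)   # the most connected vertex
```

A vertex with a high score, such as `dw.customer` here, touches many other tables, so a change to it is likely to have a wide impact.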

5. The data lineage analysis method based on graph technology as described in claim 1, characterized in that the data includes table data and field data, and converting the data in the data source into graph data according to the preset structure includes:
converting the table data and the field data into vertices in the graph data;
converting the relationship between the table data and the field data into the connection relationship between vertices in the graph data; and
saving the vertices to a graph database.

6. The data lineage analysis method based on graph technology as described in claim 5, characterized in that converting the relationship between the table data and the field data into the connection relationship between vertices in the graph data includes:
obtaining the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data; and
converting the relationships between different table data, the relationships between different field data, and the relationships between the table data and the field data into connection relationships between vertices.
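The conversion described in claims 5 and 6 can be sketched as follows. The in-memory `LineageGraph` class is a hypothetical stand-in for a graph database, and the relation labels (`has_field`, `table_to_table`, `field_to_field`), the function `build_graph`, and the sample schema are all assumptions made for illustration; the patent does not name a specific database or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class LineageGraph:
    """Hypothetical in-memory stand-in for a graph database."""
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_vertex(self, vertex_id: str, kind: str) -> None:
        self.vertices[vertex_id] = {"kind": kind}

    def add_edge(self, source: str, target: str, relation: str) -> None:
        if source not in self.vertices or target not in self.vertices:
            raise KeyError(f"unknown vertex in edge {source} -> {target}")
        self.edges.add((source, target, relation))

def build_graph(tables: Dict[str, List[str]],
                table_links: List[Tuple[str, str]],
                field_links: List[Tuple[str, str]]) -> LineageGraph:
    graph = LineageGraph()
    # Tables and fields both become vertices.
    for table, columns in tables.items():
        graph.add_vertex(table, "table")
        for column in columns:
            field_id = f"{table}.{column}"
            graph.add_vertex(field_id, "field")
            graph.add_edge(table, field_id, "has_field")     # table-field relation
    for source, target in table_links:
        graph.add_edge(source, target, "table_to_table")     # table-table relation
    for source, target in field_links:
        graph.add_edge(source, target, "field_to_field")     # field-field relation
    return graph

# Hypothetical schema: a staging table feeding a fact table.
graph = build_graph(
    tables={"stg.orders": ["order_id", "amount"],
            "dw.fact_sales": ["order_id", "revenue"]},
    table_links=[("stg.orders", "dw.fact_sales")],
    field_links=[("stg.orders.amount", "dw.fact_sales.revenue")],
)
```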

7. A computer device, characterized in that the computer device includes:
a memory for storing program instructions; and
a processor for executing the program instructions to implement the data lineage analysis method based on graph technology as described in any one of claims 1 to 6.

8. A graph data analysis system, characterized in that the graph data analysis system includes:
a conversion module for converting data in a data source into graph data according to a preset structure, wherein the graph data includes at least one vertex, and the vertex includes vertex parameters and edge parameters;
an analysis module for obtaining analysis results from the vertex parameters and edge parameters according to a preset algorithm; and
a generation module for generating an analysis report based on the analysis results;
wherein the preset algorithm includes at least a depth-first search algorithm, and obtaining the analysis results from the vertex parameters and edge parameters according to the preset algorithm further includes:
selecting any one of the vertices as an initial point according to the vertex parameters;
adding the initial point to a link and setting an access identifier for the initial point;
marking the initial point as the ingress point;
determining, based on the edge parameters, whether a connection point exists, wherein the connection point is a vertex that does not have the access identifier and has a connection relationship with the ingress point;
when the connection point exists, adding the connection point to the link and setting an access identifier for the connection point;
replacing the ingress point with the connection point, and determining again whether a connection point exists based on the new ingress point;
when the connection point does not exist, taking the link as the full link of the initial point;
sequentially obtaining the full links of all the vertices; and
forming the analysis results based on the full links.

9. The graph data analysis system as described in claim 8, characterized in that the data includes table data and field data, and the conversion module includes:
a first conversion module for converting the table data and the field data into vertices in the graph data;
a second conversion module for converting the relationship between the table data and the field data into the connection relationship between vertices in the graph data; and
a saving module for saving the vertices to a graph database.