Lineage graph summarization method based on node structure similarity and semantic proximity

A lineage graph summarization technique based on node structure, applied in the field of lineage graphs, that addresses the problem of lineage query results that are very large and difficult to understand.

Active Publication Date: 2020-05-08
FUDAN UNIV

AI Technical Summary

Problems solved by technology

Lineage data accumulates over time, which makes the results of a lineage query very large.
If the query results are displayed in the form of a lineage graph, the graph may contain thousands of nodes. Such a lineage graph is difficult for readers to understand intuitively.



Examples


Embodiment Construction

[0072] See Appendix 1 for the implementation pseudocode of the lineage graph summarization method based on node structure similarity and semantic proximity proposed by the present invention.

[0073] The complexity of the above training algorithm is O(|V||E|+|C|^2+|D|^2), where |V| is the number of nodes in the lineage graph, |E| is the number of edges in the lineage graph, |C| is the size of the set of activities that may produce the same output, and |D| is the number of data nodes in the lineage graph. The algorithm therefore runs in polynomial time, which is acceptable.
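To get a feel for the size of this bound, the short sketch below evaluates the expression |V||E|+|C|^2+|D|^2 for concrete inputs. The function name `cost_bound` is hypothetical, and the values chosen for |C| and |D| are illustrative assumptions only; the source does not state them. The values for |V| and |E| are the node and edge counts of the data set described in [0074].

```python
def cost_bound(num_nodes: int, num_edges: int,
               num_candidate_activities: int, num_data_nodes: int) -> int:
    """Evaluate |V||E| + |C|^2 + |D|^2, ignoring constant factors.

    This is only the expression inside the O(...) bound, not a
    measurement of the real running time.
    """
    return (num_nodes * num_edges
            + num_candidate_activities ** 2
            + num_data_nodes ** 2)


# |V| = 1502 and |E| = 1598 come from the data set in [0074].
# |C| = 200 and |D| = 900 are assumed values for illustration.
bound = cost_bound(1502, 1598, 200, 900)
print(f"operation-count bound: {bound:,}")
```

Under these assumptions the |V||E| term dominates, so the bound stays in the low millions of elementary steps for a graph of this size.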

[0074] Based on the logic of the above algorithm, the present invention uses the lineage of 36 successfully run scientific workflows from Taverna provenance to synthesize a lineage graph data set with 1502 nodes and 1598 edges, and on this data set tests how the pruning gain varies with the class threshold σ. The pruning gain is used to evaluate how compact the post-digest l...


Abstract

The invention belongs to the technical field of lineage, and particularly relates to a lineage graph summarization method based on node structure similarity and semantic proximity. The method comprises two stages. In the similar node set identification stage, similar nodes are gathered together according to their structural similarity and semantic proximity, and a series of similar node sets is identified. In the node set replacement stage, because a lineage graph contains several types of nodes (data nodes, activity nodes and agent nodes), different replacement strategies are adopted for different types of node sets so that the lineage graph remains valid after replacement. The semantic distance between activity nodes is defined by combining the influence proximity and the time proximity between them, and the sets of semantically close activity nodes are then identified. Super nodes replace node sets with similar structure and semantics, so that similar nodes in the lineage graph are abstracted away, the structural and semantic complexity of the lineage graph is reduced, and the lineage graph becomes easier to understand.
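The two stages described in the abstract can be illustrated with a deliberately simplified sketch. It assumes that "structurally similar" means "same node type with identical predecessor and successor sets", which is a stand-in chosen for illustration and not the similarity measure defined in the patent. It also uses a single replacement strategy for all node types and omits semantic proximity entirely. The function name `summarize` and the `super(...)` naming scheme are hypothetical.

```python
from collections import defaultdict


def summarize(nodes, edges):
    """Collapse structurally equivalent nodes into super nodes.

    nodes: dict mapping node id -> type ("data", "activity" or "agent")
    edges: set of (source, target) pairs over those node ids
    Returns (new_nodes, new_edges) for the summarized graph.
    """
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)

    # Stage 1: identify similar node sets. Assumption: two nodes are
    # similar when they share type, predecessors and successors.
    groups = defaultdict(list)
    for node, node_type in nodes.items():
        key = (node_type, frozenset(preds[node]), frozenset(succs[node]))
        groups[key].append(node)

    # Stage 2: replace each set of two or more nodes with one super node.
    representative, new_nodes = {}, {}
    for (node_type, _, _), members in groups.items():
        members = sorted(members)
        if len(members) == 1:
            name = members[0]
        else:
            name = "super(" + ",".join(members) + ")"
        new_nodes[name] = node_type
        for member in members:
            representative[member] = name

    # Re-point every edge at the representatives; duplicates vanish
    # because the result is a set.
    new_edges = {(representative[s], representative[d]) for s, d in edges}
    return new_nodes, new_edges


# Toy lineage graph: activity a1 produces three data items that are
# all consumed by activity a2.
nodes = {"a1": "activity", "d1": "data", "d2": "data",
         "d3": "data", "a2": "activity"}
edges = {("a1", "d1"), ("a1", "d2"), ("a1", "d3"),
         ("d1", "a2"), ("d2", "a2"), ("d3", "a2")}
new_nodes, new_edges = summarize(nodes, edges)
print(sorted(new_nodes), sorted(new_edges))
```

In this toy graph the three data nodes collapse into one super node, reducing the graph from 5 nodes and 6 edges to 3 nodes and 2 edges. A fuller version would relax exact equivalence to a similarity threshold and add the activity-node semantic distance described above.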

Description

Technical field

[0001] The invention belongs to the technical field of lineage graphs, and in particular relates to a method for summarizing a lineage graph based on node structure similarity and semantic proximity.

Background technique

[0002] Lineage data records the history of data evolution. For a given lineage query, it can describe how the data was generated, which helps with result reproduction, trust enhancement, quality assessment, etc. However, lineage data accumulates continuously over time, which makes the result of a lineage query very large. If the query results are displayed in the form of a lineage graph, the graph may contain tens of thousands of nodes. Such a lineage graph is difficult for readers to understand intuitively. Most of the existing lineage graph summarization algorithms need a large amount of human assistance to identify similar node sets, such as identification based on the knowledge base generated by a large number of...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/35, G06F40/30
CPC: G06F16/367, G06F16/35
Inventor: 卢暾, 周倍思, 于方玉, 张鹏, 顾宁
Owner: FUDAN UNIV