Method and device for generating graph data based on relational database data

A data generation and graph data technique, applied in the database field, addresses the problem that adjacency matrices consume large amounts of space and waste storage. It reduces duplicated data, lowers space usage, and makes queries more convenient.

Active Publication Date: 2020-05-08
四川蜀天梦图数据科技有限公司

AI-Extracted Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is that common graph data storage data structures include adjacency matrix, adjacency linked list, etc., but the problem with these two methods is that the adj...

Method used

In the adjacency linked list of the prior art, the linked list commonly represents adjacent vertices, so each node in the linked list is a vertex. The technology provided by the invention instead uses the linked list to connect relationships, so the nodes in the linked list are the relationships belonging to a vertex, and the links to the forw...

Abstract

The invention relates to the technical field of databases, and provides a method and device for generating graph data based on relational database data. The method comprises: loading a relational database into memory as original data; determining a graph data target type of the original data, wherein the graph data comprises nodes, relationships, node attributes and relationship attributes; and constructing a data structure of the graph data target type and filling it with the original data. In the technology provided by the invention, relationships are connected by a linked list, that is, the nodes in the linked list are the relationships belonging to a vertex, and each relationship stores links to its forward and backward relationships, which makes querying relationships more convenient. In addition, attributes and relationships are all identified by IDs, so identical attributes can be referenced by ID, which reduces repeated creation of the same value and reduces space usage.

Examples

Example Embodiment

[0050] Example 1:
[0051] Embodiment 1 of the present invention provides a method for generating graph data based on relational database data. As shown in Figure 1, the method includes:
[0052] In step 201, the relational database is loaded into the memory as the original data.
[0053] The traditional relational database includes one or more of the open source relational database MySQL, the open source relational database MariaDB, the Microsoft SQL Server relational database, and the Oracle relational database.
[0054] In the embodiment of the present invention, to facilitate determining the graph data target type and generating the corresponding data structure in the subsequent steps, the relational database is preferably loaded as follows: for each attribute table, the relational data generated within a specified time period is selected according to the database log, and it is loaded if it covers the relational data tables. This yields representative data from the typical tables in the relational database, and the time constraint ensures that the loaded data is suitably related.
[0055] In step 202, the graph data target type of the original data is determined; wherein the graph data includes nodes, relationships, node attributes, and relationship attributes.
[0056] The graph data target type is essentially a framework built from nodes and relationships, with enough nodes to support the typical relationships. That is, the main relationships, or all of them, are included in the graph data target type.
[0057] In step 203, a data structure of the graph data target type is constructed, and the original data is filled into the data structure of the graph data target type.
[0058] In the target type data structure, a node includes: node ID, relationship ID, and node attribute ID. A relationship includes: relationship ID, ID of the relationship's previous node, ID of the relationship's next node, ID of the relationship's previous relationship, ID of the relationship's next relationship, and relationship attribute ID. Relationship attributes and node attributes each include: attribute ID, attribute key, attribute value and next attribute ID.
[0059] In the adjacency linked list of the prior art, the linked list commonly represents adjacent vertices, that is, the nodes in the linked list are vertices. The technology provided by the present invention instead uses the linked list to connect relationships, that is, the nodes in the linked list are the relationships belonging to a vertex, and each relationship stores links to its forward and backward relationships, which makes querying relationships more convenient. In addition, attributes and relationships in the present invention are identified by IDs, so identical attributes can be referenced by ID, which reduces repeated creation of the same value and reduces space usage.
[0060] To further clarify the node and relationship characteristics involved in the above embodiment, the Java class definition used by the graph data node is given as follows:
[0061] class Node
[0062] {
[0063] int id;
[0064] int nextRelationShipId;
[0065] int nextPropertyId;
[0066] int labelId;
[0067] }
[0068] The Node class above stores the information of one graph data vertex (which can also be understood as a node in the embodiment of the present invention). The member variable id is a 4-byte integer that uniquely identifies the current node; the member variable nextRelationShipId is a 4-byte integer that holds the ID of a relationship pointing to the current node; the member variable nextPropertyId is a 4-byte integer that points to the first attribute of the current node; the member variable labelId is a 4-byte integer that points to the label information of the current node.
[0069] The Java classes used in graph data relationships are defined as follows:
[0070] class Relationship
[0071] {
[0072] int id;
[0073] int firstNodeId;
[0074] int secondNodeId;
[0075] int firstPreviousRelationshipId;
[0076] int firstNextRelationshipId;
[0077] int secondPreviousRelationshipId;
[0078] int secondNextRelationshipId;
[0079] int propertyId;
[0080] }
[0081] The above Relationship class stores the information of one relationship in the graph data. All member variables are 4-byte integers. id uniquely identifies the current relationship; firstNodeId is the ID of the first node of the relationship, and secondNodeId is the ID of the second node of the relationship (if the relationship is compared to a line segment, firstNodeId and secondNodeId can be understood as its two end points). firstPreviousRelationshipId is the ID of the previous relationship of the first node of this relationship (that is, the previous relationship ID mentioned above); firstNextRelationshipId is the next relationship of the first node of this relationship; secondPreviousRelationshipId is the previous relationship of the second node of this relationship; secondNextRelationshipId is the next relationship of the second node of this relationship (that is, the next relationship ID mentioned above); propertyId is the ID of the relationship attribute. Note in particular that the forward and backward relationships are referenced inside the relationship itself, and that they are referenced by ID.
[0082] The Relationship class above shows that, compared with the items contained in the relationship described earlier in Example 1, there are at least two additional fields, "firstPreviousRelationshipId" and "secondPreviousRelationshipId". These reflect the fact that, when the original data is filled into the graph data, it is filled in order rather than concurrently. Taking Figure 2 as an example, the labels of the nodes can be understood as assigned according to the filling order. The new fields "firstPreviousRelationshipId" and "secondPreviousRelationshipId" are explained with Figure 2 as follows. If the relationship between node 3 and node 4 is the current relationship, then firstNodeId is node 3 and secondNodeId is node 4; firstPreviousRelationshipId is the relationship between node 2 and node 3, and firstNextRelationshipId is the relationship between node 3 and node 5. (Note that the relationship between node 3 and node 6 is not used as the firstNextRelationshipId of the relationship between node 3 and node 4; instead, its ID is used as the firstNextRelationshipId of the relationship between node 3 and node 5.) The secondPreviousRelationshipId does not exist in the graph shown in Figure 2 and is therefore empty, and secondNextRelationshipId is the relationship between node 4 and node 7.
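The ordered filling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `RelationshipChainSketch`, the `NONE` sentinel for an empty ID, the `lastRelationshipOfNode` index, and the use of list indexes as relationship IDs are all assumptions made for the example, and self-loops are not handled. Each new relationship is appended to the end of the relationship chain of both of its end points.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelationshipChainSketch {
    static final int NONE = -1; // assumed sentinel for an "empty" ID

    static class Relationship {
        final int id;
        final int firstNodeId;
        final int secondNodeId;
        int firstPreviousRelationshipId = NONE;
        int firstNextRelationshipId = NONE;
        int secondPreviousRelationshipId = NONE;
        int secondNextRelationshipId = NONE;

        Relationship(int id, int firstNodeId, int secondNodeId) {
            this.id = id;
            this.firstNodeId = firstNodeId;
            this.secondNodeId = secondNodeId;
        }
    }

    // Relationship IDs are list indexes (an assumption of this sketch).
    final List<Relationship> relationships = new ArrayList<>();
    // Hypothetical helper index: node ID -> ID of the last relationship filled for that node.
    private final Map<Integer, Integer> lastRelationshipOfNode = new HashMap<>();

    // Fills one relationship, in order, and links it into the chains of both end points.
    // Assumes firstNodeId != secondNodeId.
    Relationship add(int firstNodeId, int secondNodeId) {
        Relationship r = new Relationship(relationships.size(), firstNodeId, secondNodeId);
        relationships.add(r);
        link(r, firstNodeId);
        link(r, secondNodeId);
        return r;
    }

    // Appends r to the end of nodeId's chain, updating the previous tail and r itself.
    private void link(Relationship r, int nodeId) {
        Integer tailId = lastRelationshipOfNode.put(nodeId, r.id);
        if (tailId == null) {
            return; // r is the first relationship of this node
        }
        Relationship tail = relationships.get(tailId);
        if (tail.firstNodeId == nodeId) {
            tail.firstNextRelationshipId = r.id;
        } else {
            tail.secondNextRelationshipId = r.id;
        }
        if (r.firstNodeId == nodeId) {
            r.firstPreviousRelationshipId = tailId;
        } else {
            r.secondPreviousRelationshipId = tailId;
        }
    }
}
```

Adding the relationships (2, 3), (3, 4), (3, 5), (3, 6) and (4, 7) in that order, an edge list chosen here to mirror the description of Figure 2, gives the relationship between node 3 and node 4 the previous and next links described above.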
[0083] In this way, a linked list of relationships can be established, which provides a second lookup dimension in addition to the node linked list. The relationship linked list and relationship attributes, combined with the node linked list and node attributes, give two search dimensions that together support fast lookup.
[0084] To achieve a high degree of automation in the embodiment of the present invention, the key lies in determining the graph data target type of the original data in step 202. Therefore, based on the embodiment of the present invention, a preferred extension scheme is also provided to give technical support for such automation. As shown in Figure 3, it specifically includes:
[0085] In step 301, one or more data tables in the relational database are traversed to determine, for each data table, the data items it shares with adjacent tables, and/or the number of data tables associated with it through shared data items.
[0086] In the specific implementation, this comprises two methods: method 1, determining the data items that each data table shares with adjacent tables; method 2, determining the number of data tables associated with each data table through shared data items. Either method can be used alone, or the two can be combined by summing their results. The combined approach further reduces the chance that a single method gives several data tables the same count, and so increases the probability of a unique ranking.
[0087] In step 302, the data tables are sorted according to the number of data items each data table shares with adjacent tables and/or the number of association relationships each data table has.
[0088] In step 303, the data items shared with adjacent tables in one or more of the top-ranked tables are used as nodes, the other characteristic data in each table is used as the attribute values of the corresponding nodes, and the relationships between data tables in the relational database are used to generate the relationships and relationship attributes between the corresponding nodes.
[0089] This is illustrated with the three relational data tables of Figures 5 to 7 described in Example 2. Applying steps 301-303 above, the table in Figure 7 is ranked first, and the student_id and subject_id, which correspond to the student name and subject name, are rendered as the nodes of the graph data target type shown in Figure 8.
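One possible reading of steps 301 and 302 is sketched below, using the combined (summed) variant. Everything specific here is an assumption made for illustration: the class `TableRankingSketch`, the representation of a table as a set of column names, the rule that two tables are associated when they share at least one column name, and the tie-break by table name. The source does not specify how shared data items are detected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TableRankingSketch {
    // Step 301 (combined variant): for each table, sum the number of column names it
    // shares with other tables (method 1) and the number of tables it shares at least
    // one column name with (method 2).
    static Map<String, Integer> scores(Map<String, Set<String>> tables) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> table : tables.entrySet()) {
            int score = 0;
            for (Map.Entry<String, Set<String>> other : tables.entrySet()) {
                if (other.getKey().equals(table.getKey())) {
                    continue;
                }
                Set<String> shared = new HashSet<>(table.getValue());
                shared.retainAll(other.getValue());
                if (!shared.isEmpty()) {
                    score += shared.size(); // method 1: shared data items
                    score += 1;             // method 2: one more associated table
                }
            }
            result.put(table.getKey(), score);
        }
        return result;
    }

    // Step 302: sort table names by descending score; ties are broken by name
    // (the tie-break is an assumption added to keep the order deterministic).
    static List<String> rank(Map<String, Set<String>> tables) {
        final Map<String, Integer> score = scores(tables);
        List<String> names = new ArrayList<>(tables.keySet());
        names.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return names;
    }
}
```

With three hypothetical tables, `student` (student_id, name, gender), `subject` (subject_id, subject_name, teacher) and `score` (student_id, subject_id, score), the `score` table ranks first because it shares a column with both of the others, so its shared columns become the candidate nodes for step 303.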
[0090] In the embodiment of the present invention, the node attribute class is composed of the current attribute ID, attribute content, and the next attribute ID, so that one or more node attributes under the same node constitute an attribute linked list.
[0091] The (Java) class definitions of the attributes used by the graph data (both node attributes and relationship attributes can use the classes shown below) are as follows:
[0092] class Property
[0093] {
[0094] int id;
[0095] String key;
[0096] Object value;
[0097] int nextPropertyId;
[0098] }
[0099] The above Property class stores one piece of attribute information, which can belong to either a node or an edge. The member variable id uniquely identifies the current attribute, key is the attribute key as a string, value is the attribute value of any type, and nextPropertyId is the ID of the next attribute.
[0100] Nodes, attributes and relationships together make up complete graph data. Since each relationship stores its context for both its first node and its second node, it is easy to traverse all the relationships of a node, and also easy to find neighboring nodes through a relationship and continue expanding along the relationships.
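The traversal this paragraph refers to can be sketched as below. The sketch assumes that a node's nextRelationShipId holds the first relationship of its chain, that an empty ID is represented by a `NONE` sentinel, and that relationships can be looked up by ID in a map; the class `NeighborWalkSketch` and the method `neighbors` are hypothetical names, and only the fields needed for a forward walk are included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NeighborWalkSketch {
    static final int NONE = -1; // assumed sentinel for an "empty" ID

    static class Node {
        final int id;
        int nextRelationShipId = NONE; // assumed to be the head of the node's relationship chain

        Node(int id) {
            this.id = id;
        }
    }

    static class Relationship {
        final int id;
        final int firstNodeId;
        final int secondNodeId;
        final int firstNextRelationshipId;
        final int secondNextRelationshipId;

        Relationship(int id, int firstNodeId, int secondNodeId,
                     int firstNextRelationshipId, int secondNextRelationshipId) {
            this.id = id;
            this.firstNodeId = firstNodeId;
            this.secondNodeId = secondNodeId;
            this.firstNextRelationshipId = firstNextRelationshipId;
            this.secondNextRelationshipId = secondNextRelationshipId;
        }
    }

    // Walks the relationship chain of a node and collects the node at the other end
    // of each relationship. Which "next" link to follow depends on whether the node
    // is the first or the second end point of the current relationship.
    static List<Integer> neighbors(Node node, Map<Integer, Relationship> store) {
        List<Integer> result = new ArrayList<>();
        int relationshipId = node.nextRelationShipId;
        while (relationshipId != NONE) {
            Relationship r = store.get(relationshipId);
            boolean nodeIsFirst = r.firstNodeId == node.id;
            result.add(nodeIsFirst ? r.secondNodeId : r.firstNodeId);
            relationshipId = nodeIsFirst ? r.firstNextRelationshipId : r.secondNextRelationshipId;
        }
        return result;
    }
}
```

The walk touches only the relationships attached to the starting node, so its cost grows with that node's degree and not with the size of the graph.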
[0101] In the embodiment of the present invention, one feasible way to construct the data structure of the graph data target type in step 203 specifically includes:
[0102] According to the key values in the one or more relational data items determined to be graph data nodes, and the relationships and relationship attributes between the corresponding nodes generated from the relationships between data tables in the relational database, an execution script file for the data structure of the graph data target type is constructed.
[0103] Based on the generated execution script file, filling the original data into the data structure of the graph data target type includes:
[0104] Using the original data as the data source, execute the script file to obtain graph data corresponding to the relational database data.
[0105] In implementing the embodiment of the present invention, in addition to the analysis of latent characteristics between relational database tables described above, the graph data target type can also be determined in combination with another means. This other means can be combined with steps 301-303 in at least two ways. In the first, it is performed before steps 301-303, and if it succeeds, steps 301-303 can be skipped. In the second, it is performed before steps 301-303, and steps 301-303 are then used to adjust the graph data target type determined from the template. The process common to both combinations is shown in Figure 4: before determining the graph data target type of the original data, the method further includes:
[0106] In step 401, the original data content is analyzed to determine whether it matches the historically established graph data type template.
[0107] In step 402, if it matches, the corresponding graph data type template is directly used as the graph data target type;
[0108] The graph data type templates include a campus type, an enterprise type and a government type, as well as one or a combination of a financial type, a personnel type and a management type.

[0109] Example 2:
[0110] The embodiment of the present invention provides a method for converting student information, teacher teaching subject information, and student achievement information into graph data.
[0111] Step S1: Obtain the student information, teacher teaching subject information, and student achievement information data in the relational database. The corresponding table data is shown in Figures 5 to 7.
[0112] Step S2: Determine the graph data type of the relational database information. In this example, student information is determined as a node type in the graph data, teacher teaching information as a node type in the graph data, and student achievement information as a relationship type in the graph data.
[0113] Step S3, construct a data structure of the graph data target type.
[0114] The (Java) classes used by the graph data node are defined as follows:
[0115] class Node
[0116] {
[0117] int id;
[0118] int nextRelationShipId;
[0119] int nextPropertyId;
[0120] int labelId;
[0121] }
[0122] The above Node class stores the information of one graph data vertex. The member variable id is a 4-byte integer that uniquely identifies the current node; the member variable nextRelationShipId is a 4-byte integer that holds the ID of a relationship pointing to the current node; the member variable nextPropertyId is a 4-byte integer that points to the first attribute of the current node; the member variable labelId is a 4-byte integer that points to the label information of the current node.
[0123] The (Java) classes used in graph data relationships are defined as follows:
[0124] class Relationship
[0125] {
[0126] int id;
[0127] int firstNodeId;
[0128] int secondNodeId;
[0129] int firstPreviousRelationshipId;
[0130] int firstNextRelationshipId;
[0131] int secondPreviousRelationshipId;
[0132] int secondNextRelationshipId;
[0133] int propertyId;
[0134] }
[0135] The above Relationship class stores the information of one relationship in the graph data. All member variables are 4-byte integers. id uniquely identifies the current relationship; firstNodeId is the ID of the first node of the relationship; secondNodeId is the ID of the second node of the relationship; firstPreviousRelationshipId is the ID of the previous relationship of the first node of this relationship; firstNextRelationshipId is the next relationship of the first node of this relationship; secondPreviousRelationshipId is the previous relationship of the second node of this relationship; secondNextRelationshipId is the next relationship of the second node of this relationship; propertyId is the ID of the relationship attribute. Note in particular that the forward and backward relationships are referenced inside the relationship itself, and that they are referenced by ID.
[0136] The (Java) class definition of the attributes used by the graph data is as follows:
[0137] class Property
[0138] {
[0139] int id;
[0140] String key;
[0141] Object value;
[0142] int nextPropertyId;
[0143] }
[0144] The above Property class stores one piece of attribute information, which can belong to either a node or an edge. The member variable id uniquely identifies the current attribute, key is the attribute key as a string, value is the attribute value of any type, and nextPropertyId is the ID of the next attribute.
[0145] Nodes, attributes and relationships together make up complete graph data. Since each relationship stores its context for both its first node and its second node, it is easy to traverse all the relationships of a node, and also easy to find neighboring nodes through a relationship and continue expanding along the relationships.
[0146] After the corresponding graph data structure is constructed, the graph logical structure corresponding to the original data is as shown in Figure 8.
[0147] Step S4: Fill in the attribute information in the data structure of the target graph.
[0148] The information in the relational data is filled into the target graph structure. Take Zhang San in the student information table as an example, whose target graph data structure is a node. The id and its value in the student information table are extracted as the first attribute, the name and its value as the second attribute, and the gender and its value as the third attribute. The nextPropertyId member variable of the second attribute points to the first attribute, and the nextPropertyId member variable of the third attribute points to the second attribute; since the first attribute has no corresponding nextPropertyId value, it is set to empty. The attributes of relationships are filled in the same way. The structure of the graph data after the attributes are filled is shown in Figure 9. At this point, the entire process of converting traditional relational database data into graph data is complete.
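The attribute filling in step S4 can be sketched as follows, under stated assumptions: the class `PropertyChainSketch`, the `NONE` sentinel for an empty ID, the use of list indexes as attribute IDs, and the sample row values are all hypothetical, since Figures 5 to 9 are not reproduced here. Each new attribute points back to the one created before it, as the paragraph describes, so a walk has to start from the last attribute created.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PropertyChainSketch {
    static final int NONE = -1; // assumed sentinel for an "empty" ID

    static class Property {
        final int id;
        final String key;
        final Object value;
        int nextPropertyId = NONE;

        Property(int id, String key, Object value) {
            this.id = id;
            this.key = key;
            this.value = value;
        }
    }

    // Attribute IDs are list indexes (an assumption of this sketch).
    final List<Property> properties = new ArrayList<>();

    // Turns one table row into a chain of attributes, in column order.
    // Returns the ID of the last attribute created, or NONE for an empty row.
    int fill(Map<String, Object> row) {
        int previousId = NONE;
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Property p = new Property(properties.size(), column.getKey(), column.getValue());
            p.nextPropertyId = previousId; // second points to first, third to second, ...
            properties.add(p);
            previousId = p.id;
        }
        return previousId;
    }

    // Follows nextPropertyId links from the given attribute and collects the keys visited.
    List<String> keysFrom(int propertyId) {
        List<String> keys = new ArrayList<>();
        while (propertyId != NONE) {
            Property p = properties.get(propertyId);
            keys.add(p.key);
            propertyId = p.nextPropertyId;
        }
        return keys;
    }

    // Hypothetical sample row; the real column names and values are in Figure 5.
    static Map<String, Object> sampleRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "Zhang San");
        row.put("gender", "male");
        return row;
    }
}
```

For the sample row, the chain starting at the returned ID visits gender, name and id in that order. A node that should reach all three attributes would therefore store the returned ID in its nextPropertyId, which is an inference from the linking order and not something the source states.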

[0149] Example 3:
[0150] Figure 10 is a schematic structural diagram of a device for generating graph data based on relational database data according to an embodiment of the present invention. The device in this embodiment includes one or more processors 21 and a memory 22. Figure 10 takes one processor 21 as an example.
[0151] The processor 21 and the memory 22 can be connected by a bus or by other means; Figure 10 takes a bus connection as an example.
[0152] As a non-volatile computer-readable storage medium, the memory 22 can be used to store non-volatile software programs and non-volatile computer-executable programs, such as the method for generating graph data based on relational database data in Embodiment 1. The processor 21 executes the method of generating graph data based on relational database data by running the non-volatile software programs and instructions stored in the memory 22.
[0153] The memory 22 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 22 may optionally include memories located remotely from the processor 21, and these remote memories may be connected to the processor 21 through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0154] The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method for generating graph data based on relational database data in Embodiment 1, for example the steps shown in Figures 1 to 4 described above.
[0155] It is worth noting that the information exchange and execution processes between the modules and units of the above device and system are based on the same concept as the method embodiments of the present invention. For details, refer to the description of the method embodiments, which is not repeated here.
[0156] A person of ordinary skill in the art can understand that all or part of the steps in the various methods of the embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can include read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, and the like.
