[0050] Embodiment 1:
[0051] Embodiment 1 of the present invention provides a method for generating graph data based on relational database data. As shown in Figure 1, the method includes:
[0052] In step 201, the relational database is loaded into the memory as the original data.
[0053] Wherein, the relational database includes one or more of the open-source relational database MySQL, the open-source relational database MariaDB, the Microsoft SQL Server relational database, and the Oracle relational database.
[0054] In the embodiment of the present invention, to facilitate determining the graph data target type and generating the corresponding data structure in the subsequent steps, the relational database is preferably loaded as follows: for each attribute table, relational data generated within a specified time period is selected according to the database log, and is loaded if it covers the relational data tables. This both selects representative data from the typical tables in the relational database and, through the time constraint, ensures that the selected data is suitably related.
[0055] In step 202, the graph data target type of the original data is determined; wherein the graph data includes nodes, relationships, node attributes, and relationship attributes.
[0056] The graph data target type resembles a framework built from nodes and relationships, in which the number of nodes is sufficient to support the typical relationships. That is, the graph data target type contains the main relationships, or all of them.
[0057] In step 203, a data structure of the graph data target type is constructed, and the original data is filled into the data structure of the graph data target type.
[0058] Wherein, in the data structure of the target type, a node includes: a node ID, a relationship ID, and a node attribute ID; a relationship includes: a relationship ID, the ID of the previous node of the relationship, the ID of the next node of the relationship, the ID of the previous-level relationship, the ID of the next-level relationship, and a relationship attribute ID; relationship attributes and node attributes each include: an attribute ID, an attribute key, an attribute value, and a next attribute ID.
[0059] In the adjacency list of the prior art, a linked list is commonly used to represent adjacent vertices, that is, the elements of the linked list are vertices. In the technique provided by the present invention, the linked list instead connects relationships, that is, the elements of the linked list are the relationships belonging to a vertex, and each relationship stores links to its forward relationship and backward relationship, which makes querying relationships more convenient. In addition, attributes and relationships in the present invention are identified by ID, so that the same attribute can be referenced by its ID, which avoids repeatedly creating the same value and reduces space usage.
[0060] To further clarify the characteristics of the nodes and relationships involved in the above embodiment, the Java class definition used for a graph data node is given as follows:
[0061] class Node
[0062] {
[0063] int id;
[0064] int nextRelationShipId;
[0065] int nextPropertyId;
[0066] int labelId;
[0067] }
[0068] The Node class above saves the information of one graph data vertex (which can also be understood as a node in the embodiment of the present invention). The member variable id is a 4-byte integer that uniquely identifies the current node; the member variable nextRelationShipId is a 4-byte integer that represents the id of a relationship pointing to the current node; the member variable nextPropertyId is a 4-byte integer that points to the first attribute of the current node; the member variable labelId is a 4-byte integer that points to the label information of the current node.
[0069] The Java class used for graph data relationships is defined as follows:
[0070] class Relationship
[0071] {
[0072] int id;
[0073] int firstNodeId;
[0074] int secondNodeId;
[0075] int firstPreviousRelationshipId;
[0076] int firstNextRelationshipId;
[0077] int secondPreviousRelationshipId;
[0078] int secondNextRelationshipId;
[0079] int propertyId;
[0080] }
[0081] The above Relationship class saves the information of one relationship in the graph data. All member variables are 4-byte integers: id uniquely identifies the current relationship; firstNodeId is the id of the first node of the relationship, and secondNodeId is the id of the second node of the relationship (if the relationship is compared to a line segment, firstNodeId and secondNodeId can be understood as its two end points); firstPreviousRelationshipId is the id of the previous relationship of the first node of this relationship (i.e., the previous-level relationship ID mentioned above); firstNextRelationshipId is the id of the next relationship of the first node of this relationship; secondPreviousRelationshipId is the id of the previous relationship of the second node of this relationship; secondNextRelationshipId is the id of the next relationship of the second node of this relationship (i.e., the next-level relationship ID mentioned above); and propertyId is the id of the relationship attribute. Note in particular that the forward and backward relationships are stored within the relationship itself and are referenced by id.
[0082] As the above Relationship class shows, compared with the items listed for a relationship earlier in Embodiment 1, the class has at least two additional items, "firstPreviousRelationshipId" and "secondPreviousRelationshipId". This reflects the consideration in the present invention that, when the original data is filled into the graph data, it is filled sequentially rather than concurrently. Taking Figure 2 as an example, the labels of the nodes can be understood as numbers assigned according to the filling order, and the new items "firstPreviousRelationshipId" and "secondPreviousRelationshipId" are explained as follows. If the relationship between node 3 and node 4 in Figure 2 is the current relationship, then firstNodeId is node 3 and secondNodeId is node 4; firstPreviousRelationshipId is the relationship between node 2 and node 3, and firstNextRelationshipId is the relationship between node 3 and node 5. (Note that the relationship between node 3 and node 6 is not used as the firstNextRelationshipId of the relationship between node 3 and node 4; instead, the ID of the relationship between node 3 and node 6 is used as the firstNextRelationshipId of the relationship between node 3 and node 5.) The secondPreviousRelationshipId does not exist in the graph shown in Figure 2 and is therefore empty, and secondNextRelationshipId is the relationship between node 4 and node 7.
[0083] In this way, a linked list can be established among the relationships, which provides a second lookup dimension in addition to the node linked list. That is, the relationship linked list and relationship attributes are used together with the node linked list and node attributes, so that the two search dimensions jointly provide a fast search function.
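The sequential filling and the resulting relationship linked list can be illustrated with a minimal sketch. It reuses the field names of the Node and Relationship classes above; everything else is an assumption made for illustration and is not taken from the embodiment: the sentinel NO_ID for an empty reference, the reading of nextRelationShipId as the first relationship in a node's chain, the helper map lastRelationshipOfNode, and the in-memory maps used as storage. Only the edges of Figure 2 that are mentioned in the text are filled in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelationshipChainSketch {
    static final int NO_ID = -1; // assumed marker for an empty reference

    static class Node {
        final int id;
        int nextRelationShipId = NO_ID; // assumed: first relationship in this node's chain
        Node(int id) { this.id = id; }
    }

    static class Relationship {
        final int id, firstNodeId, secondNodeId;
        int firstPreviousRelationshipId = NO_ID, firstNextRelationshipId = NO_ID;
        int secondPreviousRelationshipId = NO_ID, secondNextRelationshipId = NO_ID;
        Relationship(int id, int firstNodeId, int secondNodeId) {
            this.id = id;
            this.firstNodeId = firstNodeId;
            this.secondNodeId = secondNodeId;
        }
    }

    final Map<Integer, Node> nodes = new HashMap<>();
    final Map<Integer, Relationship> relationships = new HashMap<>();
    // Hypothetical helper: the most recently filled relationship of each node.
    private final Map<Integer, Integer> lastRelationshipOfNode = new HashMap<>();
    private int nextRelationshipId = 1;

    /** Fills one relationship, in order, and links it into both end points' chains.
     *  Self-loops (firstNodeId == secondNodeId) are not handled in this sketch. */
    Relationship addRelationship(int firstNodeId, int secondNodeId) {
        Relationship r = new Relationship(nextRelationshipId++, firstNodeId, secondNodeId);
        relationships.put(r.id, r);
        r.firstPreviousRelationshipId = appendToChain(firstNodeId, r.id);
        r.secondPreviousRelationshipId = appendToChain(secondNodeId, r.id);
        return r;
    }

    /** Appends a relationship to a node's chain; returns the previous relationship id. */
    private int appendToChain(int nodeId, int relationshipId) {
        Node node = nodes.computeIfAbsent(nodeId, id -> new Node(id));
        int previous = lastRelationshipOfNode.getOrDefault(nodeId, NO_ID);
        if (previous == NO_ID) {
            node.nextRelationShipId = relationshipId; // first relationship of this node
        } else {
            Relationship p = relationships.get(previous);
            if (p.firstNodeId == nodeId) {
                p.firstNextRelationshipId = relationshipId;
            } else {
                p.secondNextRelationshipId = relationshipId;
            }
        }
        lastRelationshipOfNode.put(nodeId, relationshipId);
        return previous;
    }

    /** Walks the relationship chain of one node and returns the relationship ids in fill order. */
    List<Integer> relationshipsOf(int nodeId) {
        List<Integer> ids = new ArrayList<>();
        Node node = nodes.get(nodeId);
        int current = (node == null) ? NO_ID : node.nextRelationShipId;
        while (current != NO_ID) {
            Relationship r = relationships.get(current);
            ids.add(r.id);
            current = (r.firstNodeId == nodeId)
                    ? r.firstNextRelationshipId
                    : r.secondNextRelationshipId;
        }
        return ids;
    }

    /** Fills the Figure 2 edges named in the text, in an assumed fill order. */
    static RelationshipChainSketch figure2Example() {
        RelationshipChainSketch g = new RelationshipChainSketch();
        g.addRelationship(2, 3); // relationship id 1
        g.addRelationship(3, 4); // relationship id 2 (the "current" relationship)
        g.addRelationship(3, 5); // relationship id 3
        g.addRelationship(3, 6); // relationship id 4
        g.addRelationship(4, 7); // relationship id 5
        return g;
    }
}
```

Under these assumptions, the relationship between node 3 and node 4 ends up with the relationship between node 2 and node 3 as its firstPreviousRelationshipId, the relationship between node 3 and node 5 as its firstNextRelationshipId, an empty secondPreviousRelationshipId, and the relationship between node 4 and node 7 as its secondNextRelationshipId, which agrees with the description of Figure 2.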
[0084] To achieve a high degree of automation in the embodiment of the present invention, the key lies in determining the graph data target type of the original data in step 202. Therefore, on the basis of the embodiment of the present invention, a preferred extension scheme is also provided to give technical support for the corresponding automated implementation. As shown in Figure 3, the scheme specifically includes:
[0085] In step 301, one or more data tables in the relational database are traversed to determine the data items that each data table shares with adjacent tables, and/or to determine, for each data table, the number of data tables associated with it through shared data items.
[0086] In a specific implementation, the two approaches above are: Approach 1, determining the data items that each data table shares with adjacent tables; Approach 2, determining the number of data tables associated with each data table through shared data items. Either approach may be used alone, or the two may be combined by summation. The combined approach further avoids the case where a single approach yields the same result for several data tables, and thus increases the probability of a unique ranking.
[0087] In step 302, the data tables are sorted according to the number of data items each shares with adjacent tables and/or the number of association relationships each has.
[0088] In step 303, the data items shared with adjacent tables in one or more of the top-ranked tables are used as nodes, the other characteristic data in each table is used as the attribute values of the corresponding nodes, and the relationships and relationship attributes between the corresponding nodes are generated from the relationship attributes between the data tables in the relational database.
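Steps 301 and 302 can be sketched as follows, as one possible reading rather than the method of the embodiment. The sketch assumes that a "data item" is a column name, that every other table counts as an adjacent table, and that the combined score is the plain sum of the two counts; the class name, the method names, and the three example tables (a student table, a subject table, and a score table) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TableRankingSketch {
    /** Returns table names ordered from the highest to the lowest combined score. */
    static List<String> rankTables(Map<String, Set<String>> tableColumns) {
        final Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, Set<String>> table : tableColumns.entrySet()) {
            Set<String> sharedItems = new HashSet<>(); // Approach 1: items shared with other tables
            int linkedTables = 0;                      // Approach 2: tables linked via shared items
            for (Map.Entry<String, Set<String>> other : tableColumns.entrySet()) {
                if (other.getKey().equals(table.getKey())) {
                    continue;
                }
                Set<String> common = new HashSet<>(table.getValue());
                common.retainAll(other.getValue());
                if (!common.isEmpty()) {
                    linkedTables++;
                    sharedItems.addAll(common);
                }
            }
            // Combination by summation (assumed to be an unweighted sum).
            score.put(table.getKey(), sharedItems.size() + linkedTables);
        }
        List<String> ranked = new ArrayList<>(tableColumns.keySet());
        // Step 302: sort by score, highest first; ties are broken by table name.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    /** Hypothetical example tables; the column names are illustrative only. */
    static Map<String, Set<String>> exampleTables() {
        Map<String, Set<String>> tables = new LinkedHashMap<>();
        tables.put("student", new HashSet<>(Arrays.asList("student_id", "student_name")));
        tables.put("subject", new HashSet<>(Arrays.asList("subject_id", "subject_name")));
        tables.put("score", new HashSet<>(Arrays.asList("student_id", "subject_id", "score")));
        return tables;
    }
}
```

With these example tables, the score table shares two items with two other tables and therefore ranks first, so its shared items student_id and subject_id would be the candidates for nodes in step 303.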
[0089] Taking the three relational data tables shown in Figures 5 to 7, which are described further in Embodiment 2, as an example: Figure 7 is the top-ranked data table calculated according to steps 301-303 above, and the student_id and subject_id, which correspond to the student name and subject name, are presented as the nodes in the graph data target type shown in Figure 8.
[0090] In the embodiment of the present invention, the node attribute class is composed of the current attribute ID, attribute content, and the next attribute ID, so that one or more node attributes under the same node constitute an attribute linked list.
[0091] The (Java) class definitions of the attributes used by the graph data (both node attributes and relationship attributes can use the classes shown below) are as follows:
[0092] class Property
[0093] {
[0094] int id;
[0095] String key;
[0096] Object value;
[0097] int nextPropertyId;
[0098] }
[0099] The above Property class saves a piece of attribute information. This attribute information can be the attribute information of the node or the attribute information of the edge. The member variable id uniquely identifies the current attribute, the key is a string attribute key, and the value is any type of attribute value. nextPropertyId is the id of the next property.
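How nextPropertyId chains the attributes of one node or relationship into a linked list can be shown with a short sketch. The fields follow the Property class above; the sentinel NO_ID for "no further attribute", the in-memory map used as the attribute store, and the sample keys and values are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PropertyChainSketch {
    static final int NO_ID = -1; // assumed marker for the end of the chain

    static class Property {
        final int id;
        final String key;
        final Object value;
        final int nextPropertyId;
        Property(int id, String key, Object value, int nextPropertyId) {
            this.id = id;
            this.key = key;
            this.value = value;
            this.nextPropertyId = nextPropertyId;
        }
    }

    /** Follows nextPropertyId from the first attribute and collects key/value pairs in chain order. */
    static Map<String, Object> readProperties(int firstPropertyId, Map<Integer, Property> store) {
        Map<String, Object> result = new LinkedHashMap<>();
        int current = firstPropertyId;
        while (current != NO_ID) {
            Property p = store.get(current);
            result.put(p.key, p.value);
            current = p.nextPropertyId;
        }
        return result;
    }

    /** Hypothetical data: a node whose nextPropertyId is 1 and which has two attributes. */
    static Map<String, Object> demo() {
        Map<Integer, Property> store = new HashMap<>();
        store.put(1, new Property(1, "student_name", "Alice", 2));
        store.put(2, new Property(2, "age", 20, NO_ID));
        return readProperties(1, store);
    }
}
```

The same walk applies to a relationship by starting from its propertyId instead of a node's nextPropertyId.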
[0100] With nodes, attributes, and relationships, complete graph data can be constructed. Since each relationship saves its context with respect to the first node and the second node of the relationship, it is easy to traverse all the relationships of a node, and it is equally convenient to find neighboring nodes through a relationship and then continue to expand along the relationships.
[0101] In the embodiment of the present invention, one feasible way of constructing the data structure of the graph data target type in step 203 specifically includes:
[0102] An execution script file for the data structure of the graph data target type is constructed according to the key values in the one or more pieces of relational data determined to be graph data nodes, and according to the relationships and relationship attributes between the corresponding nodes, which are generated from the relationship attributes between the data tables in the relational database.
[0103] Based on the execution script file generated above, filling the original data into the data structure of the graph data target type includes:
[0104] Using the original data as the data source, execute the script file to obtain graph data corresponding to the relational database data.
[0105] In implementing the embodiment of the present invention, in addition to the aforementioned analysis of the potential characteristics between the data tables of the relational database, the graph data target type can also be determined in combination with another means. This other means can be combined with steps 301-303 above in at least the following two ways: in the first, it is completed before steps 301-303, and if it succeeds, steps 301-303 can be skipped; in the second, it is completed before steps 301-303, and steps 301-303 are then used to adjust the graph data target type determined by the template. As the process common to the two possible combinations, as shown in Figure 4, the method further includes, before determining the graph data target type of the original data:
[0106] In step 401, the content of the original data is analyzed to determine whether it matches a previously established graph data type template.
[0107] In step 402, if it matches, the corresponding graph data type template is directly used as the graph data target type;
[0108] Wherein, the graph data type templates include a campus type, an enterprise type, and a government type, as well as one or a combination of a financial type, a personnel type, and a management type.
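Steps 401 and 402 do not specify how the match is judged. The following is only a sketch of one possible criterion, under the assumption that a template is described by a set of key data items and that the original data matches a template when it contains all of them. The class name, the method names, and the key items listed for the campus and enterprise templates are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class TemplateMatchSketch {
    /** Returns the name of the first template whose key items all occur in the data, if any. */
    static Optional<String> matchTemplate(Set<String> dataItems,
                                          Map<String, Set<String>> templates) {
        for (Map.Entry<String, Set<String>> template : templates.entrySet()) {
            if (dataItems.containsAll(template.getValue())) {
                return Optional.of(template.getKey()); // step 402: use this template directly
            }
        }
        return Optional.empty(); // no match: fall back to steps 301-303
    }

    /** Hypothetical templates; the key items are illustrative and not from the source. */
    static Map<String, Set<String>> exampleTemplates() {
        Map<String, Set<String>> templates = new LinkedHashMap<>();
        templates.put("campus", new HashSet<>(Arrays.asList("student_id", "subject_id")));
        templates.put("enterprise", new HashSet<>(Arrays.asList("employee_id", "department_id")));
        return templates;
    }
}
```

An empty result corresponds to the case where no historical template applies, in which the target type would be determined by steps 301-303 alone.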