Data query method and apparatus, storage medium, and device
By first querying the set of nodes that match the path when receiving a matching statement in the graph database, and then determining whether it belongs to the query results, the problem of low query efficiency caused by path pattern filtering conditions is solved, and more efficient data query is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
- Filing Date
- 2022-12-19
- Publication Date
- 2026-06-19
AI Technical Summary
When existing graph databases contain path pattern filtering conditions in the matching statements, query efficiency is low, especially when the path pattern is long or involves a large number of node hops, resulting in high query time overhead.
When receiving a matching statement containing a path pattern as a filter condition, the system first queries each node in the graph database to find nodes that match the path, obtaining a first set of nodes. Then, it determines whether a node falls into this set based on the matching condition, thus determining the query result. This process is simplified to matching nodes with a set of nodes.
It improves the efficiency of data querying, reduces the consumption of computing resources and time, and simplifies the query process.
Description
Technical Field
[0001] This specification relates to the field of graph database technology, and in particular to a data query method, apparatus, storage medium and device.
Background Technology
[0002] Currently, graph databases are widely used. Typically, users can use the match statement in the declarative graph database query language (Cypher) to query data in a graph database.
[0003] The `match` statement can be used to identify nodes that match its included matching conditions. Furthermore, the `match` statement can also include filtering conditions to filter the nodes matched based on the conditions, thus obtaining the query results.
[0004] However, when the filter conditions of the match statement include path pattern filter conditions, for the nodes obtained based on the matching conditions, it is necessary to traverse the path according to the path pattern, resulting in low query efficiency.
[0005] To improve the query efficiency of the match statement, this specification provides a data query method.
Summary of the Invention
[0006] This specification provides a data query method, apparatus, storage medium, and device to at least partially solve the aforementioned problems.
[0007] The following technical solution is adopted in this specification:
[0008] This specification provides a data query method applied to a graph database, the method comprising:
[0009] Receive a matching statement. When the matching statement contains a path pattern filtering condition, query the nodes that match the path from each node of the graph database according to the path in the path pattern filtering condition to obtain a first node set.
[0010] Based on the matching conditions of the matching statement, determine the nodes that satisfy the matching conditions from each node in the graph database;
[0011] For each node that meets the matching conditions, determine whether the node falls into the first node set;
[0012] Based on the judgment result, determine whether the node belongs to the query result of the matching statement.
[0013] This specification provides a data query method applied to a graph database, the method comprising:
[0014] Receive a matching statement. When the matching statement contains a path pattern filtering condition, determine a first matching condition based on the path pattern filtering condition and use it as a first statement fragment.
[0015] Determine the set identifier of the node set obtained by matching the first statement fragment;
[0016] The original matching conditions in the matching statement are used as the second matching conditions, and the filtering conditions are updated according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions.
[0017] A second statement fragment is generated based on the second matching condition and the updated filtering condition;
[0018] Based on the first statement fragment and the second statement fragment, the optimized matching statement is generated;
[0019] Execute the optimized matching statement to determine the query results.
[0020] This specification provides a data query device applied to a graph database, the device comprising:
[0021] The set determination module is used to receive a matching statement. When the matching statement contains a path pattern filtering condition, it queries the nodes in the graph database that match the path according to the path in the path pattern filtering condition to obtain a first node set.
[0022] The matching module is used to determine, based on the matching conditions of the matching statement, nodes that satisfy the matching conditions from the nodes of the graph database;
[0023] The judgment module is used to determine whether each node that meets the matching conditions falls into the first node set.
[0024] The query result determination module is used to determine whether the node belongs to the query result of the matching statement based on the judgment result.
[0025] This specification provides a data query device applied to a graph database, the device comprising:
[0026] The first generation module is used to receive a matching statement, and when the matching statement contains a path pattern filtering condition, it determines the first matching condition according to the path pattern filtering condition and generates a first statement fragment.
[0027] The identifier determination module is used to determine the set identifier of the node set obtained by matching the first statement fragment;
[0028] The update module is used to take the original matching conditions in the matching statement as the second matching conditions, and update the filtering conditions according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions.
[0029] The second generation module is used to generate a second statement fragment based on the second matching condition and the updated filtering condition;
[0030] An optimization module is used to generate an optimized matching statement based on the first statement fragment and the second statement fragment;
[0031] The execution module is used to execute the optimized matching statement and determine the query results.
[0032] This specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described data query method.
[0033] This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the above-described data query method.
[0034] The at least one technical solution adopted in this specification can achieve the following beneficial effects: When the filtering conditions of the matching statement carried by the received query request contain a path pattern for filtering query results, nodes matching the path pattern are determined from each node in the graph database according to the path pattern, thus obtaining a first node set. Nodes satisfying the matching conditions are then determined from each node in the graph database according to the matching conditions of the matching statement. For each node satisfying the matching conditions, it is determined whether the node falls into the first node set, and based on the determination result and the filtering conditions, it is determined whether the node belongs to the query result of the query request.
[0035] As can be seen from the above, the data query method provided in this specification can simplify the traversal of path patterns in a matching statement to the matching of nodes and node sets, thereby improving the query efficiency of data queries based on the matching statement.
Attached Figure Description
[0036] The accompanying drawings, which are included to provide a further understanding of this specification and form part of this specification, illustrate exemplary embodiments and their descriptions, serving to explain this specification and do not constitute an undue limitation thereof.
[0037] In the drawings:
[0038] Figure 1 is a flowchart illustrating a data query method provided in this specification;
[0039] Figure 2 is a flowchart illustrating a data query method provided in this specification;
[0040] Figure 3 is a schematic diagram of a data query process provided in this specification;
[0041] Figure 4 is a schematic diagram of a data query device provided in this specification;
[0042] Figure 5 is a schematic diagram of a data query device provided in this specification;
[0043] Figure 6 is a schematic diagram of an electronic device provided in this specification.
Detailed Implementation
[0044] A graph consists of nodes and edges. In a graph database, nodes are entities, and edges are the relationships between entities.
[0045] Typically, the format of a matching statement is: MATCH (match clause) RETURN (return clause). It can be used to query data in a property graph.
[0046] The match clause is used to set matching conditions to describe the content of the query, such as which tags or path patterns to use for the query. The return clause describes the content returned, that is, the target of the query. For example, one path pattern in the match clause could be (Beijingers) - [Knows] - (Hunanese) - [Lives in] -> (Shanghai). The return clause of the match statement can define whether the query target is Beijingers or Hunanese who match this path pattern.
[0047] To filter the data retrieved by a match statement, a conditional clause can be included in the match statement. This conditional clause can contain filtering criteria. The format of a match statement can also be: MATCH clause, WHERE clause, RETURN clause.
[0048] In this context, MATCH is the keyword for the matching statement and can also be used as the keyword for the matching clause. WHERE is the keyword for the conditional clause. RETURN is the keyword for the return clause. Each keyword is a non-modifiable part of the clause corresponding to the matching statement. The matching clause, conditional clause, and return clause are all configurable according to requirements.
[0049] For example, a matching statement can be as follows:
[0050] MATCH(person: Person{id: 17592186055119})-[ :knows*1..2]-(friend)<-[ :postHasCreator|commentHasCreator]-(messageX)-[ :postIsLocatedIn|commentIsLocatedIn]->(countryX)
[0051] WHERE
[0052] countryX.name = 'Laos'
[0053] AND messageX.creationDate < 20110712160000000
[0054] AND not exists((friend)-[ :personIsLocatedIn]->()-[ :isPartOf]->(countryX))
[0055] return friend.id, count(DISTINCT messageX) AS xCount.
[0056] Among them, (person: Person{id: 17592186055119})-[ :knows*1..2]-(friend)<-[ :postHasCreator|commentHasCreator]-(messageX)-[ :postIsLocatedIn|commentIsLocatedIn]->(countryX) is the matching clause and also the matching condition.
[0057] The conditional clause is:
[0058] countryX.name = 'Laos'
[0059] AND messageX.creationDate < 20110712160000000
[0060] AND not exists((friend)-[ :personIsLocatedIn]->()-[ :isPartOf]->(countryX)).
[0061] The different filtering conditions in the conditional clause are separated by the AND keyword, and there are 3 filtering conditions in total: countryX.name = 'Laos', messageX.creationDate < 20110712160000000, and not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)).
[0062] Among them, not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)) is the filtering condition for the path pattern in the conditional clause.
[0063] The return clause is: friend.id,count(DISTINCT messageX)AS xCount.
[0064] In the matching statement, the content within the parentheses represents a node. The content within the square brackets represents the relationship between nodes, and the content within the curly brackets represents an attribute. Then, (friend) represents a node, [:isPartOf] represents a relationship, {id: 17592186055119} represents that the value of the attribute of the id of the node is 17592186055119. "()" represents an anonymous node.
[0065] Among them, "-", "->", and "<-" represent the direction of the relationship.
[0066] For example, in `(person:Person{id:17592186055119})-[:knows*1..2]-(friend)`, the node `(person:Person{id:17592186055119})` and the node `(friend)` are connected by `-[:knows*1..2]-`, which contains two `-` symbols. This indicates that the relationship between the node `(person:Person{id:17592186055119})` and the node `(friend)` is bidirectional. In `(friend)-[:personIsLocatedIn]->()`, the node `(friend)` and the node `()` are connected by `-[:personIsLocatedIn]->`, which contains one `-` symbol and one `->` symbol. This indicates that the [:personIsLocatedIn] relationship between node (friend) and node () is unidirectional, and that the relationship is that the person node (friend) is located in the () node, not that the () node is located in the (friend) node.
[0067] The "|" symbol in relationship notation is used to connect multiple relationship types, and it indicates that the connected types are alternatives ("OR"). For example, in the relationship [:postIsLocatedIn|commentIsLocatedIn], "|" indicates that the relationship may be of type "postIsLocatedIn" or of type "commentIsLocatedIn".
[0068] In the example above, the path pattern of the matching clause is: (person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX). The path pattern in the conditional clause is: (friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX).
[0069] As can be seen, both the matching clause and the conditional clause in the above matching statements contain path patterns. Unlike the matching clause, which describes the path pattern used for the query, the path pattern in the conditional clause is used to filter the results retrieved by the matching clause.
[0070] When this matching statement is executed, for each specific path obtained according to the path pattern in the matching clause, the (friend) node and (countryX) node must be validated by the path pattern in the filtering conditions. From the obtained (friend) nodes and (countryX) nodes, nodes in paths that match the path pattern (friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX) are filtered out. The remaining nodes after filtering constitute the query results.
[0071] In other words, it is necessary to traverse the paths of nodes matched by the matching clause based on the pattern path in the conditional clause. Matching the path pattern with the path through traversal is time-consuming, which leads to low efficiency in querying based on the matching statement. The longer the path pattern and the more hops involved, the greater the query time overhead for filtering the matching results of the matching clause based on the path pattern, and the lower the efficiency.
[0072] For example, the matching clause is used to query the second-degree friend nodes of node A (i.e., the nodes reachable from node A in two hops), and the path pattern filtering condition is used to filter out the second-degree friend nodes that are directly adjacent to A (i.e., a second-degree friend of A in the result cannot also be a first-degree friend of A).
[0073] As shown in Figure 1, nodes B, C, and D are all first-degree friends of node A in the graph database. Starting from nodes B, C, and D, the second-degree friends of node A are found to be C, D, E, F, and C, where C appears twice because it is reached through different paths: C and D are first-degree friends of B, E is a first-degree friend of C, and F and C are first-degree friends of D. Since the goal of this matching clause is to query the second-degree friends of node A, the first-degree friends of node A are obtained only as an intermediate step toward that goal and are not retained as a result. Only after finding the second-degree friends C, D, E, F, and C can the database determine, based on the filtering condition, which of them are also first-degree friends of node A, and filter those out.
[0074] After finding the second-degree friends of node A, the database needs to traverse the paths between nodes A and C, A and D, A and E, A and F, and A and C, filtering them according to the aforementioned path pattern filtering conditions. Taking A and C as an example, the database needs to start from node A, traverse node A's neighboring nodes to find node C, and determine whether there exists a path between A and C such that C is a first-degree friend of A. If so, C is filtered out. The same logic applies to other second-degree friends.
[0075] Since C is reached through both B and D, the pair A and C appears twice, so the path between A and C must be traversed twice according to the path pattern in the filtering condition.
[0076] After filtering out node A's first-degree friend nodes C and D, the final query results for the matching statement are obtained: nodes E and F.
[0077] As mentioned above, traversing a path based on a path pattern is an extremely time-consuming operation, and the same node may be traversed multiple times, resulting in low query efficiency.
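The cost described above can be illustrated with a minimal Python sketch of the traversal-based baseline. The directed adjacency list `GRAPH` and the function name `naive_second_not_first` are assumptions made for illustration; they reproduce the A to F example but are not taken from this specification.

```python
from typing import Dict, List, Tuple

# Hypothetical directed adjacency list reproducing the example:
# B, C, D are first-degree friends of A; C is reachable via both B and D.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["F", "C"],
    "E": [],
    "F": [],
}


def naive_second_not_first(graph: Dict[str, List[str]], start: str) -> Tuple[List[str], int]:
    """Filter second-degree friends by re-traversing from `start` for each candidate."""
    # Second-degree candidates, duplicates kept (C is found twice).
    candidates = [n2 for n1 in graph[start] for n2 in graph[n1]]
    results: List[str] = []
    traversals = 0
    for candidate in candidates:
        traversals += 1  # one path traversal per candidate, duplicates included
        is_first_degree = False
        for neighbor in graph[start]:  # walk the edges of `start` again
            if neighbor == candidate:
                is_first_degree = True
                break
        if not is_first_degree:
            results.append(candidate)
    return results, traversals


results, traversals = naive_second_not_first(GRAPH, "A")
```

In this sketch the pair A and C is checked twice, once for each path through which C was reached.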
[0078] To address the aforementioned issues, this specification provides a data query method. Based on this method, the previous approach of filtering the matched nodes by traversing paths according to the path pattern is replaced by checking the matched nodes against a node set obtained from the path pattern filtering condition. This significantly reduces the consumption of computing resources and time, thereby improving query efficiency.
[0079] For example, based on the query method provided in this specification, we can first collect the first-degree friend nodes of node A based on the filtering conditions of the path pattern to obtain nodes B, C, and D, and store them in a hash table set. Then, we can determine the second-degree friend nodes of node A, namely C, D, E, F, and C.
[0080] Then, each second-degree friend node of A (C, D, E, F, C) is checked for membership in the hash table set. The nodes C and D, which are in the set, are filtered out, yielding the final query result: nodes E and F.
[0081] Since looking up a set takes far less time than traversing a path, the query method provided in this specification can simplify the query process and improve the efficiency of queries based on matching statements.
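The set-based variant of the same example can be sketched in Python as follows. The adjacency list `GRAPH` and the helper names are hypothetical, and a Python `set` stands in for the hash table set mentioned above.

```python
from typing import Dict, List, Set

# Hypothetical directed adjacency list reproducing the A to F example.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["F", "C"],
    "E": [],
    "F": [],
}


def first_degree_set(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Collect the first-degree friends once and store them in a hash set."""
    return set(graph[start])


def second_degree(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Second-degree friends, duplicates kept as in the example."""
    return [n2 for n1 in graph[start] for n2 in graph[n1]]


def second_not_first(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Filter candidates with a set-membership test instead of a path traversal."""
    first_set = first_degree_set(graph, start)
    return [node for node in second_degree(graph, start) if node not in first_set]


filtered = second_not_first(GRAPH, "A")
```

Each candidate now costs one hash lookup, regardless of how many edges node A has.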
[0082] To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this specification will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, and not all of them. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this specification.
[0083] The technical solutions provided in the various embodiments of this specification are described in detail below with reference to the accompanying drawings.
[0084] Figure 2 is a flowchart illustrating a data query method described in this specification. This data query method is applied to a graph database and specifically includes the following steps:
[0085] S100: Receive a matching statement. When the matching statement contains a path pattern filtering condition, query the nodes in the graph database that match the path according to the path in the path pattern filtering condition to obtain a first node set.
[0086] The database can receive matching statements, which can be configured by the user from the user interface.
[0087] As mentioned above, a matching statement can contain several filtering conditions. When the database receives a matching statement containing path pattern filtering conditions, the query process can be optimized to avoid low query efficiency caused by traversing the path based on the path pattern.
[0088] Therefore, after receiving the matching statement, if the matching statement contains a path pattern filtering condition, the database can query the nodes in the graph database that match the path according to the path in the path pattern filtering condition, and obtain the first node set. For ease of distinction, the nodes in the first node set can be referred to as first nodes.
[0089] S102: Based on the matching conditions of the matching statement, determine the nodes that satisfy the matching conditions from each node in the graph database.
[0090] For the matching conditions of a matching statement, the database can determine the nodes that satisfy the matching conditions from the nodes of the graph database. The nodes that satisfy the matching conditions can be used as the second nodes.
[0091] S104: For each node that meets the matching conditions, determine whether the node falls into the first node set.
[0092] S106: Based on the judgment result, determine whether the node belongs to the query result of the matching statement.
[0093] After identifying nodes that meet the matching criteria, the database can then determine whether each node falls into the first node set.
[0094] Then, based on the judgment result, it can be determined whether the node belongs to the query result of the matching statement, so as to obtain, from among the second nodes, the nodes that belong to the query result of the matching statement.
[0095] Specifically, based on the judgment result and the filtering conditions of the path pattern, it can be determined whether the node belongs to the query result of the matching statement.
[0096] The filtering conditions of this path pattern are used to enable the database to determine whether the query purpose of the matching statement is to obtain nodes that meet the matching conditions and fall into the first node set, or to obtain nodes that meet the matching conditions but do not fall into the first node set.
[0097] Based on the method shown in Figure 2, when the received matching statement contains a path pattern filtering condition, the graph database is queried for nodes that match the path in the path pattern to obtain a first node set. Then, based on the matching condition of the matching statement, the nodes in the graph database that satisfy the matching condition are determined. For each node that satisfies the matching condition, it is determined whether the node falls into the first node set, and based on the determination result and the filtering condition, it is determined whether the node belongs to the query result of the matching statement.
[0098] As can be seen from the above method, this method can simplify the traversal of the path of the node that matches the matching statement based on the path pattern to the matching of the node and the node set, thereby improving the query efficiency of data query based on the matching statement.
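Steps S100 to S106 can be sketched end to end on toy data. Everything below is an illustrative assumption rather than the implementation of this specification: the `EDGES` and `NAMES` tables, the node names, and the functions `first_node_set` and `query` are hypothetical, and step S102 is assumed to have already produced the candidate list.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: edges keyed by (source node, relationship type).
EDGES: Dict[Tuple[str, str], List[str]] = {
    ("alice", "personIsLocatedIn"): ["vientiane"],
    ("bob", "personIsLocatedIn"): ["hangzhou"],
    ("vientiane", "isPartOf"): ["laos"],
    ("hangzhou", "isPartOf"): ["china"],
}
# Hypothetical name attribute of the country nodes.
NAMES: Dict[str, str] = {"laos": "Laos", "china": "China"}


def first_node_set(country_name: str) -> Set[str]:
    """S100: start nodes of paths (x)-[:personIsLocatedIn]->()-[:isPartOf]->(country)."""
    result: Set[str] = set()
    for (source, relationship), places in EDGES.items():
        if relationship != "personIsLocatedIn":
            continue
        for place in places:
            for country in EDGES.get((place, "isPartOf"), []):
                if NAMES.get(country) == country_name:
                    result.add(source)
    return result


def query(candidates: List[str], country_name: str, negated: bool) -> List[str]:
    """`candidates` are the nodes that satisfy the matching condition (S102)."""
    first = first_node_set(country_name)  # S100: computed once
    # S104: membership test; S106: keep or drop depending on the filter semantics.
    return [node for node in candidates if (node in first) != negated]


not_in_laos = query(["alice", "bob"], "Laos", negated=True)
in_laos = query(["alice", "bob"], "Laos", negated=False)
```

The path pattern is traversed only while building the first node set; every candidate is then resolved by a set lookup.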
[0099] Furthermore, a matching statement can include multiple filtering conditions, and some of the other filtering conditions may be associated with the path pattern in the path pattern filtering condition. For example, if one of the other filtering conditions restricts an attribute of a node identifier that appears in the path pattern, then the path pattern can be merged with that condition and updated accordingly.
[0100] Therefore, in step S100, when querying nodes that match the path from each node of the graph database according to the path in the path pattern filtering conditions to obtain the first node set, specifically, the database can use the path pattern filtering conditions as target conditions and determine other filtering conditions in the matching statement.
[0101] Then, based on the node identifiers in the path pattern filtering conditions and the node identifiers in other filtering conditions, other conditions associated with the target condition can be determined from other filtering conditions.
[0102] Then, the database can merge the path pattern filtering conditions with other conditions based on the node identifiers in the path pattern filtering conditions and the node identifiers in other conditions associated with the target condition, to obtain the merged path pattern filtering conditions.
[0103] Finally, the database can query the nodes that match the path from each node in the graph database based on the path in the filtering conditions of the merged path pattern, and obtain the first node set.
[0104] Among them, the node identifier is a placeholder in the statement, which is a substitute for a specific node and can correspond to several specific nodes.
[0105] Specifically, the database can determine the attribute value of the attribute corresponding to the node identifier in other conditions associated with the target condition. Then, from the node identifiers of the filtering conditions for this path pattern, it identifies the node identifier that matches the node identifier corresponding to that attribute value, and uses this node identifier as the target identifier.
[0106] Then, based on this attribute value, the attribute of the target identifier can be assigned a value. The path pattern filtering condition obtained after assigning this attribute value to the target identifier is then used as the merged path pattern filtering condition.
[0107] For example, suppose the path pattern filtering condition of the matching statement is `not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))`. Another filtering condition is `countryX.name = 'Laos'`, meaning that the name attribute corresponding to the node identifier `countryX` is Laos. If the name attribute of `countryX` is not restricted, this node identifier can correspond to several nodes, such as the different country nodes stored in the graph database.
[0108] In this path pattern's filtering conditions, `countryX` is the target identifier. After assigning the value `Laos` of the `name` attribute of `countryX` from the other filtering conditions to the `name` attribute of the target identifier, the resulting merged path pattern filtering condition is: `not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(country{name:'Laos'}))`, thus reducing the amount of data queried based on the path pattern's filtering conditions.
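The merging step can be sketched in Python under simplifying assumptions: the path pattern is reduced to a list of node placeholders, and each other filtering condition is an equality on one attribute of one node identifier. The `NodePattern` class and the `merge_conditions` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodePattern:
    identifier: str  # placeholder such as "friend"; "" for an anonymous node
    properties: Dict[str, str] = field(default_factory=dict)


# (node identifier, attribute name, attribute value), e.g. countryX.name = 'Laos'
Condition = Tuple[str, str, str]


def merge_conditions(path_nodes: List[NodePattern],
                     other_conditions: List[Condition]) -> List[NodePattern]:
    """Assign attribute values from associated conditions to the matching target identifiers."""
    merged: List[NodePattern] = []
    for node in path_nodes:
        properties = dict(node.properties)  # copy so the input pattern is not mutated
        for identifier, attribute, value in other_conditions:
            # A condition is associated with the path pattern when it names
            # one of the pattern's (non-anonymous) node identifiers.
            if identifier and identifier == node.identifier:
                properties[attribute] = value
        merged.append(NodePattern(node.identifier, properties))
    return merged


path = [NodePattern("friend"), NodePattern(""), NodePattern("countryX")]
merged_path = merge_conditions(path, [("countryX", "name", "Laos")])
```

Conditions that name no identifier of the path pattern are left untouched, so they remain ordinary filtering conditions of the matching statement.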
[0109] It should be noted that although the merged path pattern filtering condition includes the `exists` function preceded by the keyword `not`, which indicates negation, these semantics are irrelevant when querying the graph database for nodes that match the path in the merged path pattern filtering condition. The first node set is determined only by paths conforming to the path pattern (friend)-[:personIsLocatedIn]->()-[:isPartOf]->(country{name:'Laos'}), not by paths that do not conform to this pattern. The semantics of the `exists` function in the path pattern filtering condition are used only to determine whether the query results of the matching statement are restricted to nodes inside the first node set or to nodes outside it.
[0110] In addition, in step S106, when determining whether a node belongs to the query result of a matching statement based on the judgment result, the database can determine that the node belongs to the query result of a matching statement when the filtering condition of the path pattern corresponds to positive semantics and the node falls into the first node set.
[0111] When the filtering condition of the path pattern corresponds to a positive semantic, and the node does not fall into the first node set, the database can determine that the node does not belong to the query result of the matching statement.
[0112] When the filtering condition of the path pattern corresponds to negative semantics, and the node falls into the first node set, the database can determine that the node does not belong to the query results of the matching statement.
[0113] When the filtering condition of the path pattern corresponds to negative semantics, and the node does not fall into the first node set, it is determined that the node belongs to the query result of the matching statement.
[0114] The semantics of the filtering conditions for this path pattern, whether affirmative or negative, can be determined based on whether the filtering conditions contain the negative (not) keyword.
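The four cases above reduce to a single comparison, as the following Python sketch shows. The function names are hypothetical, and detecting negation by a leading `not` keyword is a simplifying assumption about how the filtering condition is written.

```python
def is_negated(filter_condition: str) -> bool:
    """Negative semantics are assumed when the condition starts with the `not` keyword."""
    return filter_condition.strip().lower().startswith("not ")


def belongs_to_result(in_first_set: bool, negated: bool) -> bool:
    """Positive semantics keep nodes inside the first node set; negative semantics keep nodes outside it."""
    return in_first_set != negated


negative = is_negated("not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))")
positive = is_negated("exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))")
```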
[0115] For example, when the filtering condition for this path pattern is `not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))`, the inclusion of the `not` keyword indicates negation. This can be used to filter out either the `(friend)` node or the `(countryX)` node that matches the path pattern. Of course, which node in `(friend)` or `(countryX)` is specifically filtered out depends on the node being queried in the matching statement (defined by the return clause).
[0116] When the filtering condition for this path pattern is `exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))`, it has positive semantics. It can be used to filter out either the `(friend)` node or the `(countryX)` node that matches the path pattern, as the query result. Similarly, which node in `(friend)` or `(countryX)` is selected depends on the nodes queried in the matching statement.
[0117] The `exists()` function is an existence function, encompassing both `exists` and `not exists` semantics. Typically, the path pattern for filtering conditions is contained within the existence function.
[0118] In a database, for a statement to be executed, the database's syntax parser and validator typically perform syntax parsing and validation respectively. After successful parsing, the execution plan generator generates an execution plan, which is then optimized by the execution plan optimizer. Finally, the execution engine executes the statement by following the optimized execution plan.
[0119] Therefore, in step S100 of this specification, after receiving the matching statement, the syntax parser and validator can first perform syntax parsing and validation respectively. After passing the validation, the execution plan generator generates an execution plan. After generating the execution plan, it is determined whether the matching statement contains a path pattern filtering condition. If so, the execution plan optimizer optimizes the execution plan and generates a new execution plan for execution.
[0120] In addition, to transform the matching of path patterns against paths into the matching of nodes against node sets, this specification also provides another data query method, as shown in Figure 3.
[0121] In the data query method shown in Figure 3 of this specification, the received matching statement is optimized and then executed, so that the complex matching of path patterns against paths is transformed into the matching of nodes.
[0122] Specifically, the matching statement can be split into two statement fragments: one determined from the original conditional clause, the other from the original matching clause. The matching results of the second fragment can then be filtered using the results matched by the first fragment. That is, a node set matching the path pattern of the original conditional clause is first obtained from the statement fragment corresponding to that clause; the nodes matched by the path pattern of the original matching clause are then filtered against this node set.
[0123] Figure 3 is a flowchart of a data query method provided in this specification. This data query method is applied to a graph database and specifically includes the following steps:
[0124] S300: Receive a matching statement. When the matching statement contains a path pattern filtering condition, determine a first matching condition based on the path pattern filtering condition and use it as a first statement fragment.
[0125] In this specification, the data query method corresponding to Figure 3 can be executed by the database or by a server. After the data query method corresponding to Figure 3 generates an optimized matching statement, the database executes this optimized statement to perform the query. The following explanation uses a database as an example.
[0126] First, the database can receive matching statements. When the matching statement contains a path pattern filter condition, the first matching condition can be determined based on the path pattern filter condition, and used as the first statement fragment.
[0127] As described above, since path patterns in conditional clauses typically exist within the existence function of the conditional clauses, in one or more embodiments of this specification, the database can determine a first matching condition based on the filtering conditions of the path pattern when the conditional clause of the matching statement contains an existence function and the existence function contains a path pattern.
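As an illustration only, the check described above could be sketched in Python as follows. The function name `find_path_pattern_filter` and the text-scanning approach are assumptions made for this sketch; a real database would more likely inspect the parsed syntax tree than raw statement text.

```python
import re
from typing import Optional, Tuple


def find_path_pattern_filter(where_clause: str) -> Optional[Tuple[bool, str]]:
    """Return (negated, path_pattern) for the first exists(...) that holds a path pattern.

    Hypothetical helper: it scans text and balances parentheses by hand.
    """
    for m in re.finditer(r"(\bnot\s+)?\bexists\s*\(", where_clause, flags=re.IGNORECASE):
        depth, start = 1, m.end()
        for i in range(start, len(where_clause)):
            ch = where_clause[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    inner = where_clause[start:i].strip()
                    # A path pattern contains at least one relationship, e.g. -[:isPartOf]->
                    if re.search(r"-\[.*?\]-", inner):
                        return (m.group(1) is not None, inner)
                    break
    return None


where = ("person.id<>friend.id AND not exists("
         "(friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))")
found = find_path_pattern_filter(where)
```

The returned flag records whether the existence function is negated, which is the information later needed to choose between affirmative and negative semantics.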
[0128] S302: Determine the set identifier of the node set obtained by matching the first statement fragment.
[0129] After identifying the first statement fragment, the database can determine the set identifier of the node set matched by the first statement fragment.
[0130] The identifier of the set can be a preset one or a randomly generated unique identifier.
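A minimal sketch of the two options, assuming a hypothetical helper `make_set_identifier`; the `s_` prefix is an arbitrary choice made here so that the generated name starts with a letter.

```python
import uuid
from typing import Optional


def make_set_identifier(preset: Optional[str] = None) -> str:
    """Return the preset identifier if one is given, otherwise a random unique one."""
    if preset:
        return preset
    # uuid4 gives a random 128-bit value; hex keeps the name alphanumeric.
    return "s_" + uuid.uuid4().hex


preset_id = make_set_identifier("cx")
random_id = make_set_identifier()
```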
[0131] S304: Take the original matching conditions in the matching statement as the second matching conditions, and update the filtering conditions according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions.
[0132] S306: Generate a second statement fragment based on the second matching condition and the updated filtering condition.
[0133] As described above, in this specification, two statement fragments are generated based on the matching statement.
[0134] The database can use the original matching condition in the matching statement as the second matching condition.
[0135] Since the original path pattern filtering conditions have already been used to generate the first statement fragment, the database can update the filtering conditions of the matching statement based on the set identifier and the node identifier of the node queried by the matching statement, thus obtaining the updated filtering conditions. That is, the conditional clauses of the matching statement are rewritten to generate the new statement fragment.
[0136] After determining the updated filter conditions, the database can generate a second statement fragment based on the second matching condition and the updated filter conditions.
[0137] S308: Generate the optimized matching statement based on the first statement fragment and the second statement fragment.
[0138] After determining the first and second statement fragments, the database can generate an optimized matching statement based on them.
[0139] S310: Execute the optimized matching statement to determine the query results.
[0140] After generating the optimized matching statement, the database can execute the optimized matching statement to determine the query results. By executing the optimized matching statement, the database achieves the following: when the matching statement contains a path pattern filter condition, it queries the nodes in the graph database that match the path according to the path in the path pattern filter condition, obtaining a first node set; and according to the matching conditions of the matching statement, it determines the nodes in the graph database that satisfy the matching conditions; for each node that satisfies the matching conditions, it determines whether the node falls into the first node set; and based on the determination result and the filter conditions, it determines whether the node belongs to the query results of the matching statement.
[0141] Based on the method shown in Figure 3, when the received matching statement contains a path pattern filter condition, the first matching condition is determined and used as the first statement fragment, and the set identifier of the node set matched by the first statement fragment is determined. The original matching condition in the matching statement is used as the second matching condition, and the filter condition is updated based on the set identifier and the node identifier of the node queried by the matching statement, resulting in an updated filter condition. A second statement fragment is then generated based on the second matching condition and the updated filter condition. Finally, an optimized matching statement is generated based on the first and second statement fragments, and the query result is determined by executing the optimized matching statement.
[0142] As can be seen from the above method, this method can simplify the traversal of path patterns in the filtering conditions based on the original matching statement to the matching of nodes and node sets by optimizing the matching statement, thereby simplifying the query operation and improving the query efficiency of data query based on the matching statement.
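The effect of this simplification can be sketched on a toy graph. Everything below (the edge list, the node names, and the helper functions) is made up for illustration and is not taken from this specification; the point is only that the path is traversed once to build the node set, after which each candidate node needs a set membership test rather than its own path traversal.

```python
# Toy graph as (source, relationship type, target) triples. All data is invented.
EDGES = [
    ("alice", "personIsLocatedIn", "vientiane"),
    ("bob", "personIsLocatedIn", "hanoi"),
    ("vientiane", "isPartOf", "laos"),
    ("hanoi", "isPartOf", "vietnam"),
]


def sources(node, rel_type):
    """Nodes that have an outgoing edge of rel_type pointing at node."""
    return {src for src, rel, dst in EDGES if dst == node and rel == rel_type}


def first_node_set(country):
    """Nodes x with (x)-[:personIsLocatedIn]->()-[:isPartOf]->(country), computed once."""
    cities = sources(country, "isPartOf")
    return {person for city in cities for person in sources(city, "personIsLocatedIn")}


def filter_candidates(candidates, node_set, negated):
    """Keep a candidate when its membership agrees with the filter's semantics."""
    return [n for n in candidates if (n in node_set) != negated]


cx = first_node_set("laos")
kept = filter_candidates(["alice", "bob", "carol"], cx, negated=True)
```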
[0143] Furthermore, in one or more embodiments of this specification, when generating an optimized matching statement based on the first statement fragment and the second statement fragment, the database can obtain a preset statement template. This statement template includes an aggregation function template. The aggregation function template can be "collect node placeholder", and the statement template can be "collect node placeholder as set placeholder". The node placeholder stands for the node identifier corresponding to the node matched by the first statement fragment, and this node identifier can carry a node attribute. The set placeholder stands for the set identifier.
[0144] Since the database filters the nodes matched by the second statement fragment based on the nodes matched by the first statement fragment, the second statement fragment needs to use the results obtained by the first statement fragment.
[0145] Therefore, the database can determine the node identifier corresponding to the node matched by the first statement fragment. Then, based on the node identifier corresponding to the node matched by the first statement fragment, the set identifier, and the aggregation function template, the database can determine an aggregation function used to store the node matched by the first statement fragment into the node set.
[0146] The database can then determine the connecting clause based on the statement template and the aggregation function.
[0147] The optimized matching statement is generated based on the first statement fragment, the connecting clause, and the second statement fragment.
[0148] The connecting clause makes the output of the first statement fragment serve as the input of the second statement fragment determined from the matching statement. It can also declare the set identifier of the node set into which the output (nodes) of the first statement fragment is collected.
[0149] In one or more embodiments of this specification, when updating the filter conditions based on the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filter conditions, specifically, the database may use the path pattern filter conditions as the target conditions.
[0150] Then, the database can determine the target keyword from a preset set of connection keywords based on the semantics of the target conditions. This semantics includes either affirmative or negative semantics.
[0151] Furthermore, the database can use the target keyword to connect the node identifier of the node queried by the matching statement with the set identifier to obtain the constraint. This constraint is used to determine whether the node queried by the matching statement is contained in the node set corresponding to the set identifier.
[0152] The database can then use this constraint as a new filter condition. Based on this new filter condition and the filter conditions in the matching statement other than the target condition, the updated filter condition is determined.
[0153] The connecting keywords can include: in (in) and not in (not ... in).
[0154] When the semantics of the target condition are affirmative, the keyword "in" can be used as the target keyword. When the semantics of the target condition are negative, the keyword "not ... in" can be used as the target keyword.
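The keyword selection and the update of the filter conditions can be sketched as follows. The names `CONNECTION_KEYWORDS`, `build_constraint`, and `update_filter_conditions` are hypothetical, and representing the conditional clause as a list of condition strings is an assumption made for this sketch.

```python
# Preset connecting keywords, keyed by the semantics of the target condition.
CONNECTION_KEYWORDS = {
    True: "{node} IN {set_id}",        # affirmative semantics
    False: "NOT {node} IN {set_id}",   # negative semantics
}


def build_constraint(node_identifier, set_identifier, affirmative):
    """Connect the node identifier with the set identifier using the target keyword."""
    template = CONNECTION_KEYWORDS[affirmative]
    return template.format(node=node_identifier, set_id=set_identifier)


def update_filter_conditions(conditions, target, constraint):
    """Swap the target (path pattern) condition for the constraint; keep the others."""
    return [constraint if c == target else c for c in conditions]


target = "not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))"
original = ["person.id<>friend.id", "countryX.name='Laos'", target]
constraint = build_constraint("friend.id", "cx", affirmative=False)
updated = update_filter_conditions(original, target, constraint)
```

This sketch keeps the constraint at the position of the condition it replaces. Because the conditions are joined by AND, the position does not change the result.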
[0155] As mentioned above, the second statement fragment requires the result obtained from the matching of the first statement fragment. The `with` keyword can be used to connect different statement fragments, and the output of the statement fragment preceding the `with` keyword can be input into the statement fragment following the `with` keyword. Therefore, the above statement template can be a `with` statement template: `with collect node placeholder as set placeholder`.
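Filling the templates and assembling the optimized statement can be sketched as follows. The template strings, their `{...}` placeholder syntax, and the function names are assumptions made for this sketch; the specification states only that a node placeholder and a set placeholder exist.

```python
# Assumed concrete forms of the aggregation function template and the `with` statement template.
AGGREGATION_TEMPLATE = "collect({node})"
STATEMENT_TEMPLATE = "WITH {aggregation} AS {set_id}"


def build_connecting_clause(node_identifier, set_identifier):
    """Fill the node placeholder, then the set placeholder."""
    aggregation = AGGREGATION_TEMPLATE.format(node=node_identifier)
    return STATEMENT_TEMPLATE.format(aggregation=aggregation, set_id=set_identifier)


def build_optimized_statement(first_fragment, connecting_clause, second_fragment):
    """First fragment, connecting clause, second fragment, in that order."""
    return "\n".join([first_fragment, connecting_clause, second_fragment])


clause = build_connecting_clause("x.id", "cx")
statement = build_optimized_statement(
    "MATCH (countryX{name:'Laos'})<-[:isPartOf]-()<-[:personIsLocatedIn]-(x)",
    clause,
    "MATCH (friend) WHERE NOT friend.id IN cx RETURN friend.id",
)
```

The second fragment passed in here is deliberately shortened to keep the sketch small.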
[0156] In one or more embodiments of this specification, the data query method corresponding to Figure 3 can be applied before the execution plan is generated. After obtaining the optimized matching statement, the database can generate an execution plan for the optimized matching statement and determine the query result by executing that plan.
[0157] In addition, to make the data query method corresponding to Figure 3 easier to understand, the following example compares a matching statement with its optimized form:
[0158] Suppose the matching statement is used to find the posts created or commented by the 1-degree and 2-degree friend nodes (excluding the starting person) of a given person node within a given date range. The country where the posts are located is Laos, and the cities where the friend nodes are located do not belong to this country. The matching result is to count the number of posts created or commented by each friend node of the person node.
[0159] The specific matching statement is as follows:
[0160] MATCH(person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX)
[0161] WHERE
[0162] person.id<>friend.id
[0163] AND countryX.name='Laos'
[0164] AND messageX.creationDate>=20110601000000000
[0165] AND messageX.creationDate<20110712160000000
[0166] AND not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))
[0167] return friend.id,count(DISTINCT messageX)AS xCount
[0168] Among them, (person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX) is the matching condition. (person:Person{id:17592186055119}) means that the node with the node identifier person has the label Person and that its id attribute is 17592186055119. id is the identity number. Through the id attribute, the node identifier is restricted to one specific node, namely the node whose id is 17592186055119.
[0169] The conditional clause is:
[0170] person.id<>friend.id
[0171] AND countryX.name = 'Laos'
[0172] AND messageX.creationDate >= 20110601000000000
[0173] AND messageX.creationDate <20110712160000000
[0174] AND not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX))
[0175] return friend.id,count(DISTINCT messageX) AS xCount.
[0176] The filtering conditions included in the conditional clause are: person.id<>friend.id, countryX.name = 'Laos', messageX.creationDate >= 20110601000000000, messageX.creationDate <20110712160000000 and not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)). not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)) is the filtering condition of the path pattern.
[0177] Among them, *1..2 means that the queried friend nodes are the nodes with 1-hop connection or 2-hop connection to the person node with id 17592186055119. person.id<>friend.id means that the id attribute of the queried node is different from the id of the person node with id 17592186055119. This is used to meet the above filtering condition of "excluding the starting person". countryX.name = 'Laos' means that the name attribute of the country node countryX is Laos. name is the name attribute. messageX.creationDate >= 20110601000000000 means that the creation or comment time attribute of the post node messageX is greater than or equal to 20110601000000000. creationDate is the creation or comment time attribute. messageX.creationDate <20110712160000000 means that the creation or comment time attribute of the post node messageX is less than 20110712160000000. not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)) means that the cities where the friend nodes of the person node with id 17592186055119 are located are not cities in Laos.
[0178] friend.id,count(DISTINCT messageX)AS xCount is the return clause, which means returning the id corresponding to the node identifier of each friend node of this person node, and for each friend node of this person node, counting the number of posts it creates or comments on without repetition, and showing it with xCount as the column name of the number of posts.
[0179] Among them, xCount is similar to a field in a relational database. DISTINCT is used to indicate that the returned data is not repeated.
[0180] After optimizing this matching statement, the obtained matching statement is as follows:
[0181] MATCH(countryX{name:'Laos'})<-[:isPartOf]-()<-[:personIsLocatedIn]-(x)
[0182] WITH collect(x.id) AS cx
[0183] MATCH(person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX)
[0184] WHERE
[0185] person.id <> friend.id
[0186] AND NOT friend.id IN cx
[0187] AND countryX.name = 'Laos'
[0188] AND messageX.creationDate >= 20110601000000000
[0189] AND messageX.creationDate <20110712160000000
[0190] RETURN DISTINCT friend.id, count(DISTINCT messageX) AS xCount
[0191] Among them, match(countryX{name:'Laos'})<-[:isPartOf]-()<-[:personIsLocatedIn]-(x) is the first statement fragment, and with collect(x.id) as cx is the connecting clause. cx is the set identifier. x.id is the node identifier, carrying an attribute, of the nodes matched by the first statement fragment. friend.id is the node identifier, carrying an attribute, of the nodes queried by the matching statement.
[0192] The second statement fragment is:
[0193] MATCH(person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX)
[0194] WHERE
[0195] person.id<>friend.id
[0196] AND not friend.id in cx
[0197] AND countryX.name='Laos'
[0198] AND messageX.creationDate>=20110601000000000
[0199] AND messageX.creationDate<20110712160000000
[0200] return DISTINCT friend.id,count(DISTINCT messageX)AS xCount
[0201] Among them, MATCH(person:Person{id:17592186055119})-[:knows*1..2]-(friend)<-[:postHasCreator|commentHasCreator]-(messageX)-[:postIsLocatedIn|commentIsLocatedIn]->(countryX) is the second matching condition.
[0202] person.id<>friend.id
[0203] AND not friend.id in cx
[0204] AND countryX.name='Laos'
[0205] AND messageX.creationDate>=20110601000000000
[0206] AND messageX.creationDate <20110712160000000
[0207] return DISTINCT friend.id, count(DISTINCT messageX) AS xCount. Here, the conditions joined by the AND keywords are the updated filtering conditions.
[0208] The path pattern filtering condition not exists((friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)) in the original conditional clause can be changed to the constraint condition not friend.id in cx.
[0209] It should be noted that the node identifiers in the matching statement are all aliases, not the names of specific nodes themselves. For example, the node named friend finally matched can specifically be Zhao A, Qian B, and Sun C. And the query results returned by this matching statement are the ids of the three.
[0210] The above is the data query method provided by this specification. This specification also provides a corresponding data query device.
[0211] Figure 4 is a schematic diagram of a data query device provided by this specification, which corresponds to the data query method of Figure 1. The data query device is applied to a graph database, and the device includes:
[0212] A set determination module 200, configured to receive a matching statement. When the matching statement contains a filtering condition of a path pattern, query nodes matching the path from each node in the graph database according to the path in the filtering condition of the path pattern, and obtain a first node set;
[0213] A matching module 201, configured to determine nodes that meet the matching conditions from each node in the graph database according to the matching conditions of the matching statement;
[0214] A judgment module 202, configured to judge whether each node that meets the matching conditions falls into the first node set;
[0215] A query result determination module 203, configured to determine whether the node belongs to the query result of the matching statement according to the judgment result.
[0216] Optionally, the set determination module 200 is further configured to: use the filtering conditions of the path pattern as target conditions, and determine other filtering conditions in the matching statement; determine other conditions associated with the target condition from the other filtering conditions based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in the other filtering conditions; merge the filtering conditions of the path pattern with the other conditions based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in the other conditions associated with the target condition to obtain the merged filtering conditions of the path pattern; and query nodes matching the path from each node in the graph database based on the path in the merged filtering conditions of the path pattern to obtain a first node set.
[0217] Optionally, the set determination module 200 is further configured to: determine the attribute value of the attribute corresponding to the node identifier in the other conditions associated with the target condition; determine, from the node identifiers in the path pattern filtering conditions, the node identifier that is the same as the node identifier corresponding to the attribute value, and use it as the target identifier; assign a value to the attribute of the target identifier according to the attribute value; and use the path pattern filtering conditions obtained after assigning the attribute value to the target identifier as the merged filtering conditions of the path pattern.
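The attribute assignment described here can be sketched as follows, assuming that the associated condition is a simple equality on one attribute and that the target node appears in the path pattern as a bare identifier such as `(countryX)`. The function name `fuse_attribute` is hypothetical, and labeled nodes or nodes that already carry attributes are not handled.

```python
import re


def fuse_attribute(path_pattern, condition):
    """Write an `alias.attr = value` condition into the matching node of the path pattern."""
    m = re.fullmatch(r"\s*(\w+)\.(\w+)\s*=\s*(.+?)\s*", condition)
    if m is None:
        # Not a simple equality (for example >= or <>): leave the pattern unchanged.
        return path_pattern
    alias, attr, value = m.groups()
    node = re.compile(r"\(\s*" + re.escape(alias) + r"\s*\)")
    # A function replacement avoids any escape processing of the value text.
    return node.sub(lambda _: "(%s{%s:%s})" % (alias, attr, value), path_pattern)


pattern = "(friend)-[:personIsLocatedIn]->()-[:isPartOf]->(countryX)"
fused = fuse_attribute(pattern, "countryX.name = 'Laos'")
```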
[0218] Optionally, the query result determination module 203 is further configured to: determine that a node belongs to the query result of the matching statement when the filtering condition of the path pattern corresponds to a positive semantic and the node falls into the first node set; determine that a node does not belong to the query result of the matching statement when the filtering condition of the path pattern corresponds to a positive semantic and the node does not fall into the first node set; determine that a node does not belong to the query result of the matching statement when the filtering condition of the path pattern corresponds to a negative semantic and the node falls into the first node set; and determine that a node belongs to the query result of the matching statement when the filtering condition of the path pattern corresponds to a negative semantic and the node does not fall into the first node set.
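The four cases handled by the query result determination module 203 reduce to a single comparison, which can be sketched as follows (the function name is hypothetical):

```python
def belongs_to_result(affirmative: bool, in_first_node_set: bool) -> bool:
    """A node passes the path pattern filter when membership agrees with the semantics."""
    return affirmative == in_first_node_set


# Enumerate all four combinations of semantics and membership.
table = {(a, s): belongs_to_result(a, s) for a in (True, False) for s in (True, False)}
```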
[0219] Figure 5 is a schematic diagram of a data query device provided in this specification, which corresponds to the data query method of Figure 3. The data query device is applied to a graph database, and the device includes:
[0220] The first generation module 400 is used to receive a matching statement. When the matching statement contains a path pattern filtering condition, it determines a first matching condition based on the path pattern filtering condition and uses it as a first statement fragment.
[0221] The identifier determination module 401 is used to determine the set identifier of the node set obtained by matching the first statement fragment;
[0222] The update module 402 is used to take the original matching conditions in the matching statement as the second matching conditions, and update the filtering conditions according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions.
[0223] The second generation module 403 is used to generate a second statement fragment based on the second matching condition and the updated filtering condition;
[0224] The optimization module 404 is used to generate the optimized matching statement based on the first statement fragment and the second statement fragment;
[0225] The execution module 405 is used to execute the optimized matching statement and determine the query results.
[0226] Optionally, the optimization module 404 is further configured to: obtain a preset statement template, the statement template including an aggregation function template; determine the node identifier corresponding to the node matched by the first statement fragment; determine an aggregation function based on the node identifier corresponding to the node matched by the first statement fragment, the set identifier, and the aggregation function template, the aggregation function being used to store the node matched by the first statement fragment into the node set; determine a connecting clause based on the statement template and the aggregation function; and generate an optimized matching statement based on the first statement fragment, the connecting clause, and the second statement fragment; wherein the connecting clause is used to make the output of the first statement fragment serve as the input of the second statement fragment determined from the matching statement.
[0227] Optionally, the update module 402 is further configured to: use the filtering conditions of the path pattern as target conditions; determine target keywords from preset connection keywords according to the semantics of the target conditions, the semantics including either affirmative or negative semantics; connect the node identifier of the node queried by the matching statement with the set identifier through the target keyword to obtain constraint conditions, the constraint conditions being used to constrain whether the node queried by the matching statement is included in the node set corresponding to the set identifier; use the constraint conditions as new filtering conditions; and determine updated filtering conditions based on the new filtering conditions and the filtering conditions in the matching statement other than the target conditions.
[0228] This specification also provides a computer-readable storage medium storing a computer program that can be used to execute the above-described data query method.
[0229] This specification also provides a schematic structural diagram of an electronic device, shown in Figure 6. As shown in Figure 6, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then runs it to implement the aforementioned data query method. Of course, in addition to software implementation, this specification does not exclude other implementation methods, such as logic devices or a combination of hardware and software. That is to say, the execution subject of the processing flow is not limited to individual logic units, but can also be hardware or logic devices.
[0230] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. 
Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.
[0231] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, ASICs, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.
[0232] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
[0233] For ease of description, the above devices are described in terms of function, divided into various units. Of course, in implementing this specification, the functions of each unit can be implemented in one or more software and / or hardware.
[0234] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0235] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0236] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0237] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0238] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0239] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0240] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0241] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0242] Those skilled in the art will understand that the embodiments of this specification can be provided as methods, systems, or computer program products. Therefore, this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0243] This specification can be described in the general context of computer-executable instructions that are executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0244] The various embodiments in this specification are described in a progressive manner. For parts that are identical or similar between embodiments, the embodiments may be referred to one another. Each embodiment focuses on describing the differences from other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions in the method embodiments.
[0245] The above description is merely an embodiment of this specification and is not intended to limit this specification. Various modifications and variations can be made to this specification by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this specification should be included within the scope of the claims of this specification.
Claims
1. A data query method, the data query method being applied to a graph database, the method comprising: The system receives a matching statement, which includes matching conditions and filtering conditions. The matching conditions are used to characterize the label or path pattern on which the query for data in the graph stored in the graph database is based. The filtering conditions are used to filter the results queried based on the matching conditions. When the matching statement contains a path pattern filtering condition, the nodes that match the path are queried from each node of the graph database according to the path in the path pattern filtering condition to obtain the first node set. Based on the matching conditions of the matching statement, determine the nodes that satisfy the matching conditions from each node in the graph database; For each node that meets the matching conditions, determine whether the node falls into the first node set; Based on the judgment result, determine whether the node belongs to the query result of the matching statement.
2. The method as described in claim 1, wherein, based on the path in the filtering conditions of the path pattern, nodes matching the path are queried from each node of the graph database to obtain a first node set, specifically including: The filtering conditions of the path pattern are used as the target condition, and other filtering conditions in the matching statement are determined; Based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in the other filtering conditions, other conditions associated with the target condition are determined from the other filtering conditions; Based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in other conditions associated with the target condition, the filtering conditions of the path pattern are fused with the other conditions to obtain the fused filtering conditions of the path pattern; Based on the path in the fused filtering conditions of the path pattern, a first node set is obtained by querying the nodes in the graph database that match the path.
3. The method as described in claim 2, wherein the filtering conditions of the path pattern are fused with the other conditions based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in other conditions associated with the target condition to obtain the fused filtering conditions of the path pattern, specifically including: Determine the attribute value of the attribute corresponding to the node identifier in other conditions associated with the target condition; From the node identifiers of the filtering conditions of the path pattern, determine the node identifier that is the same as the node identifier corresponding to the attribute value, and use it as the target identifier; Assign values to the attributes of the target identifier based on the attribute values; The filtering conditions of the path pattern, after values are assigned to the attributes of the target identifier, are used as the fused filtering conditions of the path pattern.
4. The method as described in claim 1, wherein determining whether the node belongs to the query result of the matching statement based on the judgment result specifically includes: When the filtering condition of the path pattern corresponds to a positive semantic and the node falls into the first node set, it is determined that the node belongs to the query result of the matching statement; When the filtering condition of the path pattern corresponds to a positive semantic and the node does not fall into the first node set, it is determined that the node does not belong to the query result of the matching statement. When the filtering condition of the path pattern corresponds to negative semantics, and the node falls into the first node set, it is determined that the node does not belong to the query result of the matching statement; When the filtering condition of the path pattern corresponds to negative semantics, and the node does not fall into the first node set, it is determined that the node belongs to the query result of the matching statement.
5. A data query method, said data query method being applied to a graph database, said method comprising: Receive a matching statement. When the matching statement contains a path pattern filtering condition, determine a first matching condition based on the path pattern filtering condition and use it as a first statement fragment. Determine the set identifier of the node set obtained by matching the first statement fragment; The original matching conditions in the matching statement are used as the second matching conditions, and the filtering conditions are updated according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions. A second statement fragment is generated based on the second matching condition and the updated filtering condition; Based on the first statement fragment and the second statement fragment, the optimized matching statement is generated; Execute the optimized matching statement to determine the query results.
6. The method as described in claim 5, wherein generating the optimized matching statement based on the first statement fragment and the second statement fragment specifically includes: Obtain a preset statement template, which includes an aggregation function template; Determine the node identifier corresponding to the node obtained by matching the first statement fragment; Based on the node identifier corresponding to the node obtained by matching the first statement fragment, the set identifier, and the aggregation function template, an aggregation function is determined, wherein the aggregation function is used to store the node obtained by matching the first statement fragment into the node set; Based on the statement template and the aggregation function, determine the connecting clause; The optimized matching statement is generated based on the first statement fragment, the connecting clause, and the second statement fragment; wherein the connecting clause is used to make the output of the first statement fragment serve as the input of the second statement fragment determined based on the matching statement.
7. The method as described in claim 5, wherein the filter conditions are updated based on the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filter conditions, specifically including: The filtering conditions of the path pattern are used as the target condition; Based on the semantics of the target condition, the target keyword is determined from the preset connection keywords, wherein the semantics include either affirmative or negative semantics; By connecting the node identifier of the node queried by the matching statement with the set identifier using the target keyword, a constraint condition is obtained, wherein the constraint condition is used to constrain whether the node queried by the matching statement is included in the node set corresponding to the set identifier; The constraint condition is used as a newly added filtering condition; Based on the newly added filtering condition and the other filtering conditions in the matching statement besides the target condition, the updated filtering condition is determined.
8. A data query device, the data query device being applied to a graph database, the device comprising: A set determination module is used to receive a matching statement. When the matching statement contains a path pattern filtering condition, it queries the nodes in the graph database that match the path according to the path in the path pattern filtering condition to obtain a first node set. The matching statement includes a matching condition and a filtering condition. The matching condition is used to characterize the label or path pattern on which the query of data in the graph stored in the graph database is based. The filtering condition is used to filter the results queried based on the matching condition. The matching module is used to determine, based on the matching conditions of the matching statement, nodes that satisfy the matching conditions from the nodes of the graph database; The judgment module is used to determine whether each node that meets the matching conditions falls into the first node set. The query result determination module is used to determine whether the node belongs to the query result of the matching statement based on the judgment result.
9. The apparatus of claim 8, wherein the set determination module is further configured to: use the filtering conditions of the path pattern as the target condition, and determine other filtering conditions in the matching statement; determine, based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in the other filtering conditions, other conditions associated with the target condition from the other filtering conditions; fuse, based on the node identifiers in the filtering conditions of the path pattern and the node identifiers in other conditions associated with the target condition, the filtering conditions of the path pattern with the other conditions to obtain the fused filtering conditions of the path pattern; and query, based on the path in the fused filtering conditions of the path pattern, nodes matching the path from each node in the graph database to obtain a first node set.
10. The apparatus of claim 9, wherein the set determination module is further configured to: determine the attribute value of the attribute corresponding to the node identifier in other conditions associated with the target condition; determine, from the node identifiers of the filtering conditions of the path pattern, a node identifier that is the same as the node identifier corresponding to the attribute value, as the target identifier; assign a value to the attribute of the target identifier according to the attribute value; and use the filtering conditions of the path pattern after assigning the value to the attribute of the target identifier as the fused filtering conditions of the path pattern.
11. The apparatus of claim 8, wherein the query result determination module is further configured to determine that the node belongs to the query result of the matching statement when the filtering condition of the path pattern corresponds to a positive semantic and the node falls into the first node set; When the filtering condition of the path pattern corresponds to a positive semantic and the node does not fall into the first node set, it is determined that the node does not belong to the query result of the matching statement. When the filtering condition of the path pattern corresponds to negative semantics, and the node falls into the first node set, it is determined that the node does not belong to the query result of the matching statement; When the filtering condition of the path pattern corresponds to negative semantics, and the node does not fall into the first node set, it is determined that the node belongs to the query result of the matching statement.
12. A data query device, the data query device being applied to a graph database, the device comprising: The first generation module is used to receive a matching statement. When the matching statement contains a path pattern filtering condition, it determines a first matching condition based on the path pattern filtering condition and uses it as a first statement fragment. The identifier determination module is used to determine the set identifier of the node set obtained by matching the first statement fragment; The update module is used to take the original matching conditions in the matching statement as the second matching conditions, and update the filtering conditions according to the set identifier and the node identifier of the node queried by the matching statement to obtain the updated filtering conditions. The second generation module is used to generate a second statement fragment based on the second matching condition and the updated filtering condition; An optimization module is used to generate an optimized matching statement based on the first statement fragment and the second statement fragment; The execution module is used to execute the optimized matching statement and determine the query results.
13. The apparatus of claim 12, wherein the optimization module is further configured to: obtain a preset statement template, the statement template including an aggregation function template; determine the node identifier corresponding to the node obtained by matching the first statement fragment; determine an aggregation function based on the node identifier corresponding to the node obtained by matching the first statement fragment, the set identifier, and the aggregation function template, wherein the aggregation function is used to store the node obtained by matching the first statement fragment into the node set; determine a connecting clause based on the statement template and the aggregation function; and generate an optimized matching statement based on the first statement fragment, the connecting clause, and the second statement fragment; wherein the connecting clause is used to make the output of the first statement fragment serve as the input of the second statement fragment determined based on the matching statement.
14. The apparatus of claim 12, wherein the updating module is further configured to: use the filtering conditions of the path pattern as the target condition; determine a target keyword from preset connection keywords according to the semantics of the target condition, wherein the semantics include either affirmative semantics or negative semantics; connect the node identifier of the node queried by the matching statement and the set identifier through the target keyword to obtain a constraint condition, wherein the constraint condition is used to constrain whether the node queried by the matching statement is included in the node set corresponding to the set identifier; use the constraint condition as a newly added filtering condition; and determine updated filtering conditions according to the newly added filtering condition and the other filtering conditions in the matching statement besides the target condition.
15. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in any one of claims 1 to 7.
16. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method described in any one of claims 1 to 7.
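For illustration only, the set-membership check of claims 1 and 4 and the statement rewriting of claims 5 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this specification: the names `filter_by_node_set`, `MatchStatement`, `rewrite`, and the default set identifier `matched` are hypothetical, and the statement template (a Cypher `WITH collect(...) AS ...` clause as the connecting clause, with `IN` and `NOT (... IN ...)` as the connection keywords) is one assumed choice of template rather than the one prescribed by the claims.

```python
from dataclasses import dataclass


def filter_by_node_set(candidates, first_node_set, negated=False):
    """Claims 1 and 4 (sketch): keep a candidate node when its membership in
    the first node set agrees with the semantics of the path pattern filter."""
    members = set(first_node_set)
    # Affirmative semantics keep members; negative semantics keep non-members.
    return [node for node in candidates if (node in members) != negated]


@dataclass
class MatchStatement:
    """Hypothetical, already-parsed form of a matching statement."""
    match_pattern: str         # original matching condition, e.g. "(n:Person)"
    path_filter: str           # path pattern used as a filtering condition
    node_id: str               # identifier of the queried node, e.g. "n"
    negated: bool = False      # True when the path filter has negative semantics
    other_filters: tuple = ()  # remaining filtering conditions, as text
    returns: str = "n"


def rewrite(stmt, set_id="matched"):
    """Claims 5 to 7 (sketch): build an optimized statement from two fragments."""
    # First statement fragment: the path pattern becomes a matching condition.
    first = f"MATCH {stmt.path_filter}"
    # Connecting clause (assumed template): aggregate matched nodes into a set.
    connecting = f"WITH collect({stmt.node_id}) AS {set_id}"
    # Constraint condition: connect node identifier and set identifier with
    # the keyword chosen from the semantics of the target condition.
    constraint = f"{stmt.node_id} IN {set_id}"
    if stmt.negated:
        constraint = f"NOT ({constraint})"
    filters = " AND ".join((constraint, *stmt.other_filters))
    # Second statement fragment: original matching condition, updated filters.
    second = f"MATCH {stmt.match_pattern} WHERE {filters} RETURN {stmt.returns}"
    return "\n".join((first, connecting, second))


if __name__ == "__main__":
    stmt = MatchStatement(
        match_pattern="(n:Person)",
        path_filter="(n)-[:KNOWS]->(:Person {name: 'Alice'})",
        node_id="n",
        other_filters=("n.age > 30",),
    )
    print(rewrite(stmt))
    print(filter_by_node_set([1, 2, 3, 4], {2, 4}, negated=True))
```

In this sketch the path is evaluated once to build the node set, and each candidate node is then tested only for set membership, which is the simplification the claims describe. The node labels, relationship type, and attribute values in the usage example are made up for demonstration.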