Linear recursive queries on graph databases

The method addresses performance overhead in RDF databases by implementing a computer-based solution for linearly recursive queries, enhancing query engine efficiency and reducing computational costs.

JP2026104828A · Pending · Publication Date: 2026-06-25 · DASSAULT SYSTEMES SA

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
DASSAULT SYSTEMES SA
Filing Date
2025-12-08
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Executing linearly recursive queries on large RDF databases incurs significant overhead, negatively impacting query engine performance and increasing the total cost of ownership for RDF modeling-based applications.

Method used

A computer-implemented method for executing linearly recursive queries on RDF graph databases using a query engine. The method obtains a first part of the query that defines initial conditions, together with at least two clauses specifying recursive execution: a first clause that defines how the output of one execution is used as input for the next execution, and a second clause that defines the part of the query to be executed recursively. Operators, including a gate operator, manage the recursion.

Benefits of technology

This method significantly reduces computational resources and time required for repeated calls, improving query engine performance and enabling efficient traversal of RDF graph databases through linear recursive queries.



Abstract

[Problem] To provide a computer-implemented method for executing linearly recursive queries on an RDF (Resource Description Framework) graph database using a query engine. [Solution] The method includes the steps of: obtaining a first part of a query on an RDF graph database, the first part defining as input one or more initial conditions for query elements in a second part of the query; and obtaining at least two clauses that together specify that the second part of the query is executed recursively. The first clause defines how the output of the execution of the second part of the query is used as input for the next execution of the second part of the query. The second clause defines the second part of the query and includes query elements that describe how the output of the execution is queried using (i) the one or more initial conditions as input for the first execution or (ii) the output of the previous execution for the next execution.

Description

Technical Field

[0001] The present disclosure relates to the field of computer programs and systems, and more particularly, to a computer-implemented method for executing linear recursive queries on an RDF graph database by a query engine.

Background Art

[0002] An in-memory database is a purpose-built database that relies mainly on memory for data storage, in contrast to databases that store data on disks or SSDs. Among such databases, RDF (Resource Description Framework) graph databases have attracted particular attention due to their great flexibility in data modeling and data storage.

[0003] The RDF graph is a conventional data model used for storing and querying such graph databases. The RDF specification has been published by the World Wide Web Consortium (W3C) for representing information as graphs. See, for example, "RDF 1.1 Concepts and Abstract Syntax", published at https://www.w3.org/TR/rdf11-concepts/.

[0004] The core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate, and an object. Such a set of RDF triples is called an RDF graph, and each triple can be visualized as a node-arc-node link: the subject and the object are the two nodes, and the predicate is the arc connecting them. Adding a graph label to an RDF triple yields an RDF quad. More detailed information about RDF can be found at https://www.w3.org/TR/rdf11-concepts/#data-model.
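The triple and quad structures described above can be sketched in Python as follows. This is a minimal illustration only: the type names, the `with_graph` helper, and the `ex:` identifiers are hypothetical and do not come from the RDF specification or from the patent.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """An RDF triple: two nodes (subject, object) linked by an arc (predicate)."""
    subject: str
    predicate: str
    obj: str

class Quad(NamedTuple):
    """An RDF triple extended with a graph label."""
    subject: str
    predicate: str
    obj: str
    graph: str

def with_graph(triple: Triple, graph: str) -> Quad:
    """Attach a graph label to a triple, yielding a quad."""
    return Quad(triple.subject, triple.predicate, triple.obj, graph)

# Hypothetical example data.
triple = Triple("ex:alice", "ex:IsFriendOf", "ex:bob")
quad = with_graph(triple, "ex:socialGraph")
```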

[0005] RDF applications (e.g., computer-aided design, computer-aided engineering, product lifecycle management) typically require a large number of RDF quads for storage and manipulation. RDF graphs have grown to billions of quads (e.g., Uniprot has 100 billion triples, Wikidata has 17 billion triples, and 3DS SIEM has up to 500 billion triples), thus impacting read/write performance and total cost of ownership (TCO) of storage.

[0006] A database is any collection of data (i.e., information) organized for retrieval and searching. Databases are designed to handle large amounts of data, enabling efficient storage, retrieval, and manipulation. When stored in memory, databases allow computers to quickly search and retrieve data.

[0007] RDF graph databases are a type of database designed to store, manage, and query data structured as graphs.

[0008] Navigating and querying RDF graph databases can be challenging. RDF databases often contain highly interconnected, hierarchical data, which makes efficient navigation and querying difficult. While graph structures are flexible, they can become extremely complex as the number of quads increases.

[0009] Recursive queries in RDF graph databases can retrieve not only directly related data but also related data that may be several steps away in the graph. This is particularly useful for revealing indirect connections and understanding the broader context of the data stored in the database.

[0010] Path queries are a specific type of recursive query that focuses on finding paths between nodes in a graph. In RDF databases, these queries are often used to discover relationships between entities that are not directly connected but can be reached through a series of intermediate nodes.

[0011] Linear recursive queries are a subset of recursive queries, and path queries are a subset of linear recursive queries. These queries are particularly useful for exploring hierarchical and/or sequential relationships within RDF graph databases.

[0012] In the context of industrial applications such as RDF applications (e.g., computer-aided design, computer-aided engineering, product lifecycle management), linear recursive queries are used to find indirect connections between quads in an RDF database. A linear recursive query can include recursive (i.e., iterative) calls to the same query on an RDF graph database.

[0013] In this context, "recursive" means that the input of a call includes the output of the previous call. In this context, "linear recursion" means that the input of a call includes "only" the output of the previous call.

Summary of the Invention

Problem to Be Solved by the Invention

[0014] Executing linearly recursive queries on large RDF databases incurs significant overhead with each iterative call, thus negatively impacting query engine performance and increasing the total cost of ownership for RDF modeling-based applications.

[0015] Given this background, there is a need for an improved computer-implemented method for executing linearly recursive queries on RDF graph databases using query engines.

Means for Solving the Problem

[0016] Therefore, a computer-implemented method is provided for executing linearly recursive queries on an RDF graph database using a query engine. This method includes:
- a step of obtaining a first part (FROM) of the query on the RDF graph database, the first part defining as input at least one initial condition for one of the query elements in a second part of the query; and
- a step of obtaining at least two clauses that together specify that the second part is executed recursively:
  - a first clause (WITH) defining how the output of the execution of the second part of the query is used as input for the next execution of the second part of the query, and
  - a second clause (ON) defining the second part of the query, which includes a query element that describes how the output of the execution of the second part is queried using (i) the one or more initial conditions as input for the first execution of the second part of the query, or (ii) the output of the previous execution of the second part of the query for the next execution of the second part of the query.

[0017] This computer-implemented method may include one or more of the following:
- The first part (FROM) of the obtained query includes a first group graph pattern that defines the one or more initial conditions; each initial condition includes one or more variables, each variable is assigned an initial value, and the first group graph pattern is a syntactic expression.
- The output of executing the second part of the query includes a set of solutions, each solution includes a set of variables assigned to a respective set of values, and the first clause (WITH):
  - extracts one or more values from the set of solutions according to one or more expressions, and
  - assigns the one or more extracted values to one or more variables, which are used as input for the next execution of the second part of the query.
- The second clause (ON) contains a second group graph pattern, which is a syntactic expression.
- The first part of the query (FROM) is executed by a first operator (ggp_from), which assigns the one or more initial conditions; the first clause (WITH) contains one or more expressions for each of the one or more variables that are used as input for the next execution of the second part of the query; and/or the second clause (ON) is executed by a second operator (ggp_on), which recursively executes the query elements and outputs a set of solutions for each execution.
- A gate operator (R_gate) transfers a set of previous solutions from a preceding execution to the first operator and to a start operator (R_begin). The start operator (R_begin) takes the one or more initial conditions and the set of previous solutions as input, combines them, and transfers them to the second operator. An end operator (R_end) takes each set of solutions from the second operator as input, extracts one or more values from the set of previous solutions according to the one or more expressions for the one or more variables to be used as input for the next execution, and assigns the extracted one or more values to the one or more variables to be used as input for the next execution of the second part of the query.
- The second part of the query is executed iteratively:
  - until the set of solutions is empty, and/or
  - until a predetermined number of recursive steps is reached.
- The step of obtaining the first clause (WITH) further includes the step of assigning the one or more extracted values of the one or more expressions to one or more respective variables using a clause (AS) combined with the first clause (WITH).
- The linearly recursive query is executed by a linear recursive pattern (RECURSE), and the linear recursive pattern has the following type of SPARQL syntax: [Table 1]
- The linear recursive pattern further includes a fourth clause (FILTER), which defines a filter for a set of solutions based on one or more constraints, and the linear recursive pattern has the following type of SPARQL grammar: [Table 2]
- The query engine is a SPARQL query engine.

[0018] In the present invention, there is further provided a computer program including instructions which, when executed by a processing unit, cause a computer to execute the method.

[0019] In the present invention, there is further provided a computer-readable storage medium on which the computer program is recorded.

[0020] There is further provided a system including a processor coupled to a memory, with the computer program recorded in the memory. Non-limiting examples of the present invention are described below with reference to the accompanying drawings.

Brief Description of the Drawings

[0021] [Figure 1] shows a flowchart of an example of the method.
[Figure 2] shows an example of the system.
[Figure 3] shows an illustration of an RDF graph database.

Modes for Carrying Out the Invention

[0022] Referring to the flowchart in Figure 1, a computer implementation method for executing linearly recursive queries on an RDF graph database by a query engine is proposed. The method includes the step of obtaining the first part of the query (FROM) on the RDF graph database, which defines one or more initial conditions for at least one query element in the second part of the query as input. The method also includes the step of obtaining at least two clauses. The at least two clauses together specify that the second part is executed recursively. The at least two clauses include a first clause (WITH) and a second clause (ON). The first clause (WITH) defines how the output of the execution of the second part of the query will be used as input for the next execution of the second part of the query (S20a). The second clause (ON) defines the second part of the query. The second clause includes a query element that describes how the output of the execution of the second part will be queried using (i) one or more initial conditions as input for the first execution of the second part of the query, or (ii) the output of the previous execution of the second part of the query for the next execution of the second part of the query.

[0023] This approach provides an improved solution for query engines to execute linearly recursive queries on RDF graph databases. In fact, this method corresponds to a linearly recursive query mechanism on RDF graph databases. This linearly recursive query mechanism consolidates the recursion of repeated calls to the same query on an RDF graph database (in which parameters are expanded) into a single query, thereby significantly reducing the additional computational resources and time (often referred to as query overhead) required due to each of these repeated calls.

[0024] Notably, the method of the present invention features extensions to the syntax and algebra of the query engine. These extensions work for all types of query elements. The implementation of the syntax and algebra extensions enables the query engine to execute linearly recursive queries on an RDF graph database. The method holds for any linearly recursive query without restrictions on specific constraints within the query and without the need to insert intermediate input from the user during the execution of the linearly recursive query. For example, the method may involve filtering (e.g., using FILTER in the SPARQL language) at any step of the linearly recursive query, i.e., at any step during the traversal of the graph database in the linearly recursive query. As a result, the method greatly improves the execution of linearly recursive queries and provides excellent functionality for traversing RDF graph databases in a linearly recursive manner.

[0025] A query engine is a software component that allows users to interact with data stored in a database or server by submitting queries and retrieving results. A query engine provides a set of standard operations and transformations that users can combine in various ways through a simple query language. The query engine may also be a SPARQL query engine.

[0026] A query may contain one or more clauses, and each of those clauses may contain one or more sub-clauses.

[0027] A recursive query is a type of query that executes one or more sub-clauses where the input contains the output of any previous step. Recursive queries are particularly useful for traversing RDF graphs to retrieve data with hierarchical and/or graph structures.

[0028] A linear recursive query is a type of recursive query where each clause refers to the output of at most one sub-clause. In other words, the input of any recursive step depends only on the output of the immediately preceding step (and not on the steps before it). Put differently, a linear recursive query recursively traverses paths in a graph. As an example, consider an RDF graph where nodes are users and edges represent relationships such as "IsFriendOf". Starting from a particular user (often called the root of the graph), a linear recursive query traverses the graph to find the user's friends, then the friends of those friends, and so on, thereby finding all the nodes reachable through a chain of "IsFriendOf" relationships.
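The "IsFriendOf" traversal can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the method of the disclosure: the function name, the edge-list input, and the `max_steps` parameter are hypothetical. Each step takes only the frontier produced by the previous step as input; the `seen` set is an added assumption used solely to return each user once and to terminate on cyclic graphs.

```python
def reachable_friends(edges, root, max_steps=1000):
    """Return every user reachable from root through IsFriendOf edges.

    edges: iterable of (user, friend) pairs, one per IsFriendOf relationship.
    """
    friends_of = {}
    for user, friend in edges:
        friends_of.setdefault(user, set()).add(friend)

    seen = set()        # users already reported (deduplication only)
    frontier = {root}   # output of the previous step, input of the next
    steps = 0
    while frontier and steps < max_steps:
        next_frontier = set()
        for user in frontier:
            for friend in friends_of.get(user, ()):
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.add(friend)
        frontier = next_frontier
        steps += 1
    return seen

# Hypothetical example data.
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
friends = reachable_friends(edges, "alice")
```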

[0029] Linear recursive queries are useful for traversing paths due to their linearity, that is, the fact that the input of any recursive step depends only on the output of the previous step, and thus allow the query engine to "forget" the data during the execution of a linear recursive query (for example, not remembering the data at each step of the query execution).

[0030] This method includes the step of obtaining (e.g., through user interaction) the first part (FROM) of a query on an RDF graph database, which defines as input one or more initial conditions for at least one query element in the second part of the query.

[0031] A query element is a component or part of a query that specifies the data or information to be retrieved or manipulated. Query elements can also be variables and/or parameters of the query.

[0032] The initial conditions for a query element may include initial values and/or constraints that can be assigned to at least one of the query elements to initiate a query lookup. The initial conditions may also be inputs to the query, i.e., they may define a starting point where the execution of the query begins (e.g., the root of the query) and/or where the recursive query will be re-executed on the RDF graph database.

[0033] The first part (FROM) of the retrieved query may include a first group graph pattern. A group graph pattern is a set of graph patterns that allows multiple graph patterns to be combined into a single query pattern. In other words, a group graph pattern specifies a set of graph patterns that must match the data in the RDF graph. Graph patterns specify the structure of the data to be retrieved. For example, a graph pattern may include one or more triple patterns, each consisting of a subject (indicating the resource or variable to be queried), a predicate (the property or relationship of interest), and an object (the value or resource to which the subject relates). A graph pattern that includes a set of triple patterns, each consisting of a subject, predicate, and object that may contain variables, is also called a Basic Graph Pattern (BGP).
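How a basic graph pattern is matched against triples can be sketched in Python as follows. This is a simplified illustration, not a SPARQL implementation: the function names are hypothetical, terms are plain strings, and a leading `?` marks a variable by assumption.

```python
def is_var(term):
    """A term starting with '?' is treated as a variable (assumption)."""
    return term.startswith("?")

def match_triple_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_var(pattern_term):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return extended

def match_bgp(patterns, triples):
    """Return the solutions that match every triple pattern in the group."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for solution in solutions
            for triple in triples
            if (extended := match_triple_pattern(pattern, triple, solution)) is not None
        ]
    return solutions

# Hypothetical example data.
triples = [("alice", "IsFriendOf", "bob"), ("bob", "IsFriendOf", "carol")]
patterns = [("?x", "IsFriendOf", "?y"), ("?y", "IsFriendOf", "?z")]
matches = match_bgp(patterns, triples)
```

Here the two triple patterns share the variable `?y`, so only bindings that agree on `?y` survive, which is the sense in which all patterns of the group must match.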

[0034] The first part (FROM) of the retrieved query may include at least one clause.

[0035] The at least one clause may include a first group graph pattern.

[0036] The first group graph pattern may be a syntactic expression, which may define one or more initial conditions. Each initial condition may contain one or more variables, each of which may be assigned an initial value; that is, each variable is bound to its own initial value. Optionally, each initial condition may contain one or more variables and one or more parameters of a query, each of which may be assigned an initial value.

[0037] In other words, the first group graph pattern may define a set of initial solutions, each set of initial solutions containing one or more initial conditions, each initial condition containing one or more variables, each variable may be assigned an initial value. For example, the user may define one or more initial conditions as input to be assigned to each of one or more variables and parameters of a query.

[0038] This method also includes the step of obtaining at least two clauses. The at least two clauses together specify that the second part is executed recursively. In other words, the at least two clauses define the recursive rules of the linearly recursive query. The at least two clauses include a first clause (WITH) and a second clause (ON). The first clause (WITH) defines how the output of the execution of the second part of the query is used as input for the next (recursive) execution of the second part of the query. In other words, the first clause (WITH) may include the recursive rules of the linearly recursive query (e.g., recursive parameterization).

[0039] The second clause (ON) defines the second part of the query. The second clause contains query elements that describe how the output of the execution of the second part will be queried using (i) one or more initial condition inputs for the first execution of the second part of the query, or (ii) the output of the previous execution of the second part of the query (i.e., the output of the previous execution transformed according to the first clause) for subsequent executions of the second part of the query.

[0040] The second clause may contain a second group graph pattern, which is a syntactic expression. The second group graph pattern may be combined with the first group graph pattern. The combined group graph pattern may contain query elements and inputs for the next (recursive) execution of the second part of the query.

[0041] The output of executing the second part of the query may include a solution set (sometimes called a "set of solutions").

[0042] A solution may include a set of variables assigned to a respective set of values, so that each solution has at most one assigned value for each variable. Each solution may include all the individual patterns within the combined group pattern; that is, each solution may match all the patterns within the combined group pattern.

[0043] A solution set is a multi-set of solutions. In one example, a solution set may be represented as a table (solely to visually represent the concept of a solution set in this context), where each column corresponds to a variable and each row corresponds to a solution. Thus, in each row, each variable is assigned only one value. Across the rows of a column, the same variable may take multiple values.

[0044] The first clause (WITH) may extract one or more values from a set of solutions according to one or more expressions. The first clause (WITH) may also include expressions (in one or more expressions) for each variable (in the set of variables) to be used as input for the next execution of the second part of the query. Expressions may include variables, constants, functions, and operators. Expressions may be evaluated to obtain their respective values. The first clause (WITH) may also assign the extracted one or more values to one or more respective variables. Each of these one or more variables may be used as input for the next execution of the second part of the query. In other words, the first clause (WITH) may transform the first set of solutions into a second set of solutions; that is, each solution (represented by a specific row) in the first set of solutions is transformed into another solution (represented by another row) in the second set of solutions. The step of obtaining the first clause (WITH) may further include the step of assigning the extracted one or more values from one or more expressions to one or more respective variables using a third clause (AS) combined with the first clause (WITH).
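The row-by-row transformation performed by the first clause (WITH ... AS) can be sketched in Python as follows. This is an illustrative sketch only: representing a solution set as a list of dictionaries, representing expressions as callables, and the names `apply_with`, `?part`, `?depth`, `?parent`, and `?level` are all assumptions.

```python
def apply_with(solutions, assignments):
    """Transform each solution into the input of the next execution.

    assignments: list of (expression, variable) pairs, mirroring
    '(expression AS variable)'; each expression is a callable that is
    evaluated against one solution (one row of the table).
    """
    return [
        {variable: expression(solution) for expression, variable in assignments}
        for solution in solutions
    ]

# Hypothetical solution set: one dict per row, one key per variable.
solutions = [
    {"?part": "engine", "?depth": 0},
    {"?part": "wheel", "?depth": 0},
]
assignments = [
    (lambda row: row["?part"], "?parent"),      # a plain variable reference
    (lambda row: row["?depth"] + 1, "?level"),  # an expression with an operator
]
next_inputs = apply_with(solutions, assignments)
```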

[0045] The combination of a first part of a query (FROM) containing a first group graph pattern and a second clause (ON) containing a second group graph pattern, along with a first clause (WITH) (whether or not it is combined with a clause (AS)) that defines how the output of the execution of the second part of the query (i.e., the second clause (ON)) is used as input for the next execution of the second part of the query, constitutes an extension of the query language syntax. More precisely, taking the SPARQL language as an example, such a combination defines a new group graph pattern identified as a RecursePattern, which can be formally defined in the SPARQL query language grammar as follows: [Table 3]

[0046] A linear recursive query is equivalent to executing a group graph pattern, RecursePattern. The symbol "+" represents the repetition of '('expression 'AS' variable')' in a linear recursive query.

[0047] The execution of RecursePattern is now described.

[0048] This method may further include a gate operator (R_gate) that can transfer a set of previous solutions (or solution sets) from a preceding execution to a first operator (ggp_from) and a start operator (R_begin). The first part (FROM) of the query may be executed by a first operator (ggp_from) that assigns one or more initial conditions to each of the one or more variables (and parameters) of the query. Here, the expression “executed by a first operator” means that the first part of the query is executed (or processed or acted upon) by a set of operators (of the query language) collectively called ggp_from. In other words, the operation of the first operator may be effectively implemented by a first algorithm that includes a set of (basic) operators of the query language. Such an algorithm may process the first part (FROM) of the query, i.e., the algorithm may assign one or more initial conditions.

[0049] The method may further include a start operator (R_begin) that can take one or more initial conditions and a set of previous solutions as input. The start operator (R_begin) may also combine the set of previous solutions and the one or more initial conditions, and forward the combination to a second operator (ggp_on).

[0050] The first clause (WITH) may contain one or more expressions for each of the one or more variables to be used as input for the next execution of the second part of the query. The second clause (ON) may be executed by a second operator (ggp_on) which recursively executes the query elements and outputs a set of solutions for each execution.

[0051] The method may further include an end operator (R_end) which may take as input each set of solutions output by the second operator, and which may extract one or more values from the previous set of solutions. The extraction of one or more values from the previous set of solutions may be performed according to the one or more expressions, contained in the first clause (WITH), for the one or more variables that are used as input for the next execution. The end operator (R_end) may assign the extracted one or more values to the one or more variables that are used as input for the next execution of the second part of the query.

[0052] The execution of the second part of the query may be repeated (e.g., recursively) until a predetermined number of recursive steps (e.g., 20, 50, 100, or 1000) is reached and/or until the solution set is empty (e.g., until there are no more values to extract from the previous set of solutions).
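The interplay of the operators can be sketched in Python as below. This is one simplified reading of the description, not the disclosed implementation. The operator names ggp_from, ggp_on, R_begin, and R_end come from the text, but their bodies, the `hasPart` data, and the choice to accumulate the solutions of every execution as the overall result are assumptions. In particular, the start operator here simply selects the initial conditions for the first execution and the previous output afterwards, and the gate operator is reduced to the `previous` variable handed to it.

```python
# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = [
    ("machine", "hasPart", "engine"),
    ("engine", "hasPart", "piston"),
    ("piston", "hasPart", "ring"),
]

def ggp_from():
    """First operator: produce the initial conditions (FROM)."""
    return [{"?parent": "machine"}]

def ggp_on(inputs):
    """Second operator: execute the ON pattern '?parent hasPart ?child' once."""
    return [
        {"?parent": binding["?parent"], "?child": obj}
        for binding in inputs
        for (subj, pred, obj) in TRIPLES
        if pred == "hasPart" and subj == binding["?parent"]
    ]

def r_begin(initial, previous):
    """Start operator (assumed behavior): initial conditions on the first
    execution, output of the previous execution afterwards."""
    return initial if previous is None else previous

def r_end(solutions, with_exprs):
    """End operator: apply the WITH expressions to each solution."""
    return [{var: expr(sol) for expr, var in with_exprs} for sol in solutions]

def run_recurse(with_exprs, max_steps=100):
    """Repeat the ON part until the solution set is empty or max_steps is hit."""
    results = []
    previous = None  # set of previous solutions handed over by the gate
    initial = ggp_from()
    for _ in range(max_steps):
        inputs = r_begin(initial, previous)
        solutions = ggp_on(inputs)
        if not solutions:
            break
        results.extend(solutions)
        previous = r_end(solutions, with_exprs)
    return results

# '(?child AS ?parent)': each found child becomes the next parent.
WITH_EXPRS = [(lambda sol: sol["?child"], "?parent")]
all_solutions = run_recurse(WITH_EXPRS)
```

Both stopping conditions from the text appear in the loop: the `break` covers the empty solution set, and `max_steps` covers the predetermined number of recursive steps.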

[0053] The linear recursion pattern may further include a fourth clause (FILTER) which can define a filter on the solution set. The filter may be based on one or more constraints on the query elements. An example in the SPARQL language may be as follows: [Table 4]
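A filter over a solution set can be sketched in Python as follows. This only illustrates the idea of constraint-based filtering; it does not reproduce the SPARQL example of the disclosure. The function name, the `?mass` variable, and the data are hypothetical, and the point in the recursion at which the filter is applied is left open here.

```python
def apply_filter(solutions, constraints):
    """Keep only the solutions that satisfy every constraint.

    constraints: callables that take one solution and return True or False.
    """
    return [
        solution
        for solution in solutions
        if all(check(solution) for check in constraints)
    ]

# Hypothetical solution set and constraint.
solutions = [
    {"?part": "engine", "?mass": 120.0},
    {"?part": "screw", "?mass": 0.01},
]
constraints = [lambda solution: solution["?mass"] < 1.0]
kept = apply_filter(solutions, constraints)
```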

[0054] This method is computer-implemented. This means that the steps (or substantially all steps) of this method are performed by at least one computer, or any similar system. Thus, the steps of this method are performed by the computer, possibly fully automatically or semi-automatically. In one example, at least some of the triggers for the steps of this method may be performed through user-computer interaction. The required level of user-computer interaction may depend on the expected level of automation, balanced with the need to implement user requirements. In one example, this level may be user-defined and/or predefined.

[0055] A typical computer implementation of the method is to perform the method using a system adapted for this purpose. The system may include a processor coupled to memory, in which a computer program is stored containing instructions that, when executed by the processing unit, cause the computer to perform any of the methods of the disclosure. The memory is any hardware adapted to such storage and may comprise several physically distinct parts (e.g., one for the program and possibly one for the database).

[0056] Figure 2 shows an example of a system, which is a client computer system, such as a user's workstation. This system can be used for building, maintaining, storing, and using data structures, and/or for performing the methods of this disclosure.

[0057] The client computer in this example comprises a central processing unit (CPU) 1010 connected to an internal communication bus 1000, and random access memory (RAM) 1070 also connected to the bus. The client computer further comprises a graphics processing unit (GPU) 1110 associated with video random access memory 1100 connected to the bus. The video RAM 1100 is also known in the art as a frame buffer. A mass storage device controller 1020 manages access to mass memory devices such as a hard drive 1030. Mass memory devices suitable for tangibly realizing computer program instructions and data include all forms of non-volatile memory, including, for example, semiconductor memory devices such as EPROMs, EEPROMs, and flash memory devices, as well as magnetic disks such as internal hard disks and removable disks, and magneto-optical disks. Any of the above may be complemented by or incorporated into a specially designed ASIC (Application-Specific Integrated Circuit). A network adapter 1050 manages access to the network 1060. The client computer may also include haptic devices 1090 such as a cursor control device and a keyboard. A cursor control device is used in a client computer to allow the user to selectively position the cursor at any desired location on the display 1080. Furthermore, the cursor control device allows the user to select various commands and input control signals. The cursor control device includes a number of signal generating devices for input control signals to the system. Typically, the cursor control device may be a mouse, with the mouse buttons used to generate signals. Alternatively or additionally, the client computer system may include a pressure-sensitive pad and/or pressure-sensitive screen.

[0058] The computer program may include instructions executable by the computer, which include means for causing the system to perform any method of the present invention. The program may be recordable on any data storage medium, including the system's memory. The program may be implemented, for example, in a digital electronic circuit, or in computer hardware, firmware, software, or a combination thereof. The program may be implemented as a device, for example, as a product tangibly embodied in a machine-readable storage device for execution by a programmable processor. The steps of the method may be performed by a programmable processor that executes a program of instructions for performing the functions of the method by performing operations on input data to produce an output. Thus, the processor may be programmable and may be coupled to receive data and instructions from a data storage system, at least one input device, and at least one output device, and to transmit data and instructions to them. The application program may be implemented in a high-level procedural or object-oriented programming language, or in assembly language or machine language as needed. In either case, the language may be a compiled or interpreted language. The program may be a complete installation program or an update program. When the program is applied to a system, in either case, instructions for performing the method are provided. Alternatively, the computer program may be stored and executed on a server in a cloud computing environment, which communicates with one or more clients over a network. In such a case, the processing unit executes the instructions contained in the program, thereby causing this method to be executed on the cloud computing environment.

[0059] This method belongs to the field of linear recursive queries, a subfield of recursive queries. Linear recursive queries use specific strategies to control the computational cost of recursive queries. These strategies may employ (1) a syntax for recursion and/or (2) an algorithm (e.g., including operators) to execute that syntax. Points (1) and (2) may be used in combination or individually to control the computational cost of the recursive query.

[0060] An example of a linear recursive query is provided below.

[0061] Starting with the algorithm (2) for executing recursive queries: a linear recursive procedure relies only on the data newly generated during execution, so data need not be considered again once it has been used to generate new data. For illustrative purposes, an example of a linear recursive procedure is discussed here, but it will be understood that linear recursive procedures are not limited to this example. If a user wants to obtain the smallest components of a machine, the user may employ an algorithm that starts with the machine and examines the components listed in its configuration. Any component of the original machine is either one of the components just found or a subcomponent of one of them. Therefore, since the user is only interested in the smallest components, only the newly found components are examined. If the examination reveals that a component cannot be disassembled further, such as a screw, the user adds it to the list of components. Otherwise, the user disassembles the newly found component in turn. In this way, the user needs to consider each found component exactly once. This freedom to "forget" data during execution is very advantageous for any search operations that occur during the execution of the algorithm.
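The machine example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bill of materials COMPOSITION, the part names, and the function smallest_components are hypothetical and do not come from the disclosure. The `seen` set is an added assumption so that a part shared by several assemblies is examined only once.

```python
from collections import deque

# Hypothetical bill of materials: each assembly maps to its direct components.
# Parts that do not appear as keys (e.g. "screw") cannot be disassembled further.
COMPOSITION = {
    "machine": ["engine", "frame"],
    "engine": ["piston", "screw"],
    "frame": ["beam", "screw"],
    "piston": ["ring", "rod"],
}


def smallest_components(root, composition):
    """Return the atomic parts of `root`, examining each found component once."""
    atomic = set()
    seen = {root}             # assumption: deduplicates parts shared by assemblies
    frontier = deque([root])  # holds only the newly found components
    while frontier:
        part = frontier.popleft()  # once examined, `part` is never revisited
        children = composition.get(part, [])
        if not children:
            atomic.add(part)       # cannot be disassembled further
            continue
        for child in children:
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return atomic
```

Calling smallest_components("machine", COMPOSITION) walks the hierarchy level by level, and the work queue never contains anything other than components found in the latest disassembly steps.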

[0062] Moving on to the syntax for recursion (1), a recursive query is linear if, within the context of a particular language, any clause refers to the output of at most one sub-clause. For example, in Datalog, a Datalog rule defines an output table in terms of input and output tables (for the sake of explanation within the context of Datalog, the input is assumed to be a table and not a graph). A Datalog program is a finite list of such rules and can be viewed as a query that outputs an output table. For example, the following program [Table 5] consists of two rules, A and B, which together define an output table, Reachable, with two columns, based on an input table, Connected, with two columns. Rule B states that if a pair (x, y) exists in the input table, it must be added to the output table. Rule A states that if a pair (z, y) exists in the input table and another pair (x, z) already exists in the output table, then (x, y) must be added to the output table. Such a program is linear by the definition above. Indeed, every output table (in this case, only Reachable) is defined using only one occurrence of an output table (again, in this case, only Reachable). Linear recursion in Datalog is well understood because Datalog is a simple language. In the case of a Datalog implementation, the corresponding algorithm is a further specialization of the semi-naive algorithm, which uses the syntax restriction to justify "forgetting" data during execution, as previously mentioned for algorithm (2).
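As an illustration of how the syntax restriction justifies "forgetting", the following Python sketch evaluates rules A and B in a semi-naive fashion. The rules in the docstring are a paraphrase of the prose above, not a reproduction of Table 5, and the function name reachable and the representation of tables as sets of pairs are assumptions made for this sketch.

```python
def reachable(connected):
    """Semi-naive evaluation of the two rules, as paraphrased from the prose.

    Rule B: Reachable(x, y) :- Connected(x, y).
    Rule A: Reachable(x, y) :- Reachable(x, z), Connected(z, y).
    """
    # Index the input table by its first column for the join in rule A.
    successors = {}
    for z, y in connected:
        successors.setdefault(z, set()).add(y)

    total = set(connected)  # rule B: every input pair is an output pair
    delta = set(connected)  # pairs derived in the latest round only
    while delta:
        # Rule A mentions Reachable once, so it is applied to `delta` alone:
        # older pairs cannot produce anything new and can be "forgotten".
        derived = {(x, y) for x, z in delta for y in successors.get(z, ())}
        delta = derived - total
        total |= delta
    return total
```

For the input {("a", "b"), ("b", "c"), ("c", "d")}, the first round adds ("a", "c") and ("b", "d"), the second adds ("a", "d"), and the third derives nothing new, so the loop stops.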

[0063] Examples of expansion queries are provided below.

[0064] Data models often define relationships that can be hierarchical or recursive. For example, if a data model exposes a relationship such as "A is a component of B," it is often the case that B itself is a component of some C. Graph databases, such as RDF databases, are very well suited to such data models.

[0065] Given a data model that includes such recursive relationships, a recurring need for the following query pattern is observed:
- Start from any element within the model that is part of such a hierarchy (hereinafter referred to as the root).
- Query some data on the root, including some of the root's sub-elements in that hierarchy.
- For each of those sub-elements, query the same data along with its own sub-elements.
- Continue until there are no more sub-elements, until a certain condition is met, or until a certain depth is reached.

[0066] The data retrieved for each element (excluding the root) includes its parent element. This information is necessary for the application to reconstruct the entire hierarchy.

[0067] Such query patterns have several important characteristics:
- The links between elements and sub-elements can be complex. For example, possible representations of a hierarchy within an RDF dataset include: [Table 6] This may involve two triples. Expansion queries on such a dataset must be able to query such complex relationships between elements.
- Expansion queries can involve various types of filtering at any step of the recursion.
- An expansion query can query any data on each traversed element.

[0068] These characteristics make it impossible to represent common expansion queries in, for example, standard SPARQL 1.1 queries or queries compatible with standard SPARQL 1.1 queries. Instead, all current implementations known in the art require executing multiple queries, resulting in significant performance overhead. Furthermore, some users manually execute SPARQL queries recursively on a hierarchical model in cases where they require the full expressiveness of SPARQL to define a single step of recursion.

[0069] As an example, consider an RDF dataset that includes companies that may have purchased products from each other in the past. If company A purchased from company B on a certain date, the model will include a "sales event" with the following relationship: [Table 7]

[0070] Given a company X, the following SPARQL query can be used to find other companies that have sold something to company X since 2020. [Table 8]

[0071] An example of a recursive query is as follows: find all companies Y that have purchased something from company X since 2020, then all companies Z that have purchased something from any company Y since 2020, and so on. Such a recursion over the query "SELECT ?buyer WHERE{ ...}" cannot be expressed in standard SPARQL because the query contains a FILTER.

[0072] The present invention extends the SPARQL syntax and semantics of the W3C standard by allowing the encapsulation of the central component of a Group Graph Pattern (GGP) into a “loop” that uses the output of the GGP as a parameter set for the next iteration. Quoting from Section 5 of the SPARQL specification (https://www.w3.org/TR/sparql11-query), “SPARQL is based on graph pattern matching. [...] The outermost graph pattern in a query is called the query pattern, which is grammatically identified by GroupGraphPattern [...].”

[0073] In practice, a GroupGraphPattern is represented by a block enclosed in curly braces, which can contain any pattern that SPARQL allows in a WHERE clause.

[0074] For illustrative purposes and as an example, refer to Figure 3 and consider the following graph database. The following is an example of a group graph pattern. [Table 9]

[0075] When executed, this group graph pattern yields a set of solutions, each solution assigning a value to a variable (the variable is said to be "bound" to this value), and solutions are allowed to appear multiple times within the set. In this example, referring to Figure 3, the resulting set of solutions will contain the following two solutions: [Table 10]

[0076] Here, we present an example of implementing the method disclosed herein. The following example refers to the data in the database shown in Figure 3.

[0077] This implementation example features the following linearly recursive query in the context of the SPARQL query language. [Table 11]

[0078] In this example, there are two group graph patterns: one following the "FROM" keyword and the other following the "ON" keyword. Both of these group graph patterns are standard SPARQL patterns, and their behavior is defined by the SPARQL standard.

[0079] The following explains how this example works.

[0080] The group graph pattern following the FROM keyword defines the following set of solutions. { ( ?seller -><Greenbushes_Lithium_Operations> )}

[0081] As with any solution mapping in SPARQL, this solution mapping can be written as a VALUES clause in SPARQL, i.e., VALUES ?seller { <Greenbushes_Lithium_Operations> }. The group graph pattern that appears after the ON keyword is a standard SPARQL group graph pattern and can be queried by any query engine. This pattern will be executed recursively. First, it is executed as if it were concatenated with the previously defined VALUES clause (formally, this corresponds to a JOIN operation). [Table 12]

[0082] This results in a new result set with the following two results: [Table 13]

[0083] The WITH clause defines how recursion occurs. In this case, a new set of values for the ?seller variable is extracted from the ?buyer variable of the previous result set. In practice, this means a new VALUES clause is extracted, where the values are taken from the ?buyer variable of the previous set: VALUES ?seller { <catl> <LG_Energy_Solution_Ltd> }. Again, the group graph pattern after the ON keyword is queried as if it were concatenated with this latest VALUES clause, giving a new result set, i.e., a solution set.

[0084] This operation is repeated until no more results are found in the result set, or until a certain depth is reached.

[0085] By definition, the result set corresponding to executing this entire recursive clause (i.e., the final solution set) is the union of all result sets obtained by executing the "ON" group graph pattern.
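The walk-through of paragraphs [0080] to [0085] can be mimicked on a small in-memory table. The following Python sketch is not the SPARQL engine of the disclosure: the table SALES is a hypothetical stand-in for the data of Figure 3 (only the two sales by Greenbushes_Lithium_Operations are taken from the text, the remaining rows are invented), and the year filter in on_pattern is an assumption modelled on the "since 2020" condition of paragraph [0070].

```python
# Hypothetical stand-in for Figure 3: (seller, buyer, year) sales events.
# Only the first two rows reflect the text; the other rows are invented.
SALES = [
    ("Greenbushes_Lithium_Operations", "catl", 2021),
    ("Greenbushes_Lithium_Operations", "LG_Energy_Solution_Ltd", 2022),
    ("catl", "CarMaker_A", 2023),                    # invented
    ("LG_Energy_Solution_Ltd", "CarMaker_B", 2019),  # invented, filtered out
]


def on_pattern(sellers, since=2020):
    """Stand-in for the ON pattern joined with `VALUES ?seller {...}`."""
    return [
        {"seller": s, "buyer": b}
        for s, b, year in SALES
        if s in sellers and year >= since
    ]


def recurse(initial_sellers, max_depth=10):
    """Union of all ON result sets; stops on an empty set or at max_depth."""
    results = []
    sellers = set(initial_sellers)  # FROM: the initial VALUES clause
    for _ in range(max_depth):
        step = on_pattern(sellers)
        if not step:                # no more results: recursion ends
            break
        results.extend(step)        # final result is the union of all steps
        sellers = {row["buyer"] for row in step}  # WITH ?buyer AS ?seller
    return results
```

With recurse(["Greenbushes_Lithium_Operations"]), the first step binds ?buyer to catl and LG_Energy_Solution_Ltd, as in paragraph [0083]. The later steps depend entirely on the invented rows. The max_depth bound corresponds to the depth limit of paragraph [0084] and also keeps the loop finite if the data contains a cycle.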

[0086] Next, we will discuss examples of syntactic extensions.

[0087] Syntax extensions are achieved by adding one or more definitions to the grammar. In the context of SPARQL, the grammar that should be modified is as follows: [Table 14]

[0088] Therefore, the syntax extension involves adding a new GGP (Group Graph Pattern) named RecursePattern, as follows: [Table 15]

[0089] Here, RecursePattern is defined as follows: [Table 16]

[0090] It should be noted that the specific names of the clauses (e.g., "RECURSE," "FROM," "WITH," "AS," and "ON") and their order may differ in other implementations of the method of this disclosure.

[0091] Next, we will discuss examples of semantic extensions.

[0092] The semantics of SPARQL are defined in the specification (https://www.w3.org/TR/sparql11-query) and are introduced by the following statement: "The result of executing a SPARQL query is defined by a series of steps, starting with the SPARQL query as a string, then (a) converting that string into an abstract syntax form, and then (b) converting that abstract syntax into a SPARQL abstract query containing SPARQL algebra operators. This abstract query is then (c) evaluated on an RDF dataset." A detailed explanation of the three steps (a), (b), and (c) is given below.

[0093] (a) Creating an abstract syntactic form means transforming a string of symbols into a tree structure that explicitly shows the syntactic relationships between the symbols. Thus, it becomes clear which symbols form a "word" and which combinations of words form a "phrase".

[0094] (b) A SPARQL query takes a graph as input and produces a set of solutions as output. The effect of a particular clause on these solutions is determined by the SPARQL algebra. The algebra consists of operators whose compositionality is mathematically defined. Each such operator is accompanied by an evaluation specification that determines the role of the operator in the next phase (c). In this evaluation specification, the operator takes a tuple of solution sets and a graph as input and outputs another set of solutions. The evaluation of some operators, such as certain types of filters, takes only a set of solutions as an argument and produces another set of solutions. The step from (a) to (b) involves sorting, merging, and uniformizing operations when two clauses in SPARQL syntax represent input and/or output in different ways. Thus, the SPARQL algebra, abstracted from its evaluation, is a second syntactic layer that can be directly interpreted by constructing an evaluation specification.

[0095] Therefore, in one example of an implementation of the method disclosed herein, the following new operator is added to the SPARQL algebra:
Recurse: Pattern x Pattern x (Var x Expression) List -> Pattern

[0096] Therefore, the name of the new operator in this implementation example is Recurse. The above expression describes the configuration in which the Recurse operator can be introduced: namely, in a configuration in which the operation is a parameter in another operation that takes Pattern as a parameter. Furthermore, the Recurse operator can only accept parameters of the form (patternFrom, patternOn, ((var_0, expr_0), (var_1, expr_1), ..., (var_n, expr_n))).

[0097] (c) Finally, the evaluation of the Recurse operator is specified. Generally, the result of the Recurse operator is used by other operators. It should be noted that the overall evaluation result of a SPARQL query should match the result of execution by a SPARQL language implementation for the same query. SPARQL language implementations are discussed below.

[0098] From the signature of the Recurse operator, the evaluation of the Recurse clause requires patternProducer_from, patternProducer_on, a list of pairs (var_0, expr_0), ..., (var_n, expr_n), and an arbitrary graph. The evaluation of the Recurse clause can use all of these elements. The evaluation first evaluates patternProducer_from on the graph and then evaluates patternProducer_on on the graph. The respective results are combined using another operator, such as the SPARQL JOIN operator, to obtain an intermediate result pResult_0. Next, the parameter values of each expr_i are extracted from this result, where expr_i is the expression in the pair (var_i, expr_i). A new solution set is created using the results of evaluating each expr_i on the extracted solution set. This new solution set is then joined (i.e., combined) again with the result of patternProducer_on. This process of extracting, evaluating the expressions, and combining with the results of patternProducer_on is repeated until the new set of solutions is empty and/or until a predetermined number of recursive steps is reached.

[0099] An example implementation of the Recurse operator is discussed in detail below.

[0100] For query execution, the algorithm and the conversion from user input to an Intermediate Representation (IR) are extended by a recursive mechanism. This IR is inspired by the fact that some query engines can generate compiled code, as the Low Level Virtual Machine (LLVM) does (see, for example, https://en.wikipedia.org/wiki/LLVM#Intermediate_representation). Code generation is outside the scope of this method, but, like vectorization, it stems from the same source of inspiration, namely single-node, main-memory optimization. Therefore, the mechanism reuses many of the optimizations present in the system.

[0101] Returning to each phase (a), (b), and (c), the following explains how the implementation ensures a conformant solution.

[0102] Regarding (a), the SPARQL parser in the production environment has been extended to include the ability to parse according to the syntax extension.

[0103] Regarding (b), the implementation adheres to the spirit of the IR described above. The resulting algebraic expression is further processed to become a directed graph of IR operators called nodes, which are traversed in the specified direction. Importantly, the combination (or union) of a graph pattern A and a basic graph pattern B is transformed into a sequence of nodes in which the nodes of A are evaluated first, and the result is used to optimize the evaluation of the node corresponding to B. For the Recurse clause, the current implementation creates nodes for the group graph pattern ggp_from that appears after "FROM" and for the group graph pattern ggp_on that appears after "ON". In addition, a set of three auxiliary nodes is created: R_gate, R_begin, and R_end. The first, R_gate, appears before all nodes of ggp_from. The second, R_begin, is inserted between the nodes of ggp_from and the nodes of ggp_on. The last, R_end, is inserted after all nodes of ggp_on and contains a representation of the information given in the form "expression AS variable".

[0104] Finally, derived by (c), these operators do the following: (i) The R_gate node receives the output of the previous node in IR, passes it to the next node, and also passes it to R_begin.

[0105] Next, the nodes generated for ggp_from are executed. Any path through these nodes ends with an R_begin node.

[0106] (ii) The R_begin node receives the input variables defined by ggp_from and the input variables passed to it by R_gate, combines them, and passes them to the subsequent node. This subsequent node is part of the subgraph defined by ggp_on.

[0107] After this, the node corresponding to ggp_on is traversed normally until it reaches the R_end node.

[0108] (iii) The R_end node does two things with respect to its input. First, it passes the input as an output to any subsequent node. Second, it transforms the input according to each clause of "Expression AS Variable" and feeds it back to R_begin.
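The roles of the three auxiliary nodes can be illustrated with a small Python sketch over lists of solution mappings (dictionaries). This is a simplified model and not the IR of the disclosure: the function run_recurse, the representation of ggp_from and ggp_on as callables, the toy EDGES data, and the choice to re-join the fed-back solutions with the R_gate input on every pass are all assumptions made for illustration.

```python
def join(left, right):
    """SPARQL-style join: merge every pair of compatible solution mappings."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


def run_recurse(upstream, ggp_from, ggp_on, bindings, max_depth=10):
    """Model of R_gate, R_begin and R_end (hypothetical signature).

    ggp_from, ggp_on: callables mapping a list of solutions to a list of solutions.
    bindings: (variable, expression) pairs; expression maps a solution to a value.
    """
    emitted = []
    # R_gate: forward the upstream solutions to the ggp_from nodes and to R_begin.
    gate_out = list(upstream)
    from_out = ggp_from(gate_out)
    # R_begin, first pass: combine the ggp_from output with the R_gate input.
    current = join(gate_out, from_out)
    for _ in range(max_depth):
        on_out = ggp_on(current)  # traverse the nodes of ggp_on
        if not on_out:
            break
        # R_end, first task: pass the input on to any subsequent node.
        emitted.extend(on_out)
        # R_end, second task: apply each "expression AS variable" and feed back.
        feedback = [{var: expr(s) for var, expr in bindings} for s in on_out]
        # R_begin, later passes (assumption): combine feedback with the R_gate input.
        current = join(gate_out, feedback)
    return emitted


EDGES = [("a", "b"), ("b", "c")]  # hypothetical (seller, buyer) pairs


def ggp_from(_solutions):
    return [{"seller": "a"}]


def ggp_on(solutions):
    return join(solutions, [{"seller": s, "buyer": b} for s, b in EDGES])
```

A call such as run_recurse([{}], ggp_from, ggp_on, [("seller", lambda s: s["buyer"])]) starts from a single empty upstream solution, which is the neutral element of the join, and emits one solution per traversed edge.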

Claims

1. A computer implementation method for executing a linearly recursive query on an RDF graph database using a query engine, the method including:
a step (S10) of obtaining a first part (FROM) of the query, which defines as input one or more initial conditions for at least one query element in a second part of the query on the RDF graph database; and
a step (S20) of obtaining at least two clauses that together specify that the second part is executed recursively,
wherein the two clauses are:
-- a first clause (WITH) defining how the output of an execution of the second part of the query is used as input for the next execution of the second part of the query; and
-- a second clause (ON) defining the second part of the query, the second clause (ON) including a query element that describes how the output of the execution of the second part is queried using (i) the input of the one or more initial conditions for the first execution of the second part of the query, or (ii) the output of the previous execution of the second part of the query for the next execution of the second part of the query.

2. The computer implementation method according to claim 1, wherein the first part (FROM) of the obtained query includes a first group graph pattern defining the one or more initial conditions, each initial condition includes one or more variables, each variable is assigned an initial value, and the first group graph pattern is a syntactic expression.

3. The computer implementation method according to claim 1 or 2, wherein the output of executing the second part of the query includes a set of solutions, each solution including a set of variables assigned to a respective set of values, and wherein the first clause (WITH):
extracts one or more values from the set of solutions according to one or more expressions; and
assigns the extracted one or more values to one or more variables, which are used as input for the next execution of the second part of the query.

4. The computer implementation method according to any one of claims 1 to 3, wherein the second clause (ON) includes a second group graph pattern which is a syntactic expression.

5. The computer implementation method according to any one of claims 1 to 4, wherein:
the first part of the query (FROM) is executed by a first operator (ggp_from) which assigns the one or more initial conditions;
the first clause (WITH) includes the one or more expressions for each of the one or more variables used as input for the next execution of the second part of the query; and/or
the second clause (ON) is executed by a second operator (ggp_on) which recursively executes the query elements and outputs a set of solutions for each execution.

6. The computer implementation method according to claim 5, further including:
a gate operator (R_gate) that transfers the set of previous solutions from a preceding execution to the first operator and to a start operator (R_begin);
the start operator (R_begin), which takes the set of one or more initial conditions and the set of previous solutions as input, combines the set of previous solutions and the set of one or more initial conditions, and transfers them to the second operator; and
a termination operator (R_end) that takes each set of solutions from the second operator as input, extracts one or more values from the previous set of solutions according to the one or more expressions for the one or more variables to be used as input for the next execution, and assigns the extracted one or more values to each of the one or more variables to be used as input for the next execution of the second part of the query.

7. The computer implementation method according to any one of claims 1 to 6, wherein the second part of the query is executed iteratively until the set of solutions is empty and/or until a predetermined number of recursive steps is reached.

8. The computer implementation method according to any one of claims 1 to 7, wherein the step of obtaining the first clause (WITH) further includes the step of assigning the one or more values extracted from the one or more expressions to each of the one or more variables using a clause (AS) combined with the first clause (WITH).

9. The computer implementation method according to claim 8, wherein the linearly recursive query is executed by a linear recursive pattern (RECURSE), the linear recursive pattern being of the following SPARQL grammar type: Table 1

10. The computer implementation method according to claim 9, wherein the linear recursive pattern further includes a fourth clause (FILTER) which defines a filter for the set of solutions based on one or more constraints, the linear recursive pattern being of the following SPARQL grammar type: Table 2

11. The computer implementation method according to any one of claims 1 to 10, wherein the query engine is a SPARQL query engine.

12. A computer program that, when executed by a processing unit, includes instructions that cause a computer to perform the method described in any one of claims 1 to 11.

13. A computer-readable storage medium on which the computer program described in claim 12 is recorded.

14. A system comprising a processor coupled to memory, wherein the computer program described in claim 12 is recorded in the memory.