Attribute graph structure construction method, data processing method, system and electronic device

By constructing an attribute graph structure in which first edges represent factual constraint relationships and second edges carry the attributes of intermediate objects, the application addresses the low efficiency of semantic reasoning in knowledge graph systems built on attribute graph databases. Semantic reasoning and storage are integrated, which improves the system's execution efficiency and availability.

CN122242665A · Pending · Publication Date: 2026-06-19 · YINYUAN GALAXY (SUZHOU) INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
YINYUAN GALAXY (SUZHOU) INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-09
Publication Date
2026-06-19


Abstract

This application provides a method for constructing an attribute graph structure, a data processing method, a system, and an electronic device. The method for constructing the attribute graph structure includes: acquiring an ontology semantic model and corresponding instance data; constructing an attribute graph structure based on the ontology semantic model and the instance data; the attribute graph structure includes multiple nodes; the multiple nodes correspond to multiple objects; the multiple nodes are connected by a first edge and a second edge; the first edge has a first-edge attribute representing the factual constraint relationship; the second edge connects nodes among the multiple nodes that are associated through an intermediate object; the second edge has a second-edge attribute representing the attribute of the intermediate object; generating compiled data based on the attribute graph structure and the semantic constraint relationship; the compiled data is used to perform semantic reasoning on the attribute graph structure.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to a method for constructing an attribute graph structure, a data processing method, a system, and an electronic device.

Background Technology

[0002] When an attribute graph database is used as the storage base, knowledge graph systems in related technologies struggle to perform ontology semantic reasoning efficiently, which limits their semantic reasoning capabilities in large-scale graph data scenarios.

Summary of the Invention

[0003] In view of this, this application provides a method for constructing an attribute graph structure, a data processing method, a system, and an electronic device to overcome the shortcomings of the prior art.

[0004] According to a first aspect of this application, a method for constructing an attribute graph structure is provided, comprising: obtaining an ontology semantic model and corresponding instance data; the ontology semantic model includes multiple objects, object attributes, and semantic constraint relationships between multiple objects; the instance data represents factual constraint relationships between multiple objects; constructing an attribute graph structure based on the ontology semantic model and instance data; the attribute graph structure includes multiple nodes; multiple nodes correspond to multiple objects; multiple nodes are connected by a first edge and a second edge; the first edge has a first edge attribute representing factual constraint relationships; the second edge connects nodes among the multiple nodes that are associated through an intermediate object; the second edge has a second edge attribute representing the attributes of the intermediate object; generating compiled data based on the attribute graph structure and semantic constraint relationships; the compiled data is used to perform semantic reasoning on the attribute graph structure; the compiled data includes at least first data, the first data representing the mapping relationship between semantic derivation conditions obtained from semantic constraint relationships and corresponding derivation results.

[0005] The second aspect of this application provides a data processing method, comprising: obtaining a query request; performing semantic reasoning on the query request based on the attribute graph structure constructed by the above method and the compiled data corresponding to the attribute graph structure, and generating a reasoning result.

[0006] The third aspect of this application provides a system for constructing an attribute graph structure, comprising: a data acquisition module for acquiring an ontology semantic model and corresponding instance data; the semantic model includes multiple objects, object attributes, and semantic constraint relationships between multiple objects; the instance data represents factual constraint relationships between multiple objects; a structure construction module for constructing an attribute graph structure based on the ontology semantic model and instance data; the attribute graph structure includes multiple nodes; multiple nodes correspond to multiple objects; multiple nodes are connected by a first edge and a second edge; the first edge has a first-edge attribute representing factual constraint relationships; the second edge connects nodes among the multiple nodes that are associated through an intermediate object; the second edge has a second-edge attribute representing the attributes of the intermediate object; and a compilation generation module for generating compiled data based on the attribute graph structure and semantic constraint relationships; the compiled data is used to perform semantic reasoning on the attribute graph structure; the compiled data includes at least first data, which represents the mapping relationship between semantic derivation conditions obtained from semantic constraint relationships and the corresponding derivation results.

[0007] A fourth aspect of this application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the methods for constructing an attribute graph structure described above.

[0008] In the technical solution of this application, an ontology semantic model containing multiple objects, object attributes, and semantic constraint relationships is first obtained, together with instance data representing factual constraint relationships. An attribute graph structure adapted to the storage characteristics of attribute graphs is then constructed from the ontology semantic model and the instance data. Factual constraint relationships are represented directly by first edges; nodes that would otherwise be associated only through an intermediate object are connected directly by second edges, and the attributes of the intermediate object become attributes of the second edge. This deeply integrates the semantic model with the attribute graph storage structure. On this basis, compiled data for performing semantic reasoning is generated from the attribute graph structure and the semantic constraint relationships. The compiled data contains first data representing the mapping relationship between semantic derivation conditions and the corresponding derivation results, so semantic reasoning can be executed directly on the attribute graph structure without relying on an external reasoning engine or on data export and transformation. This addresses the difficulty, in related technologies, of performing ontology semantic reasoning efficiently on attribute graph databases, integrates the reasoning system with the storage system, and improves the execution efficiency and system availability of semantic reasoning in large-scale graph data scenarios.

[0009] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this application, nor is it intended to limit the scope of this application. Other features of this application will become readily apparent from the following description.

Description of the Drawings

[0010] The above and other objects, features and advantages of this application will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:
Figure 1 schematically depicts an application scenario of the method for constructing an attribute graph structure and the data processing method according to embodiments of this application;
Figure 2 is a flowchart illustrating a method for constructing an attribute graph structure according to an embodiment of this application;
Figure 3 is a flowchart illustrating a data processing method provided in an embodiment of this application;
Figure 4 is a schematic diagram of a system for constructing an attribute graph structure according to an embodiment of the present invention;
Figure 5 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0011] The embodiments of this application will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of this application. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of this application for ease of explanation. However, it will be apparent that one or more embodiments may be implemented without these specific details. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.

[0012] Figure 1 schematically depicts an application scenario of the method for constructing an attribute graph structure and the data processing method according to embodiments of this application. As shown in Figure 1, application scenario 100 according to an embodiment of this application may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables. For example, a user can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 through the network 104 to receive or send information.

[0013] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be electronic devices such as smartphones, wearable devices, personal computers, intelligent voice interaction devices, smart home appliances, intelligent vehicles, in-vehicle terminals, aircraft, unmanned vending terminals, and extended reality devices. Extended reality devices can include virtual reality devices, augmented reality devices, and mixed reality devices. A client application for the target application can be installed and run on the terminal device. This target application can include, but is not limited to, knowledge graph applications, intelligent search applications, intelligent question-and-answer applications, data analysis applications, enterprise information management applications, financial risk control applications, healthcare applications, web browser applications, instant messaging tools, and social platform software (these are just examples). Furthermore, this application embodiment does not limit the form of the target application, including but not limited to applications, mini-programs, etc., installed on the terminal device, and can also be in web page form.

[0014] Server 105 can be a server providing various services, such as a backend management server supporting query requests sent by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can obtain the ontology semantic model and corresponding instance data, construct an attribute graph structure based on the ontology semantic model and instance data, generate compiled data based on the attribute graph structure and semantic constraints, and perform semantic reasoning and other processing on the received query requests and other data. It then feeds back the processing results (such as reasoning results generated based on the query request, semantic association information, or derived implicit knowledge) to the terminal devices. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services such as cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data. The server can be the backend server of the aforementioned target application, used to provide backend services to the clients of the target application.

[0015] It should be noted that the method for constructing the attribute graph structure and the data processing method provided in the embodiments of this application can generally be executed by the server 105 and/or the terminal devices 101-103. Accordingly, the apparatus for constructing the attribute graph structure and the data processing apparatus provided in the embodiments of this application can generally be disposed in the server 105 and/or the terminal devices 101-103.

[0016] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0017] Figure 2 shows a flowchart illustrating a method for constructing an attribute graph structure according to an embodiment of this application.

[0018] As shown in Figure 2, the method for constructing the attribute graph structure includes steps S201 to S203.

[0019] Step S201: Obtain the ontology semantic model and the corresponding instance data; the ontology semantic model includes multiple objects, object attributes, and semantic constraint relationships between multiple objects; the instance data represents the factual constraint relationships between multiple objects;
Step S202: Construct an attribute graph structure based on the ontology semantic model and instance data; the attribute graph structure includes multiple nodes; multiple nodes correspond to multiple objects; multiple nodes are connected by a first edge and a second edge; the first edge has a first edge attribute representing factual constraint relationships; the second edge connects the nodes among the multiple nodes that are associated through an intermediate object; the second edge has a second edge attribute representing the attributes of the intermediate object;
Step S203: Generate compiled data based on the attribute graph structure and semantic constraint relationship; the compiled data is used to perform semantic reasoning on the attribute graph structure; the compiled data includes at least first data, which represents the mapping relationship between the semantic derivation conditions obtained by transforming the semantic constraint relationship and the corresponding derivation results.

[0020] In step S201, the ontology semantic model refers to a semantic framework that formally describes domain knowledge. It can be understood as a structured expression of the conceptual system and their interrelationships in a specific business domain, used to define knowledge rules and semantic constraints within the business domain.

[0021] Optionally, the ontology semantic model includes multiple objects, object attributes, and semantic constraint relationships between multiple objects.

[0022] In this context, the objects in the ontology semantic model refer to conceptual entities or categories within the business domain, used to define a set of entities with the same characteristics. Objects can be concrete entity types or abstract conceptual categories.

[0023] An object's attributes refer to the data fields that describe the object's characteristics and carry specific information about it. Attributes can include data attributes and object attributes. The value of a data attribute is a specific number or string, while the value of an object attribute is another object.

[0024] Similarly, semantic constraints between multiple objects refer to the rules or conditions that different objects should satisfy at the semantic level. These can be understood as logical deduction relationships and constraints between objects, used to define the hierarchical structure, attribute characteristics, and reasoning rules between objects. Semantic constraints include, but are not limited to, class inheritance relationships, attribute transitivity, symmetry, and mutual exclusion.

[0025] For example, in the ontology semantic model of enterprise management, "employee" and "personnel" are two objects. Employees have attributes such as employee ID and department, while personnel have attributes such as name and age. A semantic constraint relationship exists between employees and personnel, which can be represented as employees being a subclass of personnel, meaning all employees are personnel. Furthermore, "employed in" is an object attribute with transitive semantic constraints; that is, if employee A is employed in department B, and department B belongs to company C, then it can be deduced that employee A is employed in company C.

[0026] Correspondingly, instance data in an ontology semantic model refers to specific data instances that conform to the definition of the ontology semantic model. It can be understood as the instantiation of objects in the ontology semantic model, used to describe specific facts in real-world business scenarios. Instance data represents the factual constraint relationships between multiple objects, that is, the actual associations that exist between specific object instances.

[0027] In this context, factual constraints between multiple objects refer to the actual associations between object instances recorded in the instance data. These can be understood as real-world entity relationships within the business system, describing which object instances have what kind of association. Factual constraints are the manifestation of semantic constraints at the specific instance level.

[0028] For example, in the instance data, Zhang San is an instance of the employee object, and the technical department is an instance of the department object. Zhang San's employment in the technical department is a factual constraint relationship, which describes the actual association established between the object instance Zhang San and the object instance technical department through the employment attribute.
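The ontology semantic model and instance data described above can be pictured with a minimal Python sketch. The class names `OntologyModel` and `Fact`, the field names, and the tuple encoding of constraints are all hypothetical choices made for illustration; the application does not prescribe a concrete file format or data structure.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyModel:
    # Object (class) name -> names of its data attributes.
    objects: dict[str, list[str]] = field(default_factory=dict)
    # Semantic constraints, e.g. ("subclass_of", "Employee", "Person").
    constraints: list[tuple[str, ...]] = field(default_factory=list)


@dataclass(frozen=True)
class Fact:
    # One factual constraint relationship: subject --relation--> target.
    subject: str
    relation: str
    target: str


# Ontology level: objects, their attributes, and semantic constraints.
model = OntologyModel(
    objects={
        "Person": ["name", "age"],
        "Employee": ["employee_id", "department"],
        "Department": ["name"],
    },
    constraints=[
        ("subclass_of", "Employee", "Person"),
        ("transitive", "employed_in"),
    ],
)

# Instance level: which object each instance belongs to, plus the facts.
instance_types = {"Zhang San": "Employee", "Technical Department": "Department"}
facts = [Fact("Zhang San", "employed_in", "Technical Department")]
```

Keeping the constraints separate from the facts mirrors the distinction drawn in the text between semantic constraint relationships (rules) and factual constraint relationships (recorded associations).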

[0029] In one feasible implementation, a defined ontology semantic model file and corresponding instance data file can be obtained from an external data source.

[0030] In another feasible implementation, the ontology semantic model and instance data can be obtained in real time by receiving user input or calling a third-party knowledge base interface.

[0031] In step S202, the attribute graph structure refers to a graph data model that organizes data using nodes and edges. It can be understood as converting objects and relations in the ontology semantic model into a data structure that can be stored and queried in a graph database to support efficient graph traversal and semantic reasoning operations.

[0032] Optionally, the attribute graph structure includes multiple nodes; multiple nodes correspond to multiple objects; multiple nodes are connected by a first edge and a second edge.

[0033] The first edge refers to the connection edge that directly reflects the factual constraints in the instance data. It can be understood as a direct mapping of the actual relationships between object instances in the attribute graph. The first edge has a first-edge attribute that represents the factual constraints, which records the specific content and related information of the factual constraints.

[0034] The first-edge attribute refers to the data field attached to the first edge, which can be understood as a detailed description of the factual constraint relationship. It is used to store attribute information such as the type of association, the establishment time, and the data source.

[0035] The second edge refers to the simplified connection edge established after flattening. It can be understood as simplifying the relationship that originally required multiple hops through intermediate objects into a direct connection, which reduces the complexity of graph traversal and improves query efficiency. The second edge connects nodes that are associated through intermediate objects among multiple nodes, and the second edge has the second edge attribute that represents the properties of the intermediate object.

[0036] In this context, intermediate objects refer to objects that serve only as connectors in the ontology semantic model. They can be understood as auxiliary nodes that do not have independent business meaning but are used to associate other objects, expressing complex multi-dimensional relationships in the original data model. Intermediate objects typically have a fixed number of incoming and outgoing edges, and their attributes can be transferred to the connecting edges.

[0037] The second-edge attribute refers to the data field attached to the second edge. It can be understood as attribute information transferred from the intermediate object, used to retain the original semantic information after the intermediate object is removed. The second-edge attribute contains all or part of the attribute values originally stored in the intermediate object.

[0038] For example, in a business management scenario, employee Zhang San is associated with company A through a job instance, which records attributes such as salary and start date. When constructing the attribute graph structure, the node corresponding to employee Zhang San is connected to the node corresponding to company A via a first edge, whose attributes record basic information about the employment relationship. Simultaneously, through flattening, a second edge is created between the node corresponding to employee Zhang San and the node corresponding to company A. The attributes of this second edge include the salary and start date information originally stored in the job instance, thus eliminating the node corresponding to the job instance as an intermediary object and simplifying the graph structure.

[0039] In one feasible implementation, all objects in the ontology semantic model can be mapped to nodes in the attribute graph structure first. Then, based on the factual constraints in the instance data, a first edge is established between the corresponding nodes. Next, the intermediate nodes in the attribute graph structure that meet the flattening conditions are identified, a second edge is established between the nodes before and after them, and the intermediate nodes are deleted.
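The build-then-flatten procedure in the preceding paragraph could look roughly like the sketch below. `PropertyGraph`, `Edge`, and `flatten` are hypothetical names, the flattening condition (exactly one incoming and one outgoing edge) is an assumption, and the salary and start-date values are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    kind: str   # "first" or "second"
    label: str
    attrs: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # id -> {"labels": set, "attrs": dict}
    edges: list = field(default_factory=list)

    def add_node(self, node_id, labels, attrs=None):
        self.nodes[node_id] = {"labels": set(labels), "attrs": dict(attrs or {})}

    def add_edge(self, source, target, kind, label, attrs=None):
        self.edges.append(Edge(source, target, kind, label, dict(attrs or {})))


def flatten(graph, intermediate_label, edge_label):
    """Replace each intermediate node with a second edge that carries
    the intermediate node's attributes."""
    candidates = [n for n, d in graph.nodes.items() if intermediate_label in d["labels"]]
    for node_id in candidates:
        incoming = [e for e in graph.edges if e.target == node_id]
        outgoing = [e for e in graph.edges if e.source == node_id]
        if len(incoming) != 1 or len(outgoing) != 1:
            continue  # does not meet the assumed flattening condition
        attrs = graph.nodes[node_id]["attrs"]
        graph.add_edge(incoming[0].source, outgoing[0].target, "second", edge_label, attrs)
        # Drop the edges touching the intermediate node, then the node itself.
        graph.edges = [e for e in graph.edges if node_id not in (e.source, e.target)]
        del graph.nodes[node_id]


g = PropertyGraph()
g.add_node("zhang_san", ["Employee"])
g.add_node("company_a", ["Company"])
g.add_node("job_1", ["Job"], {"salary": 20000, "start_date": "2020-01-01"})
g.add_edge("zhang_san", "job_1", "first", "holds")
g.add_edge("job_1", "company_a", "first", "at")
flatten(g, "Job", "employed_by")
```

After the call, the job node is gone and a single second edge from the employee node to the company node holds the salary and start date.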

[0040] In another feasible implementation, flattening can be performed synchronously during the mapping process, that is, when an intermediate object is identified, a second edge is directly established between its associated source object and target object, without creating the node corresponding to the intermediate object.
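A one-pass variant, where intermediate objects never become nodes, might be sketched as follows. The record layout (`type`, `id`, `source`, `target`, `attrs`) and the function name are assumptions made for this example only.

```python
def map_with_inline_flattening(records, intermediate_types):
    """Build nodes and edges in one pass; records of an intermediate
    type become second edges instead of nodes."""
    nodes, edges = {}, []
    for rec in records:
        if rec["type"] in intermediate_types:
            # No node is created: connect source and target directly
            # and move the intermediate object's attributes onto the edge.
            edges.append({
                "source": rec["source"],
                "target": rec["target"],
                "kind": "second",
                "attrs": dict(rec["attrs"]),
            })
        else:
            nodes[rec["id"]] = {"label": rec["type"], "attrs": dict(rec.get("attrs", {}))}
    return nodes, edges


records = [
    {"type": "Employee", "id": "zhang_san"},
    {"type": "Company", "id": "company_a"},
    {"type": "Job", "id": "job_1", "source": "zhang_san", "target": "company_a",
     "attrs": {"start_date": "2020-01-01"}},
]
nodes, edges = map_with_inline_flattening(records, {"Job"})
```

Compared with flattening after construction, this avoids creating and then deleting intermediate nodes, at the cost of needing to know the intermediate types before mapping starts.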

[0041] It should be noted that first edges and second edges can coexist in the same attribute graph structure. That is, some nodes are connected through first edges to preserve the original factual relationships, while other nodes are connected through second edges to simplify the graph structure. The choice between a first edge and a second edge depends on the specific business scenario and query requirements. A first edge is used for relationships that need to retain complete semantic information, while a second edge is used for multi-hop relationships that can be simplified. Furthermore, establishing a second edge does not affect the semantic integrity of the attribute graph structure, because all attribute information of the intermediate object has been transferred to the attributes of the second edge.

[0042] In step S203, the compiled data refers to a set of inference rules, executable on the attribute graph structure, obtained by converting the semantic constraint relationships in the ontology semantic model. It can be understood as structured data formed by parsing and reorganizing the semantic constraint relationships, which guides the attribute graph database in performing semantic inference operations and generating inference results.

[0043] Optionally, the compiled data may include first data, which refers to the core content of the semantic derivation rules. It can be understood as a rule mapping table that describes which conclusions can be derived from which preconditions. It is used to find data that meets specific semantic derivation conditions in the attribute graph structure and generate the corresponding derivation results.

[0044] The first data represents the mapping relationship between the semantic inference conditions obtained from the semantic constraint relationships and the corresponding inference results. The semantic inference conditions describe the node patterns, edge patterns and attribute conditions that need to be matched in the attribute graph structure. The inference results describe the new node labels, new edges or new attribute values that should be generated when the semantic inference conditions are met.

[0045] For example, in an enterprise management scenario, the ontology semantic model defines the semantic constraint that an employee is a subclass of a person. The first data generated based on this semantic constraint contains the following mapping: the semantic inference condition is that a node has the employee label, and the corresponding inference result is that the node should have the person label. In practical applications, when there is a node named Zhang San labeled as an employee in the attribute graph structure, according to the mapping relationship in the first data, it can be inferred that Zhang San should also have the person label, thus adding the person label to the node corresponding to Zhang San, achieving automatic inference of implicit knowledge.
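The subclass example above can be illustrated with a small sketch in which the "first data" is a plain dictionary from a condition (a node carries a given label) to a result (labels to add). The function names and the dictionary encoding are assumptions for illustration, not the compiled format used by the application.

```python
def compile_subclass_rules(constraints):
    """Turn ("subclass_of", child, parent) constraints into a
    condition -> result table: label `child` implies label `parent`."""
    table = {}
    for kind, *args in constraints:
        if kind == "subclass_of":
            child, parent = args
            table.setdefault(child, set()).add(parent)
    return table


def apply_label_rules(node_labels, table):
    """Add implied labels to every node until no rule yields a new label."""
    for labels in node_labels.values():
        pending = list(labels)
        while pending:
            for parent in table.get(pending.pop(), ()):
                if parent not in labels:
                    labels.add(parent)
                    pending.append(parent)  # the new label may imply more
    return node_labels


rules = compile_subclass_rules([("subclass_of", "Employee", "Person")])
labelled = apply_label_rules({"Zhang San": {"Employee"}}, rules)
```

Here the table is built once from the constraints, and inference only performs dictionary lookups against it.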

[0046] For example, the ontology semantic model defines the transitivity of the "employment" relationship. The corresponding first data contains the following mapping relationship: the semantic derivation condition is that there exists a node A connected to node B through an "employment" edge, and node B connected to node C through an "employment" edge. The corresponding derivation result is that an "employment" edge should be established between node A and node C. In practical applications, when there is a relationship in the attribute graph structure that employee Zhang San is employed by department X, and department X belongs to company Y, according to the mapping relationship in the first data, it can be deduced that Zhang San is employed by company Y, thus establishing a new "employment" edge between the nodes corresponding to Zhang San and company Y.
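The transitivity rule can be sketched in the same spirit. The code below assumes edges are stored as `(source, label, target)` triples and, following the derivation condition as stated, that both hops carry the same "employment" label; the function name is hypothetical.

```python
def apply_transitive_rule(edges, label):
    """Derive (a, label, c) whenever (a, label, b) and (b, label, c) exist,
    repeating until a fixed point is reached. Returns only the new edges."""
    derived = set(edges)
    changed = True
    while changed:
        changed = False
        same_label = [(s, t) for s, l, t in derived if l == label]
        for a, b in same_label:
            for b2, c in same_label:
                if b == b2 and a != c and (a, label, c) not in derived:
                    derived.add((a, label, c))
                    changed = True
    return derived - set(edges)


known = {
    ("Zhang San", "employment", "Department X"),
    ("Department X", "employment", "Company Y"),
}
new_edges = apply_transitive_rule(known, "employment")
```

The loop repeats because a newly derived edge can itself satisfy the condition for a longer chain. This naive version rescans all edges on every pass and is meant only to show the condition-to-result mapping, not an efficient evaluation strategy.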

[0047] It should be noted that the mapping relationships in the first data are pre-generated during the compilation phase based on the semantic constraint relationships in the ontology semantic model, rather than being constructed temporarily during query or inference execution. This pre-compilation can improve inference execution efficiency because the system does not need to repeatedly parse semantic constraint relationships at runtime, but directly calls the pre-compiled mapping relationships for inference operations.

[0048] Optionally, the first data can be organized using a table structure, tree structure, or graph structure, with each mapping relationship stored as an independent data unit, facilitating quick lookup and retrieval during inference execution.
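As one possible reading of the table organization, each mapping relationship can be kept as an independent record and indexed by the label or edge type it matches on. The `Rule` fields and `build_index` are hypothetical names used only for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    trigger: str    # label or edge type that the condition matches on
    condition: str  # readable form of the semantic derivation condition
    result: str     # readable form of the derivation result


def build_index(rules):
    """Group rules by trigger so that lookup during inference is a single
    dictionary access."""
    index = defaultdict(list)
    for rule in rules:
        index[rule.trigger].append(rule)
    return dict(index)


compiled = [
    Rule("Employee", "node has label Employee", "add label Person"),
    Rule("employment", "A-employment->B and B-employment->C", "add A-employment->C"),
]
index = build_index(compiled)
```

A tree or graph organization would serve the same purpose; the dictionary is simply the shortest form to write down.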

[0049] By adopting the technical solution of this application, the effective application of ontology semantic models in attribute graph storage environments can be realized. Specifically, by acquiring ontology semantic models and instance data, the knowledge framework and factual basis required for semantic reasoning are clarified; by constructing an attribute graph structure containing a first edge and a second edge, the original factual constraints in the instance data are preserved, and the graph structure complexity caused by intermediate objects is eliminated through flattening, reducing the number of graph traversal jumps and improving query efficiency; by generating compiled data containing semantic derivation conditions and derivation result mapping relationships, the semantic constraints in the ontology semantic model are converted into reasoning rules that can be directly executed in the attribute graph database, realizing a deep integration of semantic reasoning and attribute graph storage.

[0050] Furthermore, the technical solution of this application generates compiled data through pre-compilation, avoiding the computational overhead of repeatedly parsing semantic constraint relationships during inference execution and improving inference execution efficiency. At the same time, the construction process of the attribute graph structure takes into account both data integrity and structural simplicity. It preserves the original semantic information through the first edge and optimizes the graph structure through the second edge, making the attribute graph structure suitable for storing complex semantic relationships and facilitating efficient graph traversal and inference operations.

[0051] Furthermore, because semantic constraint relationships are transformed into an explicit mapping between semantic derivation conditions and derivation results, the automatic derivation of implicit knowledge has a clear rule basis. This facilitates the traceability and verification of subsequent reasoning results and improves the interpretability and maintainability of the system.

[0052] Based on the above embodiments, as an optional embodiment, step S202 may further include the following steps:
Step S301: Map objects in the ontology semantic model to nodes in the attribute graph structure, and map the attributes of objects to the node attributes of nodes;
Step S302: Based on the factual constraint relationships between multiple objects represented in the instance data, establish a first edge between the corresponding nodes, and use the information representing the specific content of the factual constraint relationship as the first edge attribute of the first edge;
Step S303: Identify multiple objects in the ontology semantic model that are indirectly related through intermediate objects. For multiple objects that are indirectly related through intermediate objects, establish a second edge between the corresponding nodes and use the attributes of the intermediate objects as the second edge attributes of the second edge;
Step S304: Construct an attribute graph structure based on the first and second edges.

[0053] In step S301, the mapping operation refers to the process of converting abstract concepts in the ontology semantic model into specific data elements in the attribute graph structure.

[0054] For example, in an enterprise management scenario, the ontology semantic model defines an employee object, which has attributes such as employee ID, name, and date of employment. During the mapping operation, a corresponding node is created for the employee object. This node is labeled as "employee," and its attributes include an employee ID field, a name field, and a date of employment field. For a specific employee instance, Zhang San, whose employee ID is E001, name is Zhang San, and date of employment is January 1, 2020, an employee type node is generated in the attribute graph structure. This node's employee ID attribute value is E001, its name attribute value is Zhang San, and its date of employment attribute value is January 1, 2020.

[0055] Optionally, during the mapping process, data format conversion can be performed based on the attribute type of the object. For example, date type attributes defined in the ontology semantic model can be converted to timestamp format supported by the attribute graph database, and enumeration type attributes can be converted to string type.

[0056] Optionally, the object hierarchy defined in the ontology semantic model can be expressed by adding multiple type labels to nodes. For example, if an employee object inherits from a personnel object, then both the employee and personnel labels can be added to the corresponding node in the attribute graph structure, thereby preserving the class hierarchy information of the objects.
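The mapping in step S301, together with the optional date conversion and multi-label handling, can be sketched as follows. This is a minimal illustration only: the `Node` class, the `PARENT_TYPE` table, and the helper names are assumptions made for the example, and the choice of a UTC epoch timestamp in seconds is one possible timestamp format, not one prescribed by the application.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


# Hypothetical class hierarchy: child type -> parent type.
PARENT_TYPE = {"employee": "personnel"}


def labels_for(object_type):
    """Return the object type plus all of its ancestor types as node labels."""
    labels = set()
    current = object_type
    while current is not None:
        labels.add(current)
        current = PARENT_TYPE.get(current)
    return labels


def to_property(value):
    """Convert a date to a UTC epoch timestamp in seconds; pass other values through."""
    if isinstance(value, date) and not isinstance(value, datetime):
        midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
        return int(midnight.timestamp())
    return value


def map_object(object_type, object_id, attributes):
    """Map one object instance to a node with labels and node attributes."""
    properties = {name: to_property(v) for name, v in attributes.items()}
    return Node(object_id, labels_for(object_type), properties)


zhang_san = map_object(
    "employee",
    "E001",
    {"employee_id": "E001", "name": "Zhang San", "date_of_employment": date(2020, 1, 1)},
)
print(sorted(zhang_san.labels), zhang_san.properties)
```

Keeping the ancestor labels on the node means a query for the personnel type also finds employee nodes without consulting the ontology at query time.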

[0057] In step S302, when there are multiple types of factual constraint relationships between the same object instances in the instance data, multiple first edges can be established between the corresponding nodes, and the edge type and first edge attributes of each first edge correspond to different association relationships.

[0058] In step S303, indirect association through an intermediate object refers to the situation where there is no direct association between two objects, but they are connected through a third object as a bridge. This can be understood as a multi-hop association mode between objects.

[0059] After the indirect relationships are determined, for each group of objects indirectly related through an intermediate object, a second edge is established between the corresponding nodes. The second edge bypasses the node corresponding to the intermediate object and directly connects the source object node and the target object node, thus simplifying the original two-hop path into a one-hop path. At the same time, the attributes of the intermediate object are stored as second-edge attributes of the second edge to preserve the original semantic information.

[0060] Optionally, after the second edge is established, the node corresponding to the intermediate object and its associated edges can be deleted, thereby further simplifying the attribute graph structure. Alternatively, the intermediate object node can be retained, allowing the first and second edges to coexist in the attribute graph structure to meet the needs of different query scenarios.

[0061] It should be noted that establishing the second edge does not alter the semantic content expressed by the attribute graph structure, because all attribute information of the intermediate object has been transferred to the second-edge attributes. The main function of the second edge is to optimize graph traversal performance by reducing the number of hops required for query operations, thereby improving query efficiency.

[0062] In step S304, the attribute graph structure contains both a first edge and a second edge. The first edge retains the original factual constraints in the instance data, while the second edge reflects the simplified relationships after flattening. Both types of edges coexist in the attribute graph structure, serving different application scenarios.

[0063] Optionally, during the construction of the attribute graph structure, distinguishing markers can be added to the first and second edges. For example, a specific edge attribute marker can be set for the second edge to indicate that it originates from the flattening process, which facilitates the identification and processing of different types of edges in subsequent operations.

[0064] Optionally, after the attribute graph structure is constructed, the attribute graph structure can be verified for integrity, including checking whether the necessary attributes of the nodes are complete, whether the connection relationships of the edges are correct, and whether the data types of the attribute values conform to the definition, so as to ensure that the constructed attribute graph structure meets the quality requirements.

[0065] By adopting the technical solution of this embodiment, the ontology semantic model and instance data can be transformed into a graph structure suitable for storage in an attribute graph database. Specifically, by mapping objects to nodes and object attributes to node attributes, the basic expression of the ontology semantic model in the attribute graph structure is realized; by establishing a first edge based on factual constraint relationships, the original association information in the instance data is preserved; by establishing a second edge for indirectly related objects and transferring the attributes of intermediate objects to the attributes of the second edge, the graph structure is flattened, reducing the complexity of graph traversal; by integrating the first and second edges to construct a complete attribute graph structure, a balance between data integrity and structural simplicity is achieved.

[0066] Based on the above embodiments, as an optional embodiment, step S303 may further include the following steps.

[0067] Step S401: Determine the node combinations in the attribute graph structure that satisfy at least a two-hop path structure; each node combination includes a source node, an intermediate node, and a target node; the source node is connected to the intermediate node through a first connecting edge, and the intermediate node is connected to the target node through a second connecting edge; the intermediate node corresponds to an intermediate object; Step S402: Determine whether the intermediate node meets the replacement condition; the replacement condition includes at least the following: the number of incoming edges and the number of outgoing edges of the intermediate node are preset values, and the attributes of the intermediate node are all scalar types; Step S403: For node combinations that meet the replacement condition, group the node combinations based on the identifier of the source node, the identifier of the target node, and the edge types of the first and second connecting edges to obtain target node combinations; multiple node combinations within the same target node combination have the same source node and the same target node; Step S404: For the multiple intermediate nodes within each target node combination, generate aggregated attribute values based on the attribute values of the multiple intermediate nodes and the corresponding data source credibility; Step S405: Establish a second edge between the source node and the target node corresponding to each target node combination, and use the aggregated attribute values as the second-edge attributes.

[0068] In this embodiment, objects indirectly related through intermediate objects are represented in the attribute graph structure as a path structure with at least two hops. To reduce graph traversal complexity and improve query efficiency, indirect associations that meet specific conditions are flattened: a second edge is established directly between the source node and the target node, bypassing the intermediate node, thus simplifying the two-hop path into a one-hop path. When the second edge is established, the attribute information of the intermediate object must not be lost; therefore, the attributes of the intermediate object are migrated to the second edge as edge attributes.

[0069] In step S401, the two-hop path structure refers to the path pattern in the attribute graph structure that connects three nodes through two consecutive edges. It can be understood as a connection relationship that starts from the source node, passes through an intermediate node, and reaches the target node. It is used to identify candidate structures that need to be flattened.

[0070] Here, a node combination refers to a set of three nodes that satisfy a two-hop path structure, which can be understood as a candidate unit for a flattening operation.

[0071] Here, the first connecting edge refers to the edge from the source node to the intermediate node, and the second connecting edge refers to the edge from the intermediate node to the target node. Together, they form the connection relationship of a two-hop path.

[0072] For example, in an enterprise management scenario, the node corresponding to employee Zhang San is connected to the node corresponding to job instance P001 via an edge representing the employment relationship. The node corresponding to job instance P001 is connected to the node corresponding to company A via an edge representing the affiliation relationship. This path forms a node combination, where the node corresponding to employee Zhang San is the source node, the node corresponding to job instance P001 is the intermediate node, the node corresponding to company A is the target node, the edge representing the employment relationship is the first connecting edge, and the edge representing the affiliation relationship is the second connecting edge. This node combination represents the indirect relationship between employee Zhang San and company A through job instance P001.

[0073] Optionally, when determining node combinations, all node sequences that satisfy the two-hop path structure can be retrieved from the attribute graph structure, and node combinations whose intermediate node types belong to a predefined set of intermediate object types can be selected.
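The retrieval of node combinations in step S401, including the optional filter on intermediate object types, can be sketched as follows. The `Edge` and `NodeCombination` types, the function name, and the edge-type strings `employment` and `affiliation` are hypothetical names chosen for this illustration; an attribute graph database would normally express the same search as a path query.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    dst: str
    edge_type: str


class NodeCombination(NamedTuple):
    source: str
    intermediate: str
    target: str
    first_edge_type: str
    second_edge_type: str


def find_node_combinations(edges, node_types, intermediate_types):
    """Return every source -> intermediate -> target path whose middle node
    has a type in the predefined set of intermediate object types."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge.src].append(edge)

    combinations = []
    for first in edges:
        if node_types.get(first.dst) not in intermediate_types:
            continue
        for second in outgoing[first.dst]:
            combinations.append(
                NodeCombination(first.src, first.dst, second.dst,
                                first.edge_type, second.edge_type)
            )
    return combinations


node_types = {"zhang_san": "employee", "P001": "job_instance", "company_a": "company"}
edges = [
    Edge("zhang_san", "P001", "employment"),
    Edge("P001", "company_a", "affiliation"),
]
combinations = find_node_combinations(edges, node_types, {"job_instance"})
print(combinations)
```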

[0074] In step S402, the replacement condition refers to the set of constraints used to determine whether an intermediate node can be safely eliminated, and is used to filter out node combinations suitable for flattening.

[0075] Here, the number of incoming edges refers to the number of edges pointing to the intermediate node, and the number of outgoing edges refers to the number of edges pointing out from the intermediate node. The replacement condition requires that the number of incoming edges and the number of outgoing edges of the intermediate node each equal a preset value, typically 1. This indicates that the intermediate node only serves as a connector within that node combination and does not have any additional relationships with other nodes. If the number of incoming or outgoing edges of an intermediate node is greater than 1, the intermediate node may be shared by multiple business relationships, and directly eliminating it would break those other relationships.

[0076] In this context, scalar types refer to attributes whose values are simple data types, such as numeric, string, date, and boolean types, which can be stored directly in edge attributes. This distinguishes them from reference-type or complex-structure-type attributes. The replacement condition requires that all attributes of the intermediate node be scalar types, so that the attribute information of the intermediate node can be completely migrated to the edge attributes of the second edge. If an intermediate node has a reference-type attribute (i.e., one whose value points to another node), this attribute cannot be directly expressed as an edge attribute, and flattening would lose this attribute information.

[0077] For example, the intermediate node corresponding to job instance P001 has 1 incoming edge, meaning it only has an edge indicating the employee Zhang San's employment relationship, and 1 outgoing edge, meaning it only has an edge indicating the company A's affiliation relationship. The intermediate node's attributes include a salary field with a value of 10,000 yuan (numeric type) and a job level field with a value of engineer (string type), both scalar types. Therefore, this intermediate node satisfies the replacement condition.
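The two checks that make up the replacement condition in step S402 can be sketched as follows. The function name, the tuple layout `(source, target, edge type)`, the `SCALAR_TYPES` tuple, and the extra nodes `li_si`, `wang_wu`, and `P009` are assumptions made for this example; the preset edge counts default to 1, the typical setting mentioned above.

```python
from datetime import date

# Value types treated as scalar in this sketch (an assumption).
SCALAR_TYPES = (bool, int, float, str, date)


def meets_replacement_condition(node_id, edges, properties, preset_in=1, preset_out=1):
    """Check that the edge counts equal the preset values and that
    every attribute value of the intermediate node is scalar."""
    in_count = sum(1 for _src, dst, _type in edges if dst == node_id)
    out_count = sum(1 for src, _dst, _type in edges if src == node_id)
    all_scalar = all(isinstance(v, SCALAR_TYPES) for v in properties.values())
    return in_count == preset_in and out_count == preset_out and all_scalar


edges = [
    ("zhang_san", "P001", "employment"),
    ("P001", "company_a", "affiliation"),
    ("li_si", "P009", "employment"),      # P009 is shared by two employees
    ("wang_wu", "P009", "employment"),
    ("P009", "company_a", "affiliation"),
]

# One incoming edge, one outgoing edge, scalar attributes only.
p001_ok = meets_replacement_condition("P001", edges, {"salary": 10000, "job_level": "engineer"})
# Two incoming edges: eliminating the node would break another relationship.
p009_ok = meets_replacement_condition("P009", edges, {"salary": 12000})
# A reference-type attribute cannot be migrated to an edge attribute.
ref_ok = meets_replacement_condition("P001", edges, {"manager": {"ref": "li_si"}})
print(p001_ok, p009_ok, ref_ok)
```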

[0078] Optionally, the replacement condition may also include a check of the ontology semantic constraints involving the intermediate node, such as requiring that the type of the intermediate node does not participate in association constraints involving third-party objects in the inference rules. If the type of the intermediate node is explicitly referenced by other semantic constraints and those constraints involve the association relationship between the intermediate node and a third-party node, then the intermediate node is determined not to meet the replacement condition.

[0079] It should be noted that the replacement conditions are set to ensure the safety of the flattening process and avoid damage to the semantic integrity of the attribute graph structure after eliminating intermediate nodes. By limiting the number of incoming and outgoing edges, it is ensured that intermediate nodes only serve as connections in a single path; by limiting the attribute types, it is ensured that attribute information can be transferred without loss.

[0080] In step S403, the grouping operation refers to the process of grouping the node combinations that meet the replacement condition according to a business key. It can be understood as identifying multiple paths in the attribute graph structure that represent the same business relationship.

[0081] In this context, the business key refers to a data combination used to uniquely identify a business relationship. It can be understood as a composite identifier consisting of the source node's identifier, the target node's identifier, and the edge type, used to distinguish different instances of a business relationship. Grouping based on the business key allows multiple paths established between the same pair of source and target nodes through different intermediate nodes to be grouped together.

[0082] Here, the target node combination refers to the set of node combinations obtained after grouping, which can be understood as the aggregation of all node combinations belonging to the same business relationship. Multiple node combinations within the same target node combination have the same source node and the same target node, but may have different intermediate nodes.

[0083] In real-world applications, there may be multiple indirect relationships between the same source and target objects, established through intermediate objects. These relationships may originate from different data sources or represent different time periods. By using grouping operations, these multiple paths can be identified as different expressions of the same business relationship.

[0084] For example, in an enterprise management scenario, employee Zhang San may have multiple employment relationship records with company A. Assume there are three node combinations: the intermediate node of the first combination is job instance P001 from the human resources system; the intermediate node of the second combination is job instance P002 from the OA system; and the intermediate node of the third combination is job instance P003 from an external data source. The source node of all three combinations is employee Zhang San, and the target node is company A. The first and second connecting edges are both of the employment relationship type. Through the grouping operation, these three node combinations are grouped into a single target node combination.

[0085] Optionally, when generating the business key, the edge types of the first and second connecting edges can be combined with the type of the intermediate node to form a unique composite edge type identifier, ensuring that different types of business relationships are not incorrectly grouped together. For example, the composite identifier can be formed by concatenating the first connecting edge type, the intermediate node type, and the second connecting edge type, and used as a component of the business key.
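Grouping by business key in step S403, with the optional composite edge type identifier, can be sketched as follows. The dictionary layout, the `|` separator, and the type name `job_instance` are hypothetical choices for this illustration; the three job instances are the ones from the example above.

```python
from collections import defaultdict


def business_key(combination):
    """Build (source id, target id, composite edge type) for one node combination."""
    composite_type = "|".join((
        combination["first_edge_type"],
        combination["intermediate_type"],
        combination["second_edge_type"],
    ))
    return (combination["source"], combination["target"], composite_type)


def group_by_business_key(combinations):
    """Collect the intermediate nodes of all combinations sharing a business key."""
    groups = defaultdict(list)
    for combination in combinations:
        groups[business_key(combination)].append(combination["intermediate"])
    return dict(groups)


def make_combination(intermediate):
    # Helper for the example only: Zhang San -> job instance -> company A.
    return {
        "source": "zhang_san",
        "intermediate": intermediate,
        "target": "company_a",
        "intermediate_type": "job_instance",
        "first_edge_type": "employment",
        "second_edge_type": "employment",
    }


groups = group_by_business_key([make_combination(p) for p in ("P001", "P002", "P003")])
print(groups)
```

Including the intermediate node type in the key keeps, for example, a contractor path and an employee path between the same two nodes in separate groups even when their edge types coincide.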

[0086] It should be noted that the purpose of grouping is to identify and integrate multiple redundant or segmented records with the same business relationship. Grouping can effectively handle the problem of duplicate records caused by importing data from multiple sources.

[0087] In step S404, the aggregated attribute value refers to the value obtained by uniformly processing the attribute values of the multiple intermediate nodes within a target node combination, and is used as the second-edge attribute. Generating it can be understood as resolving attribute conflicts and forming a consistent attribute expression.

[0088] Data source credibility refers to the quantitative assessment of the data quality and reliability of different data sources. Data source credibility can be set based on the type of data source, historical accuracy, or business priority.

[0089] During the generation of aggregated attribute values, for each attribute field within the target node combination, the values of all intermediate nodes for that attribute field and their corresponding data source credibility are collected. For conflicting attribute values, a weighted selection is used to determine the final aggregated attribute value: the credibility values of the data sources supporting each candidate value are summed, and the candidate with the largest weighted cumulative value is selected.

[0090] For example, for the target node combination containing the three node combinations above, the values of the three intermediate nodes for the job level attribute are as follows: the job level of job instance P001 is Engineer, with a corresponding data source credibility of 0.9; the job level of job instance P002 is Senior Engineer, with a corresponding data source credibility of 0.6; and the job level of job instance P003 is Engineer, with a corresponding data source credibility of 0.8. Using the weighted selection algorithm, the weighted cumulative value for Engineer is 0.9 + 0.8 = 1.7, and the weighted cumulative value for Senior Engineer is 0.6. Engineer, which has the larger weighted cumulative value, is selected as the aggregated attribute value.
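The weighted selection described for step S404 can be sketched as follows, using the job level values and credibility figures from the example. The function name and the `(value, credibility)` pair layout are assumptions made for this illustration.

```python
from collections import defaultdict


def weighted_select(observations):
    """Sum the data source credibility per candidate value and return
    the value with the largest weighted cumulative value, plus all totals."""
    totals = defaultdict(float)
    for value, credibility in observations:
        totals[value] += credibility
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


winner, totals = weighted_select([
    ("Engineer", 0.9),         # job instance P001
    ("Senior Engineer", 0.6),  # job instance P002
    ("Engineer", 0.8),         # job instance P003
])
print(winner, totals)
```

A selection rather than an average is used here because attributes such as job level are categorical; for numeric attributes a credibility-weighted mean would be another possible choice, which the text does not specify.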

[0091] Optionally, when generating aggregated attribute values, the degree of conflict among the attribute values can be recorded. When the weighted cumulative values of different attribute values are close to one another, or when the attribute values differ significantly, a conflict flag field can be set in the second-edge attributes for subsequent data quality control or manual review. The degree of conflict can be quantified by calculating the dispersion of the attribute value distribution.

[0092] Optionally, when generating aggregated attribute values, an evidence count field can be recorded in the second-edge attributes. The value of this field is the number of intermediate nodes within the target node combination, and it indicates how many original records support the business relationship expressed by the second edge.
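The conflict flag and evidence count can be sketched as follows. The text only says that the degree of conflict is quantified by the dispersion of the attribute value distribution; normalized Shannon entropy is used here as one possible dispersion measure, and the threshold of 0.5, the function names, and the field names are likewise assumptions made for this illustration.

```python
import math


def conflict_degree(weighted_totals):
    """Normalized Shannon entropy of the weighted shares:
    0.0 means full agreement, 1.0 means an even split."""
    total = sum(weighted_totals.values())
    if total <= 0:
        return 0.0
    shares = [w / total for w in weighted_totals.values() if w > 0]
    if len(shares) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(shares))


def build_second_edge_attributes(field_name, winner, weighted_totals,
                                 intermediate_ids, threshold=0.5):
    """Assemble the aggregated value with the optional auxiliary fields."""
    return {
        field_name: winner,
        "evidence_count": len(intermediate_ids),
        "conflict_flag": conflict_degree(weighted_totals) > threshold,
    }


attributes = build_second_edge_attributes(
    "job_level",
    "Engineer",
    {"Engineer": 1.7, "Senior Engineer": 0.6},
    ["P001", "P002", "P003"],
)
print(attributes)
```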

[0093] It should be noted that the generation of aggregated attribute values takes into account the conflicts and differences between multi-source data. By weighting with the credibility of the data sources, the most reliable attribute value can be selected when attributes conflict, thereby improving the accuracy and credibility of the second-edge attributes.

[0094] In step S405, when establishing the second edge, the starting point of the second edge is the source node corresponding to the target node combination, and the ending point is the target node corresponding to the target node combination. The edge type is set according to the composite edge type identifier in the business key. The second-edge attributes include the aggregated attribute value generated in step S404, as well as optional auxiliary information such as the evidence count and the conflict flag.

[0095] Optionally, when establishing the second edge, the set of identifiers of the original intermediate nodes can be recorded in the second-edge attributes to establish a traceability relationship between the second edge and the original data, which facilitates subsequent data auditing or rollback of flattening operations.

[0096] Optionally, after the second edge is established, the intermediate nodes corresponding to the target node combination and their associated first and second connecting edges can be either retained or deleted. Deleting them achieves complete flattening and simplifies the attribute graph structure; retaining them allows the first and second edges to coexist in the attribute graph structure to meet the needs of different query scenarios.
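Step S405, together with the optional traceability field and the retain-or-delete choice, can be sketched as follows. The dictionary-based graph layout, the function name, the `flattened` marker, and the `intermediate_ids` field are hypothetical and serve only to illustrate the flow.

```python
def establish_second_edge(graph, source, target, edge_type, aggregated,
                          intermediate_ids, delete_intermediates=False):
    """Add a flattened edge from source to target; optionally remove the
    intermediate nodes and every edge attached to them."""
    second_edge = {
        "src": source,
        "dst": target,
        "type": edge_type,
        "properties": {
            **aggregated,
            "flattened": True,  # hypothetical marker distinguishing second edges
            "intermediate_ids": sorted(intermediate_ids),  # traceability to original records
        },
    }
    if delete_intermediates:
        removed = set(intermediate_ids)
        for node_id in removed:
            graph["nodes"].pop(node_id, None)
        graph["edges"] = [
            e for e in graph["edges"]
            if e["src"] not in removed and e["dst"] not in removed
        ]
    graph["edges"].append(second_edge)
    return second_edge


graph = {
    "nodes": {"zhang_san": {}, "P001": {"job_level": "Engineer"}, "company_a": {}},
    "edges": [
        {"src": "zhang_san", "dst": "P001", "type": "employment", "properties": {}},
        {"src": "P001", "dst": "company_a", "type": "affiliation", "properties": {}},
    ],
}
second_edge = establish_second_edge(
    graph, "zhang_san", "company_a", "employment|job_instance|affiliation",
    {"job_level": "Engineer"}, ["P001"], delete_intermediates=True,
)
print(graph)
```

Calling the function with `delete_intermediates=False` leaves the original two-hop path in place, so both edge kinds coexist.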

[0097] By adopting the technical solution of this embodiment, refined flattening of multi-hop paths in the attribute graph structure can be achieved. Specifically, by determining the node combinations that satisfy the two-hop path structure, candidate structures requiring flattening are identified; by determining whether intermediate nodes meet the replacement condition, the safety of flattening is ensured, avoiding the loss of semantic information or structural anomalies; by grouping the node combinations that meet the condition, multiple paths of the same business relationship are managed uniformly; by generating aggregated attribute values based on data source credibility, the attribute conflicts caused by multi-source data are resolved and the accuracy of the second-edge attributes is improved; by establishing a second edge carrying the aggregated attribute values between the source node and the target node, the graph structure is simplified, the number of hops in graph traversal is reduced, and query efficiency is improved.

[0098] Based on the above embodiments, as an optional embodiment, step S204 may further include the following steps.

[0099] Step S501: Parse the semantic constraint relationships to obtain multiple semantic derivation rules; each semantic derivation rule includes derivation conditions and derivation results; Step S502: Generate the target expression based on multiple semantic derivation rules; Step S503: Compile the target expression into an executable operator to generate the first data based on the executable operator.

[0100] In step S501, the semantic derivation rule refers to the condition-conclusion pair obtained by transforming semantic constraint relations. Each semantic derivation rule includes two parts: derivation condition and derivation result. The derivation condition describes the premise pattern that triggers the reasoning, and the derivation result describes the reasoning conclusion that should be generated.

[0101] For example, in an enterprise management scenario, the ontology semantic model defines the semantic constraint that an employee is a subclass of a person. By parsing this semantic constraint, a semantic derivation rule is generated. The derivation condition of this rule is that an object has the employee type, and the derivation result is that the object should have the person type. This rule expresses the logical relationship of deriving the person type from the employee type.

[0102] As another example, the ontology semantic model defines that the employment relation is transitive. By parsing this constraint, a semantic derivation rule is generated. The derivation condition of this rule is that there is an employment relation between object A and object B and an employment relation between object B and object C. The derivation result is that an employment relation should be established between object A and object C.
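Step S501 can be sketched as turning each semantic constraint into a condition and result pair. The `DerivationRule` class, the constraint dictionary format, and the tuple encodings such as `has_label` and `two_hop` are hypothetical; the application does not prescribe a concrete representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivationRule:
    rule_id: str
    condition: tuple  # premise pattern that triggers the reasoning
    result: tuple     # conclusion that should be generated


def parse_constraint(rule_id, constraint):
    """Translate one semantic constraint into a semantic derivation rule."""
    kind = constraint["kind"]
    if kind == "subclass_of":
        return DerivationRule(
            rule_id,
            ("has_label", constraint["sub"]),
            ("add_label", constraint["super"]),
        )
    if kind == "transitive":
        relation = constraint["relation"]
        return DerivationRule(
            rule_id,
            ("two_hop", relation, relation),
            ("add_edge", relation),
        )
    raise ValueError(f"unsupported constraint kind: {kind}")


rules = [
    parse_constraint("R001", {"kind": "subclass_of", "sub": "employee", "super": "person"}),
    parse_constraint("R002", {"kind": "transitive", "relation": "employment"}),
]
print(rules)
```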

[0103] In step S502, the target expression refers to the intermediate representation obtained by converting the semantic derivation rules into operations on the attribute graph structure.

[0104] Optionally, the derivation rules can be optimized when generating the target expression. Multiple rules with the same derivation conditions can be merged into a single expression to reduce redundant matching operations. Rules whose derivation conditions include optional paths can be split into multiple expressions for separate processing.

[0105] In step S503, the executable operator refers to the operation unit that can be directly executed in the attribute graph database, which can be understood as converting the target expression into query and write instructions that the database can recognize.

[0106] The process of compiling a target expression into an executable operator involves syntax parsing of the expression, execution plan optimization, and operator sequence generation. The compiled executable operator contains basic operation types such as graph pattern matching, data reading, result generation, and data writing. These basic operations are combined to form complete inference execution logic.

[0107] The process of generating first data based on executable operators includes extracting the semantic derivation conditions and derivation results corresponding to each executable operator, establishing a mapping relationship between the two, and storing this mapping relationship as a component of the first data. The organization of the first data facilitates the rapid retrieval and invocation of the corresponding executable operators during inference execution.

[0108] Based on the above embodiments, step S502 may include at least one of the following steps, or a combination thereof: Step S5021: For each semantic derivation rule, based on the node types, edge types, and node attributes of the attribute graph structure, convert the derivation condition into a graph pattern matching expression; the graph pattern matching expression is used to find node combinations and edge combinations in the attribute graph structure that satisfy the derivation condition; Step S5022: For each semantic derivation rule, based on the node types, edge types, and node attributes of the attribute graph structure, convert the derivation result into a derivation result generation expression; the derivation result generation expression is used to describe the new node label, new edge, or new attribute value to be generated; Step S5023: For each semantic derivation rule, generate a lineage record construction expression; the lineage record construction expression is used to describe how to establish a tracing association between the derivation result generated by the derivation result generation expression and the node combination and edge combination matched by the graph pattern matching expression.

[0109] In step S5021, the graph pattern matching expression refers to the query expression that describes graph traversal and pattern matching in the attribute graph structure. It can be understood as converting the semantic patterns in the derivation conditions into a query statement that can be executed by the graph database, which is used to locate data objects that satisfy the derivation premises.

[0110] The process of generating graph pattern matching expressions involves mapping the object types involved in the derivation conditions to node type labels, mapping the relationships between objects to edge types, and mapping the attribute constraints of objects to node attribute filtering conditions. The generated expression describes the sequence of nodes, the sequence of edges, and the corresponding attribute constraints that need to be matched.

[0111] For example, for a rule whose derivation condition is that an object has the employee type, the graph pattern matching expression is described as matching all nodes with the employee label, without involving edge matching. For a rule whose derivation condition is that there exists an employment relationship between object A and object B and an employment relationship between object B and object C, the graph pattern matching expression is described as matching node triples that satisfy a two-hop path structure, where the type of the first-hop edge is employment and the type of the second-hop edge is also employment.
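The two graph pattern matching expressions in this example can be sketched as plain functions over an in-memory graph. The function names and the tuple layout `(source, target, edge type)` are assumptions made for the illustration; a graph database would evaluate the same patterns through its own query language and indexes.

```python
def match_label(node_labels, label):
    """Return the ids of all nodes carrying the label (no edge matching)."""
    return sorted(node_id for node_id, labels in node_labels.items() if label in labels)


def match_two_hop(edges, first_type, second_type):
    """Return (a, b, c) triples where a -first_type-> b -second_type-> c."""
    triples = []
    for a, b, type_1 in edges:
        if type_1 != first_type:
            continue
        for b2, c, type_2 in edges:
            if b2 == b and type_2 == second_type:
                triples.append((a, b, c))
    return triples


node_labels = {"zhang_san": {"employee"}, "company_a": {"company"}}
edges = [
    ("A", "B", "employment"),
    ("B", "C", "employment"),
    ("C", "D", "affiliation"),
]
employees = match_label(node_labels, "employee")
triples = match_two_hop(edges, "employment", "employment")
print(employees, triples)
```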

[0112] In step S5022, the derivation result generation expression refers to the expression describing how the reasoning conclusion should be represented in the attribute graph structure. It can be understood as converting the derivation result into a modification operation on the attribute graph structure, which is used to guide the materialized storage or dynamic generation of the reasoning conclusion.

[0113] The content of the derivation result generation expression depends on the type of the derivation result. When the derivation result indicates that an object should have a certain type, the expression describes adding a specific type label to the corresponding node; when the derivation result indicates that objects should have a certain relationship, the expression describes creating an edge of a specific type between the corresponding nodes; when the derivation result indicates that an object's attribute should have a certain value, the expression describes setting or updating a specific attribute field on the corresponding node.

[0114] For example, for a rule whose derivation result is that an object should have the person type added, the derivation result generation expression is described as adding a person label to the matched nodes. For a rule whose derivation result is that an employment relationship should be established between object A and object C, the derivation result generation expression is described as creating an edge of the employment type between node A and node C.
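The three kinds of derivation result generation expression can be sketched as one dispatch function. The result tuples (`add_label`, `add_edge`, `set_property`), the binding keys, and the graph layout are hypothetical encodings chosen for this illustration.

```python
def apply_result(graph, result, binding):
    """Apply one derivation result to the graph for one matched binding."""
    kind = result[0]
    if kind == "add_label":
        graph["labels"][binding["node"]].add(result[1])
    elif kind == "add_edge":
        edge = (binding["a"], binding["c"], result[1])
        if edge not in graph["edges"]:  # avoid duplicating an existing edge
            graph["edges"].append(edge)
    elif kind == "set_property":
        graph["properties"].setdefault(binding["node"], {})[result[1]] = result[2]
    else:
        raise ValueError(f"unsupported result kind: {kind}")


graph = {
    "labels": {"zhang_san": {"employee"}},
    "edges": [("A", "B", "employment"), ("B", "C", "employment")],
    "properties": {},
}
apply_result(graph, ("add_label", "person"), {"node": "zhang_san"})
apply_result(graph, ("add_edge", "employment"), {"a": "A", "c": "C"})
print(graph)
```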

[0115] In step S5023, the lineage record construction expression refers to the expression that describes how to establish traceability information for the derivation result. It can be understood as an operation definition that records the dependencies of the derivation process in a structured way, so that the reasoning result is traceable and interpretable.

[0116] The lineage record construction expression involves recording two types of information. The first type is rule identification information, which records which semantic derivation rule the derivation result originates from. This information is expressed through the rule's unique identifier. The second type is premise fact identification information, which records the specific nodes and edges, matched by the graph pattern matching expression, on which the derivation result depends. This information is expressed through a set of node identifiers and edge identifiers.

[0117] For example, for a rule that derives the person type from the employee type, the lineage record construction expression is described as creating a lineage record. This record includes a rule identifier field, whose value is the unique number of the rule, and a premise fact identifier field, whose value is the identifier of the matched employee node. When a person label is added to the node corresponding to employee Zhang San, a lineage record is created at the same time, recording that the person label originates from the employee-to-person derivation rule and depends on the premise fact that the Zhang San node carries the employee label.
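Creating the lineage record in the same step as the derivation result can be sketched as follows. The `LineageRecord` fields, the function name, and the rule identifier `R001` are assumptions made for this illustration; the sketch stores lineage as independent records, which is one of several possible storage methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    rule_id: str             # which semantic derivation rule produced the result
    premise_node_ids: tuple  # matched nodes the result depends on
    premise_edge_ids: tuple  # matched edges the result depends on (none for a label rule)
    derived: tuple           # the derivation result being traced


def derive_person_labels(node_labels, rule_id="R001"):
    """Add the person label to each employee node and emit a lineage
    record together with each newly added label."""
    records = []
    for node_id in sorted(node_labels):
        labels = node_labels[node_id]
        if "employee" in labels and "person" not in labels:
            labels.add("person")
            records.append(
                LineageRecord(rule_id, (node_id,), (), (node_id, "label", "person"))
            )
    return records


node_labels = {"zhang_san": {"employee"}, "company_a": {"company"}}
records = derive_person_labels(node_labels)
print(records)
```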

[0118] Optionally, when generating the expression for constructing lineage records, adaptation can be made based on the storage method of the lineage records. When lineage records are stored using independent tables or independent nodes, the expression describes creating a new lineage record entity and establishing an association edge with the derivation result; when lineage records are stored using edge attributes or node attributes, the expression describes setting the lineage information attribute field on the data element corresponding to the derivation result. Through this adaptation process, it is ensured that lineage records can be effectively integrated with the attribute graph structure.

[0119] It should be noted that the generation of the lineage record construction expression is completed during the rule compilation phase, rather than being constructed temporarily during inference execution. This pre-compilation method allows the generation of lineage records and the generation of derivation results to be executed synchronously, ensuring that each derivation result is accompanied by corresponding lineage information when written into the attribute graph structure, thereby achieving the integration of inference execution and lineage maintenance.

[0120] Furthermore, the lineage record construction expression can also include initialization or update operations for the support count. When a derivation result is generated for the first time, its support count is initialized to 1; when the same derivation result is generated again through a different derivation path, its support count is incremented by 1. The support count can be stored as an additional field of the lineage record, or maintained as an attribute field of the derivation result itself.
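The lineage record and support count described above can be illustrated with a minimal Python sketch. The names `LineageRecord`, `DerivedFact` and `record_derivation`, the in-memory dictionary, and the identifier strings are all hypothetical and are not part of the application. The support count is derived here from the number of distinct lineage records rather than stored as a separate field, which is only one of the storage options mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LineageRecord:
    rule_id: str            # unique identifier of the derivation rule
    premise_ids: frozenset  # identifiers of the matched nodes and edges


@dataclass
class DerivedFact:
    node_id: str
    label: str
    lineage: set = field(default_factory=set)

    @property
    def support_count(self) -> int:
        # One supporting derivation path per distinct lineage record.
        return len(self.lineage)


def record_derivation(store, node_id, label, rule_id, premise_ids):
    """Create or update a derived fact together with its lineage record."""
    fact = store.setdefault((node_id, label), DerivedFact(node_id, label))
    fact.lineage.add(LineageRecord(rule_id, frozenset(premise_ids)))
    return fact


store = {}
record_derivation(store, "zhang_san", "Person",
                  "employee_to_person", ["zhang_san:Employee"])
fact = record_derivation(store, "zhang_san", "Person",
                         "student_to_person", ["zhang_san:Student"])
print(fact.support_count)
```

Because lineage records are kept in a set, re-deriving the same conclusion through the same rule and premises does not inflate the support count; only a different derivation path does.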

[0121] By adopting the technical solution of this embodiment, semantic constraints in the ontology semantic model can be converted into rule expressions that can be executed on the attribute graph structure. Specifically, semantic derivation rules are obtained by parsing semantic constraints, clarifying the preconditions and conclusions of the derivation; a systematic conversion of semantic derivation rules into attribute graph operations is achieved by generating a target expression that includes a graph pattern matching expression, a derivation result generation expression, and a lineage record construction expression; and a mapping relationship between semantic derivation conditions and derivation results is established by compiling the target expression into an executable operator and generating first data.

[0122] Based on the above embodiments, as an optional embodiment, step S502 may further include the following steps.

[0123] Step S601: For each semantic inference rule, collect query statistics and change statistics; query statistics represent the number of queries that trigger the semantic inference corresponding to the semantic inference rule; change statistics represent the number of changes to the nodes or edges that affect the inference conditions of the semantic inference rule.

Step S602: Based on query statistics and change statistics, calculate the materialized storage strategy cost and query rewriting strategy cost respectively; the materialized storage strategy cost represents the total computing and storage resources required to pre-execute semantic inference rules and persist the inference results; the query rewriting strategy cost represents the total computing resources required to dynamically calculate the inference results during query execution.

Step S603: When the cost of the materialized storage strategy is less than or equal to the cost of the query rewrite strategy, a materialized storage execution operator is generated; the materialized storage execution operator is used to write the derivation result into the materialized storage area of the attribute graph structure.

Step S604: When the cost of the materialized storage strategy is greater than the cost of the query rewrite strategy, a query rewrite execution operator is generated; the query rewrite execution operator is used to dynamically execute the derivation logic of the semantic inference rules when a query request is received.

[0124] In step S601, the query statistics refer to the statistical quantity of query operations initiated within a preset time window for the inference result corresponding to a specific semantic inference rule, used to assess the frequency of use of the inference result. The scope of query statistics collection includes operations that directly query the inference result, as well as operations that perform subsequent graph traversal based on the inference result.

[0125] Optionally, the preset time window can be set to the most recent hour, the most recent day, or the most recent week, with the specific value configured according to the operating characteristics of the business system and the data update frequency.

[0126] Change statistics refer to the statistical quantity of additions, deletions, or modifications that occur to nodes or edges related to the preconditions upon which the semantic inference rules depend within a preset time window. This can be understood as a quantitative measure of the frequency of data changes, used to assess the update burden required to maintain the materialized inference results. The collection objects of change statistics include data change events corresponding to all node types and edge types involved in the graph pattern matching expression.

[0127] In step S602, the materialized storage strategy cost refers to the total resource overhead required to pre-calculate and persistently store the derivation results, and is used to compare with the query rewrite strategy cost to determine the optimal strategy.

[0128] The calculation of materialized storage strategy costs involves three categories of resource overhead. The first is derivation computation overhead, representing the computational resources required to execute graph pattern matching expressions and generate derivation results. The second is storage write overhead, representing the storage resources required to persist the derivation results to the materialized storage area of the attribute graph structure and maintain related indexes. The third is maintenance update overhead, representing the resource consumption required for incremental update operations of the derivation results due to data changes within a preset time window.

[0129] The calculation of maintenance and update overhead depends on change statistics and the average update size triggered by a single change. The average update size triggered by a single change can be estimated from historical operational data or theoretically derived based on the complexity of the rule's derivation conditions. For rules whose derivation conditions involve multi-hop paths, a single change may trigger a large-scale incremental update; for rules whose derivation conditions only involve single-node attributes, a single change only triggers a local update.

[0130] The query rewrite strategy cost refers to the total resource overhead required to compute the derivation results dynamically at query time, and is used to evaluate the applicability of the strategy under the current business load.

[0131] The cost calculation of query rewrite strategy mainly involves derivation computation overhead, which equals the computational resource consumption of a single derivation operation multiplied by the query statistics. Since query rewrite strategy does not perform pre-computation or persistent storage, it does not involve storage write overhead or maintenance update overhead. The advantage of query rewrite strategy is that it avoids the maintenance burden caused by data changes, but the disadvantage is that derivation computation needs to be re-executed for each query.

[0132] Optionally, when calculating the cost of materialized storage strategy, an incremental ratio coefficient can be introduced to characterize the proportion of the actual update size triggered by a single change to the size of the full inferred result. The incremental ratio coefficient ranges from 0 to 1, with a smaller value indicating stronger locality of incremental updates. By introducing the incremental ratio coefficient, the optimization effect of incremental maintenance can be reflected more accurately, avoiding the estimation of maintenance costs as the cost of full recalculation.
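The cost comparison of steps S602 to S604 can be sketched as follows. The application names the cost components but does not give a formula, so the arithmetic below is an assumption: maintenance overhead is taken as the change count multiplied by the incremental ratio coefficient and by the cost of one full derivation plus write. The names `RuleStats`, `materialization_cost`, `rewrite_cost` and `choose_strategy`, and the abstract cost units, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    query_count: int   # queries in the window that trigger this rule
    change_count: int  # changes in the window affecting the rule's conditions


def materialization_cost(stats, derive_cost, write_cost, incremental_ratio):
    """Derivation + storage write + maintenance update overhead (assumed form)."""
    if not 0.0 <= incremental_ratio <= 1.0:
        raise ValueError("incremental_ratio must lie in [0, 1]")
    full_cost = derive_cost + write_cost
    maintenance = stats.change_count * incremental_ratio * full_cost
    return full_cost + maintenance


def rewrite_cost(stats, derive_cost):
    """Single-derivation cost multiplied by the query statistics."""
    return stats.query_count * derive_cost


def choose_strategy(stats, derive_cost, write_cost, incremental_ratio):
    m = materialization_cost(stats, derive_cost, write_cost, incremental_ratio)
    q = rewrite_cost(stats, derive_cost)
    # Ties go to materialization, matching the "less than or equal" condition.
    return "materialize" if m <= q else "rewrite"


read_heavy = RuleStats(query_count=1000, change_count=10)
write_heavy = RuleStats(query_count=2, change_count=500)
print(choose_strategy(read_heavy, 5.0, 2.0, 0.1))
print(choose_strategy(write_heavy, 5.0, 2.0, 0.1))
```

Under these assumed numbers the frequently queried, rarely changed rule is materialized, while the rarely queried, frequently changed rule is handled by query rewriting.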

[0133] In step S603, when the calculation result shows that the cost of the materialized storage strategy is less than or equal to the cost of the query rewrite strategy, it is determined that the materialized storage strategy is more advantageous under the current business load, and the system generates a materialized storage execution operator.

[0134] The materialized storage execution operator refers to the sequence of operations used to pre-compute and persistently store the derivation results. It is used to write the derivation results into the materialized storage area of the attribute graph structure during the inference initialization phase or the incremental maintenance phase.

[0135] The sequence of operations performed by materialized storage operators can include, but is not limited to, matching, deduplication, and writing operations. Matching operations execute graph pattern matching expressions, retrieving node and edge combinations from the attribute graph structure that satisfy the derivation conditions. Deduplication operations filter derivation results already existing in the materialized storage area to avoid duplicate writes. Writing operations insert newly generated derivation results into the materialized storage area and update the relevant index structure.

[0136] Optionally, the materialized storage execution operator may also include a lineage record write operation and a support count initialization operation. The lineage record write operation, based on the lineage record construction expression, generates a corresponding lineage record for each derivation result and writes it to the lineage storage area. The support count initialization operation sets the support count field to an initial value of 1 for a derivation result generated for the first time, or increments the support count field by 1 for an existing derivation result.

[0137] In step S604, when the calculation results show that the cost of the materialized storage strategy is greater than the cost of the query rewrite strategy, it is determined that the query rewrite strategy is more advantageous under the current business load, and the system generates a query rewrite execution operator.

[0138] The query rewrite execution operator refers to the sequence of operations used to dynamically calculate the derivation results during query execution. It can be understood as executable code that embeds the derivation logic of the derivation rules into the query statement. It is used to temporarily perform graph pattern matching and return the derivation results when a query request is received, without persistent storage.

[0139] The query rewrite execution operator's operation sequence includes query rewriting and dynamic matching. The query rewriting operation replaces the parts of the original query statement that involve derivation results with the corresponding graph pattern matching expression, enabling the query to directly derive the required results from the underlying data. The dynamic matching operation calls the graph pattern matching expression during query execution, retrieves node and edge combinations that satisfy the derivation conditions from the underlying data region of the attribute graph structure, and returns the matching results as the derivation results to the query requester.

[0140] Optionally, the query rewrite execution operator can include query result caching operations. For derivation rules with relatively high query frequency but still below the materialization threshold, the derivation results can be temporarily cached in memory after query execution, with a cache expiration period set. Within the cache expiration period, subsequent identical queries can directly retrieve results from the cache, avoiding repeated derivation calculations. The cache expiration period needs to comprehensively consider data change frequency and memory resource limitations.
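The optional query result cache can be sketched as a small time-to-live cache. This is an illustration under stated assumptions: the class name `TTLCache`, the `get_or_compute` method, and the injectable `clock` parameter are hypothetical, and invalidation on data change, which the paragraph above says must also be considered, is left out for brevity.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after a fixed period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still valid: skip the derivation
        value = compute()        # expired or missing: derive again
        self._entries[key] = (now + self._ttl, value)
        return value


cache = TTLCache(ttl_seconds=60.0)
calls = []


def derive_persons():
    calls.append(1)              # stands in for dynamic graph pattern matching
    return ["zhang_san"]


cache.get_or_compute("persons", derive_persons)
cache.get_or_compute("persons", derive_persons)
print(len(calls))
```

The second identical query is answered from the cache, so the derivation function runs only once within the expiration period.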

[0141] It should be noted that the materialized storage execution operator and the query rewrite execution operator are two mutually exclusive implementation methods for the same semantic inference rule. The system selects and generates only one of these operators for each rule based on the cost assessment results. Different rules can adopt different strategies; that is, on the same attribute graph structure, some inference rules adopt the materialized storage strategy, while others adopt the query rewrite strategy, thereby achieving a hybrid strategy deployment and comprehensively optimizing the overall performance of the system.

[0142] By adopting the technical solution of this embodiment, an appropriate storage strategy can be selected for each semantic inference rule based on the actual characteristics of the business load. Specifically, by collecting query statistics and change statistics, quantitative indicators reflecting the operating status of the business system are obtained; by calculating the cost of materialized storage strategy and query rewrite strategy respectively, a unified cost measurement standard is established, making the two strategies comparable; by generating corresponding execution operators based on cost comparison results, the automation and optimization of strategy selection are realized, avoiding the strategy mismatch problem caused by relying on experience judgment.

[0143] Based on the above embodiments, as an optional embodiment, the compilation data also includes second data, and step S203 may further include the following steps.

[0144] Step S701: Generate second data based on the attribute graph structure and semantic constraint relationships, which includes the following steps.

Step S702: Extract multiple mapping relationships from the first data.

Step S703: Based on the derivation conditions and derivation results of the semantic derivation rules corresponding to the mapping relationships, determine the dependency relationships between the multiple mapping relationships; a dependency relationship represents the association between the derivation result of a first mapping relationship and the derivation condition of a second mapping relationship.

Step S704: Determine the execution order of the multiple mapping relationships based on the dependency relationships; the execution order requires that a depended-upon mapping relationship is executed before any mapping relationship that depends on it.

Step S705: Generate the second data based on the dependency relationships and the execution order.

[0145] In step S702, extracting multiple mapping relationships from the first data involves reading the semantic derivation conditions and derivation results stored in the first data one by one and forming a set of mapping relationships.

[0146] Each mapping relationship serves as an independent analysis unit, containing an identifier for the corresponding semantic derivation rule, a set of data types involved in the derivation conditions, and a set of data types involved in the derivation result. The data type set includes element types of the attribute graph structure, such as node type, edge type, and attribute name.

[0147] In step S703, a dependency relationship refers to the ordering constraint on execution that exists between two mapping relationships.

[0148] The process of determining dependencies involves performing a matching analysis between the derivation conditions and the derivation results for each pair of mappings. For mappings A and B, if the set of data types involved in the derivation result of mapping A intersects with the set of data types involved in the derivation conditions of mapping B, then mapping B is determined to depend on mapping A, and a dependency relationship from mapping A to mapping B is established.

[0149] The intersection determination of data type sets includes a step-by-step comparison of node types, edge types, and attribute names. If the derivation result of mapping relation A generates a certain node label, and the derivation condition of mapping relation B requires matching that node label, then the two are determined to have an intersection in data types. Similarly, if the derivation result of mapping relation A generates a certain edge type, and the derivation condition of mapping relation B requires traversing that edge type, then the two are also determined to have an intersection in data types.

[0150] For example, continuing the enterprise management scenario example above, mapping relationship A corresponds to the inference rule from employee to person, and its inference result is a person node label. Mapping relationship B corresponds to the attribute inference rule based on person labels, and its inference condition includes person node labels. Through matching analysis, the data type set of the inference result of mapping relationship A contains person labels, and the data type set of the inference condition of mapping relationship B also contains person labels. Since there is an intersection between the two, it is determined that mapping relationship B depends on mapping relationship A, establishing a dependency relationship from A to B. This dependency relationship indicates that only after mapping relationship A is executed and person labels are generated can mapping relationship B obtain complete input data for inference.
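The pairwise intersection test described above can be sketched as follows. The `Mapping` class, the `find_dependencies` function, and the string encoding of data types such as `"label:Person"` are hypothetical. Excluding a mapping's dependency on itself is an assumption of this sketch; the application does not state how self-referential rules are treated at this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    rule_id: str
    condition_types: frozenset  # data types read by the derivation condition
    result_types: frozenset     # data types produced by the derivation result


def find_dependencies(mappings):
    """Return edges (a, b), meaning mapping b depends on mapping a."""
    edges = set()
    for a in mappings:
        for b in mappings:
            # b depends on a when a produces something b's condition reads.
            if a is not b and a.result_types & b.condition_types:
                edges.add((a.rule_id, b.rule_id))
    return edges


mapping_a = Mapping("A", frozenset({"label:Employee"}), frozenset({"label:Person"}))
mapping_b = Mapping("B", frozenset({"label:Person"}), frozenset({"attr:contact"}))
deps = find_dependencies([mapping_a, mapping_b])
print(deps)
```

For the employee-to-person example, the only edge found runs from A to B, since A produces the person label that B's condition matches.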

[0151] In step S704, the process of determining the execution order can employ a topological sorting algorithm. The topological sorting algorithm constructs a directed graph structure using mapping relationships as nodes and dependencies as directed edges. The algorithm starts with the mapping relationships that have no incoming edges, i.e., rules that do not depend on any other mapping relationship, and uses them as the level zero execution sequence. Then, it removes the level zero mapping relationships and their outgoing edges, and continues searching for new mapping relationships without incoming edges, which are then used as the level one execution sequence. This process is repeated until all mapping relationships have been assigned to their corresponding execution levels.

[0152] In this context, the execution level refers to the execution batch identifier assigned to the mapping relationship during the topological sorting process. It can be understood as a hierarchical number representing the depth of dependency between mapping relationships, used to guide the inference execution engine to execute according to the hierarchical scheduling rules. Multiple mapping relationships within the same execution level do not have mutual dependencies and can be executed in parallel.

[0153] For example, in a scenario involving three mappings, mapping A does not depend on any other mappings, mapping B depends on mapping A, and mapping C depends on mapping B. Through topological sorting, mapping A is assigned to the zeroth-level execution sequence, mapping B to the first-level execution sequence, and mapping C to the second-level execution sequence. This execution order ensures that the system first executes mapping A to generate an intermediate result, then executes mapping B based on that intermediate result, and finally executes mapping C based on the result of mapping B.
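The level-by-level topological sorting can be sketched with Kahn's algorithm, grouped into levels. The function name `execution_levels` and the edge convention (a pair `(before, after)` means `after` depends on `before`) are assumptions of this sketch. Sorting within a level is only for deterministic output.

```python
from collections import defaultdict


def execution_levels(rule_ids, edges):
    """Group mappings into execution levels; edges are (before, after) pairs."""
    indegree = {r: 0 for r in rule_ids}
    successors = defaultdict(list)
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    levels = []
    placed = 0
    current = sorted(r for r, d in indegree.items() if d == 0)
    while current:
        levels.append(current)
        placed += len(current)
        next_level = []
        for r in current:
            for s in successors[r]:
                indegree[s] -= 1          # remove the outgoing edge r -> s
                if indegree[s] == 0:
                    next_level.append(s)
        current = sorted(next_level)

    if placed != len(indegree):
        # Some mappings never reached in-degree zero: a cycle exists.
        raise ValueError("circular dependency among mappings")
    return levels


levels = execution_levels(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(levels)
```

Mappings that end up in the same inner list have no mutual dependencies and could be scheduled in parallel.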

[0154] Optionally, when a cycle exists in the directed graph, it indicates a circular dependency between multiple mappings, making topological sorting impossible. In this case, the system can perform cycle detection and processing. One approach is to merge the mappings in the cycle into an iterative execution group and execute the group repeatedly, as a fixed-point computation, until the derivation result no longer changes. Another approach is to break the circular dependency by adjusting the rule definitions or introducing intermediate breakpoints. The existence of cycles usually suggests the presence of recursive definitions or mutually derived constraints in the ontology semantic model.
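The fixed-point approach for an iterative execution group can be sketched as below. Representing facts as `(subject, label)` tuples and rules as functions from a fact set to derived facts is an assumption made for illustration, as are the names `run_to_fixed_point`, `a_implies_b` and `b_implies_a`. The round limit is a safety guard added in this sketch, not something the application specifies.

```python
def run_to_fixed_point(rules, facts, max_rounds=1000):
    """Apply all rules in the group repeatedly until no new fact appears."""
    facts = set(facts)
    for _ in range(max_rounds):
        new_facts = set()
        for rule in rules:
            new_facts |= rule(facts) - facts
        if not new_facts:
            return facts                 # fixed point reached
        facts |= new_facts
    raise RuntimeError("no fixed point within max_rounds")


# Two mutually dependent rules, e.g. two classes declared equivalent.
def a_implies_b(facts):
    return {(s, "B") for s, label in facts if label == "A"}


def b_implies_a(facts):
    return {(s, "A") for s, label in facts if label == "B"}


result = run_to_fixed_point([a_implies_b, b_implies_a],
                            {("n1", "A"), ("n2", "B")})
print(sorted(result))
```

Termination is guaranteed here only because the rules add facts over a finite set of subjects and labels; rules that create new nodes would need a different guard.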

[0155] It should be noted that the determination of the execution order is used not only for the full inference execution during the inference initialization phase, but also for the incremental maintenance execution during the runtime phase. During the initialization phase, the system executes all mapping relationships layer by layer according to the execution order, completing the full inference operation on the attribute graph structure. During the incremental maintenance phase, the system determines the affected subset of mapping relationships based on data change events, and performs incremental inference within this subset according to the execution order, ensuring that data dependencies are correctly maintained during the incremental inference process.

[0156] In step S705, the second data refers to the structured storage form of the dependencies and execution order between mapping relationships. It can be understood as a data representation of the dependency topology of the inference rules, which is used to quickly look up the associations and execution order between rules during inference execution and incremental maintenance.

[0157] The second data can be organized using a directed graph structure, an adjacency list structure, or a dependency matrix structure. In a directed graph structure, each mapping corresponds to a graph node, with node attributes including a mapping identifier and an execution level number. Dependencies correspond to directed edges, with edge attributes including trigger condition identifiers. In an adjacency list structure, a list of predecessor and successor mappings is maintained for each mapping, facilitating forward and reverse lookups of dependencies. In a dependency matrix structure, a two-dimensional matrix is used to record whether a dependency exists between any two mappings.

[0158] For example, in a scenario containing employee-to-person inference rules and person attribute inference rules, the second data is stored in a directed graph structure. The graph contains two nodes: node A corresponds to the employee-to-person mapping relationship with an execution level of 0, and node B corresponds to the person attribute mapping relationship with an execution level of 1. The graph contains a directed edge from node A to node B, and the edge attribute records the trigger condition as the person node label. This second data clearly expresses the dependency relationship and execution order between the two rules.

[0159] Optionally, the second data may also include summary information on the execution levels, recording the number of mapping relationships and the list of identifiers contained in each execution level. This summary information helps the inference execution engine quickly obtain the set of rules to be executed at each level during scheduling, improving scheduling efficiency.

[0160] Optionally, when generating the second data, transitive closure computation can be performed on the dependencies to supplement indirect dependencies. An indirect dependency refers to the relationship where mapping A indirectly depends on mapping B through an intermediate mapping. Although indirect dependencies can be derived from direct dependencies, explicitly recording indirect dependencies can speed up certain query operations, such as determining whether a dependency path exists between any two mappings.
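The transitive closure computation can be sketched with a depth-first search from each mapping. The function name `transitive_closure` and the edge convention (a pair `(a, b)` means `b` depends on `a`) are assumptions of this sketch; other algorithms, such as Warshall's, would serve equally well.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return all (a, b) such that b is reachable from a via dependency edges."""
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)

    closure = set()
    for start in list(successors):
        seen = set()
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure |= {(start, node) for node in seen}
    return closure


closure = transitive_closure([("A", "B"), ("B", "C")])
print(sorted(closure))
```

With the closure stored, checking whether any dependency path exists between two mappings becomes a single set membership test, at the cost of extra storage.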

[0161] By employing the technical solution of this embodiment, a dependency structure between semantic inference rules can be constructed and a reasonable execution order can be determined. Specifically, by extracting multiple mapping relationships from the first data, a set of objects for dependency analysis is obtained; by performing matching analysis based on the inference conditions and inference results, the sequential dependency relationships between mapping relationships are identified; by using a topological sorting algorithm to determine the execution order, a conflict-free execution sequence that satisfies dependency constraints is established; and by generating second data containing dependency relationships and execution order, a structured basis for rule scheduling is provided for the inference execution engine.

[0162] Based on the above embodiments, as an optional embodiment, the method for constructing the above attribute graph structure may further include the following steps.

[0163] Step S801: In response to a data change operation on a node or edge in the attribute graph structure, obtain the changed data; the changed data includes the change type and the identifier of the changed object; the change type includes insertion, update, or deletion.

Step S802: Based on the object identifier, determine the target semantic derivation rule affected by the data change operation; the target semantic derivation rule is the semantic derivation rule whose derivation conditions involve the object identifier.

Step S803: When the change type is insertion or update, perform incremental derivation for the target semantic derivation rule, generate incremental derivation results, and write the incremental derivation results into the attribute graph structure.

Step S804: When the change type is deletion, for the stored derivation results corresponding to the target semantic derivation rule, detect whether there is an alternative derivation path; the alternative derivation path is a derivation path that can still derive the same conclusion based on other nodes or edges in the attribute graph structure other than the changed object.

Step S805: If there is no alternative derivation path, the corresponding stored derivation result is deleted from the attribute graph structure; if there is an alternative derivation path, the corresponding stored derivation result is retained.

[0164] In step S801, the data change operation refers to the addition, modification, or deletion of nodes or edges in the attribute graph structure. The changed data includes at least two elements: change type and change object identifier.

[0165] The change type refers to the category identifier of the data change operation, including three types: insertion, update, and deletion. Insertion operations represent adding nodes, adding edges, or adding new labels to nodes in the attribute graph structure. Update operations represent modifying the attribute values of nodes or edges. Deletion operations represent removing nodes, removing edges, or deleting node labels from the attribute graph structure.

[0166] The change object identifier refers to the unique identifier of the data element that has been changed; it can be understood as the data address used to locate the change. Change object identifiers include node identifiers, edge identifiers, node label names, or attribute field names.

[0167] In step S802, the target semantic derivation rule refers to the semantic derivation rule whose derivation conditions are affected by data changes. It can be understood as the set of rules whose results need to be incrementally derived or retracted, which limits the computation scope of incremental maintenance and avoids unnecessary re-execution of all rules.

[0168] The process of determining the target semantic derivation rules includes two steps: object type identification and rule matching. Object type identification determines the data type involved in the change based on the object identifier, including node type, edge type, or attribute name. Rule matching searches the first data for semantic derivation rules whose derivation conditions involve the data type affected by the change.

[0169] For example, when the changed data indicates the addition of a new employee node, the changed object type identification determines that the change involves the employee node type. The rule matching process traverses all mapping relationships in the first data, searching for rules whose derivation conditions include the employee node type. Assuming the first data contains a subclass derivation rule from employee to person, and the derivation condition of this rule is that the node has an employee label, this rule is identified as the target semantic derivation rule. If the first data also contains transitive derivation rules based on employment edges, since the derivation condition of this rule does not involve the employee node type, it is not identified as a target semantic derivation rule.
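Rule matching can be sketched as a lookup from the changed data type to the rules whose conditions mention it. The paragraph above describes traversing all mapping relationships; the inverted index used here is a substituted optimization, not the application's stated method. The function names, rule identifiers, and type strings are hypothetical.

```python
from collections import defaultdict


def build_rule_index(rule_conditions):
    """Invert {rule_id: condition data types} into {data type: rule_ids}."""
    index = defaultdict(set)
    for rule_id, types in rule_conditions.items():
        for data_type in types:
            index[data_type].add(rule_id)
    return index


def affected_rules(index, changed_type):
    """Return the target rules whose conditions involve the changed type."""
    return set(index.get(changed_type, ()))


rule_conditions = {
    "employee_to_person": {"label:Employee"},
    "employment_transitive": {"edge:EMPLOYS"},
}
index = build_rule_index(rule_conditions)
targets = affected_rules(index, "label:Employee")
print(targets)
```

Adding an employee node selects only the employee-to-person rule; the transitive rule over employment edges is not selected because its condition does not involve the employee node type.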

[0170] It should be noted that the determination of the target semantic derivation rules only identifies the scope of affected rules, without immediately executing the derivation operation. The actual derivation execution is handled by step S803 or step S804, which branches the process according to the change type. By separating the identification of the scope of impact from the derivation execution, the system can sort, optimize, or batch process the target rule set before execution, improving the efficiency of incremental maintenance.

[0171] In step S803, when the change type is insertion or update, the system performs incremental derivation operations for the target semantic derivation rule. The incremental derivation process includes three stages: incremental input determination, incremental matching, and result writing. In the incremental input determination stage, the range of input data for this derivation is identified based on the changed data. For insertion operations, the incremental input is the newly inserted node or edge; for update operations, the incremental input is the node or edge whose attribute value has changed. In the incremental matching stage, for each target semantic derivation rule, the incremental input is substituted into the graph pattern matching expression of the rule to find node and edge combinations that satisfy the derivation conditions. In the result writing stage, an expression is generated based on the derivation result, and the newly generated derivation result is written to the corresponding region of the attribute graph structure.

[0172] For example, when a new node corresponding to employee Li Si is added, incremental derivation is performed based on the subclass derivation rule from employee to person. The incremental input determination phase uses the Li Si node as the input for this derivation. The incremental matching phase checks whether the Li Si node has an employee label; if it does, the match succeeds. The result writing phase adds a person label to the Li Si node and writes the derivation result to the materialized storage area of the attribute graph structure. The entire incremental derivation process only processes the Li Si node and does not re-traverse all employee nodes in the attribute graph structure.

[0173] Optionally, during incremental derivation, a deduplication check can be performed on the generated incremental derivation results. The deduplication check determines whether the incremental derivation result already exists in the materialized storage area of the attribute graph structure. If it already exists, the write operation is skipped to avoid repeatedly writing the same derivation result. For an existing derivation result, if the lineage record construction expression is in use, a new lineage record is added and the support count of the derivation result is incremented by 1, indicating that the conclusion is supported by multiple derivation paths.
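The three stages of incremental derivation with the deduplication check can be sketched for simple label-to-label rules. The function `incremental_derive`, the tuple form of a rule, and the in-memory sets are hypothetical simplifications; lineage recording is omitted to keep the sketch short.

```python
def incremental_derive(node_id, node_labels, rules, materialized):
    """Apply subclass-style rules to one changed node only.

    rules: iterable of (rule_id, required_label, derived_label).
    materialized: set of (node_id, label) facts already stored.
    Returns the facts newly written by this call.
    """
    written = []
    labels = node_labels.get(node_id, set())    # incremental input
    for rule_id, required, derived in rules:
        if required not in labels:              # incremental matching
            continue
        fact = (node_id, derived)
        if fact in materialized:                # deduplication check
            continue
        materialized.add(fact)                  # result writing
        written.append(fact)
    return written


node_labels = {"zhang_san": {"Employee"}, "li_si": {"Employee"}}
rules = [("employee_to_person", "Employee", "Person")]
materialized = {("zhang_san", "Person")}

first = incremental_derive("li_si", node_labels, rules, materialized)
second = incremental_derive("li_si", node_labels, rules, materialized)
print(first, second)
```

Only the Li Si node is examined; the second call writes nothing because the person fact for Li Si is already in the materialized set.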

[0174] In step S804, when the change type is deletion, the system does not directly delete all derivation results that depend on the changed object. Instead, it performs alternative derivation path detection for each stored derivation result. An alternative derivation path refers to a derivation path that can still derive the same conclusion from other nodes or edges in the attribute graph structure after the changed object is deleted. It can be understood as other supporting evidence for the derivation results, used to determine whether the derivation results should be retained or deleted.

[0175] The process of detecting alternative derivation paths includes two stages: lineage record query and path validity verification. In the lineage record query stage, based on the changed object identifier, records whose premise fact set contains the changed object identifier are searched within the lineage records generated by the lineage record construction expression, thus locating the set of affected derivation results. In the path validity verification stage, for each affected derivation result, the total number of its lineage records is counted, and the number of lineage records invalidated by this deletion operation is determined. If the number of invalid lineage records is less than the total number of lineage records, it indicates that other lineage records for this derivation result are unaffected, meaning an alternative derivation path exists. If the number of invalid lineage records equals the total number of lineage records, it indicates that all derivation bases for this derivation result have become invalid, meaning no alternative derivation path exists.

[0176] For example, in an enterprise management scenario, the employee node Zhang San has both an employee tag and a student tag. The ontology semantic model defines both employees and students as subclasses of "personnel". The system generates two lineage records for the personnel tag of node Zhang San. The first lineage record indicates that the personnel tag originates from the employee tag, and the second lineage record indicates that the personnel tag originates from the student tag. At this time, the support count of the personnel tag for node Zhang San is 2. When the employee tag of node Zhang San is deleted, the lineage record query phase locates that the personnel tag is affected. The path validity verification phase finds that the first lineage record is invalid, but the second lineage record is still valid. Therefore, it is determined that there is an alternative derivation path, which is the path of deriving the personnel tag from the student tag.

[0177] In step S805, different processing operations are performed based on the detection results of alternative derivation paths. If no alternative derivation path exists, the derivation result has lost all of its derivation basis and should be deleted from the attribute graph structure. The deletion operation includes removing the derivation result from the materialized storage area and deleting the relevant lineage records from the lineage record area. If an alternative derivation path exists, the derivation result still has another valid derivation basis and should be retained in the attribute graph structure. The retention operation includes updating the support count of the derivation result and deleting the invalid lineage records, but not deleting the derivation result itself.

[0178] For example, continuing with the example of node Zhang San above, when the employee tag is deleted and an alternative derivation path from the student tag to the personnel tag is detected, the system retains the personnel tag, deletes only the lineage record from the employee tag to the personnel tag, and updates the support count from 2 to 1. When the student tag is subsequently deleted, the alternative derivation path detection is performed again. This time, no alternative derivation path exists, the system deletes the personnel tag, and deletes the remaining lineage record.
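
The Zhang San example can be traced with a minimal sketch. The `LineageStore` class, its method names, and the tuple encoding of tags are hypothetical and introduced only for illustration; the embodiment does not prescribe a storage layout. Each lineage record is modeled as the set of premise facts behind one derivation, and the support count is simply the number of records.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStore:
    """Hypothetical in-memory stand-in for the lineage record area."""
    # derived fact -> list of premise sets, one per lineage record
    records: dict = field(default_factory=dict)

    def add(self, derived, premises):
        self.records.setdefault(derived, []).append(frozenset(premises))

    def support_count(self, derived):
        return len(self.records.get(derived, []))

    def delete_premise(self, changed):
        """Process deletion of `changed`; return derived facts that lost all support."""
        removed = []
        for derived in list(self.records):
            lineage = self.records[derived]
            # Lineage record query: which records list the changed object as a premise?
            invalid = [r for r in lineage if changed in r]
            if not invalid:
                continue
            # Path validity verification: compare invalid count with total count.
            if len(invalid) < len(lineage):
                # Alternative path exists: keep the result, drop invalid records.
                self.records[derived] = [r for r in lineage if changed not in r]
            else:
                # No alternative path: delete the result and its records.
                del self.records[derived]
                removed.append(derived)
        return removed


employee = ("Zhang San", "Employee")
student = ("Zhang San", "Student")
person = ("Zhang San", "Person")

store = LineageStore()
store.add(person, [employee])
store.add(person, [student])

first = store.delete_premise(employee)       # personnel tag is retained
count_after_first = store.support_count(person)
second = store.delete_premise(student)       # personnel tag loses its last support
```

After the first deletion the support count falls from 2 to 1 and nothing is removed; the second deletion returns the personnel tag as a result to delete.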

[0179] Optionally, a cascading deletion operation can be triggered when deleting a derivation result. The deleted derivation result may serve as a derivation condition for other derivation rules; therefore, the deletion operation needs to be treated as a new change event, and steps S802 to S805 need to be re-executed to perform an impact scope analysis and derivation result processing on downstream rules that depend on the deleted derivation result. Through cascading deletion, the integrity and consistency of derivation results in the attribute graph structure are ensured.
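
One way to picture cascading deletion is a worklist loop in which every deleted derivation result is queued as a new change event. The sketch below is an assumption-laden illustration: the function name `cascade_delete`, the string encoding of facts, and the `LegalEntity` superclass are hypothetical and do not come from this application.

```python
from collections import deque


def cascade_delete(lineage, deleted_fact):
    """Delete `deleted_fact` and every derived fact left without support.

    `lineage` maps each derived fact to a list of premise sets, one per
    lineage record. It is modified in place; the removed derived facts
    are returned in the order they were deleted.
    """
    removed = []
    queue = deque([deleted_fact])
    while queue:
        changed = queue.popleft()
        for derived in list(lineage):
            surviving = [p for p in lineage[derived] if changed not in p]
            if len(surviving) == len(lineage[derived]):
                continue  # this result does not depend on the changed object
            if surviving:
                lineage[derived] = surviving  # an alternative path remains
            else:
                del lineage[derived]
                removed.append(derived)
                queue.append(derived)  # treat the deletion as a new change event
    return removed


# Hypothetical chain: Employee -> Person -> LegalEntity.
lineage = {
    "ZhangSan:Person": [{"ZhangSan:Employee"}],
    "ZhangSan:LegalEntity": [{"ZhangSan:Person"}],
}
removed = cascade_delete(lineage, "ZhangSan:Employee")
```

Here deleting the employee tag removes the personnel tag, which in turn removes the downstream tag that was derived from it.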

[0180] By adopting the technical solution of this embodiment, semantic inference results can be maintained efficiently when the attribute graph structure undergoes data changes.

[0181] Figure 3 is a flowchart of a data processing method provided in an embodiment of this application.

[0182] As shown in Figure 3, the data processing method may specifically include the following steps.

[0183] Step S901: Obtain the query request; Step S902: Based on the attribute graph structure constructed by the above-mentioned attribute graph structure construction method and the corresponding compiled data of the attribute graph structure, semantic reasoning is performed on the query request to generate reasoning results.

[0184] In step S901, a query request refers to a data query operation instruction initiated by a user or application system for an attribute graph structure. A query request can contain two core elements: query conditions and query target.

[0185] For example, in an enterprise management scenario, a user initiates a query request to obtain a set of all personnel nodes. The query condition for this request is that the nodes have a personnel tag, and the query target is the set of nodes that satisfy this condition. As another example, a user initiates a query request to verify whether employee Zhang San works for company A. The query condition for this request is the existence of an employment edge from node Zhang San to node A, or a deducible employment relationship, and the query target is a Boolean value indicating whether the relationship exists.

[0186] In step S902, the process of performing semantic reasoning on the query request may include three stages: query parsing, strategy determination, and result generation. The query parsing stage performs syntactic and semantic analysis on the query request, identifying the data types and query patterns involved in the query conditions. The strategy determination stage determines whether the inference rules involved in the query conditions adopt a materialized storage strategy or a query rewriting strategy based on the strategy selection results recorded in the compiled data. The result generation stage performs corresponding data access operations according to different strategies and returns the query results.

[0187] For inference rules employing a materialized storage strategy, the query execution process directly accesses the materialized storage area of the attribute graph structure, retrieving the pre-calculated and stored inference results. The inference results in the materialized storage area and the data in the basic data area are processed uniformly during the query, eliminating the need to distinguish data sources and thus providing transparent semantic reasoning support to the user.

[0188] For inference rules employing a query rewriting strategy, the query execution process calls the corresponding query rewriting execution operator, replacing the parts of the query statement involving the inference result with the corresponding graph pattern matching expression. The rewritten query statement directly performs matching operations in the basic data area, dynamically calculates the inference result, and returns it. The query rewriting operation is completed transparently during query execution, and users do not need to understand the underlying implementation details.

[0189] For example, a user queries a set of all personnel nodes. The query parsing phase identifies the query condition as personnel node labels. The strategy determination phase queries the compiled data and discovers that the subclass inference rule from employee to personnel adopts a materialized storage strategy. The result generation phase accesses the materialized storage area and reads all nodes already labeled as personnel, including nodes originally labeled as personnel in the basic data, as well as personnel nodes derived from employees through inference. The query result returns a complete set of personnel nodes.
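
The per-rule dispatch between the two strategies can be sketched as follows. This is a simplified illustration, not the implementation of this application: the function `query_label`, the dictionary layouts, and the extra nodes Li Si and Wang Wu are hypothetical, and only single-level subclass rules are handled.

```python
def query_label(label, base_labels, materialized_labels, strategies, subclass_of):
    """Return the nodes carrying `label`, explicitly or through subclass inference."""
    # Explicit data in the basic data area.
    result = {n for n, labels in base_labels.items() if label in labels}
    for sub, sup in subclass_of.items():
        if sup != label:
            continue
        if strategies[(sub, sup)] == "materialize":
            # Read pre-calculated derivation results from the materialized area.
            result |= {n for n, labels in materialized_labels.items() if label in labels}
        else:
            # Query rewriting: match the subclass label directly in the basic data.
            result |= {n for n, labels in base_labels.items() if sub in labels}
    return result


base_labels = {"ZhangSan": {"Employee"}, "LiSi": {"Person"}, "WangWu": {"Student"}}
materialized_labels = {"ZhangSan": {"Person"}}
subclass_of = {"Employee": "Person", "Student": "Person"}
strategies = {
    ("Employee", "Person"): "materialize",
    ("Student", "Person"): "rewrite",
}

result = query_label("Person", base_labels, materialized_labels, strategies, subclass_of)
```

The caller issues one query for the personnel label and receives explicit, materialized, and dynamically derived nodes in a single result set, which is the transparency property described above.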

[0190] It's important to note that the semantic reasoning operation performed on query requests is transparent to the user. Users do not need to explicitly invoke the reasoning function when writing their queries; the system automatically determines whether reasoning is necessary and, if so, which strategy to use during query execution. This transparency allows users to query both explicit data and implicit knowledge in a unified manner, reducing the complexity of using the system.

[0191] The inference result refers to the data set returned to the query requester after the query is executed. The inference result can include additional information about its derivation basis. For inference results generated by a materialized storage strategy, the rules and preconditions for the derivation can be obtained by querying lineage records. For inference results dynamically calculated using a query rewrite strategy, the result can be marked as originating from inference calculation rather than explicit storage. This derivation basis information enhances the interpretability of the query results, making it easier for users to understand and verify their correctness.

[0192] By adopting the technical solution of this embodiment, the data processing method of this embodiment fully utilizes the compiled data and strategy selection results established in the above-mentioned attribute graph structure construction method. For derivation rules using a materialized storage strategy, the query operation directly reads the pre-calculated derivation results, resulting in a fast response speed; for derivation rules using a query rewriting strategy, the query operation dynamically calculates the derivation results, without occupying storage space. Through hybrid strategy deployment, a balance is achieved between query performance, storage overhead, and maintenance costs.

[0193] Figure 4 is a schematic diagram of a system for constructing an attribute graph structure, provided in an embodiment of the present invention. As shown in Figure 4, the system for constructing the attribute graph structure may include: a data acquisition module, used to acquire the ontology semantic model and the corresponding instance data, where the ontology semantic model includes multiple objects, the attributes of the objects, and the semantic constraint relationships between the multiple objects, and the instance data represents the factual constraint relationships between the multiple objects; a structure building module, used to construct an attribute graph structure based on the ontology semantic model and the instance data, where the attribute graph structure includes multiple nodes, the multiple nodes correspond to the multiple objects, the multiple nodes are connected by a first edge and a second edge, the first edge has a first edge attribute that represents the factual constraint relationship, the second edge connects those nodes among the multiple nodes that are associated through an intermediate object, and the second edge has a second edge attribute that represents the attributes of the intermediate object; and a compilation generation module, used to generate compiled data based on the attribute graph structure and the semantic constraint relationships, where the compiled data is used to perform semantic reasoning on the attribute graph structure and includes at least first data, which represents the mapping relationship between the semantic derivation conditions obtained by transforming the semantic constraint relationships and the corresponding derivation results.

[0194] According to an embodiment of this application, the structure construction module is further configured to map objects in the ontology semantic model to nodes in the attribute graph structure, and map the attributes of objects to the node attributes of nodes; based on the factual constraint relationships between multiple objects represented in the instance data, establish a first edge between corresponding nodes, and use the information representing the specific content of the factual constraint relationships as the first edge attribute of the first edge; determine multiple objects in the ontology semantic model that are indirectly associated through intermediate objects, establish a second edge between corresponding nodes for the multiple objects that are indirectly associated through intermediate objects, and use the attributes of the intermediate objects as the second edge attribute of the second edge; and construct an attribute graph structure based on the first edge and the second edge.

[0195] According to an embodiment of this application, the structure construction module is further configured to determine node combinations in the attribute graph structure that satisfy at least a two-hop path structure; each node combination includes a source node, an intermediate node, and a target node; the source node is connected to the intermediate node through a first connecting edge, and the intermediate node is connected to the target node through a second connecting edge; the intermediate node corresponds to an intermediate object; it is determined whether the intermediate node satisfies the replacement condition; the replacement condition includes at least: the number of incoming edges and the number of outgoing edges of the intermediate node are preset values, and the attributes of the intermediate node are all scalar types; for node combinations that satisfy the replacement condition, multiple node combinations are grouped based on the identifier of the source node, the identifier of the target node, and the edge types of the first and second connecting edges to obtain target node combinations; multiple node combinations within the same target node combination have the same source node and the same target node; for the multiple intermediate nodes within each group, an aggregated attribute value is generated based on the attribute values of the multiple intermediate nodes and the corresponding data source credibility; a second edge is established between the source node and the target node corresponding to each group, and the aggregated attribute value is used as the second edge attribute of the second edge.
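
The two-hop replacement can be illustrated with a short sketch. Several details are assumptions made only for this example: the preset value for incoming and outgoing edges is taken to be 1, each intermediate node carries a single numeric attribute `value` and a `source` name, and the aggregation is a credibility-weighted average. The application does not fix the aggregation function, and the name `collapse_two_hop` is hypothetical.

```python
from collections import defaultdict


def collapse_two_hop(edges, node_attrs, credibility):
    """Replace source -> intermediate -> target paths with aggregated second edges.

    edges: list of (source, edge_type, target) triples.
    node_attrs: intermediate node id -> {"value": number, "source": str}.
    credibility: data source name -> positive weight.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, etype, dst in edges:
        outgoing[src].append((etype, dst))
        incoming[dst].append((etype, src))

    groups = defaultdict(list)
    for mid, attrs in node_attrs.items():
        # Assumed replacement condition: one incoming edge, one outgoing edge,
        # and a scalar attribute value.
        if len(incoming[mid]) != 1 or len(outgoing[mid]) != 1:
            continue
        if not isinstance(attrs["value"], (int, float)):
            continue
        in_type, src = incoming[mid][0]
        out_type, dst = outgoing[mid][0]
        # Group by source id, target id, and both connecting edge types.
        groups[(src, dst, in_type, out_type)].append(mid)

    second_edges = {}
    for key, mids in groups.items():
        weights = [credibility[node_attrs[m]["source"]] for m in mids]
        values = [node_attrs[m]["value"] for m in mids]
        # Assumed aggregation: credibility-weighted average.
        second_edges[key] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return second_edges


edges = [
    ("ZhangSan", "HAS_RECORD", "r1"), ("r1", "AT", "CompanyA"),
    ("ZhangSan", "HAS_RECORD", "r2"), ("r2", "AT", "CompanyA"),
]
node_attrs = {
    "r1": {"value": 10.0, "source": "hr"},
    "r2": {"value": 20.0, "source": "web"},
}
credibility = {"hr": 0.75, "web": 0.25}

second_edges = collapse_two_hop(edges, node_attrs, credibility)
```

Both intermediate records fall into one group, so a single second edge from Zhang San to Company A carries the aggregated value, weighted toward the more credible source.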

[0196] According to an embodiment of this application, the compilation and generation module is further configured to parse the semantic constraint relationships to obtain multiple semantic derivation rules; each semantic derivation rule includes derivation conditions and derivation results; generate a target expression based on the multiple semantic derivation rules; and compile the target expression into an executable operator to generate the first data based on the executable operator; wherein generating the target expression based on the multiple semantic derivation rules includes at least one of the following, or a combination thereof: for each semantic derivation rule, converting the derivation conditions into a graph pattern matching expression based on the node type, edge type, and node attributes of the attribute graph structure, where the graph pattern matching expression is used to find node and edge combinations that satisfy the derivation conditions in the attribute graph structure; for each semantic derivation rule, converting the derivation result into a derivation result generation expression based on the node type, edge type, and node attributes of the attribute graph structure, where the derivation result generation expression describes the new node labels, new edges, or new attribute values to be generated; for each semantic derivation rule, generating a lineage record construction expression, where the lineage record construction expression describes how to establish a tracing association between the derivation result generated by the derivation result generation expression and the node and edge combinations matched by the graph pattern matching expression.

[0197] According to an embodiment of this application, the structure construction module is further configured to collect query statistics and change statistics for each semantic inference rule; the query statistics represent the number of queries that trigger the semantic inference corresponding to the semantic inference rule; the change statistics represent the number of changes to nodes or edges that affect the inference conditions of the semantic inference rule; based on the query statistics and change statistics, the materialized storage strategy cost and the query rewriting strategy cost are calculated respectively; the materialized storage strategy cost represents the sum of computing resources and storage resources required to pre-execute the semantic inference rule and persistently store the inference result; the query rewriting strategy cost represents the sum of computing resources required to dynamically calculate the inference result during query execution; when the materialized storage strategy cost is less than or equal to the query rewriting strategy cost, a materialized storage execution operator is generated; the materialized storage execution operator is used to write the inference result into the materialized storage area of the attribute graph structure; when the materialized storage strategy cost is greater than the query rewriting strategy cost, a query rewriting execution operator is generated; the query rewriting execution operator is used to dynamically execute the inference logic of the semantic inference rule when a query request is received.
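
The strategy choice reduces to comparing two costs. The sketch below uses a deliberately simple linear cost model that is an assumption of this example, not a formula given in this application: materialization pays one derivation per change plus a fixed storage cost, and query rewriting pays one derivation per query. The names `RuleStats`, `choose_strategy`, `derive_cost`, and `storage_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    query_count: int   # queries that triggered inference for this rule
    change_count: int  # changes that touched the rule's inference conditions


def choose_strategy(stats, derive_cost=1.0, storage_cost=5.0):
    """Pick the execution operator type for one semantic inference rule."""
    # Assumed model: re-derive on every change, plus persistent storage overhead.
    materialize_cost = stats.change_count * derive_cost + storage_cost
    # Assumed model: derive dynamically on every query.
    rewrite_cost = stats.query_count * derive_cost
    # Ties go to materialization, matching the "less than or equal to" condition.
    return "materialize" if materialize_cost <= rewrite_cost else "rewrite"


hot_rule = choose_strategy(RuleStats(query_count=100, change_count=10))
volatile_rule = choose_strategy(RuleStats(query_count=3, change_count=50))
tie_rule = choose_strategy(RuleStats(query_count=15, change_count=10))
```

A rule that is queried often and changes rarely is materialized, while a rule whose premises change far more often than it is queried is left to query rewriting.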

[0198] According to an embodiment of this application, the structure construction module is further configured to extract multiple mapping relationships from the first data; determine the dependency relationships between the multiple mapping relationships based on the derivation conditions and derivation results of the semantic derivation rules corresponding to the mapping relationships; a dependency relationship represents the association in which the derivation result of a first mapping relationship is used as a derivation condition of a second mapping relationship; determine the execution order of the multiple mapping relationships according to the dependency relationships; the execution order satisfies that a mapping relationship that is depended upon is executed before the mapping relationship that depends on it; and generate the second data based on the dependency relationships and the execution order.
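
Determining an execution order from such dependencies is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library; the rule names, the encoding of a mapping relationship as a pair of condition labels and a result label, and the `Agent` class are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(rules):
    """Order rules so that each rule runs after the rules it depends on.

    rules: rule name -> (set of condition labels, result label).
    Returns (ordered rule names, dependency map).
    """
    deps = {name: set() for name in rules}
    for name, (conditions, _) in rules.items():
        for other, (_, result) in rules.items():
            # `name` depends on `other` when other's result feeds name's conditions.
            if other != name and result in conditions:
                deps[name].add(other)
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(deps).static_order())
    return order, deps


rules = {
    "person_to_agent": ({"Person"}, "Agent"),
    "employee_to_person": ({"Employee"}, "Person"),
    "student_to_person": ({"Student"}, "Person"),
}
order, deps = execution_order(rules)
```

If the rules contained a cyclic dependency, `static_order()` would raise `graphlib.CycleError`, so a real system would need a separate policy for recursive rules.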

[0199] According to an embodiment of this application, the structure construction module is further configured to obtain changed data in response to data change operations on nodes or edges in the attribute graph structure; the changed data includes a change type and a change object identifier; the change type includes insertion, update, or deletion; based on the change object identifier, determine the target semantic derivation rule affected by the data change operation; the target semantic derivation rule is a semantic derivation rule whose derivation conditions involve the change object identifier; when the change type is insertion or update, perform incremental derivation for the target semantic derivation rule, generate incremental derivation results, and write the incremental derivation results into the attribute graph structure; when the change type is deletion, for the stored derivation results corresponding to the target semantic derivation rule, detect whether there is an alternative derivation path; the alternative derivation path is a derivation path that can still derive the same conclusion based on other nodes or edges in the attribute graph structure other than the changed object; if there is no alternative derivation path, delete the corresponding stored derivation results from the attribute graph structure; if there is an alternative derivation path, retain the corresponding stored derivation results.

[0200] Figure 5 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of this application. As shown in Figure 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540. The processor 510, communications interface 520, and memory 530 communicate with each other via the communication bus 540. The processor 510 can call logical instructions from the memory 530 to execute a method for constructing an attribute graph structure.

[0201] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0202] On the other hand, this application also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the method for constructing the attribute graph structure provided by the above methods.

[0203] In another aspect, this application also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for constructing the attribute graph structure provided by the above methods.

[0204] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0205] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0206] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A method for constructing an attribute graph structure, characterized in that the method includes: obtaining an ontology semantic model and corresponding instance data; the ontology semantic model includes multiple objects, the attributes of the objects, and the semantic constraint relationships between the multiple objects; the instance data represents the factual constraint relationships between the multiple objects; constructing an attribute graph structure based on the ontology semantic model and the instance data; the attribute graph structure includes multiple nodes; the multiple nodes correspond to the multiple objects; the multiple nodes are connected by a first edge and a second edge; the first edge has a first edge attribute representing the factual constraint relationship; the second edge connects the nodes among the multiple nodes that are associated through an intermediate object; the second edge has a second edge attribute that characterizes the attributes of the intermediate object; generating compiled data based on the attribute graph structure and the semantic constraint relationships; the compiled data is used to perform semantic reasoning on the attribute graph structure; the compiled data includes at least first data, which represents the mapping relationship between the semantic derivation conditions obtained by transforming the semantic constraint relationships and the corresponding derivation results.

2. The method according to claim 1, characterized in that, The construction of the attribute graph structure based on the ontology semantic model and the instance data includes: The objects in the ontology semantic model are mapped to nodes in the attribute graph structure, and the attributes of the objects are mapped to the node attributes of the nodes. Based on the factual constraint relationships between the multiple objects represented in the instance data, the first edge is established between the corresponding nodes, and the information representing the specific content of the factual constraint relationship is used as the first edge attribute of the first edge. Identify multiple objects in the ontology semantic model that are indirectly associated through an intermediate object. For the multiple objects that are indirectly associated through an intermediate object, establish a second edge between the corresponding nodes and use the attributes of the intermediate object as the second edge attributes of the second edge. The attribute graph structure is constructed based on the first edge and the second edge.

3. The method according to claim 2, characterized in that the step of determining multiple objects in the ontology semantic model that are indirectly associated through an intermediate object, and establishing a second edge between the corresponding nodes for the multiple objects that are indirectly associated through the intermediate object, includes: determining node combinations in the attribute graph structure that satisfy at least a two-hop path structure; each node combination includes a source node, an intermediate node, and a target node; the source node is connected to the intermediate node through a first connecting edge, and the intermediate node is connected to the target node through a second connecting edge; the intermediate node corresponds to the intermediate object; determining whether the intermediate node satisfies a replacement condition; the replacement condition includes at least the following: the number of incoming edges and the number of outgoing edges of the intermediate node are preset values, and the attributes of the intermediate node are all scalar types; for node combinations that satisfy the replacement condition, grouping multiple node combinations based on the identifier of the source node, the identifier of the target node, and the edge types of the first connecting edge and the second connecting edge to obtain target node combinations; multiple node combinations within the same target node combination have the same source node and the same target node; for the multiple intermediate nodes within each group, generating an aggregated attribute value based on the attribute values of the multiple intermediate nodes and the corresponding data source credibility; and establishing the second edge between the source node and the target node corresponding to each group, and using the aggregated attribute value as the second edge attribute of the second edge.

4. The method according to claim 1, characterized in that, The generation of first data based on the attribute graph structure and the semantic constraint relationship includes: The semantic constraint relationship is parsed to obtain multiple semantic derivation rules; each semantic derivation rule includes a derivation condition and a derivation result; Generate the target expression based on the multiple semantic derivation rules; The target expression is compiled into an executable operator to generate first data based on the executable operator; The step of generating the target expression based on the multiple semantic derivation rules includes a combination of at least one or more of the following: For each semantic derivation rule, based on the node type, edge type, and node attributes of the attribute graph structure, the derivation condition is converted into a graph pattern matching expression; the graph pattern matching expression is used to find node combinations and edge combinations that satisfy the derivation condition in the attribute graph structure. For each of the semantic derivation rules, based on the node type, edge type, and node attributes of the attribute graph structure, the derivation result is converted into a derivation result generation expression; the derivation result generation expression is used to describe the new node label, new edge, or new attribute value to be generated. For each of the semantic derivation rules, a lineage record construction expression is generated; the lineage record construction expression is used to describe how to establish a tracing association between the derivation result generated by the derivation result generation expression and the node combination and edge combination matched by the graph pattern matching expression.

5. The method according to claim 4, characterized in that the step of compiling the target expression into an executable operator includes: for each of the semantic inference rules, collecting query statistics and change statistics; the query statistics represent the number of queries that trigger the semantic inference corresponding to the semantic inference rule; the change statistics represent the number of changes to the nodes or edges that affect the inference conditions of the semantic inference rule; based on the query statistics and the change statistics, calculating the materialized storage strategy cost and the query rewrite strategy cost respectively; the materialized storage strategy cost represents the total computing resources and storage resources required to pre-execute the semantic inference rule and persistently store the inference results; the query rewrite strategy cost represents the total computing resources required to dynamically calculate the inference results during query execution; when the materialized storage strategy cost is less than or equal to the query rewrite strategy cost, generating a materialized storage execution operator; the materialized storage execution operator is used to write the derivation result into the materialized storage area of the attribute graph structure; when the materialized storage strategy cost is greater than the query rewrite strategy cost, generating a query rewrite execution operator; the query rewrite execution operator is used to dynamically execute the derivation logic of the semantic inference rule when a query request is received.

6. The method according to claim 1, characterized in that the compiled data also includes second data, which represents the dependencies and execution order among multiple mapping relationships in the first data; generating the second data based on the attribute graph structure and the semantic constraint relationships includes: extracting multiple mapping relationships from the first data; determining, based on the derivation conditions and derivation results of the semantic derivation rules corresponding to the mapping relationships, the dependency relationships between the multiple mapping relationships; a dependency relationship represents the association in which the derivation result of a first mapping relationship is used as a derivation condition of a second mapping relationship; determining the execution order of the multiple mapping relationships based on the dependencies; the execution order satisfies that a mapping relationship that is depended upon is executed before the mapping relationship that depends on it; and generating the second data based on the dependencies and the execution order.

7. The method according to claim 1, characterized in that the method further includes: in response to a data change operation on nodes or edges in the attribute graph structure, obtaining change data, wherein the change data includes a change type and a change object identifier, and the change type includes insertion, update, or deletion; determining, based on the change object identifier, a target semantic inference rule affected by the data change operation, wherein the target semantic inference rule is a semantic inference rule whose inference conditions involve the change object identifier; when the change type is insertion or update, performing incremental derivation for the target semantic inference rule to generate incremental derivation results, and writing the incremental derivation results into the attribute graph structure; when the change type is deletion, detecting, for the stored derivation results corresponding to the target semantic inference rule, whether an alternative derivation path exists, wherein the alternative derivation path is a derivation path that can still derive the same conclusion from nodes or edges in the attribute graph structure other than the changed object; if no alternative derivation path exists, deleting the corresponding stored derivation result from the attribute graph structure; and if an alternative derivation path exists, retaining the corresponding stored derivation result.
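
The change handling in claim 7 can be sketched on a toy graph. Everything concrete below is an assumption made for illustration: the class `ToyGraph`, the single hard-coded rule (a knows b and b knows c implies a indirectly knows c), and the treatment of an update as re-insertion of the edge. The sketch only aims to show incremental derivation on insertion and the alternative-derivation-path check on deletion.

```python
class ToyGraph:
    """Toy store with base 'knows' edges and stored derivation results."""

    def __init__(self):
        self.edges = set()    # base edges (src, dst)
        self.derived = set()  # stored results (src, dst) of the assumed rule

    def _has_path(self, result):
        """True if some two-hop path still derives `result`."""
        src, dst = result
        return any((src, mid) in self.edges and (mid, dst) in self.edges
                   for (_, mid) in self.edges)

    def apply_change(self, change_type, edge):
        a, b = edge
        if change_type in ("insert", "update"):
            self.edges.add(edge)
            # Incremental derivation: only pairs that use the changed edge.
            new = {(a, c) for (b2, c) in self.edges if b2 == b and a != c}
            new |= {(z, b) for (z, a2) in self.edges if a2 == a and z != b}
            self.derived |= new
            return new
        if change_type == "delete":
            self.edges.discard(edge)
            # Only results that could have used the deleted edge are checked.
            affected = {(x, y) for (x, y) in self.derived if x == a or y == b}
            removed = {r for r in affected if not self._has_path(r)}
            self.derived -= removed  # results with an alternative path stay
            return removed
        raise ValueError(f"unknown change type: {change_type}")


if __name__ == "__main__":
    g = ToyGraph()
    for e in [("alice", "bob"), ("bob", "carol"),
              ("alice", "dave"), ("dave", "carol")]:
        g.apply_change("insert", e)
    print(g.apply_change("delete", ("alice", "bob")))   # path via dave remains
    print(g.apply_change("delete", ("dave", "carol")))  # no path remains
```

Here the first deletion removes nothing, because the conclusion is still derivable through another intermediate node, and the second deletion removes the stored result once no derivation path is left.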

8. A data processing method, characterized in that it comprises: acquiring a query request; and performing semantic reasoning on the query request, based on the attribute graph structure constructed by the method according to any one of claims 1 to 7 and the compiled data corresponding to the attribute graph structure, to generate a reasoning result.

9. A system for constructing an attribute graph structure, characterized in that it comprises: a data acquisition module, used to acquire an ontology semantic model and corresponding instance data, wherein the ontology semantic model includes multiple objects, the attributes of the objects, and the semantic constraint relationships between the multiple objects, and the instance data represents the factual constraint relationships between the multiple objects; a structure building module, used to construct an attribute graph structure based on the ontology semantic model and the instance data, wherein the attribute graph structure includes multiple nodes, the multiple nodes correspond to the multiple objects, the multiple nodes are connected by a first edge and a second edge, the first edge has a first edge attribute representing the factual constraint relationship, the second edge connects nodes among the multiple nodes that are associated through an intermediate object, and the second edge has a second edge attribute representing the attributes of the intermediate object; and a compilation generation module, used to generate compiled data based on the attribute graph structure and the semantic constraint relationship, wherein the compiled data is used to perform semantic reasoning on the attribute graph structure, and the compiled data includes at least first data, which represents the mapping relationship between the semantic derivation conditions obtained from the semantic constraint relationship and the corresponding derivation results.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method according to any one of claims 1 to 8.