Information retrieval method and system for multi-hop path reasoning on sparse temporal knowledge graph

By expanding the action space through a reinforcement learning framework and a dynamic completion module, and combining it with a multi-dimensional reward function, the problem of path sparsity and reward sparsity in sparse temporal knowledge graphs is solved, improving the accuracy and interpretability of multi-hop path reasoning and addressing the information retrieval challenge in sparse environments.

CN121168656B · Active · Publication Date: 2026-06-19 · DALIAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
DALIAN UNIV
Filing Date
2025-09-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The sparsity of paths, rewards, and data in sparse temporal knowledge graphs limits the accuracy and effectiveness of multi-hop path reasoning models in practical applications, making it difficult to find effective paths. Furthermore, the models struggle to learn user preference features in sparse environments, leading to the cold start problem in recommendation systems.

Method used

By employing a reinforcement learning framework combined with a dynamic completion module, the action space is expanded through the interaction between the agent and the retrieval environment. Historical path information is encoded by the LSTM model, and a multi-dimensional reward function is designed, including hit reward, embedding reward, and path reward, to improve the accuracy and interpretability of path selection.

Benefits of technology

It improves the recall and accuracy of information retrieval in sparse environments, enhances the interpretability of inference results, reduces dependence on interactive data, and improves the generalization ability and stability of the model in sparse scenarios.

✦ Generated by Eureka AI based on patent content.

Figure CN121168656B_ABST

Abstract

This invention discloses an information retrieval method and system for multi-hop path reasoning on sparse temporal knowledge graphs, relating to the field of knowledge graphs. The method comprises: constructing a temporal knowledge graph dataset; generating a sparse subset dataset based on the dataset to construct a sparse retrieval scenario; building a reinforcement learning framework, defining states, action space, state transition rules, and basic rewards, and simulating path exploration and navigation logic; constructing a policy network to evaluate the value of candidate actions and select the optimal action; constructing a dynamic completion module to expand the action space in conjunction with a pre-trained temporal knowledge graph embedding model; constructing a semantic scoring system with a triple reward mechanism to guide the agent in optimizing path selection; and, on input of a user query, having the agent, guided by the policy network, perform multi-hop path search using the expanded action space and output the retrieval results using a multi-dimensional reward function. This invention effectively solves the problem of data sparsity in sparse temporal knowledge graphs and improves the accuracy of information retrieval.

Description

Technical Field

[0001] This invention relates to the field of knowledge graphs, specifically to an information retrieval method and system for multi-hop path reasoning on sparse temporal knowledge graphs.

Background Technology

[0002] In today's digital age, information retrieval and intelligent recommendation systems have become essential tools for people to obtain information and meet their personalized needs. However, with the continuous expansion of the data processing scale and the increasing demands for semantic understanding, traditional knowledge graph technologies are gradually showing their limitations when dealing with complex data reasoning tasks involving a time dimension. Against this backdrop, Temporal Knowledge Graph Reasoning (TKGR) technology has emerged and quickly become a research hotspot in the field of information retrieval and intelligent recommendation systems.

[0003] The core objective of TKGR technology is to accurately process temporal knowledge graph data containing entities, relationships, and timestamps. By constructing an efficient inference model, it deeply models the semantic features of entities, relationships, and timestamps, embedding these features into a low-dimensional vector space. Within this low-dimensional vector space, the similarity calculation of entities and relationships can be easily performed, thereby enabling inference about complex relationships within the temporal knowledge graph. With this core mechanism, TKGR technology effectively improves the representation capability and inference efficiency of information retrieval systems for temporal data, laying the foundation for systems to process query requests containing time information and provide more accurate retrieval results.

[0004] However, in current mainstream TKGR technologies, embedding-based reasoning methods occupy an important position. While these methods have certain advantages in reasoning efficiency, they generally suffer from significant drawbacks such as poor interpretability. Because their reasoning process mainly relies on numerical computation in low-dimensional vector spaces, they cannot clearly demonstrate the path followed in the reasoning process or the chain of evidence leading to the reasoning conclusion. This makes it difficult for users to understand the source of the reasoning results, and their application is greatly limited in scenarios where the reliability of the reasoning results is highly important.

[0005] To address the insufficient interpretability of embedding-based reasoning methods and enhance the transparency and logicality of the reasoning process, researchers have further developed a multi-hop path reasoning method. This type of method cleverly transforms the TKGR task into a path search problem based on a temporal knowledge graph structure. By tracing the multi-step connections between entities, it achieves the effective propagation of semantic information within the temporal knowledge graph. During the propagation process, the reasoning path from the initial entity to the target entity is clearly presented, which not only enhances reasoning performance but, more importantly, significantly improves the interpretability of the reasoning results. This advantage is particularly significant in the field of information retrieval, enabling information retrieval systems to delve deeper into complex user intents and accurately capture the potential semantic associations between entities and relationships, thereby providing users with more relevant and interpretable search results.

[0006] From a technical classification perspective, the mainstream TKGR methods can be mainly divided into the following two categories:

[0007] Firstly, there is the embedding-based reasoning method. This type of method continues the core idea of traditional knowledge graph embedding technology. By designing a specific model architecture, it learns the joint vector representation of entities, relations, and time. Based on this joint vector representation, it assigns corresponding scores to fact quadruples (i.e., quadruples containing entity pairs, relations, and timestamps) in the temporal knowledge graph to determine the validity of the fact quadruples. During model training, temporal information is effectively integrated into the model, thereby improving the model's ability to characterize knowledge at different points in time, enabling it to better adapt to reasoning tasks in temporal knowledge graphs.

[0008] Secondly, there are reasoning methods based on multi-hop paths. These methods fully utilize the graph structure characteristics of temporal knowledge graphs, performing semantic propagation and path selection through multi-step connection of nodes (i.e., entities) within the graph structure. To achieve efficient path search and reasoning, these methods typically employ advanced techniques such as graph neural networks (GNNs) to extract node and path features from the temporal knowledge graph, thereby completing the path search and reasoning task and providing clear path support for the reasoning results.

[0009] Although TKGR technology has made some research progress and shown good application prospects in information retrieval and intelligent recommendation systems, there are still many problems that need to be solved in practical applications, mainly in the following three aspects:

[0010] First, there is the problem of sparse paths. The core of multi-hop path reasoning models lies in relying on complete and valid chains of evidence for reasoning search. However, in real-world applications of sparse temporal knowledge graphs (STKGs), due to limitations in data collection and incomplete connections between entities and relationships, the number of valid paths contained in the graph is often very small. In this situation, multi-hop path reasoning models struggle to find a reasoning path from the initial entity to the correct target entity, severely impacting the accuracy and effectiveness of reasoning and limiting the application of multi-hop path reasoning methods in real-world scenarios.

[0011] Second, there is the problem of reward sparsity. In existing multi-hop path reasoning models based on reinforcement learning, the reward mechanism is mostly designed to focus only on the hit reward of the reasoning result. That is, the agent is only rewarded when the model finds a path that can accurately reach the target entity. This single reward design leads to a low probability of the model hitting the correct target entity when reasoning on the sparse STKG, and the agent has very few opportunities to receive a reward, resulting in a serious reward sparsity problem. This problem directly affects the training effect of the reinforcement learning agent, making it difficult for the agent to continuously optimize the path search strategy through training, and thus affecting the performance of the entire reasoning model.

[0012] Third, there is the issue of data sparsity. In practical applications of information retrieval and intelligent recommendation systems, the interaction information between users and objects (such as goods, information resources, etc.) is often very sparse. On the one hand, sparse interaction data makes it difficult for the model to fully learn the user's preference features and the object's attribute features during training, which can easily lead to overfitting and reduce the model's generalization ability on new data. On the other hand, the lack of sufficient interaction data makes it difficult for intelligent recommendation systems to accurately uncover users' potential needs and make effective recommendations, resulting in the cold start problem. That is, for new users or new objects, the system cannot make recommendations based on historical interaction data, which seriously affects the user experience and the service quality of the system.

Summary of the Invention

[0013] The purpose of this invention is to propose an information retrieval method and system for multi-hop path reasoning on sparse temporal knowledge graphs, so as to improve the recall, accuracy and interpretability of reasoning results in information retrieval in sparse environments.

[0014] According to a first aspect of the present disclosure, an information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph is provided, comprising the following steps:

[0015] Construct a temporal knowledge graph dataset containing temporal information, and set up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query;

[0016] Generate sparse subset datasets based on the temporal knowledge graph dataset to construct sparse retrieval scenarios;

[0017] Establish a reinforcement learning framework, define the state, action space, state transition rules and basic rewards, and simulate path exploration and navigation logic;

[0018] Construct a policy network incorporating an attention mechanism to evaluate the value of candidate actions and select the optimal action;

[0019] Construct a dynamic completion module which, combined with a pre-trained temporal knowledge graph embedding model, expands the action space in sparse retrieval scenarios;

[0020] Design a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent to optimize path selection;

[0021] When a user query is input, the agent, guided by the policy network, performs multi-hop path search using the expanded action space and outputs the search results in conjunction with a multi-dimensional reward function.

[0022] In one embodiment, within the reinforcement learning framework, the state s_t of the agent after completing the t-hop path search is defined by the query relation, the current entity, the historical search path, and the timestamp of the queried fact. An action in the current state is defined, using the temporal knowledge graph (TKG), by an entity related to the current entity, the relation connecting them, and the time at which the fact occurred. The complete set of actions in the current state forms the action space, which includes all possible retrieval paths starting from the current retrieval point. The state transition rule is that if the agent chooses an action as the next action, the current state transitions to a new state. The maximum number of hops for state transitions is set to T; if the agent has not reached the correct target entity by t=T, the state transition terminates at hop T. The basic reward is as follows: if the agent eventually reaches the correct target entity through a multi-hop inference path, the agent is given a reward of 1; otherwise, the agent is given a reward of 0.

[0023] In one embodiment, if the agent has located the correct target entity before reaching the maximum hop count limit, it must remain at the current entity node. To this end, a self-loop action is configured for each state: if the current entity equals the target entity and the number of hops in the search is still within the given limit, the agent chooses the self-loop action and stays at the current node. The self-loop action acts as a stop operation, ensuring that the agent does not continue searching after finding the target entity, thereby avoiding invalid path expansion.

[0024] In one embodiment, in order to enable the agent to start from the initial entity, pass through the current entity, reach the target entity, and determine the feasibility of each action, this invention combines attention and LSTM to construct a policy network incorporating an attention mechanism for efficient agent reasoning. In this policy network, the historical reasoning path information is represented as:

[0025]

[0026] where the hidden state is represented by a combination of timestamps and relations and contains the past observation and action sequences. Historical reasoning path information is encoded as a continuous vector using an LSTM model. At the initial time t=0, a zero-padded vector is used as the input for the initial hidden state, and the LSTM outputs the initial hidden state based on its weights, as follows:

[0027]

[0028] where the two inputs are the initial relation and the initial timestamp, which are combined with the initial entity to form the initial start action. The dynamic update process of the LSTM is represented as follows:

[0029]

[0030] When the agent in the current state chooses an action, time-difference modeling is applied: the difference between the timestamp of the previous action and the timestamp of the current action is computed, and its absolute value is used to obtain the embedding of the relative time displacement, as shown in the formula:

[0031]

[0032] The embedding of the action is then updated as follows:

[0033]

[0034] All possible actions in the action space of the current state are stacked to form a real-valued matrix, whose size depends on the dimensions of the entity, relation, and timestamp vectors. The policy network is defined as:

[0035]

[0036] where the left-hand side is the updated policy network and the right-hand side contains the original policy network; m is a vector whose elements come from the set {0, 1}, each element indicating whether the corresponding action is selected, and whose length is the number of possible actions in state s_t; the added constant is a small positive number used to ensure that the updated policy is not completely zero, i.e., to prevent policy collapse; and m_i ~ Bernoulli(1−β) indicates that each m_i is sampled from a Bernoulli distribution, a discrete probability distribution with only two possible outcomes, whose parameter 1−β is the probability that m_i equals 1.
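The masking step described in this paragraph can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes a common form of action dropout in which the original probabilities are multiplied by the mask, a small constant is added, and the result is renormalised. The function names, `eps`, and the toy scores are hypothetical and only illustrate the idea.

```python
import math
import random
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw action scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_dropout(probs: List[float], beta: float, eps: float,
                   rng: random.Random) -> List[float]:
    """Assumed form: updated policy proportional to (probs * m + eps)."""
    # m_i ~ Bernoulli(1 - beta): each action is kept with probability 1 - beta.
    mask = [1.0 if rng.random() < 1.0 - beta else 0.0 for _ in probs]
    # eps keeps every entry positive, so the policy cannot collapse to all zeros.
    perturbed = [p * m + eps for p, m in zip(probs, mask)]
    total = sum(perturbed)
    return [p / total for p in perturbed]


# Toy usage: four candidate actions with made-up scores.
rng = random.Random(0)
pi = softmax([2.0, 1.0, 0.1, -1.0])
pi_tilde = action_dropout(pi, beta=0.3, eps=1e-6, rng=rng)
```

With `beta=0` no action is masked and the updated policy stays essentially equal to the original one. With `beta=1` every action is masked and the small constant alone yields a uniform distribution.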

[0037] In one embodiment, the dynamic completion module uses a weight matrix to map the current state into the space in which relation-timestamp combinations are matched; the attention score of each relation-timestamp combination is obtained through a dot product, and the attention scores are then normalized:

[0038]

[0039] where the mapping uses the weight matrix, and the resulting value represents the attention score of the combination of relation r_p and its timestamp;

[0040] The high-probability relation-timestamp combinations are fed into the pre-trained embedding model for the sparse temporal knowledge graph, i.e., the TKGE model, to obtain the probability of each corresponding candidate tail entity:

[0041]

[0042] where the head entity is the entity of the current state at the current time, E is the set of all candidate entities, and the scores are produced by the pre-trained TKGE model. The k entities with the highest probabilities are selected as target entities and combined with the x high-probability relation-timestamp pairs to generate a potential action space for the agent.
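The two-stage selection described above (attention over relation-timestamp pairs, then top-k tail entities from the TKGE model) can be sketched as follows. This is a minimal illustration under stated assumptions: softmax is assumed for both normalisations, `tkge_score` is a placeholder callable standing in for the pre-trained TKGE model, and all names and toy vectors are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, int]          # (relation, timestamp)
Action = Tuple[str, str, int]   # (relation, tail entity, timestamp)


def softmax(xs: Sequence[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_indices(values: Sequence[float], n: int) -> List[int]:
    """Indices of the n largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]


def potential_actions(state_vec: Sequence[float],
                      W: Sequence[Sequence[float]],
                      pair_vecs: Sequence[Sequence[float]],
                      pairs: Sequence[Pair],
                      entities: Sequence[str],
                      tkge_score: Callable[[str, str, str, int], float],
                      current_entity: str,
                      x: int, k: int) -> List[Action]:
    # Stage 1: project the state with W, score every pair by dot product, normalise.
    projected = [dot(row, state_vec) for row in W]
    attention = softmax([dot(projected, pv) for pv in pair_vecs])
    actions: List[Action] = []
    for i in top_indices(attention, x):
        rel, ts = pairs[i]
        # Stage 2: score every candidate tail entity and keep the k most probable.
        probs = softmax([tkge_score(current_entity, rel, e, ts) for e in entities])
        actions.extend((rel, entities[j], ts) for j in top_indices(probs, k))
    return actions
```

The result contains at most x·k candidate actions, which matches the description of combining x relation-timestamp pairs with k entities each.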

[0043] In one embodiment, the probability of a potential action is as follows:

[0044]

[0045] A parameter controls the proportion of actions to be added, and the parameter M represents the maximum number of additional actions. The size of the potential action space is defined as:

[0046]

[0047] where the rounding-up (ceiling) operation is applied, and the remaining term is the size of the existing action space in the current state;

[0048] For each state, the number x of relation-timestamp combinations selected is as follows:

[0049]

[0050] The action space update process can be represented as:

[0051]

[0052] where the symbols denote, respectively, the action space and the existing action space of the state.
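The sizing arithmetic can be illustrated with a small sketch. The formulas are not reproduced in this text, so the code assumes one reading that is consistent with the description: the number of added actions is the ceiling of the proportion times the existing size, capped at M, and x is the ceiling of that number divided by k. The names `alpha`, `n_add`, and `expand_action_space` are hypothetical.

```python
import math
from typing import Hashable, List, Sequence


def potential_space_size(n_existing: int, alpha: float, max_add: int) -> int:
    """Assumed reading: min(ceil(alpha * N), M)."""
    return min(math.ceil(alpha * n_existing), max_add)


def num_relation_time_pairs(n_add: int, k: int) -> int:
    """Assumed reading: x = ceil(n_add / k), so that x * k covers n_add."""
    return math.ceil(n_add / k)


def expand_action_space(existing: Sequence[Hashable],
                        potential: Sequence[Hashable],
                        n_add: int) -> List[Hashable]:
    """Append up to n_add new, non-duplicate potential actions."""
    merged = list(existing)
    seen = set(existing)
    for action in potential:
        if len(merged) - len(existing) >= n_add:
            break
        if action not in seen:
            merged.append(action)
            seen.add(action)
    return merged


# Worked example: 7 existing actions, proportion 0.5, cap M = 3, k = 2.
n_add = potential_space_size(7, alpha=0.5, max_add=3)   # min(ceil(3.5), 3) = 3
x = num_relation_time_pairs(n_add, k=2)                 # ceil(3 / 2) = 2
```

Under this reading, a sparse state with few outgoing edges receives only a few extra actions, while M bounds the growth for well-connected states.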

[0053] In one embodiment, the hit reward is:

[0054]

[0055] where 1(·) is an indicator function that returns 1 when its input is true and 0 when its input is false, and its arguments are the source entity of the query, the relation of the query, the target entity, the timestamp of the query, and the temporal knowledge graph.

[0056] The embedding reward is:

[0057]

[0058] where the scoring function is obtained from the embedding-based sparse temporal knowledge graph model. When the target entity does not match the answer, the agent uses this function to calculate the reward of the quadruple;

[0059] The path reward is:

[0060]

[0061] where the function is a similarity evaluation function based on Euclidean distance;

[0062] Ultimately, the multi-dimensional reward function is:

[0063]

[0064] where the coefficient is a custom hyperparameter.
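How the three rewards might fit together can be sketched as follows. None of the formulas are reproduced in this text, so every functional form here is an assumption: the embedding reward falls back to the model score on a miss, the path reward maps Euclidean distance d to 1 / (1 + d), and the final reward is a convex combination. The weights `lam_emb` and `lam_path` are hypothetical names for the custom hyperparameter(s).

```python
import math
from typing import Sequence, Set, Tuple

Quad = Tuple[str, str, str, int]  # (source entity, relation, target entity, timestamp)


def hit_reward(quad: Quad, graph: Set[Quad]) -> float:
    """Indicator function: 1 if the reached quadruple is a fact in the graph."""
    return 1.0 if quad in graph else 0.0


def embedding_reward(hit: float, model_score: float) -> float:
    """Assumed shaping: on a miss, use the embedding model's score in [0, 1]."""
    return hit + (1.0 - hit) * model_score


def path_reward(path_vec: Sequence[float], query_vec: Sequence[float]) -> float:
    """Assumed similarity from Euclidean distance d: 1 / (1 + d)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(path_vec, query_vec)))
    return 1.0 / (1.0 + d)


def total_reward(r_hit: float, r_emb: float, r_path: float,
                 lam_emb: float, lam_path: float) -> float:
    """Assumed combination: non-negative weights that sum to 1."""
    if lam_emb < 0 or lam_path < 0 or lam_emb + lam_path > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return (1.0 - lam_emb - lam_path) * r_hit + lam_emb * r_emb + lam_path * r_path


# Toy usage: the agent misses, but still receives a graded signal.
graph = {("A", "visits", "C", 2)}
r_hit = hit_reward(("A", "visits", "B", 2), graph)
r_emb = embedding_reward(r_hit, model_score=0.4)
r_path = path_reward([0.0, 1.0], [0.0, 1.0])
reward = total_reward(r_hit, r_emb, r_path, lam_emb=0.3, lam_path=0.2)
```

The point of the sketch is that a miss no longer yields a flat zero: the embedding and path terms still provide a graded signal, which is how the description motivates the multi-dimensional reward as a response to reward sparsity.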

[0065] According to a second aspect of the present disclosure, an information retrieval system for multi-hop path reasoning on a sparse temporal knowledge graph is provided, comprising:

[0066] The task definition module constructs a temporal knowledge graph dataset containing temporal information and sets up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query.

[0067] The sparse retrieval scenario modeling module generates sparse subset datasets based on temporal knowledge graph datasets to construct sparse retrieval scenarios.

[0068] The learning framework building module establishes a reinforcement learning framework, defines the state, action space, state transition rules and basic rewards, and simulates path exploration and navigation logic.

[0069] The policy network construction module evaluates the value of candidate actions and selects the optimal action by incorporating a policy network with an attention mechanism.

[0070] The dynamic completion module is constructed by combining a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios.

[0071] The reward function design module designs a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent in optimizing path selection.

[0072] The retrieval and reasoning module takes a user query as input, and the agent, guided by the policy network, performs multi-hop path search using the expanded action space, and outputs the retrieval results in combination with a multi-dimensional reward function.

[0073] According to a third aspect of the present disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph.

[0074] According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph.

[0075] The advantages of the above technical solutions adopted in this invention compared with the prior art are as follows:

[0076] 1. This invention employs a reinforcement learning inference framework, enabling path exploration through dynamic interaction between the agent and the retrieval environment, and innovatively introduces a dynamic completion module. This module addresses the problem of insufficient action space in sparse temporal knowledge graphs by expanding the agent's exploration scope and generating potential effective paths, avoiding retrieval interruptions due to missing paths. Simultaneously, an LSTM model is used to encode historical retrieval path information. This model combines long short-term memory capabilities with an attention mechanism, capturing long-term temporal dependencies in path sequences while focusing on high-value path segments. It accurately evaluates the value of each candidate path (action) in the current query state and selects the optimal path, effectively guiding the system to efficiently locate target entities and improving retrieval efficiency and accuracy.

[0077] 2. This invention constructs a semantic scoring system with a triple reward mechanism integrating hit reward, embedding reward, and path reward. The hit reward directly reflects the hit result of the target entity, ensuring the guidance of the retrieval target; the embedding reward evaluates the semantic matching degree between entities based on a temporal knowledge graph embedding model, providing reasonable value judgments for candidate entities that are not directly hit; and the path reward considers the fit between historical paths and query requirements from the perspective of path semantic integrity. The three mechanisms work synergistically to overcome the limitations of traditional single-reward mechanisms with limited evaluation dimensions, more comprehensively and accurately measuring the effectiveness of retrieval paths, guiding the agent to learn high-quality reasoning paths, and further improving the reliability of retrieval results.

[0078] 3. This invention introduces a temporal knowledge graph, which not only includes the relationships between entities but also integrates temporal information, providing rich semantic information beyond purely interactive data. During information retrieval, the system can rely on the semantic relationships between entities in the temporal knowledge graph (such as entity attributes, relationship types, and temporal dimensions) to reduce dependence on single interactive data. This avoids the model overlearning local data features due to the sparsity of purely interactive data, effectively alleviating overfitting and improving the model's generalization ability and stability in different sparse retrieval scenarios.

Attached Figure Description

[0079] The accompanying drawings, which form part of this application, are used to provide a further understanding of this application. The illustrative embodiments of this application and their descriptions are used to explain this application and do not constitute an undue limitation of this application.

[0080] Figure 1 is a flowchart illustrating the principle of the information retrieval method for multi-hop path reasoning on sparse temporal knowledge graphs.

Detailed Implementation

[0081] The present disclosure will be further described below with reference to the accompanying drawings and embodiments.

[0082] It should be noted that the following detailed descriptions are illustrative and intended to provide further explanation of this application. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains.

[0083] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this application. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0084] It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of this disclosure. It should be noted that each block in a flowchart or block diagram may represent a module, segment, or portion of code, which may include one or more executable instructions for implementing the logical functions specified in the various embodiments. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutively represented blocks may actually be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the flowcharts and / or block diagrams, and combinations of blocks in the flowcharts and / or block diagrams, may be implemented using a dedicated hardware-based system that performs the specified functions or operations, or using a combination of dedicated hardware and computer instructions.

[0085] Example 1:

[0086] This embodiment provides an information retrieval method for multi-hop path reasoning on sparse temporal knowledge graphs, including the following steps:

[0087] S1. Construct a temporal knowledge graph dataset containing temporal information, and set up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query;

[0088] Specifically, this embodiment uses two subsets of the Integrated Crisis Early Warning System (ICEWS) series: ICEWS14 and ICEWS05-15. The ICEWS data are derived from automatically captured digital news media and social media content, are primarily used for early warning, and are divided into multiple subsets. ICEWS14 contains all facts recorded in 2014, while ICEWS05-15 contains all facts recorded from 2005 to 2015. The retrieval task is to retrieve, from the temporal knowledge graph (TKG), the target entities most relevant to a given query.

[0089] S2. Generate sparse subset datasets based on temporal knowledge graph datasets and construct sparse retrieval scenarios;

[0090] To analyze the reasoning performance on temporal knowledge graphs with different sparsity levels, this invention constructs six independent sparse subsets based on ICEWS14 and ICEWS05-15 by randomly retaining different proportions of data. These subsets with different sparsity rates are used to construct sparse knowledge graph retrieval scenarios to simulate fact queries under a "cold start" environment and evaluate the system's knowledge retrieval capabilities under sparse data conditions. For ICEWS14, the retention proportions are 2%, 3%, and 5%, while for ICEWS05-15, the retention proportions are 0.3%, 0.4%, and 0.5%. These datasets are named ICEWS14-2%, ICEWS14-3%, ICEWS14-5%, ICEWS05-15-0.3%, ICEWS05-15-0.4%, and ICEWS05-15-0.5%.
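Random retention of a fixed proportion of facts can be sketched as follows. This is a minimal illustration, not the procedure used in the embodiment: the quadruple layout, the fixed seed, and the toy facts standing in for ICEWS14 are all assumptions, and no real ICEWS files are loaded.

```python
import random
from typing import List, Tuple

# A fact is a quadruple (head entity, relation, tail entity, timestamp).
Quad = Tuple[str, str, str, str]


def make_sparse_subset(facts: List[Quad], keep_ratio: float, seed: int = 0) -> List[Quad]:
    """Randomly retain about keep_ratio of the facts (at least one)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n_keep = max(1, round(len(facts) * keep_ratio))
    return rng.sample(facts, n_keep)


# Toy facts standing in for a real dataset.
toy_facts = [(f"e{i}", "consult", f"e{i + 1}", f"2014-01-{i % 28 + 1:02d}")
             for i in range(1000)]

# One subset per retention proportion, mirroring the 2%, 3%, and 5% settings.
subsets = {ratio: make_sparse_subset(toy_facts, ratio) for ratio in (0.02, 0.03, 0.05)}
```

Sampling without replacement keeps each retained fact unique, and using one seed per run makes the sparse scenario repeatable across experiments.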

[0091] S3. Establish a reinforcement learning framework, define the state, action space, state transition rules and basic rewards, and simulate path exploration and navigation logic;

[0092] Specifically, states, actions, and rewards are all modeled with the goal of correctly retrieving information. Through the interaction between the agent and its environment, the agent learns the optimal strategy under a given objective, thereby completing the reasoning task. In multi-hop path reasoning, the agent selects an action based on its current state, transitions to a new state through state transitions, and receives corresponding rewards based on the task objective, ultimately enabling the agent to learn an efficient reasoning path.

[0093] The state s_t of the agent after completing the t-hop path search is defined by the query relation, the current entity, the historical search path, and the timestamp of the queried fact;

[0094] An action in the current state is defined, using the temporal knowledge graph (TKG), by an entity related to the current entity, the relation connecting them, and the time at which the fact occurred. If the agent finds the correct target entity before reaching the maximum hop count limit, it remains at the current entity node. To this end, a self-loop action is added for each state: if the current entity equals the target entity and the number of hops in the search is still within the given limit, the agent chooses to stay at the current node. The self-loop action acts as a stopping operation, ensuring that the agent does not continue searching after finding the target entity, thereby avoiding invalid path expansion;

[0095] The complete set of actions in the current state forms the action space, which includes all possible search paths originating from the current search point;

[0096] The state transition rule is that if the agent chooses an action as the next action, the current state transitions to a new state. In this embodiment, the maximum number of hops for state transitions is limited to T; if the agent has not reached the correct target entity by t=T, the state transition terminates at hop T;

[0097] The basic reward is as follows: if the agent eventually reaches the correct target entity through a multi-hop inference path, the agent is given a reward of 1; otherwise, if the agent fails to reach the correct target entity, the agent is given a reward of 0.

[0098] The historical search path is obtained by using an attention-based mechanism and an LSTM to encode the path information of the first t hops, which captures and processes long-term dependencies in the input sequence; the encoded information is stored in the historical search path.
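The state, action, transition, and basic-reward definitions of step S3 can be sketched as a small environment. This is an illustration under assumptions: the history is kept as a raw tuple of actions rather than an LSTM encoding, and the class, field, and `SELF_LOOP` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, int]   # (head, relation, tail, timestamp)
Action = Tuple[str, str, int]      # (relation, next entity, timestamp)
SELF_LOOP = "SELF_LOOP"            # hypothetical name for the stay-in-place relation


@dataclass(frozen=True)
class State:
    query_relation: str
    current_entity: str
    history: Tuple[Action, ...]    # raw path; the description encodes it with an LSTM
    query_time: int


class PathEnv:
    def __init__(self, facts: List[Quad], max_hops: int):
        self.max_hops = max_hops
        self.out_edges: Dict[str, List[Action]] = {}
        for head, rel, tail, ts in facts:
            self.out_edges.setdefault(head, []).append((rel, tail, ts))

    def actions(self, s: State) -> List[Action]:
        # Outgoing edges of the current entity plus one self-loop (stop) action.
        stay = (SELF_LOOP, s.current_entity, s.query_time)
        return self.out_edges.get(s.current_entity, []) + [stay]

    def step(self, s: State, a: Action) -> State:
        # Transition: move to the chosen entity and extend the history.
        return State(s.query_relation, a[1], s.history + (a,), s.query_time)

    def done(self, s: State) -> bool:
        # The episode ends once the maximum number of hops T is reached.
        return len(s.history) >= self.max_hops

    def basic_reward(self, s: State, answer: str) -> float:
        return 1.0 if s.current_entity == answer else 0.0


# Toy usage: two real hops, then a self-loop to stay at the answer.
env = PathEnv([("A", "meets", "B", 1), ("B", "visits", "C", 2)], max_hops=3)
state = State("reaches", "A", (), 3)
for action in [("meets", "B", 1), ("visits", "C", 2)]:
    state = env.step(state, action)
state = env.step(state, env.actions(state)[-1])  # self-loop keeps the agent at C
```

Because the self-loop is always the last entry of the action list, the agent can use up its remaining hops without leaving the target entity.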

[0099] S4. By incorporating a policy network with an attention mechanism, evaluate the value of candidate actions and select the optimal action;

[0100] To enable the agent to travel from the initial entity, via the current entity, to the target entity, and to determine the feasibility of each action, this invention combines attention and an LSTM to construct a policy network for efficient agent reasoning. In this policy network, the historical reasoning path information is represented as:

[0101]

[0102] where the hidden state is represented by a combination of timestamps and relations and contains the past sequence of observations and actions. Historical reasoning path information is encoded as a continuous vector using an LSTM model. At the initial time t=0, a zero-padded vector is used as the incoming hidden state, and the LSTM outputs the initial hidden state based on its weights, as follows:

[0103]

[0104] where the two inputs are the initial relation and the initial timestamp, which together with the initial entity form the initial start action. The dynamic update process of the LSTM is represented as follows:

[0105]
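The recurrent encoding of the search history can be illustrated with a hand-written LSTM cell. The sketch below is only a generic LSTM under stated assumptions: the class `LSTMCell`, the function `encode_history`, the random weight initialisation, and the idea that each hop's input is a concatenated relation and timestamp embedding are all illustrative, and the attention component mentioned in the text is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Plain LSTM cell with input (i), forget (f), candidate (g), output (o) gates."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and one bias vector per gate (toy random values).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name: str, z: Sequence[float], act) -> List[float]:
        return [act(sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def step(self, x: Sequence[float], h: Sequence[float],
             c: Sequence[float]) -> Tuple[List[float], List[float]]:
        z = list(x) + list(h)                      # concatenate input and hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        o = self._gate("o", z, sigmoid)
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def encode_history(cell: LSTMCell, hop_inputs: Sequence[Sequence[float]]) -> List[float]:
    """Start from zero-padded states and fold in one input vector per hop."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in hop_inputs:
        h, c = cell.step(x, h, c)
    return h
```

Calling `encode_history(cell, [start_action_vec, hop1_vec, hop2_vec])` returns the hidden state after two hops; the first element plays the role of the initial start action built from the initial relation and timestamp.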

[0106] When the agent in the current state chooses an action, time-difference modeling is applied: the difference between the timestamp of the previous action and the timestamp of the current action is calculated, and its absolute value is used to obtain the embedding of the relative time displacement, according to the formula:

[0107]

[0108] The embedding of the action is then updated as follows:

[0109]

[0110] All possible actions in the action space of the current state are stacked to form a matrix over the real numbers, whose dimensions are determined by the dimensions of the entity vectors, relation vectors, and timestamp vectors. The policy network is defined as:

[0111]

[0112] where the state is represented by a concatenation in vector space, and two attention parameter matrices are used to extract state features and calculate action scores. Existing reinforcement-learning-based multi-hop path reasoning on sparse temporal knowledge graphs typically relies on terminal rewards to guide the search, and erroneous paths outnumber correct paths. Erroneous paths are therefore often discovered first, and subsequent path exploration may become increasingly biased towards them. For this reason, the policy network of this invention uses the ActionDropout method: by randomly masking some outgoing edges of the agent during sampling, it ensures that all rewarded trajectories receive approximately the same weight. The updated policy network is defined as follows:

[0113]

[0114] where the left-hand side is the updated policy network, computed from the original policy network; m is a mask vector whose elements m_i take values in {0, 1}, indicating whether the corresponding action is kept, and whose length is the number of possible actions in state s_t; a small positive constant is added to ensure that the updated policy is not completely zero, i.e., to prevent policy collapse; m_i ~ Bernoulli(1−β) indicates that m_i is sampled from a Bernoulli distribution, a discrete probability distribution with only two possible outcomes, with success probability 1−β. The parameter 1−β is thus the probability that m_i equals 1, and β controls how conservative the policy update is.
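The masking step can be sketched as follows. This is a minimal illustration assuming the masked policy is the original probability multiplied by the mask, plus the small constant, then renormalised; the function name `action_dropout`, the parameter name `eps`, and its default value are assumptions, while m_i and β follow the text.

```python
import random
from typing import List, Optional


def action_dropout(probs: List[float], beta: float, eps: float = 1e-6,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Randomly mask actions, add eps, and renormalise to a distribution."""
    rng = rng or random.Random()
    # m_i ~ Bernoulli(1 - beta): 1 keeps the action, 0 masks it.
    mask = [1 if rng.random() < 1.0 - beta else 0 for _ in probs]
    # eps keeps the result from collapsing to all zeros when every action is masked.
    unnormalised = [p * m + eps for p, m in zip(probs, mask)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]
```

With β = 0 nothing is masked and the original policy is recovered up to the effect of `eps`; with β = 1 every action is masked and the result is uniform, which shows why the added constant is needed.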

[0115] S5. Construct a dynamic completion module and combine it with a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios;

[0116] When handling a user query, the dynamic completion module uses the embedding model to generate possible extended candidate paths. In effect, it automatically completes semantic fragments of the query that the user did not explicitly provide, as a search engine would, thereby improving retrieval recall and enhancing the system's "information discovery" capability. The embedding model is introduced into the reinforcement-learning-based multi-hop path reasoning model on the sparse temporal knowledge graph. Combining the global reasoning capability of the embedding model with the path search capability of the multi-hop reasoning model guides the learning of the overall model, preserving reasoning accuracy while improving the interpretability of the reasoning results. When the agent is in a state whose existing action space is empty due to path sparsity, the relation and timestamp of the next action are unknown, so the probability distribution over target entities cannot be obtained directly. Computing the probability of every candidate entity under every candidate timestamp and relation would be very time-consuming. The dynamic completion module of this invention therefore uses an approximate pruning strategy: based on the current state, the agent first selects some high-probability relation-timestamp combinations from the potential action space, and then uses the embedding model to determine the probability of the target entities. To select the combinations, the current state is first mapped by a weight matrix into a space that matches the relation-timestamp combinations; the attention score of each relation-timestamp combination is then calculated using a dot product; finally, the attention scores are normalized:

[0117]

[0118] where W_att is the weight matrix used to map the current state s_t into a space that matches relations and timestamps, and the resulting value is the attention score, calculated from W_att and s_t, of the combination of relation r_p with a timestamp;
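The three steps (project the state, take dot products, normalise) can be sketched as below. It is an illustration under assumptions: the normalisation is taken to be a softmax, each relation-timestamp combination is assumed to have a single embedding vector, and the names `relation_time_attention` and `top_x` are hypothetical.

```python
import math
from typing import List, Sequence


def matvec(W: Sequence[Sequence[float]], s: Sequence[float]) -> List[float]:
    """Multiply matrix W (rows of weights) by vector s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def relation_time_attention(W_att: Sequence[Sequence[float]],
                            state_vec: Sequence[float],
                            combo_embs: Sequence[Sequence[float]]) -> List[float]:
    """Attention weight of each relation-timestamp combination for this state."""
    projected = matvec(W_att, state_vec)     # map the state into the combination space
    scores = [sum(p * c for p, c in zip(projected, emb)) for emb in combo_embs]
    return softmax(scores)


def top_x(weights: Sequence[float], x: int) -> List[int]:
    """Indices of the x combinations with the highest attention weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:x]
```

Only the combinations returned by `top_x` are passed on to the embedding model, which is what keeps the completion step from scoring every relation, timestamp, and entity triple.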

[0119] The high-probability relation-timestamp combinations are fed into a pre-trained embedding-based model for the sparse temporal knowledge graph, namely the TKGE model, to obtain the probability of the corresponding candidate tail entities:

[0120]

[0121] where the head entity is the entity at the current state, E is the set of all candidate entities, and the scoring model is the pre-trained TKGE model; the top k entities with the highest probabilities are selected as target entities and combined with the x high-probability relation-timestamp pairs to generate a potential action space for the agent;
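Generating the potential actions from the selected combinations might look like the sketch below. The pre-trained TKGE model is replaced by a caller-supplied `score_fn`, which is purely a placeholder, and turning scores into probabilities with a softmax over E is an assumption; the names `potential_actions` and `PotentialAction` are also illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (relation, tail_entity, timestamp, probability of the tail entity)
PotentialAction = Tuple[str, str, int, float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def potential_actions(current_entity: str,
                      combos: Sequence[Tuple[str, int]],
                      entities: Sequence[str],
                      score_fn: Callable[[str, str, str, int], float],
                      k: int) -> List[PotentialAction]:
    """For each relation-timestamp pair, keep the k most probable tail entities."""
    actions: List[PotentialAction] = []
    for relation, ts in combos:
        # score_fn stands in for the pre-trained TKGE scoring function.
        scores = [score_fn(current_entity, relation, e, ts) for e in entities]
        probs = softmax(scores)
        ranked = sorted(zip(entities, probs), key=lambda t: t[1], reverse=True)
        for entity, prob in ranked[:k]:
            actions.append((relation, entity, ts, prob))
    return actions
```

With x combinations and k entities per combination, the function returns at most x·k candidate actions.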

[0122] Through the dynamic completion strategy described above, the agent can flexibly generate a series of high-quality potential actions based on the current state using the pre-trained TKGE model, thereby enriching its action space. The probability of a potential action is calculated as follows:

[0123]

[0124] To flexibly control the size of the potential action space, one parameter controls the proportion of actions to be added relative to the existing action space, and the parameter M sets the maximum number of additional actions. The size of the potential action space is therefore defined as:

[0125]

[0126] where the ceiling brackets denote rounding up, and the size referred to is that of the existing action space in the current state. Using the approximate pruning strategy described above, the top k high-probability target entities are selected for each relation-timestamp combination, where k is a custom hyperparameter; therefore, to generate a potential action space of the required size for each state, the number of relation-timestamp combinations selected, x, is as follows:

[0127]
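The sizing arithmetic can be made concrete with a short sketch. The exact formulas are assumptions reconstructed from the prose: the number of added actions is read as the rounded-up proportion of the existing action space, capped at M, and x is read as the rounded-up quotient of that number by k. The names `alpha`, `num_additional_actions`, and `num_relation_time_combos` are illustrative.

```python
import math


def num_additional_actions(existing_size: int, alpha: float, max_extra: int) -> int:
    """Assumed reading: min(ceil(alpha * existing_size), M)."""
    return min(math.ceil(alpha * existing_size), max_extra)


def num_relation_time_combos(n_add: int, k: int) -> int:
    """Assumed reading: x = ceil(n_add / k), since each combination yields k entities."""
    return math.ceil(n_add / k)
```

For example, with 7 existing actions, a proportion of 0.5, and M = 10, four actions are added; with k = 3 this requires x = 2 relation-timestamp combinations. Under this reading an empty existing action space yields zero additions, so an implementation would need a separate rule for that case.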

[0128] In summary, during the reasoning process of the reinforcement-learning-based multi-hop path reasoning model on a sparse temporal knowledge graph, this invention employs a dynamic completion method. This method relies on the approximate pruning strategy and the pre-trained TKGE model to obtain the probability of potential actions. For each state, it creates a set of high-quality potential actions of the size defined above. These additional actions are merged with the existing action space, enriching the agent's action space. This update process can be represented as:

[0129]

[0130] where the result is the updated action space.
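The merge can be sketched as an order-preserving union. Removing duplicates is an assumption made for this illustration; the text only states that the additional actions are merged with the existing ones. The name `merge_action_spaces` is hypothetical.

```python
from typing import Hashable, Iterable, List


def merge_action_spaces(existing: Iterable[Hashable],
                        additional: Iterable[Hashable]) -> List[Hashable]:
    """Union of both action lists, existing actions first, duplicates dropped."""
    seen = set()
    merged: List[Hashable] = []
    for action in list(existing) + list(additional):
        if action not in seen:
            seen.add(action)
            merged.append(action)
    return merged
```

Keeping the existing actions first means that an edge already present in the graph is never displaced by a completed edge that duplicates it.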

[0131] S6. Design a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent to optimize path selection;

[0132] In addition to the common hit reward function, an embedding-based reward function and a path-based reward function are added. The embedding-based reward function measures the semantic matching degree between entities, while the path-based reward function measures the semantic integrity of the retrieval path, which comprehensively reflects the information relevance modeling of the entire chain from "search-path navigation-hit".

[0133] The hit reward is:

[0134]

[0135] Here, 1() is an indicator function that returns 1 when its input is true and 0 when its input is false. Besides the reward sparsity problem, sparse temporal knowledge graphs are inherently incomplete, and the binary hit reward gives false negative search results the same reward as true negative search results. To alleviate these problems, an embedding-based reward function is added, defined as:

[0136]

[0137] where the scoring function is obtained from the embedding-based sparse temporal knowledge graph model. When the entity reached does not match the answer, the agent receives the score that this function assigns to the resulting quadruple as its reward;

[0138] To further alleviate the reward sparsity problem, this invention uses a path-based reward function, expressed as:

[0139]

[0140] where the function used is a similarity evaluation function based on Euclidean distance. By introducing a path-based reward function, the model can more accurately account for the relationship between embedded information and the entire reasoning path, further mitigating the reward sparsity problem and enhancing the performance of multi-hop path reasoning models on sparse temporal knowledge graphs.

[0141] Finally, the new reward function is obtained:

[0142]

[0143] where the three terms are the hit reward function, the embedding-based reward function, and the path-based reward function, respectively, and the weighting coefficient is a custom hyperparameter. By combining the three reward functions, the model obtains more comprehensive and accurate rewards, thereby effectively improving the quality of the inference path.
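One way to combine the three rewards is sketched below. Every formula in it is an assumption made for illustration: the similarity is taken as 1 / (1 + Euclidean distance), the soft rewards are applied only when the hit reward is 0, and a single weight `lam` in [0, 1] balances the embedding and path rewards. Which vectors the path reward compares is not specified here, so `u` and `v` are placeholders.

```python
import math
from typing import Sequence


def hit_reward(final_entity: str, answer: str) -> float:
    """Indicator reward: 1.0 on an exact hit, otherwise 0.0."""
    return 1.0 if final_entity == answer else 0.0


def euclidean_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed similarity: 1 / (1 + Euclidean distance), which lies in (0, 1]."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)


def combined_reward(r_hit: float, r_emb: float, r_path: float, lam: float) -> float:
    """Assumed combination: a hit keeps reward 1; a miss receives a weighted soft reward."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return r_hit + (1.0 - r_hit) * (lam * r_emb + (1.0 - lam) * r_path)
```

Under this form a correct hit always scores 1, while a miss is no longer indistinguishable from every other miss: a path ending near a plausible answer earns a larger reward than one ending far away, which is the effect the text attributes to the added rewards.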

[0144] When a user query is input, the agent, guided by the policy network, performs multi-hop path search using the expanded action space and outputs the search results in conjunction with a multi-dimensional reward function.

[0145] Example 2:

[0146] This embodiment provides an information retrieval system for multi-hop path reasoning on a sparse temporal knowledge graph, including:

[0147] The task definition module constructs a temporal knowledge graph dataset containing temporal information and sets up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query.

[0148] The sparse retrieval scenario modeling module generates sparse subset datasets based on temporal knowledge graph datasets to construct sparse retrieval scenarios.

[0149] The learning framework building module establishes a reinforcement learning framework, defines the state, action space, state transition rules and basic rewards, and simulates path exploration and navigation logic.

[0150] The policy network construction module evaluates the value of candidate actions and selects the optimal action by incorporating a policy network with an attention mechanism.

[0151] The dynamic completion module is constructed by combining a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios.

[0152] The reward function design module designs a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent in optimizing path selection.

[0153] The retrieval and reasoning module takes a user query as input, and the agent, guided by the policy network, performs multi-hop path search using the expanded action space, and outputs the retrieval results in combination with a multi-dimensional reward function.

[0154] Example 3:

[0155] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the aforementioned information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph, comprising:

[0156] Construct a temporal knowledge graph dataset containing temporal information, and set up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query;

[0157] Sparse retrieval scenarios are constructed by generating sparse subset datasets based on temporal knowledge graph datasets.

[0158] Establish a reinforcement learning framework, define the state, action space, state transition rules and basic rewards, and simulate path exploration and navigation logic;

[0159] By incorporating a policy network with an attention mechanism, the value of candidate actions is evaluated and the optimal action is selected.

[0160] A dynamic completion module is constructed, which is combined with a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios;

[0161] Design a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent to optimize path selection;

[0162] When a user query is input, the agent, guided by the policy network, performs multi-hop path search using the expanded action space and outputs the search results in conjunction with a multi-dimensional reward function.

[0163] Example 4:

[0164] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the aforementioned information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph, comprising:

[0165] Construct a temporal knowledge graph dataset containing temporal information, and set up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query;

[0166] Sparse retrieval scenarios are constructed by generating sparse subset datasets based on temporal knowledge graph datasets.

[0167] Establish a reinforcement learning framework, define the state, action space, state transition rules and basic rewards, and simulate path exploration and navigation logic;

[0168] By incorporating a policy network with an attention mechanism, the value of candidate actions is evaluated and the optimal action is selected.

[0169] A dynamic completion module is constructed, which is combined with a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios;

[0170] Design a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent to optimize path selection;

[0171] When a user query is input, the agent, guided by the policy network, performs multi-hop path search using the expanded action space and outputs the search results in conjunction with a multi-dimensional reward function.

[0172] Those skilled in the art will understand that the modules or steps described above can be implemented using general-purpose computer devices. Optionally, they can be implemented using computer-executable program code, which can then be stored in a storage device for execution by a computer device. Alternatively, they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. This disclosure is not limited to any particular combination of hardware and software.

[0173] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0174] While the specific embodiments of this disclosure have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of this disclosure. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of this disclosure are still within the scope of protection of this disclosure.

Claims

1. An information retrieval method for multi-hop path reasoning on sparse temporal knowledge graphs, characterized in that it includes the following steps: constructing a temporal knowledge graph dataset containing temporal information, and setting up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query; generating sparse subset datasets based on the temporal knowledge graph dataset to construct sparse retrieval scenarios; establishing a reinforcement learning framework, defining the state, action space, state transition rules and basic rewards, and simulating path exploration and navigation logic; evaluating the value of candidate actions and selecting the optimal action by incorporating a policy network with an attention mechanism; constructing a dynamic completion module, which is combined with a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios; designing a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent to optimize path selection; and, when a user query is input, having the agent, guided by the policy network, perform multi-hop path search using the expanded action space and output the search results in combination with the multi-dimensional reward function; wherein the dynamic completion module uses a weight matrix to map the current state into a space that matches the relation-timestamp combinations, then obtains the attention score of each relation-timestamp combination through a dot product, and finally normalizes the attention scores, where W_att is the weight matrix and the resulting value is the attention score of relation r_p combined with a timestamp; the high-probability relation-timestamp combinations are fed into the pre-trained sparse temporal knowledge graph model, i.e., the TKGE model, to obtain the probability of the corresponding candidate tail entities, where the head entity is the entity at the current state, E is the set of all candidate entities, and the scoring model is the pre-trained TKGE model; the top k entities with the highest probabilities are selected as target entities and combined with the x high-probability relation-timestamp pairs to generate a potential action space for the agent; the hit reward is defined by an indicator function 1() over the source entity being queried, the relation of the query, the target entity, the timestamp of the query, and the temporal knowledge graph, the indicator function returning 1 when its input is true and 0 when its input is false; the embedding reward is given by a scoring function obtained from the embedding-based sparse temporal knowledge graph model, such that when the entity reached does not match the answer, the agent receives the score of the resulting quadruple as its reward; the path reward is given by a similarity evaluation function based on Euclidean distance; and the multi-dimensional reward function combines the three rewards using a custom hyperparameter.

2. The information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph according to claim 1, characterized in that, in the reinforcement learning framework, the state of the agent after completing the t-hop path search is defined by the query relation, the current entity, the historical search path, and the timestamp of the queried fact; an action in the current state is defined, using the temporal knowledge graph (TKG), by a relation leaving the current entity, the entity that relation leads to, and the time at which the corresponding fact occurred; the complete set of actions in the current state forms the action space, which includes all possible retrieval paths starting from the current retrieval point; the state transition rule is that, if the agent chooses an action as its next move, the current state transitions to a new state; the maximum number of hops is set to T, and if the agent has not reached the correct target entity by t=T, the search terminates in the state reached at hop T; the basic reward is as follows: if the agent eventually reaches the correct target entity through a multi-hop inference path, the agent is given a reward of 1; otherwise, the agent is given a reward of 0.

3. The information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph according to claim 1, characterized in that a self-loop action is configured for each state; if the agent locates the correct target entity before the maximum hop count is reached, that is, if the current entity is equal to the target entity, it takes the self-loop action and remains at the target entity node for the remaining hops; the self-loop action acts as a stop operation, ensuring that the agent does not continue searching after finding the target entity.

4. The information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph according to claim 1, characterized in that, in the policy network that incorporates the attention mechanism, the historical reasoning path information is represented by a hidden state formed from a combination of timestamps and relations and containing the past sequence of observations and actions; the historical reasoning path information is encoded as a continuous vector using an LSTM model; at the initial time t=0, a zero-padded vector is used as the incoming hidden state, and the LSTM outputs the initial hidden state based on its weights from the initial relation and the initial timestamp, which together with the initial entity form the initial start action; the LSTM then updates the hidden state dynamically at each hop; when the agent in the current state chooses an action, the difference between the timestamp of the previous action and the timestamp of the current action is calculated, and its absolute value yields the embedding of the relative time displacement, with which the embedding of the action is updated; all possible actions in the action space of the current state are stacked to form a matrix over the real numbers whose dimensions are determined by the dimensions of the entity vector, relation vector, and timestamp vector; the policy network is updated using a mask vector m whose elements m_i take values in {0, 1}, indicating whether the corresponding action is kept, and whose length is the number of possible actions in state s_t; a small positive number is added to the masked policy; and m_i ~ Bernoulli(1−β), meaning that m_i is sampled from a Bernoulli distribution with success probability 1−β, which is the probability that m_i equals 1.

5. The information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph according to claim 1, characterized in that the probability of each potential action is calculated; one parameter controls the proportion of actions to be added, and the parameter M represents the maximum number of additional actions, which together define the size of the potential action space, the definition using a round-up operation and the size of the existing action space in the current state; the top k high-probability target entities are selected for each relation-timestamp combination, where k is a custom hyperparameter; the number of relation-timestamp combinations selected, x, is chosen so that a potential action space of that size is generated for each state; and the additional actions are merged with the existing action space to give the updated action space.

6. An information retrieval system for multi-hop path reasoning on a sparse temporal knowledge graph, used to implement the information retrieval method according to any one of claims 1-5, characterized in that it includes: a task definition module, which constructs a temporal knowledge graph dataset containing temporal information and sets up a retrieval task to retrieve target entities from the temporal knowledge graph that match the semantics of the user query; a sparse retrieval scenario modeling module, which generates sparse subset datasets based on temporal knowledge graph datasets to construct sparse retrieval scenarios; a learning framework building module, which establishes a reinforcement learning framework, defines the state, action space, state transition rules and basic rewards, and simulates path exploration and navigation logic; a policy network construction module, which evaluates the value of candidate actions and selects the optimal action by incorporating a policy network with an attention mechanism; a dynamic completion module, which is constructed by combining a pre-trained temporal knowledge graph embedding model to expand the action space in sparse retrieval scenarios; a reward function design module, which designs a multi-dimensional reward function that integrates hit rewards, embedding rewards, and path rewards to guide the agent in optimizing path selection; and a retrieval and reasoning module, which takes a user query as input, whereupon the agent, guided by the policy network, performs multi-hop path search using the expanded action space and outputs the retrieval results in combination with the multi-dimensional reward function.

7. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph as described in any one of claims 1-5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the information retrieval method for multi-hop path reasoning on a sparse temporal knowledge graph as described in any one of claims 1-5.