Knowledge graph-based question synthesis method and device, storage medium and computer device

By combining knowledge graphs with generative models, the method generates natural language questions with clear logical paths and rich semantics, addressing the tendency of traditional methods to produce broken logical chains and enabling the generation of high-quality multi-hop reasoning question-answering data.

CN122242607A · Pending · Publication Date: 2026-06-19 · 北京银联金卡科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
北京银联金卡科技有限公司
Filing Date
2026-03-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional data synthesis methods based on RAG or LLMs struggle to ensure that the retrieved context contains a clear, complete logical path suitable for multi-hop reasoning. As a result, control over the generated questions relies mainly on external templates, which cannot provide fine-grained control over the logical structure and are prone to logical errors.

Method used

By combining the structured information of knowledge graphs with the language modeling capabilities of generative models, the method generates synthetic question-answer data for complex multi-hop reasoning with clear logical paths, rich semantics, and diverse sentence structures. A dual-perspective processing mode extracts structured logical connections and generalized semantic elements from the knowledge graph and fuses them with dynamic weighting to generate natural language questions.

Benefits of technology

The generated questions not only conform to the inherent logic of knowledge graphs, but also possess fluency and generalization ability close to human expression. This effectively solves the problem that the logical chain of traditional models is prone to breakage and deviation during multi-step reasoning, and improves the accuracy and naturalness of the questions.

✦ Generated by Eureka AI based on patent content.

Figure CN122242607A_ABST

Abstract

This application discloses a method, apparatus, storage medium, and computer device for question synthesis based on knowledge graphs. The method includes: acquiring a target answer and its associated subgraph structure from a knowledge graph; generating a first semantic element and a second semantic element based on the subgraph structure, wherein the first semantic element is extracted from the topological relationships of the subgraph structure, and the second semantic element is generated based on the semantic features of the subgraph structure and the target answer; fusing the first and second semantic elements to obtain a fused feature; and generating a natural language question pointing to the target answer based on the fused feature and the subgraph structure. Through fusion, structural constraints and semantic expression complement each other, improving the naturalness and expressive diversity of the questions while ensuring their accuracy and logic, and effectively solving the problem of logical chains that easily break or drift in traditional data synthesis tasks.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence and deep learning technology, and in particular to a method, apparatus, storage medium and computer device for question synthesis based on knowledge graphs.

Background Art

[0002] The collection and management of high-quality training data is fundamental to the development of high-performance text models, but this process is typically very costly and time-consuming. Therefore, generating high-quality synthetic data is crucial for the continued development of large models.

[0003] Traditional data synthesis methods based on RAG or LLMs guide the generation process by retrieving relevant context. However, whether it is the vector database used by RAG or the prompts used with an LLM, the retrieval is essentially unstructured, making it difficult to ensure that the retrieved context contains a clear, complete logical path suitable for multi-hop reasoning. If the correct answer is not retrieved, the model may hallucinate, leading to serious logical errors. Consequently, question generation relies mainly on external templates to control question design. This coarse-grained control mechanism can govern the semantic type of a question but cannot provide fine-grained control over the precise internal logical structure required for multi-hop reasoning.

Summary of the Invention

[0004] In view of this, this application provides a method, apparatus, storage medium and computer device for question synthesis based on knowledge graphs. By combining the structured information of knowledge graphs with the language modeling capabilities of generative models, it produces synthetic question-answering data for complex multi-hop reasoning with clear logical paths, rich semantics and diverse sentence structures.

[0005] According to a first aspect of this application, a question synthesis method based on a knowledge graph is provided, the method comprising: obtaining a target answer and its associated subgraph structure from the knowledge graph; generating a first semantic element and a second semantic element based on the subgraph structure, wherein the first semantic element is extracted from the topological relationships of the subgraph structure and expresses the structured logical connections between entities, and the second semantic element is generated based on the semantic features of the subgraph structure and the target answer and expresses the generalized semantic intent of the context; fusing the first semantic element and the second semantic element to obtain a fused feature; and generating, based on the fused feature and the subgraph structure, a natural language question pointing to the target answer.

[0006] Optionally, the method further includes: determining the entity nodes and edge relationships in the subgraph structure that are within a first preset hop count range of the target answer, wherein the edge relationships represent a reasoning path that terminates at the target answer; converting the name tags and attribute values of the entity nodes into entity text; determining semantic connectors based on the relation types of the edge relationships; and combining the entity text with the semantic connectors according to the logical order of the edge relationships to form the first semantic element with an explicit logical chain.
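The chain construction in [0006] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Edge` type, the `CONNECTORS` table, the bracketed connector format and all helper names are hypothetical, since the application does not specify how connectors are worded or how entity text is rendered.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical relation-type -> connector table; the application does not
# specify the actual connector vocabulary.
CONNECTORS = {
    "founded_by": "was founded by",
    "born_in": "was born in",
    "located_in": "is located in",
}


@dataclass(frozen=True)
class Edge:
    head: str      # entity text of the head node
    relation: str  # relation type, used to look up a semantic connector
    tail: str      # entity text of the tail node


def entity_text(name: str, attributes: dict[str, str] | None = None) -> str:
    """Render a node's name tag and attribute values as entity text."""
    if not attributes:
        return name
    rendered = ", ".join(f"{key}: {value}" for key, value in attributes.items())
    return f"{name} ({rendered})"


def build_logic_chain(path: list[Edge], max_hops: int) -> str:
    """Join entity text and connectors along an ordered reasoning path.

    `path` must be ordered so that each edge starts where the previous one
    ends, with the last edge terminating at the target answer.
    """
    if not path:
        raise ValueError("reasoning path is empty")
    if len(path) > max_hops:
        raise ValueError("path exceeds the first preset hop count range")
    parts = [path[0].head]
    for previous, edge in zip([None] + path[:-1], path):
        if previous is not None and edge.head != previous.tail:
            raise ValueError("edges are not in logical order")
        # Fall back to the raw relation name when no connector is defined.
        connector = CONNECTORS.get(edge.relation, edge.relation.replace("_", " "))
        parts.append(f"[{connector}]")
        parts.append(edge.tail)
    return " ".join(parts)


if __name__ == "__main__":
    chain = build_logic_chain(
        [Edge("Company X", "founded_by", "Person Y"),
         Edge("Person Y", "born_in", entity_text("City A"))],
        max_hops=3,
    )
    print(chain)  # Company X [was founded by] Person Y [was born in] City A
```

Rejecting out-of-order edges keeps the explicit logical chain faithful to the graph rather than silently reordering it.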

[0007] Optionally, the method further includes: encoding the subgraph structure and the text corresponding to the target answer through a multi-layer self-attention mechanism to generate joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; and, with maximizing language fluency and semantic coverage as the optimization objective, decoding and mapping the joint semantic features through a first generative model to generate a second semantic element containing synonym substitutions or generalized expressions.

[0008] Optionally, the step of fusing the first semantic element and the second semantic element to obtain a fused feature includes: mapping the first semantic element and the second semantic element into a unified vector space to obtain a first feature vector and a second feature vector, respectively; concatenating the first feature vector and the second feature vector to obtain a concatenated vector; inputting the concatenated vector into an adaptive gating network and computing a gating vector from the semantic distribution of the concatenated vector using the network's sigmoid activation function, wherein each element of the gating vector represents the relative importance weight of the first semantic element and the second semantic element in the corresponding dimension; and performing nonlinear weighted fusion of the first feature vector and the second feature vector according to the gating vector to determine the fused feature.

[0009] Optionally, the step of performing nonlinear weighted fusion of the first feature vector and the second feature vector based on the gating vector to obtain the fused feature includes: computing the element-wise product of the gating vector and the first feature vector to obtain a first weighted term; subtracting the gating vector from the unit vector (the vector whose elements are all 1) and multiplying the difference element-wise with the second feature vector to obtain a second weighted term; and summing the first weighted term and the second weighted term to obtain the fused feature.
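A minimal plain-Python sketch of the gated fusion in [0008] and [0009]. It assumes the adaptive gating network is a single linear layer followed by a sigmoid, g = sigmoid(W [e1; e2] + b); the application does not state the network's depth, and `W`, `b` and the function names are hypothetical. The fused feature is then F = g * e1 + (1 - g) * e2, computed element-wise, where 1 is the all-ones vector.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gating_vector(e1: Sequence[float], e2: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """g = sigmoid(W [e1; e2] + b): one gate value per feature dimension.

    `weights` has one row per output dimension (d rows of length 2d).
    """
    concatenated = list(e1) + list(e2)  # the concatenated vector [e1; e2]
    return [sigmoid(sum(w * x for w, x in zip(row, concatenated)) + b)
            for row, b in zip(weights, bias)]


def gated_fusion(e1: Sequence[float], e2: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Fused feature F = g * e1 + (1 - g) * e2, computed element-wise."""
    if len(e1) != len(e2):
        raise ValueError("feature vectors must share one vector space")
    g = gating_vector(e1, e2, weights, bias)
    first_term = [gi * a for gi, a in zip(g, e1)]           # g ⊙ e1
    second_term = [(1.0 - gi) * b for gi, b in zip(g, e2)]  # (1 - g) ⊙ e2
    return [x + y for x, y in zip(first_term, second_term)]
```

With all-zero weights and bias every gate equals 0.5, so F is the plain average of the two vectors; a large positive bias pushes g toward 1 and F toward the structured first feature vector.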

[0010] Optionally, generating a natural language question pointing to the target answer based on the fused feature and the subgraph structure includes: obtaining joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; inputting the fused feature and the joint semantic features into a second generative model, which, through its cross-attention mechanism, simultaneously takes into account the contextual constraints of the joint semantic features and the semantic constraints of the fused feature and autoregressively predicts candidate questions; and syntactically verifying the candidate questions using a natural language processing tool, the candidate questions that pass verification being taken as the natural language questions.
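The application leaves the natural language processing tool used for syntactic verification unspecified. As a stand-in, the sketch below applies a few surface-level heuristics (trailing question mark, interrogative or auxiliary opening word, minimum length) to filter English candidate questions; a production system would more likely run a full syntactic parser. Every rule, word list and function name here is an assumption made for illustration.

```python
import re
from typing import List

# Hypothetical heuristics standing in for the unspecified NLP tool.
INTERROGATIVES = {"what", "which", "who", "whom", "whose", "where", "when", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "has", "have", "had", "will", "would"}
MIN_TOKENS = 3


def passes_syntax_check(question: str) -> bool:
    """Return True if a candidate looks like a well-formed English question."""
    text = question.strip()
    if not text.endswith("?"):
        return False
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if len(tokens) < MIN_TOKENS:
        return False
    # A question should open with an interrogative word or an auxiliary verb.
    return tokens[0] in INTERROGATIVES or tokens[0] in AUXILIARIES


def filter_candidates(candidates: List[str]) -> List[str]:
    """Keep only the candidate questions that pass verification."""
    return [q for q in candidates if passes_syntax_check(q)]
```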

[0011] Optionally, the step of simultaneously taking into account the contextual constraints of the joint semantic features and the semantic constraints of the fused feature through the cross-attention mechanism of the second generative model and autoregressively predicting candidate questions includes: computing, through the cross-attention layer of the second generative model, a first attention distribution between the current decoding state and the joint semantic features and a second attention distribution between the current decoding state and the fused feature, wherein the current decoding state is determined from the already-generated prefix sequence; performing weighted fusion of the first attention distribution and the second attention distribution to obtain a comprehensive context representation; predicting the probability distribution of the current token based on the comprehensive context representation; and sequentially sampling target tokens from the probability distribution until the iteration ends, thereby obtaining the candidate question.
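The decoding loop in [0011] might look roughly like the plain-Python sketch below. Several points are assumptions: the two attention distributions are fused by mixing the context vectors they produce with a fixed weight `lam` (the application does not say how the weighting is determined); keys and values are the feature vectors themselves; the decoding state is simply the mean embedding of the prefix, standing in for a real decoder's self-attention stack; and `output_proj`, `bos` and `eos` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(state: Vector, memory: List[Vector]):
    """Scaled dot-product cross-attention of one decoding state over a memory."""
    scale = 1.0 / math.sqrt(len(state))
    weights = softmax([dot(state, m) * scale for m in memory])  # attention distribution
    context = [sum(w * m[j] for w, m in zip(weights, memory))
               for j in range(len(memory[0]))]
    return weights, context


def next_token_probs(state: Vector, joint_feats: List[Vector], fused_feats: List[Vector],
                     output_proj: List[Vector], lam: float = 0.5) -> Vector:
    _, ctx_joint = attend(state, joint_feats)  # first attention distribution
    _, ctx_fused = attend(state, fused_feats)  # second attention distribution
    # Weighted fusion into a comprehensive context representation.
    context = [lam * a + (1.0 - lam) * b for a, b in zip(ctx_joint, ctx_fused)]
    return softmax([dot(row, context) for row in output_proj])


def generate(embeddings: List[Vector], joint_feats: List[Vector], fused_feats: List[Vector],
             output_proj: List[Vector], bos: int, eos: int,
             max_len: int = 16, seed: int = 0) -> List[int]:
    """Autoregressively sample token ids until `eos` or `max_len` is reached."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    prefix = [bos]
    while len(prefix) < max_len:
        # Toy decoding state derived from the already-generated prefix.
        state = [sum(embeddings[t][j] for t in prefix) / len(prefix) for j in range(dim)]
        probs = next_token_probs(state, joint_feats, fused_feats, output_proj)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        prefix.append(token)
        if token == eos:
            break
    return prefix[1:]
```

Sampling with a seeded `random.Random` keeps runs reproducible; greedy decoding would instead take the arg-max of `probs` at each step.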

[0012] Optionally, obtaining the target answer and its associated subgraph structure from the knowledge graph includes: selecting at least one entity or one relationship path from the knowledge graph as the target answer; and, centered on the target answer, extracting from the knowledge graph the neighboring entities and their connections within a second preset hop count range of the target answer to form the subgraph structure, wherein the second preset hop count range is greater than the first preset hop count range.

[0013] According to a second aspect of this application, a knowledge graph-based question synthesis apparatus is provided, the apparatus comprising: an acquisition module used to acquire a target answer and its associated subgraph structure from the knowledge graph; a guiding word generation module used to generate a first semantic element and a second semantic element based on the subgraph structure, wherein the first semantic element is extracted from the topological relationships of the subgraph structure and expresses the structured logical connections between entities, and the second semantic element is generated based on the semantic features of the subgraph structure and the target answer and expresses the generalized semantic intent of the context; a fusion module used to fuse the first semantic element and the second semantic element to obtain a fused feature; and a sentence generation module used to generate, based on the fused feature and the subgraph structure, a natural language question pointing to the target answer.

[0014] Optionally, the guiding word generation module is specifically used to: determine the entity nodes and edge relationships in the subgraph structure that are within a first preset hop count range of the target answer, wherein the edge relationships represent a reasoning path that terminates at the target answer; convert the name tags and attribute values of the entity nodes into entity text; determine semantic connectors based on the relation types of the edge relationships; and combine the entity text with the semantic connectors according to the logical order of the edge relationships to form the first semantic element with an explicit logical chain.

[0015] Optionally, the guiding word generation module is specifically used to encode the subgraph structure and the text corresponding to the target answer through a multi-layer self-attention mechanism to generate joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; and, with maximizing language fluency and semantic coverage as the optimization objective, decode and map the joint semantic features through a first generative model to generate the second semantic element containing synonym substitutions or generalized expressions.

[0016] Optionally, the fusion module is specifically used to map the first semantic element and the second semantic element to a unified vector space to obtain a first feature vector and a second feature vector; concatenate the first feature vector and the second feature vector to obtain a concatenated vector; input the concatenated vector into an adaptive gating network, and calculate a gating vector based on the semantic distribution of the concatenated vector using the sigmoid activation function of the adaptive gating network, wherein each element value in the gating vector represents the importance weight of the first semantic element and the second semantic element in their corresponding dimension; and perform nonlinear weighted fusion on the first feature vector and the second feature vector according to the gating vector to determine the fused feature.

[0017] Optionally, the fusion module is specifically used to compute the element-wise product of the gating vector and the first feature vector to obtain a first weighted term; compute the element-wise product of the second feature vector and the difference between the unit vector (all elements equal to 1) and the gating vector to obtain a second weighted term; and sum the first weighted term and the second weighted term to obtain the fused feature.

[0018] Optionally, the sentence generation module is specifically used to: obtain joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; input the fused feature and the joint semantic features into a second generative model, which, through its cross-attention mechanism, simultaneously takes into account the contextual constraints of the joint semantic features and the semantic constraints of the fused feature and autoregressively predicts candidate questions; and verify the syntactic fluency of the candidate questions using a natural language processing tool, taking the candidate questions that pass verification as the natural language questions.

[0019] Optionally, the sentence generation module is specifically used to: compute, through the cross-attention layer of the second generative model, a first attention distribution between the current decoding state and the joint semantic features and a second attention distribution between the current decoding state and the fused feature, wherein the current decoding state is determined from the already-generated prefix sequence; perform weighted fusion of the first attention distribution and the second attention distribution to obtain a comprehensive context representation; predict the probability distribution of the current token based on the comprehensive context representation; and sequentially sample target tokens from the probability distribution until the iteration ends, thereby obtaining the candidate question.

[0020] Optionally, the acquisition module is specifically used to select at least one entity or a relationship path from the knowledge graph as the target answer; with the target answer as the center, extract neighboring entities and their connection relationships that are within a second preset hop count range from the knowledge graph to form the subgraph structure; wherein, the second preset hop count range is greater than the first preset hop count range.

[0021] According to a third aspect of this application, a readable storage medium is provided on which a program or instructions are stored, which, when executed by a processor, implement the steps of the above-described knowledge graph-based question synthesis method.

[0022] According to a fourth aspect of this application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the above-described knowledge graph-based question synthesis method.

[0023] With the above technical solution, after the target answer and its associated subgraph structure are determined in the knowledge graph, a dual-perspective processing mode is adopted: first, highly deterministic entities and relationships are extracted directly from the structural information of the knowledge graph as the structured first semantic elements; second, a first generative model decodes the joint representation of the target answer and the subgraph structure into a set of generated second semantic elements with better generalization and linguistic fluency. The semantic elements from these two sources are dynamically weighted and fused, and the fused features serve as guiding information for generating the final multi-hop reasoning natural language question. Fusion thus makes structural constraints and semantic expression complementary, ensuring the accuracy and logic of the generated questions while improving their naturalness and expressive diversity. The resulting questions conform to the inherent logic of the knowledge graph and approach human expression in fluency and generalization, effectively addressing the tendency of traditional models to produce broken or drifting logical chains in multi-step reasoning.

[0024] The above description is only an overview of the technical solution of this application. To allow the technical means of this application to be understood more clearly and implemented in accordance with the specification, and to make the above and other objects, features and advantages of this application more apparent, specific embodiments of this application are given below.

Brief Description of the Drawings

[0025] The accompanying drawings, which are included to provide a further understanding of this application and form part of it, illustrate exemplary embodiments and serve to explain this application, but do not unduly limit it. In the drawings:
Figure 1 is a flowchart of the knowledge graph-based question synthesis method provided in an embodiment of this application;
Figure 2 is a schematic diagram of the operational logic of the knowledge graph-based question synthesis method provided in an embodiment of this application;
Figure 3 is a structural block diagram of the knowledge graph-based question synthesis apparatus provided in an embodiment of this application;
Figure 4 is a schematic diagram of the electronic structure of a computer device provided in an embodiment of this application.

Detailed Description

[0026] The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present application can be combined with each other.

[0027] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.

[0028] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this application specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be “connected” or “attached” to another element, it can be directly connected or attached to the other element, or intervening elements may be present. Furthermore, the term “and/or” as used herein includes any and all combinations of one or more of the associated listed items.

[0029] Exemplary embodiments according to this application will now be described in more detail with reference to the accompanying drawings. However, these exemplary embodiments may be implemented in many different forms and should not be construed as being limited to the embodiments set forth herein. It should be understood that these embodiments are provided so that the disclosure of this application is thorough and complete, and that the concept of these exemplary embodiments is fully conveyed to those skilled in the art.

[0030] This embodiment provides a question synthesis method based on knowledge graphs. As shown in Figure 1, the method includes: Step 101: Obtain the target answer and its associated subgraph structure from the knowledge graph.

[0031] In this context, a knowledge graph stores various types of knowledge in a structured knowledge representation, consisting of entities (nodes) and relationships (edges). The target answer refers to the baseline content selected from the knowledge graph that ultimately needs to be pointed to by the generated natural language question; it can be one or more entities or a relationship path. A subgraph structure refers to a local region in the knowledge graph that is directly or indirectly related to the target answer.

[0032] It can be understood that the names, labels, and attribute values of each node and edge in a knowledge graph can be transformed into a descriptive natural language text sequence.

[0033] In practical application scenarios, step 101 specifically includes the following steps: Step 101-1: Select at least one entity or one relationship path from the knowledge graph as the target answer.

[0034] Step 101-2: Centered on the target answer, extract neighboring entities and their connections that are within the second preset hop count range of the target answer from the knowledge graph to form the subgraph structure.

[0035] The second preset hop count range can be set according to the relevance requirements and difficulty level of information retrieval, so as to introduce richer contextual semantics and candidate entities. For example, the second preset hop count range can be set to 3 to 5 hops.

[0036] In this embodiment, limiting the extraction scope to the second preset hop count range around the target answer allows all intermediate entities and relationships needed to deduce the target answer from the context entities to be covered, forming one or more complete reasoning paths. This ensures that all facts required by the subsequently generated questions are present in the subgraph, lets the model focus on key information, and avoids the computational redundancy and attention dispersion that would result from using the entire knowledge graph as context input. It thus provides a solid factual foundation for generating well-founded multi-hop reasoning questions.

[0037] For example, as shown in Figure 2, an entity or a complete relation path is selected as the target answer A from a large-scale knowledge graph database according to predefined criteria such as domain, entity type, or relation complexity. After the target answer A is selected, a breadth-first search (BFS) is used to extract its neighboring entities and the relationships between them within N hops (N is a configurable hyperparameter, usually 4 or 5), forming a subgraph structure C that represents the context of the target answer A.
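The N-hop extraction in [0037] can be sketched as a breadth-first search over an adjacency list built from (head, relation, tail) triples. The sketch assumes that hops are counted without regard to edge direction and that the subgraph keeps every triple whose two endpoints both fall within N hops; the application specifies neither, and `extract_subgraph` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def extract_subgraph(triples: List[Triple], answer: str,
                     max_hops: int) -> Tuple[Dict[str, int], List[Triple]]:
    """Return hop distances from `answer` and the triples inside the N-hop region."""
    neighbors: Dict[str, List[str]] = {}
    for head, _, tail in triples:
        # Treat edges as undirected when measuring hop distance.
        neighbors.setdefault(head, []).append(tail)
        neighbors.setdefault(tail, []).append(head)

    distance = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the N-hop boundary
        for nxt in neighbors.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    subgraph = [t for t in triples if t[0] in distance and t[2] in distance]
    return distance, subgraph
```

Because BFS visits nodes in order of increasing distance, the first distance recorded for each entity is its shortest hop count, which is what the hop-range thresholds compare against.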

[0038] The structured subgraph structure C and the answer A are converted into a sequence of natural language text. Specifically, each triple (head entity, relation, tail entity) in the context C is converted into a descriptive text segment. Simultaneously, specific prefix identifiers are added to different types of data to help the model distinguish different roles of the input information. For example, the processed context text is prefixed with "Context:", and the answer text is prefixed with "Answer:", enabling the model to learn the specific semantic functions of different input segments during the encoding phase.
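A minimal sketch of the linearization in [0038]. The sentence template used for each triple (head, relation with underscores replaced by spaces, tail, full stop) and the function names are assumptions; only the "Context:" and "Answer:" prefixes come from the text above.

```python
from typing import Iterable, Tuple


def triple_to_text(head: str, relation: str, tail: str) -> str:
    """Turn one (head entity, relation, tail entity) triple into a text segment."""
    return f"{head} {relation.replace('_', ' ')} {tail}."


def build_model_input(subgraph: Iterable[Tuple[str, str, str]], answer: str) -> str:
    """Prefix the context and the answer so the encoder can tell their roles apart."""
    context = " ".join(triple_to_text(*triple) for triple in subgraph)
    return f"Context: {context} Answer: {answer}"


print(build_model_input([("Company X", "founded_by", "Person Y"),
                         ("Person Y", "born_in", "City A")], "City A"))
# Context: Company X founded by Person Y. Person Y born in City A. Answer: City A
```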

[0039] The tokenizer of a pre-trained language model is used to segment the concatenated text sequence, dividing the continuous text string into a sequence of tokens, and each token is mapped to a unique integer ID according to the tokenizer's built-in vocabulary. The pre-trained language model can be a BART model with a built-in tokenizer or another model with segmentation capability. To handle variable-length input sequences, padding and truncation are performed so that all sequences have the same length. In addition, an attention mask of the same length as the input ID sequence is generated; in subsequent self-attention calculations this mask distinguishes real tokens from padding positions, preventing unnecessary computation on the padding. After this stage, the original graph-structured data has been fully transformed into normalized tensors containing the input IDs, the attention mask, and other information, which can be fed directly into the subsequent generative model.
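The padding, truncation and attention-mask logic of [0039] is sketched below with a toy whitespace tokenizer so that it runs without any model download. In practice, a BART tokenizer from the Hugging Face `transformers` library, called with `padding="max_length"`, `truncation=True` and a `max_length`, would return comparable `input_ids` and `attention_mask` fields; the vocabulary, IDs and function names here are illustrative assumptions only.

```python
from typing import Dict, List

PAD_ID = 0  # id reserved for padding positions
UNK_ID = 1  # id for tokens missing from the vocabulary


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign consecutive integer IDs to whitespace-separated tokens."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_batch(texts: List[str], vocab: Dict[str, int],
                 max_length: int) -> Dict[str, List[List[int]]]:
    """Map texts to fixed-length ID sequences plus a matching attention mask."""
    input_ids, attention_mask = [], []
    for text in texts:
        ids = [vocab.get(token, UNK_ID) for token in text.split()]
        ids = ids[:max_length]  # truncation
        padding = max_length - len(ids)
        input_ids.append(ids + [PAD_ID] * padding)
        # 1 marks a real token, 0 marks a padding position to be ignored.
        attention_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```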

[0040] Step 102: Based on the subgraph structure, generate the first semantic element and the second semantic element.

[0041] The first semantic element is extracted from the topological relationships of the subgraph structure and is used to express the structured logical connections between entities. The second semantic element is generated based on the semantic features of the subgraph structure and the target answer and is used to express the generalized semantic intent of the context.

[0042] It should be noted that a semantic element is the smallest semantic unit that expresses a specific meaning, such as a text sequence, a semantic vector, or an identifier. The text sequence can be keywords, phrases, or short sentences, etc., and this application does not impose specific limitations. In the process of obtaining the first and second semantic elements, either parallel or serial logic can be used.

[0043] In this embodiment, dual-perspective semantic elements are generated by extracting structured logical connections from topological relationships and generating generalized semantic intents from semantic features. On the one hand, the first semantic element, extracted directly from topological relationships, provides a lossless and interpretable logical anchor for the generation process, ensuring the factual accuracy of the generated questions along multi-hop reasoning paths and helping to reduce logical breaks or factual biases. On the other hand, the second semantic element generalizes well and can produce rich linguistic expressions such as synonyms and generalized phrases, which improves the diversity and natural fluency of the generated sentences. Together they strengthen the ability to generate long-range multi-hop reasoning questions and avoid broken logical chains and homogeneous expressions.

[0044] In one embodiment, generating the first semantic element in step 102 specifically includes: determining the entity nodes and edge relationships in the subgraph structure that are within a first preset hop count range of the target answer; converting the name tags and attribute values of the entity nodes into entity text; determining semantic connectors based on the relation types of the edge relationships; and combining the entity text with the semantic connectors according to the logical order of the edge relationships to form the first semantic element with an explicit logical chain.

[0045] The first preset hop count range is smaller than the second preset hop count range; for example, the first preset hop count range can be set to 1 to 3 hops. The edge relationships represent a reasoning path that terminates at the target answer.

[0046] In this embodiment, the entity nodes in the subgraph structure that are within the first preset hop count range of the target answer (i.e., those with strong relevance) and the edge relationships between these entity nodes are determined first. Then, the name tags and attribute values of the entity nodes are converted into descriptive entity text, and corresponding semantic connectors are matched according to the type and logical order of the edge relationships. Finally, the structured information is transformed into first semantic elements that are readable in natural language. By extracting two types of keywords with clear semantic orientation, entity keywords and relation keywords, high-fidelity structured logical information is obtained directly from the topology of the knowledge graph. Moreover, because the extraction relies entirely on the objective structure of the graph rather than on what the model learns on its own, the resulting first semantic elements are a direct, lossless mapping of the underlying knowledge logic, which fundamentally changes the information flow and constraint paradigm of the data synthesis task. This ensures that the subsequently synthesized sentences are not only fluent but also highly consistent with the inherent logic of the knowledge graph, eliminating logical fallacies or factual deviations at the source.

[0047] For example, entity A, which serves as the target answer, and its first- to (N-1)-order neighboring entities in the subgraph structure C are directly identified and extracted, and their names or textual descriptions are used as entity keywords. These entities constitute the core objects and background of the question. To identify their source and function, a special classification tag, such as <doc>, is uniformly added to these keywords, indicating that they are core entities at the context level. Similarly, the reasoning path leading to the target answer A, consisting of a sequence of relations, is identified in the subgraph structure C, and the relation names or their textual descriptions along this path are used as relation keywords. For example, if the reasoning path is (entity X) -[relation R1]→ (entity Y) -[relation R2]→ (answer A), the extracted relation keywords are R1 and R2. To identify their function, a special classification tag, such as <qes>, is added to these keywords, indicating that they are directly related to the reasoning logic of the question.
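The tagging scheme of [0047] can be sketched as below. Placing each tag immediately before its keyword, ordering entities by hop distance and then by name, and the function name `tag_structured_keywords` are assumptions; the application only names the `<doc>` and `<qes>` tags and where the entity and relation keywords come from.

```python
from typing import Dict, List, Tuple


def tag_structured_keywords(hop_distance: Dict[str, int],
                            path: List[Tuple[str, str, str]],
                            n_hops: int) -> str:
    """Build the structured keyword sequence for one target answer.

    hop_distance: entity -> hops from the target answer (the answer has distance 0).
    path: ordered (head, relation, tail) triples ending at the target answer.
    """
    answer = next(e for e, d in hop_distance.items() if d == 0)
    if not path or path[-1][2] != answer:
        raise ValueError("reasoning path must end at the target answer")
    # Entity keywords: the answer and its 1st- to (N-1)-order neighbors.
    entities = sorted((e for e, d in hop_distance.items() if d <= n_hops - 1),
                      key=lambda e: (hop_distance[e], e))
    # Relation keywords: relations along the reasoning path, in order.
    relations = [relation for _, relation, _ in path]
    tagged = [f"<doc> {e}" for e in entities] + [f"<qes> {r}" for r in relations]
    return " ".join(tagged)
```

For the reasoning path in [0047], (X, R1, Y) followed by (Y, R2, A), with N = 3 this yields `<doc> A <doc> Y <doc> X <qes> R1 <qes> R2`.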

[0048] In one embodiment, generating the second semantic element in step 102 specifically includes two operations. First, the text corresponding to the subgraph structure and the target answer is encoded through a multi-layer self-attention mechanism. This produces joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer. Second, with maximizing language fluency and semantic coverage as the optimization goal, the joint semantic features are decoded and mapped through a first generative model. This produces a second semantic element containing synonym substitutions or generalized expressions.
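As a rough illustration of the multi-layer self-attention encoding in [0048], the sketch below concatenates the token vectors for the subgraph text and the answer text, then applies simplified self-attention layers. Every output position therefore mixes information from both parts. This is a deliberate simplification, and it rests on several assumptions:

- a single attention head;
- no learned query/key/value projections;
- no feed-forward sublayers or layer normalization;
- input vectors taken as given rather than produced by a trained embedding layer.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_layer(x: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with a residual add.

    Queries, keys and values are the inputs themselves (no learned
    projections), which keeps the sketch short.
    """
    d = len(x[0])
    out = []
    for query in x:
        weights = softmax([sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x])
        context = [sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)]
        out.append([c + r for c, r in zip(context, query)])  # residual connection
    return out


def encode_joint(
    subgraph_tokens: List[Vector], answer_tokens: List[Vector], layers: int = 2
) -> List[Vector]:
    """Encode the concatenated subgraph + answer sequence into joint features H."""
    h = subgraph_tokens + answer_tokens
    for _ in range(layers):
        h = self_attention_layer(h)
    return h
```

Because attention spans the whole concatenated sequence, changing the answer vectors also changes the features at subgraph positions. This cross-dependency is what the "semantic interaction" in [0048] refers to.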

[0049] Understandably, the text corresponding to the target answer in the subgraph structure can be obtained by converting the name label and attribute values of its node.

[0050] In this embodiment, the deep semantic interactions between entities / relationships in the subgraph structure and the target answer are fully explored through encoding processing, generating joint semantic features that include the contextual association between entities / relationships and the answer. Then, with language fluency and semantic coverage as optimization objectives, the joint semantic features are decoded using a first generative model to generate diverse language expressions (second semantic elements). This effectively enhances the richness and generalization ability of semantic expression, adapting to diverse natural language generation needs, and compensates for the shortcomings of purely structured information in terms of language flexibility and fluency. It ensures that the core semantic intent is not lost while generalizing the expression, providing semantic support that is more in line with natural language expression habits for subsequent multi-element fusion, and improving the naturalness and readability of the final generated question.

[0051] For example, as shown in Figure 2, the joint text sequence formed from the preprocessed subgraph structure C and target answer A is input into a shared encoder. This encoder is based on the Transformer architecture. Through a multi-layer self-attention mechanism, it encodes the input sequence into a sequence of high-dimensional hidden state vectors H (the joint semantic features) containing rich contextual information. The hidden states H and the text vector representation of target answer A are then used as input to the first decoder, namely the keyword decoder (the first generative model). This keyword decoder is trained to generate a mixed sequence containing both types of keywords, <doc> and <qes>. This differs from the structured first semantic elements extracted from the graph structure: the second semantic elements generated by this decoder need not be words that appear verbatim in the context. They may instead be synonyms, hypernyms, or more general phrases produced by the model from its semantic understanding, which enriches the expressive diversity of the keywords.

[0052] Step 103: The first semantic element and the second semantic element are fused to obtain the fused feature.

[0053] In this embodiment, the first semantic element extracted from the knowledge graph topology is fused with the second semantic element obtained through semantic generation. This fully combines the high determinism and strong logic of structured logic with the high fluency and strong expressiveness of generalized semantics. The fused features possess both precise logical constraints and flexible semantic expression, ensuring that the subsequent question generation process does not deviate from the objective facts of the knowledge graph, while also improving the diversity and fluency of natural language expression. This provides stable and comprehensive core guiding information for the final generation of logically rigorous and semantically natural multi-hop reasoning questions.

[0054] In practical applications, step 103 specifically includes the following steps: Step 103-1: Map the first semantic element and the second semantic element to a unified vector space to obtain the first feature vector and the second feature vector.
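Step 103-1 does not specify how a keyword sequence is mapped into the unified vector space. One simple possibility assumes a shared embedding table and mean pooling, which gives both keyword sequences vectors of the same dimension. In the sketch below, the random table stands in for trained embeddings, and `build_embedding_table` and `embed_keywords` are hypothetical helpers.

```python
import random
from typing import Dict, List


def build_embedding_table(vocab: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical shared embedding table; random vectors stand in for trained ones."""
    rng = random.Random(seed)
    return {token: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for token in vocab}


def embed_keywords(keywords: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Mean-pool the embeddings of a keyword sequence into one fixed-size vector.

    Keywords missing from the table are skipped; an all-unknown sequence
    maps to the zero vector.
    """
    dim = len(next(iter(table.values())))
    known = [table[k] for k in keywords if k in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[j] for vec in known) / len(known) for j in range(dim)]
```

Since both sequences share one table and one pooling rule, the first and second feature vectors land in the same space with the same dimension. This is the property that the concatenation and gating in steps 103-2 and 103-3 depend on.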

[0055] Step 103-2: Concatenate the first feature vector and the second feature vector to obtain a concatenated vector; input the concatenated vector into an adaptive gating network, and calculate the gating vector based on the semantic distribution of the concatenated vector using the Sigmoid activation function of the adaptive gating network.

[0056] In this context, each element value in the gating vector represents the importance weight of the first and second semantic elements in their corresponding dimensions.

[0057] Step 103-3: Based on the gate vector, perform nonlinear weighted fusion of the first feature vector and the second feature vector to determine the fused feature.

[0058] In this embodiment, the feature vectors of the first and second semantic elements are concatenated and input into an adaptive gating network. A sigmoid activation function is used to dynamically calculate a gating vector based on the semantic distribution of the concatenated vectors, representing the importance of the two types of semantic elements in each dimension. Then, a non-linear weighted fusion of the first and second feature vectors is achieved based on the gating vector. This enables deep, adaptive, and dynamic collaboration between structured logical information and generalized semantic information within a unified semantic space. It avoids the insufficient adaptability problem caused by fixed weights and automatically strengthens key features and suppresses redundant information according to different subgraph structures and question-answering scenarios. The resulting fused features possess knowledge accuracy, logical rigor, and linguistic flexibility, providing more accurate, robust, and discriminative guidance information for subsequent question generation, significantly improving the stability and quality of multi-hop reasoning question generation.

[0059] Further, step 103-3 specifically includes three operations:
- calculating the element-wise product of the gating vector and the first feature vector to obtain the first weighted term;
- calculating the element-wise product of the second feature vector and the difference between the unit vector and the gating vector to obtain the second weighted term;
- summing the first weighted term and the second weighted term to obtain the fused feature.

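[0060] Here, the unit vector is the all-ones vector with the same dimension as the gating vector. Each element of its difference with the gating vector is therefore the complement of the corresponding gating weight, so the two weights applied in each dimension sum to 1.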

[0061] For example, the gating vector, and the fused feature obtained by nonlinearly weighting and fusing the first and second feature vectors, can be calculated using the following formulas:

$$g = \sigma\left(W_g\,[E_G; E_M] + b_g\right)$$

$$E_F = g \odot E_G + (\mathbf{1} - g) \odot E_M$$

The symbols are defined as follows:
- $E_G$ and $E_M$ denote the first feature vector and the second feature vector, respectively.
- $[E_G; E_M]$ denotes the concatenated vector.
- $W_g$ and $b_g$ denote the learnable weight matrix and the learnable bias vector of the adaptive gating network.
- $\sigma$ denotes the Sigmoid activation function.
- Each element of the gating vector $g$ lies between 0 and 1, and $g$ has the same dimension as the keyword vectors. Each element value represents the importance weight of the keywords in the corresponding dimension.
- $\odot$ denotes element-wise multiplication.
- $g \odot E_G$ is the first weighted term, and $(\mathbf{1} - g) \odot E_M$ is the second weighted term.
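The gating computation of steps 103-2 and 103-3 can be written out directly. In this Python sketch, the weight matrix and bias of the adaptive gating network are named `w_g` and `b_g` (names chosen here). The matrix has shape d × 2d, so the gating vector has the same dimension d as each feature vector. Plain lists stand in for tensors, and a numerically stable sigmoid avoids overflow for large negative inputs.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(
    e_g: Sequence[float],
    e_m: Sequence[float],
    w_g: Sequence[Sequence[float]],
    b_g: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (g, E_F) with g = sigmoid(W_g [E_G; E_M] + b_g)
    and E_F = g * E_G + (1 - g) * E_M, all products element-wise."""
    concat = list(e_g) + list(e_m)  # [E_G; E_M], length 2d
    g = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b) for row, b in zip(w_g, b_g)]
    first_term = [gi * x for gi, x in zip(g, e_g)]           # g ⊙ E_G
    second_term = [(1.0 - gi) * y for gi, y in zip(g, e_m)]  # (1 - g) ⊙ E_M
    e_f = [a + b for a, b in zip(first_term, second_term)]
    return g, e_f
```

With zero weights and zero bias, the gate is 0.5 in every dimension and the fused feature is the plain average of the two vectors. A large positive bias in a dimension pushes the gate toward 1 there, letting the first feature vector dominate. This is the behavior described in [0062].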

[0062] This gating mechanism lets the model learn adaptively during training, for each sample and dimension, whether to rely more on the structurally precise graph keywords or on the fluent generative keywords. For example, when generating questions that require precise entity names, the model may give a higher weight to the first feature vector derived from the graph keywords (gating values closer to 1). When generating more conversational or general questions, it may rely more on the second feature vector derived from the generative keywords (gating values closer to 0).

[0063] Step 104: Based on the fusion features and subgraph structure, generate a natural language question pointing to the target answer.

[0064] After determining the target answer and its associated subgraph structure in the knowledge graph, the knowledge graph-based question synthesis method provided in this application adopts a dual-perspective processing mode. First, highly deterministic entities and relationships are extracted directly from the structural information of the knowledge graph as structured first semantic elements. Second, a first generative model decodes the joint representation of the target answer and the subgraph structure into a set of generative second semantic elements with better generalization and linguistic fluency. The semantic elements from the two sources are dynamically weighted and fused, and the fused features guide the generation of the final multi-hop reasoning natural language question. The fusion makes structural constraints and semantic expression complementary, improving the naturalness and expressive diversity of the question while preserving its accuracy and logic. The resulting question conforms to the inherent logic of the knowledge graph and has fluency and generalization ability close to human expression. This effectively solves the problem that the logical chains of traditional models tend to break or deviate during multi-step reasoning.

[0065] In practical applications, step 104 specifically includes the following steps: Step 104-1: Obtain joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer.

[0066] It should be noted that a multi-layer self-attention mechanism can be used to encode the text corresponding to the subgraph structure and the target answer, generating joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer.

[0067] Step 104-2: Input the fused features and the joint semantic features into the second generative model. The cross-attention mechanism of the second generative model takes into account both the contextual constraints of the joint semantic features and the semantic constraints of the fused features. Candidate questions are then predicted and generated autoregressively.

[0068] In this embodiment, the encoder first encodes the subgraph structure and the target answer text into high-dimensional joint semantic features. The system then uses the fused features together with these joint semantic features to generate natural language questions. This confines the data synthesis process to a predefined logical framework, so that autoregressive prediction considers both the structural logic of the knowledge graph and the semantic guidance of the fused features. The second generative model's ability to generate long-distance, multi-hop reasoning questions is significantly improved as a result. Furthermore, the complex reasoning path has already been fixed as a clear sequence of keywords. The second generative model can therefore concentrate on turning this logical path into fluent, natural questions. This greatly improves the accuracy and complexity of the synthesized data and effectively addresses the breakage or deviation of logical chains in traditional models during multi-step reasoning.

[0069] For example, as shown in Figure 2, the fused features and the contextual hidden states H are fed into the second decoder, namely the question generation decoder (the second generative model). This decoder is also based on the Transformer architecture. Its core task is to autoregressively generate the final multi-hop reasoning question Q under the joint constraints of subgraph structure C and the fused features. The fused features not only constrain the core entities and relationships of the question but also influence its sentence structure and wording. This ensures that the generated question is logically clear, relevant to the content, and points accurately to the preset answer A.

[0070] Further, step 104-2 specifically includes: calculating the first attention distribution of the current decoding state and the joint semantic features, and the second attention distribution of the current decoding state and the fused features through the cross-attention layer of the second generation model; performing weighted fusion of the first attention distribution and the second attention distribution to obtain a comprehensive context representation; predicting the probability distribution of the current word based on the comprehensive context representation; and sequentially sampling and generating target words based on the probability distribution until the iteration ends to obtain candidate questions.
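The decoding loop of [0070] can be sketched in plain Python as follows. The `mix` weight combines the context vectors obtained from the two attention distributions. This is equivalent to attending over the concatenation of the two memories with the combined distribution [mix·α1, (1 − mix)·α2], which is one way to read "weighted fusion of the first and second attention distributions". The sketch rests on several assumptions:

- The decoding state is stood in for by the mean embedding of the generated prefix, rather than by a trained masked self-attention stack.
- The vocabulary and embeddings are toy inputs.
- Greedy selection is used unless a random generator is supplied for sampling.
- The function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(state: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of the decoding state over one memory."""
    d = len(state)
    weights = softmax([dot(state, m) / math.sqrt(d) for m in memory])
    return [sum(w * m[j] for w, m in zip(weights, memory)) for j in range(d)]


def decode(
    joint_h: List[Vector],
    fused: List[Vector],
    embeddings: Dict[str, Vector],
    bos: Vector,
    end_token: str = "?",
    mix: float = 0.5,
    max_len: int = 20,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Generate tokens autoregressively under two cross-attention constraints.

    mix weights the context from joint_h (first distribution) against the
    context from fused (second distribution). With rng=None the most probable
    token is taken greedily; otherwise the next token is sampled.
    """
    vocab = list(embeddings)
    prefix: List[str] = []
    while len(prefix) < max_len:
        # Decoding state: mean embedding of the generated prefix (BOS if empty).
        history = [embeddings[t] for t in prefix] or [bos]
        state = [sum(v[j] for v in history) / len(history) for j in range(len(bos))]
        c1 = cross_attend(state, joint_h)
        c2 = cross_attend(state, fused)
        context = [mix * a + (1.0 - mix) * b for a, b in zip(c1, c2)]
        # Probability distribution of the current token over the vocabulary.
        probs = softmax([dot(context, embeddings[w]) for w in vocab])
        if rng is None:
            token = vocab[max(range(len(vocab)), key=probs.__getitem__)]
        else:
            token = rng.choices(vocab, weights=probs, k=1)[0]
        prefix.append(token)
        if token == end_token:
            break
    return prefix
```

Setting `mix` to 1.0 or 0.0 makes the decoder follow only the joint semantic features or only the fused features, respectively. Intermediate values realize the combined constraint described above.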

[0071] The current decoding state is determined from the generated prefix sequence, that is, the tokens already generated and fixed before the current time step of autoregressive generation. These tokens may include tokens obtained by segmenting the first semantic element and the second semantic element.

[0072] In this embodiment, the cross-attention layer of the second generative model computes two attention distributions: one between the current decoding state and the joint semantic features, and one between the current decoding state and the fused features. The first attention distribution lets the model continuously perceive the deep semantic interaction between the subgraph and the answer, keeping question generation coherent and contextually consistent. The second attention distribution forces the model, at every generation step, to focus on the key entities and relation paths retained after gated fusion. This effectively suppresses semantic drift and hallucination in long-sequence generation. The two attention distributions are fused by weighting into a comprehensive context representation, from which the probability distribution of the current token is predicted. Based on this distribution, candidate questions are generated token by token through autoregression. During decoding, two sources of information are therefore used at the same time:
- the semantic interaction between the subgraph and the answer;
- the guiding information obtained by fusing structured and generalized semantics.

Together, these impose dual constraints on knowledge logic and language expression. The mechanism avoids logical breaks, semantic deviations, and awkward expressions in multi-hop reasoning. It ensures that the generated questions are logically clear, relevant to the content, and point accurately to the preset target answer, significantly improving the accuracy, logic, and quality of the candidate questions.

[0073] Step 104-3: Perform syntactic fluency verification on the candidate questions using natural language processing tools, and use the candidate questions that pass the verification as natural language questions.
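The application names tools such as spaCy and Stanza for the fluency check in step 104-3. The sketch below deliberately substitutes a few rule-based heuristics in plain Python, so it runs without downloading parser models. The specific rules are assumptions for illustration and are not the checks performed by those tools:
- a trailing question mark;
- a minimum length;
- no immediately repeated word;
- balanced brackets and quotes.

```python
import re
from typing import List


def passes_basic_checks(question: str) -> bool:
    """Hypothetical lightweight stand-in for a parser-based fluency check."""
    text = question.strip()
    if not text.endswith("?"):
        return False
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < 3:
        return False
    # Reject immediate word repetition such as "the the", a common decoding artefact.
    if any(a == b for a, b in zip(tokens, tokens[1:])):
        return False
    # Brackets and double quotation marks must be balanced.
    if text.count("(") != text.count(")") or text.count('"') % 2:
        return False
    return True


def filter_candidates(candidates: List[str]) -> List[str]:
    """Keep only candidate questions that pass every heuristic."""
    return [q for q in candidates if passes_basic_checks(q)]
```

A dependency-parse-based check, as suggested by the tools named in the text, could replace `passes_basic_checks` without changing `filter_candidates`.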

[0074] In this embodiment, the candidate questions are checked for syntactic fluency using natural language processing tools. This further filters out non-standard expressions and grammatically incoherent questions. Combined with the structural constraints applied during generation, it ensures that the natural language questions finally output are well-formed and consistent with the objective facts and reasoning chains of the knowledge graph. This not only improves the logical rigor, complexity, and naturalness of the synthesized questions, but also yields high-quality training corpora with high robustness and interpretability for complex multi-hop reasoning tasks.

[0075] It's worth noting that after obtaining the natural language question pointing to the target answer, the natural language question Q generated by the model can be combined with its corresponding original subgraph structure C and the target answer A to form a complete (C, Q, A) data triple. All (C, Q, A) triples that pass the screening and validation are then aggregated and organized and stored according to standard dataset formats (such as JSON, CSV, etc.) to form the final high-quality dual-view multi-hop reasoning question-answering test set. This test set can be used to evaluate and train the reasoning capabilities of other complex question-answering systems.
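A minimal sketch of storing the validated (C, Q, A) triples as JSON is shown below. The field names `context`, `question`, and `answer` are assumptions, since the application only specifies standard formats such as JSON or CSV. `ensure_ascii=False` keeps Chinese text readable in the output file.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class QATriple:
    context: str   # subgraph structure C, serialized as text
    question: str  # generated natural language question Q
    answer: str    # target answer A


def to_json_dataset(triples: List[QATriple]) -> str:
    """Serialize validated (C, Q, A) triples as a JSON array."""
    return json.dumps([asdict(t) for t in triples], ensure_ascii=False, indent=2)
```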

[0076] In one specific embodiment, the knowledge graph-based question synthesis method provided in this application is applied for data synthesis, taking a corporate financing and investment knowledge graph as an example. The entities in this graph mainly include:
- corporate entities (e.g., new energy company X);
- investment institution entities (e.g., investment institution Y);
- industry classification entities (e.g., industry Z);
- financing round entities (e.g., strategic financing).

As the nodes of the graph, these entities are the core referents of the questions and answers. The relationships in the graph mainly include:
- business/event relationships, such as "obtaining financing" and "belonging to an industry";
- investment behavior relationships, such as "investing" and "being invested in";
- time/sequence relationships related to financing rounds, used to determine the chronological order or stage of events.

[0077] First, in the preprocessing stage, the system selects a specific corporate entity or a financing relationship path from the corporate financing and investment knowledge graph as the baseline answer A. For example, the most recent "strategic financing" relationship path of "a certain new energy company" is selected as answer A. This relationship includes, but is not limited to, attributes such as investors, financing and investment types, amounts, and time.

[0078] Next, centered on answer A, a breadth-first search is used to extract its neighboring entities and their relationships within N hops (e.g., N=4), constructing a local subgraph. This subgraph is then fed into a large language model. The model rewrites each triple, such as "(New Energy Company A, received financing, strategic financing)", into a natural language description and adds the prefix "Context:", forming subgraph structure C. Subgraph structure C contains the entities and relationships associated with the answer, such as investment institutions, financing rounds, and industry classifications, so it holds all the factual basis required to generate multi-hop reasoning questions. Finally, the BART tokenizer is called to tokenize the concatenated text sequence and convert it into a standardized input ID sequence and attention mask tensor. This completes the conversion of the data structure into model input.
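The subgraph construction in [0078] can be approximated as shown below. The sketch assumes the knowledge graph is a list of (head, relation, tail) triples and treats edges as undirected during the breadth-first search. For illustration, it takes a single node as the answer. The large-model rewrite is replaced here by a fixed sentence template, which is a simplification. `extract_subgraph` and `serialize_context` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract_subgraph(
    triples: List[Triple], answer: str, max_hops: int
) -> Tuple[Dict[str, int], List[Triple]]:
    """Breadth-first search outward from the answer, ignoring edge direction.

    Returns the hop distance of every reached entity and the triples whose
    two endpoints both lie within max_hops of the answer.
    """
    neighbours: Dict[str, Set[str]] = {}
    for head, _, tail in triples:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)
    hops = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    kept = [t for t in triples if t[0] in hops and t[2] in hops]
    return hops, kept


def serialize_context(kept: List[Triple]) -> str:
    """Template-based stand-in for the large-model rewrite: one sentence per triple."""
    return "Context: " + " ".join(f"{h} {r} {t}." for h, r, t in kept)
```

The returned hop distances also make it easy to keep only the entities within the smaller first preset hop count range when extracting entity keywords.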

[0079] Secondly, in the dual-perspective keyword extraction stage, the system generates two types of keywords in parallel: (I) Keywords based on knowledge graph structure: Extract precise structured information related to answer A directly from the local subgraph corresponding to subgraph structure C. The entity of answer A and its 1st to N-1th order neighbor entities are used as entity keywords, and the corresponding relationships are considered relational keywords. For example, "investment institution Y" and "industry Z" are extracted as entity keywords, and the relationship combination "investment institution Y - belongs to - industry Z" is extracted as relational keywords. Finally, these are combined into a graph structure keyword sequence KG. These keywords have high determinism and strong logic. Taking the above as an example, the KG sequence content can be simply represented as: [Innovative medical technology company E, F round financing, venture capital institution Y, state-owned capital Z, biopharmaceutical industry, a certain pharmaceutical R&D center T, J round financing, received financing, investment, belongs to industry, previous round].

[0080] (II) Keywords based on the generative model: The joint text sequence formed from the preprocessed subgraph structure C and answer A is input into the Transformer encoder. The hidden states H output by the encoder and the representation of answer A are fed into the keyword decoder. Based on semantic understanding, the decoder generates a generative keyword sequence KM. These keywords may include synonyms or generalized phrases for words in the graph, which enhances the fluency and generalization of the language. The content of the KM sequence can be represented simply as: [Venture Capital Y, State-owned Capital Z, Biotechnology Company, November 2016, 250 million yuan, Medical Technology, Financing]. The KM sequence may contain synonyms (e.g., "Biotechnology Company" corresponding to "Innovative Medical Technology Company E") and generalized phrases (e.g., "Medical Technology" and "Financing").

[0081] Subsequently, in the gating fusion and question generation stages, the system dynamically weights and fuses the two types of keywords: the graph structure keyword sequence KG and the generative keyword sequence KM are converted into vector representations EG and EM respectively through an embedding layer. Next, the gating unit calculates the fusion weights to obtain the final fused keyword representation EF. Then, the fused keyword EF, along with the contextual hidden state H output by the encoder, is fed into the question generation decoder. Under dual constraints, this decoder autoregressively generates logically clear, multi-hop reasoning questions Q that point to answer A. For example, it generates questions requiring multi-step reasoning, such as "Among the companies in industry Z invested in by investment institution Y, what is the name of the company that received 19 million RMB in strategic financing on November 9, 2016?"

[0082] Finally, in the post-processing stage, the generated question Q, its corresponding subgraph structure C, and the answer A are combined into a complete question-answer pair (C, Q, A). Tools such as spaCy and Stanza are used to verify syntactic fluency, filtering out grammatically incorrect or incoherent question-answer pairs. Ultimately, all the high-quality question-answer pair triples that pass the filtering and verification are organized and stored according to standard formats such as JSON, forming a high-quality multi-hop reasoning question-answering test set for enterprise financing and investment that can be used to evaluate or train complex question-answering systems.

[0083] The knowledge graph-based question synthesis method provided in this application can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

[0084] It should be noted that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0085] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to choose to authorize or refuse.

[0086] Furthermore, as shown in Figure 3, this application provides a knowledge graph-based question synthesis device 300 as a specific implementation of the above knowledge graph-based question synthesis method. The device 300 includes an acquisition module 301, a guide word generation module 302, a fusion module 303, and a sentence generation module 304.

[0087] The modules of the device 300 are used as follows:
- The acquisition module 301 acquires the target answer and its associated subgraph structure in the knowledge graph.
- The guide word generation module 302 generates a first semantic element and a second semantic element based on the subgraph structure. The first semantic element is extracted from the topological relationships of the subgraph structure and expresses the structured logical connections between entities. The second semantic element is generated from the semantic features of the subgraph structure and the target answer and expresses the generalized semantic intent of the context.
- The fusion module 303 fuses the first semantic element and the second semantic element to obtain the fused feature.
- The sentence generation module 304 generates natural language questions pointing to the target answer based on the fused features and the subgraph structure.

[0088] Furthermore, the guide word generation module 302 is specifically used to:
- determine the entity nodes and edge relationships in the subgraph structure that lie within the first preset hop count range of the target answer, where the edge relationships represent the reasoning path from the endpoint to the target answer;
- convert the name tags and attribute values of the entity nodes into entity text;
- determine semantic connectors according to the relationship types of the edge relationships;
- following the logical order of the edge relationships, combine the entity text with the semantic connectors to form a first semantic element with an explicit logical chain.

[0089] Furthermore, the guide word generation module 302 is specifically used to:
- encode the text corresponding to the subgraph structure and the target answer through a multi-layer self-attention mechanism, generating joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer;
- with maximizing language fluency and semantic coverage as the optimization goal, decode and map the joint semantic features through the first generative model, generating a second semantic element containing synonym substitutions or generalized expressions.

[0090] Further, the fusion module 303 is specifically used to map the first semantic element and the second semantic element to a unified vector space to obtain a first feature vector and a second feature vector; to concatenate the first feature vector and the second feature vector to obtain a concatenated vector; to input the concatenated vector into an adaptive gating network, and to calculate a gating vector based on the semantic distribution of the concatenated vector using the sigmoid activation function of the adaptive gating network, wherein each element value in the gating vector represents the importance weight of the first semantic element and the second semantic element in their corresponding dimension; and to perform nonlinear weighted fusion of the first feature vector and the second feature vector according to the gating vector to determine the fused feature.

[0091] Furthermore, the fusion module 303 is specifically used to calculate the element-wise product of the gate vector and the first feature vector to obtain the first weighting term; calculate the element-wise product of the difference between the unit vector and the gate vector and the second feature vector to obtain the second weighting term; and sum the first weighting term and the second weighting term to obtain the fused feature.

[0092] Furthermore, the sentence generation module 304 is specifically used to:
- obtain joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer;
- input the fused features and the joint semantic features into the second generative model, whose cross-attention mechanism takes into account both the contextual constraints of the joint semantic features and the semantic constraints of the fused features, and predict and generate candidate questions autoregressively;
- verify the syntactic fluency of the candidate questions using natural language processing tools, and take the candidate questions that pass the verification as the natural language questions.

[0093] Furthermore, the sentence generation module 304 is specifically used to calculate the first attention distribution of the current decoding state and the joint semantic features, and the second attention distribution of the current decoding state and the fused features through the cross-attention layer of the second generation model, wherein the current decoding state is determined based on the generated prefix sequence; the first attention distribution and the second attention distribution are weighted and fused to obtain a comprehensive context representation; the probability distribution of the current word is predicted based on the comprehensive context representation; and target words are generated by sampling sequentially based on the probability distribution until the iteration ends to obtain candidate questions.

[0094] Furthermore, the acquisition module 301 is specifically used to select at least one entity or a relationship path from the knowledge graph as the target answer; with the target answer as the center, it extracts neighboring entities and their connection relationships that are within a second preset hop count range from the knowledge graph to form a subgraph structure; wherein, the second preset hop count range is greater than the first preset hop count range.

[0095] Specific limitations regarding the knowledge graph-based question synthesis device can be found in the limitations of the knowledge graph-based question synthesis method described above, and will not be repeated here. Each module in the aforementioned knowledge graph-based question synthesis device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0096] Correspondingly, based on the method shown in Figure 1, embodiments of this application also provide a readable storage medium storing a computer program. When executed by a processor, the program implements the knowledge graph-based question synthesis method shown in Figure 1.

[0097] Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard drive), and includes several instructions to cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the various implementation scenarios of this application.

[0098] To achieve the above objectives, based on the method shown in Figure 1 and the virtual device embodiment shown in Figure 3, an embodiment of this application further provides a computer device 400, as shown in Figure 4. The computer device 400 includes a processor 401 and a memory 402. The memory 402 stores a program or instructions executable on the processor 401. When executed by the processor 401, the program or instructions implement the knowledge graph-based question synthesis method shown in Figure 1.

[0099] The memory 402 can be used to store software programs and various data. The memory 402 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data. The first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). Furthermore, the memory 402 may include volatile memory, non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 402 in this embodiment includes, but is not limited to, these and any other suitable types of memory.

[0100] The processor 401 may include one or more processing units. Optionally, the processor 401 integrates an application processor and a modem processor: the application processor mainly handles the operating system, the user interface, and applications, while the modem processor, such as a baseband processor, mainly handles wireless communication signals. It will be understood that the modem processor may alternatively not be integrated into the processor 401.

[0101] The computer device may specifically be a personal computer, a server, a network device, or the like.

[0102] Optionally, the computer device may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen and input units such as a keyboard, and may optionally also include USB ports, card reader ports, and the like. The network interface may optionally include a standard wired interface and wireless interfaces (such as a Bluetooth interface or a Wi-Fi interface).

[0103] Those skilled in the art will understand that the computer device structure provided in this embodiment does not constitute a limitation on the computer device, and may include more or fewer components, or combine certain components, or have different component arrangements.

[0104] From the above description of the embodiments, those skilled in the art will clearly understand that this application may be implemented by software together with a necessary general-purpose hardware platform, or entirely in hardware.

[0105] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the modules or processes shown in them are not necessarily essential for implementing this application. Those skilled in the art will also understand that the modules of the apparatus in an embodiment may be distributed in that apparatus as described, or may, with corresponding changes, be located in one or more apparatuses different from that of this embodiment. The modules of the above embodiment may be combined into one module or further divided into multiple sub-modules.

[0106] The serial numbers in this application are for descriptive purposes only and do not represent the superiority or inferiority of any particular implementation scenario. The above disclosures are merely a few specific implementation scenarios of this application; however, this application is not limited thereto, and any variations conceived by those skilled in the art should fall within the protection scope of this application.

Claims

1. A knowledge graph-based question synthesis method, characterized in that the method comprises: obtaining a target answer and its associated subgraph structure from a knowledge graph; generating a first semantic element and a second semantic element based on the subgraph structure, wherein the first semantic element is extracted from the topological relationships of the subgraph structure and expresses the structured logical connections between entities, and the second semantic element is generated based on semantic features of the subgraph structure and the target answer and expresses the generalized semantic intent of the context; fusing the first semantic element and the second semantic element to obtain a fused feature; and generating, based on the fused feature and the subgraph structure, a natural language question that points to the target answer.

2. The knowledge graph-based question synthesis method according to claim 1, characterized in that the method further comprises: determining the entity nodes and edge relationships in the subgraph structure that lie within a first preset number of hops of the target answer, wherein the edge relationships represent a reasoning path whose endpoint is the target answer; converting the name labels and attribute values of the entity nodes into entity text; determining semantic connectors based on the relation types of the edge relationships; and combining, in the logical order of the edge relationships, the entity text with the semantic connectors to form the first semantic element with an explicit logical chain.

3. The knowledge graph-based question synthesis method according to claim 1, characterized in that the method further comprises: encoding the subgraph structure and the text corresponding to the target answer with a multi-layer self-attention mechanism to generate joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; and decoding and mapping the joint semantic features through a first generative model, with maximizing language fluency and semantic coverage as the optimization objective, to generate a second semantic element containing synonym substitutions or generalized expressions.

4. The knowledge graph-based question synthesis method according to claim 1, characterized in that fusing the first semantic element and the second semantic element to obtain the fused feature comprises: mapping the first semantic element and the second semantic element into a unified vector space to obtain a first feature vector and a second feature vector, respectively; concatenating the first feature vector and the second feature vector to obtain a concatenated vector; inputting the concatenated vector into an adaptive gating network and computing, with the Sigmoid activation function of the adaptive gating network, a gating vector based on the semantic distribution of the concatenated vector, wherein each element of the gating vector represents the importance weight of the first semantic element relative to the second semantic element in the corresponding dimension; and performing nonlinear weighted fusion of the first feature vector and the second feature vector based on the gating vector to determine the fused feature.

5. The knowledge graph-based question synthesis method according to claim 4, characterized in that performing nonlinear weighted fusion of the first feature vector and the second feature vector based on the gating vector to obtain the fused feature comprises: computing the element-wise product of the gating vector and the first feature vector to obtain a first weighted term; subtracting the gating vector from the all-ones (unit) vector and multiplying the difference element-wise by the second feature vector to obtain a second weighted term; and summing the first weighted term and the second weighted term to obtain the fused feature.

6. The knowledge graph-based question synthesis method according to claim 1, characterized in that generating the natural language question pointing to the target answer based on the fused feature and the subgraph structure comprises: obtaining joint semantic features that reflect the semantic interaction between the subgraph structure and the target answer; inputting the fused feature and the joint semantic features into a second generative model, simultaneously checking, through the cross-attention mechanism of the second generative model, the contextual constraints of the joint semantic features and the semantic constraints of the fused feature, and predicting and generating candidate questions autoregressively; and syntactically verifying the candidate questions with a natural language processing tool and taking the candidate questions that pass verification as the natural language question.

7. The knowledge graph-based question synthesis method according to claim 6, characterized in that simultaneously checking the contextual constraints of the joint semantic features and the semantic constraints of the fused feature through the cross-attention mechanism of the second generative model, and predicting and generating candidate questions autoregressively, comprises: computing, through a cross-attention layer of the second generative model, a first attention distribution between the current decoding state and the joint semantic features and a second attention distribution between the current decoding state and the fused feature, wherein the current decoding state is determined from the prefix sequence generated so far; performing weighted fusion of the first attention distribution and the second attention distribution to obtain a comprehensive context representation; predicting the probability distribution of the current token based on the comprehensive context representation; and sampling target tokens sequentially from the probability distribution until the iteration ends, thereby obtaining the candidate question.

8. A knowledge graph-based question synthesis device, characterized by comprising: an acquisition module configured to acquire a target answer and its associated subgraph structure from a knowledge graph; a guiding word generation module configured to generate a first semantic element and a second semantic element based on the subgraph structure, wherein the first semantic element is extracted from the topological relationships of the subgraph structure and expresses the structured logical connections between entities, and the second semantic element is generated based on semantic features of the subgraph structure and the target answer and expresses the generalized semantic intent of the context; a fusion module configured to fuse the first semantic element and the second semantic element to obtain a fused feature; and a statement generation module configured to generate, based on the fused feature and the subgraph structure, a natural language question pointing to the target answer.

9. A readable storage medium storing a program or instructions, characterized in that, when the program or instructions are executed by a processor, they implement the knowledge graph-based question synthesis method according to any one of claims 1 to 7.

10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the knowledge graph-based question synthesis method according to any one of claims 1 to 7.
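The construction of the first semantic element in claim 2 can be illustrated with a minimal sketch. It assumes the subgraph is given as (head, relation type, tail) triples, that the reasoning path is found by a breadth-first search backwards from the target answer over incoming edges up to the preset hop limit, and that the longest chain reached is the one verbalized. The connector table CONNECTORS and all function names are hypothetical illustrations, not taken from the application.

```python
from collections import deque
from typing import Dict, List, Mapping, Optional, Tuple

Triple = Tuple[str, str, str]  # (head_id, relation_type, tail_id)

# Hypothetical relation-type -> semantic-connector table.
CONNECTORS: Dict[str, str] = {
    "directed_by": "was directed by",
    "born_in": "was born in",
}


def path_to_answer(triples: List[Triple], answer: str, max_hops: int) -> List[Triple]:
    """BFS backwards from the answer over incoming edges (at most max_hops)
    and return the chain from the farthest entity reached, ordered so that
    its last edge ends at the answer."""
    parent: Dict[str, Optional[Triple]] = {answer: None}
    depth: Dict[str, int] = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # hop limit reached on this branch
        for head, rel, tail in triples:
            if tail == node and head not in parent:
                parent[head] = (head, rel, tail)
                depth[head] = depth[node] + 1
                queue.append(head)
    start = max(depth, key=lambda n: depth[n])  # farthest entity within max_hops
    chain: List[Triple] = []
    edge = parent[start]
    while edge is not None:
        chain.append(edge)
        edge = parent[edge[2]]  # move one hop closer to the answer
    return chain


def entity_text(node: str, names: Mapping[str, str],
                attrs: Mapping[str, Mapping[str, object]]) -> str:
    """Render an entity's name label plus any attribute values as text."""
    label = names.get(node, node)
    details = attrs.get(node, {})
    if not details:
        return label
    rendered = ", ".join(f"{k}: {v}" for k, v in sorted(details.items()))
    return f"{label} ({rendered})"


def first_semantic_element(triples: List[Triple], answer: str, max_hops: int,
                           names: Mapping[str, str],
                           attrs: Optional[Mapping[str, Mapping[str, object]]] = None) -> str:
    """Join entity text and connectors in path order into an explicit logical chain."""
    attrs = attrs or {}
    clauses = []
    for head, rel, tail in path_to_answer(triples, answer, max_hops):
        connector = CONNECTORS.get(rel, rel.replace("_", " "))  # fallback: raw relation name
        clauses.append(f"{entity_text(head, names, attrs)} {connector} "
                       f"{entity_text(tail, names, attrs)}")
    return "; ".join(clauses)


if __name__ == "__main__":
    triples = [("inception", "directed_by", "nolan"), ("nolan", "born_in", "london")]
    names = {"inception": "Inception", "nolan": "Christopher Nolan", "london": "London"}
    print(first_semantic_element(triples, "london", 2, names))
    # Inception was directed by Christopher Nolan; Christopher Nolan was born in London
```

Lowering max_hops to 1 in this example keeps only the final hop ("Christopher Nolan was born in London"), which mirrors the role of the first preset number of hops in bounding the chain.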
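The gated fusion of claims 4 and 5 can be illustrated with a short sketch. It assumes both semantic elements have already been mapped into the same d-dimensional space, and it models the adaptive gating network as a single hypothetical linear layer followed by a Sigmoid, g = sigmoid(W·[h1; h2] + b). The fused feature is then g ⊙ h1 + (1 - g) ⊙ h2, which matches the first and second weighted terms of claim 5. The names gated_fusion, weight, and bias are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    h1: Sequence[float],
    h2: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Vector:
    """Fuse two feature vectors that share the same d-dimensional space.

    h1: first feature vector (structured logical connections)
    h2: second feature vector (generalized semantic intent)
    weight, bias: hypothetical parameters of one linear gating layer;
        weight is d x 2d and bias has length d.
    """
    d = len(h1)
    if len(h2) != d or len(weight) != d or len(bias) != d:
        raise ValueError("dimension mismatch")
    concat = list(h1) + list(h2)  # concatenated vector [h1; h2]
    # Gating vector: one Sigmoid weight in (0, 1) per dimension.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weight, bias)
    ]
    # First weighted term g * h1 plus second weighted term (1 - g) * h2.
    return [g * a + (1.0 - g) * c for g, a, c in zip(gate, h1, h2)]


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(2)]
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], zeros, [0.0, 0.0]))  # [2.0, 2.0]
```

With all-zero parameters every gate is 0.5, so the result is the plain average of the two vectors; a large positive bias in a dimension lets the first (structural) vector dominate there, and a large negative bias lets the second (generalized) vector dominate, which is the per-dimension importance weighting that claim 4 describes.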
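One decoding loop in the spirit of claim 7 can be sketched with plain scaled dot-product attention. Several assumptions are made here and are not stated in the application: the decoding state is approximated by the mean embedding of the generated prefix (a real decoder would run self-attention over the prefix), the rows of each feature matrix act as both keys and values, and the "weighted fusion" is applied to the two context vectors produced by the first and second attention distributions, using a fixed hypothetical mixing weight lam. The syntactic verification step of claim 6 is not shown. All names are illustrative.

```python
import math
import random
from typing import Dict, List, Mapping, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query: Sequence[float], memory: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention of one query over memory rows
    (rows serve as both keys and values in this sketch)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, row) / scale for row in memory])  # attention distribution
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(len(query))]


def next_token_distribution(state: Sequence[float], joint: Sequence[Sequence[float]],
                            fused: Sequence[Sequence[float]],
                            vocab_emb: Mapping[str, Sequence[float]],
                            lam: float = 0.5) -> Dict[str, float]:
    """Mix the two cross-attention contexts and score every vocabulary token."""
    c_joint = cross_attend(state, joint)  # context from the joint semantic features
    c_fused = cross_attend(state, fused)  # context from the fused feature
    context = [lam * a + (1.0 - lam) * b for a, b in zip(c_joint, c_fused)]
    tokens = list(vocab_emb)
    probs = softmax([dot(context, vocab_emb[t]) for t in tokens])
    return dict(zip(tokens, probs))


def generate(joint, fused, vocab_emb, start_state, eos="<eos>",
             max_len=8, lam=0.5, seed=0) -> List[str]:
    """Sample tokens one at a time until eos is drawn or max_len is reached."""
    rng = random.Random(seed)
    prefix: List[str] = []
    while len(prefix) < max_len:
        if prefix:  # hypothetical state: mean embedding of the prefix so far
            state = [sum(vocab_emb[t][i] for t in prefix) / len(prefix)
                     for i in range(len(start_state))]
        else:
            state = list(start_state)
        dist = next_token_distribution(state, joint, fused, vocab_emb, lam)
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token == eos:
            break
        prefix.append(token)
    return prefix
```

A usage example would pass small toy matrices, e.g. joint = [[1.0, 0.0], [0.5, 0.5]], fused = [[0.0, 1.0]], and a three-token vocabulary including "<eos>"; the output is a token list of at most max_len items that is reproducible for a fixed seed.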