A knowledge graph query method fusing vector retrieval

By performing semi-structured text description transformation and semantic vector generation on knowledge graph tuples, the semantic alignment problem between entities, attributes, and relationships in the knowledge graph and user query questions is solved, improving the accuracy and stability of query statements, adapting to mixed Chinese and English scenarios, and reducing schema size.

CN122285686A · Pending · Publication Date: 2026-06-26 · CHINESE PEOPLES LIBERATION ARMY INFORMATION SUPPORT CORPS ENGINEERING UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINESE PEOPLES LIBERATION ARMY INFORMATION SUPPORT CORPS ENGINEERING UNIVERSITY
Filing Date
2026-02-13
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

There are issues with the semantic alignment between entities, attributes, and relationships in knowledge graphs and user queries, especially the differences in expression in mixed Chinese and English scenarios; excessively long knowledge graph schemas make it difficult for large models to accurately understand the meaning of the entire schema, affecting the accuracy of query generation.

Method used

By performing semi-structured text description transformation on tuples in the knowledge graph, semantic vectors of tuples are generated. Semantic encoding is then performed using a pre-trained semantic model. The similarity between the query vector and the semantic vector of the tuples is calculated, candidate tuples are selected, a recommendation pattern is formed, and finally, a knowledge graph query statement is generated.

Benefits of technology

It effectively solves the semantic alignment problem in mixed Chinese and English scenarios, improves the matching accuracy of attributes, relationships, and entities, significantly improves the accuracy and stability of query statement generation, reduces schema size, and ensures the reliability of query statements generated from large models.


Abstract

This invention discloses a knowledge graph query method integrating vector retrieval. First, the tuples in the knowledge graph are transformed into semi-structured text descriptions, obtaining descriptive information including entity names, attribute fields, attribute field descriptions, and attribute values. Then, based on template filling, the semi-structured text descriptions are converted into complete semantic descriptions, and the two are concatenated and encoded by a pre-trained semantic model to generate tuple semantic vectors. Next, the user query question is encoded to obtain a query vector, its similarity to the tuple semantic vectors is calculated, and candidate tuples are selected. After deduplication filtering, a recommendation pattern is formed. Finally, the user query question and the recommendation pattern are input into a large language model to generate a knowledge graph query statement. This invention achieves accurate alignment between the query question and the knowledge graph schema through semantic vector retrieval, effectively solving the semantic alignment problem in mixed Chinese and English scenarios and the problem of decreased query accuracy caused by excessively long schemas.

Description

Technical Field

[0001] This invention relates to the fields of knowledge graphs and natural language processing, and specifically to a knowledge graph query method that integrates vector retrieval.

Background Technology

[0002] In recent years, large-scale pre-trained language models based on the Transformer architecture have achieved significant breakthroughs in the field of natural language processing, demonstrating powerful text generation, understanding, and reasoning capabilities. However, these large models still suffer from inherent defects such as knowledge fixation and insufficient timeliness, poor interpretability, and insufficient coverage of long-tail knowledge. Meanwhile, knowledge graphs, as structured semantic networks, can explicitly express knowledge in the form of triples, offering advantages in accuracy, interpretability, and scalability. However, their application is limited by the completeness of knowledge coverage and the complexity of dynamic construction.

[0003] Combining large-scale models with knowledge graphs can create complementary advantages. In terms of knowledge enhancement, knowledge graphs provide structured knowledge support for large-scale models, compensating for their factual deficiencies. Regarding interpretability of reasoning, the explicit relational paths within the knowledge graph enhance the transparency of model reasoning. And in terms of dynamic update mechanisms, knowledge graphs can be updated in real time, helping large-scale models adapt to new knowledge. This combination has broad application value in scenarios such as intelligent question-answering systems, recommendation systems, financial risk control, medical auxiliary diagnosis, personalized learning in education, and knowledge engineering in vertical domains.

[0004] However, existing knowledge graph-based question answering technologies face two main challenges. First, entities, attributes, relations, and their values are expressed differently in the knowledge graph than in the query question. Knowledge graphs are designed and constructed to accommodate diverse semantic representations, so attributes, entities, relations, and their corresponding values have numerous aliases, which complicates graph queries. This is particularly problematic in domestic applications, where knowledge graphs often use English field names, leading to semantic misalignment and low accuracy when users query with Chinese questions. Furthermore, insufficient alias coverage can cause character-level differences between the query value and the graph value, affecting the stability of query performance.

[0005] Second, the lack of standardized schema design during knowledge graph construction makes interaction with large models inconvenient. The emergence of large models has brought new opportunities to knowledge graph-based question answering and provides more solutions for complex questions. However, since the design of knowledge graphs often predates large models, early design and query processes often rely on graph schema mapping tables and field descriptions, resulting in inconsistencies. In domestic database and graph design, names are often written in pinyin or pinyin abbreviations, which makes large-model-based graph querying and updating difficult. Furthermore, if a graph is large, the schema content becomes too long, degrading the large model's ability to generate query statements.

Summary of the Invention

[0006] This invention proposes a knowledge graph query method that integrates vector retrieval, aiming to solve the following technical problems: the semantic alignment between entities, attributes, and relationships in the knowledge graph and user query questions, especially the expression differences in mixed Chinese and English scenarios; and the problem that excessively long knowledge graph schemas make it difficult for large models to accurately understand the meaning of the entire schema, thus affecting the accuracy of query statement generation.

[0007] To address the aforementioned technical problems, this invention provides a knowledge graph query method that integrates vector retrieval, comprising the following steps:
Step S1: Perform semi-structured text description transformation on the tuples in the knowledge graph to obtain a semi-structured text description containing entity name, attribute field, attribute field description, and attribute value;
Step S2: Based on template filling, convert the semi-structured text description into a complete semantic description, concatenate the complete semantic description with the semi-structured text description, and encode the result with a pre-trained semantic model to generate a tuple semantic vector;
Step S3: Encode the user query question into a query vector using the pre-trained semantic model, calculate the similarity between the query vector and the tuple semantic vectors, select candidate tuples, and form a recommendation pattern after deduplication filtering of the candidate tuples;
Step S4: Input the user query question and the recommendation pattern into the large language model to generate a knowledge graph query statement.

[0008] Preferably, the tuples include triples in the form of <entity, attribute, attribute value> or <head entity, relation, tail entity>, and quadruples in the form of <entity, time, location, event>.

[0009] Preferably, the semi-structured text description transformation is achieved by extracting and completing each field of the tuple through preset constraint rules, or by semantically rewriting the tuple through a large language model.

[0010] Preferably, the semi-structured text description conversion is represented as follows: $D = f_{\mathrm{conv}}(T)$. In the formula, $D$ is the semi-structured text description, $f_{\mathrm{conv}}$ is the conversion tool, and $T$ is the input tuple.

[0011] Preferably, the generation of the tuple semantic vector is represented as follows: $V_T = M(S \oplus D)$. In the formula, $V_T$ is the tuple semantic vector, $M$ is the pre-trained semantic model, $S$ is the complete semantic description, $D$ is the semi-structured text description, and $\oplus$ is the text concatenation operation.

[0012] Preferably, the template filling uses the preset template format "the attribute {attribute field description} of entity {entity name} is {attribute value}", where the values within curly braces are the corresponding field values extracted from the semi-structured text description.

[0013] Preferably, the process of generating the recommendation pattern is represented as follows: $V_Q = M(Q)$; $C = \operatorname{TopK}\big(\cos(V_Q, \mathcal{V}),\, k_1\big)$; $R = \operatorname{Filter}(C, k_2)$. In the formula, $V_Q$ is the query vector, $M$ is the pre-trained semantic model, $Q$ is the user query question, $C$ is the candidate tuple set, $\cos$ is the cosine similarity function, $\mathcal{V}$ is the set of all tuple semantic vectors, $k_1$ is the number of candidates, $\operatorname{Filter}$ is the deduplication filtering function, $k_2$ is the number of recommendations, and $R$ is the recommendation pattern.

[0014] Preferably, the deduplication filtering function performs the operation of removing tuples with the same entity, the same attribute, and the same attribute value, and the output recommendation pattern includes an entity recommendation list, an attribute recommendation list, and a relationship recommendation list.

[0015] Preferably, the pre-trained semantic model supports semantic encoding of mixed Chinese and English text.

[0016] Preferably, the knowledge graph query statement is a Gremlin query statement, and the large language model generates the Gremlin query statement based on a prompt template that includes user needs, suggested entity and attribute information, example queries, and output format requirements.

[0017] The beneficial effects of the present invention include at least the following:
(1) By converting the tuples in the knowledge graph into semi-structured text descriptions, the original field names are retained while Chinese description information is added, which effectively solves the semantic alignment problem in mixed Chinese and English scenarios and improves the matching accuracy of attributes, relations, and entities;
(2) Tuple semantic vectors are constructed with a pre-trained semantic model, and the complete semantic description and the semi-structured text description are vectorized after concatenation, meeting the dual requirements of semantic understanding and accurate matching;
(3) A two-stage strategy of semantic vector retrieval combined with deduplication filtering is adopted to recommend schema information for query questions, effectively reducing the size of the schema input to the large model and significantly improving the accuracy of query statement generation;
(4) The overall solution forms a complete processing chain from knowledge graph to query statement, and the modules work together to ensure that the large model generates graph query statements stably and reliably.

Attached Figure Description

[0018] Figure 1 is a schematic flowchart of an embodiment of the present invention.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of the present invention.

[0020] As shown in Figure 1, the knowledge graph query method integrating vector retrieval provided by this invention comprises three core components: a knowledge graph-oriented vector construction module, a vector retrieval-based schema recommendation module, and a query statement generation module. The specific implementation of each component is described in detail below.

[0021] Step S1: Perform semi-structured text description transformation on the tuples in the knowledge graph to obtain a semi-structured text description containing entity name, attribute field, attribute field description and attribute value.

[0022] A knowledge graph is a graph-like data structure designed according to ontology attributes and relationships during its construction, typically stored using triples or tuples. A triple usually contains an entity, entity attributes, and relationships between entities. Taking a drone-related knowledge graph as an example, where "drone" is the entity name, "endurance time" is the attribute, and "2 hours" is the attribute value, it can form the triple <drone, endurance_time, 2 hours>. For more complex information representations, drone flight events can be represented as quadruples containing time, location, and event task, such as <drone, March 7th, Shuxi Lake, aerial photography>.
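The two tuple shapes in this paragraph can be modeled as small immutable records. The following is a minimal sketch; the class and field names (`Triple`, `Quadruple`, `subject`, `predicate`, `obj`) are illustrative assumptions and do not appear in the patent text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    """<entity, attribute, attribute value> or <head entity, relation, tail entity>."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Quadruple:
    """<entity, time, location, event>."""
    entity: str
    time: str
    location: str
    event: str


# Either shape may be stored in the graph (hypothetical alias).
KGTuple = Union[Triple, Quadruple]

# The two drone examples from the paragraph above.
endurance = Triple("drone", "endurance_time", "2 hours")
flight = Quadruple("drone", "March 7th", "Shuxi Lake", "aerial photography")
```

Frozen dataclasses are hashable, which is convenient if tuples later need to be placed in sets for deduplication.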

[0023] To address the discrepancy between field naming in knowledge graphs and user queries, this invention uses a conversion tool to transform the tuples in the knowledge graph, that is, entity attributes and entity relationships, into semi-structured descriptive information. This conversion process can be represented as: $D = f_{\mathrm{conv}}(T)$. In the formula, $D$ is the semi-structured text description, $f_{\mathrm{conv}}$ is the conversion tool, and $T$ is the input tuple.

[0024] The conversion tool can extract and complete the relevant fields according to preset constraint rules, or it can rewrite the tuple using a large language model to form semi-structured text. Taking the triple <drone, endurance_time, 2 hours> as an example, the converted semi-structured text description is: "[Entity Name: Drone, Attribute Field: endurance_time, Attribute Field Description: Endurance Time, Attribute Value: 2 hours]". This conversion method preserves the original English field names to ensure accurate matching, while also supplementing Chinese descriptions to support semantic understanding.
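The rule-based variant of the conversion can be sketched as a lookup of the field description followed by string formatting. This is only one possible reading of "preset constraint rules"; the `FIELD_DESCRIPTIONS` table and the function name `to_semi_structured` are hypothetical, and the example uses the `endurance_time` field of the drone triple.

```python
# Hypothetical lookup table mapping raw field names to human-readable
# descriptions; in practice this might come from a schema mapping table.
FIELD_DESCRIPTIONS = {
    "endurance_time": "Endurance Time",
}


def to_semi_structured(entity: str, field: str, value: str) -> str:
    """Rule-based conversion of a triple into a semi-structured description."""
    # Fall back to the raw field name when no description is registered.
    description = FIELD_DESCRIPTIONS.get(field, field)
    return (
        f"[Entity Name: {entity}, Attribute Field: {field}, "
        f"Attribute Field Description: {description}, Attribute Value: {value}]"
    )


description = to_semi_structured("Drone", "endurance_time", "2 hours")
print(description)
```

Keeping the raw field name alongside its description is what lets later retrieval match either the English identifier or the natural-language wording.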

[0025] Step S2: Convert the semi-structured text description into a complete semantic description based on template filling, and then concatenate the complete semantic description with the semi-structured text description and encode it through a pre-trained semantic model to generate a tuple semantic vector.

[0026] Pre-trained semantic vector models are built from a large amount of unstructured text data and possess several important capabilities. In semantic understanding, they can map words to vectors that capture semantic meaning, analyzing sentence semantics and relationships between components. In semantic similarity calculation, they can quantify the degree of similarity between words and texts, aiding in tasks such as word meaning disambiguation. Their semantic reasoning capabilities enable them to combine external knowledge or deduce new semantic information based on logic, supporting knowledge-based and simple logical reasoning. In context awareness, they dynamically adjust word semantics according to the context, clearly defining the referent. They also possess language generation capabilities, such as extracting key information from text summaries, creating accurate question-answering systems, and generating stories, poems, and copywriting based on themes.

[0027] However, due to the relative scarcity of training samples for semi-structured data, semantic vector models do not perform ideally in the field of semi-structured data. Furthermore, users typically use unstructured natural language text when querying knowledge graphs, making direct retrieval of triples less effective. To address this issue, this invention employs template filling to convert semi-structured descriptions into complete semantic descriptions and then performs vectorization processing.

[0028] First, the semi-structured text description is converted into a complete semantic description using a template filling tool: $S = f_{\mathrm{tmpl}}(D)$. In the formula, $S$ is the complete semantic description, $f_{\mathrm{tmpl}}$ is the template filling tool, and $D$ is the semi-structured text description.

[0029] The default template format is "Entity {Entity Name}'s attribute {Attribute Field Description} is {Attribute Value}", where the content within curly braces is the corresponding field value extracted from the semi-structured text description. For the aforementioned example, the complete semantic description obtained after filling in the template is "Entity Drone's attribute Endurance Time is 2 hours".
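Template filling can be sketched as parsing the bracketed description back into named fields and substituting them into the template. The regular expression and the names `TEMPLATE` and `fill_template` are assumptions made for illustration; the patent does not specify how fields are extracted.

```python
import re

# The default template from the text, with hypothetical placeholder names.
TEMPLATE = "Entity {entity_name}'s attribute {field_description} is {value}"

# Matches the bracketed semi-structured description produced in step S1.
_PATTERN = re.compile(
    r"\[Entity Name: (?P<entity_name>.*?), "
    r"Attribute Field: (?P<field>.*?), "
    r"Attribute Field Description: (?P<field_description>.*?), "
    r"Attribute Value: (?P<value>.*?)\]$"
)


def fill_template(semi_structured: str) -> str:
    """Convert a semi-structured description into a complete sentence."""
    match = _PATTERN.match(semi_structured)
    if match is None:
        raise ValueError("not a recognised semi-structured description")
    # Unused groups (the raw field name) are simply ignored by str.format.
    return TEMPLATE.format(**match.groupdict())


sentence = fill_template(
    "[Entity Name: Drone, Attribute Field: endurance_time, "
    "Attribute Field Description: Endurance Time, Attribute Value: 2 hours]"
)
print(sentence)
```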

[0030] Then, the complete semantic description is concatenated with the semi-structured text description and input into the pre-trained semantic model to generate a tuple semantic vector: $V_T = M(S \oplus D)$. In the formula, $V_T$ is the tuple semantic vector, $M$ is the pre-trained semantic model, $S$ is the complete semantic description, $D$ is the semi-structured text description, and $\oplus$ is the text concatenation operation.

[0031] An example of the concatenated text is: "Entity Drone's attribute Endurance Time is 2 hours. Its original description is: [Entity Name: Drone, Attribute Field: endurance_time, Attribute Field Description: Endurance Time, Attribute Value: 2 hours]". This design serves both semantic understanding and precise matching. It should be noted that the pre-trained semantic model must support mixed Chinese and English retrieval in order to handle the mixed Chinese and English naming common in knowledge graphs.
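The concatenate-then-encode step can be sketched as follows. The patent relies on a pre-trained, bilingual semantic model; to keep this example self-contained, `toy_encode` is a deliberately crude stand-in (hashed character bigrams) and is not a semantic model. The joining phrase follows the concatenated example above; all function names are hypothetical.

```python
import hashlib
import math
from typing import List

DIM = 64  # arbitrary vector size for the stand-in encoder


def toy_encode(text: str) -> List[float]:
    """Stand-in for the pre-trained model: hashed character bigrams, L2-normalised."""
    vec = [0.0] * DIM
    lowered = text.lower()
    for i in range(len(lowered) - 1):
        bigram = lowered[i:i + 2]
        digest = hashlib.md5(bigram.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def tuple_vector(semantic: str, semi_structured: str) -> List[float]:
    """Concatenate the full sentence with the raw description, then encode."""
    combined = f"{semantic}. Its original description is: {semi_structured}"
    return toy_encode(combined)


vec = tuple_vector(
    "Entity Drone's attribute Endurance Time is 2 hours",
    "[Entity Name: Drone, Attribute Field: endurance_time, "
    "Attribute Field Description: Endurance Time, Attribute Value: 2 hours]",
)
```

In a real deployment `toy_encode` would be replaced by a call to whichever multilingual embedding model is chosen; the concatenation logic stays the same.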

[0032] Step S3: Encode the user query question into a query vector using a pre-trained semantic model, calculate the similarity between the query vector and the semantic vector of the tuple, and filter candidate tuples. After deduplication and filtering of the candidate tuples, a recommendation pattern is formed.

[0033] Because schemas in knowledge graphs are typically lengthy, large models struggle to accurately understand the meaning of all schemas during query generation. This invention addresses user queries by leveraging semantic retrieval of relevant attribute tuples and relation tuples to recommend accurate schema information. This process primarily involves two steps: query vectorization and semantic retrieval.

[0034] First, the user's query question is input into the pre-trained semantic model to generate a query vector: $V_Q = M(Q)$. In the formula, $V_Q$ is the query vector, $M$ is the pre-trained semantic model, and $Q$ is the user query question.

[0035] Then, the cosine similarity between the query vector and every tuple semantic vector is calculated, and the $k_1$ tuples with the highest similarity are selected as candidate tuples: $C = \operatorname{TopK}\big(\cos(V_Q, \mathcal{V}),\, k_1\big)$. In the formula, $C$ is the candidate tuple set, $\cos$ is the cosine similarity function, $V_Q$ is the query vector, $\mathcal{V}$ is the set of all tuple semantic vectors, and $k_1$ is the parameter for the number of candidate tuples.
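The candidate selection described here amounts to a brute-force top-k search by cosine similarity. The sketch below uses plain Python lists and `heapq.nlargest`; the function names and the tiny two-dimensional vectors are illustrative assumptions, and a production system would more likely use a vector index.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    tuple_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int]]:
    """Return (similarity, index) pairs for the k most similar tuple vectors."""
    scored = ((cosine(query_vec, v), i) for i, v in enumerate(tuple_vecs))
    return heapq.nlargest(k, scored)


# Toy data: index 0 points the same way as the query, index 1 is orthogonal.
candidates = top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)
candidate_indices = [index for _, index in candidates]
```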

[0036] Finally, the candidate tuples are deduplicated to generate the final recommendation pattern: $R = \operatorname{Filter}(C, k_2)$. In the formula, $R$ is the recommendation pattern, $\operatorname{Filter}$ is the deduplication filtering function, $C$ is the candidate tuple set, and $k_2$ is the parameter for the final number of recommended tuples.

[0037] The deduplication filtering function performs the following operations: removing candidate tuples containing the same entities, removing tuples containing the same attributes, and removing tuples containing the same attribute values. After filtering, standardized entity recommendation lists, attribute recommendation lists, and relationship recommendation lists are generated, providing accurate schema support for generating query statements for large models.
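One way to read the deduplication step is: drop candidates that repeat an already seen (entity, field, value) combination, keep at most a fixed number of survivors, and collect their distinct entities, attributes, and relations into three lists. The text leaves the exact rule open, so the sketch below is an interpretation, and the `Candidate` layout, the `kind` flag, and the placeholder values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (kind, entity, field, value); kind is "attribute" or "relation".
Candidate = Tuple[str, str, str, str]


def dedup_filter(candidates: Sequence[Candidate], limit: int) -> Dict[str, List[str]]:
    """Remove duplicate tuples and build the three recommendation lists."""
    seen = set()
    pattern: Dict[str, List[str]] = {"entities": [], "attributes": [], "relations": []}
    kept = 0
    for kind, entity, field, value in candidates:
        key = (entity, field, value)
        if key in seen:
            continue  # same entity, same field, same value: skip
        seen.add(key)
        if entity not in pattern["entities"]:
            pattern["entities"].append(entity)
        target = pattern["relations"] if kind == "relation" else pattern["attributes"]
        if field not in target:
            target.append(field)
        kept += 1
        if kept >= limit:
            break  # stop once the requested number of tuples is kept
    return pattern


# Candidates are assumed to arrive sorted by descending similarity.
ranked = [
    ("attribute", "C919", "Height", "value-a"),
    ("attribute", "C919", "Height", "value-a"),  # exact duplicate
    ("relation", "C919", "chuanGanQiDaZaifeiji", "value-b"),
    ("attribute", "C919", "Length", "value-c"),
]
pattern = dedup_filter(ranked, limit=2)
```

Because the duplicate does not count toward the limit, the second kept tuple here is the relation, and `Length` is cut off.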

[0038] Step S4: Input the user query question and recommendation mode into the large language model to generate a knowledge graph query statement.

[0039] The knowledge graph query generation based on large-scale models leverages the powerful language understanding and generation capabilities of these models, combined with entity and relationship information from the knowledge graph, to automatically generate accurate query statements. This module can mine potential information within the knowledge graph based on given needs or contexts, providing users with efficient and accurate data query services, and has significant application value in fields such as intelligent search and data analysis.

[0040] This invention uses Gremlin syntax as the target format for query statements. Gremlin is a graph traversal language widely used in various graph database systems. The large model generates query statements conforming to Gremlin syntax based on the suggested entities, entity attributes, and entity relationships in the recommendation pattern, combined with a preset prompt template.

[0041] The prompt template includes the following key parts: a user requirement description section to clarify the user's query intent; a suggested entity and entity attribute information section, containing structured information such as entity type, attribute list, and relationship list obtained from the schema recommendation module; an example query statement section, providing examples of several typical query scenarios to help the large model understand the query statement generation pattern; and an output format requirement section, specifying that the results should be output in JSON format, including the target entity name and an executable Gremlin statement.

[0042] The following is a specific example of a prompt template:

"
# User Needs
{user_requirement}

# Output Requirements
- Generate Gremlin syntax to answer user requests based on `suggested entities and entity attributes`.
- Pay attention to identifying the entity names and attributes requested in the requirements.
- Return the final result strictly according to the following JSON format.
```json
{{
  "entity_name": "<target entity name>",
  "code": "<Gremlin statements that can be executed>"
}}
```
---
# Example
- Suggested entity types and entity attributes:
  - Entity:
    Entity Name: C919
    Entity Type: Aircraft
  - Entity attributes:
    Entity Type: Aircraft
    Type Tag: 1
    Attribute List: [RunwayLengthCode: Runway Length (meters), Span: Width (meters), Height: Height (meters), PhysicalSizeCode: Physical Size Level, Length: Length (meters)]
    **Relationship List**: [chuanGanQiDaZaifeiji: Carrying, tuiJinXiTongDaZaifeiji: Carrying]
---
- User requirements: Find the height of C919
- Generated JSON code:
{{ "entity_name": "C919", "code": "g.V().has('name', 'C919').values('Height')" }}
---
- User requirements: What is the longest airplane?
- Generated JSON code:
{{ "entity_name": "", "code": "g.V().hasLabel('1').order().by('Length', Order.desc).limit(1).properties('Length')" }}
---
- User requirements: What sensors does the C919 carry?
- Generated JSON code:
{{ "entity_name": "C919", "code": "g.V().has('name', 'C919').out('chuanGanQiDaZaifeiji').values('name')" }}
---
- User requirements: What are the attribute information of the C919?
- Generated JSON code:
{{ "entity_name": "C919", "code": "g.V().has('name', 'C919').properties()" }}
===
# Suggestions for entities, entity attributes, and relationships
## Suggested Entity Attributes and Relationships
{entity_property}
## Top-3 Recommended Entities
{entity_concept}
---
# User Needs
{user_requirement}
Please use the entity name from the suggested entities to perform the query.
"

[0043] In the suggested entities and entity attributes section, the example content includes entity C919 (type: aircraft), whose attribute list includes fields such as runway length, width, height, physical size level, and length, and whose relationship list includes relationships such as sensor mounting and propulsion system mounting.
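The doubled braces around the JSON skeleton suggest, although the text does not say so, that the template is rendered with a `str.format`-style formatter in which `{{` and `}}` produce literal braces. The sketch below renders a shortened version of the template and validates a model reply against the required JSON keys; `build_prompt`, `parse_reply`, and the hard-coded reply string are hypothetical, and no actual large language model is called.

```python
import json

# Shortened version of the prompt template; {{ and }} render as literal braces.
PROMPT_TEMPLATE = """# User Needs
{user_requirement}

# Output Requirements
- Return the final result strictly according to the following JSON format.
{{
  "entity_name": "<target entity name>",
  "code": "<Gremlin statements that can be executed>"
}}

# Suggested Entity Attributes and Relationships
{entity_property}

# Top-3 Recommended Entities
{entity_concept}
"""


def build_prompt(user_requirement: str, entity_property: str, entity_concept: str) -> str:
    """Fill the three placeholders of the prompt template."""
    return PROMPT_TEMPLATE.format(
        user_requirement=user_requirement,
        entity_property=entity_property,
        entity_concept=entity_concept,
    )


def parse_reply(reply: str) -> dict:
    """Parse the model's JSON reply and check that both required keys exist."""
    data = json.loads(reply)
    if not {"entity_name", "code"} <= set(data):
        raise ValueError("reply is missing 'entity_name' or 'code'")
    return data


prompt = build_prompt(
    "Find the height of C919",
    "Attribute List: [Height: Height (meters)]",
    "C919",
)
# Stand-in for a model response; a real system would send `prompt` to an LLM.
reply = '{"entity_name": "C919", "code": "g.V().has(\'name\', \'C919\').values(\'Height\')"}'
result = parse_reply(reply)
```

Validating the reply before execution gives a natural place to reject malformed output instead of passing it to the graph database.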

[0044] Through the four steps above working together, this invention realizes a complete conversion process from user natural language queries to structured knowledge graph query statements, effectively solving the two core problems of semantic alignment and schema recommendation, and significantly improving the query accuracy and stability of knowledge graph-based question answering systems.

[0045] The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; only preferred embodiments of the present invention are illustrated. The descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. As long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification.

[0046] It should be noted that those skilled in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of this invention. Therefore, the scope of protection of this invention should be determined by the appended claims.

Claims

1. A knowledge graph query method integrating vector retrieval, characterized in that it includes the following steps:
Step S1: Perform semi-structured text description transformation on the tuples in the knowledge graph to obtain a semi-structured text description containing entity name, attribute field, attribute field description, and attribute value;
Step S2: Based on template filling, convert the semi-structured text description into a complete semantic description, concatenate the complete semantic description with the semi-structured text description, and encode the result with a pre-trained semantic model to generate a tuple semantic vector;
Step S3: Encode the user query question into a query vector using the pre-trained semantic model, calculate the similarity between the query vector and the tuple semantic vectors, select candidate tuples, and form a recommendation pattern after deduplication filtering of the candidate tuples;
Step S4: Input the user query question and the recommendation pattern into the large language model to generate a knowledge graph query statement.

2. The method according to claim 1, characterized in that: The tuples include triples in the form of <entity, attribute, attribute value> or <head entity, relation, tail entity>, and quadruples in the form of <entity, time, location, event>.

3. The method according to claim 1, characterized in that: The semi-structured text description transformation is achieved by extracting and completing each field of the tuple through preset constraint rules, or by semantically rewriting the tuple through a large language model.

4. The method according to claim 1, characterized in that: The semi-structured text description transformation is represented as follows: $D = f_{\mathrm{conv}}(T)$. In the formula, $D$ is the semi-structured text description, $f_{\mathrm{conv}}$ is the conversion tool, and $T$ is the input tuple.

5. The method according to claim 1, characterized in that: The generation of the tuple semantic vector is represented as follows: $V_T = M(S \oplus D)$. In the formula, $V_T$ is the tuple semantic vector, $M$ is the pre-trained semantic model, $S$ is the complete semantic description, $D$ is the semi-structured text description, and $\oplus$ is the text concatenation operation.

6. The method according to claim 5, characterized in that: The template filling uses a preset template format: the attribute {attribute field description} of entity {entity name} is {attribute value}, where the curly braces contain the corresponding field value extracted from the semi-structured text description.

7. The method according to claim 1, characterized in that: The process of generating the recommendation pattern is represented as follows: $V_Q = M(Q)$; $C = \operatorname{TopK}\big(\cos(V_Q, \mathcal{V}),\, k_1\big)$; $R = \operatorname{Filter}(C, k_2)$. In the formula, $V_Q$ is the query vector, $M$ is the pre-trained semantic model, $Q$ is the user query question, $C$ is the candidate tuple set, $\cos$ is the cosine similarity function, $\mathcal{V}$ is the set of all tuple semantic vectors, $k_1$ is the number of candidates, $\operatorname{Filter}$ is the deduplication filtering function, $k_2$ is the number of recommendations, and $R$ is the recommendation pattern.

8. The method according to claim 7, characterized in that: The deduplication filtering function performs the operation of removing tuples with the same entity, the same attribute, and the same attribute value. The output recommendation pattern includes an entity recommendation list, an attribute recommendation list, and a relationship recommendation list.

9. The method according to claim 1, characterized in that: The pre-trained semantic model supports semantic encoding of mixed Chinese and English text.

10. The method according to claim 1, characterized in that: The knowledge graph query statement is a Gremlin query statement, which is generated by the large language model based on a prompt template that includes user needs, suggested entity and attribute information, example queries, and output format requirements.