Knowledge question and answer system and method in water conservancy engineering field
A knowledge question-answering system for the field of water conservancy engineering is constructed by combining large language models and knowledge graphs with a hybrid retrieval method. It addresses the long query times and information asymmetry of traditional queries and achieves efficient, accurate knowledge retrieval and answer generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI JIAOTONG UNIV
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
AI Technical Summary
In the field of water conservancy engineering, traditional knowledge retrieval is time-consuming and error-prone, and information is asymmetrical. Existing technologies rely on predefined prompt templates and subgraph retrieval strategies, which provide only limited, local knowledge.
A knowledge question-answering system for the field of water conservancy engineering is constructed by combining large language models and knowledge graphs, employing hybrid retrieval (vector retrieval, keyword retrieval, and graph retrieval), and using few-shot prompts. The LangChain Expression Language is used to extract entities and relationships so that accurate answers can be generated.
The system improves knowledge utilization and retrieval recall, generates more comprehensive and context-relevant answers, reduces training costs, and enhances the accuracy and robustness of question answering.
Figure CN122309634A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a technology in the field of natural language processing, specifically a method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph.
Background Technology
[0002] Currently, traditional knowledge retrieval processes in the water conservancy engineering field often involve searching through numerous paper or electronic documents, requiring multiple document searches to obtain the answer. This process is time-consuming and prone to errors. Furthermore, relevant documents may be stored in different departments, leading to information asymmetry and low transparency.
Summary of the Invention
[0003] This invention addresses the shortcomings of existing technologies that rely primarily on predefined prompt templates, instance data, and subgraph retrieval strategies, providing only localized and limited knowledge. It proposes a method for implementing a knowledge-based question-answering system in the field of water conservancy engineering. By combining the excellent reasoning capabilities of large language models with the contextual summarization capabilities of knowledge graphs, this system can accurately answer questions within the knowledge scope of the water conservancy engineering field.
[0004] This invention is achieved through the following technical solution:
[0005] This invention relates to a method for implementing a knowledge-based question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph. First, using the LangChain Expression Language and prompt engineering, a large language model extracts the entities and relationships from the user's input question and outputs them in JSON format. Then, a hybrid search engine comprising vector search, keyword search, and graph search is configured. Vector search, keyword search, and graph search are each performed on the entities extracted from the user's input question, and the search results are merged. Finally, the merged search results are filled into a preset prompt template, and the constructed prompt is passed to the large language model, which generates the answer that is returned to the user.
Technical Effect
[0006] This invention processes and utilizes large-scale text in the field of water conservancy engineering, including data cleaning of unstructured text, text standardization, word segmentation, text vectorization, and the construction of a knowledge graph for the field. Few-shot prompts are employed in the user input question analysis, text vectorization, and answer generation stages. For knowledge retrieval, a hybrid retrieval method combining vector retrieval, keyword retrieval, and graph retrieval is used. Whereas existing text in the field of water conservancy engineering is complex and scattered, this invention collects and organizes a large amount of relevant data, processes it, and constructs a knowledge graph, improving knowledge utilization and facilitating subsequent queries by the question-answering system. The use of few-shot prompts in multiple stages further helps the large language model understand the task and provides a degree of generalization: even content the large language model has not previously encountered can be understood through prompts, which saves significant training cost. The hybrid retrieval method improves the recall and robustness of retrieval, so that more comprehensive and context-relevant material is passed to the large language model, thereby improving the quality of generated content.
Attached Figure Description
[0007] Figure 1 is a flowchart of the present invention;
[0008] Figure 2 is a schematic diagram of the system in the embodiment;
[0009] Figure 3 is a schematic diagram illustrating the effect of the embodiment.
Detailed Implementation
[0010] As shown in Figure 2, this embodiment relates to a knowledge-based question-and-answer system for the water conservancy engineering field, constructed based on a large language model and knowledge graph, which includes: a login module, a knowledge storage module, a knowledge graph visualization module, a knowledge graph maintenance module, an automatic question-and-answer module, and a user management module. The login module manages user accounts based on their identity information and provides login and logout operations. The knowledge graph storage module uses the Neo4j graph database to store the constructed knowledge graph for the water conservancy engineering field. The knowledge graph visualization module displays the constructed knowledge graph stored in the Neo4j graph database on the front end, including the nodes and relationships in the knowledge graph. The knowledge graph maintenance module generates new graph documents from newly input unstructured text using a graph converter, and while storing the documents using the knowledge graph storage module, it also provides users with interfaces for adding, deleting, modifying, and querying nodes, facilitating manual maintenance of the knowledge graph. The automatic question-and-answer module receives user questions, sends them to the GraphRag-based question-and-answer system, processes the questions, derives the answers, and then sends them to the front end for display. The user management module uses a MySQL database to store user account, password, and permission information, and provides an interface for administrators to add, delete, modify, and query this information.
[0011] The login module includes an account management unit and a login / logout system unit. The account management unit provides users with registration, password retrieval, and information modification functions. Users can maintain their accounts by entering their information through this unit. The login / logout system unit provides users with login and logout operations. Users can enter their account and password to authenticate and access the question-and-answer system. The system will redirect users to the page corresponding to their permissions. When users finish using the system, they can log out to exit their accounts to ensure account security.
[0012] The knowledge storage module uses the Neo4j graph database to store the constructed knowledge graph of the water conservancy engineering field. The knowledge graph includes entities, i.e., nodes carrying multiple attributes and labels, and the relationships stored between nodes, i.e., edges that carry the type and characteristic attributes of each relationship.
[0013] The knowledge graph visualization module displays the attributes of entity nodes and the relationships between entities in the knowledge graph. Because the entity nodes are densely connected and many nodes are displayed at once, a typical concentric-circle node layout would cause nodes to overlap. This module therefore uses a force-directed graph layout algorithm to lay out the entity nodes in the knowledge graph, achieving a more reasonable layout. Specifically, the knowledge graph visualization interface is built using the Vue + ECharts framework.
[0014] The knowledge graph maintenance module includes: a unit for adding, deleting, modifying, and querying knowledge graph nodes, and a unit for automatically constructing knowledge graphs based on user-input documents. Specifically, the unit for adding, deleting, modifying, and querying knowledge graph nodes uses the Django framework to perform operations on the graph database, such as adding entity nodes, adding relationships between entities, adding entity attributes, modifying entity names and types, modifying entity attribute values, modifying relationship names between entities, deleting entities, deleting entity attributes, and deleting relationships between entities. The unit for automatically constructing knowledge graphs generates new graph documents from newly input unstructured text using a graph converter and stores them using the knowledge graph storage module.
[0015] The aforementioned automatic question-answering module provides core question-answering operations. The front-end uses the Vue framework, and the back-end uses the Django framework. After receiving the user's question, the front-end sends it to the GraphRag-based question-answering system. The system processes the question, derives the answer, and then sends it back to the front-end for display.
[0016] The user management module uses a MySQL database to store user account, password, and permission information, and provides an interface for administrators to add, delete, modify, and query this information.
[0017] In this embodiment, the system is implemented using the Vue framework and the Django framework, and the data is stored in the Neo4j graph database and the MySQL database.
[0018] As shown in Figure 1, this embodiment relates to a method for implementing a knowledge-based question-answering system for the water conservancy engineering field based on a large language model and knowledge graph. The method includes:
[0019] Step 1) Using the LangChain Expression Language and prompt engineering, extract the entities and relationships in the user's input question through the large language model and output them in JSON format. The prompt engineering adopts a few-shot prompting method: the prompt includes a detailed description of the task and a sample of the expected answer, in order to improve the ability of the large language model to handle related complex tasks.
[0020] For example: "You are extracting facilities, types, water conservancy concepts, and all entities of engineering from the text, focusing on nouns such as engineering equipment and materials, permanent hydraulic structures, spillway structures, slope protection and drainage of earth-rock dams, and reservoirs."
[0021] The output contains all entities from the input question and no characters other than the JSON array. The output JSON array is named "entities", and each of its elements has a field named "name".
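The few-shot prompt and the JSON output contract of Step 1 can be sketched as follows. This is a minimal illustration, not the embodiment's actual prompt: the wording of `FEW_SHOT_HEADER`, the helper names `build_prompt` and `parse_entities`, and the reading of the output as an object with an "entities" array of {"name": ...} elements are all assumptions. The call to the large language model itself is omitted.

```python
import json

# Task description plus one worked example (few-shot); wording is illustrative.
FEW_SHOT_HEADER = (
    "Extract the water conservancy engineering entities (facilities, types, "
    "concepts) mentioned in the question.\n"
    'Reply with JSON only, in the form {"entities": [{"name": "..."}]}.\n\n'
    "Question: What slope protection is used on earth-rock dams?\n"
    'Answer: {"entities": [{"name": "slope protection"}, '
    '{"name": "earth-rock dam"}]}\n\n'
)


def build_prompt(question: str) -> str:
    """Append the user's question to the few-shot header."""
    return FEW_SHOT_HEADER + "Question: " + question + "\nAnswer:"


def parse_entities(raw: str) -> list:
    """Return the entity names from the model reply, or [] if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    items = data.get("entities", [])
    if not isinstance(items, list):
        return []
    # Keep only well-formed elements that carry a string "name" field.
    return [
        item["name"]
        for item in items
        if isinstance(item, dict) and isinstance(item.get("name"), str)
    ]
```

Returning an empty list on malformed output lets the caller fall back to retrieval on the raw question instead of failing.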
[0022] Step 2) Configure a hybrid search engine that includes both unstructured and structured search engines, specifically including:
[0023] 2.1) Construct an unstructured search engine, i.e., configure vector search and keyword search, specifically including:
[0024] 2.1.1) The BGE embedding model BAAI/bge-large-zh-v1.5 is used to vectorize the chunked documents and the entities extracted from the input question. This model supports the use of prompts to improve the accuracy of the generated representations. The `from_existing_graph` method of Neo4jVector is used to add a vector retriever over the documents. This method can configure a vector search index for hybrid search. Because the chunked documents are stored in nodes labeled "Document" during knowledge graph construction, the target is the set of nodes labeled "Document".
[0025] 2.1.2) Use the Neo4jVector.from_existing_graph method to add a keyword retriever over the documents. This method can configure the keyword search index for hybrid search. Because the segmented documents are stored in nodes labeled "Document" during knowledge graph construction, the target is the set of nodes labeled "Document".
[0026] 2.1.3) Combine the vector retriever and the keyword retriever into an unstructured retriever that retrieves the ten document chunks most relevant to the entity vector representations and keywords extracted from the input question.
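How a vector score and a keyword score might be combined into one top-ten list can be illustrated with a small, dependency-free sketch. It does not call Neo4j or the BGE model. The merge rule used here (normalize each score list by its maximum, then keep the larger of the two scores per document) is an assumption chosen for illustration, and `cosine`, `keyword_score`, and `hybrid_search` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(keywords, text):
    """Fraction of query keywords found in the text (toy stand-in for a full-text index)."""
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k in text) / len(keywords)


def hybrid_search(query_vec, keywords, docs, k=10):
    """docs is a list of (text, vector) pairs; returns the top-k texts."""
    vec_scores = [cosine(query_vec, vec) for _, vec in docs]
    kw_scores = [keyword_score(keywords, text) for text, _ in docs]

    def normalize(scores):
        top = max(scores, default=0.0)
        return [s / top if top > 0 else 0.0 for s in scores]

    # Assumed merge rule: a document is as relevant as its better signal.
    merged = [max(v, w) for v, w in zip(normalize(vec_scores), normalize(kw_scores))]
    ranked = sorted(range(len(docs)), key=lambda i: merged[i], reverse=True)
    return [docs[i][0] for i in ranked[:k]]
```

The default `k=10` mirrors the ten document chunks named in step 2.1.3.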
[0027] Preferably, when the ten document chunks obtained by the unstructured retriever are not the most relevant ones, the list of retrieved chunks or nodes is re-ranked so that its order better reflects relevance to the user's input question. More relevant and accurate chunks are ranked higher, so they are given priority during generation by the large language model, which improves output quality. Specifically, fine ranking is performed with bce-reranker-base_v1, a RerankerModel from NetEase Youdao.
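The re-ranking step has a simple shape: score each retrieved chunk against the question and sort. The sketch below shows only that shape. The `overlap_score` function is a toy placeholder for a real cross-encoder such as bce-reranker-base_v1, whose API is not shown here, and both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank(
    question: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Score every chunk against the question and return the best top_n, highest first."""
    scored = [(chunk, score_fn(question, chunk)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def overlap_score(question: str, chunk: str) -> float:
    """Toy relevance: share of question words that appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0
```

In a real deployment `score_fn` would wrap the reranker model's pairwise scoring call, and the rest of the function would stay the same.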
[0028] 2.2) Constructing a structured retrieval engine, i.e., constructing a knowledge graph retrieval engine, specifically includes:
[0029] 2.2.1) Constructing a Knowledge Graph: The LangChain loader is used to obtain knowledge documents in the field of water conservancy engineering. A rule-based recursive chunking method is employed for text segmentation. Recursive chunking first attempts to segment the text according to certain criteria (such as paragraphs or headings). If the resulting text blocks are still too large, the segmentation process is repeated on these blocks until all blocks meet the size requirement. This method is suitable for scenarios where long texts are subdivided into smaller segments while maintaining the independence and integrity of each block as much as possible. After the text is recursively segmented into text blocks with chunk_size=200 and chunk_overlap=100 based on the specified delimiters "\n\n", "\n", "", "", the LLMGraphTransformer module is used to construct a knowledge graph for the water conservancy engineering field using a large language model. Specifically, using the specified generalized large language model, the `convert_to_graph_documents` method in the LLMGraphTransformer module provided by LangChain is used to construct a graph converter, transforming the segmented documents into graph documents. This invention adopts the tool-based mode in LLMGraphTransformer, which is suitable for large language models that support structured output or function calls. Tool calls are implemented through the built-in `with_structured_output` operation of the large language model. The tool specification defines a standardized output format, ensuring that the extraction of entities and relations is structured and standardized.
[0030] In this embodiment, graph documents are returned using the above method, and these documents can be imported into Neo4j using the add_graph_documents method.
[0031] Assign an additional __Entity__ label to each node to enhance indexing and query performance.
[0032] Link nodes to their source documents to facilitate data traceability and contextual understanding.
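The recursive chunking described in step 2.2.1 can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it omits chunk overlap, and the separator list ("\n\n", "\n", " ", "") is an assumed default rather than the delimiter list given in the text. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Coarse separators are tried first; any piece that is still too large
    is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        if not part:
            continue
        if len(part) > chunk_size:
            # Flush what has been collected, then recurse on the oversized part.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
        elif not current:
            current = part
        elif len(current) + len(sep) + len(part) <= chunk_size:
            # Merge small neighbours so chunks stay close to chunk_size.
            current = current + sep + part
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks
```

In LangChain, the corresponding component is `RecursiveCharacterTextSplitter`, which takes `chunk_size`, `chunk_overlap`, and `separators` arguments and also handles the overlap that this sketch leaves out.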
[0033] 2.2.2) The entities extracted in step 1 are mapped to the knowledge graph of the water conservancy engineering field using a full-text index. The retriever iterates over the detected entities and uses a Cypher template to retrieve the neighborhood of each matching node.
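One possible shape for the graph retrieval in step 2.2.2 is sketched below, without connecting to a database. Several details are assumptions not stated in the source: the full-text index name "entity", the `id` node property, the `MENTIONS` relationship that links chunks to entities, the fuzzy-match setting `~2`, and the result limits. The Cypher text relies on Neo4j's `db.index.fulltext.queryNodes` procedure, and the helper names are hypothetical.

```python
import re

# Characters with special meaning in Lucene query syntax are replaced by spaces.
_LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')

# Template: find nodes matching the entity via the full-text index, then return
# their direct neighborhood, skipping chunk-to-entity MENTIONS links (assumed name).
NEIGHBORHOOD_CYPHER = """
CALL db.index.fulltext.queryNodes($index, $query, {limit: 2})
YIELD node, score
MATCH (node)-[r]-(neighbor)
WHERE type(r) <> 'MENTIONS'
RETURN node.id AS entity, type(r) AS relation, neighbor.id AS neighbor
LIMIT 50
"""


def build_fulltext_query(entity: str) -> str:
    """Build a Lucene query that fuzzy-matches every word of the entity name."""
    cleaned = _LUCENE_SPECIAL.sub(" ", entity)
    words = cleaned.split()
    return " AND ".join(f"{word}~2" for word in words)


def graph_query_params(entity: str, index: str = "entity") -> dict:
    """Parameters to send alongside NEIGHBORHOOD_CYPHER for one entity."""
    return {"index": index, "query": build_fulltext_query(entity)}
```

A retriever would run `NEIGHBORHOOD_CYPHER` once per extracted entity with these parameters and format each returned row as a short fact string for the prompt.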
[0034] 2.3) The results obtained from vector retrieval, keyword retrieval, and graph retrieval are combined and passed to the large language model.
[0035] 2.4) Through prompt engineering, the response is generated from the context provided by the hybrid retrieval system: the large language model integrates the received information and provides the answer to the question.
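Steps 2.3 and 2.4 amount to merging the retrieved material and filling a preset prompt template. A minimal sketch follows. The template wording, the section headings inside it, and the name `build_answer_prompt` are illustrative assumptions, and the final call to the large language model is left out.

```python
ANSWER_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Structured data (knowledge graph):\n{structured}\n\n"
    "Unstructured data (document chunks):\n{unstructured}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_answer_prompt(question, graph_facts, document_chunks):
    """Merge graph facts and document chunks into the preset prompt template."""
    structured = "\n".join(graph_facts) if graph_facts else "(none)"
    unstructured = "\n---\n".join(document_chunks) if document_chunks else "(none)"
    return ANSWER_TEMPLATE.format(
        structured=structured,
        unstructured=unstructured,
        question=question,
    )
```

Keeping graph facts and document chunks in separate sections lets the model tell relational facts from free text.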
[0036] In a practical experiment, run on Ubuntu 22.04 LTS with an NVIDIA Tesla A40 48G graphics card and CUDA version 12.2, 10% of the content was extracted from an enterprise's real-world knowledge documents in the field of water conservancy engineering. Questions and answers were manually constructed from this content and used as the test set. Evaluation was conducted using the LangSmith platform. LangSmith is a developer platform for debugging, testing, evaluating, and monitoring Large Language Model (LLM) applications, and it integrates with LangChain. This invention evaluates the quality of answer generation by comparing the answers generated in this embodiment with those generated by methods based on knowledge graphs, large language models, and RAG, using the same test set questions as input. Accuracy, EM index, and BLEU score are used to test the honesty, relevance, negative exclusion, and information integration capabilities of the embodiment's answers.
[0037] The experiment was repeated three times for each method, and the specific results are shown below:
| Method | Accuracy | EM index | F1 value |
| --- | --- | --- | --- |
| Based on knowledge graph | 0.79410 | 0.71972 | 0.72352 |
| Based on large language model | 0.83193 | 0.75634 | 0.80486 |
| RAG | 0.89502 | 0.84391 | 0.86729 |
| This invention | 0.96438 | 0.95941 | 0.94302 |
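The source does not define how the EM index and F1 value were computed. The sketch below shows the definitions commonly used for question-answering evaluation (exact match after normalization, and token-level F1), offered as an assumption. Whitespace tokenization is used for simplicity, so Chinese text would need a different tokenizer, for example character-level. All function names are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """1.0 if the normalized answers are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between the two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return 1.0 if pred == ref else 0.0
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Per-question scores would then be averaged over the test set to obtain one value per method.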
[0038] The results show that, compared with the prior art, the present invention significantly improves the quality of answer generation. Through hybrid retrieval and few-shot prompt engineering, the model can better answer the questions raised by users and provide relatively accurate answers.
[0039] The above-described specific implementations can be partially adjusted by those skilled in the art in different ways without departing from the principles and purpose of the present invention. The scope of protection of the present invention is defined by the claims and is not limited to the above-described specific implementations. All implementations falling within the scope of the claims are covered by the present invention.
Claims
1. A method for implementing a knowledge-based question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph, characterized in that: first, using the LangChain Expression Language and prompt engineering, the entities and relationships in the user's input question are extracted through a large language model and output in JSON format; after a hybrid search engine that includes vector search, keyword search, and graph search is configured, vector search, keyword search, and graph search are performed respectively on the entities extracted from the user's input question, and the search results are merged; the merged search results are then filled into a preset prompt template, and the constructed prompt is passed to the large language model to generate an answer, which is returned to the user.
2. The method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 1, characterized in that it specifically includes: Step 1) Use the LangChain Expression Language and prompt engineering to extract the entities and relationships in the user's input question through a large language model and output them in JSON format; Step 2) Configure a hybrid search engine that includes both unstructured and structured search engines, specifically including: 2.1) Construct an unstructured search engine, i.e., configure vector search and keyword search; 2.2) Construct a structured search engine, i.e., construct a knowledge graph search engine; 2.3) The results obtained from vector retrieval, keyword retrieval, and graph retrieval are combined and passed to the large language model; 2.4) Through prompt engineering, the response is generated from the context provided by the hybrid retrieval system, with the large language model integrating the received information and providing the answer to the question.
3. The method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 2, characterized in that Step 2.1 specifically includes: 2.1.1) The BGE embedding model BAAI/bge-large-zh-v1.5 is used to vectorize the chunked documents and the entities extracted from the input question. This model supports the use of prompts to improve the accuracy of the generated representations. The from_existing_graph method of Neo4jVector is used to add a vector retriever over the documents. This method configures the vector search index for hybrid search. Because the chunked documents are stored in nodes labeled Document during the construction of the knowledge graph, the target is the set of nodes labeled Document. 2.1.2) Use the Neo4jVector.from_existing_graph method to add a keyword retriever over the documents. This method configures the keyword search index for hybrid search. Because the segmented documents are stored in nodes labeled Document during the construction of the knowledge graph, the target is the set of nodes labeled Document. 2.1.3) Combine the vector retriever and the keyword retriever into an unstructured retriever that retrieves the ten document chunks most relevant to the entity vector representations and keywords extracted from the input question.
4. The method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 3, characterized in that, when the ten document chunks obtained by the unstructured retriever are not the most relevant ones, the list of retrieved chunks or nodes is re-ranked. This re-ranking places the chunks that are more relevant and accurate with respect to the user's input question higher, so that they are prioritized during generation by the large language model to improve output quality. Specifically, the re-ranking is performed using bce-reranker-base_v1, a RerankerModel from NetEase Youdao.
5. The method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 2, characterized in that Step 2.2 specifically includes: 2.2.1) Constructing a knowledge graph: The LangChain loader is used to obtain knowledge documents in the field of water conservancy engineering, and the text is segmented by rule-based recursive chunking. The recursive chunking first attempts to segment the text according to paragraphs or headings. If the resulting text blocks are still too large, the segmentation process is repeated on these blocks until the size of all blocks meets the requirement. The text is recursively segmented into text blocks with chunk_size=200 and chunk_overlap=100 according to the specified delimiters "\n\n","\n","","". Then, the LLMGraphTransformer module is used to construct a knowledge graph of the field of water conservancy engineering through a large language model. 2.2.2) The entities extracted in step 1 are mapped to the knowledge graph of the water conservancy engineering field using a full-text index. The retriever iterates over the detected entities and uses a Cypher template to retrieve the neighborhood of each matching node.
6. The method for implementing a knowledge question-answering system in the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 5, characterized in that Step 2.2.1 specifically involves: using the specified Tongyi Qianwen large language model, constructing a graph converter through the convert_to_graph_documents method in the LLMGraphTransformer module provided by LangChain, and converting the segmented documents into graph documents. This adopts the tool-based mode in LLMGraphTransformer, which is suitable for large language models that support structured output or function calls. Tool calls are implemented through the built-in with_structured_output operation of the large language model.
7. A knowledge-based question-answering system for the field of water conservancy engineering, constructed based on a large language model and knowledge graph, according to any one of claims 1-6, characterized in that the system comprises a login module, a knowledge storage module, a knowledge graph visualization module, a knowledge graph maintenance module, an automatic question answering module, and a user management module. Specifically: the login module manages user accounts based on their identity information and provides login and logout operations; the knowledge graph storage module uses the Neo4j graph database to store the constructed knowledge graph in the field of water conservancy engineering; the knowledge graph visualization module displays the constructed knowledge graph stored in the Neo4j graph database on the front end; the knowledge graph maintenance module generates new graph documents from newly input unstructured text using a graph converter, stores them using the knowledge graph storage module, and provides interfaces for adding, deleting, modifying, and querying nodes; and the user management module uses a MySQL database to store user accounts, passwords, and permission information.
8. The knowledge question-answering system for the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 7, characterized in that the knowledge graph visualization module displays the attributes of entity nodes in the knowledge graph and the relationships between entities, and the knowledge graph visualization interface is built using the Vue + ECharts framework.
9. The knowledge question-answering system for the field of water conservancy engineering based on a large language model and knowledge graph as described in claim 7, characterized in that the knowledge graph maintenance module includes: a unit for adding, deleting, modifying, and querying knowledge graph nodes, and a unit for automatically constructing knowledge graphs based on user-input documents. Specifically, the unit for adding, deleting, modifying, and querying knowledge graph nodes uses the Django framework to perform operations on the graph database, such as adding entity nodes, adding relationships between entities, adding entity attributes, modifying entity names and types, modifying entity attribute values, modifying relationship names between entities, deleting entities, deleting entity attributes, and deleting relationships between entities. The unit for automatically constructing knowledge graphs generates new graph documents from newly input unstructured text using a graph converter and stores them using the knowledge graph storage module.