A method and system for constructing a large language model-based question and answer system
By constructing a dynamically updated question-and-answer knowledge base and knowledge graph, and combining multimodal data processing with semantic enhancement, the invention addresses the poor real-time performance and accuracy of existing question-answering systems when processing multimodal data, enabling efficient answer generation for complex queries and timely knowledge base updates.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUBEI ZHONGKE NETWORK ENG
- Filing Date
- 2025-07-22
- Publication Date
- 2026-06-26
Abstract
Description
Technical Field
[0001] This invention relates to the field of question-answering system technology, and in particular to a method and system for constructing a question-answering system based on a large language model.
Background Technology
[0002] As a core technology of human-computer interaction, question-answering systems have been widely used in recent years in fields such as intelligent customer service, medical consultation, and financial analysis. A question-answering system is an advanced form of information retrieval system that answers questions posed in natural language with accurate, concise natural-language responses. Such systems use natural language understanding to analyze the user's question, draw on knowledge bases or databases to produce an answer, and return it to the user through natural language generation.
[0003] A knowledge base question answering construction method and apparatus based on a large model, disclosed in CN118035405A, includes the following steps: constructing a knowledge database; constructing a knowledge vector library using the knowledge database; obtaining user questions and matching the semantic vectors corresponding to the user questions with the content of the knowledge vector library; merging the text paragraphs of the matching knowledge vector library with the user questions to generate information to be reasoned; and generating corresponding answers to the information to be reasoned using a pre-configured large language model.
[0004] Existing question-answering systems suffer from insufficient real-time performance and weak cross-modal semantic understanding when processing multimodal data, making it difficult to accurately interpret the deep intent of complex queries. In addition, lagging knowledge updates and simplistic retrieval strategies limit the effective integration of timely knowledge and multi-source information, so these systems fail to meet users' demand for accurate answers in highly dynamic scenarios.
Summary of the Invention
[0005] In view of this, the present invention proposes a method and system for constructing a question-answering system based on a large language model, which can effectively adapt to the real-time processing needs of multiple modal data such as text, images, and speech, realize accurate semantic understanding and answer generation for complex queries, and improve the accuracy of answers.
[0006] The technical solution of this invention is implemented as follows: In a first aspect, this invention provides a method for constructing a question-answering system based on a large language model, comprising the following steps:
[0007] S1: Acquire multimodal data, construct a question-and-answer knowledge base and a knowledge graph, and dynamically update the question-and-answer knowledge base;
[0008] S2: Obtain the query text, and vectorize the query text and the multimodal data respectively to generate the corresponding query semantic vector and multimodal vector;
[0009] S3: Use a recognition model to extract entities from the query text, extract the triples associated with those entities from the knowledge graph, then concatenate the query text with the triples and vectorize the result to generate a query semantic enhancement vector;
[0010] S4: Search the question-and-answer knowledge base by combining a keyword matching algorithm with a similarity retrieval algorithm, calculate the similarity and the matching degree between the query semantic enhancement vector and the multimodal vector respectively, and apply dynamic weighting to the similarity and matching degree to obtain the text fragment with the highest score;
[0011] S5: Concatenate the query text with the highest-scoring text fragment, encode the concatenated text using a pre-trained language model, and output the semantic vector of each word, thereby obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document;
[0012] S6: Based on the query text semantic vector matrix and the document semantic vector matrix, use an attention mechanism to obtain the attention weight of each word in the query text and the text fragment, and fuse the two matrices to obtain the fused semantic vector;
[0013] S7: Select the words in the text fragment whose word frequency and attention weight exceed their preset thresholds, generate natural language with the GPT-2 model based on the fused semantic vector, add the selected words to the generated natural language, use a beam search algorithm to produce fluent sentences, and output the natural language answer text.
[0014] Based on the above technical solutions, preferably, step S1, which involves acquiring multimodal data, constructing a question-and-answer knowledge base and a knowledge graph containing entity-related triples, and dynamically updating the question-and-answer knowledge base, includes the following sub-steps:
[0015] S11: Obtain multimodal data from documents, databases, API interfaces and web pages, clean the multimodal data to obtain standard sample data, build a question-answering knowledge base based on the standard sample data, and build a knowledge graph containing entity association triples based on entity-relationship-attribute;
[0016] S12: Dynamically update the question-and-answer knowledge base. Determine whether the standard sample data carries version update time information. If it does, obtain the updated data and the data to be updated according to the update time, locate the interval of newly added content between the two by binary search, and update the version difference segment to obtain the updated standard sample data.
[0017] If it does not, divide the updated data and the data to be updated into blocks and calculate a hash value for each block. Select the blocks that differ between the two, then calculate and compare the semantic similarity of the differing blocks to obtain a similarity difference value. When the similarity difference value is greater than a preset similarity update threshold, update the differing blocks of the data to be updated.
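The timestamp branch of S12 can be illustrated with a minimal sketch. It assumes the records are held as a list of (update time, payload) pairs already sorted by update time; the function name `new_records_since` and the record layout are hypothetical and are not taken from the invention.

```python
from bisect import bisect_right


def new_records_since(records, last_sync_time):
    """Return the records updated strictly after last_sync_time.

    Assumption: `records` is a list of (update_time, payload) tuples
    sorted by update_time in ascending order.
    """
    times = [update_time for update_time, _ in records]
    # Binary search for the first position whose time is > last_sync_time.
    start = bisect_right(times, last_sync_time)
    return records[start:]


if __name__ == "__main__":
    history = [(1, "v1"), (3, "v2"), (5, "v3"), (8, "v4")]
    print(new_records_since(history, 3))
```

Only the slice returned by the search needs to be merged into the knowledge base, which is what keeps the update incremental.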
[0018] Based on the above technical solutions, preferably, step S2, which involves obtaining the query text and vectorizing the query text and multimodal data to generate corresponding query semantic vectors and multimodal vectors, includes the following steps:
[0019] S21: Obtain the query text and the multimodal data, comprising text, images, and tables;
[0020] S22: Use a pre-trained BERT language model to vectorize the text data and the query text, generating text semantic vectors from the text data and a query semantic vector from the query text;
[0021] S23: Use a pre-trained ResNet visual model to vectorize the image data, extracting image features to generate image feature vectors;
[0022] S24: Identify table boundaries and cells, and determine header rows and data rows; establish a row and column index structure, then embed the header and cell contents separately and combine them to obtain the table semantic vector.
[0023] Based on the above technical solutions, preferably, step S3, which involves using a recognition model to extract entities from the query text, extracting the triples associated with those entities from the knowledge graph, and concatenating the query text with the triples and vectorizing the result to generate a query semantic enhancement vector, includes the following sub-steps:
[0024] S31: Use the BLINK model to identify the entities in the query text, link the entities to the knowledge graph, and extract the triples associated with the entities from the knowledge graph;
[0025] S32: Concatenate the extracted triples, and input the concatenated text into the pre-trained BERT language model to generate a triple vector;
[0026] S33: Fuse the triple vector with the query semantic vector through a multilayer perceptron (MLP) and a first gating coefficient, and output the query semantic enhancement vector;
[0027] S34: Calculate the cosine similarity between the query semantic vector and the triple vector to obtain a relevance score, and dynamically adjust the first gating coefficient based on the relevance score.
[0028] Based on the above technical solutions, preferably, step S4 involves combining keyword matching algorithms and similarity retrieval algorithms to search the question-answering knowledge base, calculating the similarity and matching degree between the query semantic enhancement vector and the multimodal vector respectively, and dynamically weighting the similarity and matching degree to obtain the text segment with the highest score. This includes the following sub-steps:
[0029] S41: Use the Stanford CoreNLP tool to perform dependency parsing on the query text, generate a syntax tree, and calculate its maximum syntactic depth;
[0030] S42: Calculate the ratio of the number of entity words in the query text to the total number of entities in the knowledge graph to obtain the entity density;
[0031] S43: Initialize the weight ratio of the BM25 keyword matching algorithm and the similarity retrieval algorithm, then calculate the dynamic weights of the two algorithms from the maximum syntactic depth, entity density, and query complexity of the current query text, where the query complexity is calculated from the syntax tree depth and the entity density;
[0032] S44: Search the question-and-answer knowledge base with the BM25 keyword matching algorithm and calculate the matching score between the query text and each text fragment; search the question-and-answer knowledge base with the similarity retrieval algorithm and calculate the cosine similarity between the query semantic enhancement vector and the multimodal vector; then, using the dynamic weights of the two algorithms, calculate the weighted sum of the matching score and the cosine similarity, sort by the final score, and take the highest-scoring text fragment as the retrieval result.
[0033] Based on the above technical solutions, preferably, step S5 involves concatenating the query text and the text segment with the highest retrieved score, encoding the concatenated text using a pre-trained language model, and outputting the semantic vector of each word to obtain the semantic vector matrix of the query text and the semantic vector matrix of the document, respectively. This includes the following sub-steps:
[0034] S51, the user query text and the highest-scoring text fragment are concatenated in the following format, and the pre-trained RoBERTa-large model is used to encode the concatenated text, outputting the semantic vector of each word;
[0035] S52: Determine whether the number of words in the concatenated text exceeds the maximum input length of the RoBERTa-large model. If it does, keep the leading and trailing words, each up to half of the maximum input length, and truncate the middle part;
[0036] S53, construct a semantic vector matrix for the query text based on the semantic vectors of the words corresponding to the query text, and construct a semantic vector matrix for the document based on the semantic vectors of the words corresponding to the text fragments;
[0037] S54. If there are multiple retrieved text fragments, perform mean pooling on the semantic vector matrix of each text fragment to obtain a comprehensive document semantic vector.
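The truncation rule of S52 and the mean pooling of S54 can be sketched as follows. This is a minimal illustration on plain Python lists; the default `max_len=512` is an assumption about the encoder's input limit, and the function names are hypothetical.

```python
def truncate_middle(tokens, max_len=512):
    """Keep the head and tail of an over-long sequence and drop the middle.

    Assumption: max_len stands for the encoder's maximum input length.
    """
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        return list(tokens)
    head = max_len // 2
    tail = max_len - head  # tail takes the extra slot when max_len is odd
    return list(tokens[:head]) + list(tokens[-tail:])


def mean_pool(vectors):
    """Average several equal-length vectors element by element."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


if __name__ == "__main__":
    print(truncate_middle(list(range(10)), max_len=4))
    print(mean_pool([[1.0, 3.0], [3.0, 5.0]]))
```

Keeping both ends preserves the query at the start of the concatenated text and the closing part of the retrieved fragment, at the cost of losing the middle of long fragments.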
[0038] Based on the above technical solutions, preferably, step S6, which involves obtaining the attention weight of each word in the query text and text fragments through an attention mechanism based on the query text semantic vector matrix and the document semantic vector matrix, and fusing the query text semantic vector matrix and the document semantic vector matrix to obtain a fused semantic vector, includes the following sub-steps:
[0039] S61, Based on the semantic vector matrix of the query text and the semantic vector matrix of the document, calculate the attention weight of each word in the query text and text fragment through an attention mechanism;
[0040] S62, the semantic vector matrix of the query text and the semantic vector matrix of the document are fused, and the contribution ratio of the query text and the text fragment is dynamically adjusted by the second gating coefficient to calculate and generate the fused semantic vector.
[0041] Based on the above technical solutions, preferably, step S7, which involves selecting the words in the text fragment whose word frequency and attention weight exceed their preset thresholds, generating natural language with the GPT-2 model based on the fused semantic vector, adding the selected words to the generated natural language, producing fluent sentences with a beam search algorithm, and outputting the natural language answer text, includes the following sub-steps:
[0042] Using the preset word frequency threshold and attention weight threshold, select the words in the text fragment that exceed these thresholds, and add the selected words to the generation vocabulary of GPT-2;
[0043] Based on the current context and the fused semantic vector, calculate the generation probability of each word using the GPT-2 model;
[0044] Calculate the copy probability of each word based on its attention weight and word frequency;
[0045] Obtain the candidate probability of each word as a weighted average of its generation probability and copy probability;
[0046] Based on the candidate probability of each word, use a beam search algorithm to output the natural language answer.
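The weighted average of generation and copy probabilities can be illustrated with a small sketch. The text does not give the copy probability formula, so the version below assumes it is proportional to attention weight times word frequency; that rule, the mixing weight `mix`, and both function names are hypothetical.

```python
def copy_distribution(attention, frequency):
    """Hypothetical copy probability: attention weight x word frequency, normalized."""
    scores = {word: attention[word] * frequency.get(word, 0) for word in attention}
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in scores}
    return {word: score / total for word, score in scores.items()}


def candidate_probs(gen_probs, copy_probs, mix=0.7):
    """Weighted average of generation and copy probabilities per word.

    Assumption: `mix` is the weight given to the generation probability.
    """
    vocab = set(gen_probs) | set(copy_probs)
    return {
        word: mix * gen_probs.get(word, 0.0) + (1.0 - mix) * copy_probs.get(word, 0.0)
        for word in vocab
    }


if __name__ == "__main__":
    copy_p = copy_distribution({"tariff": 0.6, "rate": 0.2}, {"tariff": 1, "rate": 3})
    gen_p = {"the": 0.5, "rate": 0.3, "tariff": 0.2}
    print(candidate_probs(gen_p, copy_p, mix=0.5))
```

When both input distributions sum to one, the mixture also sums to one, so the result can be passed to beam search without renormalization.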
[0047] In a second aspect, the present invention also provides a system for constructing a question-answering system based on a large language model, implemented using the method for constructing a question-answering system based on a large language model described above, including:
[0048] The knowledge base construction module is used to acquire multimodal data, build a question-and-answer knowledge base and a knowledge graph containing entity association triples, and dynamically update the question-and-answer knowledge base;
[0049] The data processing module is used to acquire the query text and perform vectorization processing on the query text and multimodal data respectively to generate corresponding query semantic vectors and multimodal vectors;
[0050] The semantic enhancement module is used to extract entities from the query text using the recognition model, extract the triples associated with the entities from the knowledge graph, and concatenate the query text with the triples and vectorize the result to generate a query semantic enhancement vector.
[0051] The retrieval and matching module is used to search the question-and-answer knowledge base by combining keyword matching algorithm and similarity retrieval algorithm. It calculates the similarity and matching degree between the query semantic enhancement vector and the multimodal vector respectively, and performs dynamic weighted calculation on the similarity and matching degree to obtain the text fragment with the highest score.
[0052] The encoding processing module is used to concatenate the query text and the text segment with the highest score, encode the concatenated text using a pre-trained language model, and output the semantic vector of each word, thus obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document.
[0053] The attention fusion module is used to obtain the attention weight of each word in the query text and text fragments through an attention mechanism based on the query text semantic vector matrix and the document semantic vector matrix, and then fuse the query text semantic vector matrix and the document semantic vector matrix to obtain the fused semantic vector.
[0054] The generation module is used to obtain words in the text fragment that are greater than the preset word frequency and attention weight thresholds, and generate natural language based on the fused semantic vector using the GPT-2 model. The extracted words are then added to the natural language, and a beam search algorithm is used to generate fluent sentences, outputting the natural language answer text.
[0055] The method and system for constructing a question-answering system based on a large language model of the present invention have the following advantages over the prior art:
[0056] (1) Through the dynamic knowledge base incremental update mechanism, context-aware hybrid retrieval strategy, cross-modal semantic enhancement technology and user feedback-driven continuous optimization method, it can effectively adapt to the real-time processing needs of multiple modal data such as text, image, and voice, and achieve accurate semantic understanding and answer generation for complex queries;
[0057] (2) A highly timely question-and-answer knowledge base and knowledge graph are constructed through multi-source data fusion and a dynamic update mechanism. An exception circuit breaker, distributed transaction locks, and a multi-mirror-source retry mechanism are introduced to keep updates stable and knowledge current, improving the system's ability to respond to dynamic knowledge and the reliability of its answers;
[0058] (3) The structural complexity of the query is quantified by syntactic analysis, and a query complexity model is constructed by combining entity density and text length. A Sigmoid function with learnable parameters is introduced to dynamically adjust the weights of BM25 and semantic similarity, so that the system adaptively allocates algorithm weights according to query characteristics, avoiding the matching bias caused by traditional static weights and improving the recall and accuracy of retrieval.
Description of the Drawings
[0059] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0060] Figure 1 is a flowchart of the method for constructing a question-answering system based on a large language model according to the present invention;
[0061] Figure 2 is a schematic diagram of the dynamic context window adjustment in the method for constructing a question-answering system based on a large language model according to the present invention;
[0062] Figure 3 is a line graph comparing the multi-turn dialogue performance of the method for constructing a question-answering system based on a large language model according to the present invention.
Detailed Implementation
[0063] The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0064] As shown in Figures 1-3, the present invention provides a method for constructing a question-answering system based on a large language model, comprising the following steps:
[0065] S1: Acquire multimodal data, construct a question-and-answer knowledge base and knowledge graph, and dynamically update the question-and-answer knowledge base.
[0066] In this implementation, step S1 includes the following sub-steps:
[0067] S11: Obtain multimodal data from documents, databases, API interfaces and web pages, clean the multimodal data to obtain standard sample data, build a question-answering knowledge base based on the standard sample data, and build a knowledge graph containing entity association triples based on entity-relationship-attribute;
[0068] It should be noted that knowledge data is collected in real time from sources such as documents, databases, API interfaces, and web pages. Regular expression matching and semantic disambiguation techniques are used to clean redundant data. Data cleaning removes duplicate, expired, or invalid data to ensure the accuracy and timeliness of the knowledge base content. The cleaned data is stored in a structured question-and-answer knowledge base for easy retrieval and updates. NLP techniques are used to extract entities, relationships, and attributes, and triples are constructed to build a knowledge graph.
[0069] S12: Dynamically update the question-and-answer knowledge base. Determine whether the standard sample data carries version update time information. If it does, obtain the updated data and the data to be updated according to the update time, locate the interval of newly added content between the two by binary search, and update the version difference segment to obtain the updated standard sample data.
[0070] If it does not, divide the updated data and the data to be updated into blocks and calculate a hash value for each block. Select the blocks that differ between the two, then calculate and compare the semantic similarity of the differing blocks to obtain a similarity difference value. When the similarity difference value is greater than a preset similarity update threshold, update the differing blocks of the data to be updated.
[0071] It should be noted that if the data source provides a version timestamp, incremental data is obtained directly by time interval, the difference segment is located quickly by binary search, and the version difference segment is then updated to obtain the updated standard sample data. If there is no timestamp, block hash comparison is used: the data is divided into fixed-size blocks of 512 bytes each, and the SHA-256 hash value of each block is calculated to generate a hash fingerprint. The hash fingerprints of the updated data are compared with those of the data to be updated to select the differing blocks, and the cosine similarity of the differing blocks is calculated using a pre-trained Sentence-BERT model to avoid misjudgments, including those caused by hash collisions. This update strategy replaces only the difference segments, reducing the consumption of computing resources. An exception circuit breaker mechanism is implemented in the update process: when a node fails during processing, the update automatically switches to a backup data mirror source and retries, and data consistency is ensured by distributed transaction locks. The cleaned data is classified and stored according to the ontology model, and a three-level version-branch-commit traceability system is established, which supports dynamic adjustment of retrieval priority according to data credibility. If the knowledge base is abnormal after an update, it can be rolled back to the previous stable version.
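The block hash comparison for sources without timestamps can be sketched as follows, using the 512-byte block size and SHA-256 fingerprints stated above. The sketch assumes blocks are compared position by position, which is one possible reading; the Sentence-BERT similarity check on the differing blocks is omitted, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 512  # bytes, as stated in the description


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and fingerprint each with SHA-256."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def differing_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of blocks whose fingerprints differ.

    Assumption: blocks are aligned by position, so an insertion shifts
    every later block and marks all of them as different.
    """
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    diff = [i for i, (a, b) in enumerate(zip(old_h, new_h)) if a != b]
    # Blocks that exist in only one version also count as differences.
    shorter, longer = sorted((len(old_h), len(new_h)))
    diff.extend(range(shorter, longer))
    return diff


if __name__ == "__main__":
    old_data = b"a" * 1024
    new_data = b"a" * 512 + b"b" * 512 + b"c" * 10
    print(differing_blocks(old_data, new_data))
```

In a fuller pipeline, only the blocks returned here would be passed on to the semantic similarity comparison and, if the threshold is exceeded, replaced.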
[0072] S2, obtain the query text, and perform vectorization processing on the query text and multimodal data respectively to generate the corresponding query semantic vector and multimodal vector.
[0073] In this implementation, step S2 includes the following steps:
[0074] S21, retrieve the query text and text-image-table multimodal data;
[0075] S22, a pre-trained BERT language model is used to vectorize the text data and query text, generating text semantic vectors from the text data and query semantic vectors from the query text; the expression is:
[0076] $$V_{text} = \mathrm{BERT}(q)_{[CLS]}$$
[0077] In the formula, $V_{text}$ is the query semantic vector, whose dimension equals the hidden layer dimension of BERT, and $\mathrm{BERT}(q)_{[CLS]}$ is the output vector of the BERT model at the [CLS] token for the input text $q$. When the input text is too long, sliding window segmentation is used, with a window size of 128 and an overlap rate of 30%.
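The sliding window segmentation can be sketched as below. The window size of 128 and the 30% overlap come from the text; rounding the overlap down to 38 tokens (a stride of 90) is an assumption, as is the function name `sliding_windows`.

```python
def sliding_windows(tokens, window=128, overlap=0.3):
    """Split a long token sequence into overlapping windows.

    Assumption: the overlap is rounded down to whole tokens, so the
    defaults give a stride of 128 - 38 = 90 tokens.
    """
    if len(tokens) <= window:
        return [list(tokens)]
    stride = window - int(window * overlap)
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the text
        start += stride
    return windows


if __name__ == "__main__":
    parts = sliding_windows(list(range(300)))
    print([(part[0], part[-1]) for part in parts])
```

Each window would then be encoded separately; how the per-window vectors are combined is not specified in the text and is left out of this sketch.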
[0078] S23, a pre-trained ResNet visual model is used to vectorize the image data, extracting image features to generate image feature vectors; the expression is:
[0079]
[0080] In the formula, $v_{image}$ is the image feature vector. The global features are output by the global average pooling (GAP) layer at the end of ResNet-50, with a dimension of 2048; Faster R-CNN extracts the Top-50 region features, and MaxPool reduces them to a dimension of 2048.
[0081] S24, identify table data boundaries and cells, determine header rows and data rows; establish a row and column index structure, embed and combine the header and cell content respectively, to obtain the table semantic vector; the expression is:
[0082]
[0083] In the formula, $v_{table}$ is the table semantic vector, ⊕ denotes vector concatenation, MLP(·) denotes a multilayer perceptron, $\mathrm{Embed}(H_i)$ is the embedding vector of table header $H_i$, and $\mathrm{Embed}(C_{ij})$ is the embedding vector of cell $C_{ij}$;
[0084] It should be noted that numeric and text field types are distinguished, and the embedding matrix is initialized separately for each type. Numeric cells are also embedded after Z-Score normalization.
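The Z-Score normalization applied to numeric cells before embedding can be sketched as follows. Using the population standard deviation and mapping a constant column to zeros are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize a numeric column to zero mean and unit variance.

    Assumptions: the population standard deviation is used, and a
    constant column (zero deviation) is mapped to all zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


if __name__ == "__main__":
    print(z_score([10.0, 20.0, 30.0]))
```

Normalizing per column keeps cells with very different units on a comparable scale before they reach the numeric embedding matrix.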
[0085] S3: Use a recognition model to extract entities from the query text and extract the triples associated with those entities from the knowledge graph. The query text and the triples are concatenated and vectorized to generate a query semantic enhancement vector.
[0086] In this implementation, step S3 includes the following steps:
[0087] S31, Use the BLINK model to identify and obtain entities in the query text, link the entities to the knowledge graph, and extract the triples associated with the entities from the knowledge graph; the expression is:
[0088]
[0089] In the formula, Expand(q) is the result set obtained by knowledge expansion of the query text $q$, Entities(q) is the set of entities identified in the query text $q$, $h$ is a head entity in the knowledge graph, $r$ is a relation in the knowledge graph, $t$ is a tail entity in the knowledge graph, and $R_c$ is the candidate relation set;
[0090] S32, concatenate the extracted triples, and input the concatenated text into the pre-trained BERT language model to generate triple vectors;
[0091] S33, using an MLP (Multilayer Perceptron) and the first gating coefficient, fuses the triplet vector with the query semantic vector, outputting a query semantic enhancement vector; the expression is:
[0092]
[0093] In the formula, $E_{enh}(q)$ is the query semantic enhancement vector, $\lambda$ is the first gating coefficient, $E(q)$ is the query semantic vector, and MLP(·) is the multilayer perceptron.
[0094] S34, calculate the cosine similarity between the query semantic vector and the triple vector to obtain the relevance score, expressed as:
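[0095] $$\mathrm{Score}(q, hrt) = \cos\big(E(q), E(hrt)\big)$$
[0096] In the formula, $\mathrm{Score}(q, hrt)$ is the relevance score, $E(q)$ is the query semantic vector, and $E(hrt)$ is the triple vector;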
[0097] The first gate coefficient is dynamically adjusted based on the correlation score results, expressed as follows:
[0098] .
[0099] It should be noted that by using the BLINK model to achieve accurate entity recognition and knowledge graph association, the implicit semantics of the query are effectively supplemented; by using BERT vectorization to transform structured triples into semantic vectors, contextual features are preserved; furthermore, the dynamic fusion mechanism of MLP and the first gating coefficient can adaptively adjust the weights of the original query and extended knowledge according to the query complexity, avoiding information redundancy or loss; finally, the feedback mechanism based on cosine similarity optimizes the gating parameters in real time, enabling the system to generate enhanced semantic representations that combine the original intent and domain knowledge, thereby improving the recall and accuracy of retrieval.
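The gated fusion and its cosine-based feedback can be sketched on plain lists. The fusion and gate-update formulas are not reproduced in the text, so this sketch assumes a convex combination of the query vector and the (already MLP-projected) triple vector, and a hypothetical update rule that lowers the gate as relevance rises. The names `fuse` and `gate_from_score` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def gate_from_score(score):
    """Hypothetical rule: relevant triples pull the gate from 1.0 down to 0.5."""
    clipped = max(0.0, min(1.0, score))
    return 1.0 - 0.5 * clipped


def fuse(query_vec, triple_vec, gate):
    """Assumed convex combination: gate * query + (1 - gate) * triple."""
    return [gate * q + (1.0 - gate) * t for q, t in zip(query_vec, triple_vec)]


if __name__ == "__main__":
    query = [1.0, 0.0, 1.0]
    triple = [1.0, 1.0, 0.0]  # stands in for the MLP output of the triple vector
    gate = gate_from_score(cosine(query, triple))
    print(gate, fuse(query, triple, gate))
```

With this rule, unrelated triples (score at or below zero) leave the query vector unchanged, which matches the stated goal of avoiding redundant information.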
[0100] S4 combines keyword matching and similarity retrieval algorithms to search the question-answering knowledge base, calculates the similarity and matching degree between the query semantic enhancement vector and the multimodal vector respectively, and performs dynamic weighted calculation on the similarity and matching degree to obtain the text fragment with the highest score.
[0101] In this implementation, step S4 includes the following sub-steps:
[0102] S41 uses the Stanford CoreNLP tool to perform dependency parsing on the query text, generates a syntax tree, and calculates its maximum syntax depth.
[0103] S42, obtain the ratio between the number of entity words in the query text and the total number of entities in the knowledge graph to obtain the entity density;
[0104] S43, Initialize the weight ratio of the BM25 keyword matching algorithm and the similarity retrieval algorithm. Based on the maximum syntactic depth, entity density, and query complexity of the current query text, calculate the dynamic weights of the BM25 keyword matching algorithm and the similarity retrieval algorithm, expressed as:
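[0105] $$\omega = \sigma\big(\alpha \cdot \mathrm{Complexity}(q) + \beta\big)$$
[0106] In the formula, $\omega$ is the weight coefficient, $\sigma$ is the Sigmoid function, $\alpha$ and $\beta$ are learnable parameters, and Complexity(q) is the query complexity;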
[0107] The query complexity is calculated using the syntax tree depth and entity density, expressed as follows:
[0108]
[0109] In the formula, Depth(q) is the syntax tree depth, EntityCount(q) is the entity density, and Length(q) is the length of the query text $q$;
[0110] S44. Based on the BM25 keyword matching algorithm, retrieve the question-and-answer knowledge base, calculate the matching score between the query text and the text fragment, retrieve the question-and-answer knowledge base according to the similarity retrieval algorithm, and calculate the cosine similarity between the semantic enhancement vector and the multimodal vector; according to the dynamic weights of the BM25 keyword matching algorithm and the similarity retrieval algorithm, calculate the sum of the matching score and the cosine similarity score, sort according to the final sum score, and obtain the text fragment with the highest score as the retrieval result. The expression is:
[0111] .
[0112] In the formula, Sim(q,d) is the sum of the matching score and the cosine similarity score, BM25(q,d) is the matching score calculated based on the BM25 algorithm, and cos(E(q),E(d)) is the cosine similarity score calculated by the similarity retrieval algorithm.
[0113] It should be noted that the query structure complexity is quantified based on Stanford CoreNLP syntactic analysis. A query complexity model is constructed by combining entity density and text length. A learnable Sigmoid function is introduced to dynamically adjust the weights of BM25 and semantic similarity, enabling the system to adaptively allocate algorithm weights according to query characteristics, avoiding matching bias caused by traditional static weights, and improving the recall and accuracy of retrieval.
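The adaptive weighting can be illustrated with a short sketch. Several points are assumptions rather than statements of the invention: the dependency tree is given as 1-based head indices with 0 for the root, the root counts as depth 1, the weight ω is applied to the BM25 score and 1 − ω to the cosine similarity, and BM25 scores are taken to be normalized to [0, 1]. The parameter values in the usage example are hypothetical.

```python
import math


def max_depth(heads):
    """Maximum depth of a dependency tree given 1-based head indices (0 = root)."""
    if not heads:
        return 0

    def depth(index):
        steps = 1
        while heads[index] != 0:
            index = heads[index] - 1
            steps += 1
        return steps

    return max(depth(i) for i in range(len(heads)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_weight(complexity, alpha, beta):
    """Sigmoid of a linear function of query complexity, with learnable alpha and beta."""
    return sigmoid(alpha * complexity + beta)


def hybrid_score(bm25, cos_sim, omega):
    """Assumed split: omega on the keyword score, 1 - omega on the semantic score."""
    return omega * bm25 + (1.0 - omega) * cos_sim


def best_fragment(candidates, omega):
    """Pick the fragment id with the highest hybrid score.

    `candidates` maps fragment id -> (normalized BM25 score, cosine similarity).
    """
    return max(candidates, key=lambda key: hybrid_score(*candidates[key], omega))


if __name__ == "__main__":
    omega = dynamic_weight(complexity=2.0, alpha=-1.0, beta=0.0)
    fragments = {"doc_a": (0.9, 0.2), "doc_b": (0.4, 0.8)}
    print(max_depth([2, 3, 0]), round(omega, 3), best_fragment(fragments, omega))
```

With a negative α, as in the usage example, complex queries shift weight toward semantic similarity, while short keyword-like queries lean on BM25.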
[0114] In addition, step S43 includes randomly selecting 1000 queries from historical queries, manually labeling them to determine the optimal weight combination for each query, randomly generating 50 sets of weight parameters, calculating the mean absolute error of each set of parameters on the validation set to evaluate model performance, and dynamically adjusting the learnable parameters α and β until the error change is less than 0.0001 or 200 iterations are reached. The expression is:
[0115]
[0116] In the formula, N is the total number of samples in the validation set, and the two weight terms are the predicted weight of the i-th sample and the true weight of the i-th sample, respectively; the true weights are determined by a grid search on the validation set.
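As a non-limiting illustration, the parameter evaluation described above can be sketched as follows. The mean absolute error is the standard quantity MAE = (1/N) Σ |predicted weight of sample i − true weight of sample i|. The prediction model σ(α·Complexity + β), the sampling range for α and β, and all names are assumptions, and the iterative stopping rule (error change below 0.0001 or 200 iterations) is not modeled here.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (query complexity, labelled optimal weight)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mae(alpha: float, beta: float, samples: List[Sample]) -> float:
    """Mean absolute error between predicted and labelled weights.

    ASSUMPTION: the predicted weight is sigmoid(alpha * complexity + beta).
    """
    return sum(abs(sigmoid(alpha * c + beta) - w) for c, w in samples) / len(samples)


def random_search(samples: List[Sample], n_candidates: int = 50,
                  seed: int = 0) -> Tuple[float, float, float]:
    """Draw n_candidates random (alpha, beta) pairs and keep the lowest-MAE one.

    ASSUMPTION: parameters are sampled uniformly from [-5, 5]; the source
    gives the number of parameter sets (50) but not the sampling range.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        alpha, beta = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        err = mae(alpha, beta, samples)
        if best is None or err < best[0]:
            best = (err, alpha, beta)
    return best


# Hypothetical validation set generated from alpha = 2, beta = -1.
samples = [(c, sigmoid(2.0 * c - 1.0)) for c in (0.0, 0.5, 1.0, 1.5)]
best_err, best_alpha, best_beta = random_search(samples)
```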
[0117] Furthermore, during retrieval, the original vector v ∈ R^D is divided uniformly into multiple sub-vectors. For each sub-vector space, k-means clustering is performed on the sample data to generate multiple cluster center vectors, where c_{i,k} is the k-th cluster center vector in the codebook of the i-th subspace. During compressed storage, the index of the nearest cluster center vector in the corresponding codebook is recorded for each sub-vector. During retrieval, the approximate distance is calculated by table lookup, expressed as:
[0118]
[0119] In the formula, q_i is the i-th sub-vector of the query semantic vector, and c_{i,idx(v_i)} is the cluster center vector corresponding to v_i;
[0120] In this embodiment, product quantization technology is used to decompose a high-dimensional vector into multiple low-dimensional subspaces, cluster each subspace independently to generate codebooks, and record the index of the nearest codeword, thereby achieving efficient vector compression and fast retrieval.
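As a non-limiting illustration, the product quantization flow described above (uniform splitting, per-subspace k-means codebooks, index encoding, and table-lookup distance) can be sketched in pure Python as follows. The distance expression is not reproduced in this text; the sketch assumes the usual asymmetric form, namely the sum over subspaces of the squared Euclidean distance between q_i and c_{i,idx(v_i)}. All function names, the tiny k-means routine, and the sample data are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def split(v: Sequence[float], m: int) -> List[Vector]:
    """Split a D-dimensional vector uniformly into m sub-vectors."""
    d = len(v) // m
    if d * m != len(v):
        raise ValueError("dimension must be divisible by m")
    return [list(v[i * d:(i + 1) * d]) for i in range(m)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Minimal Lloyd's k-means; returns k cluster center vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def train_codebooks(data: List[Vector], m: int, k: int) -> List[List[Vector]]:
    """One codebook of k cluster centers per subspace."""
    subs = [split(v, m) for v in data]
    return [kmeans([s[i] for s in subs], k, seed=i) for i in range(m)]


def encode(v: Sequence[float], codebooks: List[List[Vector]]) -> List[int]:
    """Record, for each sub-vector, the index of its nearest cluster center."""
    return [min(range(len(cb)), key=lambda j: sq_dist(sub, cb[j]))
            for sub, cb in zip(split(v, len(codebooks)), codebooks)]


def approx_distance(query: Sequence[float], code: List[int],
                    codebooks: List[List[Vector]]) -> float:
    """ASSUMPTION: distance = sum_i ||q_i - c_{i, idx(v_i)}||^2 via a lookup table."""
    q_subs = split(query, len(codebooks))
    table = [[sq_dist(q, c) for c in cb] for q, cb in zip(q_subs, codebooks)]
    return sum(table[i][idx] for i, idx in enumerate(code))


# Hypothetical 4-dimensional sample data, 2 subspaces, 2 centers each.
data = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.1],
        [1.0, 1.0, 1.0, 1.0], [0.9, 1.0, 1.0, 0.9]]
codebooks = train_codebooks(data, m=2, k=2)
codes = [encode(v, codebooks) for v in data]
```

Only the integer codes and the codebooks need to be stored; the lookup table is built once per query and reused for every stored code.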
[0121] S5 concatenates the query text and the highest-scoring text segment, encodes the concatenated text using a pre-trained language model, and outputs the semantic vector of each word, thus obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document.
[0122] Step S5 in this embodiment includes the following sub-steps:
[0123] S51, the user query text and the highest-scoring text fragment are concatenated in the following format, and the pre-trained RoBERTa-large model is used to encode the concatenated text, outputting the semantic vector of each word;
[0124] S52, determine whether the number of words in the concatenated text exceeds the maximum input length of the RoBERTa-large model. If it does, retain the leading and trailing portions of the text, each half of the maximum input length, and truncate the middle part.
[0125] S53, construct a semantic vector matrix for the query text based on the semantic vectors of the words corresponding to the query text, and construct a semantic vector matrix for the document based on the semantic vectors of the words corresponding to the text fragments;
[0126] S54. If there are multiple retrieved text fragments, perform mean pooling on the semantic vector matrix of each text fragment to obtain a comprehensive document semantic vector.
[0127] It should be noted that the RoBERTa-large pre-trained model is used, with a maximum input length of 512 tokens. For over-length text, the truncation strategy retains the first 256 and the last 256 tokens and then performs joint encoding.
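As a non-limiting illustration, the head-and-tail truncation of step S52 and the mean pooling of step S54 can be sketched as follows. The sketch operates on an already tokenized sequence and ignores the special tokens an actual encoder would add, which is a simplifying assumption; the function names are hypothetical.

```python
from typing import List, Sequence

MAX_LEN = 512  # maximum input length stated for RoBERTa-large in the text


def truncate_middle(tokens: Sequence[str], max_len: int = MAX_LEN) -> List[str]:
    """Keep the first and last max_len // 2 tokens and drop the middle part."""
    if len(tokens) <= max_len:
        return list(tokens)
    half = max_len // 2
    return list(tokens[:half]) + list(tokens[-half:])


def mean_pool(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Average the row vectors of a semantic vector matrix into one vector."""
    n = len(matrix)
    if n == 0:
        raise ValueError("matrix must contain at least one row")
    return [sum(col) / n for col in zip(*matrix)]


# Hypothetical over-length input of 1000 tokens.
tokens = [str(i) for i in range(1000)]
kept = truncate_middle(tokens)
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```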
[0128] S6. Based on the query text semantic vector matrix and the document semantic vector matrix, an attention mechanism is used to obtain the attention weight of each word in the query text and text fragments, and the query text semantic vector matrix and the document semantic vector matrix are fused to obtain the fused semantic vector.
[0129] In this embodiment, step S6 includes the following sub-steps:
[0130] S61, based on the semantic vector matrix of the query text and the semantic vector matrix of the document, the attention weight of each word in the query text and text fragment is calculated using an attention mechanism, expressed as follows:
[0131]
[0132] In the formula, Q is the query text encoding, and K and V are the text fragment encodings;
[0133] S62, the query text semantic vector matrix and the document semantic vector matrix are fused, and the contribution ratio of the query text and text fragments is dynamically adjusted through the second gating coefficient to calculate and generate the fused semantic vector, the expression of which is:
[0134]
[0135] In the formula, H_final is the fused semantic vector, g is a learnable second gating coefficient that dynamically controls the contribution ratio of the query text and the text fragments, H_query is the query text semantic vector matrix, and H_doc is the document semantic vector matrix.
[0136] It should be noted that by using attention mechanisms and gating fusion strategies, deep semantic interaction and dynamic weight allocation between query text and document text fragments are achieved, which improves the accuracy of semantic matching, model adaptability and computational efficiency.
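As a non-limiting illustration, steps S61 and S62 can be sketched as follows. Neither expression is reproduced in this text. The sketch assumes scaled dot-product attention, softmax(Q·Kᵀ/√d)·V, and a convex gate of the form H_final = g·H_query + (1 − g)·H_doc. Because the two matrices generally have different numbers of rows, the sketch substitutes the attention-weighted document context for H_doc so that the shapes agree; this substitution, the function names, and the sample matrices are all assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Tuple[Matrix, Matrix]:
    """ASSUMPTION: scaled dot-product attention; returns (weights, context)."""
    d = len(Q[0])
    weights = [softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K])
               for q in Q]
    context = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
               for row in weights]
    return weights, context


def gated_fusion(H_query: Matrix, H_ctx: Matrix, g: float) -> Matrix:
    """ASSUMPTION: H_final = g * H_query + (1 - g) * H_ctx, with g in [0, 1]."""
    return [[g * a + (1.0 - g) * b for a, b in zip(row_q, row_c)]
            for row_q, row_c in zip(H_query, H_ctx)]


# Hypothetical encodings: 2 query tokens and 3 document tokens, dimension 2.
H_query = [[1.0, 0.0], [0.0, 1.0]]
H_doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention(H_query, H_doc, H_doc)
H_final = gated_fusion(H_query, context, g=0.5)
```

Each row of `weights` gives the attention that one query word places on every text fragment word; g = 1 returns the query representation unchanged.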
[0137] S7. Obtain the words in the text fragment whose word frequency and attention weight exceed the preset word frequency threshold and attention weight threshold, and generate natural language with the GPT-2 model based on the fused semantic vector. Add the obtained words to the natural language, use the beam search algorithm to generate fluent sentences, and output the natural language answer text.
[0138] In this embodiment, step S7 includes the following sub-steps:
[0139] Using the preset word frequency threshold and attention weight threshold, select the words in the text fragment that exceed both thresholds, and add the selected words to the generation vocabulary of GPT-2;
[0140] Based on the current context and the fused semantic vector, the generation probability of each word is calculated using the GPT-2 model;
[0141] The copy probability of each word is calculated based on attention weight and word frequency;
[0142] The candidate probability of each word is obtained as a weighted combination of its generation probability and copy probability, expressed as:
[0143]
[0144] In the formula, P_final(ω) is the overall probability with which the model produces the word ω, λ1 is the copy probability weight coefficient, P_copy(ω) is the copy probability, and P_generate(ω) is the generation probability.
[0145] Based on the candidate probabilities of each word and using a beam search algorithm, the natural language answer is output.
[0146] It should be noted that by combining word frequency and attention weights in vocabulary selection, dynamically integrating generation and copying probabilities, and optimizing the beam search algorithm, the accuracy, coherence, and information richness of natural language answers have been improved.
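As a non-limiting illustration, the word selection and probability mixing of step S7 can be sketched as follows. The expression is not reproduced in this text; the sketch assumes P_final(ω) = λ1·P_copy(ω) + (1 − λ1)·P_generate(ω), and further assumes that the copy probability is proportional to attention weight multiplied by word frequency. The generation probabilities are hypothetical inputs rather than GPT-2 outputs, beam search is not modeled, and all names are hypothetical.

```python
from typing import Dict, Set


def select_copy_words(freq: Dict[str, int], attn: Dict[str, float],
                      freq_thr: int, attn_thr: float) -> Set[str]:
    """Words whose frequency and attention weight both exceed the thresholds."""
    return {w for w in freq if freq[w] > freq_thr and attn.get(w, 0.0) > attn_thr}


def copy_probabilities(words: Set[str], freq: Dict[str, int],
                       attn: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: P_copy is proportional to attention weight times frequency."""
    raw = {w: attn[w] * freq[w] for w in words}
    total = sum(raw.values())
    return {w: s / total for w, s in raw.items()}


def mix(p_generate: Dict[str, float], p_copy: Dict[str, float],
        lam: float) -> Dict[str, float]:
    """ASSUMPTION: P_final = lam * P_copy + (1 - lam) * P_generate."""
    vocab = set(p_generate) | set(p_copy)
    return {w: lam * p_copy.get(w, 0.0) + (1.0 - lam) * p_generate.get(w, 0.0)
            for w in vocab}


# Hypothetical statistics for one text fragment and one decoding step.
freq = {"valve": 3, "pump": 2, "the": 5}
attn = {"valve": 0.5, "pump": 0.4, "the": 0.05}
selected = select_copy_words(freq, attn, freq_thr=1, attn_thr=0.1)
p_copy = copy_probabilities(selected, freq, attn)
p_generate = {"the": 0.5, "pump": 0.3, "valve": 0.2}
p_final = mix(p_generate, p_copy, lam=0.4)
top_word = max(p_final, key=p_final.get)
```

When both input distributions sum to 1, the mixed distribution also sums to 1, so it can serve directly as the per-step candidate probability for beam search.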
[0147] This embodiment also includes support for multi-turn context-aware dialogue and adaptive optimization of retrieval and generation strategies based on historical interactions.
[0148] Set the initial window length W_base, which is retained as the basic content;
[0149] Based on dialogue complexity and historical relevance, an expansion coefficient is calculated to dynamically adjust the final window size. The expression is as follows:
[0150]
[0151] Where W_max is the maximum window length allowed by the model, Δ is the expansion step size, and a is the expansion coefficient.
[0152] Specifically, the initial settings retain the history of the last 3 rounds of dialogue, with a maximum length of 512 tokens per round. The memory capacity is implemented using a circular buffer to cover typical short dialogue scenarios.
[0153] Set the expansion trigger conditions: entity density > 40%, calculated as the ratio of the current number of entities to the total number of words; topic jump frequency > 2 times, where a topic jump is counted when the topic similarity between adjacent rounds is below 0.5.
[0154] The system dynamically determines the number of historical dialogue rounds to retain by analyzing the complexity and topic consistency of the dialogue. The expansion rate coefficient controls the sensitivity of window expansion; empirically, a value of 0.6 balances response speed and information integrity. The cumulative entity count is the number of domain entities mentioned in the current dialogue; the more entities, the more complex the topic. The topic similarity variance is computed from the topic similarity between adjacent dialogue rounds; the larger the variance, the stronger the topic jumps.
[0155] Meanwhile, based on the expansion coefficient calculation results, the system adjusts the context window length according to the following strategies: the base window retains the most recent 3 rounds of dialogue by default to cover the context requirements of short dialogues; each expansion adds 2 rounds of historical dialogue to avoid frequent minor adjustments; and to prevent memory overload, a maximum of 8 rounds of dialogue are retained.
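As a non-limiting illustration, the window adjustment strategy above can be sketched as follows, using the stated values (base window of 3 rounds, 2 rounds added per expansion, at most 8 rounds) and a circular buffer. The window expression and the calculation of the expansion coefficient are not reproduced in this text, so the sketch takes the number of expansions as an input and assumes that either trigger condition alone is sufficient to expand. All names are hypothetical.

```python
from collections import deque

W_BASE = 3  # rounds retained by default
STEP = 2    # rounds added per expansion
W_MAX = 8   # maximum rounds retained


def should_expand(entity_density: float, topic_jumps: int) -> bool:
    """ASSUMPTION: either trigger condition alone causes an expansion."""
    return entity_density > 0.40 or topic_jumps > 2


def window_rounds(expansions: int) -> int:
    """Window length in rounds after a given number of expansions, capped at W_MAX."""
    if expansions < 0:
        raise ValueError("expansions must be non-negative")
    return min(W_BASE + expansions * STEP, W_MAX)


# Circular buffer: a bounded deque drops the oldest round automatically.
history = deque(maxlen=window_rounds(0))
for turn in ["r1", "r2", "r3", "r4"]:
    history.append(turn)

# Hypothetical dialogue statistics that trigger one expansion.
if should_expand(entity_density=0.45, topic_jumps=1):
    history = deque(history, maxlen=window_rounds(1))
```

Rebuilding the deque with a larger `maxlen` preserves the rounds already held, so earlier context is not lost when the window grows.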
[0156] This solution outperforms traditional solutions in accuracy, coherence, and response efficiency in multi-turn dialogue scenarios (see the performance comparison in Figure 3). Experiments show that the F1 score of the first-round response reaches 89.7% and the accuracy of multi-turn dialogue is improved by 37.2%, while the response latency is reduced to less than 500 ms through quantization compression and retrieval optimization. In scenarios such as smart government affairs, industrial diagnostics, and multilingual customer service, it demonstrates efficient knowledge iteration capabilities and strong robustness, with an error rate reduced by 62% compared to traditional solutions, and has broad prospects for commercial application.
[0157] Secondly, the present invention also provides a system for constructing a question-answering system based on a large language model, implemented using a method for constructing a question-answering system based on a large language model, including:
[0158] The knowledge base construction module is used to acquire multimodal data, build a question-and-answer knowledge base and a knowledge graph containing entity association triples, and dynamically update the question-and-answer knowledge base;
[0159] The data processing module is used to acquire the query text and perform vectorization processing on the query text and multimodal data respectively to generate corresponding query semantic vectors and multimodal vectors;
[0160] The semantic enhancement module is used to extract entities from the query text using the recognition model, extract triples associated with the entities from the knowledge graph, concatenate the query text with the triples and perform quantization processing to generate a query semantic enhancement vector.
[0161] The retrieval and matching module is used to search the question-and-answer knowledge base by combining keyword matching algorithm and similarity retrieval algorithm. It calculates the similarity and matching degree between the query semantic enhancement vector and the multimodal vector respectively, and performs dynamic weighted calculation on the similarity and matching degree to obtain the text fragment with the highest score.
[0162] The encoding processing module is used to concatenate the query text and the text segment with the highest score, encode the concatenated text using a pre-trained language model, and output the semantic vector of each word, thus obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document.
[0163] The attention fusion module is used to obtain the attention weight of each word in the query text and text fragments through an attention mechanism based on the query text semantic vector matrix and the document semantic vector matrix, and then fuse the query text semantic vector matrix and the document semantic vector matrix to obtain the fused semantic vector.
[0164] The generation module is used to obtain the words in the text fragment whose word frequency and attention weight exceed the preset thresholds, generate natural language with the GPT-2 model based on the fused semantic vector, add the obtained words to the natural language, and use a beam search algorithm to generate fluent sentences and output the natural language answer text.
[0165] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0166] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the system and modules described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0167] In the embodiments provided by this invention, it should be understood that the disclosed systems and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0168] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0169] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0170] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.
[0171] Furthermore, it should be noted that in the system and method of the present invention, it is obvious that the components or steps can be decomposed and / or recombined. These decompositions and / or recombinations should be considered equivalent solutions of the present invention. Moreover, the steps performing the above series of processes can naturally be executed in the order described, but are not necessarily required to be executed in chronological order; some steps can be executed in parallel or independently of each other. Those skilled in the art will understand that all or any step or component of the method and apparatus of the present invention can be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or a combination thereof. This is something that those skilled in the art can achieve by using their basic programming skills after reading the description of the present invention.
[0172] Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing system. The computing system can be a known general-purpose system. Therefore, the object of the present invention can also be achieved simply by providing a program product containing program code for implementing the method or apparatus. That is, such a program product also constitutes the present invention, and the storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium can be any known storage medium or any storage medium developed in the future.
[0173] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for constructing a question-answering system based on a large language model, characterized in that it includes the following steps: S1. Acquire multimodal data, construct a question-and-answer knowledge base and knowledge graph, and dynamically update the question-and-answer knowledge base; S2. Obtain the query text, and perform vectorization processing on the query text and multimodal data respectively to generate the corresponding query semantic vector and multimodal vector; S3. Use a recognition model to extract entities from the query text and extract triples associated with the entities from the knowledge graph; concatenate the query text and triples and perform quantization processing to generate a query semantic enhancement vector; S4. Combine the keyword matching algorithm and the similarity retrieval algorithm to retrieve the question-answering knowledge base, calculate the similarity and matching degree of the query semantic enhancement vector and multimodal vector respectively, and perform dynamic weighted calculation on the similarity and matching degree to obtain the text fragment with the highest score; step S4 includes the following sub-steps: S41. Use the Stanford CoreNLP tool to perform dependency parsing on the query text, generate a syntax tree, and calculate its maximum syntax depth; S42. Obtain the ratio between the number of entity words in the query text and the total number of entities in the knowledge graph to obtain the entity density; S43. Initialize the weight ratio of the BM25 keyword matching algorithm and the similarity retrieval algorithm, and calculate the dynamic weights of the BM25 keyword matching algorithm and the similarity retrieval algorithm based on the maximum syntactic depth, entity density, and query complexity of the current query text, the query complexity being calculated using the syntactic tree depth and entity density; S44. Based on the BM25 keyword matching algorithm, retrieve the question-and-answer knowledge base and calculate the matching score between the query text and the text fragment; retrieve the question-and-answer knowledge base according to the similarity retrieval algorithm and calculate the cosine similarity between the semantic enhancement vector and the multimodal vector; according to the dynamic weights of the BM25 keyword matching algorithm and the similarity retrieval algorithm, calculate the sum score of the matching score and the cosine similarity, sort according to the final sum score, and obtain the text fragment with the highest score as the retrieval result; S5. Concatenate the query text and the highest-scoring text segment, encode the concatenated text using a pre-trained language model, and output the semantic vector of each word, thus obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document; S6. Based on the query text semantic vector matrix and the document semantic vector matrix, use an attention mechanism to obtain the attention weight of each word in the query text and text fragments, and fuse the query text semantic vector matrix and the document semantic vector matrix to obtain the fused semantic vector; S7. Obtain the words in the text segment whose word frequency and attention weight exceed the preset word frequency threshold and attention weight threshold, and generate natural language using the GPT-2 model based on the fused semantic vector; add the obtained words to the natural language, use the beam search algorithm to generate fluent sentences, and output the natural language answer text.
2. The method for constructing a question-answering system based on a large language model as described in claim 1, characterized in that: Step S1, which involves acquiring multimodal data, constructing a question-answering knowledge base and a knowledge graph containing entity-related triples, and dynamically updating the question-answering knowledge base, includes the following sub-steps: S11: Obtain multimodal data from documents, databases, API interfaces and web pages, clean the multimodal data to obtain standard sample data, build a question-answering knowledge base based on the standard sample data, and build a knowledge graph containing entity association triples based on entity-relationship-attribute; S12, dynamically update the question-and-answer knowledge base, determine whether the standard sample data has version update time information, if it does, obtain the corresponding updated data and data to be updated according to the update time, locate the newly added content interval between the updated data and the data to be updated through binary search, update the version difference segment, and obtain the updated standard sample data. If it does not exist, the data blocks are divided and hash values are calculated for the updated data and the data to be updated. The difference blocks between the updated data and the data to be updated are filtered, and the semantic similarity of the difference blocks is calculated and compared to obtain the similarity difference value. A preset similarity update threshold is set. When the similarity difference value is greater than the similarity update threshold, the difference blocks of the data to be updated are updated.
3. The method for constructing a question-answering system based on a large language model as described in claim 2, characterized in that: Step S2, which involves obtaining the query text and vectorizing both the query text and the multimodal data to generate corresponding query semantic vectors and multimodal vectors, includes the following steps: S21, retrieve the query text and text-image-table multimodal data; S22, a pre-trained BERT language model is used to vectorize the text data and query text, generating text semantic vectors from the text data and query semantic vectors from the query text; S23 uses a pre-trained ResNet visual model to vectorize image data and extract image features to generate image feature vectors. S24: Identify table data boundaries and cells, determine header rows and data rows; establish a row and column index structure, embed and combine the header and cell content respectively, and obtain the table semantic vector.
4. The method for constructing a question-answering system based on a large language model as described in claim 3, characterized in that: Step S3, which involves using a recognition model to extract entities from the query text and extracting triples associated with those entities from the knowledge graph, concatenating the query text with the triples, and performing quantization to generate a query semantic enhancement vector, includes the following steps: S31, Use the BLINK model to identify and obtain entities in the query text, link the entities to the knowledge graph, and extract the triples associated with the entities from the knowledge graph; S32, concatenate the extracted triples, and input the concatenated text into the pre-trained BERT language model to generate triple vectors; S33, by combining the triplet vector with the query semantic vector through the MLP multilayer perceptron and the first gating coefficient, the query semantic enhancement vector is output; S34. Calculate the cosine similarity between the query semantic vector and the triple vector to obtain the relevance score, and dynamically adjust the first gating coefficient based on the relevance score.
5. The method for constructing a question-answering system based on a large language model as described in claim 4, characterized in that: Step S5 involves concatenating the query text and the highest-scoring text segment, encoding the concatenated text using a pre-trained language model, and outputting the semantic vector of each word to obtain the semantic vector matrix of the query text and the semantic vector matrix of the document. This includes the following sub-steps: S51, the user query text and the highest-scoring text fragment are concatenated in the following format, and the pre-trained RoBERTa-large model is used to encode the concatenated text, outputting the semantic vector of each word; S52, determine whether the concatenated text words exceed the maximum input length of the RoBERTa-large model. If they do, retain the first and last words according to half of the maximum input length, and truncate the middle part. S53, construct a semantic vector matrix for the query text based on the semantic vectors of the words corresponding to the query text, and construct a semantic vector matrix for the document based on the semantic vectors of the words corresponding to the text fragments; S54. If there are multiple retrieved text fragments, perform mean pooling on the semantic vector matrix of each text fragment to obtain a comprehensive document semantic vector.
6. The method for constructing a question-answering system based on a large language model as described in claim 5, characterized in that: Step S6, which involves obtaining the attention weight of each word in the query text and text fragments through an attention mechanism based on the query text semantic vector matrix and the document semantic vector matrix, and then fusing the query text semantic vector matrix and the document semantic vector matrix to obtain a fused semantic vector, includes the following sub-steps: S61, Based on the semantic vector matrix of the query text and the semantic vector matrix of the document, calculate the attention weight of each word in the query text and text fragment through an attention mechanism; S62, the semantic vector matrix of the query text and the semantic vector matrix of the document are fused, and the contribution ratio of the query text and the text fragment is dynamically adjusted by the second gating coefficient to calculate and generate the fused semantic vector.
7. The method for constructing a question-answering system based on a large language model as described in claim 6, characterized in that: Step S7, which involves obtaining the words in a text segment whose word frequency and attention weight exceed the preset word frequency threshold and attention weight threshold, generating natural language based on the fused semantic vector using the GPT-2 model, adding the obtained words to the natural language, generating fluent sentences using a beam search algorithm, and outputting the natural language answer text, includes the following sub-steps: using the preset word frequency threshold and attention weight threshold, select the words in the text segment that exceed both thresholds, and add the selected words to the generation vocabulary of GPT-2; based on the current context and the fused semantic vector, calculate the generation probability of each word using the GPT-2 model; calculate the copy probability of each word based on attention weight and word frequency; obtain the candidate probability of each word as a weighted combination of its generation probability and copy probability; based on the candidate probabilities of each word and using a beam search algorithm, output the natural language answer.
8. A system for constructing a question-answering system based on a large language model, implemented using the method for constructing a question-answering system based on a large language model as described in any one of claims 1 to 7, characterized in that it includes: The knowledge base construction module is used to acquire multimodal data, build a question-and-answer knowledge base and a knowledge graph containing entity association triples, and dynamically update the question-and-answer knowledge base; The data processing module is used to acquire the query text and perform vectorization processing on the query text and multimodal data respectively to generate corresponding query semantic vectors and multimodal vectors; The semantic enhancement module is used to extract entities from the query text using the recognition model, extract triples associated with the entities from the knowledge graph, concatenate the query text with the triples and perform quantization processing to generate a query semantic enhancement vector; The retrieval and matching module is used to search the question-and-answer knowledge base by combining the keyword matching algorithm and the similarity retrieval algorithm, calculate the similarity and matching degree between the query semantic enhancement vector and the multimodal vector respectively, and perform dynamic weighted calculation on the similarity and matching degree to obtain the text fragment with the highest score; The encoding processing module is used to concatenate the query text and the text segment with the highest score, encode the concatenated text using a pre-trained language model, and output the semantic vector of each word, thus obtaining the semantic vector matrix of the query text and the semantic vector matrix of the document; The attention fusion module is used to obtain the attention weight of each word in the query text and text fragments through an attention mechanism based on the query text semantic vector matrix and the document semantic vector matrix, and then fuse the query text semantic vector matrix and the document semantic vector matrix to obtain the fused semantic vector; The generation module is used to obtain the words in the text fragment whose word frequency and attention weight exceed the preset thresholds, generate natural language based on the fused semantic vector using the GPT-2 model, add the obtained words to the natural language, and use a beam search algorithm to generate fluent sentences and output the natural language answer text.
9. A computer-readable storage medium, characterized in that, The storage medium stores a method program for constructing a question-answering system based on a large language model. When the method program is executed, it implements the method for constructing a question-answering system based on a large language model as described in any one of claims 1 to 7.