Dynamic retrieval-augmented generation method based on critical-path entropy-increase trend and document selection
The method optimizes the retrieval process of dynamic RAG through a critical-path entropy-increase trend trigger and a document selection model, addressing trigger lag and noise interference during generation so that answers are generated efficiently and accurately.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU DIANZI UNIV
- Filing Date
- 2026-03-11
- Publication Date
- 2026-06-19
AI Technical Summary
Existing dynamic RAG technology suffers from retrieval trigger lag and noise interference during the generation process, resulting in decreased generation quality and wasted computing resources. It also lacks an effective document selection mechanism, making it difficult to meet the needs of dynamic knowledge.
The method combines a critical-path entropy-increase trend trigger with a document selection model trained by direct preference optimization: tokens with semantic contribution are identified to form the critical path, the entropy-increase trend along that path is monitored dynamically, and the retrieved external knowledge is finely filtered, optimizing both retrieval triggering and document selection.
It improves the noise resistance and timeliness of retrieval triggers, reduces invalid retrieval calls, enhances the accuracy and stability of generated answers, and reduces noise interference and computational resource waste.
Figure CN122240769A_ABST
Description
Technical Field
[0001] This invention belongs to the field of natural language processing, and in particular relates to a dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection.
Background Technology
[0002] In recent years, Transformer-based large language models (LLMs) have made groundbreaking progress in natural language processing tasks. However, because the knowledge stored in model parameters is inherently out of date and limited in coverage, LLMs handling knowledge-intensive tasks outside the training corpus are prone to generating fluent but erroneous information that contradicts objective facts, i.e., the so-called hallucination phenomenon.
[0003] Retrieval-augmented generation (RAG) is an effective way to alleviate these problems. Traditional static RAG performs only one retrieval before generation. While effective in simple tasks, this single-retrieval mode has significant limitations in multi-step reasoning or long-text generation tasks. Specifically, as generation progresses, the information needs of the LLM change dynamically, and a single retrieval cannot cover the knowledge gaps that keep emerging, which greatly limits the accuracy and logical coherence of the generated content.
[0004] To address the limitations of static RAG, existing work has proposed dynamic RAG, which introduces external knowledge multiple times, as needed, during LLM generation, adapting to the LLM's information needs as they change in real time. Current mainstream dynamic RAG methods typically trigger retrieval with a single-point entropy threshold and concatenate the Top-K documents directly into the LLM input: during token-by-token generation, the entropy of the token at the current position is monitored in real time; when it exceeds a preset threshold, an external retrieval is triggered, and the retrieved Top-K documents are concatenated directly into the context to guide further generation.
[0005] However, existing dynamic RAG technology still has the following shortcomings. First, retrieval triggers based on single-point thresholds inherently suffer from lag and poor noise resistance. Existing techniques judge by the single-point entropy of every token, which makes the system highly susceptible to numerical fluctuations at tokens that carry no semantic contribution, such as stop words, connectives, or formatting symbols, and leads to false triggers. More seriously, fixed-threshold triggering often lags: by the time an anomaly is detected, the model has usually already generated some erroneous content, making timely correction difficult. As a result, the timing of retrieval triggers fails to reflect the true information needs of the LLM, and computational resources are consumed to no effect. Second, there is no document selection mechanism oriented toward generation quality, so directly concatenating the Top-K documents places a significant burden on the model's context. Existing techniques generally use coarse-grained concatenation after retrieval, placing the retrieved documents into the context without any filtering. A large amount of redundant or even noisy information then crowds the LLM's valuable context window and distracts the model's attention, significantly reducing generation quality and even inducing new hallucinations.
[0006] Therefore, to simultaneously meet the requirements of dynamic RAG for retrieval timeliness, trigger noise resistance, and external knowledge injection with a high signal-to-noise ratio, there is an urgent need for a dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection.
Summary of the Invention
[0007] In view of the aforementioned shortcomings and deficiencies in the prior art, one objective of this invention is to solve at least one of the aforementioned problems. In other words, one objective of this invention is to provide a dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection that meets one or more of the aforementioned requirements.
[0008] To achieve the above objectives, the present invention adopts the following technical solution: a dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection, including the following steps:
S1. Based on the initial question, use a large language model to generate the answer and obtain the initial token sequence;
S2. Perform critical-path filtering on the initial token sequence: identify the tokens with semantic contribution in the initial token sequence and calculate their attention weights; select the semantically contributing tokens whose attention weights meet the threshold condition and form the critical path in generation order;
S3. Calculate the entropy-increase trend index from the entropy changes of the tokens on the critical path, and determine from the index whether external knowledge retrieval is triggered; if so, proceed to step S4;
S4. Interrupt the generation of the large language model and truncate the content at and after the trigger position; then construct a retrieval query from the preserved context, recall a set of candidate documents from the external knowledge base, and filter the candidate set with a pre-trained document selection model; the selected subset of documents is input to the large language model as external knowledge and guides it to continue generating the answer;
S5. Repeat steps S2 to S4 to continuously detect the critical-path entropy-increase trend of the subsequently generated content until the large language model outputs the end marker, completing the answer output.
[0009] As a preferred embodiment, in step S2, the critical path is selected as follows: when $I_t = 1$ and $a_t \ge \tau_t$ are both satisfied, the $t$-th token $x_t$ is designated a critical-path token and appended, in generation order, to the critical-path token sequence $C_t$. Here $x_t$ is the currently generated $t$-th token, and $I_t$ is the semantic importance marker indicating whether the $t$-th token has a semantic contribution, calculated as:
$$I_t = \begin{cases} 0, & x_t \in S \\ 1, & x_t \notin S \end{cases}$$
where $S$ is the set of stop words. $a_t$ is the maximum attention weight of the token at position $t$ over its preceding tokens, calculated as:
$$a_t = \max_{j < t} A_{t,j}$$
where $A_{t,j}$ is the attention weight assigned to the $j$-th token when the $t$-th token is generated. $W_t$ is the sliding window at the time the $t$-th token is generated; it slides with the generation position $t$ and contains the $w$ most recent tokens up to and including the $t$-th. $\tau_t$ is the average attention weight threshold within window $W_t$, calculated as:
$$\tau_t = \frac{1}{|W_t|} \sum_{x_i \in W_t} a_i$$
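The selection rule above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `critical_path`, the nested-list attention layout, the use of a single window-level threshold, the `>=` comparison, and the weight of 0.0 given to the first token (which has no predecessors) are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def critical_path(
    tokens: Sequence[str],
    attn: Sequence[Sequence[float]],
    stop_words: Set[str],
    window: int,
) -> List[int]:
    """Return indices of critical-path tokens among the last `window` tokens.

    attn[i] holds the attention weights of token i over tokens 0..i-1
    (hypothetical layout; extra entries beyond index i-1 are ignored).
    """
    t = len(tokens)
    if t == 0:
        return []
    window = max(1, window)
    # a[i]: maximum attention weight of token i over its preceding tokens.
    # Assumption: the first token has no predecessors, so it gets 0.0.
    a = [max(attn[i][:i]) if i > 0 else 0.0 for i in range(t)]
    win = range(max(0, t - window), t)
    # Dynamic threshold: mean attention weight inside the sliding window.
    tau = sum(a[i] for i in win) / len(win)
    # Keep tokens that are not stop words AND reach the threshold.
    return [
        i for i in win
        if tokens[i].lower() not in stop_words and a[i] >= tau
    ]
```

Because the threshold is the window mean, shrinking the window changes which tokens survive, so the window length acts as a sensitivity knob in this sketch.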
[0010] As a preferred embodiment, in step S3, the entropy-increase trend index is calculated from the first-order entropy differences of the critical-path token sequence:
$$E_t = \sum_{k=2}^{n_t} \omega_k \cdot \mathrm{ReLU}(\Delta H_k)$$
where $E_t$ is the entropy-increase trend index within the current window $W_t$; $n_t$ is the number of critical-path tokens in the current window $W_t$; when $n_t < 2$, the trend index is not calculated, external knowledge retrieval is not triggered, and subsequent tokens continue to be generated. $\mathrm{ReLU}(\cdot)$ is the rectified linear unit, $\omega_k$ is the time decay weight, and $\Delta H_k$ is the first-order entropy difference between the $(k-1)$-th and $k$-th adjacent nodes on the critical path.
[0011] As a preferred embodiment, the first-order entropy difference $\Delta H_k$ is calculated as:
$$\Delta H_k = H_{i_k} - H_{i_{k-1}}$$
where $i_{k-1}$ and $i_k$ are two adjacent token indices on the critical path. The token entropy $H_i$ is calculated as:
$$H_i = -\sum_{v \in V} p_i(v) \log p_i(v)$$
where $p_i(v)$ is the probability distribution over tokens at position $i$, and $V$ is the vocabulary.
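As a worked illustration of the entropy and trend calculations, the following Python sketch computes token entropy, the first-order differences along the critical path, and a ReLU-gated weighted sum. The exponential form of the time decay weight (`decay ** (n - 1 - k)`), the absence of normalization, and the function names are assumptions; the text only states that recently generated tokens receive higher weight.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trend_index(entropies: Sequence[float], decay: float = 0.8) -> Optional[float]:
    """Weighted sum of entropy increases along the critical path.

    `entropies` lists the entropy of each critical-path token in generation
    order. Returns None when fewer than two tokens are available, in which
    case no retrieval is triggered.
    """
    n = len(entropies)
    if n < 2:
        return None
    total = 0.0
    for k in range(1, n):
        delta = entropies[k] - entropies[k - 1]  # first-order difference
        weight = decay ** (n - 1 - k)            # assumed decay: newest pair weighs 1.0
        total += weight * max(0.0, delta)        # ReLU keeps only increases
    return total
```

For example, `trend_index([0.2, 0.5, 0.4, 0.9], decay=0.5)` counts the rises of 0.3 and 0.5 with weights 0.25 and 1.0 and ignores the drop of 0.1 in between.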
[0012] As a preferred embodiment, in step S4, the retrieval query is constructed as:
$$q_r = q \oplus C_{<t^*}$$
where $q_r$ is the retrieval query, $q$ is the initial question, $C_{<t^*}$ is the critical-path token sequence retained after truncation at the trigger position $t^*$, and $\oplus$ denotes sequence concatenation.
[0013] As a preferred embodiment, in step S4, the document selection model is pre-trained through the following steps: construct preference pairs $(q, D^+, D^-)$ as training data, where $q$ is the initial question, $D^+$ is the positive sample set, and $D^-$ is the negative sample set. The positive and negative sample sets are constructed as follows: using a large language model (LLM) as the teacher model, given an initial question $q$ and a candidate document set $D$, the teacher model autonomously selects different document subsets $D_i \subseteq D$ and generates the corresponding answers $y_i$, yielding multiple reasoning paths $r_i$; an overall score $\mathrm{Score}(r_i)$ is calculated for each reasoning path. The document set of the highest-scoring reasoning path is taken as the positive sample $D^+$, and the document set of the lowest-scoring reasoning path is taken as the negative sample $D^-$. The model is then trained with the Direct Preference Optimization (DPO) loss function so that its probability of selecting $D^+$ is higher than its probability of selecting $D^-$.
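[0014] As a preferred embodiment, the DPO loss function is expressed as:
$$\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x, D^+, D^-)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(D^+ \mid x)}{\pi_{\mathrm{ref}}(D^+ \mid x)} - \beta \log \frac{\pi_\theta(D^- \mid x)}{\pi_{\mathrm{ref}}(D^- \mid x)}\right)\right]$$
where $\pi_\theta$ is the document selection model policy to be trained, $\pi_{\mathrm{ref}}$ is the reference model policy, $\beta$ is a hyperparameter that controls the preference strength, $\sigma$ is the sigmoid function, and $x$ is the conditional input (comprising the initial question $q$ and the candidate document set $D$).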
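To make the loss concrete, here is a small Python sketch of the standard DPO objective for a single preference pair, written in plain Python rather than a training framework. The function name `dpo_loss`, the argument names, and the default `beta` are illustrative assumptions; the inputs are log-probabilities of the chosen and rejected document sets under the trained policy and the reference policy.

```python
import math


def dpo_loss(
    logp_pos: float,
    logp_neg: float,
    ref_logp_pos: float,
    ref_logp_neg: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair, given log-probabilities."""
    # Scaled difference of policy-vs-reference log-ratios.
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is $\log 2$; the loss falls below that value as the policy shifts probability toward the preferred document set.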
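[0015] As a preferred embodiment, the overall score $\mathrm{Score}(r_i)$ is calculated as:
$$\mathrm{Score}(r_i) = \mathrm{Avg}\big(s_{\mathrm{fact}}(r_i),\ s_{\mathrm{sem}}(r_i),\ s_{\mathrm{qual}}(r_i)\big)$$
where $\mathrm{Avg}(\cdot)$ denotes the average, $s_{\mathrm{fact}}(r_i)$ is the factual consistency score of reasoning path $r_i$, $s_{\mathrm{sem}}(r_i)$ is the semantic similarity score of reasoning path $r_i$, and $s_{\mathrm{qual}}(r_i)$ is the quality score of the answer generated along reasoning path $r_i$.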
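The scoring and pair construction can be sketched in Python as follows. The `ReasoningPath` container, its field names, and the assumption that all three sub-scores are already normalized to a common 0 to 1 scale before averaging are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass
class ReasoningPath:
    docs: Tuple[str, ...]  # ids of the documents the teacher selected
    answer: str            # answer generated from those documents
    fact: float            # factual consistency score (assumed in [0, 1])
    sem: float             # semantic similarity score (assumed in [0, 1])
    qual: float            # generation quality score (assumed rescaled to [0, 1])


def overall_score(path: ReasoningPath) -> float:
    """Average of the three sub-scores."""
    return mean([path.fact, path.sem, path.qual])


def preference_pair(question: str, paths: Sequence[ReasoningPath]):
    """Return (question, best document set, worst document set)."""
    ranked = sorted(paths, key=overall_score)
    return question, ranked[-1].docs, ranked[0].docs
```

A pair whose best and worst document sets coincide carries no preference signal, so a data pipeline built on this sketch would likely skip questions with fewer than two distinct paths.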
[0016] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the dynamic retrieval enhancement generation method as described in any of the preceding embodiments.
[0017] The present invention also provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the dynamic retrieval enhancement generation method as described in any of the preceding embodiments.
[0018] Compared with the prior art, the beneficial effects of this invention are:
(1) The retrieval trigger is highly noise-resistant and timely. The critical-path first-order entropy-increase trend trigger proposed in this invention replaces the prior art's single-point monitoring of all tokens with trend monitoring of critical-path tokens. This effectively suppresses false triggers caused by noise fluctuations at stop words and non-critical positions and alleviates the trigger lag of single-point thresholds, so unnecessary retrieval calls and inference overhead are reduced while generation quality is maintained, realizing on-demand retrieval.
(2) The document selection mechanism offers a high signal-to-noise ratio and high accuracy. The document selection model DDSM proposed in this invention, based on Direct Preference Optimization (DPO), finely screens the retrieved document set before injection: it selects the subset of documents most conducive to generation quality and removes irrelevant and redundant content. Compared with the prior art's coarse strategy of concatenating the Top-K documents into the input indiscriminately, this invention significantly reduces the wasted occupation of the context window by redundant documents and reduces noise interference, thereby improving the accuracy and stability of the generated answer.
Description of the Drawings
[0019] Figure 1 is a flowchart of the dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection in an embodiment of the present invention.
[0020] Figure 2 is a flowchart of the document selection model training process according to an embodiment of the present invention.
Detailed Implementation
[0021] To more clearly illustrate the embodiments of the present invention, specific implementation methods will be described below with reference to the accompanying drawings. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings and other implementation methods can be obtained based on these drawings without any creative effort.
[0022] This invention addresses the technical shortcomings of existing dynamic RAG technology in terms of the timeliness and noise resistance of retrieval triggering, as well as the signal-to-noise ratio and efficiency of document utilization. In order to simultaneously meet the needs of large language models for on-demand retrieval and high-precision knowledge introduction, this invention creatively proposes a dynamic retrieval enhancement generation method based on the entropy increase trend of the critical path and document selection.
[0023] The innovations of this invention are as follows. First, it proposes a critical-path first-order entropy-increase trend trigger, replacing the existing single-point monitoring of all tokens with entropy-increase trend monitoring of critical-path tokens. By dynamically assessing the increment in uncertainty, it eliminates false triggers caused by noise fluctuations at non-critical positions and resolves the retrieval response lag of traditional single-point threshold triggering. Second, it proposes a document selection model based on Direct Preference Optimization (DPO), replacing the traditional coarse-grained concatenation of Top-K documents with fine-grained filtering driven by generation quality preferences. By accurately filtering out irrelevant and redundant content, it prevents redundant information from occupying the context window to no effect, resolving the reduced generation quality and wasted computational resources caused by noise interference.
[0024] Specifically, the dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection includes:
S0 (offline phase): construct the dataset and train the document selection model DDSM. Before step S1 is performed, the document selection model is pre-trained to pick out the documents that contain the information the model needs. The training process specifically includes:
S01. Construct a basic dataset containing various instruction-response pairs. Using an LLM with a large number of parameters as the teacher model, for each question $q$ and its set of retrieved documents $D$, generate multiple different reasoning paths according to preset selection instructions. Each reasoning path $r_i$ comprises the input question $q$, the document subset $D_i$ selected autonomously by the model, and the answer $y_i$ generated from that subset:
$$r_i = (q, D_i, y_i)$$
where $D_i \subseteq D$;
S02. For each reasoning path $r_i$ generated by the teacher model, conduct a multi-dimensional quality assessment and calculate its overall score:
$$\mathrm{Score}(r_i) = \mathrm{Avg}\big(s_{\mathrm{fact}}(r_i),\ s_{\mathrm{sem}}(r_i),\ s_{\mathrm{qual}}(r_i)\big)$$
where $\mathrm{Avg}(\cdot)$ denotes the average; $s_{\mathrm{fact}}(r_i)$ is the factual consistency score of reasoning path $r_i$, which can be obtained, for example, with exact match (EM), F1, a consistency discrimination model, or consistency scoring against retrieved evidence; $s_{\mathrm{sem}}(r_i)$ is the semantic similarity score of reasoning path $r_i$, which can be calculated, for example, with a sentence-embedding similarity model (such as a Sentence-BERT-like encoder); and $s_{\mathrm{qual}}(r_i)$ is the quality score of the generated answer, which can be assigned automatically (e.g., 0–10 points) by a large language model using preset scoring prompts. Based on the overall scores, a preference pair $(q, D^+, D^-)$ is constructed for each question, where $D^+$ is the document set of the highest-scoring reasoning path and $D^-$ is the document set of the lowest-scoring reasoning path;
S03. Train the DDSM on the constructed preference pairs. The Direct Preference Optimization (DPO) loss function is used to optimize the model parameters so that, for a given question $q$ and document set $D$, the probability of outputting the selection $D^+$ is higher than that of $D^-$. After training, DDSM can select from the document set, based on the currently generated context, a subset of documents that improves answer quality.
[0025] S1. Based on the initial question, use a large language model to generate the answer and obtain the initial token sequence;
S2. Perform critical-path filtering on the currently generated token sequence: first identify the tokens with semantic contribution in the sequence and calculate their attention weights; then select the semantically contributing tokens whose attention weights meet the threshold condition and form the critical path in generation order;
S3. Calculate the entropy-increase trend index from the entropy changes of the tokens on the critical path, and determine from the entropy-increase trend whether to trigger external knowledge retrieval (i.e., dynamic retrieval);
S4. If dynamic retrieval needs to be triggered, interrupt the generation of the large language model and truncate the content at and after the trigger position; then construct a retrieval query from the preserved context to recall a document set from the external knowledge base, and filter the document set with the pre-trained document selection model; the selected subset of documents is input to the large language model as external knowledge and guides it to continue generating the answer;
S5. Repeat steps S2 to S4 to continuously detect the entropy-increase trend of the subsequently generated content until the large language model outputs the end marker, completing the output of the final answer.
[0026] Specifically, to determine whether dynamic retrieval needs to be triggered, the system performs critical-path filtering and entropy-increase trend detection within a sliding window $W_t$: obtain the token sequence within window $W_t$ and its corresponding attention weight matrix; compute the average attention weight threshold $\tau_t$ of the tokens in the window; filter the critical-path tokens within window $W_t$ to form the critical-path token sequence $C_t$; and, based on $C_t$, calculate the entropy-increase trend index $E_t$. When $E_t > \delta$, external knowledge retrieval is triggered; otherwise, generation of subsequent tokens continues. The trend threshold $\delta$ can be a fixed threshold or an adaptive threshold; the adaptive threshold can be determined from the mean and standard deviation of the entropy-increase trend index over the most recent $M$ sliding windows.
[0027] The above entropy-increase trend index is calculated from the first-order entropy differences of the critical-path token sequence:
$$E_t = \sum_{k=2}^{n_t} \omega_k \cdot \mathrm{ReLU}(\Delta H_k)$$
where $E_t$ is the entropy-increase trend index within the current window $W_t$; $n_t$ is the number of critical-path tokens in the current window $W_t$; when $n_t < 2$, the trend index is not calculated and subsequent tokens continue to be generated; $\mathrm{ReLU}(\cdot)$ is the rectified linear unit; and $\omega_k$ is the time decay weight. The first-order entropy difference between adjacent nodes on the critical path is:
$$\Delta H_k = H_{i_k} - H_{i_{k-1}}$$
where $i_{k-1}$ and $i_k$ are two adjacent token indices on the critical path. The token entropy at each generation position is:
$$H_i = -\sum_{v \in V} p_i(v) \log p_i(v)$$
where $p_i(v)$ is the probability distribution over tokens at position $i$, and $V$ is the vocabulary.
Specifically, the filtering criterion for critical-path tokens is: when $I_t = 1$ and $a_t \ge \tau_t$ are both satisfied, the $t$-th token $x_t$ is appended, in generation order, to the critical-path token sequence, which yields the critical-path token sequence $C_t$ within the current window $W_t$. Here $I_t$ is the semantic importance marker, and $a_t$ is the maximum attention weight of the token at position $t$ over its preceding tokens:
$$a_t = \max_{j < t} A_{t,j}$$
where $A_{t,j}$ is the attention weight assigned to the $j$-th token when the $t$-th token is generated. $W_t$ is the current sliding window when generation has reached the $t$-th token; it slides with the generation position $t$ and contains the $w$ most recent tokens up to and including the $t$-th. $\tau_t$ is the average attention weight of the tokens within the window:
$$\tau_t = \frac{1}{|W_t|} \sum_{x_i \in W_t} a_i$$
The semantic importance marker $I_t$ is:
$$I_t = \begin{cases} 0, & x_t \in S \\ 1, & x_t \notin S \end{cases}$$
where $S$ is the set of stop words.
[0028] The above truncation and regeneration process includes: determining the position $t^*$ of the token that triggers dynamic retrieval; truncating the content from that position onward; constructing a retrieval query to retrieve documents; filtering the retrieved documents with the document selection model to obtain the filtered document set $D^*$; and feeding $D^*$ into the LLM context to guide the LLM in continuing the answer.
[0029] The following specific embodiments provide a detailed description of the dynamic retrieval enhancement generation method of the present invention.
[0030] As shown in Figure 1, the dynamic retrieval-augmented generation method based on the critical-path entropy-increase trend and document selection in this embodiment of the invention includes steps T1 to T5:
Step T1: Based on the initial question, generate the answer using a large language model. The specific steps are as follows:
T11. Based on the initial question $q$ entered by the user, the system invokes a Transformer-based large language model to perform autoregressive generation without introducing externally retrieved knowledge, yielding an initial token sequence $X = (x_1, x_2, \ldots, x_t)$;
T12. The token sequence $X$ is fed into the adaptive retrieval trigger module as the contextual basis for the subsequent critical-path filtering and entropy-increase trend monitoring, and for the subsequent trigger decision based on the critical-path entropy-increase trend.
[0031] Step T2: Perform critical-path selection on the generated token sequence: first identify the tokens with semantic contribution in the sequence and calculate their attention weights; then select the semantically contributing tokens whose attention weights meet the threshold condition and form the critical path in generation order. The specific steps are as follows:
T21. Filter tokens with semantic contribution. To eliminate the interference of tokens without semantic contribution on entropy monitoring, the system assigns each generated token $x_t$ a semantic importance marker $I_t$: if the token belongs to a preset stop-word set (such as conjunctions and modal particles), it is marked 0; otherwise, it is marked 1:
$$I_t = \begin{cases} 0, & x_t \in S \\ 1, & x_t \notin S \end{cases}$$
where $S$ is the set of stop words (e.g., "um," "ba," and English articles and prepositions, which carry no obvious semantic meaning).
T22. Calculate attention weights. The importance of a token to the context is quantified using the self-attention mechanism of the large language model. Specifically, for each token $x_t$ in the generated sequence, the attention distribution of that token over all its preceding tokens is extracted from the last layer of the LLM, and the maximum attention value is defined as the attention weight at that position:
$$a_t = \max_{j < t} A_{t,j}$$
where $A_{t,j}$ is the attention weight assigned to the $j$-th token when the $t$-th token is generated.
[0032] In one optional implementation, the attention weight matrix is the multi-head self-attention matrix of the last layer, and the aggregated attention matrix is obtained by averaging or summing the attention weights of the attention heads; in another optional implementation, the attention matrices of the last K layers can be aggregated by a weighted combination across layers.
T23. Perform critical-path filtering. The semantic importance marker $I_t$ and the attention weight $a_t$ described above are combined to select tokens that both carry substantial semantic meaning and are significantly important within the context. Specifically, a sliding token window $W_t$ is set up for critical-path filtering. When generation has reached the $t$-th token, the current window is defined as the $w$ most recent tokens, i.e. $W_t = \{x_{t-w+1}, \ldots, x_t\}$, where $w$ is a preset window length parameter. $w$ can be set to a fixed value or determined adaptively from the length $t$ of the currently generated sequence; one adaptive method is $w = \min(w_{\max}, \lceil \gamma t \rceil)$, where $w_{\max}$ is the preset maximum window length and $\gamma$ is a scaling factor. This embodiment uses the mean as an example (in other optional implementations, a linear combination $\mu_a + \alpha \sigma_a$ of the mean $\mu_a$ and the standard deviation $\sigma_a$ can also be used as the dynamic threshold, where $\alpha$ is an adjustment coefficient): the average attention weight of the tokens in the current window $W_t$ is used as the dynamic threshold:
$$\tau_t = \frac{1}{|W_t|} \sum_{x_i \in W_t} a_i$$
The filtering criterion for critical-path tokens is then: when $I_t = 1$ and $a_t \ge \tau_t$ are both satisfied, the $t$-th token $x_t$ is appended, in generation order, to the critical-path token sequence $C_t$. Here $I_t = 1$ ensures that the token has a semantic contribution, and $a_t \ge \tau_t$ ensures that the token has received significant attention from the LLM. Through this step, the original token sequence generated by the large language model is transformed into the critical-path sequence $C_t$, which serves as the input to step T3.
[0033] Step T3: Calculate the entropy-increase trend index from the entropy changes of the tokens on the critical path, and decide from the entropy-increase trend whether to trigger dynamic retrieval. The specific steps are as follows:
T31. Calculate the token entropy. For each token in the critical-path token sequence $C_t$ within the current window $W_t$, calculate the entropy $H_i$ of its predicted probability distribution, which represents the uncertainty of the LLM at that position:
$$H_i = -\sum_{v \in V} p_i(v) \log p_i(v)$$
where $p_i(v)$ is the probability with which the model predicts token $v$ at position $i$, and $V$ is the vocabulary;
T32. Calculate the first-order entropy difference between adjacent nodes on the critical path:
[0034] $$\Delta H_k = H_{i_k} - H_{i_{k-1}}$$
where $i_{k-1}$ and $i_k$ are two adjacent token indices on the critical path;
T33. Calculate the entropy-increase trend index. Based on the critical-path token sequence within the current window $W_t$, a weighted entropy-increase trend index is calculated. The index considers only increases in entropy:
$$E_t = \sum_{k=2}^{n_t} \omega_k \cdot \mathrm{ReLU}(\Delta H_k)$$
where $n_t$ is the number of critical-path tokens in the current window, $\mathrm{ReLU}(\cdot)$ is the rectified linear unit, used to capture only the direction of entropy increase, and $\omega_k$ is a time decay weight that assigns higher weight to recently generated tokens, making the system more sensitive to current uncertainty.
T34. Retrieval trigger decision. The trend threshold $\delta$ can be a fixed threshold, determined from a validation set or historical runtime data, or an adaptive threshold, determined from the mean $\mu_E$ and standard deviation $\sigma_E$ of the entropy-increase trend index over the most recent $M$ sliding windows: $\delta = \mu_E + \lambda \sigma_E$, where $\lambda$ is the threshold adjustment coefficient. An adaptive threshold lets the trigger strategy adjust dynamically to task difficulty and generation stage. The calculated trend index $E_t$ is compared with the trend threshold $\delta$: if $E_t \le \delta$, the current generation process is judged not to require external knowledge, and the method returns to step T2 to process the next generated token; if $E_t > \delta$, the current generation is judged to require external knowledge, and step T4 is executed immediately to trigger external knowledge retrieval.
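The adaptive trigger decision in T34 can be sketched in Python as below. The class name `AdaptiveTrigger`, the use of the population standard deviation, the fallback threshold used until two history values exist, and the default parameter values are assumptions introduced for illustration only.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveTrigger:
    """Decide whether to retrieve by comparing a trend index to mean + coeff * std."""

    def __init__(self, history: int = 8, coeff: float = 1.5, fallback: float = 0.5):
        self.recent = deque(maxlen=history)  # trend indices of recent windows
        self.coeff = coeff                   # threshold adjustment coefficient
        self.fallback = fallback             # assumed fixed threshold for warm-up

    def threshold(self) -> float:
        if len(self.recent) < 2:
            return self.fallback
        return mean(self.recent) + self.coeff * pstdev(self.recent)

    def should_retrieve(self, trend: float) -> bool:
        # Compare against the threshold built from earlier windows only,
        # then record the new value for future decisions.
        fire = trend > self.threshold()
        self.recent.append(trend)
        return fire
```

Recording the new value only after the comparison keeps a sudden spike from inflating the very threshold it is tested against.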
[0035] Step T4: If dynamic retrieval needs to be triggered, the generation of the LLM is interrupted, and the content at and after the trigger point is truncated. A retrieval query is then constructed from the preserved context, a candidate document set is retrieved from an external knowledge base, and the trained document selection model filters the candidate document set. Finally, the filtered document set is merged into the LLM context, guiding the LLM to continue generating the answer from the truncation point. The specific steps are as follows:
T41. Interrupt the LLM generation process. When $E_t > \delta$, immediately interrupt the LLM generation process, determine the trigger position $t^*$, and truncate the content generated at and after $t^*$; the content generated before $t^*$ is retained as the context fragment;
T42. Construct the retrieval query. Based on the question $q$ entered by the user and the critical-path tokens retained after truncation, the retrieval query $q_r$ is constructed as:
$$q_r = q \oplus C_{<t^*}$$
where $\oplus$ denotes sequence concatenation and $C_{<t^*}$ is the critical-path token sequence retained after truncation;
T43. Execute external retrieval. The retrieval query $q_r$ is sent to the retriever, which searches the external knowledge base for relevant documents and returns the top $K$ documents with the highest similarity scores, yielding the candidate document set $D$;
T44. DDSM-based document selection. The document selection model DDSM pre-trained in the offline S0 phase filters the candidate document set $D$. Specifically, DDSM receives the query and the external candidate document set $D$ as input and outputs the optimal document set $D^*$;
T45. Knowledge injection and regeneration. The optimal document set $D^*$ is inserted into the LLM context through a preset prompt template, and generation of the subsequent content continues from the truncation point.
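Steps T41 to T45 can be tied together in a short Python sketch. Everything here is illustrative: `retrieve` and `select` are hypothetical callables standing in for the retriever and the DDSM, tokens are joined with spaces for simplicity, and the prompt layout is an assumed template rather than the one used in the embodiment.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_and_build_prompt(
    question: str,
    tokens: Sequence[str],
    critical_idx: Sequence[int],
    trigger_pos: int,
    retrieve: Callable[[str, int], List[str]],
    select: Callable[[str, List[str]], List[str]],
    top_k: int = 5,
) -> Tuple[str, str]:
    """Truncate at the trigger, query, filter documents, and rebuild the prompt."""
    # T41: drop the trigger position and everything generated after it.
    kept_tokens = list(tokens[:trigger_pos])
    # T42: query = question concatenated with the retained critical-path tokens.
    kept_critical = [tokens[i] for i in critical_idx if i < trigger_pos]
    query = " ".join([question, *kept_critical])
    # T43: recall the top-K candidates from the external knowledge base.
    candidates = retrieve(query, top_k)
    # T44: let the selection model keep only the useful subset.
    chosen = select(question, candidates)
    # T45: place the chosen documents into an (assumed) prompt template.
    context = "\n".join(f"[{n}] {doc}" for n, doc in enumerate(chosen, 1))
    prompt = (
        f"Reference documents:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer so far: {' '.join(kept_tokens)}"
    )
    return query, prompt
```

Passing the retriever and selector in as callables keeps the sketch independent of any particular search backend or model API.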
[0036] Step T5: Repeat steps T2 to T4 to continuously monitor the critical path entropy increase trend of the content generated after resumption, until the large language model outputs the end marker EOS, thereby completing the output of the final answer.
[0037] Specifically, the system enters a continuous loop monitoring state. During each regeneration process, the system updates the attention weight distribution and entropy trend within the sliding window in real time. If the entropy increase trend is detected to exceed the threshold again, step T4 will be triggered again for multiple rounds of retrieval; until the large language model outputs an end marker, the final answer is output.
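The continuous monitoring loop described above can be sketched, purely for illustration, as the following Python driver. The callables `next_token`, `on_trigger`, and `trend_fn` are hypothetical interfaces standing in for the language model, the retrieval round of step T4, and the trend index of step T3. For brevity the window here holds the most recent critical path entropies rather than the most recent tokens, which is a simplifying assumption.

```python
from collections import deque

EOS = "<eos>"  # assumed end marker string


def generate_with_dynamic_retrieval(next_token, on_trigger, trend_fn,
                                    threshold, window=8, max_steps=100):
    """Generate until EOS, retrieving whenever the trend exceeds the threshold.

    next_token(context, docs) -> (token, entropy, is_critical)
    on_trigger(context)       -> list of selected documents
    trend_fn(entropies)       -> float, or None if it cannot be computed
    """
    context, docs, retrievals = [], [], 0
    path = deque(maxlen=window)  # entropies of recent critical path tokens
    for _ in range(max_steps):
        token, entropy, is_critical = next_token(context, docs)
        if token == EOS:
            break
        if is_critical:
            path.append(entropy)
            trend = trend_fn(list(path))
            if trend is not None and trend > threshold:
                # Truncation: the trigger token is not added to the context.
                docs = on_trigger(context)
                retrievals += 1
                path.clear()  # restart monitoring on the resumed content
                continue
        context.append(token)
    return context, retrievals
```

Because the loop simply continues after a retrieval round, a later rise in entropy can trigger step T4 again, which gives the multi-round behaviour described in this paragraph.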
[0038] As shown in Figure 2, the construction and training of the document selection model DDSM are described in detail below and include steps T01 to T03:
T01. First, collect a base dataset containing various instruction-response pairs (e.g., Open-Instruct datasets and knowledge-intensive question-answering data). Then, for each input question $x$ and the original document collection $D$ retrieved for it, use a high-performance teacher model (such as GPT-4 or GLM-4.5) to generate multiple different inference paths. Each inference path $r_i$ includes the input question, a subset of documents $D_i$ selected autonomously by the model, and the answer $a_i$ generated based on that subset, and is represented as: $r_i = (x, D_i, a_i)$; where $D_i \subseteq D$ is the document combination that the teacher model considers necessary to answer the question.
T02. To quantify the quality of each inference path, this embodiment designs a multi-dimensional comprehensive scoring function. This function takes into account factual consistency, semantic consistency, and generation quality, and its calculation formula is as follows: $\mathrm{Score}(r_i) = \mathrm{Avg}\big(s_{\mathrm{fact}}(r_i),\, s_{\mathrm{sem}}(r_i),\, s_{\mathrm{qual}}(r_i)\big)$; where $\mathrm{Avg}(\cdot)$ represents the average value; $s_{\mathrm{fact}}$ is the factual consistency score, which can be measured by, for example, exact match (EM), F1, a consistency discrimination model, or a consistency score based on retrieved evidence; $s_{\mathrm{sem}}$ is the semantic similarity score, which can be calculated, for example, using a sentence-embedding semantic similarity model (such as a Sentence-BERT-like encoder); and $s_{\mathrm{qual}}$ is the generation quality score, which can be assigned automatically by a large language model based on preset scoring prompts (e.g., 0 to 10 points). Based on the overall score, a preference pair $(x, D^{+}, D^{-})$ is constructed for each question, where $D^{+}$ is the set of documents corresponding to the highest-scoring inference path and $D^{-}$ is the set of documents corresponding to the lowest-scoring inference path.
[0039] T03. Train the DDSM using the constructed preference pair data. This embodiment uses the Direct Preference Optimization (DPO) loss function to optimize the model parameters, so that, given the conditional input $c$, the probability of selecting the positive sample $D^{+}$ is significantly higher than the probability of selecting the negative sample $D^{-}$. The DPO loss function is expressed as: $\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(c, D^{+}, D^{-})}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(D^{+} \mid c)}{\pi_{\mathrm{ref}}(D^{+} \mid c)} - \beta \log \frac{\pi_\theta(D^{-} \mid c)}{\pi_{\mathrm{ref}}(D^{-} \mid c)}\right)\right]$; where $\pi_\theta$ is the policy of the DDSM to be trained, $\pi_{\mathrm{ref}}$ is the reference model policy, $\beta$ is a hyperparameter that controls the strength of the preference, $\sigma$ is the Sigmoid function, and $c$ is the conditional input (including the initial question $x$ and the candidate document set $D$). Through the training described above, the DDSM learns the document selection ability of the teacher model, enabling it to filter out, from a large number of documents, a set of documents that can improve the quality of responses.
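For illustration only, steps T02 and T03 can be sketched in Python as follows: an overall score averaged over three component scores, a preference pair built from the best and worst inference paths, and the per-pair DPO loss computed from log-probabilities. The dictionary keys, the assumption that all three component scores are already normalized to the range 0 to 1, and the default `beta=0.1` are choices made for this example and are not stated in the embodiment.

```python
import math
from statistics import mean


def overall_score(fact, sem, quality):
    """Average of the three component scores (assumed normalized to [0, 1])."""
    return mean([fact, sem, quality])


def build_preference_pair(question, paths):
    """Pick the document sets of the best and worst scoring inference paths.

    `paths` is a list of dicts with hypothetical keys:
    'docs', 'fact', 'sem', 'quality'.
    """
    ranked = sorted(
        paths,
        key=lambda p: overall_score(p["fact"], p["sem"], p["quality"]),
    )
    worst, best = ranked[0], ranked[-1]
    return {"question": question,
            "chosen": best["docs"],      # positive sample
            "rejected": worst["docs"]}   # negative sample


def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen ratio - rejected ratio)))."""
    margin = beta * ((pi_chosen_logp - ref_chosen_logp)
                     - (pi_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the chosen document set relative to the reference.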
[0040] The effectiveness of the dynamic retrieval enhancement generation method based on critical path entropy increase trend and document selection in this embodiment of the invention can be further verified by the following experiments: I. Experimental conditions: the computing hardware consisted of an Intel Xeon Silver 4210R processor and four NVIDIA GeForce RTX 3090 graphics cards; the operating system was Ubuntu 18.04; the deep learning framework was PyTorch 2.0.1, with CUDA 11.8 for computational acceleration. Furthermore, to verify the effectiveness of this invention, an existing dynamic RAG method (such as DRAGIN) was selected as the baseline for evaluation.
[0041] II. Experiment content: Experiment 1 evaluates the dynamic RAG method of this invention and the DRAGIN method on the 2WikiMultiHopQA and HotpotQA datasets. For ease of reproduction and fair comparison, LLaMA2-7B-Chat was chosen as the unified base model: the base generative model uses LLaMA2-7B-Chat directly, and the document selection model DDSM is also initialized from LLaMA2-7B-Chat and trained using the aforementioned DPO method. The first 1000 samples of each dataset, taken in the default dataset order, are used as the test set. EM and F1 scores are used as evaluation metrics, and the average number of retrievals (times/sample) is also reported. The test results are shown in Table 1. On the 2WikiMultiHopQA dataset, the method of this invention achieves an EM score of 0.292 and an F1 score of 0.3678, while the DRAGIN method achieves an EM score of 0.220 and an F1 score of 0.2926; on the HotpotQA dataset, the method of this invention achieves an EM score of 0.290 and an F1 score of 0.3939, while the DRAGIN method achieves an EM score of 0.232 and an F1 score of 0.3344. As can be seen from Table 1, the method of this invention achieves higher EM and F1 scores on both datasets. At the same time, it reduces the average number of retrievals to 0.951 times/sample on 2WikiMultiHopQA and 1.077 times/sample on HotpotQA, a lower retrieval trigger frequency than the DRAGIN method, thus improving answer accuracy while reducing retrieval overhead. This indicates that the present invention, through on-demand triggering based on the entropy increase trend of the critical path and the document selection mechanism, can reduce unnecessary retrievals and improve evidence matching, thereby achieving higher accuracy with fewer retrievals.
[0042] Table 1. Comparison of EM / F1 and average number of retrievals between the method of this invention and the DRAGIN method on 2WikiMultiHopQA and HotpotQA (first 1000 samples of each dataset).
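For reference, EM and token-level F1 of the kind reported in Table 1 are commonly computed along the lines of the following Python sketch. The normalization used here (lowercasing, removing punctuation and English articles) follows common question-answering evaluation practice and is an assumption; the exact evaluation script used in the experiments is not specified in this description.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level EM and F1 are then the averages of these per-sample values over the test set.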
[0043] The above description is merely a detailed explanation of preferred embodiments and principles of the present invention. For those skilled in the art, there may be changes in specific implementation methods based on the ideas provided by the present invention, and these changes should also be considered within the scope of protection of the present invention.
Claims
1. A dynamic retrieval enhancement generation method based on critical path entropy increase trend and document selection, characterized in that it includes the following steps:
S1. Based on the initial question, use a large language model to generate the answer and obtain the initial token sequence;
S2. Perform critical path filtering on the initial token sequence: identify the tokens in the initial token sequence that have a semantic contribution, calculate the attention weights of those tokens, select the semantically contributing tokens whose attention weights meet the threshold condition, and construct the critical path according to their generation order;
S3. Calculate the entropy increase trend index based on the entropy value changes of the tokens on the critical path, and determine whether to trigger external knowledge retrieval based on the entropy increase trend index; if so, proceed to step S4;
S4. Interrupt the generation of the large language model and truncate the trigger position and the content generated thereafter; then construct a retrieval query based on the retained context, retrieve a candidate document set from the external knowledge base, and use a pre-trained document selection model to filter the candidate document set; the selected subset of documents is input to the large language model as external knowledge, guiding the large language model to continue generating the answer based on the external knowledge;
S5. Repeat steps S2 to S4 to continuously detect the entropy increase trend of the critical path in the subsequently generated content, until the large language model outputs the end marker and completes the answer output.
2. The dynamic retrieval enhancement generation method according to claim 1, characterized in that, in step S2, the critical path filtering is as follows: when the semantic importance marker satisfies $m_t = 1$ and the maximum attention weight $a_t$ meets the threshold condition with respect to $\theta_t$, the $t$-th token $y_t$ is designated as a critical path token and added to the critical path token sequence $K$ in the order of its generation; where $y_t$ is the currently generated $t$-th token; $m_t$ is the semantic importance marker, indicating whether the $t$-th token has a semantic contribution, calculated using the following formula: $m_t = 1$ if $y_t \notin S$, and $m_t = 0$ otherwise; where $S$ is the set of stop words; $a_t$ is the maximum attention weight of the token at position $t$ to its preceding tokens, calculated using the following formula: $a_t = \max_{i < t} A_{t,i}$; where $A_{t,i}$ denotes the attention weight of the $t$-th token to the $i$-th token at generation time; $W_t$ is the sliding window when the $t$-th token is generated, which slides with the generation position $t$ and contains the most recent tokens up to the $t$-th token; $\theta_t$ is the average attention weight threshold within the window $W_t$, calculated using the following formula: $\theta_t = \frac{1}{|W_t|} \sum_{i \in W_t} a_i$.
3. The dynamic retrieval enhancement generation method according to claim 2, characterized in that, in step S3, the entropy increase trend index is calculated based on the first-order entropy differences of the critical path token sequence, as shown in the following formula: $E_t = \sum_{j=2}^{n} \omega_j \cdot \mathrm{ReLU}(\Delta H_j)$; where $E_t$ is the entropy increase trend index within the current window $W_t$; $n$ is the number of critical path tokens in the current window $W_t$; when the number of critical path tokens $n$ is too small to form a first-order entropy difference, the entropy increase trend index is not calculated, external knowledge retrieval is not triggered, and subsequent tokens continue to be generated; $\mathrm{ReLU}(\cdot)$ is the rectified linear unit, $\omega_j$ is the time decay weight, and $\Delta H_j$ is the first-order entropy difference between adjacent nodes on the critical path.
4. The dynamic retrieval enhancement generation method according to claim 3, characterized in that the first-order entropy difference $\Delta H_j$ is calculated as follows: $\Delta H_j = H_{k_j} - H_{k_{j-1}}$; where $k_{j-1}$ and $k_j$ are two adjacent token indices on the critical path; the token entropy $H_t$ is calculated as follows: $H_t = -\sum_{v \in V} p_t(v) \log p_t(v)$; where $p_t$ denotes the probability distribution of the token at position $t$, and $V$ is the vocabulary.
5. The dynamic retrieval enhancement generation method according to claim 1, characterized in that, in step S4, the formula for constructing the retrieval query is as follows: $q = x \oplus \tilde{K}$; where $q$ is the retrieval query, $x$ is the initial question, $\tilde{K}$ is the critical path token sequence retained after truncation, and $\oplus$ indicates the sequence concatenation operation.
6. The dynamic retrieval enhancement generation method according to claim 1, characterized in that, in step S4, the document selection model is pre-trained through the following steps: constructing preference pairs $(x, D^{+}, D^{-})$ as training data; where $x$ is the initial question, $D^{+}$ is the positive sample set, and $D^{-}$ is the negative sample set; the positive and negative sample sets are constructed as follows: using a large language model (LLM) as the teacher model, given an initial question $x$ and a candidate document set $D$, the teacher model autonomously selects different document subsets $D_i$ and generates the corresponding answers $a_i$, yielding multiple inference paths $r_i$; the overall score of each inference path is calculated; the document set corresponding to the inference path with the highest overall score is selected as the positive sample $D^{+}$, and the document set corresponding to the inference path with the lowest overall score is selected as the negative sample $D^{-}$; the model is trained using the Direct Preference Optimization (DPO) loss function, so that the probability of the model selecting $D^{+}$ is higher than the probability of selecting $D^{-}$.
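7. The dynamic retrieval enhancement generation method according to claim 6, characterized in that the DPO loss function is expressed as follows: $\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(c, D^{+}, D^{-})}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(D^{+} \mid c)}{\pi_{\mathrm{ref}}(D^{+} \mid c)} - \beta \log \frac{\pi_\theta(D^{-} \mid c)}{\pi_{\mathrm{ref}}(D^{-} \mid c)}\right)\right]$; where $\pi_\theta$ is the policy of the document selection model to be trained, $\pi_{\mathrm{ref}}$ is the reference model policy, $\beta$ is a hyperparameter that controls the strength of the preference, $\sigma$ is the Sigmoid function, and $c$ is the conditional input (including the initial question $x$ and the candidate document set $D$).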
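8. The dynamic retrieval enhancement generation method according to claim 6, characterized in that the overall score $\mathrm{Score}(r_i)$ is calculated as follows: $\mathrm{Score}(r_i) = \mathrm{Avg}\big(s_{\mathrm{fact}}(r_i),\, s_{\mathrm{sem}}(r_i),\, s_{\mathrm{qual}}(r_i)\big)$; where $\mathrm{Avg}(\cdot)$ represents the average value, $s_{\mathrm{fact}}(r_i)$ is the factual consistency score of inference path $r_i$, $s_{\mathrm{sem}}(r_i)$ is the semantic similarity score of inference path $r_i$, and $s_{\mathrm{qual}}(r_i)$ is the generation quality score of inference path $r_i$.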
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the dynamic retrieval enhancement generation method as described in any one of claims 1-8.
10. A computer-readable storage medium storing instructions therein, characterized in that, when the instructions are executed on a computer, the computer performs the dynamic retrieval enhancement generation method as described in any one of claims 1-8.