Adaptive retrieval augmented generation method and system based on model knowledge boundary

By improving the prompt template and the adaptive reordering technique of the large language model reorderer, combined with two-step progressive distillation and data augmentation methods, the problems of large language model generation quality and limitations are solved, and efficient and low-cost adaptive retrieval augmentation generation is achieved, which is suitable for scenarios such as answering systems, content creation and dialogue generation.

CN118503354B · Active · Publication Date: 2026-06-19 · UNIV OF SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
UNIV OF SCI & TECH OF CHINA
Filing Date
2024-05-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing retrieval enhancement generation techniques for large language models suffer from poor generation quality and significant limitations in terms of article recall, ranking, and adaptive retrieval. In particular, traditional methods cannot effectively utilize the knowledge boundaries of the model, resulting in low generation quality and high inference costs.

Method used

The algorithm organizes prompt words using an improved prompt template format and adaptively rearranges recall information using a pre-built large language model reorderer. It then distills the adaptive rearrangement capability from the closed-source model to the open-source model through a two-step progressive distillation method. Finally, it combines K-means and Min-max algorithms for data sampling and augmentation to optimize the training process of the target model.

Benefits of technology

It improves the quality and speed of generated results, reduces costs, enables plug-and-play functionality in various task scenarios, reduces the impact of retrieval noise, and enhances universality.

✦ Generated by Eureka AI based on patent content.

Figure CN118503354B_ABST

Abstract

The application provides a model knowledge boundary-based adaptive retrieval enhancement generation method and system, and relates to the technical field of retrieval enhancement generation of large language models. The model knowledge boundary-based adaptive retrieval enhancement generation method firstly organizes prompt words based on the format of an improved prompt template, and simultaneously uses a pre-constructed large language model rearranger to adaptively rearrange recall information related to user prompts; then, the user prompts and the adaptively rearranged results are spliced and input into a target large language model to finally obtain a generation result. Compared with existing large language model retrieval enhancement generation technologies, the application has higher generation result quality, faster reasoning speed, lower cost, can realize various application scenarios, and has better universality.

Description

Technical Field

[0001] This invention relates to the field of retrieval enhancement generation technology for large language models, specifically to an adaptive retrieval enhancement generation method and system based on model knowledge boundaries.

Background Technology

[0002] Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval and text generation. It takes a pre-trained large language model (LLM) and adds a retrieval component that queries a stored knowledge base to obtain additional information relevant to the input query (i.e., retrieved articles). The retrieved information, along with the initial input, is then used as context for the generation component, resulting in richer, more accurate, and more informative output. This technique addresses the challenges of large model hallucination and continuous learning, and is widely used in scenarios such as question-answering systems, content creation, and dialogue generation.

[0003] However, existing retrieval enhancement generation techniques for large language models still have the following problems:

[0004] 1) In terms of reordering recalled articles, pointwise relevance ranking methods (such as UPR) or listwise methods (such as RankVicuna and RankZephyr) are generally used. The former are efficient and simple but cannot capture the intrinsic relationships between multiple articles well, while the latter can only return the ranking of a fixed number of articles, which may introduce retrieval noise and affect the quality of RAG generation.

[0005] 2) In terms of adaptive retrieval of large language models, traditional adaptive retrieval techniques (such as AR) mainly judge and retrieve based on the high or low frequency of the question itself. These methods do not take into account the knowledge boundaries of the large model and have certain limitations. While methods such as FLARE and Self-RAG take into account the knowledge boundaries of the model itself, they have high inference costs or require a lot of prior knowledge, which has great limitations in practical applications.

[0006] Therefore, there is an urgent need to propose a better retrieval enhancement generation technique for large language models in order to solve at least one or more of the above problems.

Summary of the Invention

[0007] (I) Technical Problems to Be Solved

[0008] To address the shortcomings of existing technologies, this invention provides an adaptive retrieval enhancement generation method and system based on model knowledge boundaries, which solves the problems of poor generation quality and significant limitations of existing large language model retrieval enhancement generation technologies.

[0009] (II) Technical Solution

[0010] To achieve the above objectives, the present invention provides the following technical solution:

[0011] Firstly, this invention proposes an adaptive retrieval enhancement generation method based on model knowledge boundaries, which includes:

[0012] The prompt words are organized in a format based on an improved prompt template, while the recall information related to the user prompt is adaptively rearranged using a pre-built large language model reorderer; the large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability.

[0013] The user prompts and the results of adaptive rearrangement are concatenated and input into the target large language model to obtain the generated results.

[0014] Preferably, training open-source models using data labeled with closed-source models that have adaptive reordering capabilities includes:

[0015] A two-step incremental distillation method is used to distill the closed-source model with adaptive rearrangement capability, and the open-source model is fine-tuned in two steps.

[0016] Among them, the closed-source model with adaptive rearrangement capability is GPT4; the open-source model is Mistral-7B.

[0017] Preferably, the two-step progressive distillation includes:

[0018] S121. Organize the user prompts and recall information according to the improved prompt template format and provide the prompt words to ChatGPT3.5, and output the first sorting result;

[0019] S122. Use the user prompts, recall information and the first ranking results together as training data to perform preliminary training and fine-tuning on the Mistral-7B model;

[0020] S123. Organize the data from some user prompts and recall information into prompt words according to the improved prompt template format and provide them to GPT4, and output the second sorting result;

[0021] S124. The user prompts, recall information, and second ranking results are used together as training data to perform secondary training and fine-tuning on the Mistral-7B model.

[0022] Preferably, the improved prompt template includes: adjusting the keywords in the prompt template to allow the model to adaptively output the number of articles recalled.

[0023] Preferably, before executing S123, the data in the user prompts and recall information are sampled using the K-means algorithm and the Min-max algorithm, respectively.

[0024] Preferably, the K-means algorithm and the Min-max algorithm are used to sample the data in user prompts and recall information, including:

[0025] First, obtain vector representations of all query statements using OpenAI's text-embedding-ada-002 model;

[0026] All query statements are divided into different clusters based on the K-means algorithm, and uniform random sampling is performed within each cluster;

[0027] Based on the Min-max algorithm, a small number of samples are randomly sampled from the vector representation of the query statement as seeds, and then each time the sample with the smallest maximum inner product with the sampled samples is selected from the remaining samples.

[0028] Preferably, the labeled data is enhanced after S123 is executed and before S124 is executed.

[0029] Preferably, the enhancement method includes: shuffling the order of the recalled articles and the corresponding final ranking results in the labeled data, and adding data with high retrieval noise to the labeled dataset.

[0030] Preferably, training the target model in steps S122 and S124 includes:

[0031] The target model is trained using an autoregressive loss function as the optimization objective function, which is expressed by the following formula:

[0032] $$\mathcal{L}(\theta) = -\sum_{(x_i,\, t_i) \in T} \sum_{j} \log P_\theta\left(t_{i,j} \mid x_i,\, t_{<j}\right)$$

[0033] In the formula, $T$ is the training set of data generated by the teacher model; $x_i$ is the input prompt consisting of the user query, the retrieved article paragraphs, and the instructions; $t_i$ is the output corresponding to a single input (in this embodiment, the output is an adaptive sorting result); $t_{<j}$ is the target token sequence preceding position $j$; $P_\theta(t_{i,j} \mid x_i, t_{<j})$ is the probability that, with the current parameters being $\theta$, given the input $x_i$ and the currently inferred sequence $t_{<j}$, the next token is deduced to be $t_{i,j}$; $i$ indexes the current data item; $j$ indexes the token currently being inferred.

[0034] Secondly, the present invention also proposes an adaptive retrieval enhancement generation system based on model knowledge boundaries. The system includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the method described above.

[0035] (III) Beneficial Effects

[0036] This invention provides an adaptive retrieval enhancement generation method and system based on model knowledge boundaries. Compared with existing technologies, it has the following advantages:

[0037] This invention proposes an adaptive retrieval enhancement generation method and system based on model knowledge boundaries. It organizes prompt words using an improved prompt template format and adaptively rearranges recall information related to user prompts using a pre-built large language model reorderer. The user prompts and the adaptively rearranged results are then concatenated and input into a target large language model to obtain the generated results. Compared to existing large language model retrieval enhancement generation techniques, this adaptive retrieval enhancement generation technology based on model knowledge boundaries offers higher result quality, faster inference speed, lower cost, and wider applicability across various application scenarios.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1 is a flowchart of an adaptive retrieval enhancement generation method based on model knowledge boundaries according to the present invention;

[0040] Figure 2 shows an embodiment of an adaptive retrieval enhancement generation method based on model knowledge boundaries according to the present invention;

[0041] Figure 3 is a schematic diagram of the improved prompt template in an embodiment of the present invention;

[0042] Figure 4 shows the experimental results of the method of this embodiment of the invention on open-source models (alpaca-7b and mistral-instruct-7b);

[0043] Figure 5 shows the results of experiments conducted on ChatGPT using the method of this embodiment of the invention;

[0044] Figure 6 shows the results of experiments conducted on the efficiency and cost of the method according to an embodiment of the present invention.

Detailed Implementation

[0045] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0046] This application provides an adaptive retrieval enhancement generation method and system based on model knowledge boundaries, which solves the problems of poor generation quality and large limitations of existing large language model retrieval enhancement generation technologies. As a result, once the large language model reorderer pre-built in this application is added, existing large language models can be used in a plug-and-play manner in various task scenarios without further training or contextual knowledge, achieving efficient zero-shot adaptive retrieval enhancement generation.

[0047] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0048] Furthermore, it should be noted that the adaptive retrieval enhancement generation method based on model knowledge boundaries proposed in this application can address the problems of large language model hallucination and continuous learning, and its applications include, but are not limited to, scenarios such as question-answering systems, content creation, and dialogue generation. To simplify the explanation, the embodiments of this application take the application of the method in a question-answering system as an example to describe its specific process.

[0049] Example 1:

[0050] Firstly, this invention proposes an adaptive retrieval enhancement generation method based on model knowledge boundaries; see Figure 1. The method includes:

[0051] S1. Organize prompt words based on the improved prompt template format, and adaptively rearrange recall information related to user prompts using a pre-built large language model reorderer.

[0052] The large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability;

[0053] S2. The user prompts and the results of adaptive rearrangement are concatenated and input into the target large language model to obtain the generated results.

[0054] This embodiment proposes an adaptive retrieval enhancement generation method based on model knowledge boundaries. It organizes prompt words using an improved prompt template format and adaptively rearranges recall information related to user prompts using a pre-built large language model reorderer. The user prompts and the adaptively rearranged results are then concatenated and input into the target large language model to obtain the generated results. Compared to existing large language model retrieval enhancement generation techniques, this embodiment offers higher result quality, faster inference speed, lower cost, and wider applicability across various application scenarios.
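The two steps S1 and S2 can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the `Reranker` and `Generator` interfaces, the prompt layout, and the toy stand-in functions are all assumptions introduced here so that the example runs without any model.

```python
from typing import Callable, List

# Hypothetical interfaces (not from the patent): a reranker maps
# (query, passages) to the indices of the passages it keeps, in ranked
# order, and may return an empty list; a generator maps a prompt to text.
Reranker = Callable[[str, List[str]], List[int]]
Generator = Callable[[str], str]


def adaptive_rag(query: str, passages: List[str],
                 rerank: Reranker, generate: Generator) -> str:
    """S1: adaptively rerank recalled passages. S2: concatenate and generate."""
    kept = rerank(query, passages)  # may be [] when no passage is helpful
    context = "\n".join(
        f"[{rank + 1}] {passages[i]}" for rank, i in enumerate(kept)
    )
    # Assumed prompt layout: kept passages first, then the user prompt.
    prompt = f"{context}\n\nQuestion: {query}" if kept else f"Question: {query}"
    return generate(prompt)


# Toy stand-ins so the sketch is runnable end to end.
def toy_rerank(query: str, passages: List[str]) -> List[int]:
    words = set(query.lower().split())
    scored = [(sum(w in p.lower() for w in words), i)
              for i, p in enumerate(passages)]
    # Keep only passages with some word overlap, best overlap first.
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))
            if score > 0]


def toy_generate(prompt: str) -> str:
    return prompt  # echo the prompt so the assembled input is visible
```

With the toy functions, a passage that shares no words with the query is dropped before generation, and a query that matches nothing reaches the generator with no retrieved context at all, which mirrors the "output 0 articles" behavior described for the reorderer.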

[0055] With reference to Figures 1-3, the following section uses a question-answering system as an example and, together with an explanation of the specific steps S1-S2, details the implementation process of an embodiment of the present invention.

[0056] S1. Organize prompt words based on the improved prompt template format, and adaptively rearrange recall information related to user prompts using a pre-built large language model reorderer.

[0057] The large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability.

[0058] S11. Obtain recall information based on user prompts.

[0059] In the process of generating results using the large language model retrieval enhancement, the system first retrieves relevant recall information from the internet based on user prompts. User prompts can be query questions, search keywords, etc., while recall information consists of articles or other content related to these query questions or keywords. Taking an answer system as an example, the user prompt is the query question provided by the user, and the recall information is the articles related to that query question found online. The specific process for obtaining recall information is as follows: after receiving the user's query question, an online search is performed to retrieve articles related to that question. Retrieving recall information based on user prompts can be achieved using existing conventional techniques. This application does not specifically limit this process, as long as the corresponding recall information can be obtained based on the user's prompt. In this embodiment, for ease of explanation, the MS MARCO (Microsoft Machine Reading Comprehension Dataset) dataset is directly used as the user prompt and recall information for subsequent steps. The MS MARCO dataset contains queries and corresponding recall articles.

[0060] S12. Construct a large language model reorderer and organize prompt words based on the format of the improved prompt template, so that the large language model reorderer can obtain the adaptive reordering result of the recall information; wherein, the large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability.

[0061] Existing reordering techniques either fail to consider the inherent connections between recalled articles or cannot perform adaptive retrieval based on the boundaries of a large language model. To address this deficiency, this embodiment proposes a prompt framework with adjusted keywords. By organizing content according to this framework and feeding it to a large language model, a large language model reorderer can be constructed. Using this large language model reorderer, the large language model can adaptively reorder recalled articles by combining the inherent connections between them and the model's knowledge boundaries, and obtain the reordered results.

[0062] Specifically, the large language model reorderer constructed in this embodiment includes: a closed-source model with adaptive reordering capabilities, or an open-source model trained on data labeled by such a closed-source model. When reordering recall information, the user prompts and recall information are organized into prompt words through the adjusted prompt template and provided to the reorderer. The closed-source model with adaptive reordering capabilities can be used directly as the large language model reorderer; alternatively, its adaptive reordering capabilities can be distilled into the open-source model, in which case the open-source model serves as the large language model reorderer. In either case, the reorderer adaptively decides how many pieces of recall information to output and ranks the recall information it outputs.

[0063] This embodiment makes an adaptive improvement to the existing prompt template. Where the existing prompt words require the large model to sort articles directly by keywords such as "relevance," the improved template requires the model to select articles by keywords such as "helpfulness," and allows the model to decline to select any article (i.e., output 0 articles). Specifically, as shown in Figure 3, the left side shows the template for models such as GPT, and the right side shows the template for models such as Mistral. The improved prompt template (which may be slightly adjusted for different language models because of model-specific tokens, etc.) allows articles that are not helpful in answering the question to be adaptively removed during the sorting process, taking into account the knowledge boundaries of the language model itself.
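As a rough illustration of such a template, the sketch below builds a prompt that asks for helpful passages only and parses the model's reply. The actual wording of the improved template is given in Figure 3 and is not reproduced in the text, so the prompt wording, the `[2] > [1]` answer format, and the `None` reply for zero articles are all assumptions made for this example.

```python
import re
from typing import List


def build_rerank_prompt(query: str, passages: List[str]) -> str:
    """Hypothetical wording; the patent's actual template is in Figure 3."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Question: {query}\n\nPassages:\n{numbered}\n\n"
        "Select only the passages that are helpful for answering the "
        "question, ordered from most to least helpful, e.g. [2] > [1]. "
        "If no passage is helpful, answer: None."
    )


def parse_ranking(output: str, num_passages: int) -> List[int]:
    """Return 0-based indices of kept passages; [] means no passage kept."""
    kept: List[int] = []
    for match in re.findall(r"\[(\d+)\]", output):
        idx = int(match) - 1
        # Ignore identifiers that are out of range or repeated.
        if 0 <= idx < num_passages and idx not in kept:
            kept.append(idx)
    return kept
```

Because the parser returns a list of any length, including an empty one, the number of passages passed on to the generator is decided by the reorderer rather than fixed in advance.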

[0064] After constructing the large language model reorderer, the reordering results of the recall information are obtained based on the large language model reorderer.

[0065] Furthermore, existing technologies face several challenges in obtaining the reordering results for recall information. Firstly, annotating data with strong closed-source models is costly, while lower-cost closed-source models (such as GPT3.5) have only weak adaptive reordering capabilities; some closed-source models, such as GPT4, do exhibit considerable adaptive reordering ability. Secondly, existing open-source large language models lack sufficient adaptive retrieval and reordering capabilities, which makes them difficult to use in practice. To address this issue, the adaptive reordering capabilities of closed-source models such as GPT4 are distilled into smaller open-source models. The resulting distilled open-source models require no additional training data, are low-cost, and have fast inference speeds.

[0066] For ease of explanation, in this embodiment, the closed-source model with adaptive rearrangement capability is GPT4, and the open-source model is Mistral-7B. This step uses a two-step progressive distillation method to distill the adaptive rearrangement capability of GPT4, that is, using ChatGPT-annotated data as training data, and fine-tuning the open-source model Mistral-7B in two stages. First, a preliminary distillation is performed using a large amount of data annotated with ChatGPT3.5, and then a second distillation is performed using a small amount of data annotated with GPT4. This distillation method can effectively reduce training costs and time overhead. The specific process of the two-step progressive distillation is as follows:

[0067] S121. Organize the user prompts and recall information according to the improved prompt template format and provide the prompt words to ChatGPT3.5, and output the first sorting result.

[0068] During the initial distillation, the user prompts and recall information obtained in step S1 (i.e., the queries and corresponding recall articles in the MS MARCO dataset) are organized into prompt words according to the improved prompt template and provided to ChatGPT3.5. ChatGPT3.5 then outputs a ranking result, which is denoted as the first ranking result.

[0069] S122. The user prompts, recall information and the first ranking results are used together as training data to perform preliminary training and fine-tuning on the Mistral-7B model.

[0070] Next, we will combine the query, the recalled articles, and the first ranking result obtained using ChatGPT3.5 (which is the correct answer that the open-source model Mistral-7B needs to learn) as training data to perform initial training and fine-tuning of the open-source model Mistral-7B.

[0071] When training and fine-tuning Mistral-7B using queries, recalled articles, and ranking results, a classic autoregressive loss function is employed as the optimization objective. This allows the open-source model Mistral-7B to better fit the distribution of more powerful closed-source models. The specific optimization objective function can be expressed by the following formula:

[0072] $$\mathcal{L}(\theta) = -\sum_{(x_i,\, t_i) \in T} \sum_{j} \log P_\theta\left(t_{i,j} \mid x_i,\, t_{<j}\right)$$

[0073] In the formula, $T$ is the training set of data generated by the teacher model (i.e., GPT3.5 in this embodiment); $x_i$ is a single input prompt consisting of a query, paragraphs, and instructions; $t_i$ is the output corresponding to a single input (in this embodiment, the output is the adaptive sorting result); $t_{<j}$ is the target token sequence preceding position $j$; $\theta$ denotes the parameters of the open-source model Mistral-7B to be optimized; $P_\theta(t_{i,j} \mid x_i, t_{<j})$ is the probability that, with the current parameters being $\theta$, given the input $x_i$ and the currently inferred sequence $t_{<j}$, the next token is deduced to be $t_{i,j}$; $i$ indexes the current data item; $j$ indexes the token currently being inferred.
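The autoregressive objective described above can be illustrated numerically. The sketch below assumes the loss is the summed negative log-likelihood of the teacher's target tokens (whether the original formula sums or averages is not recoverable from the text), and it takes the per-token probabilities as given rather than computing them with a model; the function name and input layout are illustrative.

```python
import math
from typing import List


def autoregressive_loss(token_probs: List[List[float]]) -> float:
    """Summed negative log-likelihood of the target tokens.

    token_probs[i][j] is the probability the student model assigns to the
    j-th target token of training example i, given the input prompt and
    the preceding target tokens (assumed to be computed elsewhere).
    """
    return -sum(math.log(p) for example in token_probs for p in example)


# Two examples: one with two target tokens, one with a single token.
loss = autoregressive_loss([[0.5, 0.5], [1.0]])
print(round(loss, 4))
```

A student that assigns probability 1 to every teacher token has zero loss, so minimizing this quantity pushes the student's output distribution toward the teacher's rankings.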

[0074] The secondary distillation process is similar to the primary distillation process described above, except that it uses less data (for example, if 100,000 data points are used in the primary distillation, then 5,000 data points are sampled in the secondary distillation), and this data is then provided to GPT4. This process mainly includes:

[0075] S123. Organize the data from some user prompts and recall information into prompt words according to the improved prompt template format and provide them to GPT4, and output the second sorting result;

[0076] S124. The user prompts, recall information, and second ranking results are used together as training data to perform secondary training and fine-tuning on the Mistral-7B model.
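Steps S121-S124 can be summarized as two rounds of "label with a teacher, then fine-tune the student". The sketch below only shows how the data flows; the `Teacher` interface, the `fine_tune` callback, and the dictionary layout of an example are hypothetical placeholders, and no real model or API is called.

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, object]
# Hypothetical labeling interface: (query, passages) -> ranked kept indices.
Teacher = Callable[[str, List[str]], List[int]]


def label_with_teacher(data: Sequence[Example], teacher: Teacher) -> List[Example]:
    """Attach the teacher's ranking as the training target of each example."""
    return [{**ex, "target": teacher(ex["query"], ex["passages"])} for ex in data]


def two_step_distillation(data: Sequence[Example], subset_ids: Sequence[int],
                          weak_teacher: Teacher, strong_teacher: Teacher,
                          fine_tune: Callable, student):
    # Step 1 (S121-S122): label all data with the cheaper teacher, then fine-tune.
    student = fine_tune(student, label_with_teacher(data, weak_teacher))
    # Step 2 (S123-S124): label a small sampled subset with the stronger
    # teacher, then fine-tune the same student again.
    subset = [data[i] for i in subset_ids]
    return fine_tune(student, label_with_teacher(subset, strong_teacher))
```

The design point is that the expensive teacher only ever sees the small subset, which is why the choice of `subset_ids` matters so much for the final quality.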

[0077] However, it should be noted that the quality of the sampled data during the secondary distillation process is crucial to the overall distillation effect. Therefore, to improve the quality of the sampled data during secondary distillation, a preferred approach is to use both the K-means algorithm and the Min-max algorithm to sample the data in the MS MARCO dataset before using GPT4 to label the data. The specific sampling process is as follows:

[0078] First, vector representations of the query statements in the MS MARCO dataset are obtained using OpenAI's text-embedding-ada-002 model. Then, a small amount of data is sampled using both the K-means algorithm and the Min-max algorithm. When using the K-means algorithm, all query statements are divided into different clusters, and uniform random sampling is performed within each cluster. When using the Min-max algorithm, a small number of query statements are first randomly sampled as seeds. Then, at each step, the normalized inner product between the vector representation of each remaining query statement and those of all already-sampled query statements is computed, and the remaining query statement whose maximum value with respect to the sampled set is smallest is selected. The sampling process using the Min-max algorithm can be expressed by the following formula:

[0079] $$l = \arg\min_{i \in R} \; \max_{j \in S} \; \frac{q_i \cdot q_j}{|q_i|\,|q_j|}$$

[0080] In the formula, $S$ represents the set of samples that have already been sampled; $R$ represents the remaining set of samples; $q_i$ and $q_j$ represent the vector representations of the i-th and j-th samples, respectively; $|q_i|$ and $|q_j|$ represent the corresponding moduli; $l$ indicates the index of the item to be sampled in this step.
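Both sampling strategies can be sketched in plain Python. This is a minimal sketch under stated assumptions: the query vectors are taken as given (the embedding call is omitted), all vectors are non-zero, the K-means cluster labels are assumed to have been computed beforehand, and the function names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Inner product divided by the product of the moduli (non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def min_max_sample(vectors: List[Vector], seeds: List[int], k: int) -> List[int]:
    """Grow the sampled set from `seeds` to k items. Each step adds the
    remaining item whose maximum similarity to the sampled set is smallest."""
    sampled = list(seeds)
    remaining = [i for i in range(len(vectors)) if i not in sampled]
    while len(sampled) < k and remaining:
        best = min(
            remaining,
            key=lambda i: max(cosine(vectors[i], vectors[j]) for j in sampled),
        )
        sampled.append(best)
        remaining.remove(best)
    return sampled


def sample_per_cluster(labels: List[int], per_cluster: int,
                       rng: random.Random) -> List[int]:
    """Uniform random sampling inside each cluster, given precomputed labels."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    picked: List[int] = []
    for members in clusters.values():
        picked.extend(rng.sample(members, min(per_cluster, len(members))))
    return sorted(picked)
```

Min-max sampling favors queries that are far from everything already chosen, so it spreads the small GPT4 labeling budget over dissimilar queries, while per-cluster sampling keeps every cluster represented.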

[0081] Furthermore, to optimize the performance of the target model, data augmentation is performed on the labeled data after GPT4 annotation. The specific process of data augmentation is as follows:

[0082] 1) Shuffling the order: The order of the dataset obtained after the GPT4 annotation (including the original order of recalled articles and their corresponding final ranking results) is shuffled. This step can enhance the robustness of the target model in subsequent training.

[0083] 2) Synthetic data: To simulate real-world scenarios, some data with high retrieval noise was added to the GPT4-labeled dataset. Specifically, some examples consisting entirely of negative articles and some containing only one positive article were manually added. This step can enhance the target model's ability to refuse to answer in subsequent training and reduce the impact of large model hallucination.
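The two augmentation steps can be sketched as follows. The data layout (a list of passages plus a ranking of 0-based passage indices, with an empty ranking meaning "no helpful article") is an assumption made for this example, as are the function names.

```python
import random
from typing import Dict, List, Tuple


def shuffle_example(passages: List[str], ranking: List[int],
                    rng: random.Random) -> Tuple[List[str], List[int]]:
    """Shuffle the passage order and remap the ranking to the new positions."""
    order = list(range(len(passages)))
    rng.shuffle(order)  # order[new_position] = old_position
    new_pos = {old: new for new, old in enumerate(order)}
    shuffled = [passages[old] for old in order]
    # The ranking still points at the same passages, under their new indices.
    return shuffled, [new_pos[old] for old in ranking]


def all_negative_example(query: str, negatives: List[str]) -> Dict[str, object]:
    """Synthetic high-noise example: no passage helps, so the target is empty."""
    return {"query": query, "passages": list(negatives), "target": []}
```

Shuffling changes only where each passage sits in the input, not which passages the target ranking selects, so the model cannot learn to rely on the original retrieval order.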

[0084] After the above data augmentation steps, further complete training data is obtained, including the query, the recalled articles, and the ranking results of the recalled articles. Then, referring to the training method in the preliminary distillation, the open-source model Mistral-7B is further trained.

[0085] Following step S12 above, the large model reorderer is used to adaptively reorder the recall information (i.e., the retrieved articles) to obtain the adaptive reordering result of the recall information.

[0086] S2. The user prompts and the results of adaptive rearrangement are concatenated and input into the target large language model to obtain the generated results.

[0087] Finally, after inputting the user prompts and the adaptive rearrangement results obtained in step S1 above into the target large language model (i.e. the large language model that the user initially accessed and input the user prompts into), the desired enhanced generation results can be obtained.

[0088] This completes the entire process of the adaptive retrieval enhancement generation method based on model knowledge boundaries in this embodiment.

[0089] After the above distillation, training, and optimization, the target model requires no further training or contextual knowledge and can be used in various task scenarios in a plug-and-play manner to achieve efficient zero-shot adaptive retrieval enhancement generation.

[0090] Furthermore, to verify the superiority of the adaptive retrieval enhancement generation method based on model knowledge boundaries proposed in this embodiment, the following implementation data will serve as proof.

[0091] 1) Verify the good performance of the adaptive retrieval enhancement generation method based on model knowledge boundary proposed in this embodiment.

[0092] Experiments were conducted on the ASQA, QAMPARI, and ELI5 datasets. Using Alpaca-7b and Mistral-Instruct 7b as the base models, the proposed prompt framework with GPT4 showed improvements of 67.53% and 45.24% respectively (relative values, the same below) compared to no retrieval, 11.01% and 29.12% compared to no rearrangement, and 9.1% and 26.14% compared to the existing rearrangement technique RankVicuna; compared to using GPT4 without the proposed adaptive rearrangement framework, the improvements were 1.42% and 12.30%. When the proposed prompt framework was applied to the distilled Mistral-Instruct model obtained in this application, it achieved performance comparable to GPT4 with Alpaca-7b, Mistral-7b, and ChatGPT3.5, respectively. This demonstrates that the model obtained using the distillation framework of this application can rival the capabilities of GPT4. The results are shown in Figures 4-5: Figure 4 shows the experiments conducted using the method of this embodiment on open-source models (alpaca-7b and mistral-instruct-7b), and Figure 5 shows the experiments conducted on ChatGPT. Each figure covers three datasets (asqa, qampari, eli5) and an overall performance (overall). Each dataset has its own evaluation metric, but all of them reflect the correctness of the answers. EM stands for exact match, i.e., the generated answer contains the correct answer; claim is the proportion of sub-answers covered by the answer; prec and rec represent precision and recall; and overall is the average over the three datasets.

[0093] 2) The efficiency and cost of the adaptive retrieval enhancement generation method based on model knowledge boundaries proposed in this embodiment are verified.

[0094] See Figure 6. In terms of efficiency, rearranging with the target model obtained by distillation in this application takes only 1/10 to 1/5 of the time required to call the API of a closed-source model such as GPT4, and the result is obtained within half a second, which is beneficial for real-world pipelines. In terms of cost, for 1,000 data points, calling the GPT4 interface costs about $20.3, whereas rearranging with the distilled target model requires only about $0.07 in graphics card rental fees, roughly 1/300 of the cost of calling the GPT4 interface, which is a significant advantage.
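The cost comparison can be checked with a few lines of arithmetic. The sketch below uses only the two dollar figures quoted above; the function name is hypothetical.

```python
def cost_ratio(closed_source_cost: float, distilled_cost: float) -> float:
    """How many times more expensive the closed-source call is for the same workload."""
    return closed_source_cost / distilled_cost


# Costs per 1,000 data points as quoted in the text (US dollars).
ratio = cost_ratio(20.3, 0.07)
```

The ratio comes out at about 290, which is consistent with the "roughly 1/300" stated in the text.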

[0095] Embodiment 2:

[0096] This invention also proposes an adaptive retrieval enhancement generation system based on model knowledge boundaries. The system includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the method described in Embodiment 1. The steps of this method mainly include:

[0097] The prompt words are organized in a format based on an improved prompt template, while the recall information related to the user prompt is adaptively rearranged using a pre-built large language model reorderer; the large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability.

[0098] The user prompts and the results of adaptive rearrangement are concatenated and input into the target large language model to obtain the generated results.
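The two steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: `reranker` and `generator` are hypothetical callables standing in for the large language model reorderer and the target large language model, and the way the rearranged passages are concatenated with the question is an assumed layout.

```python
from typing import Callable, Sequence

# Hypothetical interfaces: the reranker returns 0-based passage indices, best first,
# and may return fewer passages than it was given (or none at all).
Reranker = Callable[[str, Sequence[str]], list[int]]
Generator = Callable[[str], str]


def adaptive_rag(query: str, passages: Sequence[str],
                 reranker: Reranker, generator: Generator) -> str:
    kept = reranker(query, passages)  # adaptive rearrangement of the recalled passages
    context = "\n\n".join(
        f"[{rank}] {passages[i]}" for rank, i in enumerate(kept, 1)
    )
    # Concatenate the rearranged passages with the user prompt (assumed layout).
    prompt = f"{context}\n\nQuestion: {query}" if kept else f"Question: {query}"
    return generator(prompt)
```

Because the reranker may return an empty list, the sketch falls back to the bare question when the reorderer judges that no recalled passage is needed.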

[0099] Optionally, training the open-source model on data labeled by the closed-source model with adaptive reordering capability includes:

[0100] A two-step incremental distillation method is used to distill the closed-source model with adaptive rearrangement capability, and the open-source model is fine-tuned in two steps.

[0101] Among them, the closed-source model with adaptive rearrangement capability is GPT4; the open-source model is Mistral-7B.

[0102] Optionally, the two-step incremental distillation includes:

[0103] S121. Organize the user prompts and recall information into prompt words according to the improved prompt template format, provide them to ChatGPT3.5, and obtain the first ranking result;

[0104] S122. Use the user prompts, recall information, and first ranking results together as training data to perform preliminary training and fine-tuning on the Mistral-7B model;

[0105] S123. Organize a subset of the user prompts and recall information into prompt words according to the improved prompt template format, provide them to GPT4, and obtain the second ranking result;

[0106] S124. Use the user prompts, recall information, and second ranking results together as training data to perform secondary training and fine-tuning on the Mistral-7B model.
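The data flow of S121 to S124 can be sketched as follows. All names here (`Sample`, `LabeledSample`, `build_prompt`, `label`, `two_step_distill`, and the `fine_tune` callable) are hypothetical; the teachers are passed in as plain callables so that no particular API is assumed, and `build_prompt` is only a stand-in for the improved prompt template.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    query: str
    passages: tuple[str, ...]


@dataclass(frozen=True)
class LabeledSample:
    sample: Sample
    ranking: str  # raw ranking text returned by a teacher model


Teacher = Callable[[str], str]  # prompt -> ranking text
FineTune = Callable[[object, Sequence[LabeledSample]], object]


def build_prompt(sample: Sample) -> str:
    # Hypothetical stand-in for the improved prompt template.
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(sample.passages, 1))
    return f"Question: {sample.query}\n{numbered}"


def label(teacher: Teacher, samples: Sequence[Sample]) -> list[LabeledSample]:
    return [LabeledSample(s, teacher(build_prompt(s))) for s in samples]


def two_step_distill(student: object, pool: Sequence[Sample], subset: Sequence[Sample],
                     weak_teacher: Teacher, strong_teacher: Teacher,
                     fine_tune: FineTune) -> object:
    first = label(weak_teacher, pool)        # S121: label the full pool (ChatGPT3.5 in the text)
    student = fine_tune(student, first)      # S122: preliminary fine-tuning
    second = label(strong_teacher, subset)   # S123: label a subset (GPT4 in the text)
    return fine_tune(student, second)        # S124: secondary fine-tuning
```

The design point the sketch makes visible is that the more expensive teacher only ever sees `subset`, while the cheaper teacher labels the whole `pool`.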

[0107] Optionally, the improved prompt template includes: adjusting the keywords in the prompt template to allow the model to adaptively output the number of articles recalled.
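The exact wording of the improved prompt template is not reproduced in this section, so the following is only an illustrative sketch of the idea that the model decides how many articles to return. The instruction text, the `[2] > [1]` output format, the `NONE` keyword, and both function names are assumptions made for this example.

```python
import re


def build_rerank_prompt(query: str, passages: list[str]) -> str:
    """Hypothetical template: the model may list any number of passages, or none."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        f"Question: {query}\n\nPassages:\n{numbered}\n\n"
        "List only the passages needed to answer the question, most useful first, "
        "in the form [2] > [1]. If none are needed, answer NONE."
    )


def parse_ranking(output: str, num_passages: int) -> list[int]:
    """Extract a variable-length list of 1-based passage ids from the model output."""
    kept: list[int] = []
    for match in re.findall(r"\[(\d+)\]", output):
        idx = int(match)
        # Ignore ids that are out of range or repeated.
        if 1 <= idx <= num_passages and idx not in kept:
            kept.append(idx)
    return kept
```

A fixed-length reranker would always return every passage id; here the parsed list can be shorter than the input, or empty when the model answers `NONE`.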

[0108] Optionally, before executing S123, the K-means algorithm and the Min-max algorithm are used to sample the data in the user prompts and recall information, respectively.

[0109] Optionally, the K-means algorithm and the Min-max algorithm are used to sample the data in the user prompts and recall information, including:

[0110] First, obtain vector representations of all query statements using OpenAI's text-embedding-ada-002 model;

[0111] All query statements are divided into different clusters based on the K-means algorithm, and uniform random sampling is performed within each cluster;

[0112] Based on the Min-max algorithm, a small number of samples are randomly drawn from the vector representations of the query statements as seeds; then, at each step, the remaining sample whose maximum inner product with the already-selected samples is the smallest is selected.
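The two sampling strategies above can be sketched in plain Python. This is a minimal illustration that assumes the query embeddings are already available as lists of floats; the function names, the simple Lloyd-style K-means, and the default parameters are all choices made for this example, and the sketch leaves open how the two samplers are combined.

```python
import random

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[Vector], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Tiny Lloyd-style K-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_per_cluster(labels: list[int], per_cluster: int, seed: int = 0) -> list[int]:
    """Uniform random sampling of item indices within each cluster."""
    rng = random.Random(seed)
    chosen: list[int] = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen


def min_max_sample(vectors: list[Vector], budget: int,
                   num_seeds: int = 1, seed: int = 0) -> list[int]:
    """Start from random seeds, then repeatedly add the remaining item whose
    maximum inner product with the already-selected items is smallest."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(vectors)), num_seeds)
    remaining = set(range(len(vectors))) - set(chosen)
    while remaining and len(chosen) < budget:
        best = min(
            sorted(remaining),
            key=lambda i: max(dot(vectors[i], vectors[j]) for j in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Cluster-wise sampling keeps the subset representative of the query distribution, while the Min-max rule pushes each new pick away from what has already been selected, so the subset stays diverse.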

[0113] Optionally, the labeled data can be augmented after S123 is executed and before S124 is executed.

[0114] Optionally, the enhancement method includes: shuffling the order of the recalled articles and the corresponding final ranking results in the labeled data, and adding data with high retrieval noise to the labeled dataset.
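The shuffling half of this augmentation can be sketched as follows: the recalled articles are permuted and the teacher's ranking is rewritten so that it still points at the same articles. The representation of a ranking as a list of 1-based article ids, and the function name, are assumptions made for this example; the addition of high-noise data is not covered by the sketch.

```python
import random


def shuffle_example(passages: list[str], ranking: list[int],
                    rng: random.Random) -> tuple[list[str], list[int]]:
    """Permute the passages and remap the 1-based ranking ids accordingly."""
    order = list(range(len(passages)))
    rng.shuffle(order)  # order[new_pos] = old_pos
    new_passages = [passages[old] for old in order]
    old_to_new = {old: new for new, old in enumerate(order)}
    # Each ranked id keeps pointing at the same passage text after the shuffle.
    new_ranking = [old_to_new[r - 1] + 1 for r in ranking]
    return new_passages, new_ranking
```

Applying this several times to one labeled example yields multiple training samples in which the correct ranking no longer correlates with the original retrieval order.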

[0115] Optionally, training the target model in S122 and S124 includes:

[0116] The target model is trained using an autoregressive loss function as the optimization objective function, which is expressed by the following formula:

[0117] $$\mathcal{L}(\theta) = -\sum_{(x_i, t_i) \in T} \sum_{j=1}^{|t_i|} \log P_\theta\left(t_{i,j} \mid x_i, t_{<j}\right)$$

[0118] In the formula, $T$ is the training set of data generated by the teacher model; $x_i$ is the input prompt consisting of the user query, the retrieved article paragraphs, and the commands; $t_i$ is the output corresponding to a single input (in this embodiment, the output is an adaptive sorting result); $t_{<j}$ is the target token sequence preceding position $j$; $P_\theta(t_{i,j} \mid x_i, t_{<j})$ is the probability, with the current parameters being $\theta$, that the next token is $t_{i,j}$ given the input $x_i$ and the sequence $t_{<j}$ inferred so far; $i$ is the index of the current data item; and $j$ is the index of the token currently being inferred.
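A minimal numeric sketch of such an autoregressive objective is given below, assuming the usual negative log-likelihood form summed over all data items and all target tokens. It takes the per-token probabilities as given rather than computing them with a model; the function name and the input layout are hypothetical.

```python
import math


def autoregressive_loss(token_probs: list[list[float]]) -> float:
    """Negative log-likelihood over a training set.

    token_probs[i][j] is the probability the model assigns to the correct
    target token j of data item i, given the input and the preceding tokens.
    """
    return -sum(math.log(p) for sequence in token_probs for p in sequence)
```

The loss is zero only when every target token receives probability 1, and it grows as the model assigns lower probability to the teacher's ranking tokens.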

[0119] It is understood that the adaptive retrieval enhancement generation system based on model knowledge boundaries provided in this embodiment of the invention corresponds to the adaptive retrieval enhancement generation method based on model knowledge boundaries described above. The explanations, examples, and beneficial effects of the relevant content can be referred to the corresponding content in the adaptive retrieval enhancement generation method based on model knowledge boundaries, and will not be repeated here.

[0120] In summary, compared with existing technologies, the present invention has the following beneficial effects:

[0121] 1. This invention proposes an adaptive retrieval enhancement generation method and system based on model knowledge boundaries. It organizes prompt words using an improved prompt template format and adaptively rearranges recall information related to user prompts using a pre-built large language model reorderer. The user prompts and the adaptively rearranged results are then concatenated and input into the target large language model to obtain the generated results. Compared to existing large language model retrieval enhancement generation technologies, the proposed adaptive retrieval enhancement generation method and system offer higher result quality, faster inference speed, lower cost, and can be applied to various scenarios with better versatility.

[0122] 2. The novel prompt framework proposed in this invention enables large models to reorder based on their own knowledge boundaries and the inherent connections between recalled articles. It can optimize existing retrieval-augmented generation (RAG) architectures in a plug-and-play manner, while also reducing the impact of retrieval noise.

[0123] 3. The two-step progressive distillation method proposed in this invention optimizes and improves the instruction distillation process by employing two-step distillation and two data augmentation methods, resulting in better model performance and stronger robustness. At the same time, compared with the traditional instruction distillation process, this process has shorter training time, lower cost, and better training effect.

[0124] 4. The adaptive retrieval enhancement generation technology for large language models based on model knowledge boundaries proposed in this invention requires no additional training or contextual knowledge, and can be used in various task scenarios in a plug-and-play manner to achieve efficient zero-shot adaptive retrieval enhancement generation. Compared with existing technologies, it requires no additional knowledge, offers fast inference and low cost, and can be applied to real-world scenarios.

[0125] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0126] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An adaptive retrieval enhancement generation method based on model knowledge boundaries, characterized in that the method includes:
The prompt words are organized in a format based on an improved prompt template, while the recall information related to the user prompt is adaptively rearranged using a pre-built large language model reorderer; the large language model reorderer includes: a closed-source model with adaptive reordering capability or an open-source model trained on data labeled by a closed-source model with adaptive reordering capability.
The user prompts and the results of adaptive rearrangement are concatenated and input into the target large language model to obtain the generated results.
Training open-source models using data labeled by closed-source models that have adaptive reordering capabilities includes: a two-step incremental distillation method is used to distill the closed-source model with adaptive rearrangement capability, and the open-source model is fine-tuned in two steps.
Among them, the closed-source model with adaptive rearrangement capability is GPT4; the open-source model is Mistral-7B.
The two-step incremental distillation includes:
S121. Organize the user prompts and recall information according to the improved prompt template format and provide the prompt words to ChatGPT3.5, and output the first sorting result;
S122. Use the user prompts, recall information and the first ranking results together as training data to perform preliminary training and fine-tuning on the Mistral-7B model;
S123. Organize the data from some user prompts and recall information into prompt words according to the improved prompt template format and provide them to GPT4, and output the second sorting result;
S124. Use the user prompts, recall information and second ranking results together as training data to perform secondary training and fine-tuning on the Mistral-7B model.
The improved prompt template includes adjusting the keywords in the prompt template to allow the model to adaptively output the number of articles recalled.

2. The method of claim 1, wherein, before executing S123, the K-means algorithm and the Min-max algorithm are used to sample the data in the user prompts and recall information, respectively.

3. The method as described in claim 2, characterized in that using the K-means algorithm and the Min-max algorithm to sample the data in the user prompts and recall information includes: first, obtaining vector representations of all query statements using OpenAI's text-embedding-ada-002 model; dividing all query statements into different clusters based on the K-means algorithm, and performing uniform random sampling within each cluster; and, based on the Min-max algorithm, randomly drawing a small number of samples from the vector representations of the query statements as seeds, and then, at each step, selecting from the remaining samples the sample whose maximum inner product with the already-selected samples is the smallest.

4. The method as described in claim 1, characterized in that, after executing S123 and before executing S124, the labeled data is augmented.

5. The method as described in claim 4, characterized in that the enhancement method includes: shuffling the order of the recalled articles and the corresponding final ranking results in the labeled data, and adding data with high retrieval noise to the labeled dataset.

6. The method of claim 1, wherein training the target model in S122 and S124 includes: the target model is trained using an autoregressive loss function as the optimization objective function, which is expressed by the following formula:
$$\mathcal{L}(\theta) = -\sum_{(x_i, t_i) \in T} \sum_{j=1}^{|t_i|} \log P_\theta\left(t_{i,j} \mid x_i, t_{<j}\right)$$
In the formula, $T$ is the training set of data generated by the teacher model; $x_i$ is the input prompt consisting of the user query, the retrieved articles, and the commands; $t_{<j}$ is the target token sequence preceding position $j$; $t_i$ is the output corresponding to a single input; $P_\theta(t_{i,j} \mid x_i, t_{<j})$ is the probability, with the current parameters being $\theta$, that the next token is $t_{i,j}$ given the input $x_i$ and the sequence $t_{<j}$ inferred so far; $i$ is the index of the current data item; and $j$ is the index of the token currently being inferred.

7. An adaptive retrieval enhancement generation system based on model knowledge boundaries, the system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.