Retrieval-augmented generation method and computing device

A classification tool determines the category of the user's question, and a generative model produces reference information for that category. This addresses the hallucination problem of large models on questions outside their training data and improves the accuracy of generated results and the amount of information available to the model.

WO2026129656A1 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-07-29
Publication Date: 2026-06-25

IPC: G06F16/3329

AI Tagging

Application Domain

Digital data information retrieval; Special data processing applications

Technology Topics

User input; Engineering

AI Technical Summary

Technical Problem

Large models are prone to hallucinations when the user's question is not covered by their training data, which leads to inaccurate generated results.

Method used

The user's question is input into a classification tool to obtain its category. The question and the knowledge-base data belonging to that category are input into a generative model to generate reference information, which is then input together with the question into a large artificial intelligence model to enhance generation.

Benefits of technology

It improves the accuracy of the results generated by large artificial intelligence models, reduces the probability of hallucinations, and increases the amount of information available to the large model.

✦ Generated by Eureka AI based on patent content.

Abstract

The present disclosure relates to the technical field of computers, and provides a retrieval-augmented generation method and a computing device. The method comprises: first acquiring a question inputted by a user; then inputting the question into a classification tool to obtain the category of the question; then inputting both the question and data, in a knowledge base, belonging to the same category as the question into a generation model to generate corresponding reference information; and finally inputting the reference information and the question into a large artificial intelligence model to generate an answer to the question. The technical solution provided by the present disclosure can improve the accuracy of generated results of a large artificial intelligence model, and reduce the probability of a hallucination in the large model.

Description

Retrieval-augmented generation method and computing device

[0001] This application claims priority to Chinese Patent Application No. 202411900805.1, filed with the State Intellectual Property Office of China on December 19, 2024, entitled "Retrieval-Augmented Generation Method and Computing Device", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of computer technology, and in particular to a retrieval-augmented generation method and a computing device.

Background Technology

[0003] Large AI models are powerful AI tools that, when trained using massive amounts of data in a knowledge base, can perform a variety of complex tasks such as natural language processing, computer vision, and speech recognition.

[0004] Because different knowledge bases in different fields involve issues such as data security and privacy protection, large models can usually only learn from the knowledge in the training data. When the user inputs a question that is not in its training data, large models often exhibit hallucinations (i.e., the results generated by the large model are inaccurate).

[0005] Therefore, how to reduce the probability of hallucinations in large models is a problem that needs to be solved by those skilled in the art.

Summary of the Invention

[0006] This disclosure provides a retrieval-augmented generation method and a computing device to improve the accuracy of the results generated by large artificial intelligence models and to reduce the probability that such models hallucinate.

[0007] In a first aspect, a retrieval-augmented generation method is provided, including:

[0008] Obtain a question input by a user;

[0009] Input the question into the classification tool to obtain the question's category;

[0010] Input the question and data from the knowledge base that belong to the same category as the question into the generative model to generate corresponding reference information;

[0011] Input the reference information and questions into the large-scale artificial intelligence model to generate answers to the questions.

[0012] In the retrieval-augmented generation method of the first aspect, the user's question is first input into a classification tool to determine the question's category. Next, the question and the data in the knowledge base belonging to the same category are input into the generation model to generate corresponding reference information. This reduces the probability of introducing noisy data from the knowledge base (such as data unrelated to the category of the user's question) into the generation model, and so improves the accuracy of the reference information that the model generates for the question. Finally, the reference information and the user's question are input into a large artificial intelligence model to generate the answer to the question.

[0013] Compared with using only the user's question as input to the large model, this implementation also supplies reference information related to the question, increasing the amount of information available to the large model. Thus, even if the user's question is not in the large model's training data, the large model receives not only the question but also reference information related to it, and can combine the two to generate a more accurate answer. Therefore, this implementation improves the accuracy of the results generated by the large artificial intelligence model and reduces the probability that it hallucinates.
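The four steps of the first aspect can be sketched as a small pipeline. This is a minimal illustration only: the function names, the dictionary-based knowledge base, and the toy stand-ins for the classification tool, the generation model, and the large model are all assumptions made for the example and are not components named by the disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical representation: category name -> data items in that category.
KnowledgeBase = Dict[str, List[str]]


def answer_with_category_rag(
    question: str,
    knowledge_base: KnowledgeBase,
    classify: Callable[[str], str],
    generate_reference: Callable[[str, List[str]], str],
    large_model: Callable[[str, str], str],
) -> str:
    # Step 1: the user's question has been obtained and is passed in.
    # Step 2: the classification tool returns the question's category.
    category = classify(question)
    # Step 3: only data of the same category reaches the generation model.
    same_category_data = knowledge_base.get(category, [])
    reference = generate_reference(question, same_category_data)
    # Step 4: the large model receives both the reference information and the question.
    return large_model(reference, question)


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_classify(question: str) -> str:
    return "health" if "sleep" in question.lower() else "history"


def toy_generate_reference(question: str, data: List[str]) -> str:
    return " ".join(data)


def toy_large_model(reference: str, question: str) -> str:
    return f"Q: {question} | Reference: {reference}"


kb: KnowledgeBase = {
    "health": ["health doc A", "health doc B"],
    "history": ["history doc A"],
}
answer = answer_with_category_rag(
    "How much sleep is enough?", kb, toy_classify, toy_generate_reference, toy_large_model
)
print(answer)
```

In this toy run the history item never reaches the generation model, which is the noise-reduction effect the paragraph above describes.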

[0014] In one possible implementation of the first aspect, the generative model may include two distinct generative models, such as a first generative model and a second generative model. Inputting the question and data from the knowledge base belonging to the same category as the question into the generative model to generate corresponding reference information may include:

[0015] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0016] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0017] If the similarity between the first reference information and the second reference information is greater than or equal to the target value, the first reference information is determined to be the reference information generated by the generative model.

[0018] In this way, the second generative model can be used as a validation model for the first generative model to verify the accuracy of the first reference information generated by the first generative model, thereby improving the accuracy of the reference information generated by the generative model.
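A sketch of this acceptance check follows. The disclosure does not say how similarity is computed, so the token-set Jaccard measure, the function names, and the default target of 0.8 are assumptions chosen only to make the example concrete.

```python
from typing import Optional


def jaccard_similarity(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; an assumed stand-in for the unspecified similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def accept_first_reference(first: str, second: str, target: float = 0.8) -> Optional[str]:
    """Return the first reference information if the second one corroborates it, else None."""
    if jaccard_similarity(first, second) >= target:
        return first
    return None


# Hypothetical outputs of the first and second generative models.
first_ref = "category one covers storage limits and retention rules"
second_ref = "category one covers storage limits and retention policy"

print(jaccard_similarity(first_ref, second_ref))       # 7 shared tokens out of 9 distinct
print(accept_first_reference(first_ref, second_ref))   # below 0.8, so rejected
print(accept_first_reference(first_ref, second_ref, target=0.7))
```

A production system would more plausibly compare embeddings than raw tokens; the point here is only the thresholded comparison.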

[0019] In one possible implementation of the first aspect, the generative model may include two distinct generative models, such as a first generative model and a second generative model. Inputting the question and data from the knowledge base belonging to the same category as the question into the generative model to generate corresponding reference information may include:

[0020] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0021] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0022] If the similarity between the first reference information and the second reference information is less than the target value, repeat the process of generating the first reference information and generating the second reference information until the similarity between the newly generated first reference information and the newly generated second reference information is greater than or equal to the target value.

[0023] The newly generated first reference information is determined as the reference information generated by the generative model.

[0024] By means of the above method, when the accuracy of the first reference information generated by the first generation model does not meet the preset conditions (i.e., the similarity between the first reference information and the second reference information is less than the target value), the first reference information and the second reference information can be regenerated by the first generation model and the second generation model until the newly generated first reference information meets the preset conditions, thereby improving the accuracy of the reference information generated by the generation model.
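The "regenerate both" variant can be sketched as a loop. The similarity measure (`difflib.SequenceMatcher.ratio`), the `max_rounds` cap, and the canned generator outputs are assumptions for illustration; the disclosure describes repeating until the target is met and does not mention a cap.

```python
from difflib import SequenceMatcher
from typing import Callable


def text_similarity(a: str, b: str) -> float:
    # Assumed measure: difflib's ratio in [0, 1]; the source leaves the measure open.
    return SequenceMatcher(None, a, b).ratio()


def regenerate_both_until_agree(
    generate_first: Callable[[], str],
    generate_second: Callable[[], str],
    target: float = 0.8,
    max_rounds: int = 5,  # safety cap added for the sketch, not part of the source
) -> str:
    for _ in range(max_rounds):
        # Both models produce fresh reference information in every round.
        first = generate_first()
        second = generate_second()
        if text_similarity(first, second) >= target:
            return first  # the newly generated first reference information is used
    raise RuntimeError("reference information did not converge within max_rounds")


# Canned outputs standing in for two sampling generative models.
first_outputs = iter(["draft one", "stable summary"])
second_outputs = iter(["zzz", "stable summary"])

result = regenerate_both_until_agree(
    lambda: next(first_outputs), lambda: next(second_outputs)
)
print(result)
```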

[0025] In one possible implementation of the first aspect, the generative model may include two distinct generative models, such as a first generative model and a second generative model. Inputting the question and data from the knowledge base belonging to the same category as the question into the generative model to generate corresponding reference information may include:

[0026] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0027] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0028] If the similarity between the first reference information and the second reference information is less than the target value, repeat the process of generating the first reference information until the similarity between the newly generated first reference information and the second reference information is greater than or equal to the target value.

[0029] The newly generated first reference information is determined as the reference information generated by the generative model.

[0030] By means of the above method, if the accuracy of the first reference information generated by the first generation model does not meet the preset conditions, the first reference information can be regenerated by the first generation model until the newly generated first reference information meets the preset conditions, thereby improving the accuracy of the reference information generated by the generation model.
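In this variant the second reference information is generated once and held fixed, and only the first model is called again. The sketch below makes that difference visible by returning the number of first-model calls; the similarity measure, the `max_rounds` cap, and the canned outputs are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def regenerate_first_until_agree(
    generate_first: Callable[[], str],
    second_reference: str,  # produced once by the second model and then held fixed
    target: float = 0.8,
    max_rounds: int = 5,  # safety cap added for the sketch, not part of the source
) -> Tuple[str, int]:
    for round_index in range(1, max_rounds + 1):
        first = generate_first()
        # Assumed similarity measure: difflib's ratio in [0, 1].
        if SequenceMatcher(None, first, second_reference).ratio() >= target:
            return first, round_index
    raise RuntimeError("first reference information did not meet the target")


# Canned outputs standing in for repeated sampling from the first model.
attempts = iter(["qqq", "xxx", "category summary"])
reference, rounds = regenerate_first_until_agree(
    lambda: next(attempts), second_reference="category summary"
)
print(reference, rounds)
```

Compared with regenerating both outputs, this saves one second-model call per retry, at the cost of anchoring every retry to a single second-model output.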

[0031] In one possible implementation of the first aspect, the training set for the classification tool is generated based on all data in the knowledge base. That is, any data in the knowledge base belongs to the training set of the classification tool, thereby improving the accuracy of the classification tool in classifying user questions or newly added data in the knowledge base.

[0032] In one possible implementation of the first aspect, the training set of the generative model is generated based on all data in the knowledge base. That is, any data in the knowledge base belongs to the training set of the generative model, and data in the knowledge base belonging to the same category as the question naturally also belongs to the training set. In this way, the accuracy of the reference information generated by the generative model based on the user-input question and the data in the knowledge base belonging to the same category as the question (i.e., data in the training set) will be very high, and consequently, the accuracy of the answer generated by the large model based on the question and the reference information will also be very high.

[0033] In one possible implementation of the first aspect, if the training set of the generative model is generated based on all the data in the knowledge base, and the accuracy of the generated reference information is already very high, then the generative model can use a simpler sparse model, for example, the generative model can be obtained by distillation or pruning of a large artificial intelligence model, so as to reduce the complexity of the generative model.

[0034] In one possible implementation of the first aspect, the method further includes:

[0035] The newly acquired data is categorized using a classification tool;

[0036] Add new data after categorization to the knowledge base;

[0037] Adjust the generation model based on the updated knowledge base.

[0038] By using the above method, the generation model can be updated based on the newly added data after classification, thereby further improving the accuracy of the generation model.
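The update flow above (classify new data, add it to the knowledge base, adjust the generation model) might look like the following sketch. The dictionary layout of the knowledge base and the `adjust_generation_model` callback are hypothetical; how the model is actually adjusted (for example by fine-tuning) is not specified in the source.

```python
from typing import Callable, Dict, Iterable, List

KnowledgeBase = Dict[str, List[str]]  # hypothetical layout: category -> data items


def ingest_new_data(
    new_items: Iterable[str],
    classify: Callable[[str], str],
    knowledge_base: KnowledgeBase,
    adjust_generation_model: Callable[[KnowledgeBase], None],
) -> None:
    for item in new_items:
        # Categorize each new item with the classification tool.
        category = classify(item)
        # Store the item under its category in the knowledge base.
        knowledge_base.setdefault(category, []).append(item)
    # Adjust the generation model once, against the updated knowledge base.
    adjust_generation_model(knowledge_base)


# Toy stand-ins (assumptions) to exercise the flow.
kb: KnowledgeBase = {"health": ["existing health note"]}
adjust_calls: List[int] = []

ingest_new_data(
    ["new health note", "new history note"],
    classify=lambda text: "health" if "health" in text else "history",
    knowledge_base=kb,
    adjust_generation_model=lambda k: adjust_calls.append(sum(len(v) for v in k.values())),
)
print(kb, adjust_calls)
```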

[0039] In a second aspect, a retrieval-augmented generation apparatus is provided, the apparatus comprising:

[0040] The acquisition module is used to acquire a question input by a user;

[0041] The classification module is used to input questions into the classification tool and obtain the question's category;

[0042] The generation module is used to input the question and data from the knowledge base that belong to the same category as the question into the generation model to generate corresponding reference information;

[0043] Input the reference information and questions into the large model to generate the answers to the questions.

[0044] In one possible implementation of the second aspect, the generative model includes a first generative model and a second generative model, which are different from each other; the generative module is specifically used for:

[0045] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0046] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0047] If the similarity between the first reference information and the second reference information is greater than or equal to the target value, the first reference information is determined as the reference information.

[0048] In one possible implementation of the second aspect, the generative model includes a first generative model and a second generative model, which are different from each other; the generative module is specifically used for:

[0049] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0050] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0051] If the similarity between the first reference information and the second reference information is less than the target value, repeat the process of generating the first reference information and generating the second reference information until the similarity between the new first reference information and the new second reference information is greater than or equal to the target value.

[0052] The new first reference information is designated as the reference information.

[0053] In one possible implementation of the second aspect, the generative model includes a first generative model and a second generative model, which are different from each other; the generative module is specifically used for:

[0054] Input the question and the data in the knowledge base that belong to the same category as the question into the first generation model to generate the first reference information;

[0055] Input the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0056] If the similarity between the first reference information and the second reference information is less than the target value, repeat the process of generating the first reference information until the similarity between the new first reference information and the second reference information is greater than or equal to the target value.

[0057] The new first reference information is designated as the reference information.

[0058] In one possible implementation of the second aspect, the training set for the classification tool is generated based on all the data in the knowledge base.

[0059] In one possible implementation of the second aspect, the training set for the generative model is generated based on all the data in the knowledge base.

[0060] In one possible implementation of the second aspect, the generative model is obtained by distillation or pruning of a large artificial intelligence model.

[0061] In one possible implementation of the second aspect, the classification module is further configured to:

[0062] The newly acquired data is categorized using a classification tool;

[0063] Add new data after categorization to the knowledge base;

[0064] The generation module is also used to adjust the generation model based on the updated knowledge base.

[0065] In a third aspect, a computing device is provided, comprising a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to execute the method described in the first aspect or any embodiment thereof when the computer program is invoked.

[0066] In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method described in the first aspect or any embodiment of the first aspect.

[0067] In a fifth aspect, a computer program product is provided that, when run on a computing device, causes the computing device to perform the method described in the first aspect or any embodiment of the first aspect.

[0068] In a sixth aspect, a chip system is provided, including a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the method described in the first aspect or any embodiment thereof. The chip system may be a single chip or a chip module composed of multiple chips.

[0069] It is understood that, for the beneficial effects of the second to sixth aspects, reference may be made to the relevant descriptions of the first aspect, which are not repeated here.

Attached Figure Description

[0070] Figure 1 is a schematic diagram of a process for generating answers using a large artificial intelligence model according to an embodiment of this disclosure;

[0071] Figure 2 is a schematic diagram of another process for generating answers using a large artificial intelligence model provided in an embodiment of this disclosure;

[0072] Figure 3 is a flowchart illustrating a retrieval-augmented generation method provided in an embodiment of this disclosure;

[0073] Figure 4 is a schematic diagram of the classification labels provided in an embodiment of this disclosure;

[0074] Figure 5 is a schematic diagram of the classification granularity provided in the embodiments of this disclosure;

[0075] Figure 6 is a schematic diagram of another method for generating answers using a large artificial intelligence model according to an embodiment of this disclosure;

[0076] Figure 7 is a schematic diagram of the retrieval-augmented generation apparatus provided in an embodiment of this disclosure;

[0077] Figure 8 is a schematic diagram of the structure of the computing device provided in an embodiment of this disclosure.

Detailed Implementation

[0078] The embodiments of this disclosure are described below with reference to the accompanying drawings. The terminology used in the implementation section of this disclosure is for illustrative purposes only and is not intended to limit the scope of the disclosure. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0079] Large-scale artificial intelligence models can be categorized into unimodal models and multimodal models. Unimodal models are specifically designed for processing a particular type of data (such as text, images, video, or audio), while multimodal models are trained by combining text, images, video, and audio data. Both unimodal and multimodal models require massive amounts of data for training to perform complex tasks such as natural language processing, computer vision, and speech recognition.

[0080] Because knowledge bases in different fields involve issues such as data security and privacy protection, large models can usually learn only the knowledge contained in their training data. If the user's question is not covered by that training data, the large model may generate inaccurate results, that is, it may hallucinate.

[0081] To reduce the probability of hallucinations in large models, one approach is retrieval-augmented generation (RAG). This technique uses vector similarity retrieval to quickly find content in the knowledge base that matches the user's question, thereby enhancing the input to the large model and helping it generate more accurate results. For example, as shown in Figure 1, after receiving a user's question, the large AI model first vectorizes the question, then uses a retrieval tool to perform a similarity search in the knowledge base to obtain relevant documents or paragraphs (i.e., context). Next, the augmenter inputs the user's question and the retrieved context into the generator, which generates the answer to the question and returns it to the user.
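The conventional flow in Figure 1 (vectorize, retrieve by similarity, augment the input) can be illustrated with a toy example. The bag-of-words "embedding", cosine scoring, prompt layout, and sample documents are assumptions chosen so the sketch runs with the standard library only; real systems use learned embeddings and a vector index.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy vectorization: word counts stand in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    # Rank every document by similarity to the question and keep the best top_k.
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:top_k]


def augment(question: str, context: List[str]) -> str:
    # The augmenter combines the retrieved context with the question for the generator.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


docs = [
    "solar panels convert light into electricity",
    "wind turbines convert wind into electricity",
]
context = retrieve("how do solar panels work", docs)
print(augment("how do solar panels work", context))
```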

[0082] However, knowledge bases typically contain massive amounts of data, and exact matching is extremely inefficient for similarity searches at that scale. Therefore, RAG usually employs fuzzy matching for similarity searches within the knowledge base. As a result, search results may score as highly similar to the user's question without actually matching it. The large model then generates inaccurate answers based on those results, which is again a hallucination.

[0083] To address this, this disclosure provides a retrieval-augmented generation method, as shown in Figure 2. In this method, a learner first inputs a user's question into a classification tool. The classification tool classifies the question to determine its category (e.g., category 1), and the question and the data belonging to category 1 in the knowledge base are input into a generation model. The generation model then generates corresponding reference information based on these inputs (i.e., the user's question and the category 1 data from the knowledge base) and returns it to the learner. Next, the learner inputs the question and the reference information into the generator of a large AI model through an augmenter, thereby increasing the amount of information available to the large model and enhancing its input. In this way, even if the user's question is not in the large model's training data, the large model receives not only the question but also reference information related to it, and can combine the two to produce a more accurate answer. Therefore, the solution of this disclosure can improve the accuracy of the results generated by the large AI model and reduce the probability that it hallucinates.

[0084] The retrieval-augmented generation method provided in this disclosure can realize semantic-level search, such as cross-modal search, object detection and recognition, map query, etc.

[0085] The retrieval-augmented generation method provided in this disclosure can be applied to computing devices, including but not limited to personal computers (PCs), smartphones, netbooks, tablets, handheld computers, and high-computing-power and high-capacity products (such as training and inference all-in-one machines). High-computing-power and high-capacity products have both large storage space and sufficient CPU computing power to support the training and inference of classification tools and generative models.

[0086] Large artificial intelligence models can include large language models (LLM), computer vision models, speech recognition and generation models, etc.

[0087] Figure 3 is a flowchart illustrating a retrieval-augmented generation method provided in an embodiment of this disclosure. As shown in Figure 3, the method may include the following steps:

[0088] S110. Obtain a question input by a user.

[0089] User input can be in text form, or it can be in the form of images, audio, or video. When the user input is in text form, it can be directly vectorized (embedding) using a vectorization model. When the user input is in the form of images, audio, or video, it can be first converted into text form and then vectorized using a vectorization model.

[0090] S120. Input the question into the classification tool to obtain the question category.

[0091] Classification tools can be used to categorize user-submitted questions or newly added data in a knowledge base. Classification tools can include classification models and non-model classification tools. For ease of understanding, the embodiments of this disclosure below take the case where the classification tool is a classification model as an example.

[0092] The classification model can be an open-source model, such as FastText. In some embodiments, the classification model can also be obtained by fine-tuning an open-source model.

[0093] In some embodiments, the categories of existing data in the knowledge base (i.e., data already stored in the knowledge base) can be labeled first, and then a training set for the classification model can be generated based on the labeled existing data. The classification model can then be trained using the training set. As shown in Figure 4, the categories of existing data in the knowledge base may include health, cognition, history, classics, growth, etc.

[0094] In some embodiments, the training set of the classification model may include a predetermined proportion (e.g., 80%, 90%) of the existing data in the knowledge base. That is, the predetermined proportion of existing data in the knowledge base is first labeled, and then the labeled existing data is used as the training set for the classification model. In this case, the trained classification model can also be used to classify and label the unlabeled existing data in the knowledge base.
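As a rough illustration of this labeling scheme, the sketch below labels a predetermined proportion of the data by hand, trains a classifier on it, and lets the classifier label the rest. The token-vote classifier is a deliberately tiny stand-in, not the classification model of the disclosure, and the documents and labels are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def split_by_proportion(items: List[str], proportion: float) -> Tuple[List[str], List[str]]:
    # The first part is labeled manually; the remainder is left for the trained model.
    cutoff = round(len(items) * proportion)
    return items[:cutoff], items[cutoff:]


def train_token_vote_classifier(labeled: List[Tuple[str, str]]) -> Callable[[str], str]:
    # Toy training: remember which labels each token appeared with.
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text, label in labeled:
        for token in text.lower().split():
            votes[token][label] += 1

    def classify(text: str) -> str:
        tally: Counter = Counter()
        for token in text.lower().split():
            tally.update(votes.get(token, {}))
        return tally.most_common(1)[0][0] if tally else "unknown"

    return classify


documents = [
    "sleep and diet tips",
    "ancient empire timeline",
    "diet and exercise plan",
    "medieval empire trade",
    "exercise and sleep routine",
]
to_label, unlabeled = split_by_proportion(documents, 0.8)
manual_labels = ["health", "history", "health", "history"]  # hypothetical human labels

classify = train_token_vote_classifier(list(zip(to_label, manual_labels)))
auto_labels = {text: classify(text) for text in unlabeled}
print(auto_labels)
```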

[0095] In some embodiments, the training set of the classification model may also include all existing data in the knowledge base to form closed-loop data for training and inference of the classification model. That is, all existing data in the knowledge base is first labeled, and then the labeled existing data is used as the training set of the classification model. In this way, any data in the knowledge base belongs to the training set of the classification model, thereby improving the accuracy of the classification model in classifying user-submitted questions or newly added data in the knowledge base (in some examples, the accuracy can reach over 99%).

[0096] When the training set of a classification model consists of all existing data in the knowledge base, the model already achieves high accuracy in classifying user-submitted questions or newly added data in the knowledge base. In this case, a classification model with fewer parameters and layers can be used to reduce its complexity.

[0097] When classifying newly added data or unlabeled existing data in a knowledge base, the classification granularity can be fixed or adjusted according to the actual situation. For example, as shown in Figure 5, for any data document 1 in the knowledge base, the classification model can directly classify document 1, or it can classify each chapter in document 1, or each paragraph in document 1. In some embodiments, the classification model can also classify each sentence in document 1.
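Adjustable classification granularity can be sketched by splitting a document into units before classifying each one. The splitting rules (blank lines for paragraphs, end punctuation for sentences), the sample text, and the keyword classifier are assumptions; chapter-level splitting is omitted because chapter markers depend on the document format.

```python
import re
from typing import Callable, List, Tuple


def split_units(document: str, granularity: str) -> List[str]:
    if granularity == "document":
        return [document.strip()]
    if granularity == "paragraph":
        # Assumed convention: paragraphs are separated by blank lines.
        return [p.strip() for p in document.split("\n\n") if p.strip()]
    if granularity == "sentence":
        # Assumed convention: a sentence ends at ., ! or ? followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    raise ValueError(f"unsupported granularity: {granularity}")


def classify_units(
    document: str, granularity: str, classify: Callable[[str], str]
) -> List[Tuple[str, str]]:
    # Classify each unit independently at the chosen granularity.
    return [(unit, classify(unit)) for unit in split_units(document, granularity)]


def toy_classify(text: str) -> str:
    return "history" if "empires" in text.lower() else "health"


document_1 = (
    "Sleep supports recovery. Diet affects energy.\n\n"
    "Empires expanded trade. Empires kept records."
)

for level in ("document", "paragraph", "sentence"):
    print(level, [category for _, category in classify_units(document_1, level, toy_classify)])
```

In this toy run the whole document receives a single label, while paragraph- and sentence-level classification recover both categories, which shows why finer granularity can matter for mixed-topic documents.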

[0098] S130. Input the question and the data in the knowledge base that belong to the same category as the question into the generative model to generate corresponding reference information.

[0099] In some embodiments, the training set for the generative model may include a predetermined proportion of existing data in the knowledge base.

[0100] In some embodiments, the training set of the generative model may also include all existing data in the knowledge base to form closed-loop data for training and inference of the generative model. In this way, any data in the knowledge base belongs to the training set of the generative model, and data in the knowledge base that belongs to the same category as the question naturally also belongs to the training set of the generative model. This can improve the accuracy of the reference information generated by the generative model based on the user-input question and the data in the knowledge base that belongs to the same category as the question (i.e., data in the training set), thereby improving the accuracy of the answer generated by the large model based on the question and the reference information.

[0101] When the training set of the generative model consists of all existing data in the knowledge base, the accuracy of the reference information generated by the generative model based on the user-input question and data in the knowledge base belonging to the same category as the question is already very high. In this case, a simpler sparse model can be used to reduce the complexity of the generative model. For example, the generative model can be obtained by distillation or pruning of a large artificial intelligence model.

[0102] The computing device can directly input the question and data from the knowledge base belonging to the same category as the question into the generative model to generate corresponding reference information. This reduces the probability of introducing data from the knowledge base that is irrelevant to the user's question category (i.e., noisy data) into the generative model, thereby improving the accuracy of the reference information generated by the generative model that is relevant to the user's question.

[0103] The reference information generated by the generative model depends on the type of generative model. For example, if the generative model is a summary extraction model, the reference information generated by the generative model can include summary information of all data in the knowledge base that belong to the same category as the question and are related to the question raised by the user.

[0104] In some embodiments, the generative model may further include a first generative model and a second generative model that is different from the first generative model. For example, the second generative model may contain more parameters, more layers, etc., than the first generative model. The second generative model can be used to verify the accuracy of the generation results of the first generative model.

[0105] For example, as shown in Figure 6, after the classifier determines the category of the user-input question (e.g., category 1), it can input the question and data belonging to category 1 in the knowledge base into the first and second generative models, respectively. The first generative model can generate first reference information based on the input parameters, and the second generative model can generate second reference information based on the input parameters. The accuracy of the first reference information can then be verified using the second reference information. If the verification passes, the first reference information can be returned to the learner. The learner then uses an augmenter to input the question and the first reference information into the generator of the large-scale artificial intelligence model, thereby enhancing the input to the large-scale model.

[0106] Specifically, the computing device can verify the accuracy of the first reference information by comparing the similarity between the first reference information and the second reference information. If the similarity between the first reference information and the second reference information is greater than or equal to a target value (such as 80%, 90%, etc.), the accuracy of the first reference information is verified, and the computing device can return the first reference information to the learner.
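
The threshold check can be sketched as below. The disclosure does not specify how similarity is computed, so the character-level ratio from Python's standard `difflib` module is used purely as a stand-in; the function names and the default target of 0.8 (the 80% example above) are assumptions.

```python
from difflib import SequenceMatcher


def similarity(first_ref: str, second_ref: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the unspecified metric."""
    return SequenceMatcher(None, first_ref, second_ref).ratio()


def verify(first_ref: str, second_ref: str, target: float = 0.8) -> bool:
    """Verification passes when the similarity is greater than or equal to the target value."""
    return similarity(first_ref, second_ref) >= target
```

In practice an embedding-based or task-specific metric might be a better fit than a character-level ratio; the comparison against the target value would stay the same.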

[0107] If the similarity between the first reference information and the second reference information is less than the target value, the accuracy of the first reference information is considered low and its accuracy verification fails. In this case, the first generative model can regenerate the first reference information, and the newly generated first reference information is compared with the second reference information. This process is repeated until the similarity is greater than or equal to the target value, that is, until the newly generated first reference information passes the accuracy verification. The verified first reference information can then be returned to the learner, which improves the accuracy of the first reference information returned to the learner.

[0108] In some embodiments, if the similarity between the first reference information and the second reference information is less than the target value, both models may regenerate their outputs: the first generative model regenerates the first reference information and the second generative model regenerates the second reference information. The newly generated first reference information and the newly generated second reference information are then compared, and the process is repeated until their similarity is greater than or equal to the target value, that is, until the newly generated first reference information passes the accuracy verification. The verified first reference information can then be returned to the learner.
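
The two regeneration strategies can be sketched in one loop. This is a hedged illustration: the models are passed in as plain callables, the `difflib` ratio again stands in for the unspecified similarity measure, and `max_attempts` is an assumption added here because the text does not state a bound on the number of repetitions.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# A generative model is modeled as a callable: (question, category_data) -> reference text.
Generator = Callable[[str, list[str]], str]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity metric in [0, 1] (assumption; not specified by the source)."""
    return SequenceMatcher(None, a, b).ratio()


def verified_reference(
    question: str,
    category_data: list[str],
    first_model: Generator,
    second_model: Generator,
    target: float = 0.8,
    regenerate_second: bool = False,
    max_attempts: int = 5,  # assumed safety cap; the source describes no limit
) -> Optional[str]:
    """Return first reference information that passes verification, or None if the cap is hit."""
    second_ref = second_model(question, category_data)
    for attempt in range(max_attempts):
        # Variant where both models regenerate on every retry.
        if attempt > 0 and regenerate_second:
            second_ref = second_model(question, category_data)
        first_ref = first_model(question, category_data)
        if similarity(first_ref, second_ref) >= target:
            return first_ref
    return None
```

With `regenerate_second=False` only the first generative model retries; with `regenerate_second=True` both models regenerate before each comparison. Regeneration only helps if the models are sampled non-deterministically, which is why a cap on attempts is a reasonable safeguard.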

[0109] S140. Input the reference information and the question into the large artificial intelligence model to generate the answer to the question.

[0110] In some embodiments, as shown in Figure 2, the generative model can directly return the generated reference information to the learner. The learner inputs the question and the reference information into the generator of the large AI model through an enhancer to enhance the large model's input. The generator of the large AI model can then, based on the reference information related to the user's question and in conjunction with the question itself, derive a relatively accurate answer and return it to the user. In this way, even if the user's question is not covered by the large model's training data, the large model receives not only the question but also reference information related to it. By combining these two types of information, the large model can return a more accurate answer to the user, which improves the accuracy of the large AI model's generated results and reduces the probability of the large AI model producing hallucinations.

[0111] In some embodiments, as shown in Figure 6, the first generative model can also return validated first reference information to the learner. The learner inputs the question and the first reference information into the generator of the large AI model through an enhancer to enhance the input of the large model. Then, the generator of the large model, based on the first reference information related to the user's question and in conjunction with the user's input question, derives a more accurate answer and returns it to the user.

[0112] In some embodiments, for new data, the new data can first be classified using a classification model, then the classified new data can be added to the knowledge base, and the categories of the new data can be labeled.
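
A minimal sketch of this ingestion step is shown below. The dictionary-based knowledge base, the `ingest` function, and the keyword classifier are hypothetical stand-ins; a real system would call the trained classification model instead.

```python
from typing import Callable


def ingest(
    new_items: list[str],
    classify: Callable[[str], str],
    knowledge_base: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Classify each new item, store it under its category, and return the labeled increment."""
    added = []
    for item in new_items:
        category = classify(item)
        knowledge_base.setdefault(category, []).append(item)
        added.append((item, category))
    return added


def keyword_classifier(text: str) -> str:
    """Toy stand-in for the classification model (assumption for illustration)."""
    return "network" if "router" in text.lower() else "other"


kb = {"network": ["existing entry"]}
increment = ingest(["Router firmware 2.1 released.", "Office closed Friday."], keyword_classifier, kb)
```

The returned list of labeled items is one possible way to keep track of incremental data for later fine-tuning.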

[0113] In some embodiments, after adding new data to the knowledge base, the generative model can be adjusted based on the updated knowledge base. Specifically, the generative model can be fine-tuned using either incremental data from the knowledge base or the full dataset from the knowledge base. For example, incremental data can be used to fine-tune the generative model over short periods (such as 3 days or 1 week), while the full dataset can be used to adjust the generative model over longer periods (such as 1 month or 6 months).
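
One way to express the two update cadences is a small scheduling helper. This is only a sketch: the function name, the return labels, and the default intervals (1 week and roughly 1 month, taken from the examples above) are assumptions.

```python
from datetime import datetime, timedelta


def choose_update(
    now: datetime,
    last_incremental: datetime,
    last_full: datetime,
    incremental_every: timedelta = timedelta(days=7),
    full_every: timedelta = timedelta(days=30),
) -> str:
    """Decide which adjustment of the generative model is due, preferring the full dataset."""
    if now - last_full >= full_every:
        return "full"          # adjust on the full knowledge base
    if now - last_incremental >= incremental_every:
        return "incremental"   # fine-tune on newly added data only
    return "none"
```

A full-dataset run is checked first because it also covers the data an incremental run would have used.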

[0114] In some embodiments, for newly added data, the categories of the newly added data can be labeled first and added to the knowledge base. Then, the classification model can be fine-tuned based on the newly added data with labeled categories in the knowledge base, or the classification model can be adjusted based on the full dataset in the knowledge base.

[0115] The retrieval-augmented generation method provided in this embodiment first inputs the user's question into a classification tool to obtain the question's category. It then inputs both the user's question and the data in the knowledge base belonging to the same category as the question into a generative model to generate corresponding reference information. This reduces the probability of introducing noisy data from the knowledge base into the generative model and improves the accuracy of the reference information that the generative model produces for the user's question. Finally, the reference information and the user's question are input into a large artificial intelligence model to generate the answer to the question.

[0116] Compared with using only the user's question as input to the large model, this implementation also supplies reference information related to the user's question, increasing the amount of information available to the large model. Thus, even if the user's question is not covered by the large model's training data, the large model receives not only the question but also reference information related to it. Combining these two types of information allows the large model to generate more accurate answers. Therefore, this implementation improves the accuracy of the large AI model's generated results and reduces the probability of the large AI model producing hallucinations.
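
The three steps summarized above can be sketched end to end as follows. All components are passed in as callables, and the stub implementations, the prompt layout, and every name in the block are assumptions made for illustration rather than the disclosed implementation.

```python
from typing import Callable


def rag_answer(
    question: str,
    classify: Callable[[str], str],
    knowledge_base: dict[str, list[str]],
    generate_reference: Callable[[str, list[str]], str],
    large_model: Callable[[str], str],
) -> str:
    """Classify the question, generate reference information, then query the large model."""
    category = classify(question)                           # step 1: question category
    category_data = knowledge_base.get(category, [])        # same-category data only
    reference = generate_reference(question, category_data) # step 2: reference information
    augmented = f"Reference information:\n{reference}\n\nQuestion: {question}"
    return large_model(augmented)                           # step 3: answer generation


# Stub components standing in for the real classifier and models.
def stub_classify(question: str) -> str:
    return "network" if "router" in question.lower() else "billing"


def stub_generate_reference(question: str, data: list[str]) -> str:
    return " ".join(data)


def stub_large_model(prompt: str) -> str:
    return f"ANSWER<{prompt}>"


kb = {
    "network": ["Hold the reset button for 10 seconds."],
    "billing": ["Invoices are issued monthly."],
}
answer = rag_answer(
    "How do I reset my router?", stub_classify, kb, stub_generate_reference, stub_large_model
)
```

Because only the `network` entries are passed to the reference generator, the billing entry never appears in the large model's input.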

[0117] Those skilled in the art will understand that the above embodiments are exemplary and not intended to limit this disclosure. Where possible, the execution order of one or more of the above steps can be adjusted to obtain one or more other embodiments. Those skilled in the art can arbitrarily select and combine the above steps as needed, and all combinations that do not depart from the essence of this disclosure fall within the protection scope of this disclosure.

[0118] Based on the same concept, as an implementation of the above method, this disclosure provides a retrieval-augmented generation device. This device embodiment corresponds to the aforementioned method embodiment. For ease of reading, this device embodiment does not repeat the details of the method embodiment one by one, but it should be clear that the device in this embodiment can implement all the contents of the method embodiment.

[0119] Figure 7 is a schematic diagram of the structure of the retrieval-augmented generation device provided in the embodiment of this disclosure. As shown in Figure 7, the retrieval-augmented generation device provided in this embodiment may include an acquisition module 210, a classification module 220, and a generation module 230.

[0120] The acquisition module 210 is used to obtain the question input by the user;

[0121] The classification module 220 is used to input the question into the classification tool to obtain the category of the question;

[0122] The generation module 230 is used to input the question and the data in the knowledge base that belong to the same category as the question into the generative model to generate corresponding reference information;

[0123] The generation module 230 is also used to input the reference information and the question into the large artificial intelligence model to generate the answer to the question.

[0124] As an optional implementation, the generative model includes a first generative model and a second generative model, which are different from each other; the generation module 230 is specifically used for:

[0125] inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information;

[0126] inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0127] if the similarity between the first reference information and the second reference information is greater than or equal to the target value, determining the first reference information as the reference information.

[0128] As an optional implementation, the generative model includes a first generative model and a second generative model, which are different from each other; the generation module 230 is specifically used for:

[0129] inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information;

[0130] inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0131] if the similarity between the first reference information and the second reference information is less than the target value, repeating the process of generating the first reference information and generating the second reference information until the similarity between the new first reference information and the new second reference information is greater than or equal to the target value;

[0132] determining the new first reference information as the reference information.

[0133] As an optional implementation, the generative model includes a first generative model and a second generative model, which are different from each other; the generation module 230 is specifically used for:

[0134] inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information;

[0135] inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information;

[0136] if the similarity between the first reference information and the second reference information is less than the target value, repeating the process of generating the first reference information until the similarity between the new first reference information and the second reference information is greater than or equal to the target value;

[0137] determining the new first reference information as the reference information.

[0138] As an optional implementation, the training set for the classification tool is generated based on all the data in the knowledge base.

[0139] As an optional implementation, the training set for the generative model is generated based on all the data in the knowledge base.

[0140] As an optional implementation, the generative model is obtained by distillation or pruning of a large artificial intelligence model.

[0141] As an optional implementation, the classification module 220 is also used for:

[0142] classifying newly acquired data using the classification tool;

[0143] adding the classified new data to the knowledge base;

[0144] The generation module 230 is also used to adjust the generative model based on the updated knowledge base.

[0145] The device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0146] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this disclosure. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0147] Based on the same concept, this disclosure also provides a computing device. Figure 8 is a schematic diagram of the structure of the computing device provided in this disclosure. As shown in Figure 8, the computing device provided in this disclosure may include a memory 310 and a processor 320. The memory 310 is used to store computer programs; the processor 320 is used to implement the method described in the above method embodiments when the computer programs are called.

[0148] The computing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so they will not be described again here.

[0149] This disclosure also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the methods described in the above-described method embodiments.

[0150] This disclosure also provides a computer program product that, when run on a computing device, causes the computing device to implement the methods described in the above-described method embodiments.

[0151] This disclosure also provides a chip system including a processor coupled to a memory. The processor executes a computer program stored in the memory to implement the method described in the above embodiments. The chip system may be a single chip or a chip module composed of multiple chips.

[0152] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this disclosure are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)).

[0153] Those skilled in the art will understand that implementing all or part of the processes in the above embodiments can be accomplished by a computer program instructing related hardware. This program can be stored in a computer-readable storage medium, and when executed, it can include the processes described in the above method embodiments. The aforementioned storage medium can include various media capable of storing program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0154] The naming or numbering of steps in this disclosure does not imply that the steps in the method flow must be executed in the time / logical order indicated by the naming or numbering. The execution order of the named or numbered process steps can be changed according to the technical purpose to be achieved, as long as the same or similar technical effect can be achieved.

[0155] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0156] In the embodiments provided in this disclosure, it should be understood that the disclosed apparatus / devices and methods can be implemented in other ways. For example, the apparatus / device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings or direct couplings or communication connections may be through some interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.

[0157] It should be understood that in the description of this disclosure and the appended claims, the terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion and mean "including but not limited to," unless otherwise specifically emphasized. For example, a process, method, system, product, or apparatus that includes a series of steps or modules is not necessarily limited to those steps or modules that are explicitly listed, but may include other steps or modules that are not explicitly listed or that are inherent to such process, method, product, or apparatus.

[0158] Furthermore, in the description of this disclosure, unless otherwise stated, "multiple" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.

[0159] Furthermore, in the description of this disclosure and the appended claims, the terms "first," "second," etc., are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence, nor should they be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in a sequence other than that illustrated or described herein; features defined as "first" or "second" may explicitly or implicitly include at least one of those features.

[0160] In the embodiments of this disclosure, the words "exemplarily" or "for example" are used to indicate that they are examples, illustrations, or descriptions. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of this disclosure should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, the use of the words "exemplarily" or "for example" is intended to present the relevant concepts in a specific manner.

[0161] References to "one embodiment" or "some embodiments" in this disclosure mean that one or more embodiments of this disclosure include a particular feature, structure, or characteristic described in connection with that embodiment. Therefore, phrases such as "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this disclosure do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized.

[0162] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this disclosure.

Claims

1. A retrieval-augmented generation method, comprising: obtaining a question input by a user; inputting the question into a classification tool to obtain a category of the question; inputting the question and data in a knowledge base that belong to the same category as the question into a generative model to generate corresponding reference information; and inputting the reference information and the question into a large artificial intelligence model to generate an answer to the question.

2. The method of claim 1, wherein the generative model includes a first generative model and a second generative model, and the first generative model and the second generative model are different; and inputting the question and the data in the knowledge base that belong to the same category as the question into the generative model to generate the corresponding reference information includes: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; and if the similarity between the first reference information and the second reference information is greater than or equal to a target value, determining the first reference information as the reference information.

3. The method of claim 1, wherein the generative model includes a first generative model and a second generative model, and the first generative model and the second generative model are different; and inputting the question and the data in the knowledge base that belong to the same category as the question into the generative model to generate the corresponding reference information includes: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; if the similarity between the first reference information and the second reference information is less than a target value, repeating the process of generating the first reference information and generating the second reference information until the similarity between the new first reference information and the new second reference information is greater than or equal to the target value; and determining the new first reference information as the reference information.

4. The method of claim 1, wherein the generative model includes a first generative model and a second generative model, and the first generative model and the second generative model are different; and inputting the question and the data in the knowledge base that belong to the same category as the question into the generative model to generate the corresponding reference information includes: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; if the similarity between the first reference information and the second reference information is less than a target value, repeating the process of generating the first reference information until the similarity between the second reference information and the new first reference information is greater than or equal to the target value; and determining the new first reference information as the reference information.

5. The method according to any one of claims 1 to 4, characterized in that the training set for the classification tool is generated based on all the data in the knowledge base.

6. The method according to any one of claims 1-5, characterized in that the training set for the generative model is generated based on all the data in the knowledge base.

7. The method according to claim 6, characterized in that the generative model is obtained by distillation or pruning of a large artificial intelligence model.

8. The method according to any one of claims 1-7, characterized in that the method further includes: classifying newly acquired data using the classification tool; adding the classified new data to the knowledge base; and adjusting the generative model based on the updated knowledge base.

9. A retrieval-augmented generation device, characterized in that it includes: an acquisition module, used to obtain a question input by a user; a classification module, used to input the question into a classification tool to obtain a category of the question; and a generation module, used to input the question and data in a knowledge base that belong to the same category as the question into a generative model to generate corresponding reference information, and to input the reference information and the question into a large artificial intelligence model to generate an answer to the question.

10. The device according to claim 9, characterized in that the generative model includes a first generative model and a second generative model, wherein the first generative model and the second generative model are different; and the generation module is specifically used for: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; and if the similarity between the first reference information and the second reference information is greater than or equal to a target value, determining the first reference information as the reference information.

11. The device according to claim 9, characterized in that the generative model includes a first generative model and a second generative model, wherein the first generative model and the second generative model are different; and the generation module is specifically used for: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; if the similarity between the first reference information and the second reference information is less than a target value, repeating the process of generating the first reference information and generating the second reference information until the similarity between the new first reference information and the new second reference information is greater than or equal to the target value; and determining the new first reference information as the reference information.

12. The device according to claim 9, characterized in that the generative model includes a first generative model and a second generative model, wherein the first generative model and the second generative model are different; and the generation module is specifically used for: inputting the question and the data in the knowledge base that belong to the same category as the question into the first generative model to generate first reference information; inputting the question and the data in the knowledge base that belong to the same category as the question into the second generative model to generate second reference information; if the similarity between the first reference information and the second reference information is less than a target value, repeating the process of generating the first reference information until the similarity between the second reference information and the new first reference information is greater than or equal to the target value; and determining the new first reference information as the reference information.

13. The device according to any one of claims 9-12, characterized in that the training set for the classification tool is generated based on all the data in the knowledge base.

14. The device according to any one of claims 9-13, characterized in that the training set for the generative model is generated based on all the data in the knowledge base.

15. The device according to claim 14, characterized in that the generative model is obtained by distillation or pruning of a large artificial intelligence model.

16. The device according to any one of claims 9-15, characterized in that the classification module is further configured to: classify newly acquired data using the classification tool; and add the classified new data to the knowledge base; and the generation module is further configured to: adjust the generative model according to the updated knowledge base.

17. A computing device, characterized in that it includes: a memory and a processor, the memory being used to store a computer program, and the processor being used to execute the method as described in any one of claims 1-8 when the computer program is invoked.

18. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1-8.

19. A computer program product, characterized in that, when the computer program product is run on a computing device, it causes the computing device to perform the method as described in any one of claims 1-8.