Retrieval-augmented generation method and computing device
By evaluating question complexity, the question's dependence on the language model's own knowledge, and model confidence, and by dynamically adjusting the fusion weight between the knowledge base and the model, the method addresses the hallucination problem in answers generated by large language models, improving answer accuracy and system adaptability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HENAN QINWEI DIGITAL TECHNOLOGY CO LTD
- Filing Date
- 2026-02-09
- Publication Date
- 2026-06-19
AI Technical Summary
Large language models may produce outputs that do not match the retrieved facts or contain fabricated content due to defects in training data and randomness in decoding strategies when generating answers, which limits their application in scenarios with high reliability requirements.
The method receives a question input by the user, obtains a complexity score for the question and a judgment of how much the question depends on the language model's own knowledge, calculates the model's confidence and knowledge coverage, dynamically adjusts the fusion weight between the knowledge base and the language model, determines the knowledge fusion strategy from the fusion weight, and generates the final answer.
It reduces hallucination when the language model generates answers, improves the accuracy and reliability of the answers, enables intelligent perception and real-time evaluation of questions, and enhances the system's adaptability and resource utilization.

Figure CN122240667A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of retrieval-augmented generation (RAG) technology, and more particularly to retrieval-augmented generation methods and computing devices.
Background Technology
[0002] Retrieval-augmented generation techniques, which retrieve relevant information from external knowledge bases to assist large language models in generating answers, have become an important means of improving the factual accuracy of models.
[0003] However, even if the system successfully retrieves the correct reference information, the large language model itself may still produce "hallucinations" when generating the final answer due to defects in the training data, randomness in the decoding strategy, etc., resulting in output that is inconsistent with the retrieved facts or contains fabricated content. This limits the application of related technologies in scenarios with high reliability requirements.
Summary of the Invention
[0004] This application provides a retrieval-augmented generation method and computing device that can reduce the hallucinations of language models when answering user-input questions and improve the accuracy of the answers.
[0005] According to a first aspect of the embodiments of this application, a retrieval-augmented generation method is provided, the method comprising:
[0006] Receiving a question input by a user; obtaining a complexity score for the question and a judgment result on its dependence on the language model's own knowledge, where the complexity score quantifies the difficulty of the question and the dependency judgment result indicates the degree to which the question depends on the language model's own knowledge; inputting the question into the language model and obtaining the confidence of the language model in answering the question and the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question; determining, based on the complexity score, the dependency judgment result, the confidence, and the knowledge coverage, the fusion weight of the knowledge base when it is fused with the language model's own knowledge; determining the corresponding knowledge fusion strategy based on the fusion weight; adjusting, based on the fusion weight, the strategy for obtaining retrieval information from the knowledge base, so as to obtain target retrieval information; and fusing, based on the knowledge fusion strategy, the target retrieval information with the language model's own knowledge to generate the final answer to the question.
[0007] This embodiment can achieve intelligent perception of user questions, real-time evaluation of model status, dynamic decision-making on fusion strategies, and accurate utilization of external knowledge, thereby improving the accuracy and reliability of answers in the retrieval-augmented generation process and effectively alleviating the hallucinations produced by the language model when generating answers.
[0008] In one possible implementation, obtaining the complexity score of the question includes: extracting the number of entities, the semantic dependency depth, and the domain overlap from the question; and computing a weighted sum of the number of entities, the semantic dependency depth, and the domain overlap, then normalizing the result to obtain the complexity score.
[0009] This embodiment transforms the subjective judgment of problem complexity into an objective and automated calculation based on multi-dimensional semantic features. This quantification method not only achieves a consistent assessment of problem difficulty but also provides accurate input variables for subsequent adaptive fusion decisions. By fusing three key indicators that characterize problems from the perspectives of information density, structural complexity, and knowledge breadth—"number of entities," "semantic dependency depth," and "domain overlap"—it can accurately distinguish whether the user's input problem is a simple query or a complex one, thereby triggering the corresponding knowledge retrieval and fusion strategies more precisely.
[0010] In one possible implementation, obtaining the dependency judgment result on the language model's own knowledge includes: calculating the relevance of the question to the language model's pre-trained knowledge graph, pre-trained corpus, and domain fine-tuning parameters; weighting the relevance values to obtain a dependency score; and comparing the dependency score with a preset threshold and outputting the dependency judgment result based on the comparison.
[0011] This implementation transforms the assessment of problem knowledge dependence from fuzzy estimation to precise calculation and classification. The example constructs a comprehensive and robust evaluation system by comprehensively assessing the relevance of the problem to the model's structured knowledge (knowledge graph), unstructured knowledge (corpus), and domain-adaptive knowledge (fine-tuning parameters). This system can more precisely identify problems that appear simple but actually require external verification, as well as problems that seem unfamiliar but have sufficient implicit knowledge within the model. This judgment can directly influence the calculation of subsequent fusion weights (e.g., proactively reducing the weight of external knowledge when there is "high dependence"), effectively preventing excessive redundant retrieval of knowledge already possessed by the model, while also avoiding blind confidence in the model's knowledge blind spots, thus accurately balancing the utilization of internal and external knowledge at the system level.
[0012] In one possible implementation, obtaining the confidence of the language model in answering the question includes: while the language model generates the answer token by token, calculating the entropy of the probability distribution with which the language model predicts the next token; and determining the confidence of the language model in answering the question from the entropy values, where entropy is inversely correlated with confidence.
[0013] This embodiment does not rely on external feedback, but directly captures the model's actual uncertainty through the language model's generation mechanism, namely the probability distribution of each prediction step. This approach can distinguish whether the language model's output is certain or uncertain, providing a signal for subsequent adaptive fusion decisions. This allows the language model to draw on an external knowledge base when its answer is uncertain, reducing the hallucinations the language model may produce when generating responses.
[0014] In one possible implementation, obtaining the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question includes: based on the internal knowledge index of the language model, counting how frequently the knowledge points involved in the question occur in the pre-training corpus of the language model; and generating the knowledge coverage from that frequency of occurrence.
[0015] This embodiment enables an objective and quantitative assessment of the knowledge support provided by the language model. By directly and efficiently correlating the knowledge points in the question with the model's pre-training history, it provides an effective means of measuring the model's own knowledge. This statistically based coverage assessment can effectively distinguish between common questions and questions in specialized, obscure, or emerging fields, allowing for the pre-judgment of whether the model has knowledge blind spots. This knowledge coverage, together with the model's confidence level, constitutes a dual guarantee for assessing the model's internal reliability. It provides a decision-making basis for dynamically increasing the weight of external knowledge and proactively introducing external evidence. Thus, when the model faces unfamiliar knowledge domains, it can carry out retrieval augmentation and fusion in advance, effectively preventing hallucinations caused by insufficient knowledge and improving the accuracy of answers to marginal or specialized questions.
[0016] In one possible implementation, determining the fusion weight of the knowledge base when it is fused with the language model's own knowledge includes: calculating a preliminary fusion weight from the complexity score, the confidence, and the knowledge coverage, together with the weight coefficient corresponding to each of them; and, when the dependency judgment result is high dependency, adjusting the fusion weight downward, or, when the dependency judgment result is low dependency, adjusting the fusion weight upward, to obtain the final fusion weight.
[0017] This embodiment first quantifies the objective difficulty of the question and the real-time state of the model through weighted synthesis, generating a preliminary fusion tendency. Then, it introduces a higher-order judgment (dependency) on the question's knowledge attributes for targeted calibration, ultimately outputting a more accurate and robust weight value. This design ensures that the weight calculation process fully responds to dynamic signals (confidence, coverage) during the generation process while closely integrating the static knowledge attribute analysis of the question (complexity, dependency), achieving a deep fusion of dynamic and static information. This improves the accuracy and contextual adaptability of the system's decisions, ensuring that unnecessary resource consumption and interference are reduced when the language model can provide an answer, and that external support is enhanced when the language model is insufficient to provide an accurate answer. This improves system response speed and resource utilization, reduces hallucinations, and increases the accuracy of responses.
[0018] In one possible implementation, determining the corresponding knowledge fusion strategy based on the fusion weight includes: when the fusion weight is not lower than a first threshold, adopting a knowledge-base-priority fusion strategy; or, when the fusion weight is below the first threshold, adopting a fusion strategy dominated by the language model's own knowledge.
[0019] In one possible implementation, adjusting the strategy for obtaining retrieval information from the knowledge base based on the fusion weight includes: when the fusion weight is not lower than the first threshold, expanding the search scope to multiple related domains; or, when the fusion weight is below the first threshold, narrowing the search scope to a single domain.
[0020] This embodiment achieves a clear and robust decision-making transition from a continuous quantity (the fusion weight) to discrete actions (fusion strategies). Setting a first threshold essentially defines an action boundary, enabling the system to decisively switch between two fundamentally different processing modes: one actively introducing and deeply fusing external evidence, and the other using the model's intrinsic knowledge as the core and external information as verification. This binary decision-making mechanism based on a clear threshold not only makes system behavior easier to understand, control, and optimize, but more importantly, it ensures that subsequent retrieval, fusion, and generation stages operate effectively under the guidance of this strategy. This allows complex questions to focus on external knowledge, while simple questions rely on the model's intrinsic knowledge, thereby improving the accuracy of the answers and reducing hallucinations.
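The search-scope rule in paragraph [0019] can be sketched as a small function. This is only an illustration under stated assumptions: the name `select_search_domains`, the `ranked_domains` argument (domains already ordered by relevance to the question), and the default threshold of 0.6 are hypothetical and are not specified by the application.

```python
from typing import List, Sequence


def select_search_domains(
    fusion_weight: float,
    ranked_domains: Sequence[str],
    first_threshold: float = 0.6,  # placeholder value, not fixed by the text
) -> List[str]:
    """Pick the knowledge-base domains to search for a given fusion weight.

    ranked_domains is assumed to be ordered from most to least relevant.
    """
    if not ranked_domains:
        return []
    if fusion_weight >= first_threshold:
        # Weight not lower than the threshold: search multiple related domains.
        return list(ranked_domains)
    # Weight below the threshold: narrow the search to a single domain.
    return [ranked_domains[0]]


print(select_search_domains(0.77, ["medicine", "engineering"]))
print(select_search_domains(0.30, ["medicine", "engineering"]))
```

How the related domains are identified and ranked is left open here; the application only states that the scope widens or narrows around the first threshold.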
[0021] In one possible implementation, the method further includes: after generating the final answer, recording the complexity score, the dependency judgment result, the confidence, the knowledge coverage, the fusion weight, and the answer accuracy; and using the recorded data to optimize the calculation parameters involved in determining the fusion weight.
[0022] This embodiment gives the retrieval-augmented generation system self-iterative and continuous learning capabilities. This mechanism transforms single, static question-and-answer interactions into data for long-term system performance optimization, achieving a complete closed loop from "execution" to "recording" to "optimization." It can automatically learn from historical successes and failures, continuously fine-tuning its fusion decision model to better adapt to the data distribution and user needs of specific application domains. This not only improves the robustness and adaptability of the system's long-term deployment but also effectively reduces the cost of manual parameter tuning during later maintenance. It ensures that system performance continues to improve over time, providing an intrinsic and sustainable driving force for achieving and maintaining a lower hallucination rate and higher response accuracy.
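The recording step in paragraph [0021] might look like the following sketch. The field names mirror the six recorded quantities; the `FusionRecord` class, the `append_record` helper, and the JSON Lines storage format are assumptions made for illustration. The application does not say how the records are stored or which optimization procedure consumes them, so that part is omitted.

```python
import json
from dataclasses import asdict, dataclass
from typing import TextIO


@dataclass
class FusionRecord:
    """One logged question-answer interaction (hypothetical schema)."""
    complexity_score: float
    dependency_label: str
    confidence: float
    knowledge_coverage: float
    fusion_weight: float
    answer_accuracy: float


def append_record(record: FusionRecord, sink: TextIO) -> None:
    """Write the record as one JSON line to any writable text stream."""
    sink.write(json.dumps(asdict(record)) + "\n")
```

A later tuning job could read these lines back and fit the weight coefficients against `answer_accuracy`; that step depends on choices the text leaves open.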
[0023] According to a second aspect of the embodiments of this application, a retrieval-augmented generation apparatus is provided, the apparatus comprising:
A question receiving module, used to receive a question input by the user.
A problem analysis module, used to obtain a complexity score for the question and a judgment result on the dependence of the question on the language model's own knowledge, where the complexity score quantifies the difficulty of the question and the dependency judgment result indicates the degree to which the question depends on the language model's own knowledge.
A model state monitoring module, used to input the question into the language model and obtain the confidence of the language model in answering the question and the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question.
An adaptive fusion module, used to determine, based on the complexity score, the dependency judgment result, the confidence, and the knowledge coverage, the fusion weight of the knowledge base when it is fused with the language model's own knowledge, and to determine the corresponding knowledge fusion strategy based on the fusion weight.
A knowledge base interaction module, used to adjust the strategy for obtaining retrieval information from the knowledge base based on the fusion weight, so as to obtain the target retrieval information.
The adaptive fusion module is also used to fuse the target retrieval information with the language model's own knowledge based on the knowledge fusion strategy to generate the final answer to the question.
[0024] According to a third aspect of the embodiments of this application, a computing device is provided. The computing device includes a memory and a processor, the memory storing a computer program, and the processor executing the program to implement the method as described above.
[0025] According to a fourth aspect of the embodiments of this application, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the methods described in the embodiments of this application.
[0026] According to a fifth aspect of the embodiments of this application, a computer program product is provided, including a computer program that, when executed by a processor, implements the methods described above in the embodiments of this application.
Description of the Drawings
[0027] More details, features, and advantages of embodiments of the present application are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram of the system architecture provided by an exemplary embodiment of this application;
Figure 2 is a flowchart of an attention-layer weighted fusion algorithm provided by an exemplary embodiment of this application;
Figure 3 is a flowchart of a retrieval-augmented generation method provided by an exemplary embodiment of this application;
Figure 4 is a schematic diagram of an enhanced retrieval architecture provided by an exemplary embodiment of this application;
Figure 5 is a schematic block diagram of the functional modules of a retrieval-augmented generation apparatus provided by an exemplary embodiment of this application;
Figure 6 is a structural block diagram of a server provided by an exemplary embodiment of this application.
Detailed Implementation
[0028] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While some embodiments of this application are shown in the drawings, it should be understood that embodiments of this application can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of the embodiments of this application. It should be understood that the accompanying drawings and embodiments of this application are for illustrative purposes only and are not intended to limit the scope of protection of this application.
[0029] It should be understood that the various steps described in the method implementation of this application may be performed in different orders and/or in parallel. Furthermore, the method implementation may include additional steps and/or omit the steps shown. The scope of this application is not limited in this respect.
[0030] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the following description. It should be noted that the concepts of "first", "second", etc., mentioned in the embodiments of this application are only used to distinguish different devices, modules, or units, and are not used to limit the order of functions performed by these devices, modules, or units or their interdependencies.
[0031] It should be noted that the terms "one" and "more" mentioned in the embodiments of this application are illustrative rather than restrictive. Those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0032] The names of the messages or information exchanged between multiple devices in the embodiments of this application are for illustrative purposes only and are not intended to limit the scope of these messages or information.
[0033] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device. It is understood that the above notification and user authorization process is merely illustrative and does not constitute a limitation on the implementation of this application's embodiments. Other methods that comply with relevant laws and regulations can also be applied to the implementation of this application's embodiments.
[0034] As shown in Figure 1, which is a schematic diagram of a system architecture provided by an exemplary embodiment of this application, the system may include a terminal 10, a server 20, a language model 30, and a knowledge base 40. Specifically, the server 20 may include a problem analysis module 21, a model state monitoring module 22, an adaptive fusion module 23, and a knowledge base interaction module 24.
[0035] It should be noted that the language model 30 and the knowledge base 40 can be deployed on different servers, or on the same server, or both on server 20. The embodiments are not limited to these.
[0036] Terminal 10, as a user interface, is used to receive questions input by the user, send the questions to server 20 for processing, receive the answers to the questions sent by server 20, and display the answers to the user.
[0037] Server 20 is used to receive questions sent by terminal 10. Through the coordinated operation of its internal modules, it analyzes the questions, monitors the model status, decides on fusion strategies, retrieves external knowledge and completes knowledge fusion, and finally generates the answer and returns it to terminal 10.
[0038] Language model 30, specifically a large language model (LLM), is used to generate text answers based on the input question and contextual information. Server 20 obtains its initial output or internal state by inputting the user's question into language model 30, and finally passes the fused information to language model 30 to generate the final answer, or directly generates the final answer using the fused information.
[0039] Knowledge base 40 is an external trusted knowledge source for the system, used to store structured and unstructured domain knowledge. When needed, it provides the system with supplementary, real-time or professional reference information to make up for gaps in the knowledge of language model 30 itself or to correct any hallucinations that language model 30 may produce.
[0040] In this embodiment, server 20 integrates the following core functional modules to achieve adaptive knowledge fusion: The problem analysis module 21 serves as the starting point for the system to perceive and understand the user-inputted problem. It is used to perform in-depth analysis of the original input problem, extract and quantify the key characteristics of the problem. Specifically, it can include assessing the complexity of the problem (such as judging its difficulty by the number of entities, syntactic structure, etc.) and judging the degree of dependence of the problem on the knowledge of the language model 30, providing core basis for subsequent fusion decisions.
[0041] In this embodiment, the problem analysis module 21 is responsible for extracting deep features from the user's input question, providing a key decision-making basis for the adaptive knowledge fusion mechanism. The problem analysis module 21 processes questions primarily along the following two dimensions:
(1) Complexity assessment.
[0042] This assessment aims to quantify the difficulty level of the question. Specifically, the module uses semantic parsing technology to extract three key features from the question:
Number of entities: the number of specific objects or concepts mentioned in the question.
[0043] Semantic dependency depth: the hierarchical depth of the grammatical and semantic modification relationships between words in a sentence.
[0044] Domain overlap: whether the question involves multiple professional fields (e.g., a cross-disciplinary question involving both "medical" and "engineering").
[0045] The evaluation process is as follows: First, entity extraction, semantic dependency analysis, and domain identification are performed in sequence on the input question text to obtain quantified values for the three features, denoted S1, S2, and S3, respectively. Then, these feature values are substituted into a preset weighted formula: Complexity Score C = α1 × S1 + α2 × S2 + α3 × S3 (where α1, α2, and α3 are the weight coefficients of each feature, and α1 + α2 + α3 = 1). Finally, the result C is normalized (e.g., set to 1 if C > 1 and to 0 if C < 0) so that the output complexity score C falls within the range 0 to 1, where 0 represents the simplest and 1 represents the most complex.
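The weighted formula and clipping step in paragraph [0045] can be sketched as follows. The default coefficients (0.4, 0.3, 0.3) are illustrative assumptions, as is the premise that S1, S2, and S3 have already been scaled to comparable ranges; the application does not give values for α1 to α3 or a scaling rule for the raw features.

```python
from typing import Tuple


def complexity_score(
    s1: float,  # quantified number of entities
    s2: float,  # quantified semantic dependency depth
    s3: float,  # quantified domain overlap
    alphas: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # assumed values
) -> float:
    """C = a1*S1 + a2*S2 + a3*S3, clipped to the range [0, 1]."""
    a1, a2, a3 = alphas
    if abs(a1 + a2 + a3 - 1.0) > 1e-9:
        raise ValueError("weight coefficients must sum to 1")
    c = a1 * s1 + a2 * s2 + a3 * s3
    # Normalization as described: 1 if C > 1, 0 if C < 0.
    return min(1.0, max(0.0, c))
```

For instance, `complexity_score(2.0, 2.0, 2.0)` is clipped to 1.0, while feature values that already lie in [0, 1] pass through as a plain weighted sum.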
[0046] (2) Knowledge dependence judgment.
[0047] This judgment is used to assess the degree to which the user's question depends on the knowledge already possessed by the language model 30 itself, in order to determine whether a heavy reliance on external knowledge bases is necessary. The judgment relies on the three main knowledge carriers of the language model 30:
K1 (pre-trained knowledge graph): stores structured entity-relationship knowledge.
[0048] K2 (pre-trained corpus): stores unstructured text knowledge.
[0049] K3 (domain fine-tuning parameters): stores the adaptation knowledge obtained after fine-tuning for the vertical domain.
[0050] The judgment process includes the following steps: First, calculate the relevance of the user's question to K1, K2, and K3 respectively, obtaining M1, M2, and M3. Then, combine these relevance values in a weighted sum to calculate the dependency score D, for example: D = w1×M1 + w2×M2 + w3×M3 (where w1, w2, and w3 are weights, and w1+w2+w3=1). In a specific calculation example, assuming M1=1.0, M2=0.944, and M3=0.5, with weights of 0.4, 0.5, and 0.1 respectively, then D = 0.4×1.0 + 0.5×0.944 + 0.1×0.5 = 0.922. Finally, compare the score D with a preset threshold T obtained through training on a test dataset to output the final dependency judgment label (e.g., "high dependency" or "low dependency"). For example, if the threshold T=0.7 in a general scenario, then since 0.922≥0.7, the label "high dependency" will be output. This label directly affects the calculation of subsequent fusion weights, for example by reducing the weight of external knowledge when the label is "high dependency".
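The worked example in paragraph [0050] can be reproduced with a short sketch. The function names are hypothetical; the weights, relevance values, and threshold T = 0.7 are the ones given in the text. How M1, M2, and M3 are computed is not specified, so they are taken as inputs.

```python
from typing import Sequence


def dependency_score(relevances: Sequence[float], weights: Sequence[float]) -> float:
    """D = w1*M1 + w2*M2 + w3*M3, with the weights summing to 1."""
    if len(relevances) != len(weights):
        raise ValueError("relevances and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, relevances))


def dependency_label(score: float, threshold: float = 0.7) -> str:
    """Map the score to the label used downstream."""
    return "high dependency" if score >= threshold else "low dependency"


# Values from the example: M = (1.0, 0.944, 0.5), w = (0.4, 0.5, 0.1).
d = dependency_score((1.0, 0.944, 0.5), (0.4, 0.5, 0.1))
print(round(d, 3), dependency_label(d))  # 0.922 high dependency
```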
[0051] The model state monitoring module 22 is used to perceive the internal state of the language model 30 in real time and monitor its dynamic performance when processing the current problem. Key indicators monitored include the confidence level of the answer generated by the language model 30 (reflecting the certainty of its output) and the coverage of the language model 30's pre-trained knowledge with the current problem (reflecting the relevance of its own knowledge). This state data is an important input for dynamically adjusting the knowledge fusion strategy.
[0052] In this embodiment, the model state monitoring module 22 is responsible for sensing and quantifying the internal state of the language model 30 in real time when processing the current user question, providing a real-time and dynamic model-side basis for fusion decision-making. Its monitoring mainly includes the following two key dimensions:
(1) Calculation of model confidence.
[0053] This metric is used to evaluate the certainty of the language model 30 in generating its own answers. The module calculates the confidence level by statistically analyzing the entropy of the probability distribution generated by the language model 30 in predicting the next candidate token during the token-by-token answer generation process.
[0054] Specifically, at each step of the generation process, the model outputs a probability distribution covering the entire vocabulary, representing the likelihood of each word becoming the next output. The entropy value of this distribution reflects the degree of concentration or disorder in the model's predictions: the lower the entropy value, the more concentrated the probability distribution, and the more confident the model is in the prediction results, i.e., the higher the confidence level; conversely, the higher the entropy value, the more uncertain the model is, and the lower the confidence level. This method transforms the inherent uncertainty of the model generation process into a quantifiable confidence index. It should be noted that the model involved in the embodiment can specifically be language model 30.
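One way to turn per-step entropy into a confidence value is sketched below. The text only states that entropy and confidence are inversely correlated; the specific mapping used here (one minus the mean entropy, with each step's entropy divided by its maximum possible value log V) is an assumption chosen so that the result lies in [0, 1]. The function names are hypothetical.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_from_distributions(steps: Sequence[Sequence[float]]) -> float:
    """Assumed mapping: 1 - mean normalized entropy over all generation steps."""
    if not steps:
        raise ValueError("at least one generation step is required")
    normalized = []
    for probs in steps:
        vocab_size = len(probs)
        # A uniform distribution over V tokens has the maximum entropy log(V).
        max_entropy = math.log(vocab_size) if vocab_size > 1 else 1.0
        normalized.append(token_entropy(probs) / max_entropy)
    mean_entropy = sum(normalized) / len(normalized)
    return min(1.0, max(0.0, 1.0 - mean_entropy))
```

With this mapping, a model that puts all probability on one token at every step gets confidence 1.0, and a model whose distribution is uniform at every step gets confidence close to 0.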
[0055] (2) Knowledge coverage assessment.
[0056] This metric assesses the prevalence or relevance of the core knowledge points involved in the current user question within the corpus encountered by the language model 30 during its pre-training phase. In implementation, the module uses the language model 30's internal knowledge index to statistically analyze the frequency of key knowledge points (such as entities and core concepts) extracted from the question within its vast pre-training corpus. The frequency is then standardized (e.g., normalized to between 0 and 1) to obtain a knowledge coverage score. A higher score (closer to 1) indicates that the relevant knowledge appears frequently in the model's pre-training corpus, suggesting a more comprehensive knowledge base within the model; a lower score (closer to 0) indicates that the question may touch upon the edge or blind spots of the model's pre-training knowledge.
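The coverage score in paragraph [0056] could be computed along the following lines. Everything specific here is an assumption: the `corpus_counts` dictionary stands in for the model's "internal knowledge index", and the log-scaled normalization with a `saturation` count is one possible way to map raw frequencies into [0, 1]. The application only says that the frequency is standardized.

```python
import math
from typing import Mapping, Sequence


def knowledge_coverage(
    knowledge_points: Sequence[str],
    corpus_counts: Mapping[str, int],  # hypothetical index: term -> occurrences
    saturation: int = 1_000_000,       # assumed count at which coverage reaches 1
) -> float:
    """Average log-scaled frequency of the question's knowledge points, in [0, 1]."""
    if not knowledge_points:
        return 0.0
    scores = []
    for point in knowledge_points:
        count = corpus_counts.get(point, 0)
        scores.append(min(1.0, math.log1p(count) / math.log1p(saturation)))
    return sum(scores) / len(scores)
```

Log scaling is used because term frequencies in large corpora span many orders of magnitude; a linear scale would push almost every knowledge point toward 0.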
[0057] The adaptive fusion module 23 integrates the outputs of the problem analysis module 21 and the model state monitoring module 22, dynamically calculates the fusion weights of the knowledge base 40 information and the language model 30's own knowledge in the final answer generation, and determines the optimal knowledge fusion strategy accordingly (e.g., whether to prioritize external knowledge or rely primarily on the model's own knowledge). This module achieves a crucial leap from "perception" to "decision-making".
[0058] In this embodiment, the adaptive fusion module 23 is the decision-making center of the system. It receives the "complexity score" and "knowledge dependency label" from the problem analysis module 21, as well as the "model confidence" and "knowledge coverage score" from the model state monitoring module 22, and dynamically decides the core parameters of knowledge fusion based on these multi-dimensional information.
[0059] (1) Calculation of dynamic fusion weights.
[0060] One of the core functions of the module is to calculate a dynamic fusion weight W (ranging from 0 to 1). This weight directly determines the relative proportion of the information provided by the knowledge base 40 and the knowledge of the language model 30 itself in the fusion process when generating the final answer. The larger the W value, the more the final answer relies on and integrates external knowledge; the smaller the W value, the more it relies on the model's own knowledge.
[0061] The calculation of weights follows a pre-defined rule that comprehensively considers both problem characteristics and model state. In a specific embodiment, this rule can be represented by the following logical formula: The fusion weight W = α×complexity score + β×(1 - model confidence) + γ×(1 - knowledge coverage score).
[0062] Here, α, β, and γ are adjustable weight coefficients, satisfying α + β + γ = 1. These coefficients can be trained and optimized using a large number of historical dialogue samples to adapt the system to different application scenarios.
[0063] α × complexity score: The more complex the problem (the higher the score), the more the weight of external knowledge is increased, so that more specialized or cross-disciplinary information can be introduced.
[0064] β × (1 - model confidence): The more uncertain the model is about the answer (the lower the confidence, and hence the higher 1 - confidence), the more the weight of external knowledge needs to be increased to provide support and correction.
[0065] γ × (1 - knowledge coverage score): The lower the coverage of the problem's knowledge in the model's pre-training corpus (the lower the score, and hence the higher 1 - coverage score), the less sufficient the model's own knowledge is, and the more the weight of external knowledge is increased.
[0066] In addition, to prevent the system from retrieving excessive noisy external information for simple problems that the model has already mastered, when the knowledge dependency label output by the problem analysis module 21 is "high dependency", the module makes an additional downward adjustment to the initially calculated W value (for example, subtracting a fixed value of 0.2) to determine the final fusion weight.
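As a minimal sketch of the calculation described in paragraphs [0061] to [0066], the fusion weight could be computed as follows. The coefficient defaults (0.4, 0.3, 0.3), the `FusionInputs` container, and the final clamping of W to the range 0 to 1 are illustrative assumptions; this application specifies only the formula, the constraint α + β + γ = 1, and the example adjustment of 0.2.

```python
from dataclasses import dataclass


@dataclass
class FusionInputs:
    complexity: float       # complexity score in [0, 1]
    confidence: float       # model confidence in [0, 1]
    coverage: float         # knowledge coverage score in [0, 1]
    high_dependency: bool   # True when the dependency label is "high dependency"


def fusion_weight(x, alpha=0.4, beta=0.3, gamma=0.3, penalty=0.2):
    """W = alpha*complexity + beta*(1 - confidence) + gamma*(1 - coverage)."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    w = (alpha * x.complexity
         + beta * (1.0 - x.confidence)
         + gamma * (1.0 - x.coverage))
    if x.high_dependency:
        # Downward adjustment for questions the model can largely answer itself.
        w -= penalty
    # Assumption: clamp so that W stays within the stated range of 0 to 1.
    return min(1.0, max(0.0, w))
```

With a complexity score of 0.8, a confidence of 0.3, and a coverage score of 0.2, these illustrative coefficients give W of about 0.77, or about 0.57 once the "high dependency" adjustment is applied.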
[0067] (2) Integration strategy decision.
[0068] Based on the calculated final fusion weight W, the adaptive fusion module 23 maps it to a specific knowledge fusion strategy. For example, the system can preset a decision threshold (e.g., 0.6): when W ≥ 0.6, the "external knowledge priority" fusion strategy is triggered, meaning that the retrieval results are used as the main information and are fused with the model's knowledge through weighted combination; when W < 0.6, the "model knowledge-driven" fusion strategy is triggered, meaning that the content generated by the model itself is mainly used, and the retrieval results serve only as a reference for verification or fine-tuning. This dynamic decision-making mechanism realizes intelligent adaptation across the entire chain from "whether to retrieve" to "how to use the retrieval results".
[0069] In this embodiment, after determining the fusion weight W, the adaptive fusion module 23 selects and executes a specific knowledge fusion strategy based on its value to achieve efficient and intelligent integration of information from the external knowledge base 40 and the knowledge of the language model 30 itself. The fusion logic and implementation steps are as follows:
(1) Decision-making and execution of the fusion strategy.
[0070] The choice of fusion strategy is directly determined by the fusion weight W, reflecting the principle of "dynamic adaptation":
External knowledge priority fusion strategy (triggered when W ≥ 0.6): This strategy is suitable for problems that are complex, on which the model is uncertain, or for which the model's knowledge coverage is insufficient. During execution, the system first structures the raw information retrieved from the knowledge base (e.g., breaking it down and organizing it according to patterns such as "entity-relationship" or "claim-evidence"), transforming it into a more regular knowledge representation. Then, these structured external knowledge representations are combined with the language model's own knowledge or contextual representations, either by weighted concatenation or by deeper interaction, in the proportion indicated by W (or with weights dynamically calculated through an attention mechanism), ensuring that external evidence plays a dominant or significant role in the final answer generation.
[0071] Model knowledge-driven fusion strategy (triggered when W < 0.6): This strategy is suitable for relatively simple problems with high model confidence and comprehensive knowledge coverage. During execution, the system primarily relies on the content generated by the language model itself as the backbone of the answer. External knowledge retrieved under this strategy mainly plays the role of a "verifier" or "safety barrier." The system compares the model output with the retrieved information for consistency: if they agree on key facts, the credibility of the model output is enhanced; if contradictions are found, the more credible source (usually the verifiable external knowledge) may be selected, or reprocessing may be triggered. This effectively eliminates potential hallucinations generated by the model while avoiding the introduction of unnecessary noise into simple problems.
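The threshold decision described above can be sketched as follows. This is only an illustration: the `FusionStrategy` enumeration and the `select_strategy` function are hypothetical names, and 0.6 is the example threshold given in this embodiment rather than a required value.

```python
from enum import Enum


class FusionStrategy(Enum):
    EXTERNAL_KNOWLEDGE_PRIORITY = "external knowledge priority"
    MODEL_KNOWLEDGE_DRIVEN = "model knowledge-driven"


def select_strategy(w, threshold=0.6):
    """Map the final fusion weight W to one of the two fusion strategies."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("fusion weight W must lie in [0, 1]")
    if w >= threshold:
        # Retrieval results become the main information source.
        return FusionStrategy.EXTERNAL_KNOWLEDGE_PRIORITY
    # Model output is the backbone; retrieval is used for verification.
    return FusionStrategy.MODEL_KNOWLEDGE_DRIVEN
```

Note that a weight exactly equal to the threshold selects the external knowledge priority strategy, matching the condition W ≥ 0.6.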
[0072] (2) Knowledge fusion based on rectangular attention mechanism.
[0073] To achieve efficient integration of external and internal knowledge in the aforementioned strategy, this system employs an improved attention mechanism (which can be called a rectangular attention mechanism) in a preferred embodiment to replace or enhance some attention computations in the standard Transformer architecture. Its core idea is to construct a fusion layer during the model decoding phase that allows for deep and dynamic interaction between internally generated states and external knowledge representations.
[0074] The specific implementation process can be divided into the following stages: Input phase: The system processes two inputs in parallel. One is the user's original query, and the other is the relevant original documents or knowledge fragments provided by the knowledge base interaction module 24.
[0075] Encoding phase: External knowledge path: The retrieved documents are encoded (e.g., using a dedicated encoder or the shared lower layers of the model) to generate a series of knowledge embedding vectors (denoted as K), which serve as a "knowledge memory" that can be queried during fusion.
[0076] Internal model path: The user query is encoded and processed through the model's own self-attention layer, etc., to form an internal representation tensor (denoted as A) that represents the model's current internal state and inference context.
[0077] The core fusion stage (cross-attention layer): This stage is the key to achieving dynamic fusion.
[0078] Cross-attention calculation is performed using the model's internal representation A as the query and the external knowledge embedding K as the key and value.
[0079] The similarity between items in A and K is calculated and normalized using the Softmax function to obtain a set of dynamic attention weights α (distinct from the coefficient α in the fusion weight formula). The weights α reflect the degree of attention the internal reasoning process pays to each item of external knowledge.
[0080] The external knowledge values are weighted and summed using the weights α. This weighted external knowledge information is then combined with the internal representation A, either by matrix addition or by concatenation followed by a linear transformation, to generate a new context tensor (the fusion tensor) that integrates internal and external information.
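The core fusion stage can be illustrated with a small pure-Python sketch. It assumes scaled dot-product similarity and the additive combination option; learned query, key, and value projections are omitted for brevity, and the function names are hypothetical. It is a simplified illustration under these assumptions and not the implementation of this application.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(A, K):
    """Fuse internal states A (n x d) with knowledge embeddings K (m x d).

    Rows of A act as queries; rows of K act as both keys and values.
    Returns (fused, weights), where each fused row is the internal state
    plus the attention-weighted sum of the knowledge embeddings.
    """
    d = len(A[0])
    scale = math.sqrt(d)  # assumption: scaled dot-product similarity
    fused, weights = [], []
    for a in A:
        scores = [sum(ai * ki for ai, ki in zip(a, k)) / scale for k in K]
        w = softmax(scores)  # attention weights over the knowledge items
        context = [sum(w[j] * K[j][c] for j in range(len(K)))
                   for c in range(d)]
        fused.append([ai + ci for ai, ci in zip(a, context)])  # matrix addition
        weights.append(w)
    return fused, weights
```

In a trained model, the similarity and the combination step would normally use learned projection matrices; the concatenation variant would replace the addition with a linear layer applied to the concatenated vectors.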
[0081] Output phase: The generated "fusion tensor" is input into the decoder of language model 30. Based on this enhanced context, which has been infused with relevant external knowledge, the decoder autoregressively generates the final intelligent response that combines model intelligence with external factual evidence.
[0082] Figure 2 is a flowchart of the algorithm for weighted fusion at the attention layer. As shown in Figure 2, the flowchart illustrates the specific algorithmic architecture for knowledge fusion during the decoding stage in this embodiment, namely an improved "encoder-decoder" process. The core of this architecture lies in deeply integrating externally retrieved multi-source knowledge with the model's internal state through an attention mechanism before decoding and generation. The specific process may include:
(1) Multi-source knowledge input and independent encoding.
[0083] The flowchart at the top shows N parallel input branches, each representing a retrieved Retrieved Content (RC) and its surrounding context. These branches correspond to multiple related documents or knowledge fragments obtained by the knowledge base interaction module 24. Each "RC + Surrounding Context" pair is independently fed into an encoder for processing. This parallel and independent encoding design ensures a full understanding and vectorized representation of each piece of external knowledge, avoiding potential confusion or loss of information during early mixing.
[0084] (2) Aggregation of encoded representations.
[0085] The output representations from all the independent encoders are concatenated (symbolized by "|||" in the diagram) to form a comprehensive contextual memory containing multi-perspective external knowledge. This memory contains high-dimensional representations of all relevant information extracted from the knowledge base, waiting to be queried and used during the generation phase.
[0086] (3) Fusion generation (decoding).
[0087] The bottom of the flowchart is the crucial decoding and generation stage. The decoder receives the model's own understanding of the user's question (internal state, not explicitly shown in the diagram, but implicit in the decoder's initial state), and can simultaneously access the context memory built in the preceding steps, which contains all external knowledge, through the attention mechanism.
[0088] Specifically, the decoder performs the following operations when generating each new token: Cross-attention computation: Cross-attention is computed between the current hidden state of the decoder (as the query) and all encoded representations in the external knowledge memory (as keys and values).
[0089] Dynamic weight allocation: Through mechanisms such as Softmax, a dynamic attention weight is assigned to each piece of knowledge in memory. This achieves "on-demand" access—the decoder decides which part of the external knowledge to focus on based on the needs of the current generation.
[0090] Context-aware generation: The decoder integrates weighted and aggregated external knowledge information with its own internal state to jointly predict the next token. This process is repeated until a complete final answer is generated.
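To illustrate how independently encoded retrieval results can be concatenated into one memory and then queried "on demand", the following sketch builds the memory and reports how much attention each source document receives for a given decoder state. The encoder outputs are taken as given lists of vectors, plain dot-product similarity is assumed, and all function names are hypothetical.

```python
import math


def build_memory(encoded_docs):
    """Concatenate per-document encoder outputs into one memory.

    encoded_docs: list of N documents, each a list of d-dimensional vectors.
    Returns (memory, owner), where owner[i] is the index of the document
    that memory[i] came from.
    """
    memory, owner = [], []
    for doc_idx, vectors in enumerate(encoded_docs):
        for vec in vectors:
            memory.append(vec)
            owner.append(doc_idx)
    return memory, owner


def attend(query, memory):
    """Softmax attention weights of one decoder state over the memory."""
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_per_document(query, encoded_docs):
    """Total attention mass that each retrieved document receives."""
    memory, owner = build_memory(encoded_docs)
    weights = attend(query, memory)
    mass = [0.0] * len(encoded_docs)
    for w, doc_idx in zip(weights, owner):
        mass[doc_idx] += w
    return mass
```

Because the documents are encoded separately and only merged at the memory level, the per-document attention mass remains traceable, which can be useful for inspecting which evidence influenced a generated token.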
[0091] The implementation method preserves the independence and integrity of each piece of knowledge by encoding each retrieval result individually. During the decoding phase, an attention mechanism is used to uniformly query all encoded knowledge, enabling the model to dynamically and selectively integrate multi-source information based on the generation context. The entire process (retrieval, encoding, attention fusion, and generation) can be jointly trained or optimized, allowing the model to learn how to most effectively utilize external evidence to generate more accurate and less misleading answers.
[0092] Through the aforementioned strategies and mechanisms, the system not only adaptively selects the fusion strategy based on the problem and model state at the macro level, but also achieves the organic integration of external knowledge and internal reasoning processes at the micro level of model computation, thereby fundamentally improving the accuracy and reliability of the answer.
[0093] The knowledge base interaction module 24 serves as an interface with an external knowledge base (knowledge base 40). It dynamically adjusts the retrieval behavior (e.g., expands or narrows the retrieval scope) according to the fusion weights and strategies determined by the adaptive fusion module 23, accurately and efficiently obtains the most relevant target retrieval information from the knowledge base 40, and provides this information to the adaptive fusion module 23.
[0094] In this embodiment, the knowledge base interaction module 24 is the "strategy executor" for intelligent interaction between the system and the external knowledge world. It receives the core decision signal—fusion weight W—output from the adaptive fusion module 23, and dynamically and finely adjusts the retrieval behavior of the external knowledge base 40 accordingly. Its goal is to ensure that the retrieved information is highly matched with the actual needs of the current problem and the decision-making mode of the system, thereby providing sufficient knowledge support while minimizing the introduction of irrelevant or low-quality information.
[0095] The module works on the basis of a weight-policy mapping rule. The core logic of this rule is that the level of the fusion weight W directly reflects the system's judgment on the "degree of dependence on external knowledge". Therefore, the retrieval strategy should change in tandem with it to optimize the balance between the "quality" and "quantity" of the retrieval results.
[0096] When W ≥ 0.6: This situation typically corresponds to complex problems, an uncertain model, or insufficient knowledge coverage. The system's decision tends towards "external knowledge priority" fusion. In this case, the knowledge base interaction module 24 executes an extended retrieval strategy. Specifically, the module expands the scope of the retrieved knowledge domains (e.g., from one core domain to three related domains) and adjusts the parameters of the retrieval algorithm to improve recall. The purpose is to recall as many potentially relevant information fragments as possible, providing rich raw material for subsequent in-depth analysis and fusion and avoiding the omission of key evidence due to an overly narrow retrieval scope.
[0097] When W < 0.6: This situation typically corresponds to simple problems, high model confidence, and comprehensive knowledge coverage. The system's decision tends towards "model knowledge-driven" fusion. In this case, the knowledge base interaction module 24 executes a precision retrieval strategy. Specifically, the module strictly limits the retrieval scope (e.g., searching only within the single core domain that best matches the problem) and adjusts the parameters of the retrieval algorithm to improve precision. The purpose is to perform only a small-scale, high-precision confirmatory retrieval when the model itself can already answer the question well. This allows a small amount of the most relevant reference information to be obtained quickly for consistency verification, effectively controlling the risk of introducing noise and improving the overall response efficiency of the system.
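The weight-policy mapping rule can be sketched as a function from W to a set of retrieval parameters. The domain counts (three versus one) follow the examples above; the `top_k` and `min_similarity` fields and their values are hypothetical stand-ins for the "parameters of the retrieval algorithm", which this application does not enumerate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPolicy:
    name: str
    max_domains: int        # number of knowledge domains searched
    top_k: int              # hypothetical: number of fragments returned
    min_similarity: float   # hypothetical: similarity cutoff for a hit


def retrieval_policy(w, threshold=0.6):
    """Choose retrieval parameters from the fusion weight W."""
    if w >= threshold:
        # Extended retrieval: wider scope and looser cutoff to favor recall.
        return RetrievalPolicy("extended", max_domains=3, top_k=20,
                               min_similarity=0.3)
    # Precision retrieval: single core domain and stricter cutoff.
    return RetrievalPolicy("precision", max_domains=1, top_k=3,
                           min_similarity=0.7)
```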
[0098] Through the aforementioned dynamic strategy adjustments, the knowledge base interaction module 24 transforms the system's retrieval behavior from fixed and blind to an intelligent and adaptive process closely linked to problem analysis, model state evaluation, and fusion decision-making. This not only optimizes the efficiency of knowledge acquisition but also lays a reliable information foundation for generating high-quality responses with few hallucinations.
[0099] Through the collaboration of the four modules mentioned above, server 20 can achieve intelligent perception of user questions, real-time monitoring of model state, adaptive decision-making on fusion strategies, and accurate utilization of external knowledge, thereby alleviating the hallucination problem that occurs during retrieval-augmented generation.
[0100] Based on the above embodiments, this application also provides a retrieval-augmented generation method, which can be applied to the aforementioned server 20. As shown in Figure 3, the method may include the following steps: In step S310, a question input by the user is received.
[0101] In this embodiment, a natural language question submitted by the user can be received through the user interface of terminal 10, which initiates the entire processing flow. Server 20 receives the question sent by terminal 10.
[0102] In step S320, a complexity score of the question and a judgment result on its dependence on the language model's own knowledge are obtained. The complexity score quantifies the difficulty of the question, and the dependence judgment result indicates the degree to which the question depends on the language model's own knowledge.
[0103] This step corresponds to the function of the problem analysis module 21 mentioned above, which processes the received user problems.
[0104] In the embodiment, features such as the number of entities, semantic dependency depth, and domain overlap in the problem are extracted through semantic parsing, and a complexity score (e.g., a value between 0 and 1) is generated through weighted calculation and normalization.
[0105] By calculating the relevance between the question and the pre-trained knowledge graph, corpus, and fine-tuning parameters of the language model 30, and comparing the weighted scores with thresholds, a judgment result on the degree of dependence of the question on the model's own knowledge is output (such as a "high dependence" or "low dependence" label).
[0106] In step S330, the question is input into the language model, and the confidence of the language model in answering the question and the knowledge coverage of the knowledge points involved in the question in the language model's own knowledge are obtained.
[0107] This step corresponds to the function of the model state monitoring module 22, which inputs the user question into the language model 30 and monitors its processing state in real time. Obtaining the confidence level: By calculating the entropy of the predicted probability distribution of the language model during the word-by-word generation of the answer, the certainty of the model's own answer can be quantified. The lower the entropy, the higher the confidence level.
[0108] Knowledge coverage is obtained by using the internal index of the language model to count the frequency of occurrence of the core knowledge points involved in the problem in its massive pre-training corpus and converting them into a coverage score (e.g., a value between 0 and 1) to evaluate the degree of background support of the model's own knowledge for the problem.
[0109] In step S340, the fusion weight of the knowledge base when fused with the language model's own knowledge is determined based on the complexity score, the dependence judgment result, the confidence, and the knowledge coverage.
[0110] This step corresponds to the core computational function of the adaptive fusion module 23 and integrates the four key evaluation dimensions obtained in steps S320 and S330: the complexity score, (1 - confidence), and (1 - knowledge coverage score) act as positive driving factors and are linearly combined according to the preset trainable coefficients (α, β, γ) to compute an initial fusion weight W (between 0 and 1).
[0111] Furthermore, this initial weight is calibrated based on the dependency assessment results (for example, the W value is reduced when the dependency is judged to be "high"), ultimately determining a dynamic fusion weight. This weight directly determines the relative importance ratio of external knowledge base information and the language model's own knowledge in answer generation during subsequent processes.
[0112] In step S350, the corresponding knowledge fusion strategy is determined based on the fusion weights.
[0113] This step corresponds to the decision-making function of the adaptive fusion module 23, which maps the final fusion weight W calculated in step S340 to a specific high-level fusion strategy: If the W value is not lower than the preset threshold (e.g., 0.6), the "external knowledge priority" fusion strategy is adopted.
[0114] If the W value is lower than the threshold, then the "model knowledge-driven fusion" strategy is adopted.
[0115] In step S360, the strategy for obtaining retrieval information from the knowledge base is adjusted based on the fusion weight to obtain the target retrieval information. Then, the target retrieval information is fused with the language model's own knowledge based on the knowledge fusion strategy to generate the final answer to the question.
[0116] This step coordinates the execution functions of the knowledge base interaction module 24 and the adaptive fusion module 23, and includes two levels of operations: Adaptive retrieval: The knowledge base interaction module 24 adjusts the retrieval behavior based on the fusion weight W. When W is high, it performs expanded retrieval to improve recall; when W is low, it performs precise retrieval to improve precision, thereby obtaining the target retrieval information that best matches the current context from the knowledge base 40.
[0117] Strategic Fusion and Generation: The system executes specific fusion operations according to the knowledge fusion strategy determined in step S350. For example, under the "external knowledge priority" strategy, the target retrieval information is deeply integrated into the language model generation process; under the "model knowledge-driven" strategy, the retrieval information is mainly used to verify the model output. Finally, the language model 30 generates and outputs the final answer to the question based on the fused information context.
[0118] This embodiment can achieve intelligent perception of user questions, real-time evaluation of model state, dynamic decision-making on fusion strategies, and accurate utilization of external knowledge, thereby improving the accuracy and reliability of answers in retrieval-augmented generation and effectively alleviating the hallucinations produced by the language model when generating answers.
[0119] Based on the above embodiments, in order to convert the abstract notion of problem complexity into a specific numerical indicator through a set of quantifiable, computable, standardized operations, in another embodiment provided in this application, when obtaining the complexity score for the problem, the above step S320 may further include the following steps: Step S321: Extract the number of entities, semantic dependency depth, and domain overlap in the problem.
[0120] This embodiment first performs deep semantic parsing on the natural language question input by the user. By calling or integrating named entity recognition, dependency parsing, and domain classification models, three core feature metrics are extracted from the question text: Entity count: Identify and count the total number of specific named entities (such as people's names, place names, technical terms, etc.) that appear in the problem.
[0121] Semantic dependency depth: Analyze the grammatical structure of the problem sentence and calculate the maximum or average depth of its dependency syntax tree to reflect the complexity of the syntactic structure.
[0122] Domain overlap: This metric is the number of professional fields involved in the problem. For example, a problem that involves both "computer science" and "molecular biology" has a domain overlap of 2. This metric is used to identify complex, cross-disciplinary problems.
[0123] Step S322: The number of entities, semantic dependency depth, and domain overlap are combined by weighted calculation, and the result is normalized to obtain a complexity score.
[0124] In this embodiment, after extracting the original feature values of the three dimensions mentioned above, the system performs a comprehensive calculation using a preset, optimizable weighted formula: Complexity score = α × (number of entities) + β × (semantic dependency depth) + γ × (domain overlap). Here, α, β, and γ are the weight coefficients corresponding to each feature (they are separate from the coefficients of the same names used in the fusion weight formula), and their sum is 1. These coefficients can be obtained through machine learning training on a specific dataset to achieve the best fit for the difficulty of different types of problems. Subsequently, the weighted calculation result is normalized (e.g., through min-max scaling or the Sigmoid function), mapping it to a fixed interval (e.g., 0 to 1) and yielding a standardized, comparable complexity score. A higher score indicates a more complex problem.
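As an illustrative sketch of step S322, the weighted sum can be normalized with the Sigmoid function as follows. The coefficient defaults and the `midpoint` parameter (the raw value mapped to 0.5) are assumptions made for this example; this application leaves the coefficients to training and does not fix the normalization parameters.

```python
import math


def complexity_score(num_entities, dependency_depth, domain_overlap,
                     alpha=0.4, beta=0.3, gamma=0.3, midpoint=3.0):
    """Weighted sum of the three features, squashed into (0, 1)."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    raw = (alpha * num_entities
           + beta * dependency_depth
           + gamma * domain_overlap)
    # Sigmoid normalization; `midpoint` is a hypothetical calibration constant.
    return 1.0 / (1.0 + math.exp(-(raw - midpoint)))
```

Min-max scaling would be an equally valid normalization, but it requires knowing the minimum and maximum raw values over a reference set of questions, whereas the Sigmoid needs only a calibration point.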
[0125] Through steps S321 and S322 described above, this embodiment transforms the subjective judgment of problem complexity into an objective and automated calculation based on multi-dimensional semantic features. This quantification method not only achieves a consistent assessment of problem difficulty but also provides accurate input variables for subsequent adaptive fusion decisions. By combining three key indicators, namely "entity count," "semantic dependency depth," and "domain overlap," which characterize a problem from the perspectives of information density, structural complexity, and knowledge breadth respectively, it can distinguish whether the user's question is a simple query or a complex one, thereby triggering the corresponding knowledge retrieval and fusion strategies more precisely.
[0126] Based on the above embodiments, in order to accurately determine the degree to which the user's question depends on the existing knowledge within the language model, in another embodiment provided in this application, when obtaining the judgment result on the dependence on the language model's own knowledge, the above step S320 may further include the following steps: Step S323: Calculate the relevance between the problem and the pre-trained knowledge graph, pre-training corpus, and domain fine-tuning parameters of the language model.
[0127] In this embodiment, relevance calculations are performed on the three types of core knowledge carriers of language model 30: Relevance with pre-trained knowledge graph (K1): Match entities and relations in the question with structured facts in the knowledge graph, calculate semantic matching or embedding similarity score (M1), and evaluate the question's dependence on structured fact knowledge.
[0128] Relevance with the pre-training corpus (K2): The question is treated as a query, and semantic retrieval is performed in the large-scale unstructured text corpus used for model pre-training or the likelihood value is calculated through the language model itself to obtain a relevance score (M2), which evaluates the question's dependence on general text knowledge.
[0129] Relevance to domain fine-tuning parameter (K3): By analyzing the fit between the problem features and the feature distribution represented by the parameters obtained after the model is fine-tuned in the domain, a relevance score (M3) is calculated to evaluate the dependence of the problem on specific vertical domain knowledge.
[0130] Step S324, perform weighted calculation on the relevance to obtain a dependence score.
[0131] The embodiment assigns different weights (w1, w2, w3, and w1 + w2 + w3 = 1) to the above three relevance scores (M1, M2, M3), and performs weighted summation to calculate a comprehensive dependence score (D), that is: D = w1×M1 + w2×M2 + w3×M3. The weights can be set according to the importance of the three types of knowledge in different application scenarios or obtained through data learning to ensure that the score can accurately reflect the overall dependence level.
[0132] Step S325, compare the dependence score with a preset threshold, and output a dependence judgment result based on the obtained comparison result.
[0133] The calculated comprehensive dependence score D is compared with a preset threshold (T), which is determined through training and optimization on a large amount of test data, and the corresponding classification judgment is output according to the comparison result: If D ≥ T, output the judgment result of "high dependence", indicating that the problem can be largely solved by relying on the model's own knowledge.
[0134] If D < T, output the judgment result of "low dependence", indicating that the problem exceeds the comfort zone of the model's own knowledge and external knowledge bases need to be relied on.
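Steps S324 and S325 can be sketched as follows. The weight values and the threshold of 0.5 are illustrative assumptions; this application states only that w1 + w2 + w3 = 1 and that T is determined through training and optimization.

```python
def dependence_judgment(m1, m2, m3, weights=(0.4, 0.4, 0.2), threshold=0.5):
    """Return (D, label) with D = w1*M1 + w2*M2 + w3*M3.

    m1, m2, m3: relevance to the knowledge graph, the pre-training corpus,
    and the domain fine-tuning parameters, each assumed to lie in [0, 1].
    """
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("w1 + w2 + w3 must equal 1")
    d = w1 * m1 + w2 * m2 + w3 * m3
    label = "high dependence" if d >= threshold else "low dependence"
    return d, label
```

For example, relevance scores of 0.9, 0.8, and 0.5 give D of about 0.78 and the label "high dependence" under these illustrative settings.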
[0135] Through the above steps, this embodiment replaces a fuzzy estimate of the problem's knowledge dependence with a precise calculation and classification judgment. By jointly evaluating the relevance of the problem to the model's structured knowledge (knowledge graph), unstructured knowledge (corpus), and domain adaptation knowledge (fine-tuning parameters), the embodiment constructs a comprehensive and robust evaluation system that can more finely identify problems that seem simple but actually require external verification, as well as problems that seem unfamiliar but for which the model holds sufficient implicit knowledge. This judgment acts directly on the subsequent calculation of the fusion weight (for example, actively reducing the external knowledge weight in the case of "high dependence"), effectively preventing redundant retrieval of knowledge the model has already mastered while avoiding blind trust in the model where its knowledge has blind spots, so as to balance the utilization of internal and external knowledge at the system level.
[0136] Based on the above embodiments, another embodiment provided in this application refines how the confidence level of the language model in answering the question is obtained, so that the certainty of the language model's output can be measured quantitatively and in real time through objective, computable intrinsic indicators. Accordingly, the above step S330 may further include the following steps: Step S331: While the language model generates the answer word by word, calculate the entropy of the probability distribution with which the language model predicts the next word.
[0137] When a language model generates answers in an autoregressive manner, specifically at each step (i.e., when generating each word), it calculates the probability that all candidate words in the entire vocabulary will become the next output word based on the current context, thus forming a probability distribution. This probability distribution can be captured and analyzed in real time. The entropy value of the probability distribution directly measures the degree of dispersion or uncertainty of the distribution: a highly concentrated distribution (where the probability of a word is close to 1) has a low entropy value; while a nearly uniform distribution, where choices are difficult to make, has a high entropy value.
[0138] Step S332: Determine the confidence level of the language model's answer to the question based on the entropy value; wherein, the entropy value is inversely correlated with the confidence level.
[0139] The embodiment uses the entropy value calculated in step S331 as a direct basis for determining the confidence level of the model's answer to the current question. There is a clear inverse correlation between the two: the lower the entropy value of the probability distribution, the more "certain" the model is in generating each step, and therefore the higher its overall confidence level; conversely, the higher the entropy value, the more "hesitant" and uncertain the model is in the generation process, and the lower its overall confidence level. A comprehensive confidence quantification can be obtained by statistically analyzing the entropy values at each step of the entire generation process (e.g., averaging, taking the maximum value, etc.).
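A minimal sketch of steps S331 and S332 follows. The specific mapping from entropy to confidence used here (one minus the entropy normalized by its maximum, log of the vocabulary size) is an assumption; this application requires only that the two be inversely correlated and that the per-step values be aggregated, for example by averaging or taking the maximum.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one next-word probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_from_steps(step_distributions, aggregate="mean"):
    """Confidence in [0, 1] from the per-step next-word distributions."""
    if not step_distributions:
        raise ValueError("at least one generation step is required")
    normalized = []
    for probs in step_distributions:
        max_entropy = math.log(len(probs))  # entropy of a uniform distribution
        normalized.append(entropy(probs) / max_entropy if max_entropy > 0 else 0.0)
    if aggregate == "mean":
        h = sum(normalized) / len(normalized)
    elif aggregate == "max":
        h = max(normalized)  # the most uncertain step dominates
    else:
        raise ValueError("aggregate must be 'mean' or 'max'")
    return 1.0 - h  # lower entropy gives higher confidence
```

Taking the maximum is the more conservative choice, since a single highly uncertain step lowers the confidence of the whole answer.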
[0140] This embodiment provides an objective quantification method based on information entropy that can be computed in real time. It does not rely on external feedback but directly captures the model's actual state of uncertainty through the language model's generation mechanism, specifically the probability distribution of each prediction step. This approach can distinguish whether the language model's output is certain or uncertain, providing a signal for subsequent adaptive fusion decisions. This allows the language model to draw on the external knowledge base when it is uncertain about an answer, reducing the hallucinations the language model may produce when generating responses.
[0141] Based on the above embodiments, in order to evaluate the coverage of the knowledge involved in a user's question within the existing knowledge system of the language model through quantifiable statistical methods, and thus determine the background support strength of the language model's own knowledge for the question, in another embodiment provided in this application, when obtaining the knowledge coverage of the knowledge points involved in the question within the language model's own knowledge, the above step S330 may further include the following steps: Step S333: Based on the internal knowledge index of the language model, calculate the frequency of occurrence of the knowledge points involved in the problem in the pre-training corpus of the language model.
[0142] The implementation first extracts key knowledge points from the user's question (e.g., core terms or concepts obtained through entity recognition and keyword extraction). Then, it leverages the language model's internal knowledge index (such as inverted indexes, vector indexes, etc.) built or accessible during the pre-training phase to perform rapid queries within the massive pre-training corpus previously learned by the model. The frequency of each extracted knowledge point within the entire pre-training corpus is statistically analyzed. This step correlates the core concepts of the question with the model's knowledge history, quantifying the prevalence of these concepts within the language model's own knowledge base through frequency statistics.
[0143] Step S334: Generate knowledge coverage based on occurrence frequency.
[0144] After obtaining the original frequency of each knowledge point, the implementation example standardizes and aggregates them to generate a comprehensive knowledge coverage score. Specific processing methods may include: logarithmically scaling the frequency of individual knowledge points to smooth out extreme values; aggregating the frequencies of multiple knowledge points through averaging, weighted averaging, or taking the highest value; and finally normalizing the aggregation result to a fixed numerical range (e.g., between 0 and 1). This score is the final knowledge coverage score. A higher score indicates that the knowledge points involved in the problem appear more frequently in the model's pre-training corpus, and the model may be more "familiar" with them; a lower score indicates that these knowledge points are relatively obscure or novel, possibly close to the edge of the model's knowledge base.
[0145] This embodiment enables an objective, quantitative assessment of how well the language model's own knowledge supports a question. By directly correlating the knowledge points in the question with the model's pre-training history, it provides an effective measure of the model's own knowledge. This statistics-based coverage assessment can distinguish common questions from questions in specialized, obscure, or emerging fields, allowing the system to judge in advance whether the model has knowledge blind spots. Knowledge coverage, together with the model's confidence level, forms a dual check on the model's internal reliability and provides a decision basis for dynamically increasing the weight of external knowledge and proactively introducing external evidence. Thus, when the model faces an unfamiliar knowledge domain, the system can trigger retrieval augmentation and fusion in advance, preventing hallucinations caused by insufficient knowledge and improving the accuracy of answers to marginal or specialized questions.
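As a non-limiting illustration, the standardization and aggregation of step S334 might be sketched as follows. The function name, the use of `log1p` for logarithmic scaling, and the normalization by a hypothetical `max_frequency` cap (for example, the count of the most frequent term in the index) are assumptions of this sketch; weighted averaging, which the embodiment also permits, is omitted for brevity.

```python
import math

def knowledge_coverage(frequencies, max_frequency, aggregate="mean"):
    """Turn raw corpus frequencies of knowledge points into a score in [0, 1].

    Assumptions: log1p scaling to smooth extreme values, and normalization by
    log1p(max_frequency), where max_frequency is a hypothetical upper bound.
    """
    if not frequencies:
        return 0.0
    scaled = [math.log1p(f) for f in frequencies]
    if aggregate == "mean":
        value = sum(scaled) / len(scaled)
    elif aggregate == "max":
        value = max(scaled)
    else:
        raise ValueError(f"unknown aggregate: {aggregate}")
    # Clamp in case a knowledge point exceeds the assumed cap.
    return min(1.0, value / math.log1p(max_frequency))

# Illustrative counts only: common concepts versus obscure ones.
common = knowledge_coverage([900_000, 500_000], max_frequency=1_000_000)
rare = knowledge_coverage([12, 3], max_frequency=1_000_000)
```

Under these assumptions, frequently occurring knowledge points produce a score near 1 and obscure ones a score near 0, matching the interpretation given for the coverage score.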
[0146] Based on the above embodiments, in another embodiment provided in this application, step S340 may further include the following steps: Step S341: Based on complexity assessment, confidence level, and knowledge coverage, as well as the weights corresponding to complexity assessment, confidence level, and knowledge coverage, calculate the fusion weight of the knowledge base when it is fused with the knowledge of the language model itself.
[0147] The implementation first calculates the initial weights, in which complexity score, model confidence score, and knowledge coverage score are used as three core input variables. The system assigns preset, optimizable importance weights to these three variables (e.g., α, β, and γ, where α+β+γ=1). During calculation, the complexity score is directly used as a positive driving factor; while the confidence and coverage scores are calculated using their "uncertainty" complements (i.e., "1-confidence score" and "1-coverage score") to reflect the logic that the more uncertain the model or the less knowledge it covers, the more external knowledge is needed. The system obtains a preliminary fusion weight value by adding these three weighted terms. This calculation process achieves a comprehensive quantitative consideration of problem difficulty, model confidence, and knowledge background support.
[0148] Step S342: When the dependency judgment result is high dependency, adjust the fusion weight downward, or when the dependency judgment result is low dependency, adjust the fusion weight upward to obtain the final fusion weight.
[0149] After obtaining the initial weights, a final calibration is performed by introducing another dimension from the problem analysis: the knowledge dependency assessment result. This step can execute a conditional adjustment rule: If the dependency assessment result is "high dependency", it indicates that the current problem largely falls within the scope of the model's knowledge. To avoid unnecessary and potentially noisy external searches for simple or familiar problems, the initial fusion weights can be "down-adjusted" (e.g., by subtracting a fixed offset or attenuating proportionally).
[0150] If the dependency assessment result is "low dependency," it indicates that the problem may extend beyond the model's core knowledge area. To ensure sufficient external evidence is obtained, the initial fusion weights can be "adjusted upwards" (e.g., by adding a fixed value or proportionally boosting them).
[0151] This calibration step yields the final fusion weights used to guide all subsequent operations. This mechanism ensures that the weight calculation not only reflects the real-time state but also respects the essential knowledge attributes of the problem.
[0152] This embodiment first quantifies the objective difficulty of the question and the real-time state of the model through weighted synthesis, generating a preliminary fusion tendency. It then introduces a higher-order judgment on the question's knowledge attributes (dependency) for targeted calibration, ultimately outputting a more accurate and robust weight value. This design ensures that the weight calculation responds to dynamic signals from the generation process (confidence, coverage) while also incorporating the static analysis of the question's knowledge attributes (complexity, dependency), combining dynamic and static information. This improves the accuracy and contextual adaptability of the system's decisions: unnecessary resource consumption and interference are reduced when the language model can provide an answer on its own, and external support is strengthened when the language model alone cannot provide an accurate answer. This improves system response speed and resource utilization, reduces hallucinations, and increases the accuracy of responses.
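As a non-limiting illustration, steps S341 and S342 might be sketched as follows. The function name, the default coefficient values, the fixed-offset form of the adjustment, and the clamping of the result to the range 0 to 1 are assumptions of this sketch; the embodiment equally permits proportional attenuation or boosting.

```python
def fusion_weight(complexity, confidence, coverage, dependency=None,
                  alpha=0.4, beta=0.3, gamma=0.3, adjustment=0.2):
    """Compute the knowledge-base fusion weight W.

    complexity, confidence, coverage: scores in [0, 1].
    dependency: "high", "low", or None (no calibration applied).
    alpha, beta, gamma, adjustment: illustrative defaults, not prescribed values.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    # Step S341: complexity drives W up directly; confidence and coverage
    # enter through their uncertainty complements.
    initial = (alpha * complexity
               + beta * (1.0 - confidence)
               + gamma * (1.0 - coverage))
    # Step S342: calibrate according to the dependency judgment.
    if dependency == "high":
        adjusted = initial - adjustment
    elif dependency == "low":
        adjusted = initial + adjustment
    else:
        adjusted = initial
    # Assumption: keep the final weight inside [0, 1].
    return min(1.0, max(0.0, adjusted))
```

A fixed offset is used here only because it is the simplest of the calibration options named above; a multiplicative factor could be substituted without changing the structure of the function.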
[0153] Based on the above method embodiments, in another specific embodiment provided in this application, the process of determining the corresponding knowledge fusion strategy based on the fusion weight in step S350 is clearly defined. This process maps continuously calculated fusion weight values to discrete, executable macro-strategies, and is a key decision point for achieving adaptive knowledge fusion.
[0154] Specifically, the process involves determining the relationship between the fusion weight W and a preset first threshold (e.g., 0.6), and then choosing one of two explicit strategies: In step S351, when the fusion weight is not lower than the first threshold, the knowledge base priority fusion strategy is determined to be adopted.
[0155] Alternatively, in step S352, when the fusion weight is lower than the first threshold, a fusion strategy dominated by the language model's own knowledge is determined to be adopted.
[0156] In this embodiment, when the fusion weight is determined to be no less than a first threshold (i.e., W ≥ threshold), a knowledge base-first fusion strategy is adopted. This strategy is suitable for scenarios where external knowledge is evaluated as crucial.
[0157] When the system determines that the fusion weight is below the first threshold (i.e., W < threshold), it adopts a fusion strategy dominated by the language model's own knowledge. This strategy is suitable for scenarios where the model's own knowledge is evaluated as relatively reliable and sufficient.
[0158] This embodiment provides a clear and robust transition from a continuous quantity (the fusion weight) to a discrete action (the fusion strategy). The first threshold defines an action boundary that lets the system switch decisively between two fundamentally different processing modes: one that actively introduces and deeply fuses external evidence, and one that uses the model's intrinsic knowledge as the core and external information as verification. This threshold-based binary decision mechanism makes system behavior easier to understand, control, and optimize and, more importantly, ensures that the subsequent retrieval, fusion, and generation stages operate under the guidance of the chosen strategy. Complex questions can thus draw mainly on external knowledge, while simple questions rely on the model's intrinsic knowledge, improving the accuracy of the answers and reducing hallucinations.
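As a non-limiting illustration, the threshold decision of steps S351 and S352 might be sketched as follows. The strategy identifiers and the function name are hypothetical; 0.6 is the example threshold value mentioned above.

```python
FIRST_THRESHOLD = 0.6  # example value from the description; tunable in practice

def choose_fusion_strategy(weight, threshold=FIRST_THRESHOLD):
    """Map a continuous fusion weight W to one of two discrete strategies."""
    if weight >= threshold:
        return "knowledge_base_priority"      # step S351: W not lower than threshold
    return "model_knowledge_dominant"         # step S352: W lower than threshold
```

Note that the boundary case W equal to the threshold falls on the knowledge-base-priority side, in line with the "not lower than" wording of step S351.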
[0159] Based on the above embodiments, in another embodiment provided in this application, when adjusting the strategy for obtaining retrieval information from the knowledge base based on the fusion weight, the above step S360 may further include the following steps: Step S361: When the fusion weight is not lower than the first threshold, expand the search scope to multiple related fields.
[0160] Alternatively, in step S362, when the fusion weight is lower than the first threshold, the search scope is narrowed to a single domain.
[0161] In this embodiment, when the fusion weight is determined to be no less than a first threshold (i.e., W ≥ threshold), an operation to expand the search scope can be performed. This means the search will no longer be limited to the most directly relevant single knowledge domain, but will extend to querying multiple related domains. For example, for a complex problem involving "biomechanics and materials engineering," the search will be performed simultaneously in biomedical databases and engineering literature repositories. This aims to improve the recall rate of the search, striving to broadly cover all potentially relevant information, providing the most comprehensive possible raw materials for subsequent in-depth analysis, and ensuring that no key evidence is missed in complex scenarios that heavily rely on external knowledge.
[0162] When the fusion weight is determined to be below the first threshold (i.e., W < threshold), a narrowing of the search scope can be performed. This makes the search highly focused and precise, strictly limited to a single domain that best matches the core of the question. For example, for a purely general knowledge question about "astronomical orbital periods," the search will be limited to basic science or astronomy knowledge bases. This aims to maximize the accuracy of the search, quickly and accurately locating the most authoritative and relevant limited amount of reference information. Its core purpose is to efficiently verify or supplement information, thereby minimizing the introduction of irrelevant or low-quality noise information and improving efficiency and cleanliness when the language model can already handle the question well.
[0163] This embodiment translates macroscopic fusion strategies (external priority or model-driven) directly into microscopic retrieval execution parameters (range breadth and narrowness), making the knowledge acquisition process no longer isolated or fixed, but rather an organic and controlled component of the adaptive workflow. Thus, when needed (high weight), it can provide a rich knowledge base through extensive retrieval, effectively supporting the generation of complex answers and improving accuracy; when not needed (low weight), it can avoid resource waste and information pollution through precise retrieval, improving response speed.
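As a non-limiting illustration, the scope adjustment of steps S361 and S362 might be sketched as follows. The function name and the `ranked_domains` input (a hypothetical list of knowledge domains ordered from most to least relevant to the question) are assumptions of this sketch; how domains are ranked is outside its scope.

```python
def plan_retrieval(weight, ranked_domains, threshold=0.6):
    """Return the knowledge domains to query for a given fusion weight.

    ranked_domains: hypothetical list of domain names, most relevant first.
    """
    if not ranked_domains:
        return []
    if weight >= threshold:
        # Step S361: expanded search across all related domains (favors recall).
        return list(ranked_domains)
    # Step S362: narrowed search in the single best-matching domain (favors precision).
    return [ranked_domains[0]]

wide = plan_retrieval(0.67, ["biomedicine", "mechanical engineering"])
narrow = plan_retrieval(0.14, ["astronomy", "basic science"])
```

In this sketch `wide` contains both domains and `narrow` contains only the first, mirroring the cross-disciplinary and general-knowledge cases described in this embodiment.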
[0164] Based on the above embodiments, a closed-loop optimization mechanism is proposed to enable the system to continuously improve itself. This method adds data recording and model optimization steps after completing a single question-and-answer process, allowing the system to automatically adjust its core decision parameters using historical experience. Therefore, in another embodiment provided in this application, the method may further include the following steps: Step S370: After generating the final answer, record the complexity score, dependency judgment result, confidence, knowledge coverage, fusion weight, and answer accuracy.
[0165] After a complete question-and-answer interaction is completed and a final answer is generated, the implementation does not immediately end the processing of this session. Instead, it initiates a data archiving and annotation process, which associates and records the key process data generated by each core module during the interaction with the final result. Specifically, the recorded data items include: the complexity score and dependency judgment results generated by the question analysis module 21; the confidence score and knowledge coverage output by the model state monitoring module 22; the fusion weights calculated and used by the adaptive fusion module 23; and the answer accuracy (or other quality assessment indicators) obtained after manual or automated evaluation of the answer. These data collectively constitute a complete sample of the decision trajectory from question input to result evaluation.
[0166] Step S380: Optimize the calculation parameters involved in determining the fusion weights using the recorded data.
[0167] The implementation will periodically, or after accumulating a sufficient number of decision trajectory samples, initiate an offline parameter optimization process. This process uses the historical data recorded in step S370 as training samples, and its optimization objective is to find a better set of weight coefficients (e.g., α, β, γ in the aforementioned formula, and the magnitude of knowledge dependency adjustment, etc.) so that the fusion weights calculated based on these coefficients can better predict or lead to higher answer accuracy. This is typically achieved by constructing a loss function (e.g., a function with fusion weights as intermediate variables and final accuracy as the objective) and employing machine learning optimization algorithms such as gradient descent. The optimized new parameters will be updated in the adaptive fusion module 23 to guide future fusion weight calculations, thereby enabling the system's adaptive decision-making capabilities to continuously evolve with accumulated usage experience.
[0168] This embodiment gives the retrieval-augmented system self-iterating, continuous learning capabilities. The mechanism turns individual question-and-answer interactions into data for long-term performance optimization, forming a complete closed loop from "execution" to "recording" to "optimization." The system can learn automatically from historical successes and failures, continuously fine-tuning its fusion decision model to better fit the data distribution and user needs of a specific application domain. This improves the robustness and adaptability of the system over long-term deployment and reduces the cost of manual parameter tuning during later maintenance. It allows system performance to keep improving over time, providing a sustainable means of achieving and maintaining a lower hallucination rate and higher response accuracy.
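As a non-limiting illustration, the offline optimization of step S380 might be sketched as follows. This sketch makes several assumptions that the embodiment does not prescribe: each recorded trajectory is reduced to the three inputs of the weight formula plus a hypothetical `target_weight` label (for example, one derived from the recorded answer accuracy), the loss is the mean squared error between the computed and target weights, and the coefficients are parameterized through a softmax so that α+β+γ=1 holds throughout training. Optimization of the dependency adjustment magnitude is omitted.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def mse(coeffs, samples):
    """Mean squared error between computed and target fusion weights."""
    return sum((sum(c * x for c, x in zip(coeffs, feats)) - target) ** 2
               for feats, target in samples) / len(samples)

def fit_coefficients(samples, steps=2000, lr=0.5):
    """Fit (alpha, beta, gamma) by plain gradient descent.

    samples: list of ((complexity, 1 - confidence, 1 - coverage), target_weight);
    target_weight is a hypothetical label, not a field named in step S370.
    """
    theta = [0.0, 0.0, 0.0]          # softmax(theta) starts at (1/3, 1/3, 1/3)
    n = len(samples)
    for _ in range(steps):
        coeffs = softmax(theta)
        grad = [0.0, 0.0, 0.0]
        for feats, target in samples:
            w = sum(c * x for c, x in zip(coeffs, feats))
            err = 2.0 * (w - target) / n
            for j in range(3):
                # Chain rule through the softmax: dW/dtheta_j = c_j * (x_j - W).
                grad[j] += err * coeffs[j] * (feats[j] - w)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return softmax(theta)

# Synthetic trajectories generated from known coefficients (illustration only).
true_coeffs = (0.5, 0.3, 0.2)
features = [(0.9, 0.1, 0.2), (0.2, 0.8, 0.5), (0.4, 0.4, 0.9),
            (0.7, 0.6, 0.1), (0.1, 0.3, 0.7)]
samples = [(f, sum(c * x for c, x in zip(true_coeffs, f))) for f in features]
fitted = fit_coefficients(samples)
```

The softmax parameterization is one simple way to respect the sum-to-one constraint without a projection step; other constrained optimizers would serve equally well.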
[0169] Based on the functions and embodiments of the above modules, the embodiments provided in this application can be implemented through a clear and coherent six-step process. This process summarizes the complete closed loop from receiving user questions to generating the final answer and performing self-optimization, demonstrating the system's end-to-end adaptive processing capabilities.
[0170] (1) Problem characteristics analysis and data extraction.
[0171] The system first receives a natural language question input by the user. Then, the question analysis module is activated to perform in-depth analysis of the question. This module performs two core tasks: first, through semantic parsing and weighted calculation, it outputs a complexity score that quantifies the difficulty of the question; second, by matching the question to the model's knowledge carriers and judging based on a threshold, it outputs a knowledge dependence judgment result representing the degree to which the question relies on the model's own knowledge. These two results together constitute the question characteristic data, serving as the initial input for subsequent decision-making.
[0172] (2) Real-time monitoring of model status.
[0173] Simultaneously, the system inputs the user's question into the language model and activates the model state monitoring module. This module monitors the internal dynamics of the language model in real time as it processes the question: it obtains the confidence level by calculating the probability distribution entropy during the model generation process; and it obtains the knowledge coverage score by statistically analyzing the frequency of occurrence of the question's knowledge points in the model's pre-training corpus. These two metrics together constitute the model state data, reflecting the model's real-time ability and confidence in handling the current question.
[0174] (3) Adaptive fusion decision.
[0175] The adaptive fusion module receives problem characteristic data from step (1) and model state data from step (2). Based on this multidimensional information, the module dynamically calculates the external knowledge fusion weight W using a preset, optimizable weight calculation formula (e.g., considering complexity, confidence, and coverage). The weight W is a value between 0 and 1, and its magnitude directly determines the relative importance of external knowledge in subsequent fusion. Next, the module determines the macro-level knowledge fusion strategy to be adopted (e.g., "external knowledge priority fusion" or "model knowledge-driven fusion") based on the comparison between the W value and a preset threshold (e.g., 0.6).
[0176] (4) Dynamic retrieval and information acquisition.
[0177] The knowledge base interaction module dynamically adjusts its retrieval behavior to the external knowledge base based on the fusion weight W determined in step (3). Specifically, when the W value is high, the module performs extended retrieval to improve recall; when the W value is low, the module performs precise retrieval to improve precision. Through this adaptive retrieval strategy, the system efficiently and accurately obtains the target retrieval information that best matches the current problem and decision from the knowledge base.
[0178] (5) Strategic knowledge integration and answer generation.
[0179] The adaptive fusion module performs specific knowledge integration operations according to the fusion strategy determined in step (3). It deeply fuses the target retrieval information obtained in step (4) with the knowledge of the language model itself. This process may be achieved through an improved attention mechanism (such as the architecture shown in Figure 4), which ensures that external information is effectively incorporated into the generation context according to the predetermined strategy. Ultimately, the language model generates the final answer based on this enhanced context.
[0180] (6) Data recording and system self-optimization.
[0181] After the system outputs its final answer, the process does not immediately end. As a system with learning capabilities, it stores the complete decision-making trajectory of this interaction—including question characteristic data, model state data, fusion weights W, and the answer accuracy obtained through feedback—into its log module. This accumulated historical data is used for subsequent offline analysis, continuously optimizing key parameters (such as coefficients α, β, and γ) in the weight calculation formula using machine learning methods (such as optimization algorithms). This allows the system's adaptive decision-making capabilities to continuously evolve with increased usage experience, achieving sustained performance improvements.
[0182] This embodiment systematically demonstrates how to organically integrate problem analysis, status monitoring, intelligent decision-making, adaptive retrieval, and fusion generation to form a retrieval enhancement solution that can dynamically respond to problems of different difficulties and types and has self-improvement capabilities.
[0183] As a specific implementation of the above embodiments, Figure 4 is a schematic diagram of a retrieval-augmented generation architecture provided by yet another embodiment of this application. Figure 4 illustrates the specific technical architecture for dynamically and deeply integrating external knowledge base information with the internal knowledge of the language model during the decoding stage. This architecture is an improved encoder-decoder architecture, the core of which is a cross-attention layer introduced as the fusion hub.
[0184] The workflow and core component functions of this architecture are as follows: (1) Dual-path parallel coding.
[0185] The system employs dual-path parallel processing to acquire the internal state and external knowledge separately. Internal path (left side of Figure 4): The user query first undergoes semantic understanding and encoding by the query encoder. Subsequently, the encoded representation is input to the generator's self-attention layer. This layer captures the dependencies between words within the query, further refining them into an internal attention tensor A that represents the model's current reasoning logic and context. Tensor A is a vectorized representation of the model's "thinking" process.
[0186] External path (right side of Figure 4): Relevant documents retrieved from the knowledge base are fed into the document encoder. This encoder converts the original text into a high-dimensional vector representation, forming a series of external knowledge embeddings K. The embeddings K encode the retrieved external facts and information.
[0187] (2) Cross-attention dynamic fusion.
[0188] The internal tensor A and the external knowledge embedding K converge at the cross-attention layer, which is the core of the dynamic fusion.
[0189] Similarity calculation: Using the internal tensor A as the query and the external knowledge embedding K as the key and value, a similarity matrix is calculated between the two. This essentially involves comparing the model's internal reasoning logic with external factual information item by item.
[0190] Dynamic weight generation: The similarity matrix is normalized using the Softmax function to generate a set of dynamic attention weights α. The weights α quantify the importance of each piece of external knowledge in the context of generating the current word.
[0191] Weighted knowledge fusion: The values of the external knowledge embedding K are weighted and summed using dynamic weights α. This weighted external information, which is highly relevant to the current context, is then integrated with the internal tensor A through matrix addition or linear transformation after concatenation. This step outputs a fused tensor that organically combines the semantic logic within the model with the filtered external factual knowledge.
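As a non-limiting illustration, the similarity calculation, dynamic weight generation, and weighted knowledge fusion described above might be sketched in plain Python as follows. Scaling the similarity scores by the square root of the embedding dimension and fusing by element-wise addition are assumptions of this sketch (the description also permits concatenation followed by a linear transformation); learned projection matrices are omitted.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(A, K):
    """Fuse internal tensor A (n x d) with knowledge embeddings K (m x d).

    A acts as the query; K acts as both key and value.
    Returns the fused tensor (n x d) and the attention weights (n x m).
    """
    d = len(A[0])
    fused, weights = [], []
    for a in A:
        # Similarity of this query position with every external embedding.
        scores = [sum(x * y for x, y in zip(a, k)) / math.sqrt(d) for k in K]
        alpha = softmax(scores)                      # dynamic attention weights
        # Weighted sum of the external knowledge values.
        context = [sum(w * k[j] for w, k in zip(alpha, K)) for j in range(d)]
        # Assumption: fuse by element-wise addition with the internal tensor.
        fused.append([x + c for x, c in zip(a, context)])
        weights.append(alpha)
    return fused, weights

# Toy inputs: 2 query positions, 3 retrieved embeddings, dimension 2.
A = [[1.0, 0.0], [0.0, 1.0]]
K = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attention_fuse(A, K)
```

In this toy input, the first query position aligns most closely with the first embedding in `K`, so that embedding receives the largest attention weight for that position.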
[0192] (3) Context-aware answer generation.
[0193] Finally, the generated fusion tensor is input into the decoder. Based on the information-enhanced context infused with external knowledge, the decoder progressively predicts and generates the final, high-quality answer in an autoregressive manner. This entire process ensures that external evidence is deeply and dynamically integrated into every step of text generation.
[0194] The fusion architecture shown in Figure 4 is the key underlying technology supporting the adaptive knowledge fusion strategy (especially "external knowledge priority fusion") implemented in this embodiment. It preserves information integrity through separate encoding and achieves on-demand, dynamic knowledge focusing and fusion through cross-attention during the decoding stage, forming an end-to-end, optimizable knowledge-augmented generation pipeline. This design enables the language model not only to access external knowledge but also to learn how to "reference" and "fuse" this knowledge, thereby providing model-level support for generating accurate, reliable answers with a low hallucination rate.
[0195] To illustrate the technical solutions and effects of the embodiments of this application more specifically, a typical example of handling complex cross-disciplinary problems is provided below. This embodiment demonstrates how the system generates accurate and reliable answers through a complete adaptive process when faced with highly difficult, interdisciplinary problems.
[0196] (1) Problem input.
[0197] A user inputs a complex interdisciplinary professional question through terminal 10: "How can we use the finite element analysis method in mechanical engineering, combined with the human skeletal biomechanical model in the biomedical field, to optimize the implantation angle of artificial joints?"
[0198] (2) Problem characteristics analysis.
[0199] Problem analysis module 21 processes this problem: Complexity Assessment: The module extracted 5 key entities (finite element analysis, mechanical engineering, biomedicine, human skeletal biomechanics model, and artificial joint), analyzed the semantic dependency depth to be 4 layers, and identified that the problem spans two domains, "mechanical engineering" and "biomedicine" (domain overlap of 2). After weighted calculation and normalization using a preset algorithm, the output complexity score is 0.85 (close to the maximum value of 1), indicating that this is an extremely complex problem.
[0200] Knowledge Dependency Assessment: The module calculates the matching degree between the question and the language model's own knowledge carriers (pre-trained knowledge graphs, corpora, etc.), and finds that the matching degree is low. After weighted scoring and comparison with a threshold, the output knowledge dependency label is "low dependency", indicating that the question significantly exceeds the comfort zone of the model's own knowledge and needs to rely heavily on external knowledge.
[0201] (3) Model status monitoring.
[0202] Model status monitoring module 22 monitors the processing status of language model 30 for this problem in real time: Confidence level assessment: During the model's attempt to generate an answer, the entropy of its predicted probability distribution is calculated to be 0.7 (which is relatively high). This indicates that the model has low confidence in how to answer the question and is "hesitant".
[0203] Knowledge coverage assessment: The frequency of occurrence of the core knowledge points involved in the question in the model's pre-training corpus was counted, resulting in a coverage score of 0.2 (a low level), which confirms that the knowledge involved in the question is relatively obscure or specialized relative to the model's existing experience.
[0204] (4) Adaptive fusion decision.
[0205] Adaptive fusion module 23 makes a decision based on the above information: Fusion weight calculation: Substituting the complexity score (0.85), model confidence (0.7), knowledge coverage (0.2), and preset coefficients (e.g., α=0.4, β=0.3, γ=0.3) into the dynamic weight calculation formula: W = 0.4×0.85 + 0.3×(1-0.7) + 0.3×(1-0.2) = 0.67. Since the knowledge dependency is "low dependency," there is no need to adjust the weights downwards, and the final fusion weight W is determined to be 0.67.
[0206] Fusion strategy determination: Since W = 0.67 ≥ the preset first threshold (e.g., 0.6), the module adopts the "external knowledge priority fusion" strategy.
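For reference, the arithmetic of this example can be written out term by term. The symbols C (complexity score), F (confidence term), and V (knowledge coverage) are notation introduced here for illustration only; the value 0.7 is used for the confidence term exactly as stated in this example.

```latex
\begin{aligned}
W &= \alpha\,C + \beta\,(1 - F) + \gamma\,(1 - V) \\
  &= 0.4 \times 0.85 + 0.3 \times (1 - 0.7) + 0.3 \times (1 - 0.2) \\
  &= 0.34 + 0.09 + 0.24 \\
  &= 0.67 \;\ge\; 0.6
\end{aligned}
```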
[0207] (5) Adaptive retrieval execution.
[0208] The knowledge base interaction module 24 executes an extended retrieval strategy based on the high fusion weight (W=0.67): Expand the search scope: Instead of searching in a single field, simultaneously launch queries to "Mechanical Engineering Databases" (such as the ANSYS Literature Database) and "Biomedical Databases" (such as the PubMed Orthopedics Special Issue).
[0209] Improve recall: Adjust search parameters to focus on retrieving relevant documents as comprehensively as possible. In this example, the recall rate was increased to about 90%, and 10 highly relevant search results were obtained.
[0210] (6) Knowledge integration and answer generation.
[0211] The final generation steps can be performed according to the "external knowledge priority integration" strategy: Knowledge structuring: First, the information from multiple retrieved documents is summarized and structurally broken down according to the framework of "method (finite element analysis) - model (bone mechanics) - application scenario (joint implantation)".
[0212] Deep weighted fusion: During the decoding and generation stage, this structured external knowledge is deeply weighted and fused with the language model's own contextual knowledge according to the fusion weights (approximately 67% vs 33%). Specifically, this can be achieved through the cross-attention mechanism shown in Figure 4, which enables external facts to be deeply involved in the generation logic.
[0213] Final output: Based on the fused, rich, and accurate context, Language Model 30 generated a detailed answer including specific finite element analysis steps, relevant skeletal biomechanical parameters, and implantation angle optimization formulas. Verification showed that this answer effectively integrated cross-disciplinary expertise, contained no factual hallucinations, and accurately resolved the user's complex question.
[0214] This embodiment clearly demonstrates its superior ability to handle complex, cross-domain problems that are prone to hallucinations in traditional RAG systems. By quantitatively analyzing problem complexity and knowledge dependence, and sensing model confidence and knowledge coverage in real time, it intelligently makes the decision of "heavily relying on external knowledge," thereby guiding extensive retrieval and deep knowledge fusion, thus ensuring the accuracy and reliability of the final answer. This empirically demonstrates the significant effect of the embodiment in alleviating hallucinations.
[0215] To further illustrate the adaptability and efficiency optimization of the embodiments of this application, a simple common-sense problem processing example is used for comparison and explanation below. This embodiment demonstrates how the system intelligently suppresses unnecessary complex processing and achieves efficient and accurate answers when faced with simple problems that the model has mastered.
[0216] (1) Problem input.
[0217] A user enters a basic astronomy question: "How long does it take for the Earth to revolve around the Sun once?"
[0218] (2) Problem characteristics analysis.
[0219] Problem analysis module 21 processes this problem: Complexity Assessment: The module extracts two entities (Earth and Sun), and analysis shows that the semantic dependency depth is only one layer, and the problem does not involve cross-domain knowledge (domain crossover is 0). After weighted calculation and normalization, the output complexity score is 0.1 (close to the minimum value of 0), indicating that this is an extremely simple problem.
[0220] Knowledge Dependency Assessment: The module calculates the matching degree between the question and the language model's own knowledge, finding that its matching degree with pre-trained knowledge graphs and other carriers is as high as 0.95. Based on this, the output knowledge dependency label is "high dependency," indicating that the question is entirely within the high coverage range of the model's own knowledge and theoretically requires little external support.
[0221] (3) Model status monitoring.
[0222] Model status monitoring module 22 monitors the status of language model 30 in real time: Confidence level assessment: During the model's answer generation process, the entropy of its predicted probability distribution is calculated to be 0.1 (a very low level), indicating that the model has a very high confidence in answering this common-sense question, almost "without hesitation".
[0223] Knowledge coverage assessment: The frequency of occurrence of knowledge points in the pre-training corpus was statistically analyzed, resulting in a coverage score of 0.9 (which is at a very high level), confirming that "Earth's revolution period" is an extremely common piece of knowledge in the model training data.
[0224] (4) Adaptive fusion decision.
[0225] Adaptive fusion module 23 integrates information to make decisions: Fusion weight calculation: Substituting the complexity score (0.1), model confidence (0.1), knowledge coverage (0.9), and preset coefficients into the formula, the initial weight is calculated as: W = 0.4×0.1 + 0.3×(1-0.1) + 0.3×(1-0.9) = 0.34. Since the knowledge dependency label is "high dependency," the weight is adjusted downwards by 0.2 according to the rules, and the final fusion weight W is determined to be 0.14.
[0226] Fusion strategy determination: Since W = 0.14 is below the preset first threshold (e.g., 0.6), the module adopts the "model knowledge-led fusion" strategy.
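The arithmetic in this step can be checked with a short sketch. The coefficients 0.4, 0.3, and 0.3, the 0.2 downward adjustment, and the 0.6 threshold are the values quoted in this example; the function and parameter names, the clamp to [0, 1], and the use of the same 0.2 step for the upward "low dependency" adjustment are illustrative assumptions and are not stated in the text.

```python
def fusion_weight(complexity, confidence, coverage, dependency,
                  coeffs=(0.4, 0.3, 0.3), adjustment=0.2):
    """Return the knowledge-base fusion weight W, clamped to [0, 1]."""
    a, b, c = coeffs
    # Initial weight, term by term as in the worked example.
    w = a * complexity + b * (1 - confidence) + c * (1 - coverage)
    if dependency == "high":
        w -= adjustment          # rule quoted in the example
    elif dependency == "low":
        w += adjustment          # assumed to be symmetric; not given in the text
    return min(1.0, max(0.0, w))  # clamp is an assumption


def fusion_strategy(w, first_threshold=0.6):
    """Map the continuous weight to one of the two fusion strategies."""
    if w >= first_threshold:
        return "knowledge-base-priority"
    return "model-knowledge-led"


# 0.1, 0.1 and 0.9 are the values the example substitutes into the formula.
w = fusion_weight(complexity=0.1, confidence=0.1, coverage=0.9, dependency="high")
print(round(w, 2), fusion_strategy(w))  # prints: 0.14 model-knowledge-led
```

Rounding is applied only for display, because the intermediate products are not exactly representable in binary floating point.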
[0227] (5) Adaptive retrieval execution.
[0228] The knowledge base interaction module 24 executes a precision retrieval strategy based on a low fusion weight (W=0.14): Narrow the search scope: Search only within highly relevant single-domain knowledge sources such as the "Astronomical Common Sense Database".
[0229] Improve accuracy: Adjust search parameters to obtain the most accurate and authoritative answers. In this example, the search accuracy was improved to about 99%, and only two highly consistent search results were obtained (both containing "about 365 days").
[0230] (6) Knowledge integration and answer generation.
[0231] The system executes the generation steps according to the "model knowledge-led fusion" strategy. Model-led generation: Language model 30, based on its own knowledge, directly generates the preliminary answer "365 days 5 hours 48 minutes 46 seconds".
[0232] External knowledge verification: The system performs a quick consistency check between the retrieved concise information ("approximately 365 days") and the detailed answer generated by the model itself. The check passes because the two are completely consistent on the core fact (approximately 365 days).
[0233] Final output: The system retains the more accurate and detailed answer generated by the model itself as the final output. This process avoids unnecessary deep queries and complex fusion calculations to external knowledge bases, introduces no retrieval noise, and achieves 100% accuracy.
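The lightweight verification in step (6) can be pictured as comparing the leading quantity in each text. The following is a minimal sketch only: the `leading_days` helper, the regular expression, and the 1% relative tolerance are hypothetical, since the text does not specify how the consistency check is implemented.

```python
import re


def leading_days(text):
    """Extract the first 'N days' quantity from text, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*days?", text)
    return float(match.group(1)) if match else None


def consistent(model_answer, retrieved, rel_tol=0.01):
    """True if every retrieved snippet agrees with the model on the day count."""
    answer_days = leading_days(model_answer)
    snippet_days = [leading_days(snippet) for snippet in retrieved]
    if answer_days is None or not snippet_days or None in snippet_days:
        return False
    return all(abs(answer_days - d) <= rel_tol * d for d in snippet_days)


model_answer = "365 days 5 hours 48 minutes 46 seconds"
retrieved = ["about 365 days", "about 365 days"]

# Keep the model's more detailed answer only when the check passes.
final_answer = model_answer if consistent(model_answer, retrieved) else None
```

In this sketch a failed check returns `None`, standing in for whatever fallback the system would apply; the fallback itself is not described in the example.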
[0234] This embodiment contrasts sharply with the previous one, demonstrating the resource optimization and efficiency improvement capabilities of this application's embodiment when handling simple, highly deterministic common-sense questions. By accurately identifying the low complexity and high knowledge dependence of the question, and combining this with the model's own high confidence and high knowledge coverage, the system decides to "prioritize model knowledge and perform only lightweight verification." This significantly reduces unnecessary retrieval and computational overhead, improving system response efficiency while ensuring answer accuracy. It exemplifies the adaptive principle that simple questions rely on the model's own knowledge, and it avoids the wasted effort or retrieval noise that a traditional fixed process may introduce.
[0235] Where the functional modules are divided according to their corresponding functions, the embodiments of this application provide a retrieval-augmented generation device, which can be a server, a terminal, or a chip applied to a server. Figure 5 is a schematic block diagram of the functional modules of a retrieval-augmented generation device provided in an exemplary embodiment of this application. As shown in Figure 5, the retrieval-augmented generation device includes: a question receiving module 51, used to receive a question input by the user; a problem analysis module 52, used to obtain a complexity score for the question and a judgment result on its dependence on the language model's own knowledge, wherein the complexity score is used to quantify the difficulty of the question, and the dependence judgment result indicates the degree to which the question depends on the language model's own knowledge; a model state monitoring module 53, used to input the question into the language model and obtain the confidence of the language model in answering the question and the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question; an adaptive fusion module 54, used to determine, based on the complexity score, the dependence judgment result, the confidence, and the knowledge coverage, the fusion weight of the knowledge base when it is fused with the language model's own knowledge, and to determine the corresponding knowledge fusion strategy based on the fusion weight; and a knowledge base interaction module 55, used to adjust, based on the fusion weight, the strategy for obtaining retrieval information from the knowledge base, so as to obtain the target retrieval information. The adaptive fusion module 54 is also used to fuse the target retrieval information with the language model's own knowledge based on the knowledge fusion strategy to generate the final answer to the question.
[0236] This embodiment can achieve intelligent perception of user questions, real-time evaluation of model status, dynamic decision-making on fusion strategies, and accurate utilization of external knowledge, thereby improving the accuracy and reliability of answers in the retrieval-augmented generation process and effectively alleviating the hallucinations that the language model produces when generating answers.
[0237] In one possible implementation, the problem analysis module 52 is further used for: extracting the number of entities, the semantic dependency depth, and the domain overlap of the question; and obtaining the complexity score by weighting the number of entities, the semantic dependency depth, and the domain overlap and then normalizing the result.
[0238] This embodiment transforms the subjective judgment of problem complexity into an objective and automated calculation based on multi-dimensional semantic features. This quantification method not only achieves a consistent assessment of problem difficulty but also provides accurate input variables for subsequent adaptive fusion decisions. By fusing three key indicators that characterize problems from the perspectives of information density, structural complexity, and knowledge breadth—"number of entities," "semantic dependency depth," and "domain overlap"—it can accurately distinguish whether the user's input problem is a simple query or a complex one, thereby triggering the corresponding knowledge retrieval and fusion strategies more precisely.
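The weighting and normalization described above can be sketched as follows. The weights, the caps `max_entities` and `max_depth`, and the assumption that domain overlap already lies in [0, 1] are all hypothetical choices made for illustration; the text does not disclose the actual constants.

```python
def complexity_score(num_entities, dependency_depth, domain_overlap,
                     weights=(0.3, 0.3, 0.4), max_entities=10, max_depth=8):
    """Weighted, normalized complexity score in [0, 1].

    All default constants are illustrative assumptions.
    """
    # Scale each feature to [0, 1] before weighting.
    entities = min(num_entities, max_entities) / max_entities
    depth = min(dependency_depth, max_depth) / max_depth
    overlap = min(max(domain_overlap, 0.0), 1.0)

    w_entities, w_depth, w_overlap = weights
    weighted = w_entities * entities + w_depth * depth + w_overlap * overlap
    # Divide by the weight sum so the result stays in [0, 1].
    return weighted / (w_entities + w_depth + w_overlap)


# A short single-domain query versus a multi-entity cross-domain one.
simple = complexity_score(num_entities=2, dependency_depth=1, domain_overlap=0.0)
hard = complexity_score(num_entities=7, dependency_depth=5, domain_overlap=0.8)
```

Capping the raw counts keeps a single very long question from saturating the score, which is one reason to normalize per feature rather than only at the end.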
[0239] In one possible implementation, the problem analysis module 52 is further used for: calculating the relevance of the question to the pre-trained knowledge graph, the pre-training corpus, and the domain fine-tuning parameters of the language model; weighting the relevance values to obtain a dependency score; and comparing the dependency score with a preset threshold and outputting the dependency judgment result based on the comparison result.
[0240] This implementation transforms the assessment of problem knowledge dependence from fuzzy estimation to precise calculation and classification. The example constructs a comprehensive and robust evaluation system by comprehensively assessing the relevance of the problem to the model's structured knowledge (knowledge graph), unstructured knowledge (corpus), and domain-adaptive knowledge (fine-tuning parameters). This system can more precisely identify problems that appear simple but actually require external verification, as well as problems that seem unfamiliar but have sufficient implicit knowledge within the model. This judgment can directly influence the calculation of subsequent fusion weights (e.g., proactively reducing the weight of external knowledge when there is "high dependence"), effectively preventing excessive redundant retrieval of knowledge already possessed by the model, while also avoiding blind confidence in the model's knowledge blind spots, thus accurately balancing the utilization of internal and external knowledge at the system level.
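A minimal sketch of this weight-then-threshold classification is given below. The three relevance inputs are assumed to be precomputed values in [0, 1]; the weights and the 0.7 threshold are hypothetical, as the text specifies only that a preset threshold is used.

```python
def dependency_judgment(kg_relevance, corpus_relevance, finetune_relevance,
                        weights=(0.4, 0.4, 0.2), threshold=0.7):
    """Return (dependency score, label) for a question.

    weights and threshold are illustrative assumptions.
    """
    w_kg, w_corpus, w_ft = weights
    score = (w_kg * kg_relevance
             + w_corpus * corpus_relevance
             + w_ft * finetune_relevance)
    label = "high dependency" if score >= threshold else "low dependency"
    return score, label


# Relevance values here are made up for demonstration.
score, label = dependency_judgment(0.95, 0.95, 0.95)
```

The returned label is what a downstream weight calculation would consume, for example to lower the external-knowledge weight when the label is "high dependency".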
[0241] In one possible implementation, the model state monitoring module 53 is further used for: calculating, while the language model generates the answer word by word, the entropy of the probability distribution with which the language model predicts the next word; and determining the confidence of the language model in answering the question based on the entropy, wherein the entropy is inversely correlated with the confidence.
[0242] This embodiment does not rely on external feedback, but directly captures the model's actual uncertainty through the language model's generation mechanism, namely the probability distribution of each prediction step. This approach can distinguish whether the language model's output is certain or uncertain, providing a signal for subsequent adaptive fusion decisions. This allows the language model to draw on an external knowledge base when its output is uncertain, reducing the hallucinations the language model may produce when generating responses.
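One way to turn per-step entropy into a confidence value is sketched below. The text states only that entropy and confidence are inversely correlated; normalizing by the logarithm of the vocabulary size and taking one minus the mean normalized entropy are assumptions made here for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-word probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def confidence_from_steps(step_distributions):
    """Map mean normalized entropy over all steps to a confidence in [0, 1].

    Assumption: confidence = 1 - mean(entropy / log(vocabulary size)).
    """
    if not step_distributions:
        raise ValueError("at least one generation step is required")
    normalized = []
    for probs in step_distributions:
        max_entropy = math.log(len(probs))  # entropy of a uniform distribution
        ratio = token_entropy(probs) / max_entropy if max_entropy > 0 else 0.0
        normalized.append(ratio)
    return 1.0 - sum(normalized) / len(normalized)


# Toy four-word vocabulary: a peaked step and a uniform step.
peaked = confidence_from_steps([[0.97, 0.01, 0.01, 0.01]])
uniform = confidence_from_steps([[0.25, 0.25, 0.25, 0.25]])
```

A peaked distribution yields a confidence near 1, and a uniform distribution yields a confidence near 0, which matches the inverse relationship described above.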
[0243] In one possible implementation, the model state monitoring module 53 is further used for: counting, based on the internal knowledge index of the language model, the frequency with which the knowledge points involved in the question occur in the pre-training corpus of the language model; and generating the knowledge coverage based on the frequency of occurrence.
[0244] This embodiment enables an objective and quantitative assessment of the knowledge support provided by the language model. By directly and efficiently correlating the knowledge points in the question with the model's pre-training history, it provides an effective means of measuring the model's own knowledge. This statistically based coverage assessment can distinguish common questions from questions in specialized, obscure, or emerging fields, allowing the system to judge in advance whether the model has knowledge blind spots. This knowledge coverage, together with the model's confidence level, constitutes a dual check on the model's internal reliability. It provides a decision-making basis for dynamically increasing the weight of external knowledge and proactively introducing external evidence. Thus, when the model faces unfamiliar knowledge domains, the system can strengthen retrieval and fusion in advance, preventing hallucinations caused by insufficient knowledge and improving the accuracy of answers to marginal or specialized questions.
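The mapping from occurrence frequency to a coverage score is not specified in the text. The sketch below assumes a hypothetical dictionary of corpus counts standing in for the internal knowledge index, log-scales each count, saturates it at an assumed ceiling, and averages the results.

```python
import math


def knowledge_coverage(term_counts, saturation=1_000_000):
    """Map corpus frequencies of a question's knowledge points to [0, 1].

    term_counts: {knowledge point: occurrences in the pre-training corpus}.
    The log scaling and the saturation ceiling are illustrative assumptions.
    """
    if not term_counts:
        return 0.0
    ceiling = math.log1p(saturation)
    scores = [min(1.0, math.log1p(count) / ceiling)
              for count in term_counts.values()]
    return sum(scores) / len(scores)


# Counts below are invented for demonstration only.
common = knowledge_coverage({"Earth": 5_000_000, "orbital period": 800_000})
obscure = knowledge_coverage({"rare implant alloy": 3})
```

Log scaling keeps extremely frequent terms from dominating, so the score mainly separates well-attested knowledge points from rarely seen ones.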
[0245] In one possible implementation, the adaptive fusion module 54 is further used for: calculating, based on the complexity score, the confidence, and the knowledge coverage, together with the weights corresponding to each of them, the fusion weight of the knowledge base when it is fused with the language model's own knowledge; and adjusting the fusion weight downward when the dependency judgment result is high dependency, or adjusting the fusion weight upward when the dependency judgment result is low dependency, to obtain the final fusion weight.
[0246] This embodiment first quantifies the objective difficulty of the question and the real-time state of the model through weighted synthesis, generating a preliminary fusion tendency. Then, it introduces a higher-order judgment (dependency) on the question's knowledge attributes for targeted calibration, ultimately outputting a more accurate and robust weight value. This design ensures that the weight calculation responds to dynamic signals (confidence, coverage) during generation while also incorporating the static analysis of the question's attributes (complexity, dependency), combining dynamic and static information. This improves the accuracy and contextual adaptability of the system's decisions: unnecessary resource consumption and interference are reduced when the language model can provide an answer, and external support is strengthened when the language model cannot provide an accurate answer. The result is faster system response, better resource utilization, fewer hallucinations, and more accurate answers.
[0247] In one possible implementation, the adaptive fusion module 54 is further used for: adopting a knowledge-base-priority fusion strategy when the fusion weight is not lower than the first threshold; or adopting a fusion strategy led by the language model's own knowledge when the fusion weight is lower than the first threshold.
[0248] In one possible implementation, adjusting the strategy for obtaining retrieval information from the knowledge base based on the fusion weight includes: expanding the search scope to multiple related fields when the fusion weight is not lower than the first threshold; or narrowing the search scope to a single domain when the fusion weight is lower than the first threshold.
[0249] This embodiment achieves a clear and robust decision-making transition from a continuous quantity (the fusion weight) to discrete actions (fusion strategies). Setting a first threshold essentially defines an action boundary, enabling the system to switch decisively between two fundamentally different processing modes: one actively introducing and deeply fusing external evidence, and the other using the model's intrinsic knowledge as the core and external information as verification. This binary decision-making mechanism based on a clear threshold not only makes system behavior easier to understand, control, and optimize, but more importantly, it ensures that subsequent retrieval, fusion, and generation stages operate under the guidance of this strategy. Complex questions thus rely mainly on external knowledge, while simple questions rely on the model's intrinsic knowledge, thereby improving the accuracy of the answers and reducing hallucinations.
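The threshold-driven switch between the two retrieval modes can be sketched as a small planning function. The `RetrievalPlan` type, the strategy labels, and the `top_k` values are hypothetical; only the comparison against the first threshold and the widening or narrowing of the search scope come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPlan:
    strategy: str
    domains: tuple
    top_k: int  # number of results to fetch; values below are assumptions


def plan_retrieval(fusion_weight, primary_domain, related_domains,
                   first_threshold=0.6):
    """Choose fusion strategy and search scope from the fusion weight."""
    if fusion_weight >= first_threshold:
        # Not lower than the threshold: widen to multiple related fields.
        return RetrievalPlan("knowledge-base-priority",
                             (primary_domain, *related_domains), top_k=10)
    # Lower than the threshold: narrow to a single domain.
    return RetrievalPlan("model-knowledge-led", (primary_domain,), top_k=2)


narrow = plan_retrieval(0.14, "astronomy", ["physics", "geography"])
wide = plan_retrieval(0.85, "biomechanics", ["materials", "orthopedics"])
```

Returning an immutable plan object keeps the decision in one place, so the retrieval and generation stages read the same strategy rather than re-deriving it from the weight.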
[0250] In one possible implementation, the device further includes an optimization module, specifically used for: After generating the final answer, record the complexity score, dependency judgment result, confidence, knowledge coverage, fusion weight, and answer accuracy. The recorded data is used to optimize the calculation parameters involved in determining the fusion weights.
[0251] This embodiment gives the retrieval-augmented generation system self-iterative and continuous learning capabilities. This mechanism turns individual, static question-and-answer interactions into data for long-term system performance optimization, forming a complete closed loop from "execution" to "recording" to "optimization." The system can learn from historical successes and failures, continuously fine-tuning its fusion decision model to better adapt to the data distribution and user needs of specific application domains. This not only improves the robustness and adaptability of the system in long-term deployment but also reduces the cost of manual parameter tuning during later maintenance. It allows system performance to keep improving over time, supporting a lower hallucination rate and higher response accuracy.
[0252] This application also provides a computing device, including: at least one processor; and a memory for storing instructions executable by the at least one processor; wherein the at least one processor is configured to execute the instructions to implement the method disclosed in the embodiments of this application.
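The recording half of this closed loop can be sketched as follows. How the recorded data is used to update the calculation parameters is not specified in the text, so this sketch covers only the record structure, a JSON Lines serialization, and a per-strategy accuracy summary that a tuning step could consume. All names and the 0.6 threshold default are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean


@dataclass
class InteractionRecord:
    complexity: float
    dependency: str
    confidence: float
    coverage: float
    fusion_weight: float
    accuracy: float  # e.g. 1.0 if the answer was verified correct, else 0.0


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in records)


def accuracy_by_strategy(records, first_threshold=0.6):
    """Mean recorded accuracy per fusion strategy (None if no records)."""
    buckets = {"knowledge-base-priority": [], "model-knowledge-led": []}
    for record in records:
        key = ("knowledge-base-priority"
               if record.fusion_weight >= first_threshold
               else "model-knowledge-led")
        buckets[key].append(record.accuracy)
    return {key: (mean(values) if values else None)
            for key, values in buckets.items()}


# One record built from the simple astronomy example above.
log = [InteractionRecord(0.1, "high dependency", 0.1, 0.9, 0.14, 1.0)]
summary = accuracy_by_strategy(log)
```

A persistently low accuracy in one bucket would be a signal to revisit the coefficients or the threshold, though the update rule itself is left open here.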
[0253] The aforementioned processor can also be called a central processing unit (CPU), which can be an integrated circuit chip with signal processing capabilities. Each step in the method disclosed in this application can be implemented by integrated logic circuits in the processor's hardware or by software instructions. The aforementioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this application can be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules can be located in memory, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.
[0254] Furthermore, the computing device can specifically be a server. Where the various operations / processes according to the embodiments of this application are implemented via software and / or firmware, the programs constituting the software can be installed from a storage medium or a network onto a server with a dedicated hardware architecture, such as the server 1900 shown in Figure 6. When the various programs are installed, the server is able to perform various functions, including those mentioned above. Figure 6 is a structural block diagram of a server provided in an exemplary embodiment of this application.
[0255] Server 1900 is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, mainframe computers, and other suitable computers. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the embodiments described and / or claimed herein.
[0256] As shown in Figure 6, server 1900 includes a computing unit 1901, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 1902 or loaded into random access memory (RAM) 1903 from storage unit 1908. RAM 1903 may also store various programs and data required for the operation of server 1900. Server 1900 also includes a GPU 1910. Computing unit 1901, ROM 1902, GPU 1910, and RAM 1903 are interconnected via bus 1904. Input / output (I / O) interface 1905 is also connected to bus 1904. Server 1900 may include multiple GPUs 1910.
[0257] Multiple components in server 1900 are connected to I / O interface 1905, including: input unit 1906, output unit 1907, storage unit 1908, and communication unit 1909. Input unit 1906 can be any type of device capable of inputting information to server 1900. Input unit 1906 can receive input numeric or character information and generate key signal inputs related to user settings and / or function control of the server. Output unit 1907 can be any type of device capable of presenting information and may include, but is not limited to, a monitor, speaker, video / audio output terminal, vibrator, and / or printer. Storage unit 1908 may include, but is not limited to, disks and optical discs. Communication unit 1909 allows server 1900 to exchange information / data with other devices via a network such as the Internet, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and / or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and / or the like.
[0258] The computing unit 1901 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1901 performs the various methods and processes described above. For example, in some embodiments, the methods disclosed in the embodiments of this application can be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as storage unit 1908. In some embodiments, part or all of the computer program can be loaded and / or installed on a server via ROM 1902 and / or communication unit 1909. In some embodiments, the computing unit 1901 can be configured to perform the methods disclosed in the embodiments of this application by any other suitable means (e.g., by means of firmware).
[0259] This application also provides a computer-readable storage medium, wherein when the instructions in the computer-readable storage medium are executed by the processor of a server, the server is able to perform the methods disclosed in the embodiments of this application.
[0260] The computer-readable storage medium in this application embodiment may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. The aforementioned computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specifically, the aforementioned computer-readable storage medium may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
[0261] The aforementioned computer-readable medium may be included in the aforementioned server; or it may exist independently and not assembled into the server.
[0262] This application also provides a computer program product, including a computer program, wherein the computer program, when executed by a processor, implements the methods disclosed in the embodiments of this application.
[0263] In embodiments of this application, computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof. These programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network (including a local area network (LAN) or a wide area network (WAN)), or it can be connected to an external computer.
[0264] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0265] The modules, components, or units described in the embodiments of this application can be implemented in software or hardware. The names of the modules, components, or units do not necessarily constitute a limitation on the module, component, or unit itself.
[0266] The functions described above in this document can be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.
[0267] The above description is merely an embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure in this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described concept. One example is a technical solution formed by replacing the above features with technical features that have similar functions, including but not limited to those disclosed in this application.
[0268] While specific embodiments of this application have been described in detail by way of examples, those skilled in the art should understand that the above examples are for illustrative purposes only and are not intended to limit the scope of this application. Those skilled in the art should understand that modifications can be made to the above embodiments without departing from the scope and spirit of this application. The scope of this application is defined by the appended claims.
Claims
1. A retrieval-augmented generation method, characterized in that the method includes: receiving a question input by a user; obtaining a complexity score for the question and a judgment result on the question's dependence on a language model's own knowledge, wherein the complexity score is used to quantify the difficulty of the question, and the dependence judgment result indicates the degree to which the question depends on the language model's own knowledge; inputting the question into the language model, and obtaining the confidence of the language model in answering the question and the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question; determining, based on the complexity score, the dependence judgment result, the confidence, and the knowledge coverage, the fusion weight of a knowledge base when it is fused with the language model's own knowledge; determining the corresponding knowledge fusion strategy based on the fusion weight; adjusting, based on the fusion weight, the strategy for obtaining retrieval information from the knowledge base, so as to obtain target retrieval information; and fusing, based on the knowledge fusion strategy, the target retrieval information with the language model's own knowledge to generate the final answer to the question.
2. The method according to claim 1, characterized in that obtaining the complexity score for the question includes: extracting the number of entities, the semantic dependency depth, and the domain overlap of the question; and obtaining the complexity score by weighting the number of entities, the semantic dependency depth, and the domain overlap and then normalizing the result.
3. The method according to claim 1, characterized in that obtaining the judgment result on the dependence on the language model's own knowledge includes: calculating the relevance of the question to the pre-trained knowledge graph, the pre-training corpus, and the domain fine-tuning parameters of the language model; weighting the relevance values to obtain a dependency score; and comparing the dependency score with a preset threshold and outputting the dependence judgment result based on the comparison result.
4. The method according to claim 1, characterized in that obtaining the confidence of the language model in answering the question includes: calculating, while the language model generates the answer word by word, the entropy of the probability distribution with which the language model predicts the next word; and determining the confidence of the language model in answering the question based on the entropy, wherein the entropy is inversely correlated with the confidence.
5. The method according to claim 1, characterized in that obtaining the knowledge coverage, within the language model's own knowledge, of the knowledge points involved in the question includes: counting, based on the internal knowledge index of the language model, the frequency with which the knowledge points involved in the question occur in the pre-training corpus of the language model; and generating the knowledge coverage based on the frequency of occurrence.
6. The method according to claim 1, characterized in that determining the fusion weight of the knowledge base when it is fused with the language model's own knowledge includes: calculating, based on the complexity score, the confidence, and the knowledge coverage, together with the weights corresponding to each of them, the fusion weight of the knowledge base when it is fused with the language model's own knowledge; and adjusting the fusion weight downward when the dependence judgment result is high dependency, or adjusting the fusion weight upward when the dependence judgment result is low dependency, to obtain the final fusion weight.
7. The method according to claim 1, characterized in that determining the corresponding knowledge fusion strategy based on the fusion weight includes: determining to adopt a knowledge-base-priority fusion strategy when the fusion weight is not lower than a first threshold; or determining to adopt a fusion strategy led by the language model's own knowledge when the fusion weight is lower than the first threshold.
8. The method according to claim 7, characterized in that adjusting, based on the fusion weight, the strategy for obtaining retrieval information from the knowledge base includes: expanding the search scope to multiple related fields when the fusion weight is not lower than the first threshold; or narrowing the search scope to a single domain when the fusion weight is lower than the first threshold.
9. The method according to claim 1, characterized in that the method further includes: after generating the final answer, recording the complexity score, the dependence judgment result, the confidence, the knowledge coverage, the fusion weight, and the answer accuracy; and using the recorded data to optimize the calculation parameters involved in determining the fusion weight.
10. A computing device, characterized in that it includes: at least one processor; and a memory for storing instructions executable by the at least one processor; wherein the at least one processor is configured to execute the instructions to implement the method as described in any one of claims 1-9.