Dual-path generation method, medium and system based on dynamic updating of weight coefficients
By dynamically updating weight coefficients and selectively calling the retrieval or reasoning module, the problem of unreasonable resource allocation in complex task scenarios is solved, and efficient and accurate query processing is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUNAN AGRI UNIV
- Filing Date
- 2025-11-11
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies struggle to balance time latency and accuracy in complex task scenarios, and unreasonable resource allocation leads to low query efficiency and insufficient accuracy.
By acquiring the characteristic parameters of user queries, dynamically updating weight coefficients, and selectively calling the retrieval thinking module or the reasoning thinking module, resources can be rationally allocated and optimized.
This improves the efficiency and accuracy of query processing, ensures rational allocation of resources, and enhances the user experience.
Figure CN121998073B_ABST
Description
Technical Field
[0001] This invention relates to the field of information retrieval technology, and in particular to a dual-path generation method, medium, and system based on dynamic updating of weight coefficients.
Background Technology
[0002] Techniques such as retrieval-augmented generation (RAG) have made natural-language question-and-answer interaction the mainstream mode. However, this technology faces several core challenges in application. Firstly, user queries commonly exhibit colloquialisms, vague descriptions, and contextual ambiguity. Especially in complex task scenarios, such as those requiring multi-round information integration, deep logical analysis, or domain-specific expertise, traditional RAG systems struggle to accurately capture user intent, so that retrieval results deviate from actual needs and the professionalism and comprehensiveness of the generated answers are significantly limited. Secondly, while widely adopted query rewriting techniques can optimize the input expression through semantic refinement, existing methods largely rely on a single rewriting strategy. They can neither effectively parse the multi-layered characteristics of user semantics nor balance multi-dimensional objectives. As a result, the accuracy rate in solving complex problems has long remained below 70%, and the generated results often exhibit logical breaks or knowledge conflicts.
[0003] In this context, a reasoning module emerged. Based on the human reasoning process, it creates multiple intelligent agents and combines various thinking templates, such as lateral thinking, sequential thinking, critical thinking, and integrative thinking, to stimulate the potential thinking abilities of language models or other intelligent systems, thereby solving reasoning problems; this module greatly improves the accuracy of solving complex problems. However, when faced with simple problems, it reduces problem-solving efficiency.
[0004] Balancing time latency against answer accuracy, and selectively employing the two approaches, has been a hot topic and a challenge in recent years. Chinese patent application number 202510092762.7 discloses an answer generation method, a knowledge question-answering system, and an electronic device. While it discloses a technical solution that selects the corresponding retrieval and answer generation strategies based on the complexity level of the query question, it does not consider the current operational status of each generation module. It therefore cannot prevent a particular generation module from being overloaded when an excessive number of queries of a certain complexity arrive. How to balance resource allocation among different methods to improve query efficiency while reducing resource waste, and how to address the issues of unreasonable dynamic resource allocation, low query efficiency, and insufficient accuracy in complex task scenarios in existing technologies, are urgent technical problems to be solved in this field.
Summary of the Invention
[0005] Based on this, the purpose of this application is to provide a dual-path generation method, medium, and system with dynamically updated weight coefficients to solve at least one of the technical problems mentioned in the background art.
[0006] In a first aspect, this application provides a dual-path generation method based on dynamically updated weight coefficients, including:
[0007] Obtain the user's current query question and extract the feature parameters of the query question;
[0008] Obtain the feature indicators of each feature parameter, and / or the operation indicators of the retrieval thinking module and the reasoning thinking module, in order to dynamically determine the weight coefficients of each feature parameter;
[0009] The difficulty coefficient of the query problem is determined based on each feature parameter and its corresponding weight coefficient.
[0010] Based on the difficulty coefficient, selectively invoke the retrieval thinking module or the reasoning thinking module to output the query results.
[0011] Furthermore, the steps of obtaining the user's current query question and extracting the feature parameters of the query question include:
[0012] Several standard question templates are obtained, and each standard question template and query question are encoded into an embedding representation to obtain several template embedding vectors and query embedding vectors; then the cosine similarity between each template embedding vector and query embedding vector is obtained, and the variance of each cosine similarity is calculated to obtain the semantic dispersion.
[0013] Extract the statement analysis information of the query question, and obtain the number of child nodes corresponding to each sentence in the query question based on the statement analysis information, and use the maximum number of child nodes corresponding to each sentence as the syntactic complexity.
[0014] Segment the query question to obtain several target words, and match them against a set of technical terms to identify the technical terms among them. Count the number of technical terms and the number of target words, and take the ratio between the two as the domain specificity.
[0015] Furthermore, the step of obtaining the feature indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, in order to dynamically determine the weight coefficients of each feature parameter, also includes:
[0016] Obtain the volatility of each feature parameter within a set time period; determine the baseline weight of each feature parameter based on its volatility and current weight coefficient.
[0017] Obtain the load balance between modules and determine the additional weights of each characteristic parameter;
[0018] Based on the baseline weight and additional weight of each feature parameter, the updated weight coefficients of each feature parameter are obtained.
[0019] Furthermore, the step of determining the baseline weight of each feature parameter based on its volatility and current weight coefficient includes:
[0020] Sum the volatilities of all feature parameters to obtain the total volatility;
[0021] Based on the ratio between the volatility of the current feature parameter and the total volatility, determine the volatility factor of the current feature parameter, and from it obtain the inverse volatility factor of the current feature parameter;
[0022] Multiply the current weight coefficient of the current feature parameter by its inverse volatility factor to obtain the baseline weight of the current feature parameter.
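The baseline-weight steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the volatility is taken as the variance of each feature's recent values (stated elsewhere in the description as the preferred choice), and the inverse volatility factor is assumed to be the complement `1 - factor`, which the text does not spell out. The function name, feature names, and sample values are hypothetical.

```python
from statistics import pvariance


def baseline_weights(history, current_weights):
    """Scale each current weight by an inverse volatility factor.

    history: feature name -> values observed within the set time period.
    current_weights: feature name -> current weight coefficient.
    """
    # Volatility of each feature parameter, taken here as the population variance.
    volatility = {name: pvariance(values) for name, values in history.items()}
    total = sum(volatility.values())

    baseline = {}
    for name, weight in current_weights.items():
        # Volatility factor: this feature's share of the total volatility.
        factor = volatility[name] / total if total > 0 else 0.0
        # ASSUMPTION: the inverse volatility factor is the complement 1 - factor.
        baseline[name] = weight * (1.0 - factor)
    return baseline


# Hypothetical sample data: one stable feature and one fluctuating feature.
history = {
    "semantic_var": [0.10, 0.10, 0.10, 0.10],
    "domain_spec": [0.0, 0.2, 0.0, 0.2],
}
weights = {"semantic_var": 0.5, "domain_spec": 0.5}
print(baseline_weights(history, weights))
```

Under this assumption a feature that fluctuates strongly within the period has its weight damped, while a stable feature keeps most of its current weight.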
[0023] Furthermore, the steps of obtaining the load balancing degree between modules and determining the additional weights of each characteristic parameter also include:
[0024] Obtain the current load of each module and determine the load difference between modules;
[0025] Obtain the time taken by each module to output query results, and determine the latency difference between modules;
[0026] The load balance is determined based on the load difference and the latency difference.
[0027] Set the adjustment coefficient for each module, calculate the product of the adjustment coefficient and the load balance of each module, and obtain the additional weight of each characteristic parameter.
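One possible reading of the load-balance steps above is sketched below. The text does not give the formulas, so everything numeric here is an assumption: the load and latency differences are expressed as relative gaps between the reasoning and retrieval modules, the load balance is their plain average, and one adjustment coefficient is assigned per feature parameter. All names are hypothetical.

```python
def relative_gap(retrieval_value, reasoning_value):
    """Signed gap in [-1, 1]; positive when the reasoning module is busier or slower."""
    total = retrieval_value + reasoning_value
    return (reasoning_value - retrieval_value) / total if total > 0 else 0.0


def load_balance(load_retrieval, load_reasoning, latency_retrieval, latency_reasoning):
    """Combine the load difference and the latency difference into one score."""
    load_diff = relative_gap(load_retrieval, load_reasoning)
    latency_diff = relative_gap(latency_retrieval, latency_reasoning)
    # ASSUMPTION: equal weighting of the two differences.
    return 0.5 * load_diff + 0.5 * latency_diff


def additional_weights(balance, adjustment):
    """Additional weight per feature: adjustment coefficient times load balance."""
    return {name: coeff * balance for name, coeff in adjustment.items()}


# Hypothetical readings: the reasoning module carries more load and is slower.
balance = load_balance(load_retrieval=10, load_reasoning=30,
                       latency_retrieval=1.0, latency_reasoning=3.0)
print(balance)
print(additional_weights(balance, {"semantic_var": 0.2, "domain_spec": -0.1}))
```

The sign of each adjustment coefficient decides whether an overloaded reasoning module pushes a feature's weight up or down; how those signs are chosen is left open by the text.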
[0028] Furthermore, the steps of obtaining the feature indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, and dynamically determining the weight coefficients of each feature parameter, include:
[0029] Obtain the validation results of each query result in the current period and the feature parameters of the corresponding query questions, and input them into the logistic regression classifier to obtain the regression coefficients of each feature parameter;
[0030] The regression coefficients of each feature parameter are fused with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter.
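As a rough illustration of how regression coefficients might be obtained from validated queries, the sketch below fits a small logistic regression by gradient descent using only the standard library. The learning rate, epoch count, the binary encoding of the validation result, and the sample data are all assumptions; in practice a library classifier such as scikit-learn's `LogisticRegression` could play the same role.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    X: list of feature rows (one row per validated query).
    y: list of 0/1 validation results (encoding is an assumption).
    Returns (coefficients, bias).
    """
    n_samples, n_features = len(X), len(X[0])
    coefs = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(coefs, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        coefs = [w - lr * g / n_samples for w, g in zip(coefs, grad_w)]
        bias -= lr * grad_b / n_samples
    return coefs, bias


# Hypothetical period data: column 0 separates the labels, column 1 is constant.
X = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]]
y = [0, 0, 1, 1]
coefs, bias = fit_logistic(X, y)
print(coefs, bias)
```

The feature whose values separate the validation outcomes receives the larger coefficient, which is what lets the coefficients serve as a signal for reweighting.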
[0031] Furthermore, the step of fusing the regression coefficients of each feature parameter with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter includes:
[0032] The historical retained weights are obtained by multiplying the current weights by the set smoothing coefficient.
[0033] Normalize the regression coefficients to obtain the feature contribution of each feature parameter;
[0034] The instantaneous contribution weight is obtained by multiplying the feature contribution by the anti-smoothing coefficient.
[0035] Summing the historical retained weights and the instantaneous contribution weights yields the updated weight coefficients for each feature parameter.
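The fusion steps above resemble exponential smoothing and can be sketched as below. Two points are assumptions rather than statements from the text: the anti-smoothing coefficient is taken to be `1 - alpha`, and normalization divides each coefficient's absolute value by the sum of absolute values. The function name and sample numbers are hypothetical.

```python
def fuse_weights(current_weights, regression_coefs, alpha=0.8):
    """Blend current weights with normalized regression coefficients.

    alpha: smoothing coefficient applied to the current (historical) weights.
    """
    total = sum(abs(c) for c in regression_coefs.values())
    updated = {}
    for name, weight in current_weights.items():
        # Historical retained weight.
        retained = alpha * weight
        # Feature contribution (ASSUMPTION: absolute-value normalization).
        contribution = abs(regression_coefs[name]) / total if total > 0 else weight
        # Instantaneous contribution weight (ASSUMPTION: anti-smoothing = 1 - alpha).
        instantaneous = (1.0 - alpha) * contribution
        updated[name] = retained + instantaneous
    return updated


current = {"semantic_var": 0.5, "domain_spec": 0.5}
coefs = {"semantic_var": 3.0, "domain_spec": 1.0}
print(fuse_weights(current, coefs, alpha=0.5))
```

With these choices, weights that sum to 1 before the update still sum to 1 afterwards, and a larger `alpha` makes the weights change more slowly.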
[0036] Furthermore, the method also includes:
[0037] Obtain the routing accuracy of each query question within a certain time period, and determine whether it is less than the accuracy threshold. If not, leave it unchanged; if so, perform a rollback operation, extract several query questions for manual classification, and return to step S1.
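The accuracy check and rollback described above might look like the following sketch. It assumes that each routed query can later be labeled with the path it should have taken, and that "rollback" means restoring the weight coefficients saved before the most recent update; neither detail is specified in the text. All names are hypothetical.

```python
import random


def routing_accuracy(records):
    """records: list of (routed_path, correct_path) pairs for the period."""
    if not records:
        return 1.0
    hits = sum(1 for routed, correct in records if routed == correct)
    return hits / len(records)


def review_period(records, queries, threshold, current_weights, previous_weights,
                  sample_size=3, seed=0):
    """Return (weights to use next, queries selected for manual classification)."""
    if routing_accuracy(records) >= threshold:
        # Accuracy is acceptable: leave everything unchanged.
        return current_weights, []
    # ASSUMPTION: rollback restores the weights saved before the last update.
    rng = random.Random(seed)
    sample = rng.sample(queries, min(sample_size, len(queries)))
    return dict(previous_weights), sample


records = [("retrieval", "retrieval"), ("reasoning", "retrieval"),
           ("reasoning", "reasoning"), ("retrieval", "reasoning")]
queries = ["q1", "q2", "q3", "q4"]
weights, to_label = review_period(records, queries, threshold=0.8,
                                  current_weights={"domain_spec": 0.7},
                                  previous_weights={"domain_spec": 0.5},
                                  sample_size=2)
print(weights, to_label)
```

After the sampled queries are classified manually, processing would resume from step S1 with the restored weights.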
[0038] In a second aspect, this application also provides a computer storage medium storing executable program code; the executable program code is used to execute the dual-path generation method based on dynamic updating of weight coefficients as described in any implementation of the first aspect.
[0039] In a third aspect, this application also provides a computer system, including a memory and a processor; the memory stores program code executable by the processor; the program code is used to execute the dual-path generation method based on dynamic updating of weight coefficients as described in any implementation of the first aspect.
[0040] This invention provides a dual-path generation method, medium, and system based on dynamically updated weight coefficients. First, the user's current query question is obtained and its feature parameters are extracted. These feature parameters allow the user's query intent to be understood more accurately and provide a data foundation and explicit input for the subsequent calculation steps, reducing unnecessary computation. Second, the feature indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, are obtained to dynamically determine the weight coefficient of each feature parameter. The weight coefficients are thus calculated in real time from the operational status of the entire system: they are no longer fixed, but change with the system's operational status to match its current requirements. Third, the difficulty coefficient of the query question is determined based on each feature parameter and its corresponding weight coefficient. Because the weight coefficients are adjusted dynamically, the difficulty coefficient is no longer static but adapts to the current situation of the system, which provides a basis for the subsequent selective invocation of the retrieval thinking module and the reasoning thinking module and for the dynamic allocation of computing resources. Finally, based on the difficulty coefficient, either the retrieval thinking module or the reasoning thinking module is selectively invoked to output the query results. Selecting the appropriate processing module according to the difficulty coefficient of each query question ensures reasonable resource allocation, balances efficiency and accuracy, and improves the user experience. This solves the problems of unreasonable dynamic resource allocation, low query efficiency, and insufficient accuracy in complex task scenarios of existing technologies.
Attached Figure Description
[0041] Figure 1 is a flowchart of a dual-path generation method based on dynamic updating of weight coefficients according to an embodiment of the present invention;
[0042] Figure 2 is a flowchart of a dual-path generation method based on elastic difficulty threshold decision-making according to an embodiment of the present invention.
Detailed Implementation
[0043] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0044] It should be noted that if the embodiments of the present invention involve directional indications, such as up, down, left, right, front, back, etc., these directional indications are only used to explain the relative positional relationships and movement of the components in a specific posture. If the specific posture changes, the directional indications will also change accordingly. Furthermore, if the embodiments of the present invention involve descriptions such as "first," "second," "S1," "S2," "step one," "step two," etc., these descriptions are for descriptive purposes only and should not be construed as indicating or implying their relative importance, or implicitly indicating the number of technical features indicated or the order of method execution. Those skilled in the art will understand that anything that does not violate the inventive concept and is within the scope of the present invention should be included in the protection scope of the present invention.
[0045] In response to the challenges of varying complexity and scenario adaptation in user queries, this invention proposes a dual-path generation method that integrates fast thinking (retrieval thinking, such as using a knowledge graph retrieval generation module) and slow thinking (reasoning thinking, such as using a multi-step reasoning hybrid generation module). This method achieves precise task allocation and resource optimization through a dynamic routing mechanism.
[0046] Specifically, the system first constructs a hierarchical knowledge base architecture, linking structured knowledge graphs with unstructured domain documents in multiple dimensions to form a three-layer knowledge network covering basic facts, entity relationships, and logical rules. This architecture achieves independent management and efficient retrieval of knowledge modules through a dynamic indexing strategy, ensuring that queries of varying complexity can quickly locate the appropriate knowledge sub-base, selectively invoking the retrieval thinking module or the reasoning thinking module, and outputting the query results.
[0047] In the core processing of this dynamic routing mechanism, the system introduces a multi-dimensional quantitative evaluation model:
[0048] Core Invention Point 1: Dynamic Weighting Strategy for Complexity: Taking into account the feature parameters of the query question, such as semantic ambiguity, entity density, syntactic complexity, and domain specificity, a dynamic weighting strategy is used to generate a complexity result, which serves as the basis for subsequent judgments.
[0049] Core Invention Point 2: Elastic Threshold Decision-Making: Based on the complexity computed in real time, an elastic threshold decision-making mechanism is adopted. Simple queries are directed to fast thinking, such as the Knowledge Graph Retrieval Generation Module (Graph RAG), which uses graph relationship path mining and context-enhanced generation to quickly output concise answers. Complex queries are directed to slow thinking, such as the Multi-Step Reasoning Hybrid Generation Module, which generates deep reasoning results through staged processing such as problem decomposition, cross-modal evidence collection, and logical chain verification.
(One)
[0051] Based on the first inventive point, this invention provides a dual-path generation method based on dynamically updated weight coefficients, comprising:
[0052] S1: Obtain the user's current query question and extract the feature parameters of the query question;
[0053] Specifically, the method may, without limitation, obtain the current query question input by the user and perform feature extraction on it to obtain the feature parameters of the query question; the feature parameters may include any one or more of semantic dispersion, syntactic complexity, and domain specificity.
[0054] Preferably, the steps of obtaining the user's current query question and extracting the feature parameters of the query question may include:
[0055] S11: Obtain several standard question templates, and encode each standard question template and query question into an embedding representation to obtain several template embedding vectors and query embedding vectors; then obtain the cosine similarity between each template embedding vector and query embedding vector, and calculate the variance of each cosine similarity to obtain the semantic dispersion.
[0056] Specifically, the standard question templates may be obtained from, without limitation, existing databases or data files. Since computers cannot directly process human language, existing neural network models can be used to encode each standard question template and the query question into embedding representations. That is, discrete data (such as the standard question templates and the query question) is mapped into a continuous, low-dimensional space, so that it is converted into a numerical form that computers can process, thereby obtaining several template embedding vectors and a query embedding vector.
[0057] Specifically, the cosine similarity between each template embedding vector and the query embedding vector may be calculated using, without limitation, any existing cosine similarity function, and the variance of these cosine similarities may be calculated using any existing variance function, thereby obtaining the semantic dispersion. Preferably, the cosine similarity is computed with the `sklearn.metrics.pairwise.cosine_similarity` function and the variance with the `numpy.var` function.
[0058] S12: Extract the statement analysis information of the query question, so as to obtain the number of child nodes corresponding to each sentence in the query question based on the statement analysis information, and use the maximum number of child nodes corresponding to each sentence as the syntactic complexity.
[0059] Specifically, any existing natural language processing tool may be used, without limitation, to perform syntactic analysis on the query question and extract its statement analysis information. From this information, the boundary and root node of each sentence in the query question are obtained, the number of child nodes corresponding to each sentence is counted by any existing child node counting module, and the maximum of these counts is taken as the syntactic complexity. The statement analysis information includes part-of-speech tags, named entity recognition results, dependency relations, sentence boundaries, and the like. Preferably, the natural language processing tool is spaCy.
[0060] S13: Segment the query question to obtain several target words, and match them against a set of technical terms to identify the technical terms among them; count the number of technical terms and the number of target words, and take the ratio between the two as the domain specificity.
[0061] Specifically, the method may, without limitation, obtain a set of technical terms for a specific professional field from existing databases or data files, then call any existing word segmentation module to segment the query question into several target words, and then check whether each target word belongs to the set of technical terms. If it does, the target word is a technical term, and the number of technical terms is counted. Finally, the ratio of the number of technical terms to the number of target words is taken as the domain specificity.
[0062] Preferably, the word segmentation module can use Python's string `split()` method or the regular expression module.
[0063] For example, the following code can be used to extract feature parameters of the query question:
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # Assumed to be defined elsewhere: `model` (embedding encoder), `nlp` (spaCy
    # pipeline), load_templates() and load_domain_terms().
    def extract_features(query):
        # Semantic dispersion: variance of the cosine similarity between the
        # query and each standard question template
        template_embs = np.array([model.encode(t) for t in load_templates()])
        query_emb = np.asarray(model.encode(query)).reshape(1, -1)
        similarities = cosine_similarity(query_emb, template_embs)[0]
        semantic_var = float(np.var(similarities))
        # Syntactic complexity: maximum number of child nodes of any sentence root
        doc = nlp(query)
        syntax_depth = max(
            (len(list(sent.root.children)) for sent in doc.sents), default=0
        )
        # Domain specificity: share of target words that are technical terms
        domain_terms = load_domain_terms("medical")
        words = query.split()
        term_count = len([t for t in words if t in domain_terms])
        domain_spec = term_count / len(words) if words else 0.0
        return {
            'semantic_var': semantic_var,
            'syntax_depth': syntax_depth,
            'domain_spec': domain_spec
        }
[0082] As a further example, after a query is entered, the output may be as shown in the following code:
    # Example output
    features = extract_features(clean_query)
    # {'semantic_var': 0.15, 'syntax_depth': 2, 'domain_spec': 0.4}
[0086] S2: Obtain the feature indicators of each feature parameter, and / or the operation indicators of the retrieval thinking module and the reasoning thinking module, so as to dynamically determine the weight coefficient of each feature parameter;
[0087] Specifically, the system may optionally include a retrieval thinking module that generates query results by integrating knowledge graph retrieval and a reasoning thinking module that generates query results through multi-step reasoning. Optionally, before the iteration begins, a person skilled in the art may determine the initial weight coefficients of each feature parameter based on actual conditions such as historical data or the current operating status of each module, providing a data basis for the timely updating / dynamic determination of the weight coefficients. After the iteration begins, the feature indicators of each feature parameter, and / or the operating indicators of the retrieval thinking module and the reasoning thinking module, are obtained, and the weight coefficients of each feature parameter are updated and dynamically determined in real time. This is one of the inventive points of this application: the weight coefficients of each feature parameter are not fixed during the iteration process, but dynamically change and update to adapt to the indicator status of the feature parameters and the operating status of each module within this period, adapting to the actual situation of the current system, thereby improving retrieval accuracy, load balancing, and retrieval efficiency.
[0088] S3: Determine the difficulty coefficient of the query question based on each feature parameter and its corresponding weight coefficient;
[0089] Specifically, the difficulty coefficient of the query question can be obtained as the weighted sum of the feature parameters extracted in step S1, using the weight coefficients dynamically determined in step S2. This provides a basis for judgment in subsequent steps. Judging the difficulty coefficient of the query question from multiple feature dimensions improves the accuracy of the judgment. At the same time, the weight coefficient of each feature parameter can be adjusted dynamically so that it matches the current system situation, which indirectly changes the difficulty coefficient of the query question. As a result, computing resources can be dynamically allocated in subsequent steps based on the running status of each module.
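The weighted sum in step S3 can be sketched in a few lines. The feature values below reuse the example output shown earlier in this description; the weight values and the function name are hypothetical, chosen only for illustration.

```python
def difficulty_coefficient(features, weights):
    """Weighted sum of the feature parameters of one query question."""
    return sum(weights[name] * value for name, value in features.items())


# Feature values from the example output; weights are illustrative assumptions.
features = {"semantic_var": 0.15, "syntax_depth": 2, "domain_spec": 0.4}
weights = {"semantic_var": 0.3, "syntax_depth": 0.1, "domain_spec": 0.6}
print(round(difficulty_coefficient(features, weights), 3))
```

Because the features are on different scales (a variance, a node count, and a ratio), the weights also act as scale factors; whether the features are normalized beforehand is not stated in the text.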
[0090] S4: Based on the difficulty coefficient, selectively invoke the retrieval thinking module or the reasoning thinking module to output the query results.
[0091] Specifically, depending on the difficulty coefficient of each query question, the retrieval thinking module or the reasoning thinking module can be selectively invoked to output the query results. Since the two modules consume different amounts of computing resources, invoking the module that matches the difficulty of each query question avoids wasting computing resources and improves processing efficiency.
[0092] More specifically, the method may, without limitation, set a difficulty threshold and determine whether the difficulty coefficient of each query question exceeds it. If it does, the query question is considered complex and cannot be answered simply by searching the existing knowledge base; it is defined as a difficult question. If not, the query question is considered simple and can be answered by searching the existing knowledge base; it is defined as an easy question. Accordingly, the modules are invoked selectively based on the difficulty coefficient: knowledge graph retrieval is performed for easy questions, and multi-step reasoning with hybrid generation is performed for difficult questions, yielding the query result for each question. This saves computing resources, improves processing efficiency, and ensures the accuracy of the output query results.
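The threshold decision described above can be sketched as a small routing function. The two module callables are stand-ins for the retrieval thinking module and the reasoning thinking module; their names, signatures, and the threshold value are assumptions made for this illustration.

```python
def route_query(query, difficulty, threshold, retrieval_module, reasoning_module):
    """Send difficult questions to reasoning and easy questions to retrieval."""
    if difficulty > threshold:
        # Difficult question: multi-step reasoning and hybrid generation.
        return "reasoning", reasoning_module(query)
    # Easy question: knowledge graph retrieval.
    return "retrieval", retrieval_module(query)


# Hypothetical stand-ins for the two generation modules.
def fake_retrieval(query):
    return f"retrieved answer for: {query}"


def fake_reasoning(query):
    return f"reasoned answer for: {query}"


print(route_query("What is aspirin?", 0.2, 0.5, fake_retrieval, fake_reasoning))
print(route_query("Compare two treatment plans", 0.8, 0.5, fake_retrieval, fake_reasoning))
```

A difficulty coefficient exactly equal to the threshold is treated as easy here, matching the wording "exceeds the threshold".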
[0093] This embodiment presents a dual-path generation method based on dynamically updated weight coefficients. First, the user's current query question is obtained and its feature parameters are extracted, so that the user's query intent is understood more accurately and a data foundation and explicit input are provided for the subsequent calculation steps, reducing unnecessary computation. Second, the feature indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, are obtained to dynamically determine the weight coefficient of each feature parameter. The weight coefficients are thus calculated in real time from the operational status of the entire system: they are no longer fixed, but change with the system's operation to match the current system. Third, the difficulty coefficient of the query question is determined based on each feature parameter and its corresponding weight coefficient. Because the weight coefficients are adjusted dynamically, the difficulty coefficient is no longer static but adapts to the current system situation, which provides a basis for the subsequent selective invocation of the retrieval thinking module and the reasoning thinking module and for the dynamic allocation of computing resources. Finally, based on the difficulty coefficient, either the retrieval thinking module or the reasoning thinking module is selectively invoked to output the query results. Selecting the appropriate processing module according to the difficulty coefficient of each query question ensures reasonable resource allocation, balances efficiency and accuracy, and improves the user experience. This solves the problems of unreasonable dynamic resource allocation, low query efficiency, and insufficient accuracy in complex task scenarios found in existing technologies.
[0094] Preferably, in step S2, when dynamically determining the weight coefficients of each feature parameter, the method may, without limitation, set iteration conditions and check during operation whether they have been reached in order to decide whether to update the weight coefficients. If so, the weight coefficients are dynamically updated according to the feature indicators of the feature parameters and / or the operation indicators of each module. The difficulty coefficient of subsequent query questions is then calculated from the updated weight coefficients, and the retrieval thinking module or the reasoning thinking module is selectively called according to the difficulty threshold. Updating the weight coefficients of each feature parameter indirectly changes the difficulty coefficient of each query question, which avoids excessive load on a single module, reallocates computing resources, and ensures that the data processing speed and result accuracy of the system are not affected.
[0095] Further preferably, the iteration condition may be set as a number of system runs or a running time, where each time a module finishes processing a query and obtains its answer counts as one run. Alternatively, the iteration condition may be set as an accuracy threshold: the query results are verified to obtain a verification accuracy, which is checked to see whether it is lower than the set threshold. When the set iteration condition is met, step S2 is executed to dynamically determine the weight coefficients of each feature parameter. More specifically, the iteration conditions may include small cycles and large cycles. For example, a small cycle may be set to a module running time of 2 hours and a large cycle to a module running time of 24 hours; or a small cycle may be set to 1,000 processed queries and a large cycle to 10,000 processed queries. Dividing the iteration conditions into small and large cycles serves two purposes. Small cycles capture short-term changes in the system state, so that the weight coefficients, and through them the difficulty coefficient, can be adjusted in a timely manner; this avoids performance degradation due to short-term anomalies and allows fine-tuning for specific problems to improve local efficiency. Large cycles use long-term historical data to adjust the weight coefficients of each feature parameter, which avoids being misled by the short-term fluctuations seen in small cycles and ensures that the adjustment is based on the true trend. Through nested updates of small and large cycles, which take both the local and the overall operational status of the system into account, the data processing efficiency of each module can be significantly improved.
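The nested small and large cycles can be illustrated with a query-count trigger. This is a sketch under stated assumptions: only the count-based variant is shown, a large cycle takes precedence when both cycles complete on the same query, and the class and method names are hypothetical.

```python
class UpdateScheduler:
    """Counts processed queries and reports when a small or large cycle completes."""

    def __init__(self, small_every=1000, large_every=10000):
        self.small_every = small_every
        self.large_every = large_every
        self.processed = 0

    def record_query(self):
        """Register one processed query; return 'large', 'small', or None."""
        self.processed += 1
        # ASSUMPTION: a large cycle overrides a coinciding small cycle.
        if self.processed % self.large_every == 0:
            return "large"
        if self.processed % self.small_every == 0:
            return "small"
        return None


# Tiny cycle lengths so the behaviour is visible in a few steps.
scheduler = UpdateScheduler(small_every=2, large_every=4)
events = [scheduler.record_query() for _ in range(4)]
print(events)  # [None, 'small', None, 'large']
```

The caller would run the short-term weight update on `'small'` and the long-term update on `'large'`; a time-based trigger could follow the same pattern with timestamps instead of counts.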
[0096] Further preferably, the characteristic indicators may include the volatility of each characteristic parameter; the operational indicators may include the load balancing of each module; and when the iteration condition of a small cycle is met, the step of dynamically determining the weight coefficient of each characteristic parameter based on the characteristic indicators of each characteristic parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, may include:
[0097] S21: Obtain the volatility of each feature parameter within a set time period; determine the baseline weight of each feature parameter based on its volatility and current weight coefficient.
[0098] More specifically, but not limited to, collecting the characteristic values of each characteristic parameter once in each time period within a set time period to obtain the time series data of each characteristic parameter, so as to further calculate the volatility σ_i of each characteristic parameter, and then determining the benchmark weight of each characteristic parameter based on the volatility σ_i of each characteristic parameter and the current weight coefficient; the volatility is preferably the variance of the time series data of each characteristic parameter.
[0099] Preferably, the step of determining the baseline weight of each feature parameter based on the volatility of each feature parameter and the current weight coefficient may include:
[0100] S211: Sum the volatility of each characteristic parameter to obtain the total volatility;
[0101] S212: Determine the volatility factor of the current characteristic parameter as the ratio between its volatility and the total volatility, and obtain the anti-volatility factor of the current characteristic parameter from the volatility factor;
[0102] S213: Calculate the product of the current weight coefficient of the current characteristic parameter and the anti-volatility factor to obtain the benchmark weight of the current characteristic parameter.
[0103] Specifically, this may include, but is not limited to: collecting the characteristic values of each characteristic parameter at regular intervals within a set time period to obtain time series data for each characteristic parameter; calculating the volatility σ_i of each characteristic parameter, where i is the index of the characteristic parameter; summing the volatilities of all characteristic parameters to obtain the total volatility Σσ; taking the ratio between the volatility of the current characteristic parameter and the total volatility as its volatility factor σ_i / Σσ; obtaining its anti-volatility factor 1 - σ_i / Σσ from the volatility factor; and finally calculating the product of the current weight coefficient and the anti-volatility factor to obtain the benchmark weight of the current characteristic parameter. The volatility is preferably the variance of the time series data of each characteristic parameter. It is worth noting that, before the iteration starts, the current weight coefficient of a characteristic parameter can be an initial weight coefficient set arbitrarily by those skilled in the art, and it is replaced by the updated weight coefficient once the iteration has started.
[0104] More specifically, the anti-volatility factor is expressed as 1 - σ_i / Σσ, meaning that the greater the volatility, the lower the weight of the corresponding feature parameter, in order to avoid the unstable feature having too much influence on the weight.
[0105] For example, if the semantic dispersion (current weight coefficient Base_Weight=0.5) has a σ_i of 0.3, and all features have a Σσ of 0.8, then the baseline weight adjusted for semantic dispersion is: Base_Weight×(1 - σ_i / Σσ) =0.5×(1 -0.3 / 0.8) = 0.5×0.625 = 0.3125.
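As a non-limiting illustration, steps S211 to S213 can be sketched as follows. The function names are hypothetical, the use of the population variance (rather than the sample variance) is an assumption, and the volatilities of the second and third features are invented so that the total matches the Σσ = 0.8 of the example above.

```python
from statistics import pvariance


def volatility(series):
    """Volatility of one feature parameter: variance of its time series.

    Assumption: population variance; the text only says "variance".
    """
    return pvariance(series)


def baseline_weights(current_weights, volatilities):
    """Steps S211-S213: weight_i * (1 - sigma_i / sum of all sigma)."""
    total = sum(volatilities)
    if total == 0:
        # Assumption: with no fluctuation at all the ratio is undefined,
        # so the current weights are kept unchanged.
        return list(current_weights)
    return [w * (1 - s / total) for w, s in zip(current_weights, volatilities)]


# First feature reproduces the worked example: 0.5 * (1 - 0.3 / 0.8).
weights = baseline_weights([0.5, 0.3, 0.2], [0.3, 0.3, 0.2])
print(weights[0])
```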
[0106] S22: Obtain the load balance between modules and determine the additional weights of each characteristic parameter;
[0107] Specifically, this may include, but is not limited to, collecting operational metrics of each module (such as the number of pending queries in each module and the average time spent processing each query) to determine the load balancing degree between modules, Load_Imbalance, which can be used as the additional weight of each feature parameter.
[0108] Preferably, the steps of obtaining the load balancing degree between modules and determining the additional weights of each characteristic parameter may include:
[0109] S221: Obtain the current load of each module and determine the load difference between modules; specifically, obtain the number of currently pending query questions in the retrieval thinking module and the reasoning thinking module, and calculate the difference between the two to obtain the load difference between the modules.
[0110] S222: Obtain the time taken by each module to output query results and determine the delay rate difference between modules; specifically, obtain the time taken by the retrieval thinking module and the reasoning thinking module to process each query question within a set time period, calculate the average value, obtain the delay rate index of each module, and obtain the delay rate difference between the delay rate indices of each module.
[0111] S223: Determine the load balance degree based on the load difference and the latency difference; specifically, the load balance degree can be obtained by weighted summation of the load difference and the latency difference according to the prior balance coefficient.
[0112] Specifically, this may include, but is not limited to: setting balance coefficients; obtaining the number of pending queries of each module as its current load; obtaining the queries processed by each module within a set time period and calculating the average time taken per query as its latency metric; taking the difference between the current loads of the modules as the load difference and the difference between their latency metrics as the latency difference; and finally computing the weighted sum of the load difference and the latency difference according to the balance coefficients to obtain the load balancing degree.
[0113] For example, the load balancing degree (Load_Imbalance) of each module can be calculated according to Equation 2-1:
[0114] Load_Imbalance = β × |Queue_G - Queue_R| + γ × |Latency_G - Latency_R| (2-1)
[0115] The meanings of each parameter are shown in Table 1:
[0116] Table 1: Schematic Table of Load Balancing Parameters
[0117]
[0118] Preferably, the balancing coefficient can be adjusted in real time based on the operating data of each module using the following code:
[0119] from scipy.stats import pearsonr
[0120] corr_q, _ = pearsonr(queue_diff, routing_drop)
[0121] corr_l, _ = pearsonr(latency_diff, routing_drop)
[0122] total = abs(corr_q) + abs(corr_l)
[0123] β = abs(corr_q) / total
[0124] γ = abs(corr_l) / total
[0125] For example, if the results are: corr_q = 0.6, corr_l = 0.4, then β = 0.6, γ = 0.4.
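As a non-limiting illustration, Equation 2-1 and the split of the balance coefficients can be combined in one sketch. The function names, the equal split used when both correlations are zero, and the queue and latency figures in the usage lines are assumptions; the subscripts G and R simply denote the two modules of Equation 2-1.

```python
def balance_coefficients(corr_q, corr_l):
    """Split beta and gamma in proportion to the absolute correlations."""
    total = abs(corr_q) + abs(corr_l)
    if total == 0:
        # Assumption: fall back to an equal split when neither signal correlates.
        return 0.5, 0.5
    return abs(corr_q) / total, abs(corr_l) / total


def load_imbalance(queue_g, queue_r, latency_g, latency_r, beta, gamma):
    """Equation 2-1: weighted sum of the absolute queue and latency gaps."""
    return beta * abs(queue_g - queue_r) + gamma * abs(latency_g - latency_r)


beta, gamma = balance_coefficients(0.6, 0.4)   # correlations from the example
# Hypothetical operating data: 12 vs 4 pending queries, 0.8 s vs 2.3 s per query.
imbalance = load_imbalance(12, 4, 0.8, 2.3, beta, gamma)
print(beta, gamma, imbalance)
```

Because the queue gap and the latency gap are on different scales, it may be reasonable to rescale them before weighting; the text does not specify this, so the sketch leaves the raw values unchanged.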
[0126] S224: Set the adjustment coefficient of each module, calculate the product of the adjustment coefficient of each module and the load balance degree, and obtain the additional weight of each characteristic parameter.
[0127] Specifically, this may include, but is not limited to, summing the baseline weight and the additional weight of each feature parameter to obtain its updated weight coefficient. The weights are therefore no longer fixed but change dynamically with the volatility (σ_i) of each feature parameter and the load balancing degree (Load_Imbalance). When the system is unbalanced (i.e., Load_Imbalance is large), the load is rebalanced through the adjustment coefficient λ of each module, which avoids global imbalance caused by local optimization. The difficulty coefficient of each query question can then be calculated accurately from the combined effect of multiple parameters in subsequent steps, which ensures that query questions are classified accurately by difficulty coefficient and thereby improves system resource allocation and processing efficiency.
[0128] S23: Based on the baseline weight and additional weight of each feature parameter, obtain the updated weight coefficients of each feature parameter.
[0129] Specifically, this may include, but is not limited to, summing the baseline weight and the additional weight of each feature parameter to obtain its updated weight coefficient. On the one hand, if a feature parameter fluctuates too much, its stability is poor, so it is suppressed through the baseline weight to reduce its impact on the difficulty coefficient. On the other hand, the load balance between modules, obtained from the current operating indicators of each module, is used as the additional weight of each feature parameter. This guards against an unreasonable difficulty coefficient calculation or classification method causing one module to receive far more query questions than it can process while another module receives far fewer than it can process, which would lower the overall processing speed of the system.
[0130] For example, the updated weight coefficients of each feature parameter can be calculated according to Equation 2-2:
[0131] Adjusted_Weight_i = Base_Weight_i × (1 - σ_i / Σσ) + λ × Load_Imbalance (2-2)
[0132] Where Adjusted_Weight_i is the weight coefficient of the i-th feature parameter after the update, Base_Weight_i is the current weight coefficient of the i-th feature parameter, σ_i is the volatility of the i-th feature parameter, Σσ is the sum of the volatility of all feature parameters, λ is the adjustment coefficient, and Load_Imbalance is the load balancing degree.
[0133] It is worth noting that the weights of each feature parameter must be within the weight limit range, and the sum of the weights of each feature parameter must be 1, in order to avoid a single feature parameter having too much impact on the difficulty coefficient of the query problem and reducing the accuracy of the judgment. Preferably, the weight limit range is [0.1, 0.6].
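As a non-limiting illustration, Equation 2-2 and the two weight constraints can be sketched as follows. The text does not state how the range [0.1, 0.6] and the unit sum are enforced together, so the clip-and-rescale loop in `constrain` is an assumption, as are the function names, the value λ = 0.01, and the input figures.

```python
def adjusted_weights(current_weights, volatilities, lam, load_imbalance):
    """Equation 2-2: baseline weight plus lambda * Load_Imbalance per feature."""
    total = sum(volatilities)
    if total <= 0:
        raise ValueError("total volatility must be positive")
    extra = lam * load_imbalance
    return [w * (1 - s / total) + extra
            for w, s in zip(current_weights, volatilities)]


def constrain(weights, lo=0.1, hi=0.6, rounds=50):
    """Clip to [lo, hi] and rescale to sum 1, repeating until both hold.

    Assumed scheme; it may stop after `rounds` passes with the bounds only
    approximately satisfied.
    """
    n = len(weights)
    if not n * lo <= 1.0 <= n * hi:
        raise ValueError("bounds cannot be met with this many features")
    w = list(weights)
    for _ in range(rounds):
        w = [min(max(x, lo), hi) for x in w]
        total = sum(w)
        w = [x / total for x in w]
        if all(lo - 1e-9 <= x <= hi + 1e-9 for x in w):
            break
    return w


raw = adjusted_weights([0.5, 0.3, 0.2], [0.3, 0.3, 0.2], lam=0.01, load_imbalance=5.4)
final = constrain(raw)
print(raw, final)
```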
[0134] For example, the specific meanings of each parameter in Equation 2-2 can be found in Table 2:
[0135] Table 2: Difficulty Coefficient Weighting Table
[0136]
[0137] Once the updated weight coefficients are obtained, the feature parameters can be summed according to their weights to determine the difficulty coefficient of the query problem.
[0138] For example, one can optionally sum the feature parameters according to Equation 2-3 based on the updated current weight coefficients to obtain the difficulty coefficient of the query problem:
[0139] Complexity_Score = ∑(Feature_i × Adjusted_Weight_i) (2-3)
[0140] Where Complexity_Score is the difficulty coefficient of the query question, Feature_i is the value of the i-th feature parameter, and Adjusted_Weight_i is the current weight coefficient of the i-th feature parameter after the update.
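As a non-limiting illustration, Equation 2-3 is a plain weighted sum. The function name and the feature values below are hypothetical, and the sketch assumes the feature values have already been scaled to a comparable range.

```python
def complexity_score(features, weights):
    """Equation 2-3: sum of Feature_i * Adjusted_Weight_i over all features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical feature values for one query, weighted by 0.5 / 0.3 / 0.2.
score = complexity_score([0.8, 0.4, 0.9], [0.5, 0.3, 0.2])
print(score)
```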
[0141] In this embodiment, the specific steps for updating the weight coefficients of the present invention are given. First, the baseline weight of each feature parameter is determined based on the volatility index. Then, the additional weight of each feature parameter is determined based on the load balancing of each module. Thus, the updated weight coefficient of each feature parameter is obtained based on the baseline weight and the additional weight. This method can indirectly affect the difficulty coefficient of the query problem by adjusting the weight coefficient of each feature parameter, thereby balancing the computing resources of each module and improving query efficiency and query accuracy.
[0142] Preferably, the operational metrics may include the verification results of each query result; when the iteration conditions of a large cycle are met, the step of dynamically determining the weight coefficients of each feature parameter based on the feature metrics of each feature parameter, and / or the operational metrics of the retrieval thinking module and the reasoning thinking module, further includes:
[0143] S24: Obtain the verification results of each query result in the current period and the feature parameters of the corresponding query question, and input them into the logistic regression classifier to obtain the regression coefficients of each feature parameter;
[0144] S25: The regression coefficients of each feature parameter are fused with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter.
[0145] Specifically, this may include, but is not limited to, obtaining the verification results of each query result and the feature parameters of the corresponding query question within the current period, inputting them into any existing logistic regression classifier to obtain the regression coefficient of each feature parameter, and then fusing the regression coefficient of each feature parameter with its current weight according to a set ratio to obtain the updated weight coefficient of each feature parameter.
[0146] Preferably, the step of fusing the regression coefficients of each feature parameter with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter may include:
[0147] S251: Obtain the product of the current weight and the set smoothing coefficient to get the historical retained weight;
[0148] Specifically, a smoothing coefficient α can optionally be set, and the product of the current weight and the smoothing coefficient is taken as the historical retained weight, which carries over part of the result obtained in the previous iteration. This avoids sudden changes in the behavior of each module that would result from updating the weight coefficients solely on the basis of the current running status of each module.
[0149] S252: Normalize the regression coefficients to obtain the characteristic contribution of each characteristic parameter;
[0150] Specifically, since the regression coefficients obtained at this time may be positive or negative and vary in size, they cannot be integrated with the current weights of each feature parameter according to the set ratio. Therefore, it is necessary to normalize the regression coefficients first and scale them to the same range as the weights to obtain the feature contribution of each feature parameter.
[0151] S253: Obtain the product of the feature contribution and the anti-smoothing coefficient to get the instantaneous contribution weight;
[0152] S254: Sum the historical retained weight and the instantaneous contribution weight to obtain the updated weight coefficient of each feature parameter.
[0153] Specifically, an anti-smoothing coefficient (1-α) can be determined based on the smoothing coefficient. The product of the feature contribution and the anti-smoothing coefficient is then calculated to obtain the instantaneous contribution weight. This allows the entire system to dynamically adjust the weights of each feature parameter based on the current operating status of each module, thereby optimizing the allocation of computing resources. At the same time, the update amplitude is controlled by the anti-smoothing coefficient to avoid abrupt changes in behavior. Finally, the two are summed to obtain the updated weight coefficients of each feature parameter, ensuring the adaptability of the entire system while maintaining stability.
[0154] For example, the updated weight coefficients of each feature parameter can be obtained according to Equation 2-4:
[0155] new_weight_i = α × Base_Weight_i + (1-α) × normalized(feature_contrib_i) (2-4)
[0156] Wherein, new_weight_i is the weight coefficient after the i-th feature parameter is updated, α is the smoothing coefficient, which can be arbitrarily set by those skilled in the art, Base_Weight_i is the current weight of the i-th feature parameter, (1-α) is the anti-smoothing coefficient, feature_contrib_i is the regression coefficient of the i-th feature parameter, normalized(feature_contrib_i) is the feature contribution of the i-th feature parameter, and normalized() represents normalizing the parameters in parentheses.
[0157] For example, the above steps can be implemented using the following code:
[0158] Input: feature matrix X and labels y (1 = correctly routed, 0 = incorrectly routed) for the queries routed in the past 24 hours.
[0159] from sklearn.linear_model import LogisticRegression
[0160] model = LogisticRegression()
[0161] model.fit(X, y) # Train the classifier
[0162] # Output feature contribution (basis for weight adjustment)
[0163] feature_contrib = model.coef_[0] # e.g., [0.5, -0.2, 0.3] corresponds to semantic dispersion / syntactic complexity / domain specificity
[0164] For example, the steps to adjust the corresponding weights of each feature parameter based on its contribution can be as follows:
[0165] # Original weights: Base_Weight = [0.5, 0.3, 0.2]
[0166] # New weight calculation (with smoothing constraint):
[0167] new_weight_i = α × Base_Weight_i + (1-α) × normalized(feature_contrib_i)
[0168] # Constraints:
[0169] # 1. Weight range ∈ [0.1, 0.6]
[0170] # 2. Total weights = 1
[0171] # Where α = 0.7 (historical weighting) to avoid drastic fluctuations.
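As a non-limiting illustration, steps S251 to S254 and Equation 2-4 can be written as runnable code. The text does not fix the normalization scheme, so dividing each absolute coefficient by the sum of absolute coefficients is an assumption, as are the function names; the weights, coefficients and α = 0.7 are the example values given above.

```python
def normalize_contrib(coefs):
    """Scale regression coefficients to non-negative shares that sum to 1.

    Assumed scheme: absolute value divided by the sum of absolute values.
    """
    mags = [abs(c) for c in coefs]
    total = sum(mags)
    if total == 0:
        return [1 / len(coefs)] * len(coefs)   # assumption: equal shares
    return [m / total for m in mags]


def fuse_weights(current, coefs, alpha=0.7):
    """Equation 2-4: alpha * current weight + (1 - alpha) * feature contribution."""
    contrib = normalize_contrib(coefs)
    return [alpha * w + (1 - alpha) * c for w, c in zip(current, contrib)]


new_weights = fuse_weights([0.5, 0.3, 0.2], [0.5, -0.2, 0.3], alpha=0.7)
print(new_weights)
```

Under this normalization both inputs sum to 1, so the fused weights also sum to 1; the range constraint [0.1, 0.6] is not guaranteed and would still need a separate check.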
[0172] Preferably, after the weights are updated, they can be verified using the following steps:
[0173] S26: Obtain the routing accuracy of each query question within a certain time period, and determine whether it is less than the accuracy threshold. If not, leave it unchanged; if so, perform a rollback operation, extract several query questions for manual classification, and return to step S1.
[0174] Specifically, this may include, but is not limited to, setting an accuracy threshold. When the iteration conditions of a large cycle are met, the weight coefficients are dynamically updated, and the routing success rate of the query questions over the subsequent several small cycles is obtained and compared with the accuracy threshold. If the success rate is not less than the threshold, the weight coefficients have been updated successfully and the efficiency and accuracy of the system have improved. If it is less than the threshold, the update has failed: query questions are being misclassified and processed by the wrong modules, so the updated weights have degraded the efficiency and accuracy of the system. In that case a rollback operation is required, that is, the weights from before the update are restored, and some query questions are manually classified so that distorted routing accuracy does not cause further updates of the feature parameter weight coefficients to fail.
[0175] Example: Validation steps and conditions may include:
[0176] Key metric: routing accuracy;
[0177] Qualification criteria: Accuracy rate > 90% for three consecutive periods;
[0178] Failure handling: Roll back the weight + increase the proportion of manual review.
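As a non-limiting illustration, the qualification criterion and the rollback of step S26 can be sketched as follows. The function names and the choice to defer the decision until three periods have been observed are assumptions; the 90% threshold and the three consecutive periods come from the example above.

```python
def should_roll_back(accuracies, threshold=0.90, periods=3):
    """True when the last `periods` routing accuracies do not all exceed the threshold."""
    recent = accuracies[-periods:]
    if len(recent) < periods:
        # Assumption: not enough periods observed yet, so no rollback is triggered.
        return False
    return not all(a > threshold for a in recent)


def verify_update(new_weights, previous_weights, accuracies):
    """Return the weights to keep and whether a rollback took place."""
    if should_roll_back(accuracies):
        return list(previous_weights), True
    return list(new_weights), False


weights, rolled_back = verify_update(
    new_weights=[0.5, 0.27, 0.23],
    previous_weights=[0.5, 0.3, 0.2],
    accuracies=[0.93, 0.88, 0.92],   # hypothetical routing accuracies
)
print(weights, rolled_back)
```

Increasing the proportion of manual review after a rollback is an operational step and is not modeled here.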
[0179] (II) Invention Point Two: On the other hand, the present invention also provides a dual-path generation method based on elastic difficulty threshold decision-making, comprising:
[0180] P1: Obtain the user's current query question and extract the feature parameters of the query question; for details, you can refer to step S1, which will not be repeated here.
[0181] P2: Determine the difficulty level of the query based on the feature parameters; for details, you can refer to steps S2-S3, which will not be repeated here.
[0182] P3: Dynamically update the difficulty threshold based on the feature indicators of the feature parameters, and / or the operation indicators of the retrieval thinking module and the reasoning thinking module;
[0183] P4: Determine whether the difficulty coefficient exceeds the difficulty threshold. Based on the determination result, selectively call the retrieval thinking module or the reasoning thinking module, and output the query results.
[0184] Specifically, this may include, but is not limited to, having those skilled in the art set a difficulty threshold before the iteration begins and using the dynamically updated difficulty threshold once the iteration has begun. The difficulty coefficient of each query question is then compared with the difficulty threshold, and based on the result either the retrieval thinking module or the reasoning thinking module is selectively invoked to process the query question and output the query result. In this way, knowledge graph retrieval is performed for easier questions while multi-step reasoning is used for more difficult questions, which saves computational resources while ensuring the accuracy of the output query results. It is worth noting that the retrieval thinking module and the reasoning thinking module are merely examples and are not limiting: the retrieval thinking module can be any fast thinking module capable of simple processing of query questions and of quickly outputting query results for easier questions, while the reasoning thinking module can be any slow thinking module capable of multi-step reasoning and of accurately outputting query results for more difficult questions.
[0185] This embodiment presents a dual-path generation method based on elastic difficulty threshold decision-making according to the present invention. First, the user's current query question is obtained and its feature parameters are extracted; these feature parameters allow the difficulty of the query question to be determined more accurately, provide a data foundation and clear input for subsequent steps, and reduce unnecessary calculations. Then, the difficulty coefficient of the query question is determined from the feature parameters, which provides the basis for classifying the query question by difficulty coefficient. Next, the difficulty threshold is dynamically updated based on the feature indicators of the feature parameters and / or the operating indicators of the retrieval thinking module and the reasoning thinking module; because the threshold follows the overall operation of the system, the classification of each query question in subsequent steps changes accordingly, which ensures reasonable resource allocation and guarantees system efficiency and accuracy. Finally, it is determined whether the difficulty coefficient exceeds the difficulty threshold, and the retrieval thinking module or the reasoning thinking module is selectively invoked based on the result to output the query result. Selecting the processing module according to the difficulty coefficient of each query question ensures reasonable resource allocation, balances efficiency and accuracy, and improves user experience. This solves the problems of unreasonable dynamic resource allocation, low query efficiency, and insufficient accuracy of existing technologies in complex task scenarios.
[0186] Preferably, step P3 may include, but is not limited to, determining during operation whether each module has reached the iteration condition described above for the dual-path generation method based on dynamically updated weight coefficients. If so, the difficulty threshold is dynamically updated based on the feature indicators of the feature parameters and / or the operation indicators of the retrieval thinking module and the reasoning thinking module, so as to reallocate computing resources and thus ensure the operating efficiency and accuracy of the system.
[0187] Further preferred, the operating metrics may include accuracy and latency; when the iteration condition of a small cycle is met, the step of dynamically updating the difficulty threshold based on the feature metrics of the feature parameters, and / or the operating metrics of the retrieval thinking module and the reasoning thinking module, may include:
[0188] P31: Obtain the accuracy and latency of each module within a set time period to obtain accuracy and latency metrics.
[0189] P32: Obtain the difference between the accuracy indicators of each module and the difference between the latency indicators of each module to get the accuracy difference and latency difference.
[0190] Specifically, optional but not limited to, within a set time period, the total number of problems processed by each module and the number of problems processed correctly are obtained by reading the work logs of each module or by manual statistics. The ratio between the number of correctly processed problems and the total number of problems processed is calculated to obtain the accuracy rate of each module, which is the accuracy rate indicator. The average time spent by each module in processing each problem within the set time period is obtained as the latency rate, which is the latency rate indicator. The difference between the accuracy rate indicators of each module and the difference between the latency rate indicators of each module are then obtained to obtain the accuracy rate difference and latency rate difference.
[0191] P33: Obtain the maximum value among the various latency rate indicators as the normalization parameter to normalize the latency rate difference and obtain the normalized value;
[0192] P34: Based on the prior accuracy coefficient and the time coefficient, sum the accuracy difference and the normalization value to obtain the updated difficulty threshold.
[0193] Specifically, since the accuracy difference and latency difference need to be summed proportionally to calculate the updated difficulty threshold, but the accuracy difference and latency difference may not be comparable due to numerical differences, the maximum value among the latency indicators can be selected as the normalization parameter to normalize the latency difference and obtain a normalized value, so that the accuracy difference and latency difference are on the same order of magnitude. Then, based on the prior accuracy coefficient and time coefficient, the accuracy difference and the normalized value are summed to obtain the updated difficulty threshold.
[0194] For example, the difficulty threshold can be updated according to Equation 3-1:
[0195] (3-1)
[0196] In Equation 3-1, the left-hand side is the updated difficulty threshold; the two accuracy terms are the accuracy indicators of the retrieval thinking module and the reasoning thinking module, respectively; the two latency terms are the latency rate indicators of the retrieval thinking module and the reasoning thinking module, respectively; and the normalization term is the maximum of the latency rate indicator of the retrieval thinking module and the latency rate indicator of the reasoning thinking module. It is worth noting that the two coefficients 0.05 and 0.03 in Equation 3-1 are the accuracy coefficient and the time coefficient, respectively, set in advance by those skilled in the art to control the degree of influence of the accuracy indicators and the latency rate indicators on the current threshold benchmark.
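As a non-limiting illustration, steps P31 to P34 can be sketched as follows. The function names are hypothetical. The sketch assumes that the weighted terms are added to the current threshold, and it takes the accuracy difference and latency difference as already-signed inputs, because the order in which the two modules are subtracted is not fixed here. The coefficients 0.05 and 0.03 are the values named above; all other figures are invented.

```python
def accuracy_rate(correct, total):
    """P31: share of correctly processed questions for one module."""
    return correct / total if total else 0.0


def latency_rate(durations):
    """P31: average time spent per question for one module."""
    return sum(durations) / len(durations) if durations else 0.0


def update_threshold(theta, acc_diff, lat_diff, lat_max,
                     acc_coef=0.05, time_coef=0.03):
    """P33-P34: normalize the latency difference by the largest latency
    indicator, then add the weighted terms to the current threshold.

    Assumption: additive update on the current threshold; the caller
    decides the sign convention of acc_diff and lat_diff.
    """
    lat_norm = lat_diff / lat_max if lat_max > 0 else 0.0
    return theta + acc_coef * acc_diff + time_coef * lat_norm


# Hypothetical indicators for the two modules.
acc_a, acc_b = accuracy_rate(85, 100), accuracy_rate(95, 100)
lat_a, lat_b = latency_rate([0.9, 1.1]), latency_rate([2.4, 2.6])
theta_new = update_threshold(0.62, acc_b - acc_a, lat_b - lat_a, max(lat_a, lat_b))
print(theta_new)
```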
[0197] Preferably, since the method of classifying query questions based on a single-point threshold still has many problems, such as noise or short-term fluctuations causing slight changes in the difficulty threshold of the query question, which may lead to completely opposite judgment results, it is necessary to allow the difficulty threshold of the query question to fluctuate within a certain range, and to further subdivide the query questions within this range to improve the classification accuracy.
[0198] When the iteration conditions for a small cycle are met, the step of dynamically updating the difficulty threshold based on the feature indicators of the feature parameters and / or the operational indicators of the retrieval thinking module and the reasoning thinking module also includes:
[0199] P35: Obtain the current difficulty threshold and set the buffer coefficient and buffer baseline value;
[0200] P36: Obtain the product of the set buffer coefficient and the set buffer baseline value as the buffer threshold;
[0201] P37: Subtract the buffer threshold from the current difficulty threshold to get the first difficulty threshold, and add the buffer threshold to the current difficulty threshold to get the second difficulty threshold.
[0202] Specifically, options include, but are not limited to, setting a buffer coefficient and a buffer baseline value σ, using their product as the buffer threshold, then subtracting the buffer threshold from the current difficulty threshold to obtain a first difficulty threshold, and adding the buffer threshold to the current difficulty threshold to obtain a second difficulty threshold. This expands the difficulty threshold from a single point to an interval, facilitating further subdivision of query questions within the intervals between each difficulty threshold. Preferably, before iterating the difficulty threshold, an initial difficulty threshold needs to be set by someone skilled in the art, and this initial difficulty threshold can be set according to the technical field of the query question.
[0203] For example, if the buffer coefficient is set to 0.1, then the buffer threshold is 0.1σ. The first difficulty threshold can be represented as θ−0.1σ, and the second difficulty threshold can be represented as θ+0.1σ.
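As a non-limiting illustration, steps P35 to P37 and the three-way classification they enable can be sketched as follows. The function names and the labels "easy", "difficult" and "manual" are hypothetical; the buffer coefficient 0.1 is the example value above, while the baseline value σ = 1.0 and the threshold 0.62 are assumed figures.

```python
def buffer_band(theta, buffer_coef, buffer_base):
    """P36-P37: widen a single threshold into [theta - c*sigma, theta + c*sigma]."""
    margin = buffer_coef * buffer_base
    return theta - margin, theta + margin


def classify(score, band):
    """Label a query by where its difficulty coefficient falls relative to the band."""
    first, second = band
    if score < first:
        return "easy"
    if score > second:
        return "difficult"
    return "manual"   # inside the band: left to manual judgment


band = buffer_band(theta=0.62, buffer_coef=0.1, buffer_base=1.0)
print(band, classify(0.40, band), classify(0.60, band), classify(0.755, band))
```

Keeping the buffer coefficient and the baseline value as separate arguments lets the band width follow a measured spread of difficulty coefficients without changing the coefficient itself.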
[0204] A further preferred step involves determining whether the difficulty coefficient exceeds a difficulty threshold, selectively invoking the retrieval thinking module or the reasoning thinking module based on the determination result, and outputting the query results. This includes:
[0205] P41: When the difficulty coefficient is less than the first difficulty threshold, the corresponding query question is an easy question; when the difficulty coefficient is greater than the second difficulty threshold, the corresponding query question is a difficult question; when the difficulty coefficient is not less than the first difficulty threshold and not greater than the second difficulty threshold, the corresponding query question is judged manually to determine whether it is an easy question or a difficult question.
[0206] P42: Based on the classification results of each query question, input the easy questions into the retrieval thinking module and the difficult questions into the reasoning thinking module, and output the query results.
[0207] Specifically, optional but not limited to classifying each query question according to the first difficulty threshold, the second difficulty threshold, and the difficulty coefficient of each query question obtained in step P37, resulting in easy and difficult questions, and then calling different modules to process each query question according to the classification results, so as to improve processing efficiency and accuracy. When the difficulty coefficient is less than the first difficulty threshold, it means that the corresponding query question is relatively easy, requiring less computing resources and data processing. When the difficulty coefficient is greater than the second difficulty threshold, it means that the corresponding query question is relatively difficult, requiring more computing resources and data processing. When the difficulty coefficient is not less than the first difficulty threshold and not greater than the second difficulty threshold, it means that the query question needs to be further judged by humans to determine whether the corresponding query question is easy or difficult, so as to improve the classification accuracy. After all query questions have been classified, they can be input into the corresponding modules according to the classification results of each query question, and the query results can be output.
[0208] As an example, you can optionally use the following code to call the corresponding module for processing based on the difficulty level of the query:
[0209]
def route_decision(score, theta):
    sigma = 0.1  # Buffer band width factor
    lower_bound = theta - sigma
    upper_bound = theta + sigma
    if score < lower_bound:
        return "graph_rag"  # Force a simple problem
    elif score > upper_bound:
        return "reasoning"  # Force a complex problem
    else:
        return "human_review"  # Requires manual review

# Current threshold (medical field)
theta_medical = 0.62
decision = route_decision(0.755, theta_medical)
# Returns "reasoning" (0.755 > 0.62 + 0.1 = 0.72)
[0223] Alternatively, the retrieval thinking module can be invoked to process easy questions using the following code:
[0224] Prompt example:
[0225] Answer the questions based on the following knowledge graph content:
[0226] [Knowledge Fragment]
[0227] - Entity A attribute: {value1}
[0228] Entity A → Relationship → Entity B
[0229] Question: {query}
[0230] Please answer concisely in 1-2 sentences.
[0231] Algorithm pseudocode:
[0232] MATCH path=(start)-[r*1..3]->(end)
[0233] WHERE start.name =~'(?i).*'+$query+'.*'
[0234] WITH path, [n IN nodes(path) | n.description] AS context
[0235] ORDER BY tfidf(context) DESC
[0236] RETURN context LIMIT 3
[0237] Alternatively, the reasoning module can be invoked to handle difficult problems via the following code:
[0238]
def validate_answer(answer):
    claims = extract_claims(answer)  # Extract declarative statements
    verification_results = []
    for claim in claims:
        # Knowledge Graph Fact Verification
        kg_check = neo4j.run("MATCH (e) WHERE e.property = $value RETURN count(e)", value=claim.value)
        # Logical Contradiction Detection
        logic_check = detect_contradictions(claim, prior_context)
        verification_results.append(kg_check and logic_check)
    return all(verification_results)
[0248] Furthermore, after each module processes the query and obtains the query results, the results may not match the facts due to reasons such as flaws in the logical rule design, loopholes in the reasoning algorithm, or semantic biases. Therefore, it is necessary to verify the query results to determine whether the classification of each query question is correct. The following code can be used to verify the query results:
[0249]
def validate_answer(answer):
    claims = extract_claims(answer)  # Extract declarative statements
    verification_results = []
    for claim in claims:
        # Knowledge Graph Fact Verification
        kg_check = neo4j.run("MATCH (e) WHERE e.property = $value RETURN count(e)", value=claim.value)
        # Logical Contradiction Detection
        logic_check = detect_contradictions(claim, prior_context)
        verification_results.append(kg_check and logic_check)
    return all(verification_results)
[0259] Preferably, the operating metrics may include historical accuracy; when the iteration conditions of a large cycle are met, the step of dynamically updating the difficulty threshold based on the feature metrics of the feature parameters, and / or the operating metrics of the retrieval thinking module and the reasoning thinking module, includes:
[0260] P38: Obtain the historical accuracy of each module, and derive the updated difficulty threshold based on the historical accuracy and the difficulty threshold;
[0261] Specifically, step P38 may include:
[0262] P381: Obtain the historical accuracy of each module and get the accuracy difference between each module;
[0263] P382: Obtain the product of the accuracy difference and the learning coefficient to get the threshold correction amount;
[0264] P383: Sum the current difficulty threshold and the threshold correction amount to obtain the updated difficulty threshold.
[0265] Specifically, because user queries vary widely in type and complexity, the threshold baseline must be updated dynamically according to actual operating conditions, both to prevent any module from being overloaded and to ensure that queries are assigned to the appropriate module more accurately. Optionally, the learning coefficient is set by those skilled in the art, and the time period is derived from the iteration condition (e.g., if the iteration condition is that each module has run for not less than 24 hours, the current time period is the most recent 24 hours). The historical accuracy of each module within the current time period is collected to obtain the accuracy difference between the modules; the threshold correction amount is obtained from the accuracy difference and the learning coefficient; and the current difficulty threshold is corrected by the threshold correction amount to obtain the updated difficulty threshold.
[0266] For example, the updated difficulty threshold for each time period can be calculated using Equation 3-2:
[0267] θ(t+1) = θ(t) + α × (A_g - A_r)    (3-2)
[0268] Where θ(t+1) is the difficulty threshold for the next time period, θ(t) is the difficulty threshold for the current time period, α is the learning coefficient, A_g is the accuracy of the retrieval thinking module in the current time period, and A_r is the accuracy of the reasoning thinking module in the current time period.
[0269] More specifically, according to Equation 3-2, when the accuracy of the retrieval thinking module is higher, (A_g - A_r) is positive, and θ(t+1) will be larger than θ(t), causing more query questions to be classified as easy questions. This allows more query questions to be assigned to the retrieval thinking module, thus expanding its processing scope and improving the overall accuracy of problem processing. When the accuracy of the reasoning thinking module is higher, (A_g - A_r) is negative, and θ(t+1) will be smaller than θ(t), causing more query questions to be classified as difficult questions. This allows more query questions to be assigned to the reasoning thinking module, thus expanding its processing scope and improving the overall accuracy of problem processing.
[0270] P39: Determine whether the current load of the retrieval thinking module is greater than the corresponding safe load threshold. If not, keep it unchanged. If so, determine the temporary floating threshold and subtract the temporary floating threshold from the updated difficulty threshold to obtain the floating difficulty threshold.
[0271] P310: Determine whether the current load of the reasoning module is greater than the corresponding safe load threshold. If not, leave it unchanged. If so, determine the temporary floating threshold and add the temporary floating threshold to the updated difficulty threshold to obtain the floating difficulty threshold.
[0272] Specifically, the maximum load of each module may optionally be obtained in order to derive the safe load threshold δmax of each module. It is then determined whether the current load of the retrieval thinking module or the reasoning thinking module is greater than its corresponding safe load threshold. If not, the module is running normally and there is no need to optimize its load by adjusting the difficulty threshold. If so, the load of that module is too large and part of it needs to be transferred to the other module. When the load Q_g of the retrieval thinking module is too large, the temporary floating threshold δ is subtracted from the updated difficulty threshold to obtain the floating difficulty threshold, so that fewer query questions are classified as easy questions. When the load Q_r of the reasoning thinking module is too large, the temporary floating threshold δ is added to the updated difficulty threshold to obtain the floating difficulty threshold, so that fewer query questions are classified as difficult questions. In this way, the load of each module can be optimized by adjusting the difficulty threshold, avoiding overload of a single module.
[0273] Preferably, the step of determining the temporary floating threshold includes:
[0274] P3101: Obtain the maximum load and safety factor of each module, calculate the product of the maximum load and safety factor of each module, and obtain the safe load of each module;
[0275] P3102: Obtain the difference between the current load and the safe load of each module, calculate the product of the prior sensitivity coefficient and each difference, and obtain the temporary floating threshold.
[0276] For example, the temporary floating threshold δ can be calculated using Equation 3-3:
[0277] δ = k × (Current_Queue_Length - 0.8 × Max_Queue_Capacity)    (3-3)
[0278] Where k is the sensitivity coefficient (optionally set to 0.001) to ensure that δ is positively correlated with the overload level; Current_Queue_Length is the current load of each module, 0.8 is the safety factor, which can be arbitrarily set by those skilled in the art, and Max_Queue_Capacity is the maximum load of the module. More specifically, when the current load of each module returns to normal, that is, when it is not greater than the safe load threshold δmax, θ is restored to the value determined by the accuracy of each module in step S411 (i.e., temporary floating and not persistent).
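As a minimal sketch of Equation 3-3, the temporary floating threshold could be computed as follows. The function name `floating_delta`, the cap of 0.1 (borrowed from the example constraint δ≤0.1), and the choice to return 0 when the module is not overloaded are assumptions made for illustration and are not part of the described method.

```python
def floating_delta(current_load, max_capacity, k=0.001, safety_factor=0.8, cap=0.1):
    """Temporary floating threshold per Equation 3-3 (illustrative sketch)."""
    # Safe load = maximum load x safety factor (step P3101).
    safe_load = safety_factor * max_capacity
    # Difference between the current load and the safe load (step P3102).
    excess = current_load - safe_load
    if excess <= 0:
        # Assumption: a module that is not overloaded triggers no floating.
        return 0.0
    # Assumption: the result is capped, mirroring the example constraint on delta.
    return min(cap, k * excess)


print(floating_delta(700, 1000))  # load below the safe load
print(floating_delta(850, 1000))  # load 50 above the safe load of 800
```

Because the function depends only on the current load, calling it again once the load has dropped back below the safe load returns 0, which matches the temporary, non-persistent nature of the floating threshold.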
[0279] For example, the steps to determine whether to temporarily float the threshold baseline based on the current load of each module can be as follows:
[0280] Queue_G = Current load of the retrieval thinking module
[0281] Queue_R = Current load of the reasoning thinking module
[0282] Max_Queue = Maximum load capacity (e.g., 1000)
[0283] - If Queue_G > 0.8 × Max_Queue:
[0284] θ_temp = θ_new - δ # Temporarily lower the threshold (step P39)
[0285] Where δ = min(0.1, 0.01 × (Queue_G - 0.8 × Max_Queue)) # Example calculation, with constraint δ≤0.1
[0286] - If Queue_R > 0.8 × Max_Queue:
[0287] θ_temp = θ_new + δ # Temporarily raise the threshold (step P310), with δ computed from Queue_R in the same way
[0288] - Otherwise: θ_temp = θ_new
[0289] Preferably, since the data distribution characteristics differ across technical fields, it is necessary to set different constraint ranges for the difficulty thresholds corresponding to each technical field, thereby further constraining the difficulty thresholds to determine the final difficulty thresholds.
[0290] The constraint range can be expressed as Equation 3-4:
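[0291] θ_final = max(θ_min, min(θ_temp, θ_max))    (3-4)
[0292] Where θ_final is the final determined floating difficulty threshold, and θ_min and θ_max are the lower and upper bounds of the constraint range set for the technical field.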
[0293] For example, the initial state is set as θ_current = 0.62, A_g = 0.92 (recent accuracy of the retrieval thinking module), A_r = 0.85 (recent accuracy of the reasoning thinking module), α = 0.05, retrieval thinking module load = 700 (Max_Queue = 1000, not overloaded), and reasoning thinking module load = 500 (not overloaded). The threshold baseline calculation steps may include:
[0294] Step 1: Accuracy Difference Driven Update
[0295] Δ = A_g - A_r = 0.07
[0296] θ_new = 0.62 + 0.05 × 0.07 = 0.6235
[0297] Step 2: Load check (no overload, no floating triggered)
[0298] θ_temp = 0.6235
[0299] Step 3: Range constraints (assuming θ ∈ [0.4, 0.8])
[0300] θ_final = 0.6235
[0301] Result: The threshold was slightly increased from 0.62 to 0.6235, which slightly expanded the processing range of the retrieval thinking module.
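The three steps of the example above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the function name `next_threshold`, the default values of `k`, `safety_factor` and `delta_cap`, and the rule that the retrieval thinking module is checked first when both modules are overloaded are all hypothetical. The direction of the temporary float follows steps P39 and P310.

```python
def next_threshold(theta, acc_retrieval, acc_reasoning, alpha,
                   load_retrieval, load_reasoning, max_queue,
                   theta_min=0.4, theta_max=0.8,
                   k=0.001, safety_factor=0.8, delta_cap=0.1):
    """Return (theta_new, theta_final) for one update period."""
    # Step 1: accuracy-difference-driven update (Equation 3-2).
    theta_new = theta + alpha * (acc_retrieval - acc_reasoning)

    # Step 2: load check with temporary floating (steps P39 and P310).
    safe_load = safety_factor * max_queue
    theta_temp = theta_new
    if load_retrieval > safe_load:
        # Retrieval overloaded: lower the threshold so fewer questions count as easy.
        theta_temp -= min(delta_cap, k * (load_retrieval - safe_load))
    elif load_reasoning > safe_load:
        # Reasoning overloaded: raise the threshold so fewer questions count as difficult.
        theta_temp += min(delta_cap, k * (load_reasoning - safe_load))

    # Step 3: range constraint for the technical field (Equation 3-4).
    theta_final = max(theta_min, min(theta_temp, theta_max))

    # theta_new is the persistent baseline; theta_final applies only while the load persists.
    return theta_new, theta_final


theta_new, theta_final = next_threshold(0.62, 0.92, 0.85, 0.05, 700, 500, 1000)
print(round(theta_final, 4))
```

With the inputs of the worked example, neither module is overloaded and the range constraint is inactive, so the final threshold is approximately 0.6235. Returning the baseline separately from the floated value keeps the load adjustment from accumulating across periods.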
[0302] For example, the specific meanings of each parameter in the above steps can be selected as shown in Table 3:
[0303] Table 3: Floating Threshold Baseline Parameter Table
[0304]
[0305] Specifically, to enhance system robustness, a joint verification mechanism is designed: key entities and logical assertions output by the inference engine are validated in real-time using a knowledge graph, detecting contradictions and triggering a rerouting process. Simultaneously, the system integrates an adaptive optimization module that dynamically adjusts routing thresholds and feature weights based on historical performance data, balancing response speed and result accuracy through an exponential smoothing algorithm. In abnormal scenarios, a built-in circuit breaker mechanism automatically switches to degradation mode to ensure service continuity. This solution, through the collaborative scheduling and closed-loop optimization of heterogeneous processing engines, achieves a performance breakthrough in general question-answering scenarios by reducing average response time and improving accuracy for complex questions, significantly outperforming traditional single-path architectures.
[0306] Preferably, the method further includes:
[0307] P311: During the operation of each module, real-time collection of set monitoring index data is performed, and the current abnormal mode type is obtained based on prior judgment conditions; the abnormal mode type includes any one or more of feature drift, performance degradation, engine failure and system circuit breaker.
[0308] P312: Obtain the anomaly level corresponding to the current anomaly mode type, and take the corresponding prior repair operation according to the anomaly level; prior repair operation includes any one or more of automatic repair, degraded operation and circuit breaker handling.
[0309] Specifically, several monitoring indicators may optionally be set and sampled to obtain monitoring indicator data. The data is then checked against the prior judgment conditions: if a condition is met, the corresponding abnormal mode type is present; if it is not met, that abnormal mode type is absent. The anomaly level is then obtained from the abnormal mode type, and the corresponding repair operation is carried out.
[0310] For example, the steps for collecting monitoring indicator data can be implemented using the following code:
[0311]
monitor_metrics = {
    'graph_rag': ['success_rate', 'avg_latency', 'cache_hit_ratio'],
    'reasoning': ['step_accuracy', 'fallback_count', 'cross_check_success'],
    'system': ['routing_accuracy', 'threshold_adjustments'],
}
[0316] For example, the methods for determining the types of exception modes are shown in Table 4:
[0317] Table 4: Abnormal Mode Type Judgment Table
[0318]
[0319] For example, the types of repair operations are shown in Table 5:
[0320] Table 5: Types of Repair Operations
[0321]
[0322] Preferably, since errors may occur during system operation, the judgment logic must be reinforced to prevent misjudgment. For example, a high-level anomaly (such as a system circuit breaker) requires a rapid response and is confirmed after a single detection, while a low-level anomaly (such as feature drift) requires three consecutive confirmations. In addition, an independent time window (ranging from 2 min down to 10 s) is set for each anomaly type, and continuous event sequences are detected rather than simple counts, which can reduce false alarms by 60%, ensure stable handling of level 1-2 anomalies, and avoid unnecessary resource allocation triggered by momentary jitter.
[0323] The specific steps can be implemented using the following code:
[0324] 1. False alarm prevention mechanism (prevents false alarms triggered by instantaneous fluctuations or noise, ensuring the reliability of anomaly detection)
[0325]
class AnomalyConfirmer:
    def __init__(self):
        self.required_confirmations = {
            "Feature drift": 3,  # Requires 3 consecutive detections for confirmation
            "Performance degradation": 2,
            "Engine failure": 1,
            "System circuit breaker": 1,
        }
        # Keys must match required_confirmations exactly (same capitalization)
        self.confirmation_windows = {
            "Feature drift": "2min",  # Confirmation completed within 2 minutes
            "Performance degradation": "90s",
            "Engine failure": "30s",
            "System circuit breaker": "10s",
        }

    def confirm_anomaly(self, anomaly_type, first_detection_time):
        """
        Core algorithm of continuous confirmation mechanism
        """
        # Get the number of consecutive occurrences of this anomaly within the current time window
        count = self._get_continuous_count(
            anomaly_type,
            since=first_detection_time,
            duration=self.confirmation_windows[anomaly_type],
        )
        # Determine if the confirmation threshold has been reached
        return count >= self.required_confirmations[anomaly_type]

    def _get_continuous_count(self, anomaly_type, since, duration):
        """Implementation of a sliding window counter"""
        # Query the time series database to retrieve abnormal events within a specified time window
        # (query_time_series_db and parse_duration are external helpers not defined here)
        events = query_time_series_db(
            metric="anomaly_events",
            filters={"type": anomaly_type},
            start=since,
            end=since + parse_duration(duration),
        )
        # Calculate the longest continuous sequence
        max_streak = 0
        current_streak = 0
        prev_time = None
        for event in sorted(events, key=lambda x: x["timestamp"]):
            # total_seconds() measures the whole gap; .seconds would drop the days component
            if prev_time is None or (event["timestamp"] - prev_time).total_seconds() <= 5:
                current_streak += 1
            else:
                current_streak = 1
            max_streak = max(max_streak, current_streak)
            prev_time = event["timestamp"]
        return max_streak
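The consecutive-confirmation idea can also be tried out without a time series database. The sketch below is a self-contained illustration that works on an in-memory list of timestamps; the function names `longest_streak` and `is_confirmed` and their parameters are hypothetical, and the 5-second gap is taken from the listing above.

```python
from datetime import datetime, timedelta


def longest_streak(timestamps, max_gap=timedelta(seconds=5)):
    """Length of the longest run of events whose neighbours are at most max_gap apart."""
    best = 0
    current = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous <= max_gap:
            current += 1   # event continues the current run
        else:
            current = 1    # first event, or the gap was too long: start a new run
        best = max(best, current)
        previous = ts
    return best


def is_confirmed(timestamps, required, window_start, window):
    """True if enough consecutive detections fall inside the confirmation window."""
    window_end = window_start + window
    in_window = [ts for ts in timestamps if window_start <= ts <= window_end]
    return longest_streak(in_window) >= required


t0 = datetime(2025, 1, 1, 12, 0, 0)
detections = [t0 + timedelta(seconds=s) for s in (0, 3, 6, 20, 22)]
print(longest_streak(detections))
print(is_confirmed(detections, required=3, window_start=t0, window=timedelta(minutes=2)))
```

Counting runs rather than totals means that three detections spread evenly over two minutes would not confirm a feature drift, whereas three detections within a few seconds of each other would.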
[0372] Further optimization requires that the indicator continuously exceeds the threshold throughout the entire time window to avoid accidental escalation due to instantaneous peaks. Therefore, a predefined cross-level transition rule base is also needed, where each rule contains a triplet of trigger indicator, threshold, and duration to achieve an 85% accuracy rate in predicting abnormal deterioration, intercepting system crashes on average 3-5 minutes in advance, and realizing the transformation from passive response to proactive defense. Specific steps can be implemented using the following code:
[0373] 2. Level transition detection (automatically identifies abnormal deterioration trends and proactively upgrades the response level)
[0374]
import operator


class AnomalyEscalator:
    ESCALATION_RULES = {
        # (Current Level, Deterioration Indicator, Threshold, Duration) → Target Level
        ("Feature drift", "routing_accuracy", "<0.75", "3min"): "Performance degradation",
        ("Feature drift", "queue_ratio", ">0.7", "5min"): "Performance degradation",
        ("Performance degradation", "engine_accuracy", "<0.7", "2min"): "Engine failure",
        ("Engine failure", "service_availability", "<0.6", "90s"): "System circuit breaker",
    }

    def check_escalation(self, current_level):
        """Check if the anomaly level needs to be upgraded."""
        for (src_level, metric, condition, duration), target_level in self.ESCALATION_RULES.items():
            if src_level != current_level:
                continue
            # Parse conditional expressions
            value, comparator = self._parse_condition(condition)
            # Does the detection indicator consistently exceed the threshold?
            if self._metric_violation_persistent(metric, comparator, value, duration):
                return target_level
        return None

    def _metric_violation_persistent(self, metric, comparator, value, duration):
        """Is the verification indicator consistently violating the threshold?"""
        # Get current time series data (get_metric_data is an external helper not defined here)
        data = get_metric_data(metric, duration)
        # Check if all data points meet the conditions
        for point in data:
            if not comparator(point.value, value):
                return False
        return True

    def _parse_condition(self, condition):
        """Parse conditional expressions such as '<0.75'"""
        # Two-character symbols come first so that "<=" is not matched as "<"
        comparators = {
            "<=": operator.le,
            ">=": operator.ge,
            "<": operator.lt,
            ">": operator.gt,
        }
        for symbol, op in comparators.items():
            if condition.startswith(symbol):
                return float(condition[len(symbol):]), op
        raise ValueError(f"Invalid condition: {condition}")
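For illustration, the rule lookup can be exercised on in-memory samples as sketched below. The names `parse_condition`, `violates_persistently`, `check_escalation` and `RULES` are hypothetical; the sketch assumes the samples passed in already cover the rule's duration window (so the duration field is omitted), and it treats an empty sample list as "no violation", which is a design choice of this sketch only.

```python
import operator

# Longer symbols are listed first so that "<=" is not mistaken for "<".
COMPARATORS = (
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
)

# Two of the escalation rules from the listing above, without the duration field.
RULES = {
    ("Feature drift", "routing_accuracy", "<0.75"): "Performance degradation",
    ("Performance degradation", "engine_accuracy", "<0.7"): "Engine failure",
}


def parse_condition(condition):
    """Split an expression such as '<0.75' into (threshold, comparison function)."""
    for symbol, op in COMPARATORS:
        if condition.startswith(symbol):
            return float(condition[len(symbol):]), op
    raise ValueError(f"Invalid condition: {condition}")


def violates_persistently(samples, condition):
    """True only if every sample in the window violates the condition."""
    threshold, op = parse_condition(condition)
    return bool(samples) and all(op(sample, threshold) for sample in samples)


def check_escalation(current_level, metrics, rules=RULES):
    """Return the target level of the first rule that fires, or None."""
    for (src_level, metric, condition), target_level in rules.items():
        if src_level != current_level:
            continue
        if violates_persistently(metrics.get(metric, []), condition):
            return target_level
    return None


print(check_escalation("Feature drift", {"routing_accuracy": [0.70, 0.72, 0.74]}))
print(check_escalation("Feature drift", {"routing_accuracy": [0.70, 0.80, 0.74]}))
```

In the second call a single sample above 0.75 is enough to block the escalation, which is the behaviour the persistence requirement is meant to provide against instantaneous peaks.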
[0419] On the other hand, the present invention also provides a computer storage medium storing executable program code; the executable program code is used to execute any of the above-mentioned dual-path generation methods based on dynamic updates of weight coefficients.
[0420] On the other hand, the present invention also provides a computer system, including a memory and a processor; the memory stores program code that can be executed by the processor; the program code is used to execute any of the above-described dual-path generation methods based on dynamic updates of weight coefficients.
[0421] For example, the program code can be divided into one or more modules / units, which are stored in the memory and executed by the processor to complete the present invention. The one or more modules / units can be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of the program code in the computer system.
[0422] The computer system may be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer system may include, but is not limited to, a processor and memory. Those skilled in the art will understand that the computer system may also include input / output systems, network access systems, buses, etc.
[0423] The processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.
[0424] The memory can be an internal storage unit of a computer system, such as a hard disk or RAM. It can also be an external storage system of the computer system, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card. Furthermore, the memory can include both internal and external storage devices. The memory is used to store the program code and other programs and data required by the computer system. The memory can also be used to temporarily store data that has been output or will be output.
[0425] The aforementioned computer storage medium and computer system are created based on the aforementioned dual-path generation method based on dynamic updates of weight coefficients. Their technical functions and beneficial effects will not be elaborated here. The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0426] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent should be determined by the appended claims.
Claims
1. A dual-path generation method based on dynamic updating of weight coefficients, characterized in that it comprises: obtaining the user's current query question and extracting the feature parameters of the query question; acquiring the characteristic indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, to dynamically determine the weight coefficient of each feature parameter, which includes: acquiring the volatility of each feature parameter within a set time period; determining the baseline weight of each feature parameter based on its volatility and current weight coefficient; acquiring the load balancing degree between modules to determine the additional weight of each feature parameter; and obtaining the updated weight coefficient of each feature parameter based on the baseline weight and the additional weight; determining the difficulty coefficient of the query question based on each feature parameter and its corresponding weight coefficient, wherein the difficulty coefficient is obtained by summing the feature parameters weighted by their weight coefficients; and, based on the difficulty coefficient, selectively invoking either the retrieval thinking module or the reasoning thinking module to output the query results, wherein it is determined whether the difficulty coefficient of each query question exceeds the difficulty threshold; if not, the retrieval thinking module is invoked, and if so, the reasoning thinking module is invoked.
2. The method according to claim 1, characterized in that the step of obtaining the user's current query question and extracting the feature parameters of the query question includes: obtaining several standard question templates, and encoding each standard question template and the query question into embedding representations to obtain several template embedding vectors and a query embedding vector; obtaining the cosine similarity between each template embedding vector and the query embedding vector, and calculating the variance of the cosine similarities to obtain the semantic dispersion; extracting the statement analysis information of the query question, obtaining from it the number of child nodes corresponding to each sentence in the query question, and taking the maximum number of child nodes as the syntactic complexity; and segmenting the query question into several target words, matching the target words against a set of technical terms to obtain the technical terms present, counting the technical terms and the target words, and taking the ratio between the two counts as the domain specificity.
3. The method according to claim 1, characterized in that the step of determining the baseline weight of each feature parameter based on its volatility and current weight coefficient includes: summing the volatility of each feature parameter to obtain the total volatility; determining the volatility factor of the current feature parameter from the ratio between its volatility and the total volatility, and deriving from it the inverse volatility factor of the current feature parameter; and multiplying the current weight coefficient of the current feature parameter by the inverse volatility factor to obtain the baseline weight of the current feature parameter.
4. The method according to claim 1, characterized in that, The steps of obtaining the load balancing degree between modules and determining the additional weights of each characteristic parameter also include: Obtain the current load of each module and determine the load difference between modules; Obtain the time taken by each module to output query results, and determine the latency difference between modules; The load balance is determined based on the load difference and the latency difference. Set the adjustment coefficient for each module, calculate the product of the adjustment coefficient and the load balance of each module, and obtain the additional weight of each characteristic parameter.
5. The method according to claim 1, characterized in that, The steps for obtaining the feature indicators of each feature parameter, and / or the operational indicators of the retrieval thinking module and the reasoning thinking module, and dynamically determining the weight coefficients of each feature parameter include: Obtain the validation results of each query result in the current period and the feature parameters of the corresponding query questions, and input them into the logistic regression classifier to obtain the regression coefficients of each feature parameter; The regression coefficients of each feature parameter are fused with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter.
6. The method according to claim 5, characterized in that the step of fusing the regression coefficients of each feature parameter with the current weights according to a set ratio to obtain the updated weight coefficients of each feature parameter includes: multiplying the current weights by the set smoothing coefficient to obtain the historical retained weights; normalizing the regression coefficients to obtain the feature contribution degree of each feature parameter; multiplying the feature contribution degree by the anti-smoothing coefficient to obtain the instantaneous contribution weights; and summing the historical retained weights and the instantaneous contribution weights to obtain the updated weight coefficients of each feature parameter.
7. The method according to claim 1, characterized in that, The method further includes: Obtain the routing accuracy of each query question within a certain time period, and determine whether it is less than the accuracy threshold. If not, leave it unchanged; if so, perform a rollback operation, extract several query questions for manual classification, and return to step S1.
8. A computer storage medium, characterized in that, It stores executable program code; the executable program code is used to execute the dual-path generation method based on dynamic updating of weight coefficients as described in any one of claims 1-7.
9. A computer system, characterized in that, It includes a memory and a processor; the memory stores program code that can be executed by the processor; the program code is used to execute the dual-path generation method based on dynamic updating of weight coefficients as described in any one of claims 1-7.