Intelligent query response method and system for multi-level cache

By using multi-level caching and weighted matching processes, the cache failure problem caused by semantic differences and parameter format fluctuations in the intelligent query system was solved, achieving efficient query response and low-cost system operation.

CN122240820A · Pending · Publication Date: 2026-06-19 · JIANGXI CUIXING INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JIANGXI CUIXING INTELLIGENT TECH CO LTD
Filing Date
2026-05-12
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing intelligent query systems suffer from cache invalidation due to differences in semantic representation and fluctuations in parameter format, leading to repeated calls to large language models and execution tools for matching processes, increasing operational costs and response time.

Method used

A multi-level caching mechanism is adopted. Domain control semantic fingerprints and parameter fingerprints are generated by extracting domain features of query text. Layered caching is combined with a weighted tool matching process to optimize cache hit rate and tool matching efficiency.

Benefits of technology

The method significantly reduces repeated calls to large language models, lowers system API call costs, and improves query response speed, cache hit rate, and tool matching efficiency, thereby reducing operating costs.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of natural language processing technology, specifically to a multi-level caching intelligent query response method and system. The method comprises: extracting domain features of the query text and determining the business domain to which the query belongs, thereby generating a corresponding intelligent agent domain identifier; performing text cleaning and feature extraction on the query text in conjunction with the intelligent agent domain identifier to generate a fixed-length domain-controlled semantic fingerprint; and retrieving structured parsing results from the first-level cache based on the domain-controlled semantic fingerprint and, on a miss, calling a large language model to convert the query text into structured data and writing it to the first-level cache. Through a two-level hierarchical caching mechanism combined with domain-controlled semantic fingerprints and domain-controlled parameter fingerprints, the invention achieves layered reuse of semantic parsing results and tool matching results, solving the cache failure problem that existing technologies suffer when semantically similar queries are expressed differently or when parameter formats fluctuate.

Description

Technical Field

[0001] This invention relates to the field of natural language processing technology, specifically to a multi-level caching intelligent query response method and system.

Background Technology

[0002] With the rapid development of large-scale language model technology, enterprise-level intelligent assistants based on natural language interaction are being widely used. These systems can convert users' natural language queries into machine-executable instructions, automatically calling databases, business interfaces, and other tools to obtain results. This significantly lowers the barrier for enterprise personnel to obtain business data such as production, warehousing, and quality inspection, and improves daily operational efficiency.

[0003] Existing intelligent query systems based on large language models generally employ caching mechanisms to reduce redundant model calls. However, existing caches mostly rely on direct matching against the original query string or the complete parsed result. When user queries are semantically similar but differ subtly in expression, or when the parsed results returned by the large language model show minor fluctuations in parameter order or field format, the cache lookup fails. As a result, a large number of semantically equivalent queries still have to call the large language model and run the tool matching process again, increasing both the system's operational costs and the response time for high-frequency similar queries.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention provides a multi-level caching intelligent query response method and system, which solves the problem that existing intelligent query systems are prone to cache failure due to differences in semantic representation and fluctuations in parameter format, leading to repeated calls to large models and execution tools for matching, resulting in high operating costs and long response times.

[0005] To achieve the above objectives, the present invention provides the following technical solution: a multi-level caching intelligent query response method, comprising: extracting the domain features of the query text and determining the business domain to which the query belongs, thereby generating the corresponding intelligent agent domain identifier; performing text cleaning and feature extraction on the query text in conjunction with the intelligent agent domain identifier to generate a fixed-length domain-controlled semantic fingerprint; retrieving the structured parsing result in the first-level cache based on the domain-controlled semantic fingerprint and, if no match is found, invoking a large language model to convert the query text into structured data and synchronizing it to the first-level cache; performing format regularization and feature extraction on the structured data in conjunction with the intelligent agent domain identifier to generate a domain-controlled parameter fingerprint; retrieving the tool matching result in the second-level cache based on the domain-controlled parameter fingerprint and, if no match is found, increasing the domain keyword weight and matching within the tool subset corresponding to the intelligent agent domain identifier, then expanding the matching to the entire tool set if the match does not meet the target; and calling the business tools based on the obtained tool matching results and using the business tools to obtain business data.

[0006] Furthermore, the process of generating a fixed-length domain-controlled semantic fingerprint includes: using text cleaning rules to remove punctuation marks and meaningless particles from the query text; extracting the semantic vector of the processed query text under the business domain; and fusing and mapping the semantic vector with the intelligent agent domain identifier to generate a fixed-length code that uniquely represents the semantic features within the business domain.

[0007] Furthermore, the process of generating the domain-controlled parameter fingerprint includes: rearranging the parameter fields in the structured data according to a preset lexicographical order; standardizing the numerical format of the sorted parameter fields and removing redundant information; and hashing the normalized parameter fields together with the intelligent agent domain identifier to generate a unique fingerprint that maps to the tool matching result.

[0008] Furthermore, the tool matching process includes: within the subset of tools bound to the intelligent agent domain identifier, weighting candidate tools based on domain attributes and calculating a matching score between the query requirement and the tool function description; and, when the matching score is lower than a preset threshold, expanding the search scope to the entire tool set, reducing the domain attribute weight, and increasing the semantic similarity weight for secondary matching.

[0009] Furthermore, the method also includes a cache-linked refresh step: when an update to business data is detected, identifying the intelligent agent domain identifier associated with the updated business data; locating and invalidating the corresponding parameter cache entries in the second-level cache based on the intelligent agent domain identifier; and keeping the corresponding query cache entries in the first-level cache in a valid state.

[0010] Furthermore, the method also includes a fingerprint iterative update step: collecting query texts that did not hit the cache within a preset period, together with the corresponding tool matching results; incrementally adjusting the semantic feature extraction model and the parameter normalization rules based on the collected data; and updating the fingerprint generation logic to improve fingerprint consistency across different expressions of the same semantics.

[0011] Furthermore, the step of calling a large language model to convert the query text into structured data includes: sending the query text to the preset large language model interface; retrieving the returned raw message containing operation instructions and parameter information; and parsing the raw message to extract structured data that conforms to a preset protocol format.

[0012] Furthermore, if a hit occurs in the first-level cache, the step of calling the large language model is skipped, and the retrieved structured parsing result is used directly as the input data for generating the domain-controlled parameter fingerprint.

[0013] Furthermore, the step of invoking business tools based on the obtained tool matching results includes: locating the corresponding plugin service based on the plugin identifier in the tool matching result; constructing a list of call parameters that meets the requirements of the plugin service; and executing the plugin call and retrieving the returned production data, warehousing data, and quality inspection data.

[0014] This invention also provides a multi-level caching intelligent query response system, comprising: an identification module, used to extract the domain features of the query text and determine the business domain to which the query belongs, thereby generating the corresponding intelligent agent domain identifier; a semantic fingerprint module, used to perform text cleaning and feature extraction on the query text in conjunction with the intelligent agent domain identifier and generate a fixed-length domain-controlled semantic fingerprint; a first-level cache module, used to retrieve the structured parsing result based on the domain-controlled semantic fingerprint and, if no match is found, trigger a large language model to convert the query text into structured data and synchronize it to the first-level cache module; a parameter fingerprint module, used to perform format regularization and feature extraction on the structured data in conjunction with the intelligent agent domain identifier to generate a domain-controlled parameter fingerprint; a second-level cache module, used to retrieve the tool matching result based on the domain-controlled parameter fingerprint and, if no match is found, increase the domain keyword weight and perform matching within the tool subset corresponding to the intelligent agent domain identifier, expanding to the full tool set if the match does not meet the target; and a calling module, used to call business tools based on the obtained tool matching results and use the business tools to obtain business data.

[0015] Compared with the prior art, the beneficial effects of the present invention are as follows: by combining a two-level hierarchical caching mechanism with the generation of domain-controlled semantic fingerprints and domain-controlled parameter fingerprints, the invention achieves layered reuse of semantic parsing results and tool matching results. This solves the cache failure problem caused in existing technologies by different expressions of similar semantics or by fluctuations in parameter format. Domain-controlled semantic fingerprints integrate business domain identifiers and semantic vectors, ensuring that queries with similar semantics within the same business domain generate the same fingerprint. This significantly reduces repeated calls to large language models, lowers system API call costs, and shortens the response time of semantic parsing from hundreds of milliseconds to several milliseconds. Domain-controlled parameter fingerprints eliminate cache failures caused by fluctuations in parameter order and format in the results returned by large language models, through parameter field sorting, numerical format normalization, and redundant information removal, further improving the hit rate of the second-level cache. The phased weighted tool matching process prioritizes precise matching within the tool subset of the corresponding business domain; if that match fails, matching is expanded to the entire tool set, balancing the efficiency and accuracy of tool matching. The cache-linked refresh mechanism invalidates only the corresponding entries in the second-level cache when business data is updated, while retaining the semantic parsing results in the first-level cache. While ensuring business data consistency, it maximizes the effectiveness of the cache, continuously reduces system operating costs, and improves the overall response speed of high-frequency similar queries.

Attached Figure Description

[0016] Figure 1 is a flowchart of the overall multi-level caching intelligent query response process of the present invention; Figure 2 is a flowchart of the domain-controlled semantic fingerprint generation process of the present invention; Figure 3 is a flowchart of the domain-controlled parameter fingerprint generation process of the present invention; Figure 4 is a flowchart of the phased tool matching process of the present invention; Figure 5 is a flowchart of the cache-linked refresh process of the present invention; Figure 6 is a flowchart of the fingerprint iterative update process of the present invention; Figure 7 is a diagram of the multi-level caching intelligent query response system architecture of the present invention.

Detailed Implementation

[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] Referring to Figures 1-6, this invention provides a multi-level caching intelligent query response method, comprising: extracting domain features of the query text and determining the business domain to which the query belongs, thereby generating a corresponding intelligent agent domain identifier; performing text cleaning and feature extraction on the query text in conjunction with the intelligent agent domain identifier to generate a fixed-length domain-controlled semantic fingerprint; retrieving structured parsing results in the first-level cache based on the domain-controlled semantic fingerprint and, if no match is found, calling a large language model to convert the query text into structured data and synchronizing it to the first-level cache; performing format regularization and feature extraction on the structured data in conjunction with the intelligent agent domain identifier to generate a domain-controlled parameter fingerprint; retrieving tool matching results in the second-level cache based on the domain-controlled parameter fingerprint and, if no match is found, increasing the weight of domain keywords and matching within the tool subset corresponding to the intelligent agent domain identifier, then expanding to the full tool set if the match fails; and calling business tools based on the obtained tool matching results and using the business tools to obtain business data.

[0019] Specifically, the process begins by determining the business domain of the query text. A pre-trained BERT-based domain classification model extracts domain keywords and contextual semantic features from the query text, and these features are fed into a linear classifier that determines the business domain to which the query belongs. Business domains can include production management, warehousing and logistics, quality inspection, and equipment maintenance. Each business domain corresponds to a unique agent domain identifier, which is an 8-digit hexadecimal string. Next, the query text is preprocessed to remove irrelevant characters and extract core semantic features, generating a domain-controlled semantic fingerprint bound to the business domain. The first-level cache uses a Redis key-value storage structure, where the key is the domain-controlled semantic fingerprint and the value is the corresponding structured parsing result. The default validity period for cache entries is 30 days. On a cache miss, a large language model is invoked to convert the natural language query into structured data, and the conversion result is written to the first-level cache. Subsequently, a domain-controlled parameter fingerprint is generated from the structured data, and the tool matching result is retrieved from the second-level cache. The second-level cache also uses a Redis key-value structure, where the key is the domain-controlled parameter fingerprint and the value is the tool matching result. On a cache miss, a weighted match is first performed on the tool subset corresponding to the business domain; if that match fails, the search is expanded to the entire tool set. Finally, the corresponding business tool is invoked based on the matching result to obtain the required business data. Through this two-level caching design, the process avoids redundant semantic parsing and redundant tool matching, reducing the overhead of large language model invocations and tool matching and improving query response speed.
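The two-level lookup described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: plain dictionaries stand in for the two Redis stores, and the class name `TwoLevelCache` and the four injected callables (fingerprint functions, LLM parser, tool matcher) are hypothetical placeholders.

```python
from typing import Callable, Dict, Tuple


class TwoLevelCache:
    """Illustrative two-level lookup; dicts stand in for the Redis stores."""

    def __init__(
        self,
        semantic_fp: Callable[[str, str], str],
        parameter_fp: Callable[[dict, str], str],
        parse_with_llm: Callable[[str], dict],
        match_tool: Callable[[dict, str], str],
    ) -> None:
        self.l1: Dict[str, dict] = {}  # semantic fingerprint -> structured parse
        self.l2: Dict[str, str] = {}   # parameter fingerprint -> matched tool
        self.semantic_fp = semantic_fp
        self.parameter_fp = parameter_fp
        self.parse_with_llm = parse_with_llm
        self.match_tool = match_tool
        self.llm_calls = 0    # counters make the saved work visible
        self.match_calls = 0

    def resolve(self, query: str, domain_id: str) -> Tuple[dict, str]:
        # Level 1: reuse the structured parse if the semantic fingerprint hits.
        key1 = self.semantic_fp(query, domain_id)
        structured = self.l1.get(key1)
        if structured is None:
            self.llm_calls += 1
            structured = self.parse_with_llm(query)
            self.l1[key1] = structured

        # Level 2: reuse the tool match if the parameter fingerprint hits.
        key2 = self.parameter_fp(structured, domain_id)
        tool = self.l2.get(key2)
        if tool is None:
            self.match_calls += 1
            tool = self.match_tool(structured, domain_id)
            self.l2[key2] = tool
        return structured, tool
```

A first-level hit skips the model call entirely and feeds the cached structured result straight into the parameter fingerprint step, which is the behaviour the method relies on for high-frequency similar queries.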

[0020] In a specific embodiment, the process of generating a fixed-length domain control semantic fingerprint includes: removing punctuation marks and meaningless auxiliary words from the query text using text cleaning rules; extracting the semantic vector of the processed query text in the business domain; and performing a fusion mapping of the semantic vector with the intelligent agent domain identifier to generate a fixed-length code that uniquely represents the semantic features within the business domain.

[0021] Specifically, the text cleaning rules are predefined and include removing punctuation marks such as commas, periods, question marks, exclamation marks, etc., as well as meaningless auxiliary words such as "de", "le", "ma", "ne", etc. The extraction of the semantic vector uses a BERT-base model fine-tuned for a specific business domain, and the output dimension of the model is 768 dimensions. Based on the vector hashing fusion method, a domain identifier weighting mechanism is introduced to perform a fusion mapping of the semantic vector with the intelligent agent domain identifier to generate a unique domain control semantic fingerprint. Traditional vector hashing only encodes the semantic vector and cannot distinguish queries with the same semantics but different business meanings in different business domains. In this application, by introducing a domain identifier weighting term, the same semantics in different business domains generate different fingerprints, while ensuring that queries with similar semantics within the same business domain generate the same fingerprint. The specific formula is as follows:

$$F = H\left(w_V \cdot V + w_D \cdot D\right)$$

[0022] Where $F$ is the generated domain-controlled semantic fingerprint, $H$ is the SHA-256 hash function, $V$ is the extracted 768-dimensional semantic vector, $D$ is the 8-bit one-hot encoded vector of the intelligent agent domain identifier, $w_V$ is the semantic vector weight coefficient, and $w_D$ is the domain identifier weight coefficient. The weight coefficients are determined by 5-fold cross-validation. The historical query data set of each business domain is divided into a training set and a validation set in a ratio of 8:2. The training set is further evenly divided into 5 subsets. Four subsets are selected in turn to train the weight parameters, and the remaining subset is used to verify the fingerprint consistency index. The core evaluation index is the proportion of queries with the same semantics but different expressions that generate the same fingerprint, and the weight combination with the highest index on the validation set is selected as the final weight for that business domain. In the production management domain, $w_V$ is 0.8 and $w_D$ is 0.2; in the warehousing and logistics domain, $w_V$ is 0.75 and $w_D$ is 0.25; in the quality inspection domain, $w_V$ is 0.85 and $w_D$ is 0.15; in the equipment operation and maintenance domain, $w_V$ is 0.7 and $w_D$ is 0.3.

[0023] Taking the query "Query production output in March 2026" as an example, the calculation process is as follows: first, the query text is cleaned to obtain "Query production output in March 2026"; then, the semantic vector V is extracted using a fine-tuned BERT model, with the first 3 dimensions simplified as [0.23, 0.56, 0.19]; the agent domain identifier for the production management domain is "00000001", and the first 3 dimensions of the corresponding one-hot encoded vector D are [1, 0, 0]; substituting the weight coefficients, the fusion vector is calculated as 0.8 × [0.23, 0.56, 0.19] + 0.2 × [1, 0, 0] = [0.384, 0.448, 0.152]; the complete 768-dimensional fusion vector is input into the SHA-256 hash function, which outputs a 256-bit binary code, and this is converted into a 64-character hexadecimal string that serves as the final domain-controlled semantic fingerprint. This fusion method can effectively improve the cache hit rate and reduce repeated semantic parsing operations.
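The worked example can be reproduced with a short sketch. Several details here are assumptions rather than anything the text specifies: the one-hot vector is zero-padded to the length of the semantic vector, the fused values are rounded to three decimals before hashing so that small floating-point differences do not change the digest, and the vector is serialized as big-endian doubles. The function names are hypothetical.

```python
import hashlib
import struct
from typing import List, Sequence


def fuse(v: Sequence[float], d: Sequence[float], w_v: float, w_d: float) -> List[float]:
    """Weighted sum w_v * V + w_d * D, with D zero-padded to len(V) (assumption)."""
    padded = list(d) + [0.0] * (len(v) - len(d))
    return [w_v * a + w_d * b for a, b in zip(v, padded)]


def semantic_fingerprint(
    v: Sequence[float],
    d: Sequence[float],
    w_v: float,
    w_d: float,
    decimals: int = 3,  # quantization step is an assumption, not from the text
) -> str:
    fused = fuse(v, d, w_v, w_d)
    # Round so tiny float noise does not alter the hash; "+ 0.0" turns -0.0 into 0.0.
    quantized = [round(x, decimals) + 0.0 for x in fused]
    payload = struct.pack(f">{len(quantized)}d", *quantized)
    # SHA-256 yields 256 bits, rendered as 64 hexadecimal characters.
    return hashlib.sha256(payload).hexdigest()


# First three dimensions from the example, production management weights 0.8 / 0.2.
example = fuse([0.23, 0.56, 0.19], [1, 0, 0], 0.8, 0.2)
print([round(x, 3) for x in example])  # [0.384, 0.448, 0.152]
```

Because a cryptographic hash changes completely when any input bit changes, two paraphrased queries share a fingerprint only if their quantized fused vectors are identical; how coarse the quantization needs to be for that to happen is left open here.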

[0024] In a specific embodiment, the process of generating a domain control parameter fingerprint includes: rearranging the parameter fields in the structured data according to a preset lexicographical order; performing numerical format normalization and redundant information removal on the rearranged parameter fields; and performing a hash operation on the normalized parameter fields and the intelligent agent domain identifier to generate a unique fingerprint that maps to the tool matching result.

[0025] Specifically, the default lexicographical order arranges the parameter fields by the first letter of their names in ASCII order, ensuring that structured data with different parameter orders but identical content generates the same fingerprint. Numerical format normalization includes standardizing the date format to YYYY-MM-DD, numerical precision to two decimal places, Boolean values to lowercase true or false, and the time format to HH:MM:SS. Redundant information removal includes removing comment fields, null fields, and duplicate fields that are irrelevant to tool calls from the structured data. The normalized parameter fields are converted into a compact JSON string, concatenated with the agent domain identifier string, and then hashed using the SHA-256 function to generate a 256-bit binary code, which is converted into a 64-character hexadecimal string that serves as the domain-controlled parameter fingerprint. This process eliminates cache invalidation caused by fluctuations in parameter order and format in the results returned by large language models, ensuring that semantically equivalent parameter combinations generate the same fingerprint and further improving the hit rate of the second-level cache.
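A minimal sketch of this normalization and hashing step is shown below, under stated assumptions: the accepted input date formats, the `comment` field name, and the function names are hypothetical, and time-of-day normalization is omitted for brevity. Keys are sorted on the whole field name via `json.dumps(sort_keys=True)`, which for ASCII names is a superset of ordering by first letter.

```python
import hashlib
import json
from datetime import datetime

# Hypothetical input date formats that are rewritten to YYYY-MM-DD.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def normalize_value(value):
    """Normalize one parameter value: booleans, numbers (2 decimals), dates."""
    if isinstance(value, bool):  # must be checked before int, since bool is an int
        return value
    if isinstance(value, (int, float)):
        return round(float(value), 2)
    if isinstance(value, str):
        text = value.strip()
        if text.lower() in ("true", "false"):
            return text.lower() == "true"
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        return text
    return value


def parameter_fingerprint(params: dict, domain_id: str, drop=("comment",)) -> str:
    # Remove redundant fields (listed names, null or empty values), then normalize.
    cleaned = {
        key: normalize_value(val)
        for key, val in params.items()
        if key not in drop and val not in (None, "")
    }
    # Compact JSON with sorted keys, then append the domain identifier string.
    compact = json.dumps(cleaned, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((compact + domain_id).encode("utf-8")).hexdigest()
```

With this sketch, `{"time_range": "2026/03/01", "limit": 10}` and `{"limit": 10.0, "time_range": "2026-03-01"}` produce the same fingerprint for the same domain identifier, while the same parameters under a different domain identifier produce a different one.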

[0026] In a specific embodiment, the tool matching process includes: within a subset of tools bound to the intelligent agent domain identifier, weighting candidate tools based on domain attributes, and calculating a matching score between the query requirement and the tool function description; when the matching score is lower than a preset threshold, expanding the search scope to the entire tool set, reducing the weight of domain attributes and increasing the weight of semantic similarity for secondary matching.

[0027] Specifically, within the cosine similarity calculation framework, a domain attribute weighting factor is introduced to construct a phased tool matching score calculation method. Traditional pure semantic similarity matching, in multi-business domain scenarios, easily matches tools that are semantically similar but business-irrelevant. This application adjusts the weights of domain attributes and semantic similarity in stages, prioritizing accurate matching within a subset of tools relevant to the business domain, while ensuring that a suitable tool can be found among the entire set of tools when no matching result is found within the subset. The specific formula is as follows:

$$S = \alpha \cdot S_d + \beta \cdot S_s$$

[0028] Where $S$ is the final matching score, $S_d$ is the domain attribute matching score, $S_s$ is the semantic similarity score, $\alpha$ is the domain attribute weight coefficient, and $\beta$ is the semantic similarity weight coefficient. The weight coefficients are determined through tool matching accuracy tests across different business domains. In the tool subset matching phase, α is set to 0.7 and β to 0.3; in the full tool set matching phase, α is set to 0.3 and β to 0.7. The domain attribute matching score is the degree of overlap between the domain keywords in the query text and those in the tool function description. The semantic similarity score is the cosine similarity between the query demand vector and the tool function description vector. The preset thresholds are determined by statistically analyzing at least 100,000 historical tool matching data entries accumulated over the past three months in each business domain. The statistical method involves calculating the lowest score among all correct matches, rounding it up to one decimal place, and using this as the threshold for that business domain. The preset threshold is 0.6 for the production management domain, 0.55 for the warehousing and logistics domain, 0.65 for the quality inspection domain, and 0.58 for the equipment operation and maintenance domain.

[0029] Taking the query "Query production output in March 2026" in the production management domain as an example, the calculation process for matching the "production output statistics tool" is as follows: first, the domain keywords "production" and "output" are extracted from the query text. The tool's function description contains the domain keywords "production," "output," and "statistics," so the overlap is 2 / 3 and the domain attribute matching score is approximately 0.67. The cosine similarity between the query demand vector and the tool function description vector gives a semantic similarity score of 0.82. Substituting the weight coefficients from the tool subset matching stage, the total score is 0.7 × 0.67 + 0.3 × 0.82 = 0.715. This score is higher than the preset threshold of 0.6, so the tool is matched directly without expanding to the full tool set. This weighted matching mechanism balances matching efficiency and accuracy, reducing unnecessary searches over the full tool set.
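The two-stage scoring can be sketched as below. The tool representation (a keyword set plus a description vector), the function names, and the behaviour when the full-set stage also scores below the threshold (the best candidate is returned anyway) are assumptions made for illustration; the weights 0.7/0.3 and 0.3/0.7 are the ones given for the subset and full-set stages.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple

Tool = Tuple[Set[str], Sequence[float]]  # (domain keywords, description vector)


def keyword_overlap(query_keywords: Set[str], tool_keywords: Set[str]) -> float:
    """Share of the tool's domain keywords that also occur in the query."""
    if not tool_keywords:
        return 0.0
    return len(query_keywords & tool_keywords) / len(tool_keywords)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def stage_score(domain_score: float, semantic_score: float, alpha: float, beta: float) -> float:
    """Weighted sum of the domain attribute score and the semantic similarity score."""
    return alpha * domain_score + beta * semantic_score


def best_match(query_keywords, query_vector, tools: Dict[str, Tool], alpha, beta):
    scored = [
        (stage_score(keyword_overlap(query_keywords, kw), cosine(query_vector, vec), alpha, beta), name)
        for name, (kw, vec) in tools.items()
    ]
    return max(scored) if scored else (0.0, None)


def match_tool(query_keywords, query_vector, domain_tools, all_tools, threshold):
    """Stage 1: domain subset (0.7 / 0.3). Stage 2: full tool set (0.3 / 0.7)."""
    score, name = best_match(query_keywords, query_vector, domain_tools, 0.7, 0.3)
    if score >= threshold:
        return name, score, "subset"
    score, name = best_match(query_keywords, query_vector, all_tools, 0.3, 0.7)
    return name, score, "full"


# Worked example from the text: 0.7 * 0.67 + 0.3 * 0.82
print(round(stage_score(0.67, 0.82, 0.7, 0.3), 3))  # 0.715
```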

[0030] In one specific embodiment, the method further includes a cache linkage refresh step: when an update to business data is detected, the intelligent agent domain identifier that is associated with the updated business data is identified; the corresponding parameter cache entry in the second-level cache is located and invalidated according to the intelligent agent domain identifier; and the corresponding query cache entry in the first-level cache is kept in a valid state.

[0031] Specifically, monitoring of business data updates is achieved through MySQL database triggers and RabbitMQ message queues. When production, warehousing, or quality inspection data in the business database is inserted, updated, or deleted, the trigger generates an update event and sends it to the message queue. The system consumes update events from the message queue and identifies the corresponding agent domain identifier based on the table name and field information of the business data. The mapping relationship between metadata information and agent domain identifiers is pre-stored in the system configuration table. Based on the identified agent domain identifier, all parameter cache entries containing that identifier in the second-level cache are traversed and marked as invalid. Query cache entries in the first-level cache remain valid because the semantic parsing results of queries do not change with updates to business data. This cache-linked refresh mechanism ensures that after business data updates, tool calls can obtain the latest business data while avoiding repeated parsing of queries with the same semantics, maximizing cache effectiveness while ensuring data consistency.
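The invalidation logic on the consumer side can be sketched as follows. The database trigger and the message queue consumer are left out; the event shape, the table names, the second domain identifier, and the entry layout (`domain_id`, `tool`, `valid`) are hypothetical, and in-memory dictionaries stand in for the Redis stores.

```python
from typing import Dict

# Hypothetical configuration table: business table name -> agent domain identifier.
TABLE_TO_DOMAIN = {
    "production_orders": "00000001",
    "warehouse_stock": "00000002",
}


def handle_update_event(event: dict, l2_cache: Dict[str, dict]) -> int:
    """Mark second-level entries of the affected domain invalid; return how many."""
    domain_id = TABLE_TO_DOMAIN.get(event.get("table"))
    if domain_id is None:
        return 0  # update does not map to any known business domain
    invalidated = 0
    for entry in l2_cache.values():
        if entry["domain_id"] == domain_id and entry["valid"]:
            entry["valid"] = False
            invalidated += 1
    return invalidated


# The first-level cache is deliberately never touched by update events.
l1_cache = {"sem-fp-1": {"operation": "query"}}
l2_cache = {
    "param-fp-1": {"domain_id": "00000001", "tool": "production_output_statistics", "valid": True},
    "param-fp-2": {"domain_id": "00000002", "tool": "inventory_lookup", "valid": True},
}
handle_update_event({"table": "production_orders", "action": "update"}, l2_cache)
```

Scanning every entry mirrors the traversal described above; a per-domain index of fingerprints would avoid the full scan, at the cost of maintaining that index on every write.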

[0032] In one specific embodiment, the method further includes a fingerprint iterative update step: collecting query texts that did not hit the cache within a preset period and the corresponding tool matching results; incrementally adjusting the semantic feature extraction model and parameter normalization rules based on the collected data; and updating the fingerprint generation logic to improve the fingerprint consistency of the same semantics under different representations.

[0033] Specifically, the preset period can be set according to how the system is operating, for example one iterative update per week. The collected cache miss data includes the query text, the generated domain-controlled semantic fingerprints, the domain-controlled parameter fingerprints, and the final tool matching results. The semantic feature extraction model is updated by incremental fine-tuning: the collected query texts serve as training data, the training objective is for queries with the same semantics to generate the same fingerprint, and the parameters of the last two fully connected layers of the model are fine-tuned. For the parameter normalization rules, the structured data of the cache misses is analyzed, newly emerging parameter field formats and redundant information types are identified, and corresponding format normalization rules and redundant information removal rules are added.

[0034] The system automatically compares the structured data that misses the cache with the existing normalization rule base. When a parameter field of a certain format appears more than 50 times in a week and is not covered by existing rules, the system automatically generates the corresponding format normalization rule. When a redundant field appears more than 30 times in a week and is not removed by existing rules, the system automatically adds it to the redundant information removal list.
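The weekly frequency check can be sketched as below, assuming the week's cache misses have already been reduced to two lists of labels (observed field formats and observed redundant field names). How a format is labelled and how a rule is generated from it are not specified in the text, so the function only returns the candidates; its name and inputs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def weekly_rule_candidates(
    format_observations: Iterable[str],
    redundant_observations: Iterable[str],
    known_formats: Set[str],
    removal_list: Set[str],
    format_min: int = 50,     # "more than 50 times in a week"
    redundant_min: int = 30,  # "more than 30 times in a week"
) -> Tuple[List[str], List[str]]:
    """Return formats needing a new rule and fields to add to the removal list."""
    format_counts = Counter(format_observations)
    redundant_counts = Counter(redundant_observations)
    new_formats = sorted(
        fmt for fmt, n in format_counts.items()
        if n > format_min and fmt not in known_formats
    )
    new_redundant = sorted(
        field for field, n in redundant_counts.items()
        if n > redundant_min and field not in removal_list
    )
    return new_formats, new_redundant
```

Both comparisons are strict, so a format seen exactly 50 times, or a redundant field seen exactly 30 times, does not yet trigger a new rule.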

[0035] After the update is complete, the new model parameters and rules are deployed to the system, and subsequent fingerprint generation will use the updated logic. This iterative update mechanism can continuously optimize the accuracy of fingerprint generation. As the system runtime increases, the cache hit rate of queries with the same semantics but different expressions will gradually increase, further reducing the system's operating costs.

[0036] In one specific embodiment, the step of calling a large language model to convert query text into structured data includes: sending the query text to a preset large language model interface; obtaining the returned raw message containing operation instructions and parameter information; and parsing the raw message to extract structured data conforming to a preset protocol format.

[0037] The preset large language model interface is a RESTful API. Each request message includes the query text, the intelligent agent domain identifier, and a preset system prompt that guides the large language model to return its result in a specified JSON format. The returned raw message is therefore in JSON format and contains an `operation` field and a `parameters` field. The `operation` field specifies the type of operation to be performed, such as `query`, `statistics`, or `summary`. The `parameters` field contains all the parameters required for the operation, such as `time_range`, `product_type`, and `department_name`. The parsing process first verifies that the raw message is validly formatted, then extracts the `operation` and `parameters` information and converts it into structured data conforming to the system's internal preset protocol format. This preset protocol format includes three core fields: operation type, parameter name, and parameter value, which ensures consistent processing in the subsequent parameter fingerprint generation and tool invocation.
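As an illustration of the parsing step, the following Python sketch validates a raw message and flattens it into records carrying the three core fields. The record key names (`operation_type`, `parameter_name`, `parameter_value`), the function name, and the assumption that the three operation types named above are the only valid ones are hypothetical. The text does not define the exact internal protocol layout.

```python
import json

# Operation types named as examples in [0037]; treated as exhaustive here (assumption).
ALLOWED_OPERATIONS = {"query", "statistics", "summary"}


def parse_raw_message(raw: str) -> list:
    """Validate the raw JSON message and flatten it into protocol records."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"raw message is not valid JSON: {exc}") from exc

    if not isinstance(message, dict):
        raise ValueError("raw message must be a JSON object")
    operation = message.get("operation")
    parameters = message.get("parameters")
    if not isinstance(operation, str) or operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    if not isinstance(parameters, dict):
        raise ValueError("'parameters' must be a JSON object")

    # One record per parameter, each with the three core fields.
    return [
        {"operation_type": operation, "parameter_name": name, "parameter_value": value}
        for name, value in parameters.items()
    ]


raw = '{"operation": "query", "parameters": {"time_range": "2026-05", "product_type": "valve"}}'
for record in parse_raw_message(raw):
    print(record)
```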

[0038] In one specific embodiment, if a hit occurs in the first-level cache, the step of calling the large language model is skipped, and the retrieved structured parsing result is directly used as the input data for generating the domain controller parameter fingerprint.

[0039] Specifically, a first-level cache hit is determined by an exact match on the domain control semantic fingerprint: a hit occurs when a key in the cache is identical to the domain control semantic fingerprint generated for the current query. Cache entries have an expiration time, which can be set according to business needs, for example 30 days. When a cache entry expires, the system automatically removes it from the cache. After a cache hit, the system directly retrieves the corresponding structured parsing result, so no large language model call is needed for semantic parsing. This step significantly reduces the number of calls to the large language model, lowers the system's API call cost, and shortens the semantic parsing response time from hundreds of milliseconds to several milliseconds, improving the response speed for high-frequency similar queries.
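The exact-match lookup with expiry can be sketched as below. An in-memory dictionary stands in for the Redis store used in the embodiment, and the class name `FirstLevelCache`, the helper `parse_with_cache`, and the injectable `clock` are assumptions made for illustration and testing.

```python
import time

THIRTY_DAYS = 30 * 24 * 3600  # example expiration from [0039], in seconds


class FirstLevelCache:
    """Exact-match cache keyed by semantic fingerprint, with per-entry expiry."""

    def __init__(self, ttl_seconds=THIRTY_DAYS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # fingerprint -> (expires_at, structured_result)

    def get(self, fingerprint):
        entry = self._entries.get(fingerprint)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[fingerprint]  # expired entries are removed
            return None
        return result

    def put(self, fingerprint, structured_result):
        self._entries[fingerprint] = (self._clock() + self._ttl, structured_result)


def parse_with_cache(cache, fingerprint, query_text, call_llm):
    """Return (structured_result, hit); call_llm runs only on a miss."""
    cached = cache.get(fingerprint)
    if cached is not None:
        return cached, True
    result = call_llm(query_text)
    cache.put(fingerprint, result)
    return result, False
```

With a real Redis deployment the expiry would normally be delegated to the server's own key expiration rather than checked on read. The lazy check here only keeps the sketch self-contained.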

[0040] To quantify the effects of the above technologies, three sets of control experiments were conducted in a standard production environment. The experimental hardware configuration consisted of two Intel Xeon Gold 6248R processors, 256GB of DDR4 memory, and a 1TB NVMe SSD. The software environment consisted of CentOS 7.9, Redis 6.2.7, and Python 3.9.

[0041] The experiment compared three schemes: Scheme A uses no caching mechanism, so every query calls the large language model and performs full tool matching; Scheme B is a traditional string caching mechanism that performs exact matching on the original query text; Scheme C is the two-level domain control fingerprint caching mechanism proposed in this invention.

[0042] The experiment used 100,000 high-frequency query samples from real enterprises, covering four major business domains: production management, warehousing and logistics, quality inspection, and equipment operation and maintenance. Among them, 68% of the queries were semantically equivalent but had different expressions, and 42% of the queries had fluctuating parameter formats.

[0043] The experimental results are as follows. In Scheme A, the average response time of the large language model was 320 milliseconds, with a P95 of 450 milliseconds and a P99 of 680 milliseconds. In Scheme B, the average response time of the first-level cache was 2 milliseconds, but the overall cache hit rate was only 21%, the number of large language model calls was reduced by 19%, and the overall system API call cost was reduced by 15%. In Scheme C, the average response time of the first-level cache was 3 milliseconds, with a P95 of 5 milliseconds and a P99 of 8 milliseconds, which on average is about 106 times faster than Scheme A.

[0044] The verification data for each business domain is as follows: The first-level cache hit rate was 85% in the production management domain, 81% in the warehousing and logistics domain, 78% in the quality inspection domain, and 80% in the equipment operation and maintenance domain. The overall first-level cache hit rate reached 82%, and the number of large language model calls was reduced by 78%.

[0045] The overall hit rate of the second-level cache reached 76%, and the average time for tool matching was reduced from 120 milliseconds to 2 milliseconds, improving tool matching efficiency by 60 times. The overall API call cost of the system was reduced by 65%, and the average response time across the entire chain was reduced from 480 milliseconds to 28 milliseconds, an improvement of approximately 17 times.
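The speed-up factors quoted in paragraphs [0043] and [0045] follow directly from the reported latencies. The short Python check below recomputes them; the function name `speedup` is introduced here only for illustration.

```python
def speedup(before_ms, after_ms):
    """Ratio of the latency before optimization to the latency after it."""
    return before_ms / after_ms


print(round(speedup(320, 3), 1))   # first-level cache hit vs. LLM call: 106.7
print(round(speedup(120, 2), 1))   # tool matching: 60.0
print(round(speedup(480, 28), 1))  # full chain: 17.1
```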

[0046] After running continuously for 3 months, the overall cache hit rate was further improved to 89% through the fingerprint iteration update mechanism, and the API call cost was reduced by 72% cumulatively.

[0047] In one specific embodiment, the step of calling the business tool based on the obtained tool matching result includes: locating the corresponding plugin service based on the plugin identifier in the tool matching result; constructing a list of calling parameters that meets the requirements of the plugin service; executing the plugin call and obtaining the returned production data, warehousing data, and quality inspection data.

[0048] Specifically, the tool matching result contains a unique plugin identifier in UUID format. Based on the plugin identifier, the system retrieves the corresponding plugin service address and API documentation from the Nacos plugin registry. The plugin service is deployed in Docker containers and orchestrated and managed via Kubernetes. The parameter list is constructed according to the plugin's OpenAPI interface documentation by mapping the parameters in the structured data to the parameter format required by the plugin interface. Parameter mapping rules are pre-stored in the plugin configuration table, which supports parameter format adaptation for different plugins. Plugin calls are executed as HTTP POST requests that send the constructed parameter list as the request body to the plugin service address. The plugin service executes the corresponding business logic, queries the business database, and returns the results. The returned results are in JSON format and contain the specific production data, warehousing data, or quality inspection data. After obtaining the returned results, the system converts them into natural language and returns them to the user.
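The parameter mapping and HTTP POST construction can be sketched with the Python standard library as follows. The lookup in the Nacos registry is not shown, so `service_url` is assumed to be already resolved. The shape of `mapping_rules` (internal name to plugin name) and the choice to drop unmapped parameters are assumptions about the plugin configuration table, not details given in the text.

```python
import json
import urllib.request


def build_plugin_request(service_url, structured_params, mapping_rules):
    """Map internal parameter names to the plugin's names and build a POST request.

    mapping_rules: dict of internal name -> plugin name (hypothetical shape of
    the plugin configuration table).
    """
    payload = {
        mapping_rules[name]: value
        for name, value in structured_params.items()
        if name in mapping_rules  # parameters without a mapping rule are dropped
    }
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_plugin(request, timeout=5.0):
    """Send the request and decode the JSON result returned by the plugin."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; the URL is a placeholder.
req = build_plugin_request(
    "http://plugin.example/api/stock",
    {"time_range": "2026-05", "product_type": "valve"},
    {"time_range": "timeRange", "product_type": "productType"},
)
print(req.get_method(), req.data.decode("utf-8"))
```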

[0049] Please see Figure 7. This invention also provides a multi-level caching intelligent query response system, comprising: an identification module for extracting domain features of query text and determining the business domain to which the query belongs, thereby generating a corresponding intelligent agent domain identifier; a semantic fingerprint module for performing text cleaning and feature extraction processing on the query text in conjunction with the intelligent agent domain identifier, generating a fixed-length domain control semantic fingerprint; a first-level caching module for retrieving structured parsing results based on the domain control semantic fingerprint, and, in the case of a miss, triggering a large language model to convert the query text into structured data and synchronize it to the first-level caching module; a parameter fingerprint module for performing format normalization and feature extraction on the structured data in conjunction with the intelligent agent domain identifier, generating a domain control parameter fingerprint; a second-level caching module for retrieving tool matching results based on the domain control parameter fingerprint, and, in the case of a miss, increasing the weight of domain keywords within the tool subset corresponding to the intelligent agent domain identifier to perform matching, and expanding to the full tool set to perform matching if the matching fails; and a calling module for calling business tools based on the obtained tool matching results and using the business tools to obtain business data.

[0050] Specifically, the identification module and the semantic fingerprint module communicate via an HTTP/1.1 interface, with the physical connection using Gigabit Ethernet and the TCP/IP protocol. The semantic fingerprint module communicates with the first-level cache module via the Redis 6.0 protocol, and the first-level cache module is deployed on a separate Redis server. The first-level cache module communicates with the parameter fingerprint module via a gRPC 1.5 interface, and the parameter fingerprint module also communicates with the second-level cache module via the Redis 6.0 protocol. The second-level cache module and the first-level cache module are deployed in different databases within the same Redis cluster. The second-level cache module communicates with the calling module via an HTTP/1.1 interface, and the calling module communicates with the business tools via an HTTP/1.1 interface.
The data flow is as follows. The query text is first input into the identification module; after generating the intelligent agent domain identifier, the identification module transmits the query text and the intelligent agent domain identifier together to the semantic fingerprint module. After generating the domain control semantic fingerprint, the semantic fingerprint module sends it to the first-level cache module, which performs the cache lookup. On a hit, it returns the structured parsing result to the parameter fingerprint module. On a miss, it triggers a large language model call, obtains the structured data, synchronizes it to the first-level cache module, and then transmits the structured data to the parameter fingerprint module. The parameter fingerprint module generates a domain control parameter fingerprint and sends it to the second-level cache module, which performs the cache lookup. On a hit, it returns the tool matching result to the calling module. On a miss, it performs tool matching, obtains the matching result, synchronizes it to the second-level cache module, and then transmits the matching result to the calling module. The calling module calls the business tool based on the matching result, obtains the business data, and returns it to the user. During system operation, the hierarchical design of the two-level cache and the domain control fingerprint generation mechanism enable efficient reuse of semantic parsing results and tool matching results. This solves the cache invalidation problem in existing technologies caused by expressions that differ in wording but share the same meaning, or by parameter format fluctuations, reducing system operating costs and improving query response speed.
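The data flow described above can be condensed into a single function, sketched below in Python. Plain dictionaries stand in for the two Redis-backed cache modules, and every collaborator (`identify_domain`, `semantic_fp`, `parameter_fp`, `call_llm`, `match_tool`, `call_tool`) is a hypothetical callable passed in by the caller. Inter-module transport (HTTP, gRPC, Redis protocol) is deliberately omitted.

```python
def answer_query(query, *, identify_domain, semantic_fp, parameter_fp,
                 call_llm, match_tool, call_tool, l1_cache, l2_cache):
    """Run one query through both cache levels; returns (data, l1_hit, l2_hit)."""
    domain_id = identify_domain(query)

    # Level 1: semantic fingerprint -> structured parsing result.
    s_fp = semantic_fp(query, domain_id)
    structured = l1_cache.get(s_fp)
    l1_hit = structured is not None
    if not l1_hit:
        structured = call_llm(query, domain_id)
        l1_cache[s_fp] = structured  # synchronize to the first-level cache

    # Level 2: parameter fingerprint -> tool matching result.
    p_fp = parameter_fp(structured, domain_id)
    tool = l2_cache.get(p_fp)
    l2_hit = tool is not None
    if not l2_hit:
        tool = match_tool(structured, domain_id)
        l2_cache[p_fp] = tool  # synchronize to the second-level cache

    return call_tool(tool, structured), l1_hit, l2_hit
```

Keeping the two lookups independent means a first-level miss can still end in a second-level hit, which is the case where the large language model returns the same parameters for a query whose wording produced a new semantic fingerprint.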

[0051] In summary, this invention achieves layered reuse of semantic parsing results and tool matching results by combining a two-level hierarchical caching mechanism with the generation of domain control semantic fingerprints and domain control parameter fingerprints. This solves the cache failure problem caused by different semantic expressions or parameter format fluctuations in existing technologies. Domain control semantic fingerprints integrate business domain identifiers and semantic vectors, enabling queries with similar semantics within the same business domain to generate the same fingerprint. This significantly reduces repeated calls to large language models, lowers system API call costs, and shortens the semantic parsing response time from hundreds of milliseconds to several milliseconds. Domain control parameter fingerprints, through parameter field sorting, numerical format normalization, and redundant information removal, eliminate cache failures caused by fluctuations in parameter order and format in the results returned by large language models, further improving the hit rate of the second-level cache. The phased weighted tool matching process prioritizes precise matching within the tool subset of the corresponding business domain. If the matching fails, it is then expanded to the full tool set, balancing the efficiency and accuracy of tool matching. The cache-linked refresh mechanism only invalidates the corresponding entries in the second-level cache when business data is updated, while retaining the semantic parsing results of the first-level cache. Under the premise of ensuring business data consistency, it maximizes the effectiveness of the cache, continuously reduces system operating costs, and improves the overall response speed of high-frequency similar queries.
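The parameter fingerprint steps summarized above (field sorting, numeric format normalization, redundant information removal, and hashing together with the domain identifier) might look like the following minimal Python sketch. The choice of SHA-256, the inclusion of the operation type in the hashed payload, the contents of `REDUNDANT_FIELDS`, and the decimal-based normalization are all assumptions made for illustration. The text does not name a hash function or list concrete rules.

```python
import hashlib
import json
from decimal import Decimal, InvalidOperation

REDUNDANT_FIELDS = {"request_id", "timestamp"}  # hypothetical removal list


def normalize_value(value):
    """Collapse equivalent numeric spellings, e.g. "10", "10.0" and 10 -> "10"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float, str)):
        text = str(value).strip()
        try:
            return format(Decimal(text).normalize(), "f")
        except InvalidOperation:
            return text  # not numeric: keep the trimmed text
    return value


def parameter_fingerprint(structured, domain_id):
    """Hash the sorted, normalized parameters together with the domain identifier."""
    cleaned = {
        name: normalize_value(value)
        for name, value in structured["parameters"].items()
        if name not in REDUNDANT_FIELDS
    }
    # sort_keys=True gives the lexicographic field order regardless of LLM output order.
    canonical = json.dumps(
        {"domain": domain_id, "operation": structured["operation"], "parameters": cleaned},
        sort_keys=True, ensure_ascii=False, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"operation": "query", "parameters": {"qty": "10.0", "product_type": "valve", "request_id": "abc"}}
b = {"operation": "query", "parameters": {"product_type": "valve", "qty": 10}}
print(parameter_fingerprint(a, "warehouse") == parameter_fingerprint(b, "warehouse"))
```

The two example inputs differ in field order, numeric spelling, and the presence of a redundant field, yet map to the same fingerprint, while the same parameters under a different domain identifier map to a different one.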

[0052] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0053] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A multi-level caching intelligent query response method, characterized in that it comprises: extracting the domain features of the query text and determining the business domain to which the query belongs, thereby generating a corresponding intelligent agent domain identifier; performing text cleaning and feature extraction on the query text in combination with the intelligent agent domain identifier to generate a fixed-length domain control semantic fingerprint; retrieving a structured parsing result in the first-level cache based on the domain control semantic fingerprint and, if no match is found, invoking a large language model to convert the query text into structured data and synchronizing it to the first-level cache; performing format normalization and feature extraction on the structured data in combination with the intelligent agent domain identifier to generate a domain control parameter fingerprint; retrieving a tool matching result in the second-level cache based on the domain control parameter fingerprint and, if no match is found, increasing the domain keyword weight and performing matching within the tool subset corresponding to the intelligent agent domain identifier, and expanding the matching to the entire tool set if the matching fails to meet the target; and calling the business tools based on the obtained tool matching result and using the business tools to obtain business data.

2. The multi-level caching intelligent query response method according to claim 1, characterized in that the process of generating the fixed-length domain control semantic fingerprint comprises: using text cleaning rules to remove punctuation marks and meaningless particles from the query text; extracting the semantic vector of the processed query text under the business domain; and fusing and mapping the semantic vector with the intelligent agent domain identifier to generate a fixed-length code that uniquely represents the semantic features within the business domain.

3. The multi-level caching intelligent query response method according to claim 1, characterized in that the process of generating the domain control parameter fingerprint comprises: rearranging the parameter fields in the structured data according to a preset lexicographical order; standardizing the numerical format of the sorted parameter fields and removing redundant information; and hashing the normalized parameter fields together with the intelligent agent domain identifier to generate a unique fingerprint that maps to the tool matching result.

4. The multi-level caching intelligent query response method according to claim 1, characterized in that the tool matching process comprises: within the subset of tools bound to the intelligent agent domain identifier, weighting candidate tools based on domain attributes and calculating a matching score between the query requirement and the tool function description; and, when the matching score is lower than a preset threshold, expanding the search scope to the entire tool set, and reducing the domain attribute weight while increasing the semantic similarity weight for secondary matching.

5. The multi-level caching intelligent query response method according to claim 1, characterized in that the method further comprises a cache-linked refresh step: when an update to business data is detected, identifying the intelligent agent domain identifier associated with the updated business data; locating and invalidating the corresponding parameter cache entries in the second-level cache based on the intelligent agent domain identifier; and keeping the corresponding query cache entries in the first-level cache valid.

6. The multi-level caching intelligent query response method according to claim 1, characterized in that the method further comprises a fingerprint iterative update step: collecting the query texts that did not hit the cache within a preset period and the corresponding tool matching results; incrementally adjusting the semantic feature extraction model and the parameter normalization rules based on the collected data; and updating the fingerprint generation logic to improve fingerprint consistency across different expressions of the same semantics.

7. The multi-level caching intelligent query response method according to claim 1, characterized in that the process of calling a large language model to convert the query text into structured data comprises: sending the query text to a preset large language model interface; obtaining the returned raw message containing operation instructions and parameter information; and parsing the raw message to extract structured data that conforms to a preset protocol format.

8. The multi-level caching intelligent query response method according to claim 1, characterized in that, if a hit occurs in the first-level cache, the step of calling the large language model is skipped, and the retrieved structured parsing result is directly used as the input data for generating the domain control parameter fingerprint.

9. The multi-level caching intelligent query response method according to claim 1, characterized in that the step of calling the business tools based on the obtained tool matching result comprises: locating the corresponding plugin service based on the plugin identifier in the tool matching result; constructing a list of call parameters that meets the requirements of the plugin service; and executing the plugin call and obtaining the returned production data, warehousing data, and quality inspection data.

10. A multi-level caching intelligent query response system, used to execute the multi-level caching intelligent query response method according to any one of claims 1 to 9, characterized in that it comprises: an identification module, used to extract the domain features of the query text and determine the business domain to which the query belongs, thereby generating a corresponding intelligent agent domain identifier; a semantic fingerprint module, used to perform text cleaning and feature extraction on the query text in conjunction with the intelligent agent domain identifier to generate a fixed-length domain control semantic fingerprint; a first-level cache module, used to retrieve a structured parsing result based on the domain control semantic fingerprint and, if no match is found, to trigger a large language model to convert the query text into structured data and synchronize it to the first-level cache module; a parameter fingerprint module, used to perform format normalization and feature extraction on the structured data in conjunction with the intelligent agent domain identifier to generate a domain control parameter fingerprint; a second-level cache module, used to retrieve a tool matching result based on the domain control parameter fingerprint and, if no match is found, to increase the weight of the domain keywords within the tool subset corresponding to the intelligent agent domain identifier and perform matching, and to expand to the full tool set for matching if the matching fails to meet the target; and a calling module, used to call business tools based on the obtained tool matching result and to use the business tools to obtain business data.