Intelligent query and indicator recommendation system and method based on a large language model
By combining an enterprise knowledge base with a large language model, the invention accurately identifies users' natural language queries and executes them in a controlled manner. This addresses the low query efficiency and insufficient security of existing systems and improves the flexibility and accuracy of data analysis systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- QINGDAO XINSHENGHUI TECH CO LTD
- Filing Date
- 2026-04-07
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data analysis systems are unable to meet the needs of non-technical personnel to quickly query enterprise data, lack flexible support for temporary and diversified queries, and existing natural language query solutions are prone to inconsistencies or insufficient security in complex enterprise environments.
Build an enterprise knowledge base, use the query understanding module to identify the intent and extract entities from the natural language questions entered by users, use the prompt word engineering engine to generate optimized prompt words, combine with a large language model to conduct controlled queries, and use the intelligent recommendation module to proactively recommend relevant indicators or analysis paths.
It achieves accurate recognition of users' natural language query intent, reduces the risk of query errors, improves the system's controllability and stability in complex enterprise data environments, and enhances query efficiency and knowledge reuse capabilities.
Figure CN122309541A_ABST
Description
Technical Field
[0001] This invention belongs to the field of information technology and specifically relates to an intelligent query and indicator recommendation system and method based on a large language model.
Background Technology
[0002] In enterprise data analysis and business decision-making, business personnel and managers frequently need to query business indicator data. However, most existing data analysis systems rely on business intelligence tools or predefined reports, which require users to master complex operations or possess some data analysis skills, making it difficult for non-technical personnel to obtain the required data directly. Even when a query can be completed, it typically involves repeatedly filtering through multi-level menus and numerous reports, so query efficiency is low and cannot meet the need for rapid business responses.
[0003] In addition, traditional reports and dashboards are mostly configured for fixed business scenarios and lack flexible support for ad hoc and diverse query needs. Internal enterprise indicator systems, business terminology, and common analytical conclusions are often stored in a scattered manner rather than in a unified knowledge carrier, so knowledge is difficult to reuse and analysis results depend on personal experience. Existing systems also primarily respond to queries passively and lack the ability to proactively recommend indicators based on user roles and business scenarios.
[0004] Existing natural language query solutions either rely on fixed templates, lacking flexibility, or directly call general-purpose language models to generate query statements, which can easily lead to inconsistencies or security vulnerabilities in complex enterprise data environments. Therefore, there is an urgent need for a data analysis solution that can understand natural language, combine enterprise-specific knowledge, and achieve controlled queries and intelligent recommendations.
[0005] A review of relevant publicly available technologies reveals several key solutions. One solution, CN103455533A, proposes a query system that improves information retrieval efficiency by optimizing the historical records of search terms. Another solution, CN117743415A, describes an enterprise data query system that uses an optimized retrieval strategy to ensure that query results accurately match the needs of the queryer. A third solution, EP1393213A1, proposes an information system suitable for company decision-makers to understand the company's situation. By setting personalized needs and permissions for each manager, the system can quickly organize report information applicable to specific users.
[0006] The above technical solutions all propose systems for filtering and retrieving enterprise information. However, given the current rapid development of large language models and enterprise big data, there is room for an information retrieval system that is more efficient and produces more accurate output.
[0007] The foregoing description of the background art is intended only to facilitate understanding of the invention and is not an admission that any of the material mentioned forms part of the common general knowledge.
Summary of the Invention
[0008] The purpose of this invention is to provide an intelligent query and indicator recommendation system and method based on a large language model. The system constructs an enterprise knowledge base that manages business indicators, business terms, and high-frequency questions in a structured manner. A query understanding module performs intent recognition and entity extraction on the natural language questions entered by the user, converting natural language into a structured query intent. A prompt word engineering engine hierarchically combines the query intent with enterprise indicator definitions, calculation methods, and contextual information to generate optimized prompt words that drive the large language model in a controlled manner, thereby obtaining structured query instructions. An execution and synthesis module performs data querying, result normalization, and semantic synthesis based on these query instructions, and an intelligent recommendation module proactively recommends relevant indicators or analysis paths based on user roles and business scenarios. The system thereby makes the query process controllable and accurate and enables knowledge reuse.
[0009] This invention adopts the following technical solution: an intelligent query and indicator recommendation system based on a large language model, the system comprising:
[0010] An enterprise knowledge base is used for the structured storage and unified management of business metrics, business terms, and frequently asked questions related to data analysis within the enterprise.
[0011] The query understanding module is used to perform semantic parsing, query intent recognition, and key entity extraction on the natural language questions input by the user, and to convert the natural language questions into a structured query intent description.
[0012] The prompt word engineering engine is used to combine the structured query intent description with the business indicator definitions, business semantics and contextual information in the enterprise knowledge base to generate optimized prompt words to drive the large language model;
[0013] The execution and synthesis module is used to execute data queries based on the structured query instructions output by the large language model, and to perform semantic synthesis and output of the query results;
[0014] The intelligent recommendation module is used to recommend relevant business metrics or analysis paths to users based on user roles, business scenarios, and the correlation between metrics.
[0015] The self-learning module is used to collect and analyze user question-and-answer behavior, and to learn and optimize high-frequency questions.
[0016] In this configuration, one or more of the query understanding module, prompt word engineering engine, execution and synthesis module, or intelligent recommendation module are configured to call a preset large language model on demand to complete semantic understanding, content generation, or result interpretation processing.
[0017] Preferably, the enterprise knowledge base includes:
[0018] The indicator system library is used to store the indicator definitions, calculation methods, available dimensions, and relationships of business indicators in a hierarchical structure.
[0019] The business terminology library is used to store synonyms, near-synonyms, or aliases of business concepts and to map them to standard business indicators.
[0020] The high-frequency question library stores historical high-frequency questions along with their corresponding query intents, related indicators, and answer templates.
[0021] Preferably, the query understanding module is further configured to: construct a candidate business indicator set for the extracted indicator entities, and perform comprehensive scoring and ranking of the candidate business indicators based on semantic similarity, dimensional compatibility, contextual consistency and historical usage preferences, thereby determining the target business indicator.
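As a minimal sketch of the comprehensive scoring step, the ranking can be expressed as a weighted sum of the four criteria named above. The weights, the `Candidate` fields, and the assumption that each criterion is already normalized to [0, 1] are hypothetical; the source does not specify how the scores are combined.

```python
from dataclasses import dataclass

# Hypothetical weights: the source names the criteria but not how they are combined.
WEIGHTS = {"semantic": 0.4, "dimension": 0.3, "context": 0.2, "history": 0.1}


@dataclass
class Candidate:
    indicator_id: str
    semantic: float   # similarity between the extracted entity and the indicator, in [0, 1]
    dimension: float  # share of requested dimensions the indicator supports, in [0, 1]
    context: float    # consistency with the conversation context, in [0, 1]
    history: float    # normalized historical usage preference, in [0, 1]


def dimension_compatibility(requested: set, supported: set) -> float:
    """Fraction of the requested dimensions that the indicator supports."""
    if not requested:
        return 1.0
    return len(requested & supported) / len(requested)


def score(c: Candidate) -> float:
    """Weighted sum of the four criteria."""
    return (WEIGHTS["semantic"] * c.semantic
            + WEIGHTS["dimension"] * c.dimension
            + WEIGHTS["context"] * c.context
            + WEIGHTS["history"] * c.history)


def select_target(candidates: list) -> str:
    """Return the ID of the highest-scoring candidate indicator."""
    return max(candidates, key=score).indicator_id
```

A linear combination keeps each criterion's contribution easy to audit; a learned ranking model could replace `score` without changing the interface.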
[0022] Preferably, the prompt word engineering engine adopts a hierarchical prompt word construction mechanism, and the generated optimized prompt words include at least:
[0023] System-level prompts are used to define the roles and output specifications of large language models.
[0024] Contextual prompts are used to provide conversation history or user background information;
[0025] Knowledge prompts are used to inject definition information and calculation methods related to the target business metrics;
[0026] Task prompts are used to describe the specific tasks that the large language model needs to perform.
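The four-layer structure can be pictured as a fixed-order concatenation. This is a minimal sketch: the `PromptLayers` type and the section tags (`[SYSTEM]`, `[CONTEXT]`, `[KNOWLEDGE]`, `[TASK]`) are illustrative assumptions, and a chat-style LLM API would more likely receive the system layer as a separate system message.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLayers:
    system: str                                     # role and output specification
    context: list = field(default_factory=list)     # conversation history / user background
    knowledge: list = field(default_factory=list)   # indicator definitions and calculation methods
    task: str = ""                                  # the concrete task for this turn


def build_prompt(layers: PromptLayers) -> str:
    """Concatenate the four layers in a fixed order, skipping any that are empty."""
    sections = [
        ("SYSTEM", layers.system),
        ("CONTEXT", "\n".join(layers.context)),
        ("KNOWLEDGE", "\n".join(layers.knowledge)),
        ("TASK", layers.task),
    ]
    return "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)
```

Keeping the order fixed means the knowledge layer always precedes the task, so the model reads the enterprise definitions before the request it must answer.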
[0027] Preferably, the execution and synthesis module is configured to perform the following functions:
[0028] Receive structured query instructions output by a large language model, and perform legality and consistency checks on the structured query instructions;
[0029] After the verification is passed, the structured query instruction is converted into a data access request that can be executed by the underlying data service, and the corresponding original query result is obtained.
[0030] The original query results are processed for data normalization, and semantic explanations are generated based on preset semantic synthesis rules or by calling the large language model again, so as to form a query result output containing data results and their semantic explanations.
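A minimal sketch of the legality check and the conversion into a data access request, assuming the structured query instruction is a JSON object with `indicator`, `dimensions`, and `filters` keys and that the underlying data service accepts parameterized SQL. The catalog contents and the table and column names are hypothetical, and the consistency check shown is limited to indicator/dimension compatibility.

```python
# Hypothetical catalog mirroring the indicator system library. Only identifiers
# listed here can reach the SQL text; filter values are passed as parameters.
CATALOG = {
    "gmv": {
        "table": "sales_fact",
        "expr": "SUM(order_amount)",
        "dimensions": {"region", "month", "product_line"},
    },
}


class InvalidInstruction(ValueError):
    """Raised when a structured query instruction fails the legality check."""


def validate(instr: dict) -> dict:
    """Check that the indicator exists and every dimension/filter is allowed for it."""
    meta = CATALOG.get(instr.get("indicator"))
    if meta is None:
        raise InvalidInstruction(f"unknown indicator: {instr.get('indicator')!r}")
    used = set(instr.get("dimensions", [])) | set(instr.get("filters", {}))
    unsupported = used - meta["dimensions"]
    if unsupported:
        raise InvalidInstruction(f"unsupported dimensions: {sorted(unsupported)}")
    return meta


def to_sql(instr: dict) -> tuple:
    """Translate a validated instruction into a parameterized SQL string and its parameters."""
    meta = validate(instr)
    dims = list(instr.get("dimensions", []))
    filters = instr.get("filters", {})
    select_list = ", ".join(dims + [meta["expr"] + " AS value"])
    sql = f"SELECT {select_list} FROM {meta['table']}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{k} = ?" for k in filters)
        params = list(filters.values())
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql, params
```

Because the table, the aggregate expression, and every column name must appear in the catalog, an instruction naming an unknown indicator or dimension is rejected before any SQL is built.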
[0031] Preferably, the self-learning module is configured to perform the following functions:
[0032] Record users' natural language questions and corresponding query results, and identify high-frequency questions based on semantic similarity and frequency of occurrence;
[0033] For the identified high-frequency questions, generate or update standard question descriptions and corresponding processing strategies to improve the system's response efficiency when similar questions occur in the future.
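One way to realize the similarity-plus-frequency rule is greedy clustering. This sketch uses character-bigram Jaccard similarity as a cheap, hypothetical stand-in for the semantic similarity described above (an embedding model would normally fill that role); the threshold and minimum count are assumed values. Character bigrams also work on Chinese text, which has no word boundaries.

```python
def bigrams(text: str) -> set:
    """Character bigrams of the lowercased text with whitespace removed."""
    t = "".join(text.lower().split())
    return {t[i:i + 2] for i in range(len(t) - 1)} or {t}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def high_frequency(questions, threshold=0.6, min_count=3):
    """Greedily group similar questions; return (representative, count) for frequent groups."""
    clusters = []  # each entry: [representative text, its bigrams, occurrence count]
    for q in questions:
        grams = bigrams(q)
        for cluster in clusters:
            if jaccard(grams, cluster[1]) >= threshold:
                cluster[2] += 1
                break
        else:
            clusters.append([q, grams, 1])
    return [(c[0], c[2]) for c in clusters if c[2] >= min_count]
```

Each returned representative is a candidate for a standard question description in the high-frequency question library.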
[0034] In addition, a method for intelligent query and indicator recommendation based on a large language model is provided. The method is applied to the intelligent query and indicator recommendation system based on a large language model described above and includes the following steps:
[0035] S100: Receive a natural language query question input by the user and perform semantic preprocessing on the natural language query question;
[0036] S200: Based on a large language model, perform query intent identification and key entity extraction on preprocessed natural language query questions to generate a structured query intent description; wherein the structured query intent description includes at least the target business indicators, analysis dimensions, and query conditions;
[0037] S300: Map the business indicators identified in S200 against the business indicator system and business terms stored in the enterprise knowledge base and disambiguate them, to determine the target business indicators that match the current query scenario;
[0038] S400: Based on the structured query intent description and the target business indicators, obtain the relevant business indicator definitions, calculation methods and business semantic information from the enterprise knowledge base, and construct optimized prompt words to drive the large language model;
[0039] S500: Input the optimized prompt words into the large language model to obtain the structured query instructions generated by the large language model;
[0040] S600: Execute a data query operation based on the structured query instruction, obtain the corresponding original query results, and perform data normalization processing on the original query results;
[0041] S700: Perform semantic synthesis processing on the original query results to generate a final query result containing the data results and their semantic explanations, and output it to the user.
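Read as a pipeline, steps S100 to S700 chain together as in the sketch below. Every stage is injected as a plain callable with a hypothetical signature, so the LLM client, knowledge base lookups, and data service stay abstract; only the ordering and the JSON hand-off from S500 to S600 are illustrated.

```python
import json
from typing import Callable


def run_query(question: str,
              understand: Callable[[str], dict],     # S200: intent recognition / entity extraction
              disambiguate: Callable[[dict], dict],  # S300: map indicators via the knowledge base
              build_prompt: Callable[[dict], str],   # S400: layered prompt construction
              llm: Callable[[str], str],             # S500: returns the instruction as JSON text
              execute: Callable[[dict], list],       # S600: validated query + normalization
              explain: Callable[[dict, list], str],  # S700: semantic synthesis
              ) -> dict:
    text = " ".join(question.split())                # S100: minimal preprocessing (whitespace only)
    intent = disambiguate(understand(text))
    instruction = json.loads(llm(build_prompt(intent)))
    rows = execute(instruction)
    return {"instruction": instruction, "rows": rows,
            "answer": explain(instruction, rows)}
```

Parsing the model output with `json.loads` before S600 means malformed output fails early, before any data access is attempted.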
[0042] Preferably, the method further includes analyzing the final query results based on user roles, business scenarios, and indicator relationships, and recommending relevant business indicators or analysis paths to the user.
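A minimal sketch of the recommendation step, assuming indicator relationships are stored as weighted links and each role has a set of indicators it usually follows. The data, the `ROLE_BOOST` value, and the scoring rule are hypothetical; business-scenario weighting is omitted for brevity but could be added as a further bonus term in the same way.

```python
# Hypothetical relationship strengths between indicators (indicator -> related -> weight).
RELATED = {
    "gmv": {"order_count": 0.8, "avg_order_value": 0.9, "refund_rate": 0.6},
}
# Hypothetical role profiles: indicators each role usually follows.
ROLE_FOCUS = {
    "finance_manager": {"refund_rate", "gross_margin"},
    "sales_manager": {"order_count"},
}
ROLE_BOOST = 0.5  # assumed bonus for indicators in the user's role profile


def recommend(indicator: str, role: str, top_k: int = 2) -> list:
    """Rank indicators related to the one just queried, boosting those the role follows."""
    focus = ROLE_FOCUS.get(role, set())
    scored = [(weight + (ROLE_BOOST if other in focus else 0.0), other)
              for other, weight in RELATED.get(indicator, {}).items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by name
    return [name for _, name in scored[:top_k]]
```

For the same query on `gmv`, a finance manager would be pointed to `refund_rate` first, while a sales manager would see `order_count` first.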
[0043] Preferably, the method further includes recording user query behavior and the corresponding query results, and learning and optimizing high-frequency questions based on historical question-and-answer data.
[0044] The beneficial effects achieved by this invention are:
[0045] 1. Through the synergy of the query understanding module and the enterprise knowledge base, this technical solution accurately identifies users' natural language query intent. Combined with indicator disambiguation and constraints on indicator definitions (calculation scope), it avoids query errors caused by ambiguous business terminology or inconsistent indicator definitions, enabling non-technical personnel to obtain accurate and reliable data without mastering complex report structures or database query languages.
[0046] 2. The system of this technical solution uses a prompt word engineering engine to combine query intent, enterprise indicator definitions, and business semantics in layers to form optimized prompt words. This constrains the reasoning of the large language model to the enterprise's knowledge boundary and data access rules, effectively reducing the risk that the model generates inaccurate query instructions or accesses data without authorization, and improving the controllability and stability of the system in complex enterprise data environments.
[0047] 3. This technical solution uses an intelligent recommendation module and a self-learning module to continuously analyze and learn from user query behavior and high-frequency questions. It can not only proactively recommend relevant indicators and analysis paths, but also cache and optimize repetitive questions, thereby reducing redundant calculations, improving system response efficiency, and gradually forming a reusable enterprise data analysis knowledge system.
[0048] 4. The working parts of the system described in this technical solution adopt a modular design, so the system can be maintained and upgraded by optimizing or replacing individual modules, reducing subsequent usage and upgrade costs.
Attached Figure Description
[0049] The invention will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the drawings are not necessarily drawn to scale, but rather the emphasis is on illustrating the principles of the embodiments. In different views, the same reference numerals designate corresponding parts.
[0050] The reference numerals in the attached drawings are as follows: 100 - System; 110 - Enterprise Knowledge Base; 112 - Indicator System Library; 114 - Business Terminology Library; 116 - High-Frequency Question Library; 120 - Query Understanding Module; 130 - Prompt Word Engineering Engine; 140 - Execution and Synthesis Module; 150 - Intelligent Recommendation Module; 160 - Self-Learning Module; 310 - System-Level Prompt Words; 320 - Context Prompt Words; 330 - Knowledge Prompt Words; 340 - Task Prompt Words; 360 - Optimized Prompt Words; 500 - Computing System; 502 - Bus; 504 - Processor; 506 - Main Memory; 508 - Read-Only Memory; 510 - Storage Device; 512 - Display; 514 - Input Device; 516 - Cursor Control Device; 518 - Network Device.
[0051] Figure 1 is a schematic diagram of the framework of the intelligent query and indicator recommendation system according to an embodiment of the invention;
[0052] Figure 2 is a schematic diagram of the logical flow of the intelligent query and indicator recommendation system according to an embodiment of the invention;
[0053] Figure 3 is a schematic diagram of the architecture of the enterprise knowledge base according to an embodiment of the invention;
[0054] Figure 4 is a schematic diagram of the workflow of the prompt word engineering engine according to an embodiment of the invention;
[0055] Figure 5 is a schematic diagram of the user interface of the system in an embodiment of the present invention;
[0056] Figure 6 is a schematic diagram of the computer system architecture used in an embodiment of the present invention.
Detailed Implementation
[0057] To make the objectives, technical solutions, and advantages of this invention clearer, the invention is described in further detail below with reference to its embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention. Other systems, methods, and/or features of this embodiment will become apparent to those skilled in the art after reviewing the following detailed description. All such additional systems, methods, features, and advantages are intended to be included within this specification, within the scope of the invention, and protected by the appended claims. Further features of the disclosed embodiments will become apparent from the following detailed description.
[0058] In the accompanying drawings of the embodiments of the present invention, the same or similar reference numerals correspond to the same or similar components. In the description of the present invention, it should be understood that terms such as "upper," "lower," "left," and "right," where they indicate an orientation or positional relationship, are based on the orientation or positional relationship shown in the drawings. They are used only for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the referenced device or component must have a specific orientation or be constructed and operated in a specific orientation. Therefore, terms describing positional relationships in the drawings are illustrative only and should not be construed as limiting this patent. Those skilled in the art can understand the specific meaning of these terms according to the specific circumstances.
[0059] Example 1: An intelligent query and indicator recommendation system based on a large language model is proposed. The system includes:
[0060] An enterprise knowledge base is used for the structured storage and unified management of business metrics, business terms, and frequently asked questions related to data analysis within the enterprise.
[0061] The query understanding module is used to perform semantic parsing, query intent recognition, and key entity extraction on the natural language questions input by the user, and to convert the natural language questions into a structured query intent description.
[0062] The prompt word engineering engine is used to combine the structured query intent description with the business indicator definitions, business semantics and contextual information in the enterprise knowledge base to generate optimized prompt words to drive the large language model;
[0063] The execution and synthesis module is used to execute data queries based on the structured query instructions output by the large language model, and to perform semantic synthesis and output of the query results;
[0064] The intelligent recommendation module is used to recommend relevant business metrics or analysis paths to users based on user roles, business scenarios, and the correlation between metrics.
[0065] The self-learning module is used to collect and analyze user question-and-answer behavior, and to learn and optimize high-frequency questions.
[0066] In this configuration, one or more of the query understanding module, prompt word engineering engine, execution and synthesis module, or intelligent recommendation module are configured to call a preset large language model on demand to complete semantic understanding, content generation, or result interpretation processing.
[0067] Preferably, the enterprise knowledge base includes:
[0068] The indicator system library is used to store the indicator definitions, calculation methods, available dimensions, and relationships of business indicators in a hierarchical structure.
[0069] The business terminology library is used to store synonyms, near-synonyms, or aliases of business concepts and to map them to standard business indicators.
[0070] The high-frequency question library stores historical high-frequency questions along with their corresponding query intents, related indicators, and answer templates.
[0071] Preferably, the query understanding module is further configured to: construct a candidate business indicator set for the extracted indicator entities, and perform comprehensive scoring and ranking of the candidate business indicators based on semantic similarity, dimensional compatibility, contextual consistency and historical usage preferences, thereby determining the target business indicator.
[0072] Preferably, the prompt word engineering engine adopts a hierarchical prompt word construction mechanism, and the generated optimized prompt words include at least:
[0073] System-level prompts are used to define the roles and output specifications of large language models.
[0074] Contextual prompts are used to provide conversation history or user background information;
[0075] Knowledge prompts are used to inject definition information and calculation methods related to the target business metrics;
[0076] Task prompts are used to describe the specific tasks that the large language model needs to perform.
[0077] Preferably, the execution and synthesis module is configured to perform the following functions:
[0078] Receive structured query instructions output by a large language model, and perform legality and consistency checks on the structured query instructions;
[0079] After the verification is passed, the structured query instruction is converted into a data access request that can be executed by the underlying data service, and the corresponding original query result is obtained.
[0080] The original query results are processed for data normalization, and semantic explanations are generated based on preset semantic synthesis rules or by calling the large language model again, so as to form a query result output containing data results and their semantic explanations.
[0081] Preferably, the self-learning module is configured to perform the following functions:
[0082] Record users' natural language questions and corresponding query results, and identify high-frequency questions based on semantic similarity and frequency of occurrence;
[0083] For the identified high-frequency questions, generate or update standard question descriptions and corresponding processing strategies to improve the system's response efficiency when similar questions occur in the future.
[0084] In addition, a method for intelligent query and indicator recommendation based on a large language model is provided. The method is applied to the intelligent query and indicator recommendation system based on a large language model described above and includes the following steps:
[0085] S100: Receive a natural language query question input by the user and perform semantic preprocessing on the natural language query question;
[0086] S200: Based on a large language model, perform query intent identification and key entity extraction on preprocessed natural language query questions to generate a structured query intent description; wherein the structured query intent description includes at least the target business indicators, analysis dimensions, and query conditions;
[0087] S300: Map the business indicators identified in S200 against the business indicator system and business terms stored in the enterprise knowledge base and disambiguate them, to determine the target business indicators that match the current query scenario;
[0088] S400: Based on the structured query intent description and the target business indicators, obtain the relevant business indicator definitions, calculation methods and business semantic information from the enterprise knowledge base, and construct optimized prompt words to drive the large language model;
[0089] S500: Input the optimized prompt words into the large language model to obtain the structured query instructions generated by the large language model;
[0090] S600: Execute a data query operation based on the structured query instruction, obtain the corresponding original query results, and perform data normalization processing on the original query results;
[0091] S700: Perform semantic synthesis processing on the original query results to generate a final query result containing the data results and their semantic explanations, and output it to the user.
[0092] Preferably, the method further includes analyzing the final query results based on user roles, business scenarios, and indicator relationships, and recommending relevant business indicators or analysis paths to the user.
[0093] Preferably, the method further includes recording user query behavior and the corresponding query results, and learning and optimizing high-frequency questions based on historical question-and-answer data.
[0094] Specifically, an exemplary functional architecture of the intelligent query and indicator recommendation system based on a large language model is shown in Figure 1, and Figure 2 shows the working logic flowchart of the intelligent query and indicator recommendation system (hereinafter referred to as System 100). System 100 constructs an enterprise indicator knowledge base and combines it with dynamic prompt word engineering to achieve accurate understanding of users' natural language queries, indicator mapping, and controlled data querying. It also proactively recommends relevant indicators based on user roles and business scenarios, and learns and caches high-frequency questions to improve the accuracy, response efficiency, and knowledge reuse capability of enterprise data queries.
[0095] Preferably, the functional modules included in System 100 are not limited to fixed processing methods, but interact with the Large Language Model (LLM) as needed, based on their respective business functions and processing objectives. Specifically, each functional module in the system can provide structured information, business semantics, or contextual data as input to the Large Language Model at one or more stages of its processing flow, and receive the returned inference results or generated content to assist in completing processing tasks such as semantic understanding, content generation, or result interpretation. It should be noted that the way each module calls the Large Language Model, the timing of the call, and the degree of participation can be configured and adjusted according to specific application scenarios. The relevant calling logic and implementation methods will be described in detail in the subsequent descriptions of each module.
[0096] System 100 includes an enterprise knowledge base 110, used for the structured storage, unified management, and external provision of core knowledge related to data analysis within the enterprise, supplying the underlying data and semantic basis for subsequent query understanding, prompt word construction, and indicator recommendation. Beyond providing database functions, the enterprise knowledge base 110 also includes a structured data management module for indicator calculation, business analysis, and natural language interaction scenarios.
[0097] For example, as shown in Figure 3, the enterprise knowledge base 110 includes an indicator system library 112, a business terminology library 114, and a high-frequency question library 116. The indicator system library 112 stores, in a structured manner, all queryable/analyzable business indicators within the enterprise and their corresponding data information. The indicator system library 112 preferably organizes the business indicators in a hierarchical or tree structure; each business indicator node stores the indicator name, unique identifier, definition, calculation method, data source, update time, parent and child indicator relationships, and related indicator information. With this structure, system 100 can accurately identify the subordinate, summary, and comparative relationships between business indicators, providing a foundation for indicator drill-down and linked indicator analysis.
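The node contents listed above map naturally onto a tree of records. This is a minimal sketch with hypothetical field names; persistence is omitted, and `drill_down`/`roll_up_path` illustrate only the subordinate and summary relationships.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/child links
class IndicatorNode:
    indicator_id: str                 # unique identifier
    name: str
    definition: str = ""
    formula: str = ""                 # calculation method
    data_source: str = ""
    updated_at: str = ""              # update time, kept as text in this sketch
    parent: Optional["IndicatorNode"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)
    related: set = field(default_factory=set)  # IDs of related (e.g. comparative) indicators

    def add_child(self, child: "IndicatorNode") -> "IndicatorNode":
        child.parent = self
        self.children.append(child)
        return child

    def drill_down(self) -> list:
        """IDs of the immediate sub-indicators."""
        return [c.indicator_id for c in self.children]

    def roll_up_path(self) -> list:
        """IDs from the root summary indicator down to this node."""
        path, node = [], self
        while node is not None:
            path.append(node.indicator_id)
            node = node.parent
        return path[::-1]
```

Excluding `parent` and `children` from the generated `repr` and disabling field-wise equality keeps printing and comparison from looping through the bidirectional links.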
[0098] Preferably, to meet the multi-dimensional analysis needs of business indicators in different business scenarios, the indicator system library 112 also stores the variable dimensions supported by each business indicator. These include, but are not limited to, the time, region, organization, product, and business model dimensions, and for each dimension the library can further store its range of enumerated values, hierarchical relationships, and default analysis granularity. By explicitly modeling the relationship between indicators and dimensions, system 100 can automatically match reasonable dimension combinations from the implicit or explicit conditions in a natural language question when generating query instructions, avoiding query requests that do not conform to the indicator's calculation rules.
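A minimal sketch of how stored dimension metadata could be used to check a request and fill in a default granularity. The metadata layout, the dimension values, and the convention that `None` means "mentioned but unspecified" are assumptions for illustration.

```python
# Hypothetical dimension metadata for one indicator in the indicator system library.
DIMENSIONS = {
    "gmv": {
        "time": {"granularities": ["day", "month", "quarter", "year"], "default": "month"},
        "region": {"values": {"North", "South", "East", "West"}},
    },
}


def resolve_dimensions(indicator: str, requested: dict) -> dict:
    """Validate requested dimensions and fill defaults.

    `requested` maps a dimension name to an enumeration value (or a granularity
    for the time dimension); None means "mentioned but unspecified".
    """
    meta = DIMENSIONS[indicator]
    resolved = {}
    for dim, value in requested.items():
        if dim not in meta:
            raise ValueError(f"{indicator} does not support dimension {dim!r}")
        spec = meta[dim]
        if "granularities" in spec:
            value = value or spec["default"]  # fall back to the default granularity
            if value not in spec["granularities"]:
                raise ValueError(f"unsupported granularity {value!r}")
        elif value is not None and value not in spec["values"]:
            raise ValueError(f"{value!r} is not a valid {dim} value")
        resolved[dim] = value
    return resolved
```

Rejecting an unsupported dimension at this stage keeps a request such as "GMV by sales channel" from reaching the data service when the indicator has no channel breakdown.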
[0099] Furthermore, the business terminology library 114 stores business concepts and industry terms commonly used within the enterprise, together with their mappings to indicators or fields. The business terminology library 114 can configure multiple synonyms, near-synonyms, or aliases for the same business concept to handle the different ways users phrase natural language queries. For example, when a user asks a question using a non-standard indicator name or a colloquial expression, the system can map the expression through the business terminology library 114 to a predefined standard indicator in the indicator system library 112, thereby achieving semantic disambiguation and accurate matching.
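Alias mapping can be sketched as a normalized dictionary lookup plus a longest-alias-first scan of the question. The alias table is hypothetical. Plain substring matching suits Chinese text, which has no word boundaries; English input would additionally need word-boundary checks.

```python
import unicodedata

# Hypothetical alias table from the business terminology library: alias -> standard indicator ID.
TERMS = {
    "gmv": "gmv",
    "gross merchandise volume": "gmv",
    "turnover": "gmv",
    "headcount": "employee_count",
    "staff size": "employee_count",
}


def normalize(text: str) -> str:
    """Unicode-normalize (NFKC), lowercase, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def map_term(expression: str):
    """Map one user expression to a standard indicator ID, or None if unknown."""
    return TERMS.get(normalize(expression))


def find_indicators(question: str) -> list:
    """Find standard indicators mentioned in a question, trying longer aliases first."""
    q = normalize(question)
    found = []
    for alias in sorted(TERMS, key=len, reverse=True):
        if alias in q:
            found.append(TERMS[alias])
            q = q.replace(alias, " ")  # consume the match so shorter aliases cannot re-match it
    return list(dict.fromkeys(found))  # de-duplicate, keep first-seen order
```

NFKC normalization also folds full-width characters, which are common in Chinese-language input, into their standard forms before lookup.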
[0100] Furthermore, the high-frequency question library 116 provides unified management of frequently asked user questions and the corresponding answer frameworks generated during system operation. It stores at least the question text, the corresponding query intent type, related indicator information, the historical question frequency, and verified standard answers or answer generation templates. By continuously accumulating and updating user question-and-answer behavior, the high-frequency question library 116 gives the system the ability to quickly match and reuse answers, reducing redundant computation and improving overall response efficiency when the same or similar questions arise later.
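Matching and reuse can be sketched with the standard-library `difflib.SequenceMatcher`, which yields a character-level similarity ratio. It is a hypothetical stand-in for semantic matching; the `FaqEntry` fields follow the stored items listed above, and the 0.85 threshold is an assumed value.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class FaqEntry:
    question: str
    intent: str
    indicators: list = field(default_factory=list)
    answer_template: str = ""
    frequency: int = 0


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def lookup(question: str, library: list, threshold: float = 0.85):
    """Return the most similar stored entry if it clears the threshold, else None."""
    q = _norm(question)
    best, best_ratio = None, 0.0
    for entry in library:
        ratio = SequenceMatcher(None, q, _norm(entry.question)).ratio()
        if ratio > best_ratio:
            best, best_ratio = entry, ratio
    if best is not None and best_ratio >= threshold:
        best.frequency += 1  # record the reuse for later frequency statistics
        return best
    return None
```

On a hit, the stored answer template can be filled with fresh query results instead of generating the answer from scratch; on a miss, the question follows the full query path.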
[0101] During the operation of system 100, the enterprise knowledge base 110 provides a unified knowledge retrieval and invocation interface for the query understanding module, prompt word engineering engine, and intelligent recommendation module to access as needed. By clearly defining the enterprise's business indicators, business semantics, and centralized management of historical knowledge, the invocation of the large language model is always controlled within the enterprise's internal knowledge boundaries, thereby improving the accuracy, stability, and interpretability of natural language query results.
[0102] In an exemplary embodiment, system 100 further includes a query understanding module 120. The query understanding module 120, as the core processing unit of system 100 for user natural language input, performs semantic parsing, intent recognition, and key entity extraction on the natural language questions submitted by user 10, thereby converting unstructured natural language expressions into structured query intents that the system can process. Located between the user interaction layer 200 and the subsequent prompt word engineering engine 130, the query understanding module 120 is a crucial link in achieving effective integration of natural language and enterprise indicator systems; its processing results directly affect the accuracy and stability of subsequent query generation.
[0103] Specifically, the query understanding module 120 first receives the question text that the user entered in natural language and performs basic semantic preprocessing on it. This preprocessing includes, but is not limited to, word segmentation, part-of-speech tagging, time expression standardization, and numerical unit identification, and it removes ambiguous expressions and format differences from the natural language. For example, the query understanding module 120 can convert natural-language time descriptions such as "last month," "the same period last year," and "the last three months" into standardized time interval representations, giving subsequent query condition construction a unified data format.
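Time expression standardization could be sketched as a mapping from a relative phrase and a reference date to a half-open interval [start, end). This is a minimal sketch, and the interpretations are assumptions, not rules stated in the source. "Last three months" is read here as the three complete calendar months before the current month. "Same period last year" is read here as the current calendar month one year earlier.

```python
from __future__ import annotations

from datetime import date


def add_months(first_of_month: date, n: int) -> date:
    """Shift a first-of-month date by n calendar months (n may be negative)."""
    years, month0 = divmod(first_of_month.month - 1 + n, 12)
    return date(first_of_month.year + years, month0 + 1, 1)


def normalize_time(expr: str, today: date) -> tuple[date, date]:
    """Convert a relative time phrase into a half-open interval [start, end)."""
    this_month = today.replace(day=1)
    phrase = " ".join(expr.casefold().split())
    if phrase == "last month":
        return add_months(this_month, -1), this_month
    if phrase == "last three months":
        # Assumption: the three complete calendar months before the current one.
        return add_months(this_month, -3), this_month
    if phrase == "same period last year":
        # Assumption: the current calendar month, one year earlier.
        start = add_months(this_month, -12)
        return start, add_months(start, 1)
    raise ValueError(f"unsupported time expression: {expr!r}")
```

Half-open intervals let the generated query use `>= start AND < end` without special handling for month lengths.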
[0104] After completing basic semantic preprocessing, the query understanding module 120 identifies the user's query intent based on a large language model. This query intent identification determines the analysis type corresponding to the user's current question, including but not limited to single-indicator numerical queries, indicator trend analysis, indicator comparison analysis, indicator ranking analysis, or multi-indicator comprehensive analysis. By accurately determining the query intent, the system 100 can clearly identify the data query mode and result presentation method to be invoked subsequently, avoiding the incorrect mapping of different types of business problems to the same data processing flow.
[0105] Simultaneously, the query understanding module 120 is also used to perform key entity extraction operations to identify core elements related to the data query from the natural language question. These core elements include at least indicator entities, dimension entities, and constraint entities. Specifically, indicator entities indicate the specific business indicators that the user is interested in, dimension entities limit the analytical perspective of the indicators, and constraint entities describe time ranges, regional ranges, organizational ranges, or other business filtering conditions. The entity extraction can rely on the large language model's ability to understand contextual semantics, identifying key information implicit in natural language without the need for fixed templates.
[0106] Preferably, to address the issue of homonyms, near-synonyms, or multiple meanings in a company's business metrics, the query understanding module 120 further performs metric disambiguation and mapping processing. Specifically, the query understanding module 120 compares the extracted metric entities with the metric system and business terminology in the company's knowledge base. By comprehensively considering contextual semantics, user historical behavior, and metric relationships, it determines the target metric that best suits the current query scenario. For example, when a user asks a question using non-standard names or colloquial business expressions, the system can automatically map it to a pre-defined standard metric within the company, avoiding query bias caused by inconsistent names.
[0107] Preferably, the query understanding module 120 can also parse and process complex sentence structures and combined questions based on a large language model. When a user simultaneously raises multiple analysis requirements or implies multiple query conditions in the same question, the query understanding module can decompose the question into several sub-query intents and generate a corresponding structured representation for each sub-query intent. Through this processing method, the system 100 can support relatively complex natural language query scenarios without requiring the user to explicitly split the question or follow a fixed question format.
[0108] Specifically, in an exemplary implementation, the query understanding module 120 performs entity extraction on the user's question text $Q$ to obtain an entity set $E=\{e_m, e_d, e_t\}$. Here, $e_m$ is a candidate business metric phrase, such as "open chain scale"; $e_d$ is a dimension / condition entity, such as "this year" or "core enterprises"; and $e_t$ is a business terminology entity. The system retrieves a candidate set of business metrics $M=\{m_1, m_2, \ldots, m_k\}$ from the enterprise knowledge base 110. Each candidate business metric $m_i$ is associated with its caliber (statistical definition) description $Def(m_i)$, its available dimensions $Dim(m_i)$, its alias set $Syn(m_i)$, and its historical question-and-answer statistics $Freq(m_i \mid role)$. The statistic $Freq(m_i \mid role)$ is the frequency with which candidate metric $m_i$ was successfully matched and adopted for a specific user role (such as account manager or operations staff) in historical data. It reflects the historical preference for, and actual usage of, this metric in similar semantic query scenarios.
[0109] Then, in the disambiguation phase, an overall matching score is calculated for each $m_i$:
[0110] $\mathrm{Score}(m_i) = k_1 K_1 + k_2 K_2 + k_3 K_3 + k_4 K_4$;
[0111] Where K1 is the semantic similarity score:
[0112] $K_1 = \mathrm{sim}(e_m, m_i)$;
[0113] In the above formula, the function sim() represents semantic similarity.
[0114] K2 is the dimensional compatibility score:
[0115] $K_2 = \mathrm{compat}(e_d, Dim(m_i))$;
[0116] In the above formula, the function compat() represents dimension compatibility verification, such as whether "core enterprises" is a statistical dimension allowed by the indicator.
[0117] K3 represents the contextual consistency score:
[0118] $K_3 = \mathrm{ctx}(e_t, Def(m_i))$;
[0119] In the above formula, the function ctx() checks whether the business terms in the question are consistent with the metric's caliber (statistical definition).
[0120] K4 represents historical usage preference scores:
[0121] $K_4 = \mathrm{prior}(Freq(m_i \mid role))$;
[0122] In the above formula, the function prior(Freq()) denotes a prior weight term derived from historical question-answering statistics. It reflects the historical probability that the system successfully matched and adopted the candidate metric in the same or similar semantic query scenarios. Its value is obtained by transforming the raw statistics with a normalization or mapping function.
[0123] The weight values k1, k2, k3, and k4 can be optimized and adjusted according to actual business needs.
[0124] After the overall score $\mathrm{Score}(m_i)$ has been calculated for each $m_i$, the candidates are sorted in descending order of score. The ranked result is then used as the standard metric input for subsequent prompt word engineering and controlled query generation.
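The disambiguation score could be sketched as a weighted sum of K1 to K4 with weights k1 to k4, followed by a descending sort. This is a minimal sketch, and every component is a stand-in rather than the source's method. `difflib.SequenceMatcher` (character-level similarity) replaces a real semantic similarity model for sim(). Set overlap replaces compat() and ctx(). Division by the largest role frequency among the candidates replaces prior(). The default weights are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    name: str                     # standard metric name m_i
    synonyms: set[str]            # Syn(m_i)
    dims: set[str]                # Dim(m_i)
    caliber_terms: set[str]       # key terms taken from Def(m_i)
    freq_by_role: dict[str, int]  # Freq(m_i | role)


def sim(a: str, b: str) -> float:
    # Stand-in for semantic similarity: character-level ratio in [0, 1].
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def overlap(wanted: set[str], available: set[str]) -> float:
    # Share of requested items the candidate supports; 1.0 if nothing requested.
    return len(wanted & available) / len(wanted) if wanted else 1.0


def rank_candidates(e_m: str, e_d: set[str], e_t: set[str], role: str,
                    candidates: list[Candidate],
                    k: tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1),
                    ) -> list[tuple[str, float]]:
    max_freq = max((c.freq_by_role.get(role, 0) for c in candidates), default=0)
    scored = []
    for c in candidates:
        k1 = max(sim(e_m, s) for s in {c.name} | c.synonyms)  # K1: best name/alias match
        k2 = overlap(e_d, c.dims)                             # K2: compat() stand-in
        k3 = overlap(e_t, c.caliber_terms)                    # K3: ctx() stand-in
        freq = c.freq_by_role.get(role, 0)
        k4 = freq / max_freq if max_freq else 0.0             # K4: prior() stand-in
        total = k[0] * k1 + k[1] * k2 + k[2] * k3 + k[3] * k4
        scored.append((c.name, round(total, 4)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In the "open-chain scale" example, suppose the phrase is an alias of "open-chain amount", "core enterprises" is one of its dimensions, and "amount" is one of its caliber terms. "Open-chain amount" then outranks "number of open-chain enterprises" even when the latter has the higher historical frequency.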
[0125] For example, suppose a user asks in natural language for "the scale of open-chain operations of core enterprises," and the candidate metrics include "open-chain amount" and "number of open-chain enterprises." Three factors separate the candidates. First, "scale" is synonymous with "amount / quota" in the terminology database. Second, the user-provided dimension "core enterprises" is consistent with the caliber of "open-chain amount". Third, the caliber of "number of open-chain enterprises" leans toward "quantity / number of users". The calculation therefore yields Score(open-chain amount) > Score(number of open-chain enterprises), and "open-chain scale" is preferentially mapped to "open-chain amount" for intent understanding.
[0126] After completing intent recognition, entity extraction, and indicator mapping, the query understanding module 120 generates a unified and standardized structured query intent description and outputs it to the prompt word engineering engine 130. For example, the structured query intent description includes at least the target indicator identifier, analysis dimension information, time and business constraints, and query type identifier. Through the above settings, the query understanding module 120 achieves an effective transition from natural language questions to executable query intents, providing a clear and stable input foundation for the controlled invocation of the large language model and the execution of data queries.
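The structured query intent description could be sketched as a small record that is validated on construction and serialized deterministically for the prompt word engineering engine. This is a sketch under assumptions. The field names and the query-type labels, which paraphrase the analysis types listed for intent recognition, are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

QUERY_TYPES = {"single_value", "trend", "comparison", "ranking", "composite"}


@dataclass
class StructuredIntent:
    metric_id: str                                # target indicator identifier
    query_type: str                               # query type identifier
    dimensions: list[str] = field(default_factory=list)        # analysis dimensions
    time_range: tuple[str, str] | None = None     # [start, end) as ISO dates
    constraints: dict[str, str] = field(default_factory=dict)  # business filters

    def __post_init__(self) -> None:
        if self.query_type not in QUERY_TYPES:
            raise ValueError(f"unknown query type: {self.query_type!r}")

    def to_json(self) -> str:
        # Sorted keys keep the serialized intent, and thus the prompt, deterministic.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```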
[0127] The prompt word engineering engine 130 is a core functional module in this system used for controlled driving of the large language model. It is used to uniformly organize and dynamically combine the structured query intent description output by the query understanding module 120 with the business indicator definitions, business semantics, and contextual information in the enterprise knowledge base 110, thereby generating optimized prompt words 360 suitable for the current query scenario. Based on these, the large language model is guided to output results that conform to enterprise data specifications and business semantic constraints. Through the setting of the prompt word engineering engine 130, the reasoning process of the large language model always operates within the enterprise's controllable knowledge boundaries and logical framework, avoiding the uncertainty and uninterpretability problems caused by directly relying on general models.
[0128] In an exemplary specific implementation, as shown in Figure 4 of the appendix, the prompt word engineering engine 130 can adopt a hierarchical prompt word construction mechanism. Under this mechanism, different types of information are divided into multiple prompt word sub-layers according to a preset structure and combined dynamically for the current query task to form the final prompt word content. The hierarchical prompt words include at least system-level prompt words 310, context prompt words 320, knowledge prompt words 330, and task prompt words 340. Each layer is independent in functional role and content source, and a unified set of combination rules applies all layers together to the large language model.
[0129] System-level prompts (310) are used to constrain the overall role, behavioral boundaries, and output format of the large language model. System-level prompts (310) are typically relatively fixed, used to clarify the model's functional positioning within the system. For example, they might limit it to an enterprise data analysis assistant, allowing analysis and responses only based on provided indicator definitions and data structures, and standardizing the structure, language style, and format of the output content. By setting system-level prompts (310), the generation of content irrelevant to business or inconsistent with system specifications can be effectively reduced.
[0130] The context prompt words 320 provide the large language model with background information relevant to the current session. The context prompt words 320 may include, but are not limited to, the user's historical query records, metric information already identified in the current session, and the user's department or role type. With the context prompt words 320, the large language model can maintain semantic coherence across multi-turn interactions. It can also generate more relevant analysis results or supplementary explanations based on the user's historical behavior and usage habits.
[0131] The knowledge prompt words 330 inject enterprise-specific knowledge directly related to the current query into the large language model. The prompt word engineering engine 130 retrieves the knowledge prompt words 330 dynamically from the enterprise knowledge base, based on the target indicators, related indicators, and analysis dimensions identified by the query understanding module. The knowledge prompt words 330 include at least the indicator definition, a description of the calculation method, available dimension information, and necessary business background information. Supplying this knowledge to the large language model as explicit input significantly reduces the risk of misjudging what an indicator means. It also avoids inaccurate or inconsistent analysis results caused by missing domain knowledge.
[0132] The task prompt 340 describes the specific task that the large language model needs to perform. Based on the structured query intent output by the query understanding module, the task prompt 340 clarifies the type of operation the model needs to complete, such as generating structured query instructions, summarizing query results, comparing and analyzing results across multiple metrics, or generating suitable analytical conclusion text for display. By clearly describing the task objective, the output of the large language model is more focused on the current query goal, rather than performing generalized inference.
[0133] Preferably, during the prompt generation process, the prompt engineering engine 130 can dynamically adjust the content proportion and combination order of prompts at each layer according to different query types, indicator complexity, and user interaction stages. For example, in the initial query stage, the emphasis is placed on knowledge prompts and task prompts, while in the multi-round follow-up question stage, the weight of context prompts is strengthened. Through this dynamic construction mechanism, the prompt engineering engine 130 achieves fine-grained control over the reasoning process of the large language model, thereby improving the overall query accuracy, stability, and scalability of the system while ensuring flexibility.
[0134] Furthermore, the execution and synthesis module 140 is a key functional module that receives the output of the large language model and completes the data query execution and result generation. It is used to transform the controlled output results generated by the prompt word engineering engine 130 into executable data access operations, and after obtaining the raw data, it uniformly organizes, analyzes and expresses the query results, thereby outputting query results to users that conform to business semantics and display specifications. This module is located between the system's data semantic layer and data service layer, and is an important component in realizing the closed loop of "natural language - indicator semantics - data results".
[0135] In a specific exemplary embodiment, the execution and synthesis module 140 first receives a structured query instruction from a large language model. This structured query instruction is not direct free text, but rather a query instruction description generated based on prompt word engineering constraints. It may include information such as target indicator identifiers, query conditions, analysis dimensions, time ranges, and query types. The execution and synthesis module 140 parses and verifies the structured query instruction to confirm that it conforms to the enterprise indicator system and data access rules, avoiding undefined indicators, illegal dimensions, or disallowed data access requests.
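The parsing-and-verification step could be sketched as a function that checks an LLM-produced instruction against a catalog of defined metrics, their allowed dimensions, and the metrics the current user may read. It returns a list of errors instead of executing anything. This is a minimal sketch, and the catalog layout and field names are assumptions.

```python
from __future__ import annotations

ALLOWED_QUERY_TYPES = {"single_value", "trend", "comparison", "ranking", "composite"}


def validate_instruction(instr: dict, catalog: dict[str, set[str]],
                         permitted_metrics: set[str]) -> list[str]:
    """Check a structured query instruction before it reaches any data service.

    catalog maps each defined metric_id to the dimensions it allows;
    permitted_metrics is the set of metrics the current user may read.
    An empty result means the instruction may be executed.
    """
    errors = []
    metric = instr.get("metric_id")
    if metric not in catalog:
        errors.append(f"undefined metric: {metric!r}")
    else:
        illegal = set(instr.get("dimensions", [])) - catalog[metric]
        if illegal:
            errors.append(f"illegal dimensions: {sorted(illegal)}")
        if metric not in permitted_metrics:
            errors.append(f"access to {metric!r} is not permitted")
    if instr.get("query_type") not in ALLOWED_QUERY_TYPES:
        errors.append(f"unsupported query type: {instr.get('query_type')!r}")
    return errors
```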
[0136] For example, after parsing and verification, the execution and synthesis module 140 converts the structured query instructions into specific data access requests that can be executed by the underlying data services. This process may include generating standardized SQL query statements, calling predefined data interfaces, or triggering indicator calculation services in the enterprise data platform. By constructing queries based on indicator identifiers rather than original field names, the query logic is decoupled from the underlying data table structure, thereby improving the system's adaptability to changes in data sources and its overall stability.
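Building queries from indicator identifiers rather than raw field names could be sketched with a registry maintained by the data team. The registry maps metric and dimension identifiers to SQL fragments, and filter values are passed as bound parameters. Column and table names therefore never come from model output. This is a minimal sketch under stated assumptions: the registry contents, the table `open_chain_fact`, and the demo rows are hypothetical, and Python's built-in `sqlite3` stands in for the enterprise data service.

```python
import sqlite3

# Hypothetical registry: identifiers map to vetted SQL fragments.
METRIC_REGISTRY = {
    "open_chain_amount": {
        "table": "open_chain_fact",
        "expr": "SUM(amount)",
        "time_col": "biz_date",
        "dims": {"enterprise_type": "ent_type", "month": "substr(biz_date, 1, 7)"},
    },
}


def build_sql(metric_id, group_by, filters, time_range):
    """Return (sql, params); unknown metrics or dimensions raise KeyError."""
    spec = METRIC_REGISTRY[metric_id]
    dims = spec["dims"]
    select = [f"{dims[d]} AS {d}" for d in group_by] + [f"{spec['expr']} AS value"]
    where = [f"{spec['time_col']} >= ?", f"{spec['time_col']} < ?"]
    params = list(time_range)
    for dim, value in filters.items():
        where.append(f"{dims[dim]} = ?")  # values are bound, never interpolated
        params.append(value)
    sql = f"SELECT {', '.join(select)} FROM {spec['table']} WHERE {' AND '.join(where)}"
    if group_by:
        sql += " GROUP BY " + ", ".join(dims[d] for d in group_by)
    return sql, params


def run_demo():
    """Execute one generated query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE open_chain_fact (biz_date TEXT, ent_type TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO open_chain_fact VALUES (?, ?, ?)",
        [("2026-01-05", "core", 100.0), ("2026-01-20", "core", 50.0),
         ("2026-02-03", "core", 30.0), ("2026-02-10", "supplier", 999.0)],
    )
    sql, params = build_sql("open_chain_amount", ["month"],
                            {"enterprise_type": "core"}, ("2026-01-01", "2026-03-01"))
    rows = sorted(conn.execute(sql, params).fetchall())
    conn.close()
    return rows
```

If the physical table is renamed or restructured, only the registry entry changes. The identifiers used in prompts and instructions stay the same.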
[0137] The underlying data service refers to the basic data access and indicator calculation service layer that actually carries the data storage, calculation, and access capabilities. In an exemplary implementation, the underlying data service can be provided by a data service system associated with the enterprise knowledge base 110. The underlying data service is responsible for completing the actual data reading, aggregation, and calculation operations based on the structured query instructions generated by the upper-layer modules. The underlying data service may include relational database query services (such as SQL execution engines based on MySQL or PostgreSQL), data warehouse or data lake access services (such as Hive, ClickHouse, BigQuery), and indicator calculation interfaces or aggregation services encapsulated in the enterprise data platform. For example, when the system queries "the monthly value of a certain indicator," the underlying data service is responsible for executing the corresponding indicator calculation logic and returning the result, without directly exposing the underlying data table structure.
[0138] After the data access request is executed, the execution and synthesis module 140 receives the raw query results returned by the underlying data service. The raw query results can take various forms, such as numerical results, time series data, grouped statistical results, or multidimensional cross-analysis data. The execution and synthesis module 140 applies unified data standardization to the raw query results, including data formatting and any necessary unit conversions, so that subsequent analysis and display are consistent.
[0139] Subsequently, the execution and synthesis module 140 further performs semantic synthesis processing on the original query results. Semantic synthesis processing includes, but is not limited to, summarizing the query results, describing trends, conducting comparative analysis, or providing business interpretation. This process can be achieved by re-invoking the large language model and utilizing preset prompt word templates, allowing the large language model to convert structured data into natural language analysis results that conform to business understanding habits. Through these methods, the system can not only present the data itself to the user but also provide explanatory content with business semantics.
[0140] Preferably, the execution and synthesis module 140 is further configured to generate suitable visualization results for display based on the query result type and user preferences. The visualization results may include tables, line charts, bar charts, or other commonly used display formats for business analysis, and can be switched according to the nature of the indicators or user instructions. The execution and synthesis module uniformly encapsulates the visualization results and the semantically synthesized analysis content to form the final query result output.
[0141] Furthermore, the system 100 also includes an intelligent recommendation module 150, which, without the user explicitly submitting a query request or after completing an existing query, recommends metrics, analysis dimensions, or subsequent analysis paths that the user may be interested in based on user profiles, business scenarios, and metric relationships, thereby guiding the user to conduct more comprehensive data analysis. The intelligent recommendation module 150, through comprehensive analysis of user behavior and business semantics, achieves a shift from passive, reactive queries to proactive, guided analysis.
[0142] In a specific exemplary implementation, the intelligent recommendation module 150 acquires the user's role information, department, permission scope, and historical query records, and constructs a user analysis preference description based on the current session context. Based on this description, the module filters from the enterprise knowledge base a set of candidate business indicators related to the user's business responsibilities or to the indicators in the current query, and uses this set as the basic source of recommendations. For example, when the user is a customer manager, the system 100 may prioritize indicators related to customer size, transaction amount, or ranking as recommendation candidates.
[0143] For example, the intelligent recommendation module 150 can also be used to perform contextualized recommendation processing. When the intelligent recommendation module 150 detects that a user enters a specific business page, completes a metric query, or is at a specific time point, it can automatically trigger recommendation logic to push key metrics or analytical perspectives highly relevant to the current business scenario to the user. The recommended content may include single metric recommendations, metric combination recommendations, or analytical dimension recommendations to support the user's further understanding of the business situation.
[0144] For example, the intelligent recommendation module 150 can support linked recommendations based on indicator correlations. When a user queries a target indicator, the intelligent recommendation module 150 can recommend drill-down dimensions, related comparison indicators, or historical trend analysis methods for that indicator based on predefined correlations in the indicator system, thereby guiding the user to gradually delve deeper into the analysis without requiring the user to repeatedly manually input query conditions. For example, after a user queries an indicator "open chain amount," it can recommend its drill-down dimensions, such as "by time trend," "by open chain mode," and "core enterprises."
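Linked recommendation could be sketched as a lookup in a predefined relation graph: drill-down dimensions are offered first, then comparison indicators, and analyses the user has already performed are skipped. This is a minimal sketch. The graph structure, the ordering policy, and the `limit` parameter are assumptions. The sample entries reuse names from the example above.

```python
from __future__ import annotations

# Hypothetical relation graph kept alongside the metric system library.
METRIC_RELATIONS = {
    "open_chain_amount": {
        "drill_down": ["by time trend", "by open-chain mode", "by core enterprise"],
        "compare_with": ["number of open-chain enterprises"],
    },
}


def recommend_next(metric_id: str, already_used: set[str], limit: int = 3) -> list[str]:
    """Suggest follow-up analyses, drill-downs first, skipping ones already done."""
    relations = METRIC_RELATIONS.get(metric_id, {})
    ordered = relations.get("drill_down", []) + relations.get("compare_with", [])
    return [item for item in ordered if item not in already_used][:limit]
```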
[0145] Through the above settings, the intelligent recommendation module 150 improves the system's analysis and guidance capabilities and user experience while ensuring that the recommendation logic is controllable.
[0146] The self-learning module 160 continuously collects, analyzes, and evolves the user questioning behavior and question-and-answer results generated during system operation. This allows it to automatically identify high-frequency, common business problems and optimize their processing methods, thereby improving the response efficiency and stability of system 100 in repetitive query scenarios. By accumulating and utilizing historical question-and-answer data, the self-learning module 160 enables system 100 to continuously self-optimize, reducing its reliance on repetitive calculations and reasoning.
[0147] For example, the self-learning module 160 records the natural language questions that users submit at different times and in different business scenarios, together with the corresponding query results and analysis conclusions output by the system 100. The recorded content may include the question text, the parsed query intent, the target indicator identifier, the query conditions, the result summary, and the user's role information. By storing this data uniformly, the self-learning module 160 builds a question-and-answer sample set covering multiple business scenarios.
[0148] For example, the self-learning module 160 further performs semantic clustering and frequency analysis on the question-and-answer sample set. It calculates the similarity between question texts and between structured query intents. Using these similarities, it groups questions that are worded differently but have the same business meaning into one question category, and counts how often each category occurs within a preset time window. For example, when the frequency of a question category reaches a set threshold, the system 100 can identify it as a candidate high-frequency question.
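The frequency analysis could be sketched as counting question categories within a sliding time window and keeping those at or above a threshold. As a deliberate simplification, this sketch does not use similarity-based semantic clustering. Instead, it groups questions whose parsed structured intents are identical (same metric, query type, and dimension set), so differently worded questions with the same business meaning fall into one category. The window length, threshold, and record layout are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def intent_signature(intent: dict) -> tuple:
    """Differently worded questions that parse to the same intent share a key."""
    return (intent["metric_id"], intent["query_type"],
            tuple(sorted(intent.get("dimensions", []))))


def high_frequency_candidates(log: list[dict], now: datetime,
                              window: timedelta = timedelta(days=7),
                              threshold: int = 3) -> dict[tuple, int]:
    """Count each category inside the window; keep those at or above the threshold."""
    counts = Counter(intent_signature(rec["intent"]) for rec in log
                     if timedelta(0) <= now - rec["ts"] <= window)
    return {sig: n for sig, n in counts.items() if n >= threshold}
```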
[0149] In a preferred embodiment, for identified high-frequency questions, the self-learning module 160 can automatically generate or update corresponding standard question descriptions and their processing strategies. The processing strategies may include pre-calculating query results, caching historical query results, or generating optimal prompt word templates for this type of question to reduce the number of subsequent calls to the large language model and underlying data services. When a user asks a question that highly matches the high-frequency question again, the system can directly apply the corresponding processing strategy and quickly return the result.
[0150] Furthermore, the self-learning module 160 can also update and manage the invalidation of cached results for high-frequency questions by combining user roles, frequency of indicator changes, and data timeliness, in order to avoid returning expired or inconsistent results. Through the above settings, the self-learning module 160 realizes continuous learning and optimization of high-frequency question-and-answer scenarios, enabling the system to gradually form a stable and reusable enterprise data question-and-answer knowledge system during long-term operation.
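Cache management for high-frequency questions could be sketched as a role-scoped cache in which each entry expires after a time-to-live or becomes invalid as soon as the underlying data version changes. A metric that changes often would be given a shorter TTL. This is a minimal sketch. The class name, the data-version counter, and the injectable clock are assumptions made for illustration.

```python
from __future__ import annotations

import time


class HighFrequencyAnswerCache:
    """Role-scoped answer cache with TTL and data-version invalidation (sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (question_key, role) -> (value, expires_at, data_version)

    def put(self, question_key, role, value, ttl_seconds, data_version):
        expires_at = self._clock() + ttl_seconds
        self._entries[(question_key, role)] = (value, expires_at, data_version)

    def get(self, question_key, role, current_data_version):
        entry = self._entries.get((question_key, role))
        if entry is None:
            return None
        value, expires_at, version = entry
        if self._clock() >= expires_at or version != current_data_version:
            del self._entries[(question_key, role)]  # expired or stale: evict
            return None
        return value
```

Keying entries by role keeps one role's cached answer from being served to another role whose permission scope may differ.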
[0151] Ultimately, the system 100 can interact with user 10 through the user interface shown in Figure 5 of the appendix, providing the intelligent questioning and indicator recommendation functions.
[0152] Example 2: This example should be understood as including at least all the features of any of the foregoing examples, and further improving upon them;
[0153] In an exemplary implementation, the prompt word engineering engine 130 uses a dynamic prompt word construction decision process. For each query scenario, this process automatically selects the prompt word layers, determines the injected content and the weights / order, and outputs the final optimized prompt word 360.
[0154] The input to the prompt word engineering engine 130 is:
[0155] $\mathrm{Input} = \{Intent, M, Cond, Role, CtxState, Policy, H\}$;
[0156] The output is the optimized prompt word $P_{final}$, obtained by the ordered concatenation of system-level prompt word 310, context prompt word 320, knowledge prompt word 330, and task prompt word 340.
[0157] Where Intent refers to the structured query intent, M is the candidate set of business metrics, Cond refers to the performance / metric conditions, Role is the user role, CtxState is the session state, Policy is the data source constraint, and H is the historical hit statistics.
[0158] For example, the decision-making process includes the following steps:
[0159] E100: Query type classification, that is, the structured query intent (Intent) is classified, for example into: A - single indicator data retrieval; B - trend / comparison; C - ranking / TopN; D - drill-down / multi-dimensional; E - explanation / attribution. A task template identifier (TID) is then generated.
[0160] E200: Determines the intensity of controlled prompting by assessing the risk level R of the query. For example, R can be set as follows. If the query involves cross-departmental sensitive dimensions, permission boundaries, or unknown indicators / dimensions, R is high. If the indicators and dimensions can all be resolved in the knowledge base and the required permissions are granted, R is medium or low, further subdivided according to the content of the query.
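Steps E100 and E200 could be sketched as a lookup from the query category to a task template identifier, followed by rule-based risk grading. This is a sketch under assumptions. The TID strings are hypothetical. The rule that drill-down and attribution queries (categories D and E) count as medium risk is an illustrative way to "further subdivide" medium and low, not a rule given in the source.

```python
from __future__ import annotations

# Hypothetical mapping from E100 query category to task template identifier.
TEMPLATE_IDS = {"A": "TID_SINGLE", "B": "TID_TREND_COMPARE", "C": "TID_TOPN",
                "D": "TID_DRILL_DOWN", "E": "TID_EXPLAIN"}


def task_template_id(category: str) -> str:
    return TEMPLATE_IDS[category]


def risk_level(metric_id: str, dimensions: list[str], category: str,
               known_metrics: set[str], known_dims: set[str],
               permitted_dims: set[str], sensitive_dims: set[str]) -> str:
    """E200 sketch: grade a parsed query as 'high', 'medium', or 'low' risk."""
    dims = set(dimensions)
    unknown = metric_id not in known_metrics or not dims <= known_dims
    beyond_permission = not dims <= permitted_dims
    sensitive = bool(dims & sensitive_dims)
    if unknown or beyond_permission or sensitive:
        return "high"
    # Assumption: drill-down (D) and attribution (E) queries are treated as medium.
    return "medium" if category in {"D", "E"} else "low"
```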
[0161] E300: Prompt layer selection switch. For example, whether each layer of the prompt engineering engine 130 is enabled is determined from the Intent and the risk level R according to the following rules:
[0162] System-level prompt word 310: Always enabled;
[0163] Context prompt word 320: Enabled if the session state CtxState is not empty and contains contextual content;
[0164] Knowledge prompt word 330: Forcibly enabled if R is not low and there are caliber / dimension constraints. Forcing the knowledge prompt on constrains the reasoning space of the large language model in query scenarios with uncertain semantics or complex constraints, by explicitly injecting indicator definitions and calculation calibers. This prevents caliber drift, dimension overreach, and indicator misjudgment.
[0165] Task prompt word 340: Always enabled.
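The E300 switch rules could be sketched as a function that returns which layers are active. One point is an assumption: the source only says when the knowledge layer is *forced* on. This sketch therefore also enables that layer whenever relevant knowledge is available, so it remains usable in low-risk queries, which still inject knowledge at the lowest granularity.

```python
from __future__ import annotations


def select_layers(risk: str, ctx_state: list[str],
                  has_caliber_constraints: bool, has_knowledge: bool) -> dict[str, bool]:
    """E300 sketch: decide which prompt word layers are enabled."""
    knowledge_forced = risk != "low" and has_caliber_constraints
    return {
        "system": True,                                   # 310: always enabled
        "context": bool(ctx_state),                       # 320: only with session content
        "knowledge": knowledge_forced or has_knowledge,   # 330: forced or optional
        "knowledge_forced": knowledge_forced,
        "task": True,                                     # 340: always enabled
    }
```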
[0166] E400: Decision on determining the granularity of knowledge injection. For example, the expression for the granularity G of knowledge injection is set as: G∈{G1,G2,G3};
[0167] where:
[0168] G1: Inject only the indicator definition + summary of the scope (≤N1 items);
[0169] G2: Injection of metric definition + dimension enumeration + related metrics (≤N2 items);
[0170] G3: Adds example query snippets / disabled items (≤N3 items) based on G2;
[0171] Wherein, N1 represents the maximum number of indicator definitions and scope summary entries allowed to be injected at the lowest knowledge injection granularity G1, typically used in query scenarios where the indicator semantics are relatively clear and the risk is low, to avoid redundant information interfering with model inference; N2 represents the maximum number of indicator definitions, dimension enumerations, and associated indicator entries allowed to be injected at the medium granularity, used in query scenarios with some semantic ambiguity or dimensional constraints; N3 represents the upper limit of the number of complete knowledge entries allowed to be injected at the highest granularity G3, including example query fragments or disabling rules, used in high-risk or high-complexity query scenarios.
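The granularity decision in E400 could be sketched as a choice among G1, G2, and G3, followed by filtering relevance-ranked knowledge entries by kind and truncating to the cap for that level. This is a sketch under assumptions. The caps N1 = 3, N2 = 8, N3 = 15 are hypothetical values. Treating each level as a superset of the one below is also an assumption, and only risk and ambiguity signals are used to pick the level.

```python
from __future__ import annotations

# Hypothetical caps N1 < N2 < N3 on the number of injected knowledge entries.
CAPS = {"G1": 3, "G2": 8, "G3": 15}

# Assumption: each granularity is a superset of the one below it.
ALLOWED_KINDS = {
    "G1": {"definition", "caliber_summary"},
    "G2": {"definition", "caliber_summary", "dimension_enum", "related_metric"},
    "G3": {"definition", "caliber_summary", "dimension_enum", "related_metric",
           "example_query", "disabled_item"},
}


def choose_granularity(risk: str, has_dim_constraints: bool, ambiguous: bool) -> str:
    if risk == "high":
        return "G3"
    if risk == "medium" or has_dim_constraints or ambiguous:
        return "G2"
    return "G1"


def select_knowledge(entries: list[tuple[str, str]], granularity: str) -> list[str]:
    """Keep entries of the allowed kinds, in relevance order, up to the cap."""
    allowed = ALLOWED_KINDS[granularity]
    picked = [text for kind, text in entries if kind in allowed]
    return picked[:CAPS[granularity]]
```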
[0172] E500: Decision on the weights and order of the final output prompt words. For example, the weight vector for the four prompt word layers is set as $W=[w_{sys}, w_{ctx}, w_{kn}, w_{task}]$, corresponding to system-level prompt words 310, context prompt words 320, knowledge prompt words 330, and task prompt words 340, respectively. The order of the prompt words is denoted O. The following exemplary rules can then be set:
[0173] If the risk level R is high, the order O is set to system level → knowledge → task → context, and the value of $w_{kn}$ is increased;
[0174] If the task involves trend / comparison analysis of indicators, $w_{task}$ is increased, and the task prompt explicitly requires structured output, such as tables, charts, or conclusions;
[0175] If the user asks multiple follow-up questions within the conversation, $w_{ctx}$ is increased, and the most recent valid conclusion summary is injected into context prompt word 320.
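The E500 rules could be sketched as a planning function that returns normalized layer weights, the order O, and any extra text for each layer. This is a sketch under assumptions. The source does not say how weights act on a text prompt, so here they are simply normalized to sum to 1, for example as shares of a token budget. The default order, the boost of 0.5, and the "two or more follow-up turns" threshold are also hypothetical.

```python
from __future__ import annotations

DEFAULT_ORDER = ["system", "context", "knowledge", "task"]  # assumed default O


def plan_layers(risk: str, category: str, follow_up_turns: int,
                last_conclusion: str | None = None, boost: float = 0.5):
    """E500 sketch: return (normalized weights, order O, text additions per layer)."""
    weights = {"system": 1.0, "context": 1.0, "knowledge": 1.0, "task": 1.0}
    order = list(DEFAULT_ORDER)
    additions: dict[str, list[str]] = {"task": [], "context": []}
    if risk == "high":
        order = ["system", "knowledge", "task", "context"]
        weights["knowledge"] += boost
    if category == "B":  # trend / comparison
        weights["task"] += boost
        additions["task"].append("Return structured output: a table or chart plus a short conclusion.")
    if follow_up_turns >= 2:  # multiple follow-up questions in this session
        weights["context"] += boost
        if last_conclusion:
            additions["context"].append(f"Most recent valid conclusion: {last_conclusion}")
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}, order, additions
```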
[0176] E600: Injects constraint prompts by compiling the data source constraint policy (Policy) into a constraint fragment $C_{guard}$. For example, the fragment may require the following. Only the indicators and dimensions given in the knowledge prompts are allowed. The output must include "indicator identifier, time range, dimension, and unit". Generating unauthorized fields / table names is prohibited.
[0177] Finally, $C_{guard}$ is placed at the end of system-level prompt word 310 or knowledge prompt word 330.
[0178] E700: Execute the assembly of the prompt words, assembling the content of each layer in sequence O, that is:
[0179] $P_{final} = \mathrm{Concat}_O\{P_{sys}, P_{ctx}, P_{kn}, P_{task}, C_{guard}\}$;
[0180] Following a predetermined order O, prompt word fragments from different sources and carrying different constraint functions are sequentially spliced together to form the final complete prompt word sent into the large language model; among them, the Concat() operation represents the organization of language according to the grammatical logic of natural language.
[0181] E800: Performs an execution self-test that checks whether $P_{final}$ includes the indicator ID, time range, dimension set, and output format constraints. If any item is missing, the process returns to step E400 or E500 for enhancement.
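Steps E700 and E800 could be sketched as joining the enabled layers in order O, with the constraint fragment appended to the system layer, and then scanning the result for the required items. This is a minimal sketch under assumptions. The source describes Concat as organizing the fragments according to natural-language grammar, whereas this sketch just joins them with blank lines. The `metric_id:`-style markers used by the self-test are hypothetical conventions for how the layers state each required item.

```python
from __future__ import annotations

# Hypothetical markers that the layers use to state each required item.
REQUIRED_MARKERS = {
    "indicator ID": "metric_id:",
    "time range": "time_range:",
    "dimension set": "dimensions:",
    "output format": "output_format:",
}


def assemble_prompt(parts: dict[str, str], order: list[str], guard: str) -> str:
    """E700 sketch: join non-empty layers in order O; C_guard ends the system layer."""
    layers = dict(parts)
    layers["system"] = (layers.get("system", "") + "\n" + guard).strip()
    return "\n\n".join(layers[name] for name in order if layers.get(name))


def self_check(prompt: str) -> list[str]:
    """E800 sketch: list required items missing from P_final (empty means pass)."""
    return [item for item, marker in REQUIRED_MARKERS.items() if marker not in prompt]
```

A non-empty result from `self_check` would send the process back to the granularity or weighting step, as described above.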
[0182] The prompt engineering engine 130 then outputs the final optimized prompt $P_{final}$. In subsequent processing, the system can update the hit statistics for the corresponding TID, Role, and Intent in the historical hit statistics H, based on feedback such as whether the current execution hit the correct business indicator and whether error correction was needed, for use in subsequent weight fine-tuning.
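The feedback update of the historical hit statistics H can be sketched as a counter keyed by (TID, Role, Intent). The particular counters kept and the additive (Beta-prior) smoothing of the hit rate are assumptions for illustration; the text states only that H is updated from execution feedback for later weight fine-tuning, and the class name `HitStats` is hypothetical.

```python
from collections import defaultdict


class HitStats:
    """Historical hit statistics H keyed by (TID, Role, Intent)."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"total": 0, "hit": 0, "corrected": 0})

    def update(self, tid, role, intent, hit_correct_metric, needed_correction):
        """Record one execution's feedback for the given key."""
        c = self.counts[(tid, role, intent)]
        c["total"] += 1
        c["hit"] += int(hit_correct_metric)
        c["corrected"] += int(needed_correction)

    def hit_rate(self, tid, role, intent, prior=0.5, strength=2):
        """Smoothed hit rate; unseen keys fall back to `prior` (assumption)."""
        # .get() on a defaultdict does not insert the missing key.
        c = self.counts.get((tid, role, intent), {"total": 0, "hit": 0})
        return (c["hit"] + prior * strength) / (c["total"] + strength)
```

Smoothing keeps a single early miss from collapsing the hit rate of a rarely used (TID, Role, Intent) combination, which matters if the rate later scales prompt weights.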
[0183] Example 3: This example should be understood as including at least all the features of any of the foregoing examples, and further improving upon them;
[0184] For example, Figure 6 of the accompanying drawings illustrates an implementation of the computer system 500 used in the system 100; the computer system 500 can be applied to the data storage, computation, and result output processes of each working module of the system.
[0185] For example, computer system 500 includes bus 502 or other communication mechanism for transmitting information, and one or more processors 504 coupled to bus 502 for processing information; processor 504 may be, for example, one or more general-purpose microprocessors.
[0186] Computer system 500 also includes main memory 506, such as random access memory (RAM), cache, and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504; main memory 506 can also be used to store temporary variables or other intermediate information during execution of instructions by processor 504; when stored in storage media accessible to processor 504, these instructions render computer system 500 a special-purpose machine customized to perform the operations specified in the instructions;
[0187] The computer system 500 may also include a read-only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504; in addition, a storage device 510, such as a magnetic disk, optical disk, or USB drive (flash drive), is coupled to bus 502 for storing information and instructions.
[0188] Furthermore, the computer system 500 may also include a display 512 coupled to bus 502 for displaying various information, data, media, etc., and an input device 514 that allows users of the computer system 500 to control, manipulate, and/or interact with the computer system 500.
[0189] A preferred method of interacting with the management system may be through a cursor control device 516, such as a computer mouse or a similar control / navigation mechanism;
[0190] Furthermore, the computer system 500 may also include a network device 518 coupled to the bus 502; wherein the network device 518 may include components such as wired network cards, wireless network cards, switching chips, routers, switches, etc.
[0191] Generally speaking, the terms “engine,” “component,” “system,” and “database” as used herein can refer to logic embodied in hardware or firmware, or to a set of software instructions, possibly having entry and exit points, written in a programming language such as Java, C, or C++; software components can be compiled and linked into executable programs, installed in dynamic link libraries, or written in interpreted programming languages such as BASIC, Perl, or Python; it should be understood that software components can be called from other components or from themselves, and/or can be invoked in response to detected events or interrupts;
[0192] Software components configured to execute on a computing device may be provided on computer-readable media, such as optical discs, digital video discs, flash drives, magnetic disks, or any other tangible media, or as digital downloads (and may be originally stored in a compressed or installable format that requires installation, decompression, or decryption prior to execution); such software code may be stored, in part or in whole, on a memory device of the executing computing device; software instructions may be embedded in firmware, such as an EPROM; it should also be understood that hardware components may consist of connected logic units (e.g., gates and flip-flops) and/or programmable units (e.g., programmable gate arrays or processors);
[0193] Computer system 500 may implement the techniques described herein using custom hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the computer system, makes computer system 500 a special-purpose computing device.
[0194] According to one or more embodiments, the techniques described herein are executed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506; such instructions may be read into main memory 506 from another storage medium such as storage device 510; execution of the sequence of instructions contained in main memory 506 causes processor 504 to perform the processing steps described herein; in alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions.
[0195] As used herein, the term "non-transitory medium" and similar terms refer to any medium that stores data and / or instructions that enable a machine to operate in a particular manner; such non-transitory medium may include non-volatile medium and / or volatile medium; non-volatile medium includes, for example, optical discs or magnetic disks, such as storage device 510; volatile medium includes dynamic memory, such as main memory 506.
[0196] Common forms of non-transitory media include, for example, floppy disks, hard disks, solid-state drives, magnetic tapes or any other magnetic data storage media, CD-ROMs, any other optical data storage media, any physical media with a hole pattern, RAM, PROM and EPROM, FLASH-EPROM, NVRAM, any other memory chips or cartridges, and networked versions of the same.
[0197] Non-transitory media are distinct from, but may be used in conjunction with, transmission media; transmission media participate in transferring information between non-transitory media; for example, transmission media include coaxial cables, copper wire, and optical fibers, including the wires that constitute bus 502; transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.
[0198] While the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. That is, the methods, systems, and devices discussed above are examples. Various configurations can omit, substitute, or add various processes or components as appropriate. For example, in alternative configurations, methods can be performed in an order different from that described, and/or various components can be added, omitted, and/or combined. Moreover, features described with respect to certain configurations can be combined in various other configurations, and different aspects and elements of the configurations can be combined in a similar manner. Furthermore, technology evolves, so many of the elements are examples and do not limit the scope of this disclosure or the claims.
[0199] Specific details are provided in the specification to offer a thorough understanding of exemplary configurations, including implementations. However, configurations can be practiced without these specific details; for example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail to avoid obscuring the configuration. This description provides only exemplary configurations and does not limit the scope, applicability, or configuration of the claims. Rather, the foregoing description of the configurations will provide those skilled in the art with an enabling description for implementing the described techniques. Various changes can be made to the function and arrangement of the elements without departing from the spirit or scope of this disclosure.
[0200] In summary, the above detailed description is intended to be illustrative rather than restrictive, and it should be understood that these embodiments are for illustrative purposes only and not for limiting the scope of protection of the invention. After reading the description of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent changes and modifications also fall within the scope defined by the claims of this invention.
Claims
1. A large language model-based intelligent question and index recommendation system, characterized in that the system comprises: an enterprise knowledge base, used for the structured storage and unified management of business metrics, business terms, and frequently asked questions related to data analysis within the enterprise; a query understanding module, used to perform semantic parsing, query intent recognition, and key entity extraction on natural language questions input by a user, and to convert the natural language questions into a structured query intent description; a prompt word engineering engine, used to combine the structured query intent description with the business indicator definitions, business semantics, and contextual information in the enterprise knowledge base to generate optimized prompt words that drive the large language model; an execution and synthesis module, used to execute data queries based on the structured query instructions output by the large language model, and to perform semantic synthesis on and output the query results; an intelligent recommendation module, used to recommend relevant business metrics or analysis paths to users based on user roles, business scenarios, and the relationships between metrics; and a self-learning module, used to collect and analyze user question-and-answer behavior and to learn from and optimize high-frequency questions; wherein one or more of the query understanding module, the prompt word engineering engine, the execution and synthesis module, or the intelligent recommendation module are configured to call a preset large language model on demand to complete semantic understanding, content generation, or result interpretation processing.
2. The system of claim 1, wherein the enterprise knowledge base comprises: an indicator system library, used to store, in a hierarchical structure, the indicator definitions, calculation methods, available dimensions, and relationships of business indicators; a business terminology database, used to store synonyms, near-synonyms, or aliases of business concepts and to establish mapping relationships between them and standard business metrics; and a high-frequency question database, which stores historical high-frequency questions along with their corresponding query intents, related metrics, and answer templates.
3. The system of claim 1, wherein the query understanding module is further configured to: construct a candidate business indicator set for the extracted indicator entities, and comprehensively score and rank the candidate business indicators based on semantic similarity, dimensional compatibility, contextual consistency, and historical usage preference, thereby determining the target business indicator.
4. The system of claim 1, wherein the prompt word engineering engine employs a hierarchical prompt word construction mechanism, and the generated optimized prompt words include at least: a system-level prompt, used to define the role and output specifications of the large language model; a contextual prompt, used to provide conversation history or user background information; a knowledge prompt, used to inject definition information and calculation methods related to the target business metrics; and a task prompt, used to describe the specific task that the large language model needs to perform.
5. The system of claim 1, wherein the execution and synthesis module is configured to: receive the structured query instructions output by the large language model and perform legality and consistency checks on them; after the checks pass, convert the structured query instructions into data access requests executable by the underlying data service and obtain the corresponding original query results; and normalize the original query results and generate semantic explanations based on preset semantic synthesis rules or by calling the large language model again, so as to form a query result output containing the data results and their semantic explanations.
6. The system of claim 1, wherein the self-learning module is configured to: record users' natural language questions and the corresponding query results, and identify high-frequency questions based on semantic similarity and frequency of occurrence; and, for the identified high-frequency questions, generate or update standard question descriptions and corresponding processing strategies, so as to improve the system's response efficiency when similar questions occur in the future.
7. A large language model-based intelligent question and index recommendation method, characterized in that the method is applied to the large language model-based intelligent question and index recommendation system according to any one of claims 1 to 6, and comprises the following steps: S100: receiving a natural language query question input by a user and performing semantic preprocessing on it; S200: based on the large language model, performing query intent recognition and key entity extraction on the preprocessed natural language query question to generate a structured query intent description, wherein the structured query intent description includes at least a target business metric, analysis dimensions, and query conditions; S300: mapping and disambiguating the target business metric against the business indicator system and business terms stored in the enterprise knowledge base, to determine the target business indicator that matches the current query scenario; S400: based on the structured query intent description and the target business indicator, obtaining the relevant business indicator definitions, calculation methods, and business semantic information from the enterprise knowledge base, and constructing optimized prompt words to drive the large language model; S500: inputting the optimized prompt words into the large language model to obtain the structured query instructions generated by the large language model; S600: executing a data query operation based on the structured query instructions, obtaining the corresponding original query results, and normalizing the original query results; S700: performing semantic synthesis on the original query results to generate a final query result containing the data results and their semantic explanations, and outputting it to the user.
8. The method of claim 7, further comprising: analyzing the final query result based on user roles, business scenarios, and the correlations between indicators, and recommending relevant business indicators or analysis paths to the user.
9. The method of claim 7, further comprising: recording user query behavior and the corresponding query results, and learning from and optimizing high-frequency questions based on historical question-and-answer data.