Data processing methods, apparatus, devices, and readable storage media based on large models
By using a data processing method based on a large model to dynamically generate SQL query statements or request parameters, the problem of fixed query paths in existing technologies is solved, and intelligent adaptation to query intent and improved accuracy of data acquisition are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA UNIVERSAL ASSET MANAGEMENT CO LTD
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data research tools have fixed query paths and cannot dynamically adapt to the optimal data acquisition strategy based on the complexity of the query intent, resulting in an inability to intelligently distinguish between simple and complex data requests.
By using a data processing method based on a large model, the system parses the user's natural language question to generate a structured query intent, constructs executable query information using the first large language model, and dynamically selects the optimal data acquisition path by calling the database or query engine through the MCP service.
It enables dynamic adaptation of the optimal data acquisition strategy based on the complexity of the query intent, improving the intelligence, accuracy, and systematic scalability of data queries, and solving the problem of a single and fixed query path.
Figure CN122309540A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and in particular to a data processing method, apparatus, device, and readable storage medium based on a large model.
Background Technology
[0002] Current mainstream data research tools suffer from significant limitations in their technical architecture and interaction methods: their query paths are singular and fixed. Specifically, whether through keyword search or conditional filtering, the underlying data processing logic of existing systems can typically only generate and execute instructions in a predetermined format. This rigid mechanism leads to two major technical problems: First, it cannot intelligently differentiate the complexity of queries: the system can only handle simple data requests and complex analysis requests in the same way. Second, it lacks dynamically adaptable data acquisition strategies: because the underlying system only supports a single query mode, it cannot dynamically select the optimal data acquisition path based on the true intent implied in the user's natural language questions.
[0003] There is currently no effective solution to the shortcomings of existing technologies, such as fixed data query paths and the inability to dynamically adapt the optimal data acquisition strategy according to the complexity of the query intent.
Summary of the Invention
[0004] The purpose of this invention is to provide a data processing method, apparatus, device, and readable storage medium based on a large model, which can solve the defects of the prior art in that the data query path is fixed and the optimal data acquisition strategy cannot be dynamically adapted according to the complexity of the query intent.
[0005] According to one aspect of the present invention, a data processing method based on a large model is provided, the method comprising:
[0006] Receiving and parsing a natural language question input by a user to obtain a structured query intent, wherein the query intent includes a query subject, a query metric, and a query time range; generating a query prompt based on the query intent; constructing executable query information based on the query prompt using a preset first large language model, wherein the executable query information is configured as either an SQL query statement for directly querying a preset database, or request parameters for causing a preset query engine to generate a query request to the database; and determining, based on the executable query information and using a preset MCP service, target data that is generated within the query time range, is associated with the query subject, and characterizes the query metric.
[0007] Optionally, the step of constructing executable query information based on the query prompt using a preset first large language model includes: mapping, based on a predefined field mapping table, the query metric to a field in the database that matches the query metric, wherein the field mapping table establishes a mapping relationship between natural language metrics and fields in the database; determining whether the field mapped to the query metric is a preset atomic field, wherein an atomic field is a field stored in the database whose value does not depend on calculation from other fields in the database; and, if the fields mapped to the query metric are all atomic fields and the target data is obtained directly by querying the field values of those fields, generating the SQL query statement.
[0008] Optionally, the step of constructing executable query information based on the query prompt using a preset first large language model further includes: if the fields mapped to the query metric include a non-atomic field, generating the request parameters; and, if the fields mapped to the query metric are all atomic fields but the target data is obtained by performing a preset data analysis operation on the queried field values, generating the request parameters.
[0009] Optionally, the step of determining target data generated within the query time range, associated with the query subject, and used to characterize the query metric, based on the executable query information using a pre-set MCP service, includes: When the executable query information is the SQL query statement, the SQL query statement is executed by calling the preset database query service through the MCP service, so as to directly query the target data from the database.
[0010] Optionally, the step of determining target data generated within the query time range, associated with the query subject, and used to characterize the query metric, based on the executable query information using a pre-set MCP service, includes: When the executable query information is the request parameter, the query engine is invoked through the MCP service, and the request parameter is passed to the query engine; The query engine generates the query request based on the request parameters, queries the database based on the query request to obtain intermediate data, and performs data analysis operations on the intermediate data in association with the query request to generate the target data.
[0011] Optionally, after determining the target data generated within the query time range, associated with the query subject, and used to characterize the query metric based on the executable query information using a preset MCP service, the method further includes: inputting the target data into a preset second large language model, wherein the second large language model has a built-in interpretation rule base that includes the business meaning and evaluation rules of each field in the database, and the evaluation rules are used to evaluate the business level represented by a field based on its field value; identifying, by the second large language model and based on the interpretation rule base, key fields in the target data that have business meaning and represent risk indicators; and evaluating the business level of the key fields based on their field values and evaluation rules to obtain a risk assessment result for the query subject.
[0012] To achieve the above objectives, the present invention also provides a data processing apparatus based on a large model, the apparatus comprising: a natural language processing module, used to receive and parse the natural language question input by the user to obtain a structured query intent, wherein the query intent includes the query subject, query metric, and query time range; a prompt generation module, used to generate a query prompt based on the query intent; a first large model module, used to construct executable query information based on the query prompt using a preset first large language model, wherein the executable query information is configured as either an SQL query statement for directly querying a preset database, or request parameters for causing a preset query engine to generate a query request to the database; and a data query module, used to determine, based on the executable query information and using a preset MCP service, target data that is generated within the query time range, is associated with the query subject, and characterizes the query metric.
[0013] Optionally, the first large model module is specifically used for: mapping, based on a predefined field mapping table, the query metric to a field in the database that matches the query metric, wherein the field mapping table establishes a mapping relationship between natural language metrics and fields in the database; determining whether the field mapped to the query metric is a preset atomic field, wherein an atomic field is a field stored in the database whose value does not depend on calculation from other fields in the database; and, if the fields mapped to the query metric are all atomic fields and the target data is obtained directly by querying the field values of those fields, generating the SQL query statement.
[0014] To achieve the above objectives, the present invention also provides a computer device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the large model-based data processing method described above.
[0015] To achieve the above objectives, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, is used to implement the steps of the large-model-based data processing method described above.
[0016] The data processing method, apparatus, device, and readable storage medium based on a large model provided by this invention intelligently parse the user's query intent through a first large language model and dynamically generate one of two essentially different kinds of executable query information: an SQL statement that directly queries the database, or request parameters that drive a dedicated query engine. These are executed by calling the corresponding resources via the MCP (Model Context Protocol) service, thereby differentiating query complexity and dynamically selecting the optimal data acquisition path. This overcomes the shortcoming of existing technologies, in which query paths are single and fixed and cannot adapt to complex analysis needs, and improves the intelligence, accuracy, and scalability of data queries.
Description of the Drawings
[0017] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings: Figure 1 is a flowchart of the data processing method based on a large model provided in Embodiment 1; Figure 2 is a block diagram of the data processing apparatus based on a large model provided in Embodiment 2; Figure 3 is a block diagram of a computer device suitable for implementing the data processing method based on a large model, provided in Embodiment 3.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without inventive effort are within the scope of protection of this invention.
Embodiment 1
[0019] Embodiment 1 of the present invention provides a data processing method based on a large model. As shown in Figure 1, the data processing method based on the large model may include steps S1 to S4, wherein: Step S1: Receive and parse the natural language question input by the user to obtain a structured query intent; wherein the query intent includes the query subject, query metric, and query time range.
[0020] The execution system in this embodiment can be built on the Dify platform, whose overall architecture includes a user interaction layer, a data processing layer, and an output layer. Users interact with the system through natural language, and the system processes the data across multiple layers to ultimately return accurate target data. The Dify platform integrates a Natural Language Processing (NLP) module, which serves as an internal processing module within the user interaction layer. This module parses the user's input natural language question into a structured query intent. The query subject refers to the entity object of interest in the natural language question; the query metric refers to the specific data items or analytical dimensions the user wishes to obtain regarding the query subject; and the query time range refers to the explicit or implicit time constraints in the natural language question. The query metric directly determines the information theme contained in the target data, and all subsequent data processing stages of the system aim to satisfy the information needs represented by this metric. For example, if the query metric is the Sharpe ratio, then regardless of the path taken, the final target data obtained must be a value or conclusion related to the risk-adjusted return of the fund.
[0021] For example, a user might input: "What is the Sharpe ratio of the XX Consumer Sector Mixed Fund over the past three years?" The parsed query subject would be "XX Consumer Sector Mixed Fund", the query metric would be the Sharpe ratio, and the query time range would be "the past three years". "The past three years" can be converted to a specific time range, such as January 1, 2023 to December 31, 2025.
[0022] Optionally, receiving and parsing the natural language question input by the user to obtain a structured query intent includes: receiving the natural language question input by the user; determining whether the natural language question is related to a preset domain; and, if so, parsing the natural language question with the pre-built natural language processing module to obtain a structured query intent.
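As a minimal sketch of what such a structured query intent could look like, the snippet below models the three components as a Python dataclass and resolves "the past three years" to complete calendar years. The class name `QueryIntent`, the helper `past_full_years`, and the reference date of March 30, 2026 are illustrative assumptions and are not specified in this description; other conventions for resolving relative time ranges are equally possible.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QueryIntent:
    """Hypothetical container for the three parts of a structured query intent."""
    subject: str   # entity the question is about
    metrics: tuple  # data items requested for the subject
    start: date    # inclusive start of the query time range
    end: date      # inclusive end of the query time range


def past_full_years(n: int, today: date) -> tuple:
    """Resolve "the past n years" to the n complete calendar years before `today`.

    This is one possible convention, assumed here because it reproduces the
    range given in the example above.
    """
    return date(today.year - n, 1, 1), date(today.year - 1, 12, 31)


# Assumed reference date; any date in 2026 yields the same range.
start, end = past_full_years(3, today=date(2026, 3, 30))
intent = QueryIntent(
    subject="XX Consumer Sector Mixed Fund",
    metrics=("Sharpe ratio",),
    start=start,
    end=end,
)
```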
[0023] Specifically, the system can receive user-inputted question text in natural language via a dialog window or voice interface provided by the application. Before initiating semantic parsing, the system first performs preliminary domain screening of the user's question to determine whether it belongs to the system's preset professional domain. This step mainly aims to improve system efficiency, avoid sending irrelevant questions into the subsequent complex parsing and query process, save computing resources, and ensure that questions entering the deep parsing module are within the domain context, reducing ambiguity.
[0024] The judgment logic is usually based on keyword matching. The system maintains a preset keyword library. If no preset keywords are detected in the user's question, the system will directly return a guiding prompt. For example, if the preset domain is fund research, the preset keyword library includes core domain keywords such as fund, net asset value, rate of return, fund manager, subscription, and holdings. The guiding prompt would be: "Your question may not fall within the scope of fund research; please re-enter." As another example, if the preset domain is medical and health consultation, its keyword library may include core domain terms such as symptoms, diagnosis, medication, dosage, department, examination, treatment plan, course of treatment, adverse reactions, and follow-up visits. If the user enters a question recommending a stock, the system will return a guiding prompt: "Your question may not fall within the scope of medical and health consultation; please re-enter."
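The keyword-based screening described above can be sketched in a few lines. The function name `screen_question` is hypothetical, and plain substring matching is an assumption made for brevity; a production system would likely need tokenization to avoid accidental matches inside longer words.

```python
# Keyword library and guiding prompt taken from the fund research example above.
FUND_KEYWORDS = (
    "fund",
    "net asset value",
    "rate of return",
    "fund manager",
    "subscription",
    "holdings",
)
GUIDING_PROMPT = (
    "Your question may not fall within the scope of fund research; please re-enter."
)


def screen_question(question, keywords=FUND_KEYWORDS, guiding_prompt=GUIDING_PROMPT):
    """Return None if the question passes domain screening, else the guiding prompt."""
    text = question.lower()
    if any(keyword in text for keyword in keywords):
        return None  # in-domain: hand the question on to semantic parsing
    return guiding_prompt


in_domain = screen_question(
    "What is the Sharpe ratio of the XX Consumer Sector Mixed Fund over the past three years?"
)
out_of_domain = screen_question("Which stock should I buy?")
```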
[0025] Step S2: Based on the query intent, generate a query prompt.
[0026] To enable the large language model to reliably perform highly structured tasks in specific domains, such as generating database query instructions, an explicit role, task objective, and context can be assigned to it through prompt engineering.
[0027] Prompts can include: role definition, task instructions, context and constraints, examples, etc. The role definition defines the role the large model plays in this task, such as a senior financial data analyst. Task instructions clearly describe the specific tasks the large model needs to complete, such as generating a corresponding data query instruction based on the user's question. Context and constraints include: a list of database fields, query intent, and output format rules. The database field list lists the available field definitions for querying, such as field mapping tables and atomic field definitions. Examples provide one or more examples of similar user questions leading to correct instructions for the model to learn from. Prompts may also include the user's input in natural language.
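The prompt components listed above could be assembled roughly as follows. This is only a sketch: the function `build_query_prompt`, the section labels, the field names, and the table name `fund_daily` in the example are all assumptions introduced for illustration, not a format defined by this description or by the Dify platform.

```python
def build_query_prompt(role, task, fields, intent, output_rules, examples, question):
    """Assemble role, task, context/constraints, examples, and question into one prompt."""
    field_lines = "\n".join(
        f"- {name} ({'atomic' if is_atomic else 'non-atomic'})"
        for name, is_atomic in fields.items()
    )
    intent_lines = "\n".join(f"- {key}: {value}" for key, value in intent.items())
    example_lines = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    sections = [
        f"Role: {role}",
        f"Task: {task}",
        f"Database fields:\n{field_lines}",
        f"Query intent:\n{intent_lines}",
        f"Output format rules: {output_rules}",
        f"Examples:\n{example_lines}",
        f"User question: {question}",
    ]
    return "\n\n".join(sections)


prompt = build_query_prompt(
    role="senior financial data analyst",
    task="Generate a data query instruction for the user's question.",
    # Hypothetical field list; True marks an atomic field.
    fields={"fund_code": True, "unit_nav": True, "sharpe_ratio": False},
    intent={
        "subject": "XX Consumer Sector Mixed Fund",
        "metric": "Sharpe ratio",
        "time range": "2023-01-01 to 2025-12-31",
    },
    output_rules="Return either one SQL statement or one JSON object of request parameters.",
    examples=[
        (
            "What is the latest net asset value per unit of fund X?",
            "SELECT unit_nav FROM fund_daily WHERE fund_name = 'X' "
            "ORDER BY trade_date DESC LIMIT 1",
        )
    ],
    question="What is the Sharpe ratio of the XX Consumer Sector Mixed Fund "
    "over the past three years?",
)
```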
[0028] Step S3: Construct executable query information based on the query prompt using a preset first large language model; wherein the executable query information is configured as either an SQL query statement for directly querying a preset database, or request parameters for causing a preset query engine to generate a query request to the database.
[0029] Specifically, guided by the query prompt, the first large language model understands the context and constraints, analyzes the query intent, makes a path decision, and generates the corresponding instruction. First, the model understands its assigned role, clarifies the scope and meaning of the available data fields, and confirms the format specifications the output must follow. Second, based on the query intent explicitly given in the query prompt, it analyzes the specific nature of the query metric and determines which query path corresponds to the current query intent. When a simple query is required, the query path is to generate an SQL query statement that directly queries the database; when a complex query is required, the query path is to generate request parameters, which the query engine uses to generate a query request to the database.
[0030] The request parameters described in this embodiment are not directly executable query commands, but rather a structured task order or calling specification used to drive a specific query engine. Their core purpose is to translate the user's complex, semantic analytical intent into a standardized set of instructions that the downstream dedicated engine can accurately understand and execute. Optionally, the request parameters may include: engine identifier, operation instructions, and execution parameters. The engine identifier specifies which dedicated engine should handle this query request; the operation instructions specify the specific operation or calculation function that the query engine needs to perform; and the execution parameters contain the input information required to execute the above operation instructions, such as the query subject, query metrics, query time range, and filtering / sorting conditions.
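To make the three parts of the request parameters concrete, the sketch below represents them as a dataclass and serializes them to JSON. The class name, the key names, and the values `fund_analytics` and `compute_sharpe_ratio` are hypothetical; the description does not prescribe a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RequestParameters:
    """Hypothetical shape of the structured request parameters."""
    engine_id: str          # engine identifier: which dedicated engine handles the request
    operation: str          # operation instruction: what the engine should compute
    execution_params: dict  # inputs required to execute the operation


params = RequestParameters(
    engine_id="fund_analytics",        # assumed engine name
    operation="compute_sharpe_ratio",  # assumed operation name
    execution_params={
        "subject": "XX Consumer Sector Mixed Fund",
        "metrics": ["Sharpe ratio"],
        "start": "2023-01-01",
        "end": "2025-12-31",
        "filters": [],
        "sort": None,
    },
)

# Serialized form that could be handed to the MCP service.
payload = json.dumps(asdict(params), ensure_ascii=False)
```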
[0031] Optionally, the step of constructing executable query information based on the query prompt using a preset first large language model includes: mapping, based on a predefined field mapping table, the query metric to a field in the database that matches the query metric, wherein the field mapping table establishes a mapping relationship between natural language metrics and fields in the database; determining whether the field mapped to the query metric is a preset atomic field, wherein an atomic field is a field stored in the database whose value does not depend on calculation from other fields in the database; and, if the fields mapped to the query metric are all atomic fields and the target data is obtained directly by querying the field values of those fields, generating the SQL query statement.
[0032] Specifically, the atomic field definition specifies which fields in the database are atomic. For example, the fund code and the net asset value per unit are atomic fields; the Sharpe ratio, whose value must be calculated from the return series and the risk-free rate, is a non-atomic field.
[0033] The type of query metric also determines whether the target data is an original value stored in the database, a derived value calculated from original values, or a filtered and sorted set. When the query metric is an original value, the target data is usually one or more field values retrieved directly from the database; when the query metric is a derived value, the target data is generated by performing preset data analysis processing on one or more field values in the database; when the query metric implies set operations (such as top ten returns or maximum drawdown), the target data is obtained by filtering, sorting, and comparing the set of field values retrieved from the database for that query metric.
[0034] When the query metric is a single metric, it is mapped to a single field; when the query metric includes multiple metrics, it is mapped to multiple fields. For each field mapped to a query metric, it is necessary to determine whether it is an atomic field. If all fields mapped to the query metric are atomic fields, it indicates that this query is a simple query, and a single SQL query statement can be generated directly.
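For the simple path, the kind of statement the model is expected to produce can be illustrated deterministically. In the sketch below, the table `fund_daily`, the columns `fund_name`, `trade_date`, and `unit_nav`, and the function `build_sql` are assumptions; in the method described here the statement is generated by the first large language model, not by fixed code. Values are passed as bound parameters, and column names are checked against the atomic field list before being interpolated.

```python
# Hypothetical set of atomic fields taken from the atomic field definition.
ATOMIC_FIELDS = {"fund_code", "unit_nav"}


def build_sql(fields, subject, start, end):
    """Build a single parameterized SELECT over atomic fields only."""
    non_atomic = [f for f in fields if f not in ATOMIC_FIELDS]
    if non_atomic:
        raise ValueError(f"not atomic, cannot use the SQL path: {non_atomic}")
    columns = ", ".join(fields)  # safe: every name was checked against the whitelist
    sql = (
        f"SELECT trade_date, {columns} FROM fund_daily "
        "WHERE fund_name = ? AND trade_date BETWEEN ? AND ? "
        "ORDER BY trade_date"
    )
    return sql, (subject, start, end)


sql, args = build_sql(
    ["unit_nav"], "XX Consumer Sector Mixed Fund", "2023-01-01", "2025-12-31"
)
```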
[0035] Optionally, the step of constructing executable query information based on the query prompt using a preset first large language model further includes: if the fields mapped to the query metric include a non-atomic field, generating the request parameters; and, if the fields mapped to the query metric are all atomic fields but the target data is obtained by performing a preset data analysis operation on the queried field values, generating the request parameters.
[0036] Specifically, the first large language model reasons over the query prompt and generates request parameters when either of the following conditions is met. Condition 1: after matching against the field mapping table, the fields mapped to the query metric include at least one non-atomic field. Condition 2: even if all fields mapped to the query metric are atomic fields, the model determines that the target data required to satisfy the query intent cannot be obtained simply by querying and directly returning these field values; instead, preset data analysis operations must be performed on the queried field values. These preset data analysis operations include at least one of the following: calculating one or more field values to generate derived metrics; comparing different field values within the same data record or between different data records; filtering a set of data records according to specified conditions or sorting the set according to specified rules; and performing aggregate statistics on a set of data records. Once either condition is triggered, the first large language model constructs structured request parameters that tell the MCP service which engine to invoke, the task the engine is to perform, and the specific input required to generate the query request.
[0037] This embodiment introduces a field mapping table and predefined rules for atomic fields, enabling the first large language model to simulate the judgment logic of professional analysts, accurately deconstruct and evaluate user query intent, and make intelligent, automated decisions about the query path. It generates efficient database query statements only when the required data consists of directly obtainable basic facts; once it identifies that the request involves derived calculations or complex analysis, it automatically switches to parameterized instructions that call a dedicated engine. This solves the problem of existing technologies being unable to distinguish between simple searches and complex requests due to their single and fixed query paths, achieves dynamic adaptation of the data acquisition strategy, and improves the system's query accuracy, processing efficiency, and domain adaptability.
[0038] Step S4: Based on the executable query information, the preset MCP service determines the target data that is generated within the query time range, is associated with the query subject, and characterizes the query metric.
[0039] The MCP service layer receives executable query information generated by the first large language model and, based on its type, calls the corresponding underlying data service or computing engine to ultimately acquire, parse, and verify the target data. Specifically, the MCP service initiates different execution paths depending on the specific type of the received executable query information.
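The two conditions above amount to a small decision rule, sketched below as plain code. The mapping table contents, the field names, and the `needs_analysis` flag (standing in for the model's judgment under Condition 2) are illustrative assumptions; in the described method this decision is made by the first large language model from the query prompt.

```python
# Hypothetical field mapping table: natural language metric -> database field.
FIELD_MAP = {
    "fund code": "fund_code",
    "net asset value per unit": "unit_nav",
    "sharpe ratio": "sharpe_ratio",
}
# Hypothetical atomic field definition.
ATOMIC_FIELDS = {"fund_code", "unit_nav"}


def choose_path(metrics, needs_analysis):
    """Return "sql" for the simple path or "request_parameters" for the engine path."""
    fields = [FIELD_MAP[metric.lower()] for metric in metrics]
    if any(field not in ATOMIC_FIELDS for field in fields):
        return "request_parameters"  # Condition 1: a non-atomic field is involved
    if needs_analysis:
        return "request_parameters"  # Condition 2: atomic fields, but analysis is required
    return "sql"                     # all atomic and directly returnable


simple = choose_path(["Net asset value per unit"], needs_analysis=False)
derived = choose_path(["Sharpe ratio"], needs_analysis=False)
ranked = choose_path(["Net asset value per unit"], needs_analysis=True)
```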
[0040] Optionally, the step of determining target data generated within the query time range, associated with the query subject, and used to characterize the query metric, based on the executable query information using a pre-set MCP service, includes: When the executable query information is the SQL query statement, the SQL query statement is executed by calling the preset database query service through the MCP service, so as to directly query the target data from the database.
[0041] Specifically, the MCP service receives SQL query statements from upstream sources and routes them to a dedicated database query service based on pre-configured settings. This database query service is a module that encapsulates the logic for connecting and interacting with the underlying database. The database query service receives the SQL query statement, establishes a connection with the database, and executes the query. After processing the query, the database engine returns the raw query result set to the database query service. The parser built into the database query service or the MCP service parses the raw result set, for example, transforming the row and column data returned by the database into a unified, easily processed structured data object within the system. Further checks are performed to ensure data integrity, guaranteeing successful query execution and data completeness. After these verifications, the structured data is identified as the target data for this query.
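A database query service of this kind might be sketched as below, using Python's built-in `sqlite3` module as a stand-in for the preset database. The function `run_sql`, the table `fund_daily`, and the two integrity checks (non-empty result, no missing values) are assumptions chosen for illustration; the description does not specify which checks are performed.

```python
import sqlite3


def run_sql(conn, sql, params=()):
    """Execute a query, convert rows to dicts, and apply basic integrity checks."""
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
    rows = [dict(row) for row in conn.execute(sql, params).fetchall()]
    if not rows:
        raise LookupError("query returned no rows")
    if any(value is None for row in rows for value in row.values()):
        raise ValueError("result set contains missing values")
    return rows  # structured data identified as the target data


# In-memory demo database with a hypothetical table and sample values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fund_daily (fund_name TEXT, trade_date TEXT, unit_nav REAL)")
conn.executemany(
    "INSERT INTO fund_daily VALUES (?, ?, ?)",
    [
        ("XX Consumer Sector Mixed Fund", "2025-01-02", 1.0),
        ("XX Consumer Sector Mixed Fund", "2025-01-03", 1.25),
    ],
)

target_data = run_sql(
    conn,
    "SELECT trade_date, unit_nav FROM fund_daily WHERE fund_name = ? ORDER BY trade_date",
    ("XX Consumer Sector Mixed Fund",),
)
```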
[0042] Optionally, the step of determining target data generated within the query time range, associated with the query subject, and used to characterize the query metric, based on the executable query information using a pre-set MCP service, includes: When the executable query information is the request parameter, the query engine is invoked through the MCP service, and the request parameter is passed to the query engine; The query engine generates the query request based on the request parameters, queries the database based on the query request to obtain intermediate data, and performs data analysis operations on the intermediate data in association with the query request to generate the target data.
[0043] Specifically, the MCP service receives structured request parameters from upstream. First, it parses the parameters, extracts the key routing identifier, and based on this identifier, determines the query engine to be invoked for this request from a pre-configured engine registry. Second, the MCP service establishes a connection with the query engine and passes the complete request parameter package to it. Third, upon receiving the request parameters, the invoked query engine initiates its internal pre-configured specialized processing logic, such as generating a query request, retrieving intermediate data, and performing data analysis operations to generate the target data.
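The routing step can be illustrated with a simple in-process registry. This sketch does not use any MCP SDK; `ENGINE_REGISTRY`, `register_engine`, `dispatch`, and the engine name `fund_analytics` are hypothetical, and a real deployment would route to engines exposed through the MCP service rather than local functions.

```python
# Hypothetical pre-configured engine registry: engine identifier -> callable.
ENGINE_REGISTRY = {}


def register_engine(engine_id):
    """Decorator that records an engine under its identifier."""
    def decorator(func):
        ENGINE_REGISTRY[engine_id] = func
        return func
    return decorator


@register_engine("fund_analytics")
def fund_analytics_engine(operation, execution_params):
    # Placeholder engine: a real one would query the database and run the analysis.
    return {"operation": operation, "subject": execution_params["subject"]}


def dispatch(request):
    """Extract the routing identifier, look up the engine, and pass the parameters on."""
    engine_id = request["engine_id"]
    engine = ENGINE_REGISTRY.get(engine_id)
    if engine is None:
        raise KeyError(f"no engine registered for {engine_id!r}")
    return engine(request["operation"], request["execution_params"])


result = dispatch(
    {
        "engine_id": "fund_analytics",
        "operation": "compute_sharpe_ratio",
        "execution_params": {"subject": "XX Consumer Sector Mixed Fund"},
    }
)
```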
[0044] Generating a query request involves the query engine parsing the operation instructions and execution parameters in the request parameters. Based on its internal business logic, the query engine may first need to retrieve the necessary basic data from the database for calculation or analysis. Then, the query engine dynamically generates one or more query requests to the database based on the query intent. It should be noted that these query requests may be SQL statements or call instructions conforming to a specific data source API specification.
[0045] Obtaining intermediate data involves the query engine executing the query request generated in the previous step through its internal database access module or by calling the general database query interface provided by the MCP service, and retrieving the original or preliminarily processed data set from the database, i.e., intermediate data.
[0046] Performing data analysis operations to generate target data includes: the query engine loading the acquired intermediate data into its core processing module; and executing pre-defined data analysis operations based on the operation instructions defined in the request parameters. These operations encapsulate specialized business logic and algorithms. For example, calculation operations calculate derived indicators such as the P / E ratio and annualized volatility using formulas from the business domain; analysis operations sort, filter, or compare datasets; and model operations run specific risk assessment or prediction models. After completing the data analysis operations associated with the request parameters, the engine obtains the final processing result, which is the target data that satisfies the user's complex query intent.
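The calculation operations mentioned above can be sketched for a series of net asset values treated as intermediate data. The formulas follow one common convention (sample standard deviation, 252 trading periods per year, risk-free rate spread evenly over periods); these choices and the function names are assumptions, since the description does not fix the formulas used by the engine.

```python
import math
from statistics import mean, stdev


def periodic_returns(navs):
    """Simple period-over-period returns from a net asset value series."""
    return [later / earlier - 1.0 for earlier, later in zip(navs, navs[1:])]


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of returns scaled to a yearly horizon."""
    return stdev(returns) * math.sqrt(periods_per_year)


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(navs):
    """Largest peak-to-trough decline, returned as a non-positive fraction."""
    peak = navs[0]
    worst = 0.0
    for nav in navs:
        peak = max(peak, nav)
        worst = min(worst, nav / peak - 1.0)
    return worst


navs = [1.0, 1.2, 0.9, 1.1]  # made-up intermediate data for illustration
returns = periodic_returns(navs)
drawdown = max_drawdown(navs)
volatility = annualized_volatility(returns)
sharpe = sharpe_ratio(returns, risk_free_rate=0.02)
```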
[0047] In this embodiment, when the executable query information is an SQL statement, MCP directly calls the database query service to execute it, completing the simple query with the shortest path, achieving accurate resource matching and optimal response efficiency. When the executable query information is a request parameter, MCP calls a dedicated query engine, which completes the entire process of generating the query request, obtaining intermediate data, and performing data analysis, professionalizing and modularizing complex analysis tasks. The combination of these two approaches allows the system to dynamically select the optimal execution path based on query complexity, ensuring both efficiency for simple retrieval and professional accuracy for complex analysis. This fundamentally solves the problem of traditional architectures having a single, fixed query path that cannot adapt to requests of varying complexity, achieving efficient scheduling of data processing resources and a significant improvement in overall system performance.
[0048] Optionally, after determining the target data generated within the query time range, associated with the query subject, and used to characterize the query metric based on the executable query information using a preset MCP service, the method further includes: inputting the target data into a preset second large language model, wherein the second large language model has a built-in interpretation rule base that includes the business meaning and evaluation rules of each field in the database, and the evaluation rules are used to evaluate the business level represented by a field based on its field value; identifying, by the second large language model and based on the interpretation rule base, key fields in the target data that have business meaning and represent risk indicators; and evaluating the business level of the key fields based on their field values and evaluation rules to obtain a risk assessment result for the query subject.
[0049] Specifically, business meaning is used to characterize the interpretation of a field within a business context. For example, the business meaning of the maximum drawdown field is defined as the maximum decline in a fund's net asset value from its highest to its lowest point, used to measure historical extreme risk. Evaluation rules define objective standards for assessing the business performance of a field based on its numerical value. The core of the evaluation rules is to evaluate the business performance represented by the specific numerical value of the field. For example, for the maximum drawdown field, the evaluation rule might be: an absolute value less than 10% indicates excellent risk control, between 10% and 20% indicates moderate risk control, and greater than 20% indicates high risk.
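The example evaluation rule for the maximum drawdown field can be written as a small function. This is only a sketch: the function name is hypothetical, drawdown is assumed to be expressed as a fraction (for example -0.15 for -15%), and the treatment of the exact 10% and 20% boundaries as "moderate" is an assumption, since the rule above does not say which band they belong to.

```python
def evaluate_max_drawdown(value: float) -> str:
    """Map a maximum drawdown (a fraction, e.g. -0.15 for -15%) to a business level."""
    magnitude = abs(value)  # the rule is stated on the absolute value
    if magnitude < 0.10:
        return "excellent risk control"
    if magnitude <= 0.20:  # boundary handling is an assumption
        return "moderate risk control"
    return "high risk"
```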
[0050] First, the model analyzes the input target data and, based on the interpretation rule base, selects from all fields in the target data those that have business meaning and directly characterize risk indicators. For example, in target data containing fields such as Sharpe ratio, maximum drawdown, and annualized return, the model will identify Sharpe ratio and maximum drawdown as key fields directly related to risk assessment, while annualized return may be regarded as a return indicator rather than a pure risk indicator.
[0051] Second, for each identified key field, the model obtains the specific value of that field in the target data, retrieves the corresponding evaluation rule from the interpretation rule base, and evaluates the value against the standards defined in that rule to arrive at a business level judgment. For example, if the maximum drawdown is -15%, its absolute value is 15%, so under the rule above the model judges the business level to be moderate risk control; for a Sharpe ratio of 1.2, the model may conclude under the corresponding rule that the risk-adjusted return is good.
[0052] Finally, based on the business level assessment results of each key field, the model determines a comprehensive result as the risk assessment result for the query subject. For example, the comprehensive result is determined by weighted averaging of the business level assessment results of each key field.
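The three steps above (identify key fields, evaluate each one, combine the results) can be sketched in plain Python standing in for the rule application that the embodiment assigns to the second large language model. Everything specific in the sketch is an assumption made for illustration: the `FieldRule` structure, the numeric levels (3 = best, 1 = worst), the Sharpe ratio thresholds, and the weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FieldRule:
    """One entry of a hypothetical interpretation rule base."""
    meaning: str
    is_risk_indicator: bool
    score: Optional[Callable[[float], int]] = None  # 3 = best, 1 = worst
    weight: float = 0.0


def _drawdown_score(v: float) -> int:
    m = abs(v)
    return 3 if m < 0.10 else 2 if m <= 0.20 else 1


def _sharpe_score(v: float) -> int:
    # Thresholds are assumed; the text only says 1.2 may be judged "good".
    return 3 if v >= 1.0 else 2 if v >= 0.5 else 1


RULE_BASE: Dict[str, FieldRule] = {
    "max_drawdown": FieldRule("largest peak-to-trough NAV decline", True, _drawdown_score, 0.6),
    "sharpe_ratio": FieldRule("risk-adjusted return", True, _sharpe_score, 0.4),
    "annualized_return": FieldRule("return indicator, not a pure risk indicator", False),
}


def assess_risk(target_data: Dict[str, float]) -> Dict[str, object]:
    """Identify key risk fields, score each one, and combine by weighted average."""
    key_fields = [
        name for name in target_data
        if name in RULE_BASE and RULE_BASE[name].is_risk_indicator
    ]
    if not key_fields:
        raise ValueError("no risk-indicator fields found in target data")
    levels = {name: RULE_BASE[name].score(target_data[name]) for name in key_fields}
    total_weight = sum(RULE_BASE[name].weight for name in key_fields)
    composite = sum(levels[n] * RULE_BASE[n].weight for n in key_fields) / total_weight
    return {"key_fields": key_fields, "levels": levels, "composite": composite}
```

Dividing by the total weight of the fields actually present keeps the composite on the same 1 to 3 scale even when some risk fields are missing from the target data.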
[0053] Optionally, the second large language model can also output an analysis report. The report may cover the performance and meaning of risk-related indicators, the performance and meaning of return-related indicators, and analyses of other dimensions. For example, it can provide a systematic analysis of risk-related indicators such as the Sharpe ratio, maximum drawdown, volatility, and downside standard deviation, not only listing specific values but also explaining their meaning and evaluating their performance level against the interpretation rule base. It can provide detailed interpretations of return-related indicators such as annualized return, historical return curves, peer rankings, and excess returns, including the stability and sources of returns and comparisons with benchmarks. Based on the risk assessment result and the user's risk preference, it can also generate targeted recommendations; for example, risk-averse investors are advised to participate cautiously.
[0054] Optionally, the output content of the analysis report can be switched dynamically. Users can specify their current analysis dimension preference in the conversation, either in natural language or by selecting a preset tag. For example, the natural language input could be "I want to prioritize returns," and the preset tags could be "Return Priority," "Risk Priority," or "Liquidity Priority." The analysis dimension preference is input into the second large language model as a control signal together with the target data. Upon receiving the preference, the model invokes the corresponding analysis framework template and generates the analysis report from that template. For example, if the user gives priority to returns, the model makes return performance and evaluation the main line of the report: it places historical return comparisons, return trend analysis, peer return rankings, and sources of excess returns at the core and allocates more space to interpreting them in depth. Through this mechanism, the system can generate objective and professional analysis reports with different focuses from the same target data, providing personalized service and meeting the needs of users with different risk preferences and investment goals. The analysis report can also be output in graphical form, with charts generated automatically.
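One way to picture the template switch is a lookup from preset tag to an ordered list of report sections, passed to the model together with the target data. The sketch below is illustrative only: the section names, the fallback tag, and the `build_report_request` function are assumptions, and a preference expressed in natural language would first have to be mapped to one of the tags by some other step.

```python
from typing import Any, Dict, List, Optional

# Hypothetical analysis framework templates: the first section is the
# report's main line and would receive the most space.
TEMPLATES: Dict[str, List[str]] = {
    "Return Priority": ["return performance", "risk indicators", "other dimensions"],
    "Risk Priority": ["risk indicators", "return performance", "other dimensions"],
    "Liquidity Priority": ["liquidity", "risk indicators", "return performance"],
}
DEFAULT_TAG = "Risk Priority"  # assumed fallback when no valid tag is given


def build_report_request(
    target_data: Dict[str, Any], preference: Optional[str] = None
) -> Dict[str, Any]:
    """Bundle the target data with the template selected by the preference tag."""
    tag = preference if preference in TEMPLATES else DEFAULT_TAG
    return {
        "preference": tag,                # control signal
        "sections": list(TEMPLATES[tag]), # ordered report outline
        "data": dict(target_data),        # same data regardless of focus
    }
```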
[0055] The large-model-based data processing method provided in this embodiment systematically addresses the key shortcomings of existing tools in data retrieval, interpretation, and adaptation. To address one-sided data retrieval in existing technologies, this embodiment uses natural language interaction and the query generation mechanism to understand user intent accurately and retrieve correlated multi-dimensional data precisely, avoiding the limitation of traditional tools that provide only isolated indicators and lack a holistic view. To address the strong subjectivity of human interpretation in existing technologies, this embodiment builds professional knowledge and evaluation standards into the system through an objective interpretation rule base and large-model prompt technology, ensuring that analytical conclusions are generated according to defined criteria and effectively eliminating the bias that can arise from reliance on human experience and sales guidance. To address the lack of personalized adaptation in existing technologies, the system supports customized interpretation and graphical display for different investment objectives through configurable user preference settings and dynamic output modules, improving the tool's general applicability and user experience. Through full-process automation, this embodiment responds within seconds from natural language question to result output, significantly improving research efficiency and ease of use and lowering the barrier to use.
[0056] Example 2
Embodiment 2 of the present invention provides a large-model-based data processing device. As shown in Figure 2, the large-model-based data processing device 20 specifically includes: a natural language processing module 201, used to receive and parse a natural language question input by the user to obtain a structured query intent, wherein the query intent includes a query subject, query indicators, and a query time range; a prompt word generation module 202, used to generate query prompt words based on the query intent; a first large model module 203, used to construct executable query information based on the query prompt words using a preset first large language model, wherein the executable query information is configured as an SQL query statement for directly querying a preset database, or as request parameters for causing a preset query engine to generate a query request to the database; and a data query module 204, used to determine, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators.
[0057] Optionally, the first large model module is specifically used for: mapping the query indicator, based on a predefined field mapping table, to the field in the database that matches the query indicator, wherein the field mapping table establishes a mapping relationship between natural language indicators and fields in the database; determining whether the field mapped to the query indicator is a preset atomic field, wherein an atomic field is a field stored in the database whose value is not calculated from other fields in the database; and generating the SQL query statement if the fields mapped to the query indicator are all atomic fields and the target data is obtained directly by querying the field values of those fields.
[0058] Optionally, the first large model module is further used for: generating the request parameters if the fields mapped to the query indicator include a non-atomic field; and generating the request parameters if the fields mapped to the query indicator are all atomic fields but the target data is obtained by performing a preset data analysis operation on the field values queried from those fields.
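The branching in the two paragraphs above reduces to a short decision function. The sketch below is hedged throughout: the mapping table entries, the set of atomic fields, and the `needs_analysis` flag (standing in for "a preset data analysis operation is required") are all hypothetical, and in the embodiment this decision is made by the first large language model rather than by hand-written code.

```python
from typing import Dict, Iterable, Set

# Hypothetical field mapping table: natural language indicator -> database field.
FIELD_MAPPING: Dict[str, str] = {
    "net asset value": "nav",
    "fund size": "fund_size",
    "maximum drawdown": "max_drawdown",
    "sharpe ratio": "sharpe_ratio",
}

# Hypothetical atomic fields: stored values not calculated from other fields.
ATOMIC_FIELDS: Set[str] = {"nav", "fund_size"}


def choose_query_form(indicators: Iterable[str], needs_analysis: bool) -> str:
    """Return "sql" or "request_params" for the given query indicators."""
    # An indicator missing from the mapping table raises KeyError here.
    fields = [FIELD_MAPPING[name.lower()] for name in indicators]
    if any(field not in ATOMIC_FIELDS for field in fields):
        # At least one non-atomic field: the query engine must compute it.
        return "request_params"
    if needs_analysis:
        # All atomic, but a data analysis operation is still required.
        return "request_params"
    # All atomic and directly readable: a plain SQL query statement suffices.
    return "sql"
```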
[0059] Optionally, the data query module is specifically used for: when the executable query information is the SQL query statement, calling the preset database query service through the MCP service to execute the SQL query statement, so as to query the target data directly from the database.
[0060] Optionally, the data query module is specifically used for: when the executable query information is the request parameter, invoking the query engine through the MCP service and passing the request parameter to the query engine, wherein the query engine generates the query request based on the request parameter, queries the database based on the query request to obtain intermediate data, and performs the data analysis operation associated with the query request on the intermediate data to generate the target data.
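The three stages attributed to the query engine (generate the query request, obtain intermediate data, analyze it) can be sketched with Python's built-in `sqlite3` module. The table layout (`nav` with columns `fund`, `nav_date`, `nav_value`), the parameter keys, and the choice of maximum drawdown as the analysis operation are assumptions for illustration; the embodiment does not specify the engine's schema or interface.

```python
import sqlite3
from typing import Any, Dict, List


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline of a series, as a non-positive fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


def run_engine(conn: sqlite3.Connection, params: Dict[str, Any]) -> Dict[str, Any]:
    """Turn request parameters into target data in three stages."""
    if params["metric"] != "max_drawdown":
        raise ValueError(f"unsupported metric: {params['metric']}")

    # 1. Generate the query request from the request parameters.
    sql = (
        "SELECT nav_value FROM nav "
        "WHERE fund = ? AND nav_date BETWEEN ? AND ? ORDER BY nav_date"
    )
    args = (params["subject"], params["start"], params["end"])

    # 2. Query the database to obtain the intermediate data (the NAV series).
    series = [row[0] for row in conn.execute(sql, args)]
    if not series:
        raise ValueError("no data for the requested subject and time range")

    # 3. Run the analysis operation on the intermediate data.
    return {
        "subject": params["subject"],
        "metric": "max_drawdown",
        "value": max_drawdown(series),
    }
```

Passing the subject and dates as bound parameters rather than formatting them into the SQL string keeps model-generated values from altering the statement itself.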
[0061] Optionally, the device further includes: an input module, used to input the target data into a preset second large language model after the target data generated within the query time range, associated with the query subject, and used to characterize the query indicator is determined through the preset MCP service based on the executable query information, wherein the second large language model has a built-in interpretation rule base, the interpretation rule base includes the business meaning and evaluation rules of each field in the database, and the evaluation rules are used to evaluate the business level represented by a field based on its field value; and a second large model module, used to identify, using the second large language model and based on the interpretation rule base, key fields in the target data that have business meaning and are used to characterize risk indicators, and to evaluate the business level of each key field based on its field value and its evaluation rules to obtain a risk assessment result for the query subject.
[0062] Example 3
This embodiment also provides a computer device capable of executing programs, such as a smartphone, tablet computer, laptop computer, desktop computer, rack server, blade server, tower server, or cabinet server (including a standalone server or a server cluster composed of multiple servers). As shown in Figure 3, the computer device 30 in this embodiment includes, but is not limited to, a memory 301 and a processor 302 that are communicatively connected to each other via a system bus. It should be noted that Figure 3 shows only a computer device 30 with components 301-302; however, it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
[0063] In this embodiment, the memory 301 (i.e., the readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 301 may be an internal storage unit of the computer device 30, such as the hard disk or memory of the computer device 30. In other embodiments, the memory 301 may also be an external storage device of the computer device 30, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 30. Of course, the memory 301 may include both the internal storage unit and the external storage device of the computer device 30. In this embodiment, the memory 301 is typically used to store the operating system and various application software installed on the computer device 30. In addition, the memory 301 may also be used to temporarily store various types of data that have been output or will be output.
[0064] In some embodiments, processor 302 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. This processor 302 is typically used to control the overall operation of the computer device 30.
[0065] Specifically, in this embodiment, the processor 302 is used to execute the program of the large-model-based data processing method stored in the memory 301. When executed, the program performs the following steps: receiving and parsing a natural language question input by the user to obtain a structured query intent, wherein the query intent includes a query subject, query indicators, and a query time range; generating query prompt words based on the query intent; constructing executable query information based on the query prompt words using a preset first large language model, wherein the executable query information is configured as an SQL query statement for directly querying a preset database, or as request parameters for causing a preset query engine to generate a query request to the database; and determining, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators.
[0066] For a detailed description of the above method steps, refer to Example 1; it is not repeated here.
[0067] Example 4
This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, or application store, which stores a computer program. When the computer program is executed by a processor, it implements the following steps of the large-model-based data processing method: receiving and parsing a natural language question input by the user to obtain a structured query intent, wherein the query intent includes a query subject, query indicators, and a query time range; generating query prompt words based on the query intent; constructing executable query information based on the query prompt words using a preset first large language model, wherein the executable query information is configured as an SQL query statement for directly querying a preset database, or as request parameters for causing a preset query engine to generate a query request to the database; and determining, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators.
[0068] For a detailed description of the above method steps, refer to Example 1; it is not repeated here.
[0069] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0070] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0071] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method.
[0072] The above are merely preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structural or procedural transformations made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.
Claims
1. A large-model-based data processing method, characterized in that the method comprises: receiving and parsing a natural language question input by a user to obtain a structured query intent, wherein the query intent includes a query subject, query indicators, and a query time range; generating query prompt words based on the query intent; constructing executable query information based on the query prompt words using a preset first large language model, wherein the executable query information is configured as an SQL query statement for directly querying a preset database, or as request parameters for causing a preset query engine to generate a query request to the database; and determining, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators.
2. The large-model-based data processing method according to claim 1, characterized in that the step of constructing executable query information based on the query prompt words using a preset first large language model comprises: mapping the query indicator, based on a predefined field mapping table, to the field in the database that matches the query indicator, wherein the field mapping table establishes a mapping relationship between natural language indicators and fields in the database; determining whether the field mapped to the query indicator is a preset atomic field, wherein an atomic field is a field stored in the database whose value is not calculated from other fields in the database; and generating the SQL query statement if the fields mapped to the query indicator are all atomic fields and the target data is obtained directly by querying the field values of those fields.
3. The large-model-based data processing method according to claim 2, characterized in that the step of constructing executable query information based on the query prompt words using a preset first large language model further comprises: generating the request parameters if the fields mapped to the query indicator include a non-atomic field; and generating the request parameters if the fields mapped to the query indicator are all atomic fields but the target data is obtained by performing a preset data analysis operation on the field values queried from those fields.
4. The large-model-based data processing method according to claim 1, characterized in that the step of determining, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators comprises: when the executable query information is the SQL query statement, calling the preset database query service through the MCP service to execute the SQL query statement, so as to query the target data directly from the database.
5. The large-model-based data processing method according to claim 1, characterized in that the step of determining, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators comprises: when the executable query information is the request parameter, invoking the query engine through the MCP service and passing the request parameter to the query engine, wherein the query engine generates the query request based on the request parameter, queries the database based on the query request to obtain intermediate data, and performs the data analysis operation associated with the query request on the intermediate data to generate the target data.
6. The large-model-based data processing method according to claim 1, characterized in that, after determining, based on the executable query information and through a preset MCP service, the target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators, the method further comprises: inputting the target data into a preset second large language model, wherein the second large language model has a built-in interpretation rule base, the interpretation rule base includes the business meaning and evaluation rules of each field in the database, and the evaluation rules are used to evaluate the business level represented by a field based on its field value; identifying, by the second large language model and based on the interpretation rule base, key fields in the target data that have business meaning and are used to characterize risk indicators; and evaluating the business level of each key field based on its field value and its evaluation rules to obtain a risk assessment result for the query subject.
7. A large-model-based data processing apparatus, characterized in that the apparatus comprises: a natural language processing module, used to receive and parse a natural language question input by a user to obtain a structured query intent, wherein the query intent includes a query subject, query indicators, and a query time range; a prompt word generation module, used to generate query prompt words based on the query intent; a first large model module, used to construct executable query information based on the query prompt words using a preset first large language model, wherein the executable query information is configured as an SQL query statement for directly querying a preset database, or as request parameters for causing a preset query engine to generate a query request to the database; and a data query module, used to determine, based on the executable query information and through a preset MCP service, target data that is generated within the query time range, associated with the query subject, and used to characterize the query indicators.
8. The large-model-based data processing apparatus according to claim 7, characterized in that the first large model module is specifically used for: mapping the query indicator, based on a predefined field mapping table, to the field in the database that matches the query indicator, wherein the field mapping table establishes a mapping relationship between natural language indicators and fields in the database; determining whether the field mapped to the query indicator is a preset atomic field, wherein an atomic field is a field stored in the database whose value is not calculated from other fields in the database; and generating the SQL query statement if the fields mapped to the query indicator are all atomic fields and the target data is obtained directly by querying the field values of those fields.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.