Methods, devices, equipment, media, and program products for generating SQL statements.

By identifying the target cognitive group of educational users, and utilizing semantic cognitive frequency network and multi-scale parser to generate time-aware enhanced SQL statements, the problems of accuracy and timeliness of SQL statement generation in the education field are solved, and personalized and dynamic SQL statement generation is realized.

CN122309545A · Pending · Publication Date: 2026-06-30 · CHINA MOBILE CHENGDU INFORMATION & TELECOMM TECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA MOBILE CHENGDU INFORMATION & TELECOMM TECH CO LTD
Filing Date
2026-02-11
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing intelligent natural language to SQL statement conversion methods suffer from misunderstanding and inaccurate generation when faced with complex queries, especially in the education field. They lack dynamic adaptability and multi-level relationship modeling capabilities, the static knowledge base is not timely enough, and modular collaboration mechanisms are missing.

Method used

By identifying the target cognitive group of the querying user, structured semantic content is obtained using the target semantic cognitive frequency network. A matching parser is selected based on complexity score, a relation graph is constructed, and time-aware enhanced SQL statements are generated by dynamically retrieving local knowledge base, thus realizing a semantic understanding-relationship parsing-knowledge enhancement chain architecture.

Benefits of technology

It improves the accuracy of SQL statement generation, solves the problems of insufficient parsing of complex queries and the timeliness of knowledge base in educational scenarios, realizes personalized cognitive adaptation and dynamic knowledge retrieval, and improves the overall processing efficiency and accuracy.



Abstract

This application provides a method, apparatus, device, medium, and program product for generating SQL statements, relating to the field of artificial intelligence technology. This application can accurately extract entity and intent features from natural language, improving the accuracy of target structured semantic content. This application can match accurate parsing strategies for natural language of varying complexity, realizing the construction of precise relational graphs for complex educational queries. This application solves the problem that existing static knowledge base retrieval methods are difficult to adapt to changes in educational query patterns, improving retrieval accuracy and thus improving the accuracy of the final generated time-aware enhanced SQL statements. This application proposes a SQL statement generation method based on a semantic understanding-relational parsing-knowledge enhancement chain architecture, realizing semantic-driven cascading processing and closed-loop optimization.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device, medium and program product for generating SQL statements.

Background Technology

[0002] Natural Language to Structured Query Language (SQL) statement technology is an important research direction in the field of database querying. With the rapid development of Large Language Models (LLMs), LLMs have demonstrated excellent flexibility in handling diverse query needs due to their powerful semantic understanding capabilities. However, because it is difficult for LLMs to achieve full-domain, full-scenario, and highly timely knowledge coverage during the pre-training stage, existing LLMs often exhibit comprehension biases or generate incorrect SQL statements when faced with specialized queries in specific domains (such as education and healthcare).

[0003] In traditional semantic enhancement systems, by building a domain-specific knowledge base and semantic understanding module, the domain concepts most relevant to the user query are first identified based on semantic similarity. Then, the powerful language processing capabilities of the large language model are used to combine this semantic information with the user query, ultimately generating an SQL statement that is both accurate and in line with business logic.

[0004] Existing intelligent natural language to SQL conversion methods often employ a fixed processing flow for semantic understanding. This fixed flow lacks dynamic adaptability, and when the user's input natural language is complex, the understanding of that complex natural language may be flawed. If the semantic understanding is inaccurate, subsequent SQL statements may be generated based on incorrect semantic understanding information.

Summary of the Invention

[0005] This application provides a method, apparatus, device, medium, and program product for generating SQL statements, in order to solve the defects of inaccurate intelligent SQL statement generation in the prior art and improve the accuracy of generated SQL statements.

[0006] Firstly, this application provides a method for generating SQL statements, including: determining the target cognitive group of the querying user; inputting the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group, and obtaining the target structured semantic content output by the target semantic cognitive frequency network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content with the matching parser to obtain a relation graph; and retrieving a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

[0007] In one embodiment, determining the target cognitive group of the querying user includes: obtaining the static attribute features and dynamic behavioral features of the querying user; generating, based on the static attribute features and dynamic behavioral features, the cognitive vector of the querying user and its confidence level; comparing the vector similarity between the cognitive vector of the querying user and the cognitive vector of at least one cognitive group to obtain the cognitive group with the highest vector similarity; when the confidence level is greater than or equal to the confidence threshold, taking the cognitive group with the highest vector similarity as the target cognitive group; and when the confidence level is less than the confidence threshold, taking the general cognitive group as the target cognitive group.
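The group-selection rule of this embodiment can be illustrated with a minimal sketch. The application does not name the similarity measure, the threshold value, or any identifiers, so cosine similarity, the threshold of 0.7, the group names, and the function names below are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_group(user_vector, confidence, group_vectors,
                        confidence_threshold=0.7, general_group="general"):
    """Return the name of the target cognitive group.

    confidence_threshold and general_group are hypothetical values;
    the application only states that such a threshold and group exist.
    """
    # Low confidence: fall back to the general cognitive group.
    if confidence < confidence_threshold:
        return general_group
    # Otherwise take the group whose cognitive vector is most similar.
    return max(group_vectors,
               key=lambda name: cosine_similarity(user_vector, group_vectors[name]))


# Toy cognitive vectors for two hypothetical groups.
groups = {"teacher": [1.0, 0.0], "student": [0.0, 1.0]}
confident_choice = select_target_group([0.9, 0.1], 0.8, groups)
fallback_choice = select_target_group([0.9, 0.1], 0.5, groups)
```

With these toy vectors, a confident user close to the "teacher" vector is routed to that group, while the same vector with low confidence falls back to the general group.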

[0008] In one embodiment, the target structured semantic content includes at least one query entity, and determining the complexity score corresponding to the target structured semantic content includes: calculating the semantic similarity between each query entity and each abstract concept in the abstract concept library; when the semantic similarity between a query entity and an abstract concept is greater than the semantic similarity threshold, identifying that abstract concept as a matching abstract concept; and calculating the complexity score corresponding to the target structured semantic content based on the number of matching abstract concepts, the number of association tables of the matching abstract concepts, and the maximum reasoning depth of the matching abstract concepts.
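The application names the three inputs to the complexity score but does not give the formula. The sketch below assumes a weighted linear combination purely for illustration; the weights, the similarity threshold of 0.8, and the function names are hypothetical.

```python
def match_concepts(similarities, threshold=0.8):
    """Collect abstract concepts whose similarity to any query entity exceeds the threshold.

    similarities maps each query entity to {abstract concept: semantic similarity}.
    The threshold value is a hypothetical choice.
    """
    matched = set()
    for concept_sims in similarities.values():
        for concept, sim in concept_sims.items():
            if sim > threshold:
                matched.add(concept)
    return matched


def complexity_score(num_concepts, num_association_tables, max_reasoning_depth,
                     w_concepts=0.5, w_tables=0.2, w_depth=0.3):
    """Assumed form: a weighted sum of the three quantities named in the text."""
    return (w_concepts * num_concepts
            + w_tables * num_association_tables
            + w_depth * max_reasoning_depth)


# Toy similarities for a single query entity.
sims = {"course pass rate": {"teaching quality": 0.9, "employment prospects": 0.3}}
matched = match_concepts(sims)
# One matched concept, three association tables, reasoning depth two.
score = complexity_score(len(matched), 3, 2)
```

Any monotone combination of the three quantities would serve the same purpose; the linear form is chosen only because it is the simplest to read.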

[0009] In one embodiment, selecting a matching parser based on the complexity score includes: when the complexity score exceeds the first complexity score threshold, determining that the matching parsers include the micro-scale relation parser, the meso-scale relation parser, and the macro-scale relation parser. Parsing the target structured semantic content with the matching parser to obtain a relation graph includes: decomposing the matching abstract concept with the macro-scale relation parser to obtain a decomposition result; mapping the decomposition result to business logic with the meso-scale relation parser to obtain a business logic mapping result; and, with the micro-scale relation parser, mapping the query entities and query intents of the target structured semantic content to table fields to obtain the mapping relationship between the query entities and query intents, thereby obtaining the relation graph.

[0010] In one embodiment, retrieving a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language includes: transforming the knowledge requirements into a structured semantic vector; calculating the knowledge similarity between the structured semantic vector and each semantic vector in the local knowledge base; calculating the time similarity between the time requirements and the temporal information of each semantic vector in the local knowledge base; calculating the demand score of each semantic vector as a weighted sum of its knowledge similarity and time similarity, where the demand score represents how well the semantic vector matches the knowledge requirements and time requirements of the relation graph; when the demand score of a semantic vector is greater than the set score, identifying that semantic vector as a matching semantic vector of the relation graph; and obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector.
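The demand score in this embodiment is a weighted sum of a knowledge similarity and a time similarity, compared against a set score. The sketch below is one possible reading. The use of cosine similarity, the exponential decay used as the time similarity, the weight `alpha`, the threshold, and the entry fields `id`, `vector`, and `date` are all assumptions that the application does not specify.

```python
import math
from datetime import date


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_similarity(required_date, entry_date, scale_days=365.0):
    """Hypothetical time similarity: decays exponentially with the gap in days."""
    gap_days = abs((required_date - entry_date).days)
    return math.exp(-gap_days / scale_days)


def matching_entries(query_vector, required_date, entries,
                     alpha=0.7, score_threshold=0.6):
    """Return (entry id, demand score) pairs whose score exceeds the threshold."""
    matches = []
    for entry in entries:
        knowledge_sim = cosine_similarity(query_vector, entry["vector"])
        time_sim = time_similarity(required_date, entry["date"])
        # Demand score: weighted sum of knowledge and time similarity.
        demand = alpha * knowledge_sim + (1.0 - alpha) * time_sim
        if demand > score_threshold:
            matches.append((entry["id"], demand))
    # Highest demand score first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Toy knowledge base: one fresh, relevant entry and one stale, unrelated entry.
knowledge_base = [
    {"id": "a", "vector": [1.0, 0.0], "date": date(2025, 9, 1)},
    {"id": "b", "vector": [0.0, 1.0], "date": date(2020, 9, 1)},
]
matches = matching_entries([1.0, 0.0], date(2025, 9, 1), knowledge_base)
matched_ids = [entry_id for entry_id, _ in matches]
```

In this toy knowledge base only entry "a" passes, because entry "b" is both semantically unrelated and several years away from the required date.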

[0011] In one embodiment, the target semantic cognitive frequency network is trained as follows: the historical natural language of the target cognitive group is used as the sample natural language, and the historical structured semantic content of the target cognitive group is used as the structured semantic content label, where the structured semantic content label includes an intent label and an entity label; the sample natural language is annotated with the structured semantic content labels to obtain labeled training samples; and the target semantic cognitive frequency network is obtained by training a preset network based on the training samples, the target loss function, and the target reward function. The target loss function is determined based on the intent recognition loss, entity recognition loss, and consistency loss of the preset network; the consistency loss is determined based on the consistency between the predicted intent and the predicted entity output by the preset network; and the target reward function is determined based on the accuracy of the predicted intent, the accuracy of the predicted entity, the consistency between the predicted intent and the predicted entity, and the performance improvement of the preset model.
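The application lists the components of the target loss function and the target reward function but not how they are combined. One possible reading, assuming simple weighted sums, is shown below. The weights $\lambda_i$ and $\mu_i$ and all symbol names are hypothetical and do not come from the application.

```latex
% Assumed weighted-sum form of the target loss: intent recognition loss,
% entity recognition loss, and intent-entity consistency loss.
\mathcal{L}_{\text{target}}
  = \lambda_1 \mathcal{L}_{\text{intent}}
  + \lambda_2 \mathcal{L}_{\text{entity}}
  + \lambda_3 \mathcal{L}_{\text{cons}}

% Assumed weighted-sum form of the target reward: intent accuracy A_intent,
% entity accuracy A_entity, intent-entity consistency C, and performance
% improvement \Delta P.
R = \mu_1 A_{\text{intent}} + \mu_2 A_{\text{entity}} + \mu_3 C + \mu_4 \Delta P
```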

[0012] In one embodiment, after obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector, the method further includes: determining the knowledge coverage of the time-aware enhanced SQL statement from the ratio of the number of entities in the target structured semantic content to the number of matching semantic vectors; calculating the time error of the time-aware enhanced SQL statement based on the time requirements of the relation graph and the timestamp of at least one matching semantic vector; when the knowledge coverage is lower than the coverage threshold, or the time error exceeds the time window threshold, or the complexity score exceeds the second complexity score threshold, searching an external knowledge base to obtain external knowledge base search results; and updating the time-aware enhanced SQL statement based on the external knowledge base search results.
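The fallback condition of this embodiment is a three-way "or" over the knowledge coverage, the time error, and the complexity score. A minimal sketch of that trigger follows. The application gives no values for the three thresholds, so the defaults below (0.8, 180 days, 2.0), the unit of days, and the function name are hypothetical. The coverage and time error are taken as precomputed inputs.

```python
def needs_external_search(coverage, time_error_days, complexity,
                          coverage_threshold=0.8,
                          time_window_days=180,
                          second_complexity_threshold=2.0):
    """Return True if any of the three fallback conditions holds.

    All threshold defaults are hypothetical placeholders.
    """
    return (coverage < coverage_threshold
            or time_error_days > time_window_days
            or complexity > second_complexity_threshold)


# Local retrieval is sufficient: no condition is triggered.
local_ok = needs_external_search(coverage=0.9, time_error_days=30, complexity=1.2)
# Matching knowledge is too old: the time window is exceeded.
stale = needs_external_search(coverage=0.9, time_error_days=400, complexity=1.2)
```

Because the conditions are combined with "or", a single failing check is enough to trigger the external search.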

[0013] Secondly, this application also provides an apparatus for generating SQL statements, comprising: a first semantic understanding module, used to determine the target cognitive group of the querying user; a second semantic understanding module, used to input the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group and obtain the target structured semantic content output by the target semantic cognitive frequency network; a first parsing module, used to determine the complexity score corresponding to the target structured semantic content; a second parsing module, used to select a matching parser based on the complexity score; a third parsing module, used to parse the target structured semantic content with the matching parser to obtain a relation graph; and a retrieval module, used to retrieve a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

[0014] Thirdly, this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement any of the SQL statement generation methods described above.

[0015] Fourthly, this application also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements any of the SQL statement generation methods described above.

[0016] Fifthly, this application also provides a computer program product, including a computer program that, when executed by a processor, implements a method for generating SQL statements as described above.

[0017] This application provides a method, apparatus, device, medium, and program product for generating SQL statements. The process involves: identifying the target cognitive group of the querying user; inputting the querying user's natural language into a target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by the target semantic cognitive frequency network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content using the matching parser to obtain a relationship graph; and retrieving a local knowledge base based on the knowledge and time requirements of the relationship graph to obtain a time-aware enhanced SQL statement corresponding to the natural language. This application matches the target semantic cognitive frequency network to the target cognitive group, enabling accurate feature extraction of entities and intentions from natural language based on the unique cognitive patterns and language habits of the target cognitive group, thus improving the accuracy of the target structured semantic content. Furthermore, this application selects a matching parser based on the complexity score to parse the target structured semantic content, enabling accurate parsing strategies for natural language of varying complexity. This solves the problem of insufficient parsing in traditional SQL statement generation methods when handling complex relationships in educational scenarios, achieving the construction of accurate relationship graphs for complex educational queries. This application dynamically retrieves local knowledge bases based on knowledge and time requirements, addressing the problem that existing static knowledge base retrieval methods struggle to adapt to changing educational query patterns. 
This improves retrieval accuracy and consequently enhances the accuracy of the generated time-aware augmented SQL statements. This application proposes a semantic understanding-relational parsing-knowledge augmentation chain architecture for SQL statement generation, achieving semantic-driven cascading processing and closed-loop optimization. Ultimately, this application realizes end-to-end intelligent agent collaboration in the educational field, from natural language to time-aware augmented SQL.

Description of the Drawings

[0018] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is a flowchart illustrating the SQL statement generation method provided in this application.

[0020] Figure 2 is a flowchart illustrating the semantic understanding of natural language provided in this application.

[0021] Figure 3 is a flowchart illustrating the process of relation parsing of target structured semantic content provided in this application.

[0022] Figure 4 is a flowchart illustrating the process of generating time-aware enhanced SQL statements provided in this application.

[0023] Figure 5 is a schematic diagram of the structure of the SQL statement generation device provided in this application.

[0024] Figure 6 is a schematic diagram of the structure of the electronic device provided in this application.

Detailed Implementation

[0025] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0026] The application areas of this application include generating SQL statements from natural language in the field of education.

[0027] There are three main approaches to converting natural language to SQL statements: (1) conversion based on rule templates; (2) end-to-end generation based on a pure large language model; and (3) a hybrid architecture that combines a semantic understanding module with the code generation capability of a large language model. Each of these three approaches has its own drawbacks.

[0028] Transformation schemes based on rule templates heavily rely on predefined rules, limiting their flexibility and making it difficult to adapt to complex query requirements. On one hand, rule templates are typically built based on limited scenarios, making it difficult to cover all possible query patterns or new business needs. This results in high manual maintenance costs, long update cycles, and difficulty adapting to rapidly changing educational scenarios. On the other hand, the transformation logic of rule templates depends on predefined mapping relationships, making it difficult to dynamically adjust query styles or handle complex semantics. Furthermore, unlike intelligent agents, they cannot understand context through memory, potentially leading to generated SQL statements that deviate from the user's actual needs.

[0029] End-to-end generation solutions based on Large Language Models (LLMs) rely on the model's own knowledge, which carries the risk of comprehension bias and low controllability. First, an LLM may generate SQL statements with syntax or logical errors, producing seemingly reasonable but unexecutable SQL statements; the risk is even higher in complex query scenarios in education. Second, the knowledge of a pre-trained model has poor timeliness: it is current only as of the training data and cannot be updated automatically, so additional fine-tuning or knowledge base supplementation is required.

[0030] In hybrid architecture solutions, semantic understanding modules often employ fixed processing flows, lacking dynamic adaptability. When the natural language input by the user is highly complex, this semantic understanding module may exhibit comprehension biases when processing complex queries. If the semantic understanding is inaccurate, subsequent modules may generate SQL statements based on erroneous information. Secondly, traditional positional encoding methods lack the ability to model the hierarchical relationships specific to educational scenarios.

[0031] This application proposes a natural language to SQL intelligent agent system for the education field based on the Semantic-Parsing-Enhancement Chain (SPEC), aiming to solve the following problems.

[0032] a. Limitations of Cognitive Differentiation in Semantic Understanding: This approach overcomes the technical bottleneck of mutual interference between intent recognition and entity extraction in traditional end-to-end methods, improving the semantic understanding bias caused by differences in user cognitive background when handling complex queries in the education field. Specifically, it addresses the intent-entity inconsistency problem and the lack of personalized cognitive adaptation capabilities inherent in traditional semantic parsing methods in educational scenarios.

[0033] b. Insufficient multi-level relationship modeling capabilities: This approach avoids the limitations of traditional natural language to SQL (Text-to-SQL) methods in handling abstract concepts in educational scenarios, and improves upon existing SQL statement generation methods' shortcomings in capturing complex mapping relationships between abstract concepts and specific database fields. It also addresses the insufficient modeling capabilities of existing single-scale parsing methods in areas such as hierarchical relationships and cross-table joins.

[0034] c. Static knowledge bases are outdated: Existing Text-to-SQL systems mostly use static knowledge base structures, lacking dynamic real-time update mechanisms, making it difficult to adapt to the rapidly changing data needs in the education field. When new policy changes, enrollment updates, or other time-sensitive information emerge, the system cannot automatically acquire and integrate the latest knowledge, leading to a decrease in the accuracy and timeliness of query results.

[0035] d. Lack of modular collaboration mechanism: The existing SQL statement generation methods are relatively independent in each processing step, lacking an effective cross-module collaborative optimization mechanism. They fail to realize a semantic-centric chain architecture to completely connect the natural language to SQL processing process, resulting in the overall processing efficiency and accuracy not reaching the optimal level.

[0036] The method, apparatus, and electronic device for generating SQL statements provided in this application are described below with reference to Figures 1-6.

[0037] Figure 1 is a flowchart illustrating the SQL statement generation method provided in this application. As shown in Figure 1, the method for generating SQL statements includes steps S100 to S600, and the specific steps are as follows.

[0038] S100: Determine the target cognitive group of the querying users.

[0039] The target cognitive group is the cognitive group that matches the querying user. A cognitive group is a group of users with similar cognitive patterns. For example, as shown in Figure 2, the k-means clustering algorithm is applied to a large amount of historical user data to group users with similar cognitive patterns, resulting in multiple pre-constructed cognitive groups.

[0040] The similarity between the cognitive characteristics of the query user and the cognitive characteristics of each cognitive group is compared, and the cognitive group with the highest similarity is selected as the target cognitive group for the query user. Cognitive characteristics include static attribute characteristics and dynamic behavioral characteristics.

[0041] S200: Input the natural language of the query user into the target semantic cognitive frequency network of the target cognitive group, and obtain the target structured semantic content output by the target semantic cognitive frequency network.

[0042] This application constructs a semantic cognitive frequency network for each cognitive group. Training is performed on the historical user data of each cognitive group to obtain the semantic cognitive frequency network corresponding to that group. The semantic cognitive frequency network of each cognitive group is adapted to the cognitive patterns and language habits of that group. This network is used to extract entities and intents from the natural language of that cognitive group, resulting in structured semantic content.

[0043] The target semantic cognitive frequency network is the semantic cognitive frequency network used to process the natural language of the target cognitive group. The semantic cognitive frequency network includes an input layer, an intent recognition layer, an entity extraction layer, a collaborative learning layer, and an output layer. The input layer receives the natural language of the querying user. The intent recognition layer identifies the intent expression habits specific to the querying user's natural language. The entity extraction layer identifies the entities specific to the querying user's natural language. The collaborative learning layer enables the intent recognition and entity extraction tasks to reinforce each other. The output layer outputs the target structured semantic content (including the identified entities and intents).

[0044] The target structured semantic content is the structured semantic content corresponding to the natural language of the query user. The target structured semantic content includes the query entity and query intent in natural language. The query intent is the goal or purpose in natural language. The query entity is a key information fragment in natural language used to clarify or refine the query intent.

[0045] As shown in Figure 2, before the natural language is input into the target semantic cognitive frequency network, it needs to be preprocessed. Through simple word segmentation, part-of-speech tagging, and removal of redundant information, the natural language is uniformly processed into a standardized text format. An embedding model is then used to vectorize it, and a structured query feature vector is extracted based on the standardized text, including query complexity, frequency of use of technical terms, and temporal expression preferences.

[0046] The preprocessed natural language is input into the target semantic cognitive frequency network to obtain the target structured semantic content output by the target semantic cognitive frequency network.

[0047] S300: Determine the complexity score corresponding to the target structured semantic content.

[0048] Complexity scores characterize the complexity of the target structured semantic content. A higher complexity score indicates more complex target structured semantic content and a more macroscopic level of the corresponding matching abstract concept. A lower complexity score indicates simpler target structured semantic content and a more microscopic level of the corresponding matching abstract concept. The semantic similarity between the target structured semantic content and each abstract concept in the abstract concept library is calculated. Abstract concepts with semantic similarity exceeding a similarity threshold are used as matching abstract concepts. The complexity score of the target structured semantic content is then calculated based on the matching abstract concepts.

[0049] The matching abstract concepts are those that match the target structured semantic content. These abstract concepts are pre-constructed conceptual ontologies within the education domain. For example, a library of abstract concepts in the education domain could be constructed. These abstract concepts include "teaching quality," "learning outcomes," "employment prospects," and "professional competence." Each abstract concept defines its corresponding set of quantifiable indicators. For instance, the set of quantifiable indicators for "teaching quality" includes {student academic performance, course evaluation scores, teacher teaching level, course pass rate, school ranking, and graduate employment rate}. "Teacher teaching level" can further correspond to {teacher professional title level, subject grades, task completion rate, and student satisfaction}.
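The concept-to-indicator hierarchy in this example can be pictured as a nested mapping. The sketch below stores the two concepts named above and computes how many decomposition levels lie beneath a concept. Treating that count as the "reasoning depth" is an assumption, as are the dictionary layout and the function names. The sketch also assumes the hierarchy contains no cycles.

```python
# Hypothetical in-memory form of the abstract concept library from the example.
CONCEPT_LIBRARY = {
    "teaching quality": [
        "student academic performance",
        "course evaluation scores",
        "teacher teaching level",
        "course pass rate",
        "school ranking",
        "graduate employment rate",
    ],
    "teacher teaching level": [
        "teacher professional title level",
        "subject grades",
        "task completion rate",
        "student satisfaction",
    ],
}


def decomposition_depth(concept, library):
    """Number of decomposition levels below a concept (0 for a leaf indicator)."""
    children = library.get(concept, [])
    if not children:
        return 0
    return 1 + max(decomposition_depth(child, library) for child in children)


def leaf_indicators(concept, library):
    """Flatten a concept into the indicators that are not decomposed further."""
    children = library.get(concept)
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(leaf_indicators(child, library))
    return leaves


depth = decomposition_depth("teaching quality", CONCEPT_LIBRARY)
leaves = leaf_indicators("teaching quality", CONCEPT_LIBRARY)
```

Here "teaching quality" sits two levels above its deepest indicators, because "teacher teaching level" is itself decomposed once more.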

[0050] S400: Select a matching parser based on the complexity score.

[0051] Parsers include different types of scale relation parsers. Parsers are used to resolve the relationships between query entities and query intents within the target structured semantic content. For example, parsers include micro-scale relation parsers, meso-scale relation parsers, and macro-scale relation parsers.

[0052] The matching parser is selected by comparing the complexity score with the first complexity score threshold and the third complexity score threshold, where the third complexity score threshold is less than the first complexity score threshold. For example, the first complexity score threshold is 1.5 and the third complexity score threshold is 1.0. When the complexity score is greater than or equal to the first complexity score threshold (i.e., complexity score ≥ 1.5), the matching parsers include the macro-scale relation parser, the meso-scale relation parser, and the micro-scale relation parser. When the third complexity score threshold ≤ complexity score < the first complexity score threshold (i.e., 1.0 ≤ complexity score < 1.5), the matching parsers include the meso-scale relation parser and the micro-scale relation parser. When the complexity score is less than the third complexity score threshold (i.e., complexity score < 1.0), the matching parser includes only the micro-scale relation parser.
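The threshold rule can be written as a short function. The sketch uses the example values 1.5 and 1.0 from this paragraph and reads the three bands as: all three parsers at or above 1.5, the meso-scale and micro-scale parsers from 1.0 up to 1.5, and the micro-scale parser alone below 1.0. The constant names, function name, and string labels are hypothetical.

```python
FIRST_COMPLEXITY_THRESHOLD = 1.5  # example value from the text
THIRD_COMPLEXITY_THRESHOLD = 1.0  # example value from the text


def select_parsers(complexity_score):
    """Return the scales of the matching parsers, most abstract first."""
    if complexity_score >= FIRST_COMPLEXITY_THRESHOLD:
        return ["macro", "meso", "micro"]
    if complexity_score >= THIRD_COMPLEXITY_THRESHOLD:
        return ["meso", "micro"]
    return ["micro"]


high = select_parsers(1.7)
medium = select_parsers(1.2)
low = select_parsers(0.4)
```

Under this reading the micro-scale parser is always selected, and each higher band adds the next more abstract parser.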

[0053] S500: Parse the target structured semantic content with the matching parser to obtain a relation graph.

[0054] The matching parser parses the target structured semantic content to obtain a relation graph. The micro-scale relation parser parses the relationships between query entities and query intents within the target structured semantic content. The meso-scale relation parser parses moderately abstract matching concepts. The macro-scale relation parser parses highly abstract matching concepts.

[0055] As shown in Figure 3, when the matching parser includes only the micro-scale relation parser, the micro-scale relation parser parses the relationships between the query intent and the query entities of the target structured semantic content to obtain the relation graph.

[0056] As shown in Figure 3, when the matching parsers include the meso-scale relation parser and the micro-scale relation parser, the matched abstract concept is a medium-level abstract concept. First, the medium-level abstract concept is parsed by the meso-scale relation parser, and the parsing result is passed to the micro-scale relation parser. Based on the parsing result of the medium-level abstract concept, the micro-scale relation parser analyzes the relationships between the query intent and the query entities of the target structured semantic content to obtain the relation graph.

[0057] As shown in Figure 3, when the matching parsers include the macro-scale relation parser, the meso-scale relation parser, and the micro-scale relation parser, the matched abstract concept is a highly abstract concept. First, the macro-scale relation parser decomposes the highly abstract concept into medium-level abstract concepts and concrete entities, and passes them to the meso-scale relation parser. The meso-scale relation parser parses the medium-level abstract concepts and passes the parsing result and the concrete entities to the micro-scale relation parser. Based on the parsing result of the medium-level abstract concepts and the concrete entities, the micro-scale relation parser analyzes the relationships between the query intent and the query entities of the target structured semantic content to obtain the relation graph.
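The three cases share one pattern: the selected parsers run from the most abstract scale to the least, each handing its result to the next. The sketch below shows only that control flow. The three parser functions are hypothetical stand-ins that record their own invocation, and the dictionary-based state is an assumption. The actual parsing logic is not reproduced here.

```python
def macro_parse(state):
    # Hypothetical stand-in: would decompose a highly abstract concept.
    return {**state, "trace": state["trace"] + ["macro"]}


def meso_parse(state):
    # Hypothetical stand-in: would parse medium-level abstract concepts.
    return {**state, "trace": state["trace"] + ["meso"]}


def micro_parse(state):
    # Hypothetical stand-in: would relate query intent and query entities.
    return {**state, "trace": state["trace"] + ["micro"]}


PARSERS = {"macro": macro_parse, "meso": meso_parse, "micro": micro_parse}


def run_cascade(selected_scales, content):
    """Run the selected parsers in macro, meso, micro order, chaining results."""
    state = {**content, "trace": []}
    for scale in ("macro", "meso", "micro"):
        if scale in selected_scales:
            state = PARSERS[scale](state)
    return state


content = {"intent": "compare", "entities": ["teaching quality"]}
full = run_cascade({"macro", "meso", "micro"}, content)
partial = run_cascade({"meso", "micro"}, content)
```

Fixing the iteration order inside `run_cascade` means the caller only has to say which scales are selected, not in what order they run.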

[0058] S600: Based on the knowledge requirements and time requirements of the relation graph, retrieve the local knowledge base to obtain the time-aware enhanced SQL statement corresponding to the natural language.

[0059] The knowledge requirements and time requirements of the relation graph are analyzed to identify the types of knowledge that need to be supplemented and the required time range.

[0060] The local knowledge base includes multiple knowledge entries, each carrying a time vector encoding and validity information. The retrieval strategy is adjusted dynamically according to the knowledge requirements and time requirements, and the local knowledge base is searched with the adjusted strategy to obtain the time-aware enhanced SQL statement corresponding to the natural language.

[0061] The SQL statement generation method provided in this application includes: determining the target cognitive group of the querying user; inputting the querying user's natural language into the target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by that network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content with the matching parser to obtain a relation graph; and retrieving the local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain the time-aware enhanced SQL statement corresponding to the natural language. Because the target semantic cognitive frequency network is matched to the target cognitive group, entity and intent features can be extracted from the natural language according to the cognitive patterns and language habits of that group, which improves the accuracy of the target structured semantic content. Because the matching parser is selected by complexity score, natural language of different complexity is parsed with a suitable strategy. This addresses the insufficient parsing of complex relationships in educational scenarios by traditional SQL statement generation methods and yields accurate relation graphs for complex educational queries. Because the local knowledge base is retrieved dynamically according to the knowledge requirements and time requirements, the method adapts to changing educational query patterns, which static knowledge base retrieval handles poorly. This improves retrieval accuracy and, in turn, the accuracy of the generated time-aware enhanced SQL statements. This application proposes a semantic understanding-relation parsing-knowledge enhancement chain architecture for SQL statement generation, achieving semantics-driven cascaded processing and closed-loop optimization, and realizes end-to-end intelligent agent collaboration in the education field from natural language to time-aware enhanced SQL.

[0062] This application proposes an intelligent agent system, based on the Semantic-Parsing-Enhancement Chain (SPEC), for generating SQL statements from natural language in the education field. Starting from the natural language provided by the user, the system identifies the user's query intent through the following three stages, adapts to the user's personalized needs, and finally generates accurate, executable SQL statements.

[0063] (1) Cognitive semantic understanding stage. The cognitive semantic understander (i.e., the target semantic cognitive frequency network of the target cognitive group) performs semantic understanding on the natural language input by the querying user and converts it into target structured semantic content containing the query intent and query entities.

[0064] (2) Multi-scale relation parsing stage. The matching parser parses the target structured semantic content and fuses the results across scales, and outputs the relation graph of the query intent and query entities of the natural language.

[0065] (3) Dynamic knowledge enhancement stage. The relation graph is enhanced by a knowledge-driven enhancer, and the enhanced semantic representation is finally output. Based on the enhanced semantic representation, time-aware enhanced SQL statements are generated.

[0066] Based on the above embodiments, determining the target cognitive group of the querying user includes the following steps: obtaining the static attribute features and dynamic behavioral features of the querying user; generating the cognitive vector of the querying user and its confidence level based on the static attribute features and dynamic behavioral features; comparing the vector similarity between the cognitive vector of the querying user and the cognitive vector of at least one cognitive group to obtain the cognitive group with the highest vector similarity; when the confidence level is greater than or equal to the confidence threshold, taking the cognitive group with the highest vector similarity as the target cognitive group; and when the confidence level is less than the confidence threshold, taking the general cognitive group as the target cognitive group.

[0067] As shown in Figure 2, the target cognitive group of the querying user is determined. This application adopts an incremental cognitive group identification strategy, which avoids repeating the calculation for every query. When a new querying user (e.g., one with fewer than 10 historical interactions) queries for the first time, the complete cognitive group confirmation process is executed. For an old querying user, a cognitive group update is triggered only in the following situations: (1) more than one month has passed since the last cognitive group update; (2) the querying user's role, professional field, or other basic information has changed. In all other cases, the stored target cognitive group is read directly from the querying user's cognitive profile. The confirmation of the target cognitive group depends on the querying user's static attribute features and dynamic behavioral features.
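As a non-limiting illustration, the incremental identification strategy described above can be sketched as follows. The `CognitiveProfile` container, the function names, and the 30-day reading of "one month" are assumptions made for this sketch and are not specified in this application. The sketch also simplifies the new-user case by treating every user below the interaction threshold as requiring the full confirmation process.

```python
from dataclasses import dataclass
from datetime import date, timedelta

NEW_USER_MAX_INTERACTIONS = 10        # example figure given in the text
UPDATE_INTERVAL = timedelta(days=30)  # assumption: "one month" taken as 30 days


@dataclass
class CognitiveProfile:               # hypothetical container, not from the source
    interaction_count: int
    last_group_update: date
    basic_info_changed: bool          # role, professional field, etc. changed
    stored_group: str


def needs_group_identification(profile: CognitiveProfile, today: date) -> bool:
    """Return True when the cognitive group should be (re)computed."""
    if profile.interaction_count < NEW_USER_MAX_INTERACTIONS:
        return True                    # treated as a new querying user
    if today - profile.last_group_update > UPDATE_INTERVAL:
        return True                    # situation (1): last update is too old
    return profile.basic_info_changed  # situation (2): basic information changed


def resolve_group(profile: CognitiveProfile, today: date, identify) -> str:
    """Reuse the stored group unless an update is triggered.

    `identify` is a caller-supplied function standing in for the complete
    cognitive group confirmation process.
    """
    if needs_group_identification(profile, today):
        return identify(profile)
    return profile.stored_group
```

In this sketch an old querying user whose profile is unchanged and recently updated never reaches `identify`, which is the saving the incremental strategy is aiming for.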

[0068] For both new and old querying users, static attribute features are extracted from the user's profile. The static attribute features include education level, professional field, and role/identity. Education level is quantified using ordinal coding, professional field is coded using numerical values related to the educational informatization profession, and role/identity is coded using permission levels. All features are then normalized.

[0069] Dynamic behavioral features are extracted with different strategies for new and old querying users.

[0070] For old querying users, historical interaction data is sufficient, so dynamic behavioral features can be extracted directly. These features include query complexity preference, terminology usage habit, time expression preference, query frequency, and feedback frequency. Query complexity preference is obtained by normalizing the user's average historical query complexity score. Terminology usage habit is the proportion of technical terms among all words used by the user. Time expression preference is obtained by normalizing how often the user uses time-related terms (e.g., year, month, week, day, hour, minute, second). Query frequency and feedback frequency are obtained by normalizing the average number of daily interactions and the number of likes or corrections, respectively.

[0071] As shown in Figure 2, for new users the historical interaction data is insufficient for extracting dynamic behavioral features directly. A new user entering the system for the first time is mapped to a predefined role group based on the user's static attribute features, and the statistical mean of the dynamic behavioral features within that role group is used as the initial value of the new user's dynamic behavioral features (cold start). These values are updated gradually as interactions accumulate, and the user is then assigned to the most suitable cognitive group. For a new user who has some historical interactions but whose interaction count has not reached the set threshold, Bayesian inference is used to estimate the probability distribution over each cognitive dimension from the static attribute features and the limited historical data. Specifically, the system first takes the statistical distribution of the dynamic behavioral features of the assigned role group as the prior probability. It then combines the prior with the new user's limited historical data in a Bayesian update to compute the posterior probability distribution. Finally, the expected value of the posterior distribution is taken as the value of the dynamic behavioral feature.
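As a non-limiting illustration, the prior-to-posterior step for a single dynamic behavioral feature can be sketched with a Dirichlet-categorical update over discretized feature values. The choice of a Dirichlet prior, the discretization into bins, the `prior_strength` pseudo-count, and all function names are assumptions of this sketch; this application does not specify the concrete Bayesian model.

```python
from typing import List, Sequence


def posterior_distribution(prior_probs: Sequence[float],
                           observed_counts: Sequence[int],
                           prior_strength: float = 5.0) -> List[float]:
    """Dirichlet-categorical update for one dynamic behavioral feature.

    prior_probs:     role-group distribution over the discretized feature values.
    observed_counts: how often the new user's behavior fell into each bin so far.
    prior_strength:  pseudo-count weight given to the prior (assumed value).
    """
    total = prior_strength + sum(observed_counts)
    return [(prior_strength * p + c) / total
            for p, c in zip(prior_probs, observed_counts)]


def expected_value(probs: Sequence[float], bin_values: Sequence[float]) -> float:
    """Expected feature value under the given distribution."""
    return sum(p * v for p, v in zip(probs, bin_values))
```

With no observations the posterior equals the role-group prior, which reproduces the cold-start behavior; as observations accumulate, the user's own history increasingly dominates the estimate.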

[0072] Based on static attribute features and dynamic behavioral features, a cognitive vector of the query user is generated. Optionally, the cognitive vector is ultimately represented as an 8-dimensional vector, C=[c1, c2, ..., c8], where the first 3 dimensions are static attribute features, the last 5 dimensions are dynamic behavioral features, and C is the cognitive vector of the query user.

[0073] The cognitive vector of a new querying user is obtained through inference, while that of an old querying user is based on historical statistics, so their reliability differs. This application quantifies the credibility of a cognitive vector with a confidence level, which serves as an important basis for subsequent personalized adaptation. The confidence level lies in the range [0, 1]; a higher value indicates a more credible cognitive vector. The confidence level of an old querying user is calculated from the sufficiency and timeliness of the user's historical interaction data, using the following formula.

[0074] [Formula not reproduced in the source text.] In the formula, the quantities are, in order: the confidence level; the operation that takes the smaller of two values; the number of historical interactions of the old querying user; and the number of days since the old querying user's cognitive group was last updated.

[0075] The confidence level of new query users is assessed through uncertainty evaluation using Bayesian inference, and the credibility of the inference result is quantified using information entropy. The formula for calculating the confidence level of new query users is as follows.

[0076] [Formula not reproduced in the source text.] In the formula, 5 is the number of dimensions of the dynamic behavioral features, and the remaining quantities are: the confidence level; the i-th dynamic behavioral feature; the probability that the i-th dynamic behavioral feature takes a given value; the number of possible values of the i-th dynamic behavioral feature; and the information entropy of the probability distribution of a dynamic behavioral feature.
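As a non-limiting illustration, one entropy-based confidence measure that is consistent with the quantities listed above is one minus the average normalized entropy over the five dynamic behavioral features. This specific form is an assumption of the sketch, since the original formula is not available in this text; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def new_user_confidence(distributions: Sequence[Sequence[float]]) -> float:
    """Confidence in [0, 1] from the posterior distributions of the features.

    Each distribution is normalized by log(K_i), where K_i is the number of
    possible values of feature i, so that a uniform distribution contributes
    1 (maximal uncertainty) and a one-hot distribution contributes 0.
    """
    normalized = []
    for probs in distributions:
        k = len(probs)
        normalized.append(entropy(probs) / math.log(k) if k > 1 else 0.0)
    return 1.0 - sum(normalized) / len(distributions)
```

Under this assumed form, sharply peaked posteriors give a confidence near 1 and flat posteriors give a confidence near 0, matching the stated intent that information entropy quantifies the credibility of the inference result.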

[0077] The calculated confidence level will be used to assess the accuracy of subsequent cognitive group assignments. The higher the confidence level of the query user, the more accurately the target cognitive group assigned to them will reflect their cognitive characteristics.

[0078] Compare the vector similarity between the query user's cognitive vector and the cognitive vectors of each cognitive group, and obtain the cognitive group with the highest vector similarity. Optionally, compare the distance (e.g., cosine distance) between the query user's cognitive vector and the cognitive vectors of each cognitive group, and select the cognitive group with the closest distance as the cognitive group with the highest vector similarity.

[0079] When the confidence level is greater than or equal to the confidence threshold, the cognitive group with the highest vector similarity is selected as the target cognitive group. For example, with a confidence threshold of 0.3, when the confidence level is greater than or equal to 0.3, the cognitive group with the highest vector similarity (i.e., the closest cognitive group) is selected as the target cognitive group. Optionally, an update threshold can also be set, for example 0.8. When the confidence level exceeds the update threshold, the target semantic cognitive frequency network of the target cognitive group is updated based on the newly added querying user. When the confidence level is less than or equal to the update threshold (for example, between 0.3 and 0.8), the newly added querying user is marked as a boundary user of the target cognitive group, and boundary users do not participate in subsequent updates of the target semantic cognitive frequency network of the target cognitive group.

[0080] When the confidence level is less than the confidence threshold (e.g., 0.3), the general cognitive group is used as the target cognitive group to avoid the risk of mismatch.
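As a non-limiting illustration, the selection of the target cognitive group from the cognitive vector and its confidence level can be sketched as follows. The thresholds 0.3 and 0.8 are the example values given above; the `GENERAL_GROUP` identifier, the function names, and the dictionary of group vectors are assumptions of this sketch.

```python
import math
from typing import Dict, Sequence, Tuple

GENERAL_GROUP = "general"  # hypothetical identifier for the general cognitive group


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_group(user_vector: Sequence[float],
                        confidence: float,
                        group_vectors: Dict[str, Sequence[float]],
                        confidence_threshold: float = 0.3,
                        update_threshold: float = 0.8) -> Tuple[str, bool]:
    """Return (target group, whether the user may feed network updates).

    The second element is False for boundary users and for users that fall
    back to the general cognitive group.
    """
    if confidence < confidence_threshold:
        return GENERAL_GROUP, False
    best = max(group_vectors,
               key=lambda name: cosine_similarity(user_vector, group_vectors[name]))
    return best, confidence > update_threshold
```

For example, a user whose 8-dimensional cognitive vector is closest to one group but whose confidence level is 0.5 is assigned to that group as a boundary user, so the second returned value is False.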

[0081] This application determines the cognitive vector of a query user based on their static attribute characteristics and dynamic behavioral characteristics. This comprehensive approach, taking into full account multiple features of the query user, improves the accuracy and completeness of the cognitive vector, thus enhancing the accuracy of subsequent identification of the target cognitive group. The reliability of the cognitive vector is evaluated using confidence scores, which in turn determines whether to include the cognitive group with the highest vector similarity as the target cognitive group, further improving the reliability of the target cognitive group.

[0082] Based on the above embodiments, the target semantic cognitive frequency network is trained as follows: the historical natural language of the target cognitive group is used as the sample natural language, and the historical structured semantic content of the target cognitive group is used as the structured semantic content label, which includes an intent label and an entity label; the sample natural language is annotated with the structured semantic content labels to obtain labeled training samples; and the target semantic cognitive frequency network is obtained by training a preset network based on the training samples, the target loss function, and the target reward function. The target loss function is determined based on the intent recognition loss, entity recognition loss, and consistency loss of the preset network; the consistency loss is determined based on the consistency between the predicted intent and the predicted entity output by the preset network; and the target reward function is determined based on the accuracy of the predicted intent, the accuracy of the predicted entity, the consistency between the predicted intent and the predicted entity, and the performance improvement of the preset network.

[0083] This application constructs a semantic cognitive frequency network for each cognitive group. Each network is obtained through offline training on the historical user data of its cognitive group, so that its parameters are adapted to the cognitive patterns and language habits of that group. The specific training method for the target semantic cognitive frequency network is as follows.

[0084] The historical natural language of the target cognitive group is used as the sample natural language, and the historical structured semantic content of the target cognitive group is used as the structured semantic content tag. The structured semantic content tag includes intent tags and entity tags.

[0085] The natural language of the samples is labeled based on structured semantic content tags to obtain labeled training samples.

[0086] The pre-built network comprises five core layers: an initial input layer, an initial intent recognition layer, an initial entity extraction layer, an initial co-learning layer, and an initial output layer. The initial input layer receives sample natural language. The initial intent recognition layer identifies unique intent expression habits within the sample natural language. The initial entity extraction layer identifies unique entities within the sample natural language. The initial co-learning layer facilitates mutual reinforcement between intent recognition and entity extraction tasks. The initial output layer outputs predicted structured semantic content (including predicted entities and predicted intents). Both the initial intent recognition and initial entity extraction layers employ a multilayer perceptron architecture, each containing two hidden layers. The output of the initial intent recognition layer is a probability distribution of intent categories, supporting categories such as query, statistical, ranking, and filtering. The initial entity extraction layer identifies key entities such as major names, course names, time expressions, and personnel information. During the pre-training phase of the pre-built network, the initial co-learning layer achieves mutual reinforcement between intent recognition and entity extraction through a target loss function.

[0087] The labeled training samples are input into a pre-defined network to obtain the predicted structured semantic content output by the network. The predicted structured semantic content includes predicted entities and predicted intentions.

[0088] Based on the intent recognition loss, entity recognition loss, and consistency loss of the pre-defined network, a target loss function is constructed. The intent recognition loss is determined based on the error between the predicted intent and the intent label. The entity recognition loss is determined based on the error between the predicted entity and the entity label. The consistency loss is determined based on the distribution difference between the predicted entity and the predicted intent. The formula for calculating the target loss function is as follows.

[0089] [Formula not reproduced in the source text.] In the formula, the quantities are: the target loss function; the intent recognition loss; the entity recognition loss; the consistency loss; the KL divergence; the number of intent labels; the number of training samples; the probability, output by the preset network, that training sample i belongs to intent label j; the one-hot encoding of the intent label of training sample i; the weighting coefficients; the number of predicted entities (or entity labels) in a training sample; the number of entity label categories; the one-hot encoding of the m-th entity label of training sample i; the probability that the m-th predicted entity of training sample i belongs to entity label k; the normalized distribution of the predicted intent probabilities output by the initial intent recognition layer; and the normalized distribution of the predicted entity probabilities output by the initial entity extraction layer.
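As a non-limiting illustration, one form of the target loss function that is consistent with the quantities listed above is a weighted sum of two cross-entropy terms and a KL-divergence term. The symbols and the exact normalization below are assumptions introduced for this sketch, since the original formula is not available in this text: $N$ is the number of training samples, $J$ the number of intent labels, $M$ the number of predicted entities per sample, $K$ the number of entity label categories, $y$ the one-hot labels, $p$ the predicted probabilities, and $\lambda_1, \lambda_2, \lambda_3$ the weighting coefficients.

```latex
\begin{aligned}
L &= \lambda_1 L_{\mathrm{intent}} + \lambda_2 L_{\mathrm{entity}} + \lambda_3 L_{\mathrm{cons}}, \\
L_{\mathrm{intent}} &= -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{J} y_{ij} \log p_{ij}, \\
L_{\mathrm{entity}} &= -\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{K} y_{imk} \log p_{imk}, \\
L_{\mathrm{cons}} &= \mathrm{KL}\!\left(P_{\mathrm{intent}} \,\middle\|\, P_{\mathrm{entity}}\right).
\end{aligned}
```

Here $P_{\mathrm{intent}}$ and $P_{\mathrm{entity}}$ denote the normalized distributions output by the initial intent recognition layer and the initial entity extraction layer. The KL term is only well defined when both distributions are expressed over a common support, which this sketch assumes the normalization provides.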

[0090] During the training of the preset network, Generalized Reward Policy Optimization (GRPO) reinforcement learning is used to optimize the network. The target reward function is a weighted sum of the accuracy of the predicted intent, the accuracy of the predicted entity, the consistency between the predicted intent and the predicted entity, and the performance improvement of the preset network. Specifically, the accuracy of the predicted intent is obtained by comparing the predicted intent with the intent label. The accuracy of the predicted entity is obtained by comparing the predicted entity with the entity label. The consistency between the predicted intent and the predicted entity is determined by how well the predicted structured semantic content matches the sample natural language. The performance improvement of the preset network is determined by the change in its performance before and after training.

[0091] During training, the reward value of each training sample is determined according to the target reward function, and its loss value is determined according to the target loss function. The parameters of the preset network are iteratively updated based on the reward values and loss values to obtain the final target semantic cognitive frequency network. Likewise, the semantic cognitive frequency network of every cognitive group is obtained by training that group's preset network with the group's training samples, target loss function, and target reward function.

[0092] Furthermore, the semantic cognitive frequency network of each cognitive group is incrementally updated at regular intervals with newly added data from that group, and the trained network of each cognitive group is deployed independently. Based on the target cognitive group of the querying user, the corresponding target semantic cognitive frequency network is invoked directly to perform semantic understanding of the natural language and obtain the target structured semantic content. In this way, the natural language of different cognitive groups is converted into structured semantic content containing the query intent and query entities, providing high-quality input for the subsequent relation parsing stage.

[0093] This application uses K-means clustering to divide and match cognitive groups, and pre-trains a semantic cognitive frequency network for each cognitive group. The network integrates the intent recognition and entity extraction tasks through a collaborative learning mechanism and is trained by reinforcement learning with Generalized Reward Policy Optimization (GRPO). This fuses cognitive features (the cognitive vector of the querying user) with query semantics (the meaning of the natural language), bridges the semantic gap between intent recognition and entity extraction in traditional methods, and supports semantic understanding adapted to different cognitive groups. Through the semantic cognitive frequency network, intent recognition and entity extraction reinforce each other. Compared with traditional semantic understanding methods, this improves the accuracy of intent and entity understanding and addresses the fact that traditional SQL statement generation methods ignore the cognitive differences among querying users.

[0094] Based on the above embodiments, the target structured semantic content includes at least one query entity. Determining the complexity score corresponding to the target structured semantic content includes the following steps: calculating the semantic similarity between each query entity and each abstract concept in the abstract concept library; when the semantic similarity of a query entity is greater than the semantic similarity threshold, identifying the query entity as a matched abstract concept; and calculating the complexity score corresponding to the target structured semantic content based on the number of matched abstract concepts, the number of association tables of the matched abstract concepts, and the maximum reasoning depth of the matched abstract concepts.

[0095] The innovation of the relation parsing stage of this application lies in multi-scale relation parsing, which addresses the limitations of traditional SQL statement generation methods in handling abstract concepts and complex relationships in the education field. Based on the query intent and query entities extracted in the previous stage, it automatically identifies matched abstract concepts in the natural language, decomposes them into computable concrete indicators, and establishes cross-level relation mappings, thereby supporting more accurate SQL generation. The process of the relation parsing stage is shown in Figure 3.

[0096] Based on the target structured semantic content output from the cognitive semantic understanding stage, at least one query entity is extracted. The semantic similarity between each query entity and each abstract concept in a pre-built abstract concept library for the education domain is calculated. When the semantic similarity of a query entity is greater than a semantic similarity threshold, the query entity is identified as a matching abstract concept.

[0097] The complexity score of the target structured semantic content is calculated based on the number of matched abstract concepts, the number of association tables for matched abstract concepts, and the maximum inference depth of matched abstract concepts. The formula for calculating the complexity score is as follows.

[0098] $S = \alpha N_c + \beta N_t + \gamma D_{\max}$; where $S$ is the complexity score; $\alpha$, $\beta$, and $\gamma$ are weighting coefficients, which can be set, for example, to $\alpha = 0.4$, $\beta = 0.3$, $\gamma = 0.3$; $N_c$ is the number of matched abstract concepts; $N_t$ is the number of association tables of the matched abstract concepts; and $D_{\max}$ is the maximum reasoning depth of the matched abstract concepts.

[0099] The number of association tables of a matched abstract concept is the total number of tables that the abstract concept library associates with that concept. The abstract concept library also includes an ontology tree for each abstract concept. The maximum reasoning depth is determined by the decomposition level of the matched abstract concept in the ontology tree.

[0100] This application calculates the complexity score of the target structured semantic content based on the number of matched abstract concepts, the number of association tables, and the maximum inference depth, thereby quantifying the complexity of the target structured semantic content and providing a reference for subsequent selection of the matching parser.

[0101] Based on the above embodiments, selecting a matching parser based on the complexity score includes the following step: when the complexity score exceeds the first complexity score threshold, determining that the matching parser includes a micro-scale relation parser, a meso-scale relation parser, and a macro-scale relation parser. Parsing the target structured semantic content with the matching parser to obtain a relation graph includes the following steps: decomposing the matched abstract concept with the macro-scale relation parser to obtain a decomposition result; mapping the decomposition result to business logic with the meso-scale relation parser to obtain a business logic mapping result; and mapping the query entities and query intent of the target structured semantic content to table fields with the micro-scale relation parser to obtain the mapping relationships between the query entities and the query intent, thereby obtaining the relation graph.

[0102] When the complexity score exceeds the first complexity score threshold, the matching parser includes the micro-scale, meso-scale, and macro-scale relation parsers. For example, let the first complexity score threshold be 1.5 and the user's natural language input be "Which majors have good job prospects?" The query entities of the target structured semantic content output by the cognitive semantic understanding stage include "major" and "job prospects". "Job prospects" is identified as an abstract concept (the number of matched abstract concepts is 1), which may involve the student, employment quality, and major information tables (the number of association tables is 3). The maximum reasoning depth in the ontology tree is 2 ("job prospects" → "employment rate"). The complexity score of this target structured semantic content is therefore 1×0.4+3×0.3+2×0.3=1.9. Since 1.9>1.5, the matching parser for this target structured semantic content includes the macro-scale, meso-scale, and micro-scale relation parsers.
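As a non-limiting illustration, the complexity score calculation in the worked example above can be sketched as follows. The weights 0.4, 0.3, and 0.3 are the example values given in this application; the function and parameter names are assumptions of this sketch.

```python
def complexity_score(num_concepts: int,
                     num_tables: int,
                     max_depth: int,
                     alpha: float = 0.4,
                     beta: float = 0.3,
                     gamma: float = 0.3) -> float:
    """Weighted sum of the three complexity factors.

    num_concepts: number of matched abstract concepts.
    num_tables:   number of association tables of the matched abstract concepts.
    max_depth:    maximum reasoning depth in the ontology tree.
    """
    return alpha * num_concepts + beta * num_tables + gamma * max_depth


# "Which majors have good job prospects?": 1 concept, 3 tables, depth 2.
score = complexity_score(1, 3, 2)
print(round(score, 2))
```

Rounded to two decimals, the printed value is 1.9, which matches the worked example and exceeds the example first complexity score threshold of 1.5.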

[0103] As shown in Figure 3, the macro-scale relation parser decomposes the matched abstract concept to obtain the decomposition result. The macro-scale relation parser handles relationships between abstract concepts in the education field, and its input is a highly abstract concept (such as "employment prospects" or "professional strength"). It decomposes the highly abstract concept (the matched abstract concept) into quantifiable abstract indicators using the ontology tree, assigns a weight to each abstract indicator based on expert knowledge in the education field, and computes a comprehensive score by weighted summation. Finally, it adds sorting and grouping constraints and outputs a business relationship graph (the decomposition result). For example, the business relationship graph = {concept (abstract indicator), weight, function}. For instance, if the user's natural language query is "top 5 schools in teaching quality", the matched abstract concept is "teaching quality". It is decomposed into the following abstract indicators with weights: {course rating: 0.4, employment rate: 0.3, teacher quality: 0.3}. The weighted sum is used as the comprehensive score, and the schools are sorted by this score to take the top five.
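As a non-limiting illustration, the weighted summation and top-N sorting in the "teaching quality" example can be sketched as follows. The indicator weights are those given above; the function names, the assumption that each indicator has already been normalized to [0, 1], and the school data in the usage note are hypothetical.

```python
from typing import Dict, List

# Decomposition of "teaching quality" with the example weights from the text.
TEACHING_QUALITY_WEIGHTS = {
    "course_rating": 0.4,
    "employment_rate": 0.3,
    "teacher_quality": 0.3,
}


def comprehensive_score(indicators: Dict[str, float],
                        weights: Dict[str, float] = TEACHING_QUALITY_WEIGHTS) -> float:
    """Weighted sum of abstract indicators (assumed normalized to [0, 1])."""
    return sum(weights[name] * indicators[name] for name in weights)


def top_n(schools: Dict[str, Dict[str, float]], n: int = 5) -> List[str]:
    """Sort schools by comprehensive score, highest first, and keep the top n."""
    ranked = sorted(schools,
                    key=lambda s: comprehensive_score(schools[s]),
                    reverse=True)
    return ranked[:n]
```

With hypothetical indicator values of (0.9, 0.8, 0.7) for one school and (0.6, 0.9, 0.9) for another, the comprehensive scores are about 0.81 and 0.78, so the first school ranks higher even though the second leads on two of the three indicators.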

[0104] The meso-scale relation parser maps the decomposition result to business logic to obtain the business logic mapping result. It handles relationships at the level of educational business logic, and its input is the business relationship graph (the decomposition result of the macro-scale relation parser). The meso-scale relation parser performs business rule matching on the decomposition result, maps the abstract indicators to predefined business calculation rules through cross-table joins, and then determines the multi-table join path needed for the calculation (e.g., "average score" requires joining the student table, course selection table, and grade table). Finally, it identifies the filtering and grouping constraints in the business logic and outputs the business logic mapping result. For example, the business logic mapping result = {(entity, table attribute, domain), (table1, table2), join condition}.

[0105] The micro-scale relation parser maps the business logic mapping result and the query entities and query intent of the target structured semantic content to table fields, obtaining the mapping relationships between the query entities and the query intent and thereby the relation graph. The micro-scale relation parser handles direct relationships between query entities and query intents at the database table level and typically does not parse abstract concepts. Its inputs are the business logic mapping result, the concrete query entities, and the concrete query intent. First, schema matching maps the query entities to specific table fields. Then, inter-table relationships are recognized by identifying join paths between tables. Finally, aggregation operations such as COUNT, SUM, and AVG are determined from the query intent type, and the relation graph is output. For example, the relation graph could be: {(entity, table attribute, domain), (table1, table2), join condition}.

[0106] As shown in Figure 3, target structured semantic content with different complexity scores is parsed by different parsers. For a high complexity score (greater than the first complexity score threshold of 1.5), the macro-scale relation parser is activated to decompose highly abstract concepts into medium-level abstract indicators and concrete entities. The decomposition results are then passed to the meso-scale and micro-scale relation parsers for further processing, and the parsing results of the three levels are finally integrated. For a medium complexity score (between the third complexity score threshold of 1.0 and the first complexity score threshold of 1.5), the meso-scale relation parser is activated to process the abstract indicators, converting them into specific business calculation logic, which is then passed to the micro-scale relation parser for field mapping and relation identification. For a low complexity score (less than the third complexity score threshold of 1.0), the micro-scale relation parser directly completes the mapping of query entities to fields and the identification of basic relationships.
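As a non-limiting illustration, the threshold-based selection of the matching parser can be sketched as follows. The thresholds 1.5 and 1.0 are the example values given above; the function name and the treatment of scores exactly equal to a threshold as medium complexity are assumptions of this sketch.

```python
from typing import List


def select_parsers(score: float,
                   high_threshold: float = 1.5,
                   low_threshold: float = 1.0) -> List[str]:
    """Return the parser cascade, ordered from the upper level to the lower level."""
    if score > high_threshold:
        return ["macro", "meso", "micro"]   # high complexity
    if score >= low_threshold:
        return ["meso", "micro"]            # medium complexity (boundaries assumed)
    return ["micro"]                        # low complexity
```

The returned order reflects the cascade in which the output of each upper-level parser serves as input to the next lower-level parser.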

[0107] In this application's hierarchical processing, the output of the upper-level parser serves as the input to the lower-level parser. The system needs to handle the integration of cross-scale relationship results. This involves: first, establishing correspondences between the concepts decomposed at the upper level and the specific fields at the lower level; second, integrating table join paths generated at different levels and removing redundant joins; and third, unifying the multi-level calculation rules. After integration, the system outputs a relationship graph based on the query user's natural language query intent and the query entities, providing structured input for the next stage of knowledge enhancement.
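The second integration step, merging the table join paths from different levels and removing redundant joins, might look like the sketch below. The tuple layout `(table_a, table_b, condition)` and the rule that a join is redundant when it connects the same pair of tables with the same condition are assumptions made for illustration.

```python
def merge_join_paths(*levels):
    """Merge join lists from several parser levels, dropping duplicates.

    Each join is a (table_a, table_b, condition) tuple. Joins that connect
    the same two tables with the same condition are treated as identical,
    regardless of the order in which the tables are listed.
    """
    seen = set()
    merged = []
    for joins in levels:
        for table_a, table_b, condition in joins:
            key = (frozenset((table_a, table_b)), condition)
            if key not in seen:
                seen.add(key)
                merged.append((table_a, table_b, condition))
    return merged
```

The first occurrence of each join is kept, so joins produced by the upper-level parser take precedence in the merged list.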

[0108] This application designs a three-tier parsing architecture (micro, meso, and macro) and dynamically selects the parsing strategy through complexity scoring. The complexity score is calculated from the number of matched abstract concepts, the number of associated tables, and the maximum inference depth, and supports multi-level mapping and progressive parsing from table-level relationships up to educational abstract concepts. Through layered processing and cross-scale result integration, the multi-scale relation parsing mechanism accurately handles hierarchical relationships and abstract concepts in the education field. It addresses the insufficient parsing and limited expressiveness of traditional SQL statement generation methods when handling complex, multi-level relationships in educational scenarios, enables accurate relation graph construction for complex educational queries, and improves the success rate of SQL statement generation in complex scenarios.

[0109] Based on the above embodiments, retrieving the local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain the time-aware enhanced SQL statement corresponding to the natural language includes the following steps: transforming the knowledge requirements into a structured semantic vector; calculating the knowledge similarity between the structured semantic vector and each semantic vector in the local knowledge base; calculating the time similarity between the time requirements and the time information of each semantic vector in the local knowledge base; calculating a demand score for each semantic vector as a weighted sum of the knowledge similarity and the time similarity, where the demand score represents how well the semantic vector matches the knowledge requirements and time requirements of the relation graph; when the demand score of a semantic vector is greater than the set score, identifying that semantic vector as a matching semantic vector of the relation graph; and obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector.

[0110] In the dynamic knowledge enhancement stage, a dynamic time-aware knowledge base (local knowledge base) is constructed to solve the problems of lagging knowledge updates, lack of timely information, and inability to adapt to rapid changes in the education field in traditional natural language to SQL statement systems.

[0111] As shown in Figure 4, upon receiving the multi-scale relation graph output by the multi-scale relation parsing stage, the system first analyzes the knowledge requirements and time requirements, identifying the types and time ranges of knowledge that need to be supplemented. The time information in the relation graph is divided into two categories for processing: absolute times (years and dates), such as "2025" or "the last three years," and periodic times (semesters and quarters), such as "spring semester" or "first half of the year." The analysis of time requirements combines named entity recognition with rule matching. The system first uses a pre-trained time entity recognition model to identify the time expressions in the relation graph, and then uses a time normalization module to convert those time expressions into a standard timestamp format.

[0112] For implicit time requirements (e.g., concepts like "employment prospects" and "development trends" typically require up-to-date data), the system maintains a time-sensitive concept dictionary, assigning time sensitivity scores to keywords such as "latest," "current," and "trend." The time sensitivity score ranges from 0 to 1, representing the degree to which a concept depends on time information; higher timeliness corresponds to a higher time sensitivity score. The time sensitivity score is used to determine the time window for knowledge retrieval. Concepts with high time sensitivity scores prioritize retrieving recent knowledge, while concepts with low time sensitivity scores broaden the retrieval time range. If the query does not involve time information, the latest available knowledge is retrieved by default.
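As a rough illustration of how a time sensitivity score could drive the retrieval time window, the sketch below uses a small dictionary and a linear mapping from score to window width. The keywords come from the paragraph above, but the individual scores, the window bounds, and the linear interpolation are all assumptions; this application does not state the actual mapping.

```python
# Hypothetical time-sensitive concept dictionary; the scores are illustrative.
TIME_SENSITIVITY = {"latest": 1.0, "current": 0.9, "trend": 0.7}

MIN_WINDOW_DAYS = 90       # assumed narrowest retrieval window
MAX_WINDOW_DAYS = 5 * 365  # assumed widest retrieval window


def sensitivity_of(query: str) -> float:
    """Highest sensitivity score among dictionary keywords found in the query."""
    words = query.lower().split()
    return max((TIME_SENSITIVITY.get(w, 0.0) for w in words), default=0.0)


def retrieval_window_days(score: float) -> int:
    """Assumed linear mapping: score 1.0 gives the narrowest window, 0.0 the widest."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("time sensitivity score must be in [0, 1]")
    return round(MAX_WINDOW_DAYS - score * (MAX_WINDOW_DAYS - MIN_WINDOW_DAYS))
```

A higher score therefore restricts retrieval to more recent knowledge, while a lower score broadens the time range, which matches the qualitative behavior described above.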

[0113] After the analysis of knowledge requirements and time requirements is completed, the local knowledge base retrieval mechanism is initiated. This mechanism uses time-aware knowledge graph retrieval to dynamically adjust the knowledge retrieval strategy according to the knowledge requirements and time requirements. Each knowledge entry in the local knowledge base of this application carries time information (a timestamp or a time vector encoding) and validity period information. The matching strategy during retrieval therefore includes: (1) Knowledge matching: the knowledge requirements of the relation graph output by the previous stage are vectorized to obtain a structured semantic vector, and the knowledge similarity between the structured semantic vector and each semantic vector of the local knowledge base is calculated. (2) Time matching: the time similarity between the time requirements and the time vector encoding of each semantic vector of the local knowledge base is calculated. Time matching is divided into absolute time matching and periodic time matching. Absolute time matching is based on the timestamp range: the closer a knowledge entry is to the query time, the higher its weight. Periodic time matching uses periodic encoding and is mainly used to match knowledge tied to semesters and quarters. The time vector encoding uses an improved version of sinusoidal positional encoding. The calculation formula of the time vector encoding is as follows.

[0114] [Formula not reproduced in the source text.] In the formula, the two encoded terms are the time vector encodings, the timestamp is the one carried by a knowledge entry in the local knowledge base, and the index numbers the dimensions of the structured semantic vector; the range of the index is defined in terms of the total number of dimensions of the structured semantic vector.

[0115] Through time vector encoding, the system is able to capture the periodic and continuous characteristics of time information (such as spring semester, autumn semester, etc.).
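For orientation, the sketch below shows the standard (unmodified) sinusoidal positional encoding applied to a scalar timestamp. It is not the improved variant used by this application, whose formula is not available in this text; the function name, the base of 10000, and the interleaved sine and cosine layout follow the common convention and are assumptions here.

```python
import math


def time_encoding(timestamp: float, dims: int, base: float = 10000.0) -> list:
    """Standard sinusoidal encoding of a scalar timestamp into `dims` values.

    Even positions hold sin(t * f_i) and odd positions hold cos(t * f_i),
    where f_i = 1 / base ** (2 * i / dims) decreases with the pair index i.
    """
    if dims <= 0 or dims % 2:
        raise ValueError("dims must be a positive even number")
    vec = []
    for i in range(dims // 2):
        freq = 1.0 / (base ** (2 * i / dims))
        vec.append(math.sin(timestamp * freq))
        vec.append(math.cos(timestamp * freq))
    return vec
```

Because each pair of components is periodic in the timestamp, with a different period per pair, timestamps that sit at the same phase of a recurring cycle produce similar values in the matching components, which is the property that makes this family of encodings suitable for semester-like patterns.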

[0116] The demand score of each semantic vector is calculated as a weighted sum of the knowledge similarity and the time similarity. The formula for calculating the demand score is as follows.

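[0117] [Formula not reproduced in the source text.] In the formula, the demand score is the weighted sum of the knowledge similarity and the time similarity. The knowledge similarity is computed between the structured semantic vector and a semantic vector of the local knowledge base; the time similarity is computed between the time requirement of the structured semantic vector and the time vector encoding of that semantic vector; and the weights are balancing parameters.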

[0118] The demand score in this application actually considers both knowledge similarity and time similarity, with a value range of [0, 1]. Users can set their own score, and only semantic vectors exceeding the set score will be retrieved. When the demand score of a semantic vector is greater than the set score, the semantic vector is identified as a matching semantic vector of the relational graph. Time-aware enhanced SQL statements are obtained based on at least one matching semantic vector.
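The demand score and the set-score filter can be sketched as below. The weighted sum and the "greater than the set score" rule come from the text; the use of cosine similarity rescaled to [0, 1], the single weight `alpha` with complementary weight `1 - alpha` (chosen so the score stays within [0, 1]), and the entry layout are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def to_unit(similarity):
    """Rescale a cosine similarity from [-1, 1] to [0, 1]."""
    return (similarity + 1.0) / 2.0


def demand_score(query_vec, entry_vec, query_time_vec, entry_time_vec, alpha=0.6):
    """Weighted sum of knowledge similarity and time similarity."""
    knowledge = to_unit(cosine(query_vec, entry_vec))
    temporal = to_unit(cosine(query_time_vec, entry_time_vec))
    return alpha * knowledge + (1.0 - alpha) * temporal


def matching_entries(query_vec, query_time_vec, entries, set_score, alpha=0.6):
    """Keep only entries whose demand score is greater than the set score."""
    return [
        e for e in entries
        if demand_score(query_vec, e["vec"], query_time_vec, e["time_vec"], alpha) > set_score
    ]
```

Raising `alpha` favors semantic relevance, while lowering it favors temporal proximity; the set score then controls how selective retrieval is.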

[0119] This application employs an improved sinusoidal positional encoding method to vectorize time factors (timestamps), constructing a dynamic local knowledge base in the form of a time-aware knowledge graph. Through a demand scoring mechanism based on knowledge similarity and time similarity, this application achieves accurate knowledge retrieval, facilitating continuous optimization and evolution of knowledge and addressing the problem that existing static knowledge bases struggle to adapt to changing educational query patterns. This application constructs a dynamic local knowledge base stored in a timeline format using time information (including time vector encoding and timestamps), supporting local personalized knowledge editing. This application improves the matching accuracy of time-sensitive queries, and the periodic design of the time vectors captures the regular characteristics of educational scenarios such as semesters.

[0120] Based on the above embodiments, after obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector, the following steps are also included: The knowledge coverage of time-aware enhanced SQL statements is determined by the ratio of the number of entities in the target structured semantic content to the number of matching semantic vectors. Based on the time requirements of the relational graph and the timestamp of at least one matching semantic vector, calculate the time error of the time-aware enhanced SQL statement; When the knowledge coverage is lower than the coverage threshold, or the time error exceeds the time window threshold, or the complexity score exceeds the second complexity score threshold, search the external knowledge base and obtain the external knowledge base search results. Update time-aware enhanced SQL statements based on external knowledge base search results.

[0121] After the local knowledge base retrieval is completed, the knowledge sufficiency of the time-aware enhanced SQL statement is comprehensively evaluated in terms of knowledge coverage, time error, and complexity score. The knowledge coverage is determined based on the ratio of the number of entities in the target structured semantic content to the number of matching semantic vectors. The time error is calculated based on the time requirements of the relation graph and the time information of at least one matching semantic vector.

[0122] As shown in Figure 4, when the knowledge coverage is lower than the coverage threshold, the time error exceeds the time window threshold, or the complexity score exceeds the second complexity score threshold, the time-aware enhanced SQL statement is determined to be a mismatch. This triggers the external knowledge acquisition process and starts the intelligent agent's internet search plugin. The search plugin employs a search strategy based on a large language model, which interprets the relation graph and generates targeted search keywords. The external knowledge base is then searched with these keywords to obtain the external knowledge base search results.
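The trigger condition for external knowledge acquisition is a disjunction of three checks, which the following sketch makes explicit. The `Thresholds` type, its field names, and the choice of days as the unit of time error are hypothetical; no threshold values are given in the text, so none are assumed as defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    coverage: float           # minimum acceptable knowledge coverage
    time_window_days: float   # maximum acceptable time error (unit assumed)
    second_complexity: float  # second complexity score threshold


def needs_external_search(coverage, time_error_days, complexity_score, t):
    """True when any one of the three conditions indicates a mismatch."""
    return (
        coverage < t.coverage
        or time_error_days > t.time_window_days
        or complexity_score > t.second_complexity
    )
```

Because the conditions are joined with "or", a single failing dimension is enough to start the search plugin, even when the other two are satisfactory.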

[0123] Before merging the time-aware enhanced SQL statements retrieved from the external knowledge base with those retrieved from the local knowledge base, a consistency check is required. If no conflict is detected, the external knowledge base search results are processed as a union. If a conflict is detected, a multi-dimensional credibility assessment mechanism is employed, combining source authority (30% weight, categorized as official data, academic materials, and ordinary web pages), time range difference (40% weight, examining the difference between the retrieved knowledge and the time involved in the query question; the smaller the difference, the higher the score), and multi-source consistency (30% weight, represented by the number of data sources retrieving similar content; the more sources, the higher the score). A weighted sum is then applied to the external knowledge base search results to obtain a dynamic score, retaining the external knowledge base search results with higher scores. Finally, the time-aware enhanced SQL statements retrieved from the external and local knowledge bases are merged to obtain the updated time-aware enhanced semantic representation.
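The multi-dimensional credibility assessment can be sketched as a weighted sum using the 30%, 40%, and 30% weights given above. The per-dimension scoring functions (the authority scores per source type, the reciprocal decay over the time gap in years, and the cap of five sources) are assumptions introduced only to make the example concrete.

```python
# Assumed authority scores for the three source categories named in the text.
AUTHORITY = {"official": 1.0, "academic": 0.7, "web": 0.4}


def credibility(source_type, time_gap_years, n_sources, max_sources=5):
    """Dynamic score: 30% authority, 40% time proximity, 30% multi-source consistency."""
    authority = AUTHORITY[source_type]
    # Smaller gap between the knowledge and the queried time gives a higher score.
    proximity = 1.0 / (1.0 + abs(time_gap_years))
    # More sources reporting similar content gives a higher score, capped at 1.0.
    consistency = min(n_sources, max_sources) / max_sources
    return 0.3 * authority + 0.4 * proximity + 0.3 * consistency


def resolve_conflict(candidates):
    """Keep the conflicting result with the highest dynamic score."""
    return max(
        candidates,
        key=lambda c: credibility(c["source_type"], c["time_gap_years"], c["n_sources"]),
    )
```

For example, an official source with no time gap confirmed by five sources scores 1.0, while an ordinary web page three years off with a single source scores 0.28 under these assumed scoring functions.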

[0124] In the dynamic knowledge enhancement stage of this application, time requirements are introduced as an important dimension of the structured semantic vector. A local knowledge base with a timeline is constructed, allowing users to retrieve the local knowledge base according to time requirements. In addition, timely external knowledge is acquired in real time through the intelligent agent's internet search plugin, providing comprehensive knowledge support for query understanding and SQL statement generation. This achieves Retrieval-Augmented Generation (RAG) over the query intent, query entities, and relation graph extracted in the previous two stages. The workflow of the dynamic knowledge enhancement stage is shown in Figure 4.

[0125] This application constructs a highly integrated intelligent agent system through the Semantic Understanding-Parsing-Enhancement Chain (SPEC) architecture. The system links the three core stages in the form of a workflow, forming a complete semantic transformation chain from natural language input to structured SQL output.

[0126] The innovation of the SPEC architecture lies in its semantic-driven cascading processing mechanism. As shown in Figure 2, the intent-entity semantic representation established in the semantic understanding stage provides precise semantic anchors for the relation parsing stage. As shown in Figure 3, the multi-scale relation graph constructed in the relation parsing stage provides structured guidance for knowledge requirement assessment in the knowledge enhancement stage. As shown in Figure 4, the time-aware knowledge output by the knowledge enhancement stage provides comprehensive support for the final SQL generation. Through the forward propagation of semantic information and the tight coupling of outputs and inputs between stages, the three stages achieve a depth of semantic understanding and an accuracy of query transformation that are difficult to achieve in traditional Text-to-SQL systems.

[0127] This application achieves a highly efficient module collaboration mechanism through the SPEC chain-based intelligent agent architecture. By leveraging the concepts of intelligent agent environmental interaction, memory, and feedback, it enables environmental perception to understand user query intent, cognitive processing to parse query structural relationships, knowledge-driven enhancement of query semantic information, and ultimately, the chain-based collaboration of multiple modules to convert natural language into SQL statements.

[0128] This application proposes a natural language to SQL intelligent agent system for the education field based on the SPEC chain. Through core technologies such as cognitive semantic understanding, relational scale parsing, and knowledge-driven enhancement, it achieves intelligent data query and conversion in educational scenarios. The SQL statement generation method of this application can significantly reduce the technical threshold for educational data querying and improve the accuracy and efficiency of query conversion. It can be applied to schools and educational institutions to launch intelligent data query and conversion platforms (paid platforms, customized or charged per query), and to offer paid solutions such as educational big data analysis and teaching quality assessment to education administrators. With the accelerated advancement of educational informatization, the demand for educational data querying is increasing daily. The SQL statement generation method proposed in this application can effectively solve the pain points of existing query systems, achieve commercial operation through diversified charging models, and has broad market application prospects. The SQL statement generation method of this application has broad application prospects in fields such as educational big data analysis and intelligent academic management.

[0129] In the wave of educational informatization, the structured data generated in university teaching, research, and management scenarios is growing exponentially. However, traditional Business Intelligence (BI) tools require professional SQL writing skills, posing a high technical barrier to entry. Furthermore, the data analysis process involves a lengthy cycle of requirement submission, development, and verification, failing to meet the agile needs of educational administrators for real-time decision-making. The SQL statement generation method, device, and product of this application can help university users easily gain data insights, improve decision-making efficiency and quality, and promote the digital transformation of education. The SQL statement generation method, device, and product of this application can be applied to educational management departments and schools, targeting school administrators and teachers. The SQL statement generation method, device, and product of this application can achieve one-click conversion of natural language into complex SQL, significantly lowering the data analysis threshold and creating the first AI-powered intelligent data analysis platform for the education industry from an operator.

[0130] The SQL statement generation apparatus provided in this application will be described below. The SQL statement generation apparatus described below can be referred to in correspondence with the SQL statement generation method described above.

[0131] As shown in Figure 5, an apparatus for generating SQL statements includes: a first semantic understanding module 501, used to determine the target cognitive group of the querying user; a second semantic understanding module 502, used to input the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group and obtain the target structured semantic content output by the target semantic cognitive frequency network; a first parsing module 503, used to determine the complexity score corresponding to the target structured semantic content; a second parsing module 504, used to select a matching parser based on the complexity score; a third parsing module 505, used to parse the target structured semantic content based on the matching parser and obtain a relation graph; and a retrieval module 506, used to retrieve the local knowledge base based on the knowledge requirements and time requirements of the relation graph and obtain the time-aware enhanced SQL statement corresponding to the natural language.

[0132] The SQL statement generation apparatus provided in this application determines the target cognitive group of the querying user; inputs the querying user's natural language into the target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by the target semantic cognitive frequency network; determines the complexity score corresponding to the target structured semantic content; selects a matching parser based on the complexity score; parses the target structured semantic content based on the matching parser to obtain a relationship graph; and retrieves a local knowledge base based on the knowledge and time requirements of the relationship graph to obtain a time-aware enhanced SQL statement corresponding to the natural language. This application matches the target semantic cognitive frequency network to the target cognitive group, enabling accurate feature extraction of entities and intentions from natural language based on the unique cognitive patterns and language habits of the target cognitive group, thus improving the accuracy of the target structured semantic content. Furthermore, this application selects a matching parser based on the complexity score to parse the target structured semantic content, enabling accurate parsing strategies for natural language of varying complexity. This solves the problem of insufficient parsing in traditional SQL statement generation methods when handling complex relationships in educational scenarios, and achieves the construction of accurate relationship graphs for complex educational queries. This application dynamically retrieves local knowledge bases based on knowledge and time requirements, addressing the problem that existing static knowledge base retrieval methods struggle to adapt to changing educational query patterns. This improves retrieval accuracy and consequently enhances the accuracy of the generated time-aware augmented SQL statements. 
This application proposes a semantic understanding-relational parsing-knowledge augmentation chain architecture for SQL statement generation, achieving semantic-driven cascading processing and closed-loop optimization. Ultimately, this application realizes end-to-end intelligent agent collaboration in the educational field, from natural language to time-aware augmented SQL.

[0133] In one embodiment, the first semantic understanding module 501 is configured to: acquire the static attribute features and dynamic behavioral features of the query user; generate the query user's cognitive vector and its confidence level based on the static attribute features and dynamic behavioral features; compare the vector similarity between the query user's cognitive vector and the cognitive vectors of at least one cognitive group, and acquire the cognitive group with the highest vector similarity; when the confidence level is greater than or equal to a confidence level threshold, use the cognitive group with the highest vector similarity as the target cognitive group; when the confidence level is less than a confidence level threshold, use the general cognitive group as the target cognitive group.
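The group selection logic with its confidence fallback can be sketched as follows. The use of cosine similarity, the threshold value of 0.7, and the group names are assumptions for illustration; the text specifies only that the most similar group is used when the confidence level reaches the threshold and that the general cognitive group is used otherwise.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_group(user_vec, confidence, group_vecs,
                 confidence_threshold=0.7, general_group="general"):
    """Pick the most similar cognitive group, or fall back to the general group."""
    # Find the group whose cognitive vector is most similar to the user's.
    best = max(group_vecs, key=lambda name: cosine(user_vec, group_vecs[name]))
    # Trust the match only when the cognitive vector is confident enough.
    return best if confidence >= confidence_threshold else general_group
```

The fallback keeps a poorly characterized user, for example a new user with little behavioral history, from being routed to a group-specific network on weak evidence.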

[0134] In one embodiment, the target structured semantic content includes at least one query entity, and the first parsing module 503 is used to: calculate the semantic similarity between each query entity and each abstract concept in the abstract concept library; when the semantic similarity of the query entity is greater than the semantic similarity threshold, identify the query entity as a matching abstract concept; and calculate the complexity score corresponding to the target structured semantic content based on the number of matching abstract concepts, the number of association tables of matching abstract concepts, and the maximum inference depth of matching abstract concepts.

[0135] In one embodiment, the second parsing module 504 is used to: when the complexity score exceeds a first complexity score threshold, determine that the matching parser includes a micro-scale relation parser, a meso-scale relation parser, and a macro-scale relation parser. The third parsing module 505 is used to: decompose the matching abstract concept based on the macro-scale relation parser to obtain the decomposition result; perform business logic mapping on the decomposition result based on the meso-scale relation parser to obtain the business logic mapping result; and perform table field mapping on the business logic mapping result, the query entity of the target structured semantic content, and the query intent based on the micro-scale relation parser to obtain the mapping relationship between the query entity and the query intent, thereby obtaining a relation graph.

[0136] In one embodiment, the retrieval module 506 is configured to: convert knowledge requirements into structured semantic vectors; calculate the knowledge similarity between the structured semantic vectors and each semantic vector in the local knowledge base; calculate the time similarity between the time requirements and the time information of each semantic vector in the local knowledge base; perform a weighted summation of the knowledge similarity and time similarity to obtain a requirement score for each semantic vector; the requirement score represents the matching status of the knowledge requirements and time requirements of the semantic vector and the relation graph; when the requirement score of a semantic vector is greater than a set score, the semantic vector is identified as a matching semantic vector of the relation graph; and obtain a time-aware enhanced SQL statement based on at least one matching semantic vector.

[0137] In one embodiment, the second semantic understanding module 502 is used to: use the historical natural language of the target cognitive group as sample natural language, and use the historical structured semantic content of the target cognitive group as structured semantic content labels; the structured semantic content labels include intent labels and entity labels; identify the sample natural language according to the structured semantic content labels to obtain training samples carrying labels; train a preset network according to the training samples, the target loss function, and the target reward function to obtain a target semantic cognitive frequency network; wherein, the target loss function is determined based on the intent recognition loss, entity recognition loss, and consistency loss of the preset network; the consistency loss is determined based on the consistency between the predicted intent and the predicted entity output by the preset network; the target reward function is determined based on the accuracy of the predicted intent, the accuracy of the predicted entity, the consistency between the predicted intent and the predicted entity, and the performance improvement of the preset model.

[0138] In one embodiment, the retrieval module 506 is further configured to: determine the knowledge coverage of the time-aware enhanced SQL statement based on the ratio of the number of entities in the target structured semantic content to the number of matching semantic vectors; calculate the time error of the time-aware enhanced SQL statement based on the time requirement of the relational graph and the timestamp of at least one matching semantic vector; search an external knowledge base and obtain external knowledge base search results when the knowledge coverage is lower than the coverage threshold, or the time error exceeds the time window threshold, or the complexity score exceeds the second complexity score threshold; and update the time-aware enhanced SQL statement based on the external knowledge base search results.

[0139] Figure 6 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640. The processor 610, communications interface 620, and memory 630 communicate with each other via the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute a method for generating SQL statements. This method includes: determining the target cognitive group of the querying user; inputting the querying user's natural language into the target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by the target semantic cognitive frequency network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content based on the matching parser to obtain a relation graph; and retrieving a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

[0140] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0141] On the other hand, this application also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the SQL statement generation method provided by the above methods. The method includes: determining the target cognitive group of the querying user; inputting the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by the target semantic cognitive frequency network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content based on the matching parser to obtain a relationship graph; and retrieving a local knowledge base based on the knowledge requirements and time requirements of the relationship graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

[0142] Furthermore, this application also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements the SQL statement generation method provided by the above-described methods. This method includes: determining the target cognitive group of the querying user; inputting the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group to obtain the target structured semantic content output by the target semantic cognitive frequency network; determining the complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content based on the matching parser to obtain a relational graph; and retrieving a local knowledge base based on the knowledge and time requirements of the relational graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

[0143] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0144] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0145] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A method for generating SQL statements, characterized by comprising: determining a target cognitive group of a querying user; inputting the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group to obtain target structured semantic content output by the target semantic cognitive frequency network; determining a complexity score corresponding to the target structured semantic content; selecting a matching parser based on the complexity score; parsing the target structured semantic content based on the matching parser to obtain a relation graph; and retrieving a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

2. The method for generating SQL statements according to claim 1, characterized in that determining the target cognitive group of the querying user comprises: obtaining static attribute features and dynamic behavior features of the querying user; generating a cognitive vector of the querying user and a confidence level thereof based on the static attribute features and the dynamic behavior features; comparing the vector similarity between the cognitive vector of the querying user and the cognitive vector of at least one cognitive group to obtain the cognitive group with the maximum vector similarity; when the confidence level is greater than or equal to a confidence threshold, taking the cognitive group with the maximum vector similarity as the target cognitive group; and when the confidence level is less than the confidence threshold, taking a general cognitive group as the target cognitive group.
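The selection logic of claim 2 can be sketched as follows. This is an illustrative sketch under stated assumptions: the claim does not name a similarity measure, so cosine similarity is assumed here, and the function names, the default threshold of 0.6, and the group label `"general"` are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumed here, the claim only says 'vector similarity'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_group(user_vec: Sequence[float],
                 confidence: float,
                 group_vecs: Dict[str, Sequence[float]],
                 threshold: float = 0.6,       # hypothetical confidence threshold
                 general: str = "general") -> str:
    # Find the cognitive group whose vector is most similar to the user's vector.
    best = max(group_vecs, key=lambda g: cosine(user_vec, group_vecs[g]))
    # Trust the best match only when the confidence reaches the threshold;
    # otherwise fall back to the general cognitive group.
    return best if confidence >= threshold else general
```

For example, with group vectors `{"teacher": [1.0, 0.0], "student": [0.0, 1.0]}`, a user vector `[0.9, 0.1]` with confidence 0.8 is assigned to `"teacher"`, while the same vector with confidence 0.3 falls back to `"general"`.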

3. The method for generating SQL statements according to claim 1, characterized in that the target structured semantic content includes at least one query entity, and determining the complexity score corresponding to the target structured semantic content comprises: calculating the semantic similarity between each query entity and each abstract concept in an abstract concept library; when the semantic similarity of a query entity is greater than a semantic similarity threshold, identifying the query entity as a matched abstract concept; and calculating the complexity score corresponding to the target structured semantic content based on the number of matched abstract concepts, the number of association tables of the matched abstract concepts, and the maximum reasoning depth of the matched abstract concepts.
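A minimal sketch of the scoring in claim 3 follows. The claim names the three inputs to the score but not how they are combined, so the weighted sum and its weights are assumptions, as are the class `AbstractConcept`, the function names, and the caller-supplied `similarity` function.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class AbstractConcept:
    name: str
    associated_tables: int   # number of association tables for this concept
    reasoning_depth: int     # reasoning depth needed to resolve this concept


def match_concepts(entities: Sequence[str],
                   library: Sequence[AbstractConcept],
                   similarity: Callable[[str, str], float],
                   sim_threshold: float = 0.8) -> List[AbstractConcept]:
    """Keep, for each query entity, its best concept if the similarity exceeds the threshold."""
    matched = []
    for entity in entities:
        best = max(library, key=lambda c: similarity(entity, c.name), default=None)
        if best is not None and similarity(entity, best.name) > sim_threshold:
            matched.append(best)
    return matched


def complexity_score(matched: Sequence[AbstractConcept],
                     w_count: float = 1.0,
                     w_tables: float = 0.5,
                     w_depth: float = 1.0) -> float:
    """Assumed combination: weighted sum of concept count, table count, and maximum depth."""
    tables = sum(c.associated_tables for c in matched)
    depth = max((c.reasoning_depth for c in matched), default=0)
    return w_count * len(matched) + w_tables * tables + w_depth * depth
```

In practice the `similarity` callable would typically wrap an embedding model; an exact-match function is enough to exercise the logic.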

4. The method for generating SQL statements according to claim 1, characterized in that selecting a matching parser based on the complexity score comprises: when the complexity score exceeds a first complexity score threshold, determining that the matching parser includes a micro-scale relation parser, a meso-scale relation parser, and a macro-scale relation parser; and parsing the target structured semantic content based on the matching parser to obtain a relation graph comprises: decomposing the matched abstract concept based on the macro-scale relation parser to obtain a decomposition result; mapping the decomposition result to business logic based on the meso-scale relation parser to obtain a business logic mapping result; and mapping, based on the micro-scale relation parser, the table fields of the business logic mapping result and the query entity and query intent of the target structured semantic content to obtain the mapping relationship between the query entity and the query intent, thereby obtaining the relation graph.
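The macro-to-meso-to-micro ordering in claim 4 can be sketched as a chain of functions over a shared state. This is only an illustration: the three toy parsers, the state keys, and the string placeholders they produce are hypothetical, and the claim does not say which parsers are used at or below the threshold, so the sketch assumes the micro-scale parser alone in that case.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Parser = Callable[[State], State]


def macro_parser(state: State) -> State:
    # Toy decomposition: split each matched abstract concept into one placeholder part.
    parts = [f"{concept}/component" for concept in state["concepts"]]
    return {**state, "decomposition": parts}


def meso_parser(state: State) -> State:
    # Toy business-logic mapping: attach a placeholder rule to each decomposed part.
    logic = {part: f"rule_for({part})" for part in state.get("decomposition", [])}
    return {**state, "business_logic": logic}


def micro_parser(state: State) -> State:
    # Toy field-level mapping: link each query entity to the query intent.
    edges = [(entity, state["intent"]) for entity in state["entities"]]
    return {**state, "graph": edges}


def select_parsers(score: float, first_threshold: float) -> List[Parser]:
    if score > first_threshold:
        return [macro_parser, meso_parser, micro_parser]
    return [micro_parser]  # assumption: simple queries need only field-level mapping


def build_relation_graph(content: State, parsers: List[Parser]) -> State:
    state = dict(content)
    for parse in parsers:      # each parser enriches the state produced by the previous one
        state = parse(state)
    return state
```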

5. The method for generating SQL statements according to claim 1, characterized in that retrieving the local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain the time-aware enhanced SQL statement corresponding to the natural language comprises: converting the knowledge requirements into a structured semantic vector; calculating the knowledge similarity between the structured semantic vector and each semantic vector in the local knowledge base; calculating the time similarity between the time requirements and the time information of each semantic vector in the local knowledge base; performing a weighted summation of the knowledge similarity and the time similarity to obtain a demand score for each semantic vector, the demand score representing the degree to which the semantic vector matches the knowledge requirements and time requirements of the relation graph; when the demand score of a semantic vector is greater than a set score, identifying the semantic vector as a matching semantic vector of the relation graph; and obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector.
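The weighted summation in claim 5 can be illustrated with a short sketch. The claim fixes only the structure (knowledge similarity plus time similarity, weighted, then thresholded); the cosine measure, the exponential decay used for time similarity, the weight `alpha`, and all names below are assumptions.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass
class KnowledgeEntry:
    vector: Sequence[float]   # semantic vector stored in the local knowledge base
    timestamp: date           # time information attached to the vector
    payload: str              # e.g. a schema hint or SQL fragment (hypothetical)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_similarity(required: date, actual: date, tau_days: float = 30.0) -> float:
    """Assumed form: decays from 1 toward 0 as the gap in days grows."""
    return math.exp(-abs((required - actual).days) / tau_days)


def match_entries(query_vec: Sequence[float],
                  required_date: date,
                  knowledge_base: Sequence[KnowledgeEntry],
                  alpha: float = 0.7,        # weight of knowledge similarity (assumed)
                  set_score: float = 0.6) -> List[Tuple[float, KnowledgeEntry]]:
    matches = []
    for entry in knowledge_base:
        score = (alpha * cosine(query_vec, entry.vector)
                 + (1.0 - alpha) * time_similarity(required_date, entry.timestamp))
        if score > set_score:                # demand score must exceed the set score
            matches.append((score, entry))
    return sorted(matches, key=lambda m: m[0], reverse=True)
```

With these assumed defaults, an entry that is semantically identical but a year out of date scores about 0.7, so it still passes the 0.6 cut-off; a larger time weight or a higher set score would be needed for staleness alone to exclude it.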

6. The method for generating SQL statements according to claim 1, characterized in that the target semantic cognitive frequency network is trained as follows: taking the historical natural language of the target cognitive group as sample natural language, and taking the historical structured semantic content of the target cognitive group as structured semantic content labels, the structured semantic content labels including intent labels and entity labels; labeling the sample natural language according to the structured semantic content labels to obtain labeled training samples; and training a preset network based on the training samples, a target loss function, and a target reward function to obtain the target semantic cognitive frequency network; wherein the target loss function is determined based on the intent recognition loss, the entity recognition loss, and the consistency loss of the preset network; the consistency loss is determined based on the consistency between the predicted intent and the predicted entity output by the preset network; and the target reward function is determined based on the accuracy of the predicted intent, the accuracy of the predicted entity, the consistency between the predicted intent and the predicted entity, and the performance improvement of the preset network.
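Claim 6 names the components of the target loss function and the target reward function without giving their form. The sketch below shows one plausible way to combine them, assuming weighted sums throughout; the weights, the `allowed` intent-to-entity-type table used to measure consistency, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Negative log-likelihood of the labeled class (used for intent or entity loss)."""
    return -math.log(max(probs[target_index], 1e-12))


def consistency_loss(pred_intent: str,
                     pred_entity_types: Sequence[str],
                     allowed: Dict[str, Set[str]]) -> float:
    """Assumed measure: share of predicted entity types not compatible with the predicted intent."""
    if not pred_entity_types:
        return 0.0
    compatible = allowed.get(pred_intent, set())
    bad = sum(1 for t in pred_entity_types if t not in compatible)
    return bad / len(pred_entity_types)


def target_loss(intent_loss: float, entity_loss: float, consistency: float,
                weights: Tuple[float, float, float] = (1.0, 1.0, 0.5)) -> float:
    w_i, w_e, w_c = weights
    return w_i * intent_loss + w_e * entity_loss + w_c * consistency


def target_reward(intent_acc: float, entity_acc: float,
                  consistency_score: float, improvement: float,
                  weights: Tuple[float, float, float, float] = (0.3, 0.3, 0.2, 0.2)) -> float:
    w_i, w_e, w_c, w_p = weights
    return (w_i * intent_acc + w_e * entity_acc
            + w_c * consistency_score + w_p * improvement)
```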

7. The method for generating SQL statements according to claim 5, characterized in that, after obtaining the time-aware enhanced SQL statement based on at least one matching semantic vector, the method further comprises: determining the knowledge coverage of the time-aware enhanced SQL statement based on the ratio of the number of entities in the target structured semantic content to the number of matching semantic vectors; calculating the time error of the time-aware enhanced SQL statement based on the time requirements of the relation graph and the timestamp of at least one matching semantic vector; when the knowledge coverage is lower than a coverage threshold, or the time error exceeds a time window threshold, or the complexity score exceeds a second complexity score threshold, searching an external knowledge base to obtain external knowledge base search results; and updating the time-aware enhanced SQL statement based on the external knowledge base search results.
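The three-way trigger in claim 7 can be sketched as below. Several details are assumptions: the claim's wording leaves the direction of the coverage ratio open, so the sketch takes coverage as matching vectors per entity, capped at 1; the time error is assumed to be the largest gap in days between the required date and any matching timestamp; and the function names and parameters are hypothetical.

```python
from datetime import date
from typing import Sequence


def knowledge_coverage(num_entities: int, num_matches: int) -> float:
    """Assumed reading: fraction of query entities covered by matching vectors, capped at 1."""
    if num_entities == 0:
        return 1.0  # nothing to cover
    return min(1.0, num_matches / num_entities)


def time_error_days(required: date, timestamps: Sequence[date]) -> int:
    """Assumed reading: largest absolute gap, in days, to any matching vector's timestamp."""
    return max(abs((required - ts).days) for ts in timestamps)


def needs_external_search(coverage: float, time_error: float, complexity: float,
                          coverage_threshold: float, window_days: float,
                          second_threshold: float) -> bool:
    # Any one of the three conditions is enough to trigger the external search.
    return (coverage < coverage_threshold
            or time_error > window_days
            or complexity > second_threshold)
```

For instance, four entities with two matching vectors give a coverage of 0.5, which falls below a coverage threshold of 0.8 and triggers the external search even when the time error and complexity are within bounds.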

8. An apparatus for generating SQL statements, characterized by comprising: a first semantic understanding module, configured to determine a target cognitive group of a querying user; a second semantic understanding module, configured to input the natural language of the querying user into the target semantic cognitive frequency network of the target cognitive group to obtain target structured semantic content output by the target semantic cognitive frequency network; a first parsing module, configured to determine a complexity score corresponding to the target structured semantic content; a second parsing module, configured to select a matching parser based on the complexity score; a third parsing module, configured to parse the target structured semantic content based on the matching parser to obtain a relation graph; and a retrieval module, configured to retrieve a local knowledge base based on the knowledge requirements and time requirements of the relation graph to obtain a time-aware enhanced SQL statement corresponding to the natural language.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the method for generating SQL statements according to any one of claims 1 to 7 is implemented.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method for generating SQL statements according to any one of claims 1 to 7 is implemented.

11. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, the method for generating SQL statements according to any one of claims 1 to 7 is implemented.