A method for iterative optimization of SQL query based on natural language intent constraint
By using structured intent parsing and iterative optimization methods, natural language queries are converted into SQL statements, which solves the problems of semantic understanding bias and inconsistent generated results in Chinese scenarios, and achieves interpretability and accuracy in the SQL generation process.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUNAN UNIV
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-23
AI Technical Summary
Existing Text2SQL methods suffer from semantic comprehension biases, lack of explicit semantic intermediate representations in the generation process, difficulty in verifying consistency of generated results, and a lack of targeted error correction in Chinese query scenarios.
Natural language queries are converted into structured intent representations through structured intent parsing, candidate SQL statements are generated and executed, consistency checks are performed, deviations are identified and iterative optimization is carried out until SQL statements that match the user's intent are generated.
It improves the interpretability and accuracy of the SQL generation process, ensures that the generated results are consistent with the user's intent, and reduces the blindness of error correction.
Smart Images

Figure CN122262284A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of natural language processing and database query optimization technology, specifically involving a Text-to-SQL iterative optimization method for converting natural language queries into Structured Query Language (SQL). Background Technology
[0002] With the development of information technology, relational database systems have become a crucial infrastructure for data storage and management across various industries, widely used in finance, healthcare, education, e-commerce, and government. Structured Query Language (SQL), as the standard query language for accessing and manipulating relational databases, enables the retrieval, updating, and management of database data. However, SQL has a relatively strict syntax and logical rules, typically requiring users to possess certain database knowledge and programming skills, which presents a significant barrier to entry for business personnel lacking a technical background.
[0003] To reduce the difficulty of using databases, researchers have proposed Text-to-SQL (Text2SQL) technology. This technology aims to automatically convert natural language queries into SQL statements, allowing users to describe their data query needs in natural language, and the system to automatically generate executable SQL statements, thereby improving the convenience and efficiency of database queries.
[0004] In recent years, with the development of deep learning technology, neural network models based on the sequence-to-sequence learning framework have been introduced into the Text2SQL task, achieving end-to-end generation of natural language to SQL statements through an encoder-decoder structure. With the development of Large Language Model (LLM) technology, the Text2SQL method has been further researched and applied in complex query processing and cross-domain applications.
[0005] However, existing deep learning-based Text2SQL methods still have certain limitations. For example, some end-to-end generation methods directly map natural language queries to SQL statements, lacking an explicit semantic intermediate representation layer, resulting in weak interpretability of the generation process; some methods based on large language models rely on probabilistic prediction mechanisms, and their generation process lacks verifiable semantic constraints; furthermore, some existing SQL correction methods only regenerate based on simple feedback such as SQL execution failure, making it difficult to accurately identify the specific type and location of errors. Additionally, some Text2SQL methods are primarily designed for English corpora, and may suffer from semantic comprehension biases or insufficient extraction of key information in Chinese query scenarios, thus affecting the accuracy of SQL generation. Summary of the Invention
[0006] To address the problems of existing Text2SQL technology in converting natural language queries into SQL statements, such as the lack of effective semantic constraints, difficulty in verifying the consistency of query results, and lack of targeted error correction, this invention provides an iterative optimization method for SQL queries based on natural language intent constraints.
[0007] The method includes the following steps: First, semantic parsing is performed on the input natural language query to construct a structured intent representation to express the user's query needs; then, candidate SQL statements are generated based on the structured intent representation and executed in the database to obtain query results; then, the query results are checked for consistency with the structured intent representation; when the consistency check finds a deviation in the query results, the candidate SQL statements are corrected and regenerated, thereby achieving iterative optimization of the SQL query.
[0008] 1. Structured Intent Parsing
[0009] This invention uses structured intent parsing to convert user-input natural language queries into a structured intent representation. This representation explicitly expresses the user's query object, query conditions, and related query requirements, and serves as an intermediate representation between natural language and SQL statements. The specific implementation steps are as follows:
[0010] 1.1. Text Preprocessing: The input natural language query is preprocessed, including word segmentation and part-of-speech tagging, so that the original query is divided into several words and the part-of-speech information of each word is obtained, providing basic data for subsequent analysis.
[0011] For example, if a user enters the query: "Query the names of students in the student table whose scores are greater than 80, and sort them by scores from highest to lowest", 1) the system first performs word segmentation on the query, breaking it down into several words, such as: query / student table / score / greater than / 80 points / student / name / sort 2) and records the part-of-speech information of each word, for example: "student table" is a noun, "greater than" is a comparison word, "80 points" is a numerical value, and "sort" is an action word;
[0012] 1.2. Query type identification: Determine the type of query statement based on the keywords appearing in the query statement.
[0013] (1) Search type: contains keywords such as "query", "find", "list", etc., and is used to extract specific field data from the database;
[0014] (2) Statistical type: contains keywords such as "statistics", "calculation", "sum" and "average", and is used to aggregate and calculate data;
[0015] (3) Comparison type: contains comparison words such as "greater than", "less than", "equal to", "highest", "lowest", etc., and is used for condition filtering and extreme value query.
[0016] In the query example in step 1.1: (1) the appearance of words such as "query" indicates that the statement belongs to the retrieval type query; (2) the appearance of "score greater than 80 points" indicates that the query contains comparative condition filtering; (3) the appearance of "sort" indicates that the query results contain sorting requirements. The system determines that the statement belongs to the data retrieval type query and contains condition filtering and sorting requirements by identifying the frequency of occurrence of various keywords.
[0017] 1.3. Semantic Element Extraction: After obtaining the word segmentation results of the query statement, the words and their part-of-speech information in the query statement are analyzed. Based on the nouns, attribute descriptors, and words indicating comparison or sorting relationships appearing in the query statement, information such as the query object, the data attributes to be returned, the query conditions, and the sorting requirements are determined from the query statement, and the above information is organized into semantic elements.
[0018]
[0019] 1.4. Database schema mapping: The extracted semantic elements are matched with the data structure in the database. By comparing the similarity between the words in the semantic elements and the database table names, field names and their related descriptions, the data table and related fields corresponding to the query object are determined.
[0020] In the structured intent representation in step 1.3: (1) the “Students table” is matched with the Students table in the database; (2) “Name” and “Score” are matched with the Name and Score fields in the Student table; (3) “Score greater than 80” corresponds to the filtering condition of the Score field in the database; (4) “Sort by score from high to low” corresponds to the descending sorting of the Score field. 1.5. Integrity and Consistency Verification: The generated structured intent representation is checked. First, it is checked whether the data tables and fields involved exist in the database. Second, it is checked whether the relationship between the query conditions, fields and data tables is reasonable.
[0021] 2. Candidate SQL Generation and Execution
[0022] The purpose of the candidate SQL generation and execution step is to generate executable SQL statements based on the structured intent representation obtained in the previous stage, and to obtain query results through actual execution, providing a basis for subsequent consistency verification. This stage mainly includes two steps: SQL generation and SQL execution.
[0023] 2.1. Candidate SQL Generation: The system generates candidate SQL statements based on the structured intent representation. First, it converts the structured intent representation into constraint descriptions that the model can understand. Then, it combines the original natural language query and database schema information to generate candidate SQL statements. During the generation process, the structure of the SQL statement is limited according to the structured intent representation, including the query objects, return fields, query conditions, and sorting methods, thereby reducing the generation of irrelevant or erroneous SQL statements.
[0024] To improve the stability of the generated results, the system can generate multiple candidate SQL statements, such as multiple SQL versions that differ in field aliases, sorting methods, or conditional expressions, for subsequent verification steps.
[0025]
[0026] 2.2. Candidate SQL Execution: The system executes the generated candidate SQL statements in the target database environment and collects the execution results. During execution, the following information is mainly recorded: whether the SQL statement was successfully executed; the result data returned by the query; the field structure information of the returned results; and the execution time required for the SQL statement.
[0027] 3. Meaning Figure 1 Consistency check
[0028] meaning Figure 1The purpose of the consistency verification step is to compare the SQL execution result with the structured intent representation to determine whether the generated SQL statement truly meets the user's query requirements. Unlike traditional methods that only check the SQL syntax, this step performs multi-faceted checks on the query results to determine whether the semantics of the query results match the user's original intent. Verification mainly focuses on four aspects: returned fields, query conditions, sorting method, and the number of returned fields.
[0029] 3.1. Return Field Integrity Validation: Obtain field information from the returned data in the SQL execution result and compare it with the returned fields recorded in the Structured Intent representation. Check whether the number and names of the returned fields are consistent, and determine whether there are any missing or redundant fields.
[0030] Assuming the structured intent requires the returned field to be "name": (1) If the SQL execution result returns the field "name", then the field is considered to match; (2) If the returned result contains other fields or is missing the "name" field, then the field is considered to be inconsistent.
[0031] 3.2. Constraint Satisfaction Verification: This checks whether the query results satisfy the query conditions in the structured intent representation. The system can check the result data item by item or verify it through sampling to determine whether the data conforms to the conditional expression.
[0032] Assuming the query condition in the structured intent representation is "score greater than 80": (1) The system checks each record in the returned results and confirms that the value of its score field is greater than 80. (2) If there is a record whose score does not meet the condition, it is determined that the condition is not met.
[0033] 3.3. Sorting Correctness Verification: If the structured intent representation includes sorting requirements, the system checks whether the SQL execution results are arranged according to the specified fields and sorting method. The system determines whether the result data conforms to the specified sorting order by comparing the field values of adjacent data.
[0034] Assuming the sorting requirement in the structured intent representation is "sort in descending order according to the score field": (1) The system checks the score values of adjacent records in the returned results and confirms that the data is arranged in descending order of scores; (2) If the order is found to be inconsistent, it is determined that the sorting is inconsistent.
[0035] 3.4. Return Quantity Accuracy Verification: Count the number of records in the SQL execution result and compare it with the return quantity requirement in the Structured Intent representation. If the result quantity does not match the requirement, it is determined as a quantity mismatch.
[0036] For example: (1) When the query requests the return of the first N records, the system checks whether the number of records returned is N; (2) When the query requests the return of all data that meet the conditions, the system checks whether the results have been truncated.
[0037] Finally, the system synthesizes the results of the above checks and performs a consistency judgment on the candidate SQL statement: if all checks pass, the SQL statement is considered to be consistent with the user's query intent. Figure 1 If the result is consistent with the previous result, it will be output as a valid result; if there is any inconsistency, the corresponding deviation information will be recorded and used as input for subsequent SQL correction steps.
[0038] 4. Deviation Identification and Iterative Correction
[0039] The goal of deviation identification and iterative correction is to identify deviations in the query results and make targeted corrections to gradually optimize the query results, ensuring they align with the user's query intent. This step receives the deviation set output from the consistency verification phase, identifies the specific deviation types, generates correction constraints, and then triggers the next round of SQL generation, forming a closed-loop optimization.
[0040] 4.1. Correction Constraint Generation: Receive the deviation set generated in step 3, identify the specific deviation type based on the verification dimension, and generate corresponding correction constraints. Common deviation types and corresponding correction methods include:
[0041] (1) Field-related deviations: When a SQL return field is missing or redundant, constraints are generated to force inclusion or exclusion to ensure that the returned field is consistent with the structured intent representation. For example, if the "Name" field is missing, it is required to include the field in the next round of SQL.
[0042] (2) Constraint-related deviations: When query conditions are missing or do not meet the requirements, mandatory predicate constraints or operator-corrected constraints are generated to ensure that the query results meet the conditions. For example, if the condition "score greater than 80" is missing, the system will add this condition to the next round of SQL.
[0043] (3) Sorting requirement related deviations: When the SQL is not executed according to the specified sorting method, a forced sorting constraint is generated to ensure that the sorting is performed according to the correct fields and order. For example, if the sorting field "score" is missing, the system will force the addition of the sorting condition in the next round of SQL.
[0044] (4) Return quantity related deviation: When the number of records returned by the SQL exceeds the quantity limit in the structured intent representation, a quantity constraint is generated, requiring the next round of SQL statements to add the corresponding quantity limit conditions.
[0045] The generated corrected constraints will be appended to the input of the next round of SQL generation, taking precedence over the original query intent constraints, to ensure that errors are corrected rather than simply retried.
[0046] 4.2. SQL Iterative Generation: After generating the modified constraints, the system takes the modified constraints and the original structured intent representation as input and re-enters the candidate SQL generation process in step 2 to generate new candidate SQL statements.
[0047] Subsequently, the system executes the generated SQL statement again, and follows the instructions in step 3. Figure 1 The consistency verification method verifies the query results: if the verification passes, the current SQL statement is output as the final query result; if the verification fails, the corrective constraints are generated based on the new deviation information, and the above process is repeated until the verification passes or the preset maximum number of iterations is reached. Attached Figure Description
[0048] Figure 1 A schematic diagram of the overall process of an iterative optimization method for SQL queries based on natural language intent constraints.
[0049] Figure 2 A flowchart illustrating the sub-processes of the structured intent parsing steps.
[0050] Figure 3 . meaning Figure 1 Logic diagram of consistency verification and deviation identification Detailed Implementation
[0051] The hardware environment of this invention is mainly a PC host, configured with an AMD Ryzen 5 5600H processor (6 cores, 12 threads, 3.30 GHz), 16GB of memory, and a 64-bit Windows operating system.
[0052] The software is implemented on the Windows 11 platform and developed using the Python 3.10 programming language. The system utilizes the PyTorch deep learning framework and the Transformers model library for model inference, and incorporates a Chinese word segmentation tool for preprocessing natural language queries. Additionally, it generates SQL statements by calling a large language model via an API interface.
[0053] To illustrate the implementation process of the method of this invention, this embodiment selects the Spider Chinese Text-to-SQL dataset as example data. This dataset contains database schemas from multiple different domains and corresponding natural language query and SQL statement pairs, which can be used to simulate real database query scenarios.
[0054] Under the above conditions, the overall operation flow of the method of the present invention can be divided into two stages:
[0055] 1. Structured Intent Parsing Phase
[0056] In this stage, the system receives natural language queries input by the user and converts them into structured query intent representations through steps such as text preprocessing, query type recognition, semantic element extraction, and database schema mapping. These representations clearly express information such as the query object, return fields, query conditions, sorting requirements, and the number of returns.
[0057] Its pseudocode is as follows:
[0058]
[0059] 2. Iterative SQL Generation and Optimization Phase
[0060] In this phase, the system generates candidate SQL statements based on structured intents and executes them in the target database to obtain the query results. Subsequently, through intent... Figure 1 Consistency checks verify the consistency between the execution result and the structured intent. When a deviation is detected, the system identifies the deviation type and generates corrective constraints, triggering the next round of SQL generation, thereby continuously optimizing the generated result until an SQL statement that satisfies the query intent is obtained.
[0061] Its pseudocode is as follows:
[0062]
Claims
1. An iterative optimization method for SQL queries based on natural language intent constraints, characterized in that, The method includes: (1) Receive natural language queries input by users, perform semantic parsing on natural language queries, extract query objects, return fields, query conditions, sorting rules and return quantity limits, and construct a structured query intent representation; (2) Generate candidate SQL statements based on the structured query intent representation, and execute the candidate SQL statements in the target database to obtain query results; (3) Perform intent consistency verification between the query results and the structured query intent representation to determine whether the candidate SQL statement meets the user's query requirements; (4) When the intent consistency check fails, identify the type of deviation between the query results and the structured query intent, and generate corrective constraints based on the type of deviation; (5) Regenerate candidate SQL statements based on the modified constraints, and repeat the SQL generation, execution and consistency verification steps until the query results meet the structured query intent or the preset maximum number of iterations is reached.
2. The method according to claim 1, characterized in that, Step 1 includes at least the following: (1) Perform text preprocessing on natural language queries, including word segmentation and part-of-speech tagging; (2) Identify the query type based on the keywords in the query statement; (3) Extract semantic elements from natural language queries, including query objects, returned fields, query conditions, sorting rules, and return quantity limits; (4) Match semantic elements with data tables and fields in the database schema to generate a structured query intent representation.
3. The method according to claim 2, characterized in that, Matching semantic elements with database schemas includes: (1) By comparing the similarity between words in semantic elements and database table names, field names and their descriptive information, the data table corresponding to the query object and the fields involved in the query are determined; (2) Perform integrity and consistency verification on the generated structured query intent representation to ensure that the data tables and fields exist in the database and that the logical relationships are correct.
4. The method according to claim 1, characterized in that, Step 2 includes: (1) Convert the structured query intent representation into a constraint description; (2) Construct input prompts by combining the original natural language query, constraint description, and database schema information; (3) Input the input prompts into the SQL generation model to generate multiple candidate SQL statements.
5. The method according to claim 1, characterized in that, Step 3 includes: (1) Extract the returned field information from the SQL execution result and compare it with the returned fields in the structured query intent to perform a return field integrity check; (2) Perform condition expression validation on the data records in the query results to check the satisfaction of the query conditions; (3) Compare the field values of adjacent records according to the sorting field to verify the correctness of the sorting; (4) Count the number of records in the query results and verify the reasonableness of the returned number in combination with the return number limit.
6. The method according to claim 1, characterized in that, Step 4 includes: (1) Identify the deviation type based on the consistency check result. The deviation types include return field deviation, query condition deviation, sorting rule deviation, and return quantity deviation. (2) Generate corresponding correction constraints based on the deviation type to constrain the generation of SQL statements in the next round.
7. The method according to claim 1, characterized in that, When regenerating candidate SQL statements, the following are included: (1) The modified constraints and the structured query intent representation are used together as input to guide the SQL generation model to generate new candidate SQL statements. (2) SQL query optimization is achieved by iteratively executing the SQL generation, execution and consistency verification steps until the query result satisfies the structured intent representation or reaches the preset maximum number of iterations.