A visual system for interactive SQL generation and modification that integrates implicit knowledge

What is AI technical title?
AI technical title is built by Patsnap AI team. It summarizes the technical point description of the patent document.
By constructing an implicit knowledge base and a multi-view interactive interface, SQL code is mapped back to natural language form, solving the problems of low accuracy and poor interpretability in the existing technology of implicit knowledge processing, and realizing efficient and accurate SQL generation and correction.

CN122309601APending Publication Date: 2026-06-30ZHEJIANG UNIV +1

View PDF 1 Cites 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZHEJIANG UNIV
Filing Date: 2026-03-05
Publication Date: 2026-06-30

Application Information

Patent Timeline

05 Mar 2026

Application

30 Jun 2026

Publication

CN122309601A

IPC: G06F16/26; G06F16/242; G06N3/0455; G06F16/36; G06N5/022; G06N5/04

AI Tagging

Technology Topics

Code generation Theoretical computer science

Explore More Agents

Novelty Search
Search existing technologies and assess novelty
↗
FTO
Analyze whether a product may infringe others' patents
↗
Design FTO
Check prior-design risk for exterior design
↗
Drafting
Draft patent application text based on a technical solution
↗
Find Solutions with TRIZ
Generate feasible solution to solve your technical challenge
↗

Similar Technology Patents

Get free access to AI patent search and analysis

Check patentability, review prior art and ask IP Agent with full patent context.

AI Technical Summary

Technical Problem

Existing technologies have low accuracy and poor interpretability when processing implicit knowledge in users' natural language commands, and interactive correction tools lack precise editing methods for specific knowledge points, making it difficult to effectively utilize domain conventions in historical data.

Method used

The implicit knowledge base module extracts and categorizes implicit knowledge items from historical SQL scripts. Combined with a multi-view interactive interface, it maps SQL code back to natural language form, allowing users to review and correct it at the knowledge level. It also generates and updates SQL scripts through a large language model.

Benefits of technology

It significantly improves the accuracy and efficiency of SQL writing, allowing users to edit precisely without delving into the underlying code, reducing the burden of writing detailed suggestions, and improving interpretability and interaction efficiency.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122309601A_ABST

Patent Text Reader

Abstract

This invention provides a visual system for interactive SQL generation and modification that integrates implicit knowledge, belonging to the field of interactive database query technology. By constructing a visual system including an implicit knowledge base building module and an interactive generation and modification module, implicit knowledge items are automatically extracted and categorized from historical SQL scripts to build an implicit knowledge base, enhancing the code generation capabilities of LLM (Low-Level Management). Simultaneously, a multi-view interactive interface is provided, mapping the generated SQL code back to implicit knowledge items in natural language form, supporting user review and correction at the knowledge level rather than just the code level. This invention enhances the accuracy of SQL generation by reusing historical implicit knowledge and achieves knowledge-level understanding and correction through multi-view interaction. It solves the problems of low accuracy, poor interpretability, and difficulty in correction of existing NL-to-SQL tools when handling implicit knowledge, significantly improving the efficiency and accuracy of SQL writing and making it more practical.

Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of interactive database query technology, and in particular to a visual system for interactive SQL generation and modification that integrates implicit knowledge. Background Technology

[0002] As data-driven decision-making becomes increasingly prevalent across industries, SQL queries have become a core tool for data exploration. However, writing complex SQL statements presents a significant barrier to entry for many data professionals without a technical background.

[0003] Significant progress has been made in natural language SQL generation technology based on Large Language Models (LLM) on general benchmark tests. For example, invention patent CN120030036A discloses a visualization method driven by SQL generation based on LLM preference learning. This method includes acquiring input natural language query text and user query history; generating a user profile based on the user query history; parsing the natural language query text using a pre-trained LLM to obtain the table names of database tables related to the user's query intent and the field names corresponding to the query objects in the database tables; generating an initial structured query statement based on the user profile, the table names, and the field names corresponding to the query objects; performing preference optimization on the initial structured query statement based on the user's query history and user profile to obtain a final structured query statement; and visualizing the query results corresponding to the final structured query statement based on the user profile to obtain the visualization result. However, this method only visualizes the final query results, and its application in complex real-world business scenarios still faces significant challenges. In particular, users' natural language commands often contain a lot of unspoken "implicit knowledge," such as specific dataset naming conventions (e.g., the meaning of specific abbreviations) or task-specific computational logic (e.g., the calculation formula for specific metrics).

[0004] To address the above problems, existing technologies mainly fall into two categories: (1) End-to-end NL-to-SQL generation technology: This technology directly translates natural language instructions into SQL statements using large models or fine-tuned neural networks (such as context-based learning methods or self-correcting text-to-SQL decompositional context learning (DIN-SQL)). These methods rely excessively on the explicit expression of user instructions and ignore the domain knowledge contained in historical queries. When user instructions are ambiguous or contain specific business assumptions, the model often produces logical errors due to a lack of context (such as misunderstanding the meaning of fields or filtering conditions), and users find it difficult to detect such semantic deviations.

[0005] (2) Interactive SQL assistance and correction tools: These include visual code explanation tools (such as QueryVis) and dialogue-based iterative correction tools (such as SQLucid). The former aids understanding by graphically displaying the SQL syntax structure, while the latter allows users to adjust the results through natural language feedback. However, these tools mainly focus on the syntactic correctness or literal interpretation of the code, failing to establish an intuitive mapping between the underlying code logic and the user's high-level "implicit intent." When users discover errors, they still need to modify the underlying code or repeatedly try to modify the prompts, lacking precise and controllable editing methods for specific knowledge points (such as specific filtering conditions or calculation rules).

[0006] Therefore, existing technologies have significant shortcomings in "knowledge utilization" (how to automatically capture implicit domain conventions from historical data) and "interaction efficiency" (how to support intuitive verification and correction at the knowledge level rather than the code level). There is an urgent need for an SQL generation system and method that can combine historical scripts to mine implicit knowledge and support interactive alignment, review, and targeted editing. Summary of the Invention

[0007] The purpose of this invention is to provide a visual system for interactive SQL generation and modification that integrates implicit knowledge. By constructing a visual system comprising an implicit knowledge base building module and an interactive generation and modification module, implicit knowledge items are automatically extracted and categorized from historical SQL scripts to build an implicit knowledge base, thereby enhancing the code generation capabilities of LLM (Low-Low Memory Management). Simultaneously, a multi-view interactive interface is provided, mapping the generated SQL code back to implicit knowledge items in natural language form, supporting user review and correction at the knowledge level rather than just the code level. This solves the problems of low accuracy, poor interpretability, and difficulty in correction of existing NL-to-SQL tools when handling implicit knowledge, significantly improving the efficiency and accuracy of SQL writing and making it more practical.

[0008] To achieve the above-mentioned objectives, an embodiment provides a visual system for interactive SQL generation and modification that integrates implicit knowledge, comprising: The implicit knowledge base construction module is used to build a data dictionary, parse historical SQL scripts into code fragments, classify the code fragments according to preset implicit knowledge types, generate natural language descriptions for each code fragment using a large language model combined with the data dictionary, and vectorize the natural language descriptions and corresponding code fragments and store them in the implicit knowledge base. The interactive generation and modification module receives natural language query commands from the user and performs intent parsing. It retrieves matching implicit knowledge from the implicit knowledge base, injects the implicit knowledge, associated historical SQL scripts, and data dictionary into the large language model to generate an initial SQL script, parses the initial SQL script into a structured representation, and presents it in a multi-view format. In response to the user's editing operations on knowledge items in the multi-view, it automatically updates the corresponding code snippets according to the user's natural language correction commands and propagates the changes to other parts that depend on this code snippet. Finally, it generates the corrected SQL script and refreshes the multi-view display.

[0009] This invention automatically extracts implicit knowledge from historical scripts, establishing an intuitive mapping between underlying code logic and the user's high-level implicit intent. This ensures that SQL scripts are executable while also facilitating user understanding. Combined with an implicit knowledge base and a multi-view interactive interface, it supports users in reviewing and correcting at the knowledge level rather than just the code level, without requiring modifications to the underlying code or repeated attempts to change prompts. This allows for precise and controllable editing of specific knowledge points.

[0010] Preferably, in the implicit knowledge base construction module, the construction of the data dictionary includes: generating column descriptions based on the database schema and sample data; and supporting user-interactive supplementation of domain-specific convention descriptions in the data dictionary, including field meanings, abbreviation explanations, or specific calculation rules.

[0011] Preferably, in the implicit knowledge base construction module, the step of parsing historical SQL scripts into code fragments and classifying the code fragments according to preset implicit knowledge types includes: using an SQL parser to convert each historical SQL script into an abstract syntax tree; Extract independent code snippets based on the clause type of the abstract syntax tree, and classify each code snippet into the corresponding implicit knowledge type.

[0012] More preferably, the implicit knowledge types include: computational knowledge, conditional knowledge, relational knowledge, dimensional knowledge, and output knowledge; The computational knowledge corresponds to the expression in the SELECT or ORDER BY clause, and is used to reflect a specific calculation formula; The conditional knowledge corresponds to the filtering conditions in the WHERE or HAVING clause, and is used to reflect specific filtering logic; The relational knowledge corresponds to the join relationships between tables in the FROM-JOIN clause; The dimensional knowledge corresponds to the data aggregation granularity in the GROUP BY clause; The output knowledge corresponds to the result presentation method in the ORDER BY, LIMIT, or DISTINCT clause.

[0013] Preferably, in the interactive generation and modification module, a large language model is used to decompose the received natural language query instructions input by the user into keywords; Calculate the vector cosine similarity between the keywords and the natural language descriptions in the implicit knowledge base, and filter out knowledge fragments with similarity higher than a threshold; The search results are reordered using a large language model, and the most relevant implicit knowledge is selected for subsequent SQL script generation.

[0014] Preferably, the multiple views include: The script view is used to visualize the structure of SQL scripts in the form of a data flow diagram, including input tables, subqueries and their dependencies; and supports highlighting the code snippets corresponding to the selected knowledge items in response to user actions in the knowledge view. The knowledge view is used to break down SQL scripts into several implicit knowledge items and display them in natural language. Each knowledge item is associated with a corresponding code snippet and execution metadata. It also supports responding to user clicks on knowledge items by highlighting the corresponding code snippet in the script view and dynamically loading and displaying the intermediate execution results under the constraints of this knowledge item in the data view. The data view is used to display the complete query results and supports automatically switching to display the intermediate execution results of the corresponding subquery or knowledge item in response to the user's operation of selecting a subquery node in the script view or selecting a knowledge item in the knowledge view.

[0015] Preferably, the script view constructs a subquery dependency graph based on a recursive parsing abstract syntax tree, and calculates and displays intermediate execution results in the order of dependencies to ensure the correct execution order of intermediate execution results in the data view.

[0016] Preferably, the execution metadata associated with each knowledge item in the knowledge view includes: For computational knowledge, sample values are displayed; For conditional knowledge, display the number of rows before and after filtering; For relational knowledge, display the specifications of the results table; For dimensional knowledge, the results after grouping are displayed; For the output knowledge, display a preview of the results.

[0017] Preferably, the data view adopts an asynchronous loading strategy. When the user switches to view different subqueries or the final result, the background executes the query in the order of dependency and returns the result to ensure the smoothness of the interface interaction.

[0018] On the other hand, the present invention also provides a visual method for interactive SQL generation and modification that integrates implicit knowledge, comprising the following steps: Step 1: Construct a data dictionary, parse historical SQL scripts into code fragments, classify the code fragments according to preset implicit knowledge types, use a large language model combined with the data dictionary to generate natural language descriptions for each code fragment, and vectorize the natural language descriptions and corresponding code fragments and store them in the implicit knowledge base. Step 2: Receive the user's natural language query command and perform intent parsing; retrieve matching implicit knowledge from the implicit knowledge base; inject the implicit knowledge, associated historical SQL scripts, and data dictionary into the large language model to generate an initial SQL script; parse the initial SQL script into a structured representation and present it through multiple views; respond to the user's operations on knowledge items in the multiple views, automatically update the corresponding code snippets according to the user's natural language correction commands, and propagate the changes to other parts that depend on this code snippet; finally, generate the corrected SQL script and refresh the multi-view display.

[0019] Compared with the prior art, the beneficial effects of the present invention include at least the following: (1) Implicit knowledge enhancement: By automatically extracting and reusing historical SQL scripts, code fragments are classified according to preset implicit knowledge types, so that the generated SQL code can accurately reflect the domain conventions and calculation logic not explicitly stated by the user, significantly reducing the burden on users to write detailed prompts.

[0020] (2) Enhanced interpretability and alignment: By transforming obscure SQL code snippets into knowledge items in natural language form through a visual knowledge view, and providing intermediate result previews, users can quickly understand how the model translates its intent, effectively bridging the knowledge alignment gap.

[0021] (3) Efficient iterative correction: It supports direct editing at the knowledge level, allowing users to accurately correct erroneous assumptions in the model without delving into the underlying SQL syntax. User research shows that the time to complete SQL writing tasks using this system is significantly shorter than that of traditional baseline tools, and the user's perceived workload is also lower. Attached Figure Description

[0022] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.

[0023] Figure 1 A schematic diagram of the structure of a visual system for interactive SQL generation and modification that incorporates implicit knowledge.

[0024] Figure 2 A visual interface for building a data dictionary.

[0025] Figure 3 A visual interface for interactive generation and modification modules.

[0026] Figure 4 This is a script view.

[0027] Figure 5 This is a knowledge view.

[0028] Figure 6 For data view. Detailed Implementation

[0029] To make the objectives, technical solutions, and advantages of this invention clearer, the following description is provided in conjunction with the accompanying drawings and... The embodiments further illustrate the present invention in detail. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of protection of the invention.

[0030] The inventive concept of this invention addresses the shortcomings of existing technologies in terms of "knowledge utilization" (how to automatically capture implicit domain conventions from historical data) and "interaction efficiency" (how to support intuitive verification and correction at the knowledge level rather than the code level). The embodiment provides a visual system for interactive SQL generation and modification that integrates implicit knowledge. By constructing a visual system including an implicit knowledge base building module and an interactive generation and modification module, implicit knowledge items are automatically extracted and categorized from historical SQL scripts to build an implicit knowledge base, enhancing the code generation capabilities of LLM. Simultaneously, a multi-view interactive interface is provided, mapping the generated SQL code back to implicit knowledge items in natural language form, supporting user review and correction at the knowledge level rather than just the code level. This solves the problems of low accuracy, poor interpretability, and difficulty in correction of existing NL-to-SQL tools when processing implicit knowledge, significantly improving the efficiency and accuracy of SQL writing and making it more practical.

[0031] like Figure 1 As shown in the embodiment, the interactive SQL generation and modification visual system that integrates implicit knowledge includes an implicit knowledge base construction module and an interactive generation and modification module.

[0032] 1. Implicit Knowledge Base Construction Module: Data dictionary construction: Generate column descriptions based on database schema and sample data, and support user interactive supplementation of dataset-specific conventions (such as the meaning of specific abbreviations).

[0033] Historical script parsing: Using an abstract syntax tree (AST), the SQL scripts written by users in the past are parsed into code fragments.

[0034] Knowledge Classification: The parsed code snippets are categorized into five types of implicit knowledge: Calculation knowledge: This refers to expressions in the SELECT or ORDER BY clause that reflect specific calculation formulas.

[0035] Condition knowledge: This refers to the filtering conditions in the WHERE or HAVING clauses, reflecting specific filtering logic.

[0036] Relation knowledge: Involves the FROM-JOIN clause, reflecting the connection relationships between tables.

[0037] Dimensional knowledge: Involves the GROUP BY clause and reflects the granularity of data aggregation.

[0038] Output knowledge: This involves ORDER BY / LIMIT / DISTINCT clauses and reflects how the results are presented.

[0039] Natural Language Description Generation: LLM combined with a data dictionary is used to generate natural language descriptions for each code snippet and complete script, forming an implicit knowledge base.

[0040] 2. Interactive generation and modification module: 2.1 Online Knowledge Retrieval and Code Generation: Intent matching: When a user enters a natural language query command, the system retrieves historical script-level and fragment-level knowledge related to the semantics of the command by calculating cosine similarity.

[0041] Keyword decomposition and reordering: User commands are decomposed into keywords, relevant knowledge fragments are matched, and LLM is used to filter and reorder the search results to select the most relevant implicit knowledge.

[0042] Injection generation: The selected implicit knowledge, relevant historical scripts, and data dictionary are injected into the prompts of the LLM to guide the LLM to generate SQL scripts that conform to the user's implicit intent.

[0043] 2.2 Multi-view interactive generation and modification: (1) Script View: The generated SQL script structure is visualized using a data flow graph. It displays the input table, subqueries, and their dependencies. It supports implicit knowledge contained in subqueries and shows the intermediate execution results of each step. A subquery dependency graph is constructed by recursively parsing the AST, ensuring the correct execution order of intermediate result calculations.

[0044] (2) Knowledge View: Knowledge Explicitization: The generated SQL code is parsed into the five categories of implicit knowledge items mentioned above and displayed in a flat hierarchical list.

[0045] Metadata Display: Provides metadata-assisted validation for each knowledge item. For example, it displays sample values for calculated knowledge, displays the number of rows before and after filtering for conditional knowledge, and outputs a preview of the knowledge display results.

[0046] Two-way linkage: Clicking on a knowledge item will highlight the corresponding code in the script view and display the intermediate data results under the constraints of that knowledge item in the data view.

[0047] Users can directly perform "modify," "add," or "delete" operations on knowledge items. When a user selects a specific knowledge item and enters a natural language correction command, the system automatically retrieves relevant knowledge and updates the corresponding SQL code snippet, while automatically propagating changes to maintain the syntactic integrity of the query.

[0048] (3) Data View: Display the complete query results or the intermediate execution results of the selected knowledge item / subquery, using an asynchronous loading strategy to ensure smooth interaction.

[0049] To clearly demonstrate the operation of the interactive SQL generation and modification visual system integrating implicit knowledge provided by this invention, the following uses a toxicology database and its corresponding historical query logs as an example to explain the best implementation of this invention in detail. This embodiment shows how the system helps users complete the task of querying "the number of non-carcinogenic molecules and carcinogenic molecules with the least common elements" using natural language.

[0050] like Figure 1 As shown, the system workflow in this embodiment includes two main stages: an implicit knowledge base construction module and an interactive generation and modification module.

[0051] I. Implicit Knowledge Base Construction Module Before users perform interactive queries, the system first preprocesses the target database and historical query scripts to build an implicit knowledge base.

[0052] Step 1: Data dictionary construction and enhancement. The system reads the schema information of the toxicology database, which includes four tables: Atom, Molecule, Bond, and Connected. The system first generates preliminary column descriptions based on the table structure and sample data.

[0053] Interactive supplement: To capture domain-specific implicit conventions, users can do so through the data dictionary interface ( Figure 2Fine-tuning is then performed. For example, for the label column in the Molecule table, the system combines sample data ('+' / '-') and aliases appearing in historical queries (such as flag_carcinogenic) to generate a description: "Indicates carcinogenicityclassification. '+' indicates carcinogenic, '-' indicates non-carcinogenic." This description is saved and serves as the basis for subsequent LLM interpretation of the field's meaning.

[0054] Step 2: Historical SQL Script Parsing and Knowledge Classification Based on Abstract Syntax Tree (AST). The system reads a collection of high-quality SQL scripts written by the user throughout history. To achieve fine-grained knowledge reuse, the system uses the SQLGlot parser to parse each historical SQL script into an AST and decomposes it into independent code fragments based on the SQL clause type. The system classifies these fragments into five implicit knowledge types: computational knowledge, conditional knowledge, relational knowledge, dimensional knowledge, and output knowledge.

[0055] Step 3: Knowledge Description Generation and Vectorization System uses LLM to generate natural language descriptions (i.e., "implicit knowledge") for each code snippet extracted in Step 2, using the data dictionary from Step 1 as context.

[0056] The system uses the Sentence Transformers model to transform these natural language descriptions, corresponding code snippets, and the original complete scripts into vector embeddings, which are then stored in a vector database to build an implicit knowledge base.

[0057] II. Interactive Generation and Modification Module This section describes how users can use the system to complete a query: "Show me the number of non-carcinogenic molecules and the number of carcinogenic molecules with the least common elements."

[0058] Step 4: User Command Parsing and Knowledge Retrieval. The user enters the above natural language command in the system input box ( Figure 3 The system performs the following operations: Keyword decomposition: Using LLM, complex instructions are decomposed into key phrases to extract "non-carcinogenic molecules" and "least common elements".

[0059] Hybrid retrieval: The system calculates the cosine similarity between keywords and descriptions in the knowledge base.

[0060] For “non-carcinogenic”, the system retrieved the historical knowledge description “Filter for non-carcinogenic molecules” and its corresponding code WHERE label = '-'.

[0061] For “least common elements”, the system retrieves the logic defined in the historical script: it usually involves joining the Atom and Molecule tables, counting by grouping elements, and taking the one with the smallest count (the code snippet contains ORDER BY count(...) ASC LIMIT 1).

[0062] Code generation: The system assembles the retrieved knowledge fragments, relevant historical scripts, and data dictionary into a prompt and injects it into the LLM to generate the initial SQL script.

[0063] Step 5: Visual presentation and script review. The generated initial SQL script is parsed by the system and presented visually on the interface: Script View ( Figure 4 The system recursively parses the generated SQL's Abstract Syntax Tree (AST) to construct a subquery dependency graph, and then uses a data flow graph to visually present the overall structure of the SQL script. This data flow graph unfolds according to the logical execution order from left to right. Leftmost (Source Start): Displays the underlying input table nodes (e.g., the basic physical tables like Atom and Molecule in this example), representing the initial source of the data. Middle Section (Subqueries): Extends to the right, displaying intermediate subquery nodes at various levels. These nodes represent intermediate data processing steps (such as filtering, aggregation, table joins, etc.). Rightmost (Output End): Displays the final query result aggregation node. Directed connections between nodes clearly demonstrate the data flow and hierarchical dependencies, ensuring the correct execution order of intermediate result calculations and helping users quickly understand the complex SQL logic.

[0064] Knowledge View ( Figure 5When a user clicks on the "non_carci_mol" subquery card, the knowledge view on the right expands, explicitly listing the implicit knowledge behind that subquery. This is visualized as a flat, hierarchical list. Simultaneously, the knowledge view displays execution metadata (such as sample values, row counts before and after filtering, etc.) for each knowledge item to assist the user in verification.

[0065] Data View ( Figure 6 User click Figure 4 When a subquery is executed, the SQL query result of the subquery will be selected simultaneously.

[0066] Verification: The user sees a conditional knowledge item titled "Non-carcinogenic." Below it, metadata is displayed: "Filtered 9111 rows." Clicking this item highlights the code `WHERE T2.label = '-'` in the script view. Based on this, the user confirms that the model correctly understands the definition of "non-carcinogenic."

[0067] Step 6: Problem Identification and Interactive Correction. The user views the final result in the data view and finds that the result is displayed as (1, 0), meaning that only one non-carcinogenic molecule was counted, which is inconsistent with the user's domain knowledge (it is expected that there should be multiple).

[0068] Location error: The user clicked on the upstream subquery node least_com_el. In the knowledge view, the user noticed an output knowledge item: "Order, Limit - Order by cnt and limit to 1 result". Metadata shows that this step only output 1 record (i.e., only the least common element was selected).

[0069] Cause analysis: This is because the system retrieved and reused the logic of "finding a least common element" from the historical script (which included LIMIT 1), while the context of the current task implicitly requires considering all the least common elements in the same category.

[0070] The user selects the "Order, Limit" knowledge item in the knowledge view, switches to "Modify" mode, and enters the natural language correction command in the input box: "Consider multiple least common elements".

[0071] Step 7: Change Propagation and Result Update After receiving the correction instruction, the system performs the following processing: Logical update: The system retrieves new logic or uses LLM inference to replace the original ORDER BY ... LIMIT 1 with ranking-based filtering logic (such as using the RANK() window function or HAVING count = (SELECT MIN(count)...)), thereby removing the hard quantity limit.

[0072] Change propagation: The system automatically updates the AST and regenerates the code for all downstream subqueries (non_carci_mol and carci_mol) that depend on least_com_el, ensuring the correctness of variable references and table join syntax.

[0073] Result verification: After refreshing the interface, the "Order, Limit" items in the knowledge view disappeared, replaced by the new filtering logic. The data view calculated and updated the final result to (4, 3) in real time, which met the user's expectations.

[0074] Through the above implementation methods, this system realizes the transformation of fuzzy natural language instructions into precise SQL queries, and through a visual knowledge alignment mechanism, enables users to review the model's inference process in a "white-box" manner and directly correct errors at the knowledge level, thereby efficiently generating high-quality SQL scripts.

[0075] The specific embodiments described above illustrate the technical solution and beneficial effects of the present invention in detail. It should be understood that the above description is only the most preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A visual system for interactive SQL generation and modification that integrates implicit knowledge, characterized in that, include: The implicit knowledge base construction module is used to build a data dictionary, parse historical SQL scripts into code fragments, classify the code fragments according to preset implicit knowledge types, generate natural language descriptions for each code fragment using a large language model combined with the data dictionary, and vectorize the natural language descriptions and corresponding code fragments and store them in the implicit knowledge base. The interactive generation and modification module is used to receive natural language query commands input by users and perform intent parsing, retrieve matching implicit knowledge from the implicit knowledge base, inject implicit knowledge, associated historical SQL scripts and data dictionaries into the large language model, and generate the initial SQL script. The initial SQL script is parsed into a structured representation and presented in a multi-view format. In response to the user's editing operations on knowledge items in the multi-view, the corresponding code snippets are automatically updated according to the user's natural language correction instructions, and the changes are propagated to other parts that depend on this code snippet. Finally, the corrected SQL script is generated and the multi-view display is refreshed.

2. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 1, characterized in that, The implicit knowledge base construction module includes building a data dictionary, which involves generating column descriptions based on the database schema and sample data, and supporting user-interactive supplementation of domain-specific convention descriptions in the data dictionary, including field meanings, abbreviation explanations, or specific calculation rules.

3. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 1, characterized in that, In the implicit knowledge base construction module, the process of parsing historical SQL scripts into code fragments and classifying the code fragments according to preset implicit knowledge types includes: using an SQL parser to convert each historical SQL script into an abstract syntax tree; Extract independent code snippets based on the clause type of the abstract syntax tree, and classify each code snippet into the corresponding implicit knowledge type.

4. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 3, characterized in that, The implicit knowledge types mentioned include: computational knowledge, conditional knowledge, relational knowledge, dimensional knowledge, and output knowledge; The computational knowledge corresponds to the expression in the SELECT or ORDER BY clause, and is used to reflect a specific calculation formula; The conditional knowledge corresponds to the filtering conditions in the WHERE or HAVING clause, and is used to reflect specific filtering logic; The relational knowledge corresponds to the join relationships between tables in the FROM-JOIN clause; The dimensional knowledge corresponds to the data aggregation granularity in the GROUP BY clause; The output knowledge corresponds to the result presentation method in the ORDER BY, LIMIT, or DISTINCT clause.

5. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 1, characterized in that, In the interactive generation and modification module, a large language model is used to decompose the natural language query commands input by the user into keywords; Calculate the vector cosine similarity between the keywords and the natural language descriptions in the implicit knowledge base, and filter out knowledge fragments with similarity higher than a threshold; The search results are reordered using a large language model, and the most relevant implicit knowledge is selected for subsequent SQL script generation.

6. The visual system for interactive SQL generation and modification incorporating implicit knowledge according to claim 1, characterized in that, The multiple views include: The script view is used to visualize the structure of SQL scripts in the form of a data flow diagram, including input tables, subqueries and their dependencies; and supports highlighting the code snippets corresponding to the selected knowledge items in response to user actions in the knowledge view. The knowledge view is used to break down SQL scripts into several implicit knowledge items and display them in natural language. Each knowledge item is associated with a corresponding code snippet and execution metadata. It also supports responding to user clicks on knowledge items by highlighting the corresponding code snippet in the script view and dynamically loading and displaying the intermediate execution results under the constraints of this knowledge item in the data view. The data view is used to display the complete query results and supports automatically switching to display the intermediate execution results of the corresponding subquery or knowledge item in response to the user's operation of selecting a subquery node in the script view or selecting a knowledge item in the knowledge view.

7. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 6, characterized in that, The script view constructs a subquery dependency graph based on a recursive parsing abstract syntax tree, and calculates and displays intermediate execution results in the order of dependencies to ensure the correct execution order of intermediate execution results in the data view.

8. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 6, characterized in that, The execution metadata associated with each knowledge item in the knowledge view includes: For computational knowledge, sample values are displayed; For conditional knowledge, display the number of rows before and after filtering; For relational knowledge, display the specifications of the results table; For dimensional knowledge, the results after grouping are displayed; For the output knowledge, display a preview of the results.

9. The interactive SQL generation and modification visual system integrating implicit knowledge according to claim 1, characterized in that, The data view uses an asynchronous loading strategy. When a user switches between different subqueries or the final result, the background executes the query in the order of dependency and returns the result to ensure smooth interface interaction.

10. A visual method for interactive SQL generation and modification that integrates implicit knowledge, characterized in that, The visual system for interactive SQL generation and modification based on implicit knowledge as described in any one of claims 1 to 9 includes the following steps: Step 1: Construct a data dictionary, parse historical SQL scripts into code fragments, classify the code fragments according to preset implicit knowledge types, use a large language model combined with the data dictionary to generate natural language descriptions for each code fragment, and vectorize the natural language descriptions and corresponding code fragments and store them in the implicit knowledge base. Step 2: Receive the user's natural language query command and perform intent parsing; retrieve matching implicit knowledge from the implicit knowledge base; inject the implicit knowledge, associated historical SQL scripts, and data dictionary into the large language model to generate an initial SQL script; parse the initial SQL script into a structured representation and present it through multiple views; respond to the user's operations on knowledge items in the multiple views, automatically update the corresponding code snippets according to the user's natural language correction commands, and propagate the changes to other parts that depend on this code snippet; finally, generate the corrected SQL script and refresh the multi-view display.

Citation Information

Patent Citations

Visualization method driven by SQL generation based on large model preference learning
CN120030036A

Patent Information

AI Technical Summary

Abstract

Description

Patent Citations

Visualization method driven by SQL generation based on large model preference learning