SQL statement rewriting method and apparatus, and computing device cluster

WO2026137848A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-07-31
Publication Date: 2026-07-02


Abstract

An SQL statement rewriting method, comprising: acquiring a first SQL statement to be rewritten; on the basis of the similarities between the first SQL statement and a plurality of rewriting cases, selecting at least one first rewriting case from among the plurality of rewriting cases, wherein one rewriting case comprises an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement; on the basis of the similarity between the first SQL statement and the at least one first rewriting case and association relationships between the at least one first rewriting case and a plurality of rewriting rules, evaluating the plurality of rewriting rules to obtain respective evaluation results of the plurality of rewriting rules, wherein the rewriting rules are rules used for performing query rewriting on an SQL statement; and on the basis of at least one first rewriting rule among the plurality of rewriting rules, rewriting the first SQL statement to obtain a second SQL statement, wherein the at least one first rewriting rule is a rule, the evaluation result of which satisfies a requirement, among the plurality of rewriting rules. The method can improve the rewriting performance.

Description

A method, apparatus, and computing device cluster for rewriting SQL statements

[0001] This application claims priority to Chinese Patent Application No. 2024119827185, filed on December 27, 2024, entitled "A Method, Apparatus and Computing Device Cluster for Rewriting SQL Statements", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of artificial intelligence (AI) technology, and in particular to a method, apparatus and computing device cluster for rewriting SQL statements.

Background Technology

[0003] Query rewriting, also known as query transformation or query modification, is a technique for optimizing SQL statements. By rewriting SQL statements, complex statements can be transformed into more concise and easier-to-understand forms. This process not only simplifies the structure of the SQL statement but also reduces data scanning and resource consumption, thereby accelerating query execution. The key to query rewriting lies in the equivalent transformation of execution logic, ensuring that the rewritten SQL statement is functionally identical to the original statement, or, in some cases, further improves execution efficiency.

[0004] However, query rewriting requires a massive number of rewrite rules, and the potential combinations of these rules grow exponentially. This makes identifying valid rewrite rules from numerous combinations extremely challenging and time-consuming. Therefore, how to efficiently identify and apply valid rewrite rules is a pressing technical problem that needs to be solved in the field of SQL statement rewriting.

Summary of the Invention

[0005] This application provides a method, apparatus, computing device cluster, computer storage medium, and computer product for rewriting SQL statements, which can efficiently identify and apply effective rewriting rules when rewriting SQL statements.

[0006] In a first aspect, this application provides an SQL statement rewriting method, comprising: obtaining a first SQL statement to be rewritten; selecting at least one first rewriting case from multiple rewriting cases based on the similarity between the first SQL statement and the multiple rewriting cases, wherein a rewriting case includes an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement; evaluating multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewriting case, and the association between the at least one first rewriting case and the multiple rewriting rules, to obtain an evaluation result for each of the multiple rewriting rules, wherein a rewriting rule is a rule used to perform query rewriting on an SQL statement; and rewriting the first SQL statement based on at least one first rewriting rule among the multiple rewriting rules to obtain a second SQL statement, wherein the at least one first rewriting rule is a rule, among the multiple rewriting rules, whose evaluation result meets the requirements. For example, the at least one first rewriting rule is the top N rewriting rules after the multiple rewriting rules are sorted by evaluation result, where N is a positive integer.

[0007] In this way, by fully utilizing rewrite cases to guide the selection of rewrite rules, the compatibility between the selected rewrite rules and the SQL statements to be rewritten is improved, thereby enhancing query rewrite performance. Furthermore, since rewrite cases provide abundant existing query rewrite knowledge, this knowledge can be fully utilized to further improve query rewrite performance. Moreover, rewrite rule selection can be performed in different scenarios without retraining the model, demonstrating high robustness.

[0008] In one possible implementation, the method further includes: obtaining a query rewriting strategy for the first SQL statement based on the first SQL statement and at least one first rewrite case, wherein the query rewriting strategy instructs the first SQL statement to be rewritten with reference to the first rewrite case. Then, rewriting the first SQL statement to obtain a second SQL statement based on at least one first rewrite rule from a plurality of rewrite rules includes: selecting a second rewrite rule from the first rewrite rules based on the query rewriting strategy; and rewriting the first SQL statement based on the second rewrite rule to obtain the second SQL statement. In this way, the query rewriting strategy explains how to rewrite SQL statements with reference to rewrite cases. Compared with the original rewrite cases, the query rewriting strategy is easier to understand, thus allowing for the selection of more accurate rewrite rules and improving query rewriting performance.

[0009] In one possible implementation, a query rewriting strategy for the first SQL statement is obtained based on the first SQL statement and at least one first rewrite case. This includes: adding the first SQL statement and at least one first rewrite case to a first prompt template to obtain a first prompt, which instructs the neural network model to generate a query rewriting strategy for rewriting the first SQL statement by referring to the first rewrite cases; and inputting the first prompt into the neural network model to obtain the query rewriting strategy. In this way, the query rewriting strategy can be obtained by guiding the neural network model to perform inference through the prompt.

[0010] In one possible implementation, selecting a second rewriting rule from the first rewriting rules based on the query rewriting strategy includes: performing multiple batches of filtering on the first rewriting rules based on the query rewriting strategy, wherein, in the filtering of any given batch, the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch are input into a neural network model to obtain the filtering results for that batch. Because the query rewriting strategy is easier to understand than the original rewrite cases, more accurate rewriting rules can be selected, and filtering in batches can also improve filtering efficiency.

[0011] In one possible implementation, inputting the query rewriting strategy, the filtering results obtained before a given batch, and the rules to be filtered in that batch into the neural network model includes: adding the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch to a second prompt template to obtain a second prompt, wherein the second prompt instructs the neural network model to filter the rules to be filtered in that batch with reference to the query rewriting strategy; and inputting the second prompt into the neural network model to obtain the filtering results for that batch. In this way, the filtering results can be obtained by guiding the neural network model to perform inference through the prompt.
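The batch-by-batch filtering described above can be sketched as follows. This is only an illustrative sketch: the prompt wording, the rule names, the comma-separated reply format, and the `stub_model` function (a deterministic stand-in for the neural network model) are all assumptions and are not taken from this application.

```python
# Hypothetical second prompt template; the actual wording is not given in the text.
SECOND_PROMPT_TEMPLATE = (
    "Query rewriting strategy:\n{strategy}\n"
    "Rules kept so far:\n{previous}\n"
    "Rules to filter in this batch:\n{batch}\n"
    "Return the names of the rules in this batch that fit the strategy, comma-separated."
)

def filter_in_batches(strategy, candidate_rules, batch_size, model):
    """Filter candidate rules batch by batch, feeding earlier results into each prompt."""
    kept = []
    for start in range(0, len(candidate_rules), batch_size):
        batch = candidate_rules[start:start + batch_size]
        prompt = SECOND_PROMPT_TEMPLATE.format(
            strategy=strategy,
            previous=", ".join(kept) if kept else "(none)",
            batch=", ".join(batch),
        )
        reply = model(prompt)
        names = [name.strip() for name in reply.split(",") if name.strip()]
        # Keep only names that belong to this batch, guarding against invented rules.
        kept.extend(name for name in names if name in batch)
    return kept

def stub_model(prompt):
    """Stand-in for the model: keeps rules whose name appears in the strategy text."""
    lines = prompt.split("\n")
    strategy_text = lines[1]
    batch = lines[5].split(", ")
    return ", ".join(rule for rule in batch if rule in strategy_text)

strategy = "Apply FILTER_PUSHDOWN first, then drop the redundant sort with SORT_REMOVE."
rules = ["FILTER_PUSHDOWN", "JOIN_COMMUTE", "SORT_REMOVE", "AGGREGATE_MERGE"]
second_rules = filter_in_batches(strategy, rules, batch_size=2, model=stub_model)
```

In a real deployment the `model` argument would wrap a call to whichever neural network model is used; passing it in as a callable keeps the batching logic independent of that choice.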

[0012] In one possible implementation, rewriting the first SQL statement based on the second rewriting rule to obtain the second SQL statement includes: grouping the second rewriting rules based on the operators in the first SQL statement to obtain at least one rule group; sorting the rules within each rule group based on the query rewriting strategy; sorting the rule groups relative to one another, at rule-group granularity, based on the sorted rules in each rule group; and rewriting the first SQL statement based on the sorted rule groups. Sorting the rewriting rules step by step reduces hallucination by the neural network model used in the sorting process and improves the accuracy of the rewriting rule ordering.

[0013] In one possible implementation, multiple rewriting rules are evaluated based on the similarity between the first SQL statement and at least one first rewriting case, and the association between the at least one first rewriting case and multiple rewriting rules, to obtain evaluation results for each of the multiple rewriting rules. This includes: when there is an association between the third rewriting rule and the second rewriting case, the evaluation score for evaluating the third rewriting rule based on the second rewriting case is the similarity score between the second rewriting case and the first SQL statement, where the third rewriting rule is any one of the multiple rewriting rules, and the second rewriting case is any one of the at least one first rewriting case; when there is no association between the third rewriting rule and the second rewriting case, the evaluation score for evaluating the third rewriting rule based on the second rewriting case is the initial score of the third rewriting rule; wherein, the evaluation result of the third rewriting rule is calculated based on the evaluation scores of evaluating the third rewriting rule for each of the first rewriting cases.

[0014] In one possible implementation, after obtaining the second SQL statement, the method further includes: outputting a query rewriting suggestion for the second SQL statement and the second SQL statement, the query rewriting suggestion including the at least one first rewriting rule.

[0015] In one possible implementation, after obtaining the second SQL statement, the method further includes: if the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement, performing a query on the database based on the second SQL statement and returning the query result to the user; if the execution cost of the second SQL statement is greater than the execution cost of the first SQL statement, performing a query on the database based on the first SQL statement and returning the query result to the user.
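The cost-based choice between the two statements can be sketched as below. The `estimate_cost` callable and the toy cost table are assumptions for illustration; the text does not specify how execution cost is obtained.

```python
def choose_statement(first_sql, second_sql, estimate_cost):
    """Return the statement to execute: the rewrite, unless it costs more."""
    if estimate_cost(second_sql) <= estimate_cost(first_sql):
        return second_sql
    return first_sql

# Toy cost table standing in for a real optimizer's estimates (assumed values).
toy_costs = {
    "SELECT DISTINCT id FROM t": 120.0,
    "SELECT id FROM t": 80.0,
}
chosen = choose_statement(
    "SELECT DISTINCT id FROM t", "SELECT id FROM t", toy_costs.__getitem__
)
```

Note that a tie goes to the second (rewritten) statement, matching the "less than or equal to" condition above.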

[0016] In a second aspect, this application provides an SQL statement rewriting apparatus, comprising: an acquisition module and a processing module. The acquisition module is used to acquire a first SQL statement to be rewritten. The processing module is used to select at least one first rewriting case from multiple rewriting cases based on the similarity between the first SQL statement and the multiple rewriting cases, wherein a rewriting case includes an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement. The processing module is further used to evaluate multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewriting case, and the association between the at least one first rewriting case and the multiple rewriting rules, to obtain an evaluation result for each of the multiple rewriting rules, wherein a rewriting rule is a rule used to perform query rewriting on an SQL statement. The processing module is further used to rewrite the first SQL statement based on at least one first rewriting rule among the multiple rewriting rules to obtain a second SQL statement, wherein the at least one first rewriting rule is a rule, among the multiple rewriting rules, whose evaluation result meets the requirements.

[0017] In one possible implementation, the at least one first rewriting rule is the top N rewriting rules after the multiple rewriting rules are sorted by evaluation result, where N is a positive integer.

[0018] In one possible implementation, the processing module is further configured to: obtain a query rewriting strategy for the first SQL statement based on the first SQL statement and at least one first rewriting case, wherein the query rewriting strategy instructs the first SQL statement to be rewritten with reference to the first rewriting case. In this case, when the processing module rewrites the first SQL statement based on at least one first rewriting rule among multiple rewriting rules to obtain a second SQL statement, it is specifically configured to: select a second rewriting rule from the first rewriting rules based on the query rewriting strategy; and rewrite the first SQL statement based on the second rewriting rule to obtain the second SQL statement.

[0019] In one possible implementation, when the processing module obtains the query rewriting strategy for the first SQL statement based on the first SQL statement and at least one first rewriting case, it specifically performs the following steps: adding the first SQL statement and at least one first rewriting case to the first prompt template to obtain the first prompt, which instructs the neural network model to generate a query rewriting strategy for rewriting the first SQL statement by referring to the first rewriting case; and inputting the first prompt into the neural network model to obtain the query rewriting strategy.

[0020] In one possible implementation, when the processing module selects the second rewriting rule from the first rewriting rule based on the query rewriting strategy, it is specifically used to: perform multiple batch filtering on the first rewriting rule based on the query rewriting strategy, wherein, in any batch filtering process, the query rewriting strategy, the filtering results obtained before any batch, and the rule to be filtered in any batch are input into the neural network model to obtain the filtering results of any batch.

[0021] In one possible implementation, when the processing module inputs the query rewriting strategy, the filtering results obtained before any batch, and the rules to be filtered in any batch into the neural network model, it specifically performs the following: adding the query rewriting strategy, the filtering results obtained before any batch, and the rules to be filtered in any batch into a second prompt template to obtain a second prompt, which instructs the neural network model to filter the rules to be filtered in any batch with reference to the query rewriting strategy; and inputting the second prompt into the neural network model to obtain the filtering results for any batch.

[0022] In one possible implementation, when the processing module rewrites the first SQL statement based on the second rewriting rule to obtain the second SQL statement, it specifically performs the following: grouping the second rewriting rules based on the operators in the first SQL statement to obtain at least one rule group; sorting the rules within each rule group based on the query rewriting strategy; sorting the rule groups relative to one another, at rule-group granularity, based on the sorted rules in each rule group; and rewriting the first SQL statement based on the sorted rule groups.

[0023] In one possible implementation, when the processing module evaluates multiple rewriting rules based on the similarity between the first SQL statement and at least one first rewriting case, and the association between the at least one first rewriting case and multiple rewriting rules, to obtain the evaluation result of the rewriting rules, specifically: when there is an association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the similarity score between the second rewriting case and the first SQL statement, where the third rewriting rule is any one of the multiple rewriting rules, and the second rewriting case is any one of the at least one first rewriting case; when there is no association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the initial score of the third rewriting rule; wherein, the evaluation result of the third rewriting rule is calculated based on the evaluation scores of the third rewriting rule evaluated separately for all the first rewriting cases.

[0024] In one possible implementation, after obtaining the second SQL statement, the processing module is further configured to: output a query rewriting suggestion for the second SQL statement and the second SQL statement, the query rewriting suggestion including the at least one first rewriting rule.

[0025] In one possible implementation, after obtaining the second SQL statement, the processing module is further configured to: perform a query on the database based on the second SQL statement if the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement, and return the query result to the user; and perform a query on the database based on the first SQL statement if the execution cost of the second SQL statement is greater than the execution cost of the first SQL statement, and return the query result to the user.

[0026] In a third aspect, this application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method described in the first aspect or any possible implementation of the first aspect.

[0027] In a fourth aspect, this application provides a computer-readable storage medium including computer program instructions that, when executed by a computing device cluster, cause the computing device cluster to perform the method described in the first aspect or any possible implementation thereof. For example, the computing device cluster may include one or more computing devices.

[0028] In a fifth aspect, this application provides a computer program product containing instructions that, when executed by a computing device cluster, cause the computing device cluster to perform the method described in the first aspect or any possible implementation thereof. For example, the computing device cluster may include one or more computing devices.

[0029] It is understood that the beneficial effects of the second to fifth aspects mentioned above can be found in the relevant descriptions in the first aspect mentioned above, and will not be repeated here.

Attached Figure Description

[0030] Figure 1 is a schematic diagram of a technical concept for SQL statement rewriting provided in an embodiment of this application;

[0031] Figure 2 is a schematic diagram of another technical concept for SQL statement rewriting provided in an embodiment of this application;

[0032] Figure 3 is a schematic diagram of a prompt word template and a rewrite analysis example provided in an embodiment of this application;

[0033] Figure 4 is a schematic diagram of a prompt word template provided in an embodiment of this application;

[0034] Figure 5 is a schematic diagram summarizing the technical concept described in Figure 2;

[0035] Figure 6 is a schematic diagram of user interaction with a cloud computing platform provided in an embodiment of this application;

[0036] Figure 7 is a flowchart illustrating an SQL statement rewriting method provided in an embodiment of this application;

[0037] Figure 8 is a schematic diagram of the application scenario of the method shown in Figure 7;

[0038] Figure 9 is a schematic diagram of the structure of an SQL statement rewriting device provided in an embodiment of this application;

[0039] Figure 10 is a schematic diagram of the structure of a computing device provided in an embodiment of this application;

[0040] Figure 11 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of this application;

[0041] Figure 12 is a schematic diagram of another computing device cluster provided in an embodiment of this application;

[0042] Figure 13 is a schematic diagram of a prompt word template provided in an embodiment of this application.

Detailed Implementation

[0043] In this document, the term "and / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The symbol " / " in this document indicates that the related objects are in an "or" relationship; for example, A / B means A or B.

[0044] The terms "first" and "second," etc., used in the specification and claims herein are used to distinguish different objects, not to describe a specific order of objects. For example, "first response message" and "second response message," etc., are used to distinguish different response messages, not to describe a specific order of response messages.

[0045] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Rather, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0046] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.

[0047] First, the relevant technical terms involved in the technical solution provided in this application will be introduced.

[0048] (1) Rewrite rules

[0049] Rewrite rules are a set of guidelines or patterns used to transform original SQL statements into equivalent but more efficient SQL statements. These rules can be manually defined or generated through automated methods such as machine learning, depending on the specific circumstances; no limitation is imposed here. For example, rewrite rules might include: eliminating unnecessary DISTINCT keywords or replacing IN with EXISTS. The rule "eliminating unnecessary DISTINCT" means that if the query result is already unique, then using DISTINCT is redundant; the rule "replacing IN with EXISTS" means that when checking for at least one matching row, EXISTS is generally more efficient than IN.
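As a small illustration of the two example rules, the following sketch uses Python's built-in sqlite3 module to check that each rewritten statement returns the same rows as the original. The table names, columns, and data are invented for this example and do not come from this application; the sketch checks equivalence only and makes no claim about relative execution speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

def rows(sql):
    """Run a query and return its rows in sorted order for comparison."""
    return sorted(conn.execute(sql).fetchall())

# Rule: replace IN with EXISTS.
original_in = "SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders)"
rewritten_exists = (
    "SELECT name FROM customers c "
    "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)"
)

# Rule: eliminate unnecessary DISTINCT (id is the primary key, so rows are already unique).
original_distinct = "SELECT DISTINCT id, name FROM customers"
rewritten_plain = "SELECT id, name FROM customers"

in_equivalent = rows(original_in) == rows(rewritten_exists)
distinct_equivalent = rows(original_distinct) == rows(rewritten_plain)
```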

[0050] Next, the technical solution provided in this application will be introduced.

[0051] For example, in the field of SQL query optimization, learning-based query rewriting is a convenient method for rewriting SQL statements. This method constructs a policy tree to represent multiple equivalent forms of SQL statements and their rewriting order. The core of this method lies in using a Monte Carlo tree search algorithm to select the rewriting paths that may bring the greatest performance improvement in the policy tree, thus balancing query execution time and the frequency of rewriting operations. To more accurately capture the characteristics and optimization potential of SQL statements, this method employs a deep attention network to learn the complex relationships between SQL statements, rewriting rules, and accessed data. This network can identify key features of SQL statements, such as the tables, columns, data types, and SQL operators involved, and then analyze the correlation between these features and potential rewriting rules, thereby customizing the best optimization strategy for each SQL statement. Furthermore, this method uses a deep learning model to identify key features of SQL statements and analyze the association between these features and possible rewriting rules to fit the overall optimization gain of the SQL statement. Through a rewriting gain estimation network, this method can select multiple nodes with the highest total gain that have no ancestor-descendant relationship on the policy tree, expanding the policy tree in parallel. This not only improves the efficiency of query rewriting but also significantly enhances overall performance.

[0052] In addition, several other query rewriting methods have been developed by improving Monte Carlo search. Some methods prune the set of possible rewriting rules. Specifically, these methods prepare a set of candidate SQL statements and their corresponding high-quality rule sets. For the input SQL statements, they evaluate the similarity between SQL statements by the distance between their embedding vectors, and use the high-quality rule set of the most similar SQL statements as candidate rewriting rules for Monte Carlo search. Other methods represent the logical query plan as an equality saturation graph (E-Graph) and rewrite the E-Graph using Monte Carlo search, selecting the node sequence in the new E-Graph with the minimum total cost as the rewritten query statement.

[0053] While all of these query rewriting methods can rewrite SQL statements, their robustness is low because the models trained using these methods struggle to adapt to unfamiliar database schemas. Furthermore, in new scenarios, the model typically needs to be retrained on hundreds of new query rewriting examples, which is costly. Additionally, these methods fail to leverage existing rich query rewriting knowledge, such as database documentation, forum questions and answers, and rewriting examples, to improve query rewriting performance.

[0054] In view of this, the embodiments of this application provide a query rewriting method for SQL statements, which can make full use of the existing rich query rewriting knowledge, improve rewriting performance, and can perform rewriting rule filtering without retraining the model in different scenarios, thus exhibiting high robustness.

[0055] For example, Figure 1 illustrates a schematic diagram of a technical concept for rewriting SQL statements according to an embodiment of this application. As shown in Figure 1, the architecture for rewriting SQL statements mainly includes: an SQL statement acquisition part, a case filtering part, a rewriting rule evaluation part, a rewriting rule filtering part, and a query rewriting part. The SQL statement acquisition part is mainly used to acquire the SQL statement to be rewritten input by the user, or to acquire the SQL statement to be rewritten sent by a device, apparatus, component, application, or service, etc. For example, in this part, the SQL statement to be rewritten can be acquired through the SQL statement acquisition module 110.

[0056] The case filtering section primarily uses the similarity between the SQL statement to be rewritten and the rewrite cases in the rewrite case library to select, from n (n≥1) rewrite cases, k rewrite cases (or "first rewrite cases") related to the SQL statement to be rewritten. A rewrite case may contain an original SQL statement (i.e., the SQL statement before rewriting, or "example SQL statement") and the SQL statement after rewriting (i.e., the rewritten SQL statement). Of course, a rewrite case may also include the use case of the SQL statement and / or expert analysis (e.g., reasons for rewriting). In some embodiments, each rewrite case can be pre-embedded as a vector, and a vector index can be created. Then, the SQL statement to be rewritten is embedded as a vector, and the relevance between each rewrite case and the SQL statement to be rewritten is evaluated using methods such as vector similarity. Finally, the k most relevant rewrite cases are selected. k can be a hyperparameter. For example, in this section, the case retrieval module 120 can retrieve the k rewrite cases from the n rewrite cases. In some embodiments, the SQL statement to be rewritten and the rewrite cases in the rewrite case library can instead be input directly into a large language model (LLM) to obtain the similarity scores between the SQL statement and the rewrite cases, or to directly obtain the selected k rewrite cases, and so on.
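The vector-similarity retrieval described above can be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity is used as one possible vector similarity measure, the three-dimensional vectors are hand-written stand-ins for real embeddings, and a linear scan replaces the vector index.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, case_vectors, k):
    """Return the k (case_id, similarity) pairs most similar to the query vector."""
    scored = [
        (case_id, cosine_similarity(query_vec, vec))
        for case_id, vec in case_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings for n = 3 rewrite cases (assumed values, not produced by a real model).
case_vectors = {
    "A1": [1.0, 0.0, 0.0],
    "A2": [0.8, 0.6, 0.0],
    "A3": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]  # toy embedding of the SQL statement to be rewritten
top_cases = retrieve_top_k(query_vec, case_vectors, k=2)
```

The similarity scores returned alongside the case identifiers are the values that the rule evaluation step later reuses.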

[0057] The rewrite rule evaluation section is primarily used to evaluate m rewrite rules based on k selected rewrite cases, obtaining an evaluation score for each rule. For example, a large language model (LLM) 130 can be used to evaluate the association between each rewrite rule and each of the k rewrite cases. Evaluation module 131 processes the association between the rewrite rules and the rewrite cases, as well as the similarity score between the rewrite cases and the SQL statement to be rewritten, to obtain the evaluation score for each rewrite rule. Of course, other modules or neural network models can also be used to evaluate the association between rewrite rules and rewrite cases, depending on the specific circumstances; no limitation is made here. For example, evaluation module 131 can be deployed separately from LLM 130, or they can be integrated together, depending on the specific circumstances; no limitation is made here. For example, when a rewrite case uses a certain rewrite rule, a relationship can be considered between the two; or, when a rewrite rule inspires a rewrite case, a relationship can be considered between the two, and so on. In some embodiments, the initial evaluation score of each rewriting rule can be 0 or other initial values. When a rewriting rule is associated with a rewriting case, the evaluation module 131 can add the similarity score between the rewriting case and the SQL statement to be rewritten to the evaluation score of the rewriting rule. By enumerating all rewriting rules and retrieved rewriting case pairs, the final evaluation score of each rewriting rule can be obtained. 
For example, when the retrieved rewriting cases are rewriting case A1 and rewriting case A2, the similarity score between rewriting case A1 and the SQL statement to be rewritten is B1, the similarity score between rewriting case A2 and the SQL statement to be rewritten is B2, and the initial evaluation score of rewriting rule C is 0, then if rewriting rule C is associated with both rewriting cases A1 and A2, the final evaluation score of rewriting rule C is B1 + B2. In some embodiments, in addition to identifying the association between rewrite cases and rewrite rules through LLM 130, the association between them can also be stored in advance. In this case, after the rewrite cases are selected, the rewrite rules associated with the selected rewrite cases can be obtained from the stored associations. The specific approach can be determined according to the actual situation and is not limited here.
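The score accumulation in the A1/A2 example can be sketched as below. The associations are assumed to be already known (for instance, stored in advance) and are represented as a set of (rule, case) pairs; the rule names D and E and the numeric similarity values are invented for illustration.

```python
def score_rules(rules, case_similarity, associations, initial_score=0.0):
    """Add each associated case's similarity score to the rule's evaluation score."""
    scores = {rule: initial_score for rule in rules}
    for rule in rules:
        for case_id, similarity in case_similarity.items():
            if (rule, case_id) in associations:
                scores[rule] += similarity
    return scores

# Similarity scores B1 and B2 of the retrieved cases A1 and A2 (assumed values).
case_similarity = {"A1": 0.5, "A2": 0.25}
# Rule C is associated with both cases, rule D only with A2, rule E with neither.
associations = {("C", "A1"), ("C", "A2"), ("D", "A2")}
scores = score_rules(["C", "D", "E"], case_similarity, associations)
```

Here rule C ends with B1 + B2, while rule E keeps the initial score because it is associated with none of the retrieved cases.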

[0058] The rewriting rule filtering section is primarily used to select, from the m rewriting rules, those that meet the requirements (or "first rewriting rules") based on the evaluation score of each rewriting rule, resulting in f rewriting rules. For example, rewriting rules whose evaluation scores are greater than the initial evaluation score can be selected. For instance, when the initial evaluation score is 0, rewriting rules with a final evaluation score of 0 can be removed from the m rewriting rules, leaving f rewriting rules whose final evaluation score is not equal to 0. For example, in this section, the rule filtering module 140 can select the f rewriting rules from the m rewriting rules.

[0059] The query rewriting section primarily rewrites the SQL statement to be rewritten based on the selected f rewriting rules, resulting in a modified SQL statement. For example, the f rewriting rules can be sorted from highest to lowest based on their final evaluation scores; then, based on the sorting results, each rewriting rule is applied sequentially to rewrite the SQL statement to be rewritten, yielding the modified SQL statement. For example, in this section, the query rewriting module 150 can be used to rewrite the SQL statement to be rewritten.
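As a rough sketch of the filtering and sequential application described in the two preceding paragraphs, the following assumes that each rewrite rule is available as a function from SQL text to SQL text. The two toy rules and their names are hypothetical and serve only to make the example executable; real rewrite rules would operate on a query plan or syntax tree rather than on strings.

```python
def select_and_order(scores, initial_score=0.0):
    """Keep rules scoring above the initial score, highest score first."""
    kept = [(rule, s) for rule, s in scores.items() if s > initial_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in kept]


def apply_rules(sql, ordered_rules, rule_impls):
    """Apply each selected rule to the statement, in ranking order."""
    for rule in ordered_rules:
        sql = rule_impls[rule](sql)
    return sql


# Hypothetical toy rules implemented as plain string transformations.
rule_impls = {
    "drop_true_predicate": lambda q: q.replace(" AND 1=1", ""),
    "collapse_whitespace": lambda q: " ".join(q.split()),
    "unused_rule": lambda q: q,
}
scores = {"collapse_whitespace": 0.9, "drop_true_predicate": 1.6, "unused_rule": 0.0}

ordered = select_and_order(scores)
rewritten = apply_rules("SELECT a  FROM t WHERE a > 1 AND 1=1", ordered, rule_impls)
```

Here the rule whose score stayed at the initial value is discarded, and the remaining two rules are applied from the highest score to the lowest.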

[0060] Based on the above framework, the system retrieves rewrite cases related to the SQL statement to be rewritten. Using these rewrite case retrieval results, rewrite rules are selected and sorted from highest to lowest relevance to the cases. By fully utilizing the retrieval similarity scores of the rewrite cases to guide the selection and sorting of rewrite rules, the fit between the selected rewrite rules and the SQL statement to be rewritten is improved, thus enhancing rewrite performance. Furthermore, this framework fully leverages existing rich query rewrite knowledge, improving rewrite performance, and the same process can be used for rewrite rule selection in different scenarios, demonstrating high robustness.

[0061] Within the aforementioned framework, to improve the compatibility between the selected rewriting rules and the SQL statements to be rewritten, a secondary filtering of the rewriting rules can be added. In this case, the framework can be transformed into the framework shown in Figure 2. In Figure 2, the architecture for rewriting SQL statements mainly includes: SQL statement acquisition, case filtering, rewriting rule evaluation, rewriting rule filtering, rewriting analysis acquisition, secondary filtering of rewriting rules, and query rewriting. The SQL statement acquisition, case filtering, rewriting rule evaluation, rewriting rule filtering, and query rewriting parts in Figure 2 can be found in the relevant descriptions in Figure 1 above, and will not be elaborated upon here.

[0062] In Figure 2, the rewrite analysis acquisition section is mainly used to generate a rewrite analysis of the SQL statement to be rewritten based on that SQL statement and the k retrieved rewrite cases. In this section, at least the SQL statement and the k rewrite cases can be added to a prompt template to obtain a prompt, which is then input into the LLM 130 to obtain the rewrite analysis. The prompt can guide the LLM 130 to analyze, based on the k rewrite cases, the query rewrite strategy required for rewriting the SQL statement. Of course, the prompt can also guide the LLM 130 to explain the analyzed query rewrite strategy, and so on. For example, the prompt template for generating such a prompt can be as shown in Figure 3(A). As shown in Figure 3(A), the prompt generated from this template can guide the LLM 130 to propose and explain, based on a user-input SQL statement and related rewrite cases, query rewrite strategies that optimize the user-input SQL statement. It can also guide the LLM 130 to execute specified analysis steps. In this section, a designed prompt and the LLM are used to generate a rewrite analysis that explains how to rewrite the SQL statement with reference to the rewrite cases. Compared with the original rewrite cases, the rewrite analysis is more easily understood by the LLM and can be used to guide LLM rewriting, alleviating the hallucination problem and improving performance.
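The prompt assembly in this section can be pictured roughly as follows. The template wording is a hypothetical stand-in written for this sketch and is not the template of Figure 3(A); the function name and the representation of a rewrite case as an (original, rewritten) pair are likewise assumptions.

```python
# Hypothetical template text; the actual template of Figure 3(A) is not reproduced.
ANALYSIS_TEMPLATE = (
    "Given the SQL query and the rewrite cases below, propose and explain "
    "query rewrite strategies that optimize the query.\n\n"
    "SQL query:\n{sql}\n\n"
    "Rewrite cases:\n{cases}\n"
)


def build_analysis_prompt(sql, cases):
    """Fill the template with the statement and the k retrieved cases.

    cases: list of (example_sql, rewritten_sql) pairs.
    """
    blocks = [
        f"Case {i}\nOriginal:\n{before}\nRewritten:\n{after}"
        for i, (before, after) in enumerate(cases, start=1)
    ]
    return ANALYSIS_TEMPLATE.format(sql=sql, cases="\n\n".join(blocks))


prompt = build_analysis_prompt(
    "SELECT a FROM t WHERE a > 1 AND 1=1",
    [("SELECT b FROM u WHERE b > 2 AND 1=1", "SELECT b FROM u WHERE b > 2")],
)
```

The resulting string would then be passed to whichever model interface is in use; that call is omitted because no particular interface is specified.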

[0063] The rewrite analysis obtained in the rewrite analysis acquisition section can at least be used to indicate the query rewrite strategy for rewriting the SQL statement to be rewritten with reference to the rewrite cases. Of course, the rewrite analysis can also be used to indicate related explanations of the suggested query rewrite strategy. For example, the rewrite analysis can be as shown in Figure 3(B). As can be seen from Figure 3(B), there are two query rewrite strategies for the provided SQL statement. The first strategy is: "First, by moving the condition 'sr_returned_date_sk=d_date_sk' from the 'WHERE' clause to the 'ON' clause of 'INNER JOIN,' the query can reduce the size of the dataset before the join, thereby improving efficiency. Furthermore, conditions such as 'd_year=1999' and 'sr_return_amt / sr_return_quantity are between 80 and 139' are directly applied to their respective tables, further optimizing the query by minimizing the amount of data processed during the join." The second strategy is to "push the conditions in the outer query into the 'customer_total_return' common table expression (CTE), for example, moving 'ctr1.ctr_reason_sk BETWEEN 72 AND 75' into the 'WHERE' clause of the CTE. This can reduce the number of rows processed in the main query and make more efficient use of database indexes. Finally, if feasible, simplifying the query by directly joining the 'store_returns' and 'date_dim' tables in the main query can reduce complexity and potentially improve performance by eliminating the need for temporary result sets, although this depends on the specific logic and the necessity of the 'WITH' clause for the clarity or separation of the logic." In some embodiments, a rewrite analysis or query rewrite strategy may indicate that the SQL statement to be rewritten is to be rewritten with reference to the k rewrite cases.
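To make the first strategy concrete, the following sketch checks on a tiny in-memory SQLite database that moving the join condition from the WHERE clause to the ON clause of an INNER JOIN leaves the result unchanged. The two-column tables and the sample rows are simplifications assumed for illustration; only the column names come from the quoted analysis. The sketch shows equivalence of results only and makes no claim about execution speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (d_date_sk INTEGER, d_year INTEGER);
CREATE TABLE store_returns (sr_returned_date_sk INTEGER, sr_return_amt REAL);
INSERT INTO date_dim VALUES (1, 1999), (2, 2000);
INSERT INTO store_returns VALUES (1, 10.0), (1, 20.0), (2, 30.0);
""")

# Join condition expressed in the WHERE clause.
before = """
SELECT sr_return_amt FROM store_returns, date_dim
WHERE sr_returned_date_sk = d_date_sk AND d_year = 1999
ORDER BY sr_return_amt
"""

# Same query with the join condition moved to the ON clause.
after = """
SELECT sr_return_amt FROM store_returns
INNER JOIN date_dim ON sr_returned_date_sk = d_date_sk
WHERE d_year = 1999
ORDER BY sr_return_amt
"""

rows_before = conn.execute(before).fetchall()
rows_after = conn.execute(after).fetchall()
conn.close()
```

Both statements return the two 1999 rows, which is the property a rewrite of this kind has to preserve.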

[0064] In Figure 2, the secondary filtering section for rewriting rules is mainly used to perform secondary filtering on the f rewriting rules selected in the rewriting rule filtering section, based on the rewriting analysis obtained in the rewriting analysis acquisition section, and to determine the ranking of the rewriting rules remaining after secondary filtering (or "secondary rewriting rules"). During the secondary filtering, at least the rewriting analysis, the rewriting rules to be filtered, and the original SQL statement can be added to a prompt template to obtain a prompt. This prompt is then input into the LLM 130 to obtain j (j≤f) rewriting rules after secondary filtering. The prompt can guide the LLM 130 to select, from the f rewriting rules, the rules related to the query rewriting strategy suggested in the rewriting analysis. For example, the prompt template for generating this prompt can be as shown in Figure 4. As shown in Figure 4, the prompt generated from this template guides the LLM 130 to select, according to the specified analysis steps, the rules related to the query rewriting strategy suggested in the rewriting analysis from the initially selected rewriting rules. During the analysis, the LLM 130 can first evaluate all the initially selected rewriting rules to determine whether they can transform the given SQL statement (i.e., the original SQL statement) in a manner consistent with the query rewriting strategy suggested in the rewriting analysis. During this evaluation, it should be noted that a suggested query rewriting strategy may require a combination of multiple rewriting rules. Then, after completing the evaluation, the LLM 130 selects the rewriting rules consistent with the provided query rewriting suggestions. During the selection, the combined effect of multiple rules can be considered, and the given SQL statement may only partially match the conditions of a rule.
In some embodiments, during the secondary filtering, the rewriting rules can be filtered in batches (i.e., in multiple batches). In this case, for each batch of rewriting rules, a prompt can be created together with the rewriting analysis, so that the LLM 130 performs secondary filtering on each batch. Considering the dependencies between rewriting rules, the rewriting rules selected in previous batches of secondary filtering, the latest batch of rewriting rules to be filtered, and the rewriting analysis can all be added to the prompt template used in this section. This yields a prompt for filtering the latest batch of rewriting rules, which is then input into the LLM 130 for secondary filtering. In other words, when performing secondary filtering in batches, the previously selected rewriting rules and the next batch of rewriting rules can be input into the LLM together for further selection, thereby improving the accuracy of the rewriting rules selected in the secondary filtering.
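The batched secondary filtering can be sketched as a loop in which every call receives the rewrite analysis, the rules kept so far, and the next batch. In this sketch the model call is abstracted as a callable; the keyword-matching stand-in, the rule names, and the batch size are assumptions chosen only so that the example runs without a model.

```python
def secondary_filter(rules, analysis, select_from_batch, batch_size=2):
    """Filter rules batch by batch, passing earlier selections as context."""
    selected = []
    for start in range(0, len(rules), batch_size):
        batch = rules[start:start + batch_size]
        # The callable sees the analysis, previous selections, and the new batch.
        kept = select_from_batch(analysis, list(selected), batch)
        # Guard against the callable returning rules outside the batch.
        selected.extend(rule for rule in kept if rule in batch)
    return selected


def keyword_stub(analysis, previous, batch):
    """Stand-in for an LLM call: keep rules whose first word appears in the analysis."""
    text = analysis.lower()
    return [rule for rule in batch if rule.split("_")[0] in text]


analysis = "Move filter conditions into the join and push filters into the CTE."
rules = ["filter_into_join", "aggregate_remove", "filter_pushdown_cte", "sort_remove"]
second_rules = secondary_filter(rules, analysis, keyword_stub)
```

A real selector would build a prompt from its three arguments and parse the model's answer; the stub ignores the previous selections, whereas a model could use them to account for dependencies between rules.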

[0065] Based on the framework shown in Figure 2, a secondary filtering of rewriting rules is performed using the rewrite analysis, further improving the fit between the selected rewriting rules and the SQL statement to be rewritten, thereby further improving query rewriting performance. Within this framework, a multi-step LLM rewriting algorithm is designed that combines the rewrite analysis to optimize the selection and sorting of rewriting rules. This decomposes the query rewriting process into more easily solvable sub-problems, alleviating the LLM hallucination problem and improving query performance. For example, the framework shown in Figure 2 can be summarized as follows: based on embedding vector similarity, the k rewriting cases most relevant to the input SQL statement are retrieved from the rewriting cases; a highly readable rewrite analysis is generated using the LLM to explain how to rewrite the SQL statement with reference to the rewriting cases; the relevance between the retrieved rewriting cases and the rewriting rules is analyzed using the LLM; the rewriting rules are selected and sorted by relevance from high to low; the rewriting rule sequence is optimized using the LLM combined with the rewrite analysis; and finally, the rewriting rule sequence is used for effective query rewriting.
Furthermore, as can be seen from the description of Figure 2, the core of the solution framework, as shown in Figure 5, can be summarized as follows: (1) rewriting case screening (or "retrieval"), in which vector similarity retrieval is used to select, from the rewriting cases, cases similar to the query statement; (2) case-based rewriting analysis acquisition (or "generation"), in which the LLM is used to obtain a rewriting analysis for the query statement based on the rewriting cases; (3) rewriting rule screening (or "selection") and sorting based on case screening (or "retrieval"), in which the LLM is used to analyze the relevance between rewriting cases and rewriting rules, and the rewriting rules are screened and sorted based on the vector retrieval similarity scores; (4) rewriting rule screening (or "selection") and sorting based on the rewriting analysis, in which the LLM is used to screen and sort the rewriting rules based on the rewriting analysis.

[0066] In some embodiments, within the framework described above, when filtering rewriting rules, in addition to the filtering methods described above, the SQL statement to be rewritten, the rewriting cases, and the rewriting rules can be directly input into the LLM 130 so that the LLM 130 outputs rewriting rules that meet the requirements. For example, the SQL statement to be rewritten, the rewriting cases, and the rewriting rules can be added to a preset prompt template to obtain a prompt; then, this prompt can be input into the LLM 130 to obtain rewriting rules that meet the requirements. In this case, the prompt can guide the LLM 130 to filter the rewriting rules based on the similarity between the SQL statement and each rewriting case, as well as the association between each rewriting case and each rewriting rule. Under this filtering method, the evaluation result of a rewriting rule can refer to the filtering result of that rewriting rule. In addition, after the SQL statement to be rewritten, the rewriting cases, and the rewriting rules are input into the LLM 130, the LLM 130 can also output the evaluation result of each rewriting rule, or output the rewriting rules whose evaluation results meet the requirements, and so on. The specifics can be determined according to the actual situation, and no limitation is made here.

[0067] In some embodiments, each part of the above-described solution framework can be configured on a cloud computing platform, for example, deployed on at least one instance such as a virtual machine or container, so that the cloud computing platform can provide SQL statement rewriting services. Of course, each part of the above-described solution framework can also be configured on nodes other than the cloud computing platform, for example, deployed in at least one data center or on at least one server, depending on the actual situation, and is not limited here. The cloud computing platform can provide pages related to public cloud services for users to remotely access public cloud services. In this embodiment, the user can pre-purchase the SQL statement rewriting service on the cloud computing platform. For ease of understanding, the interaction between the user and the cloud computing platform is described below. As shown in Figure 6, the interaction between the user and the cloud computing platform mainly includes: the user logs into the cloud computing platform 600 through a client webpage, selects and purchases the SQL statement rewriting service on the cloud computing platform 600, and after purchase, the user can rewrite SQL queries on the cloud computing platform 600 based on the functions provided by the SQL statement rewriting service. The cloud computing platform 600 is mainly used to manage the infrastructure running the SQL statement rewriting service. For example, the infrastructure for running the SQL statement rewriting service may include multiple data centers located in different regions, each data center comprising multiple servers. The data centers can provide basic resources for the SQL statement rewriting service, such as computing resources and storage resources. Therefore, when users purchase and use the SQL statement rewriting service, they primarily pay for the resources they use. 
When using the SQL statement rewriting service, users can input SQL statements through the configuration interface, application programming interface, or user interaction interface provided by the cloud computing platform 600. The cloud computing platform 600 can then rewrite the SQL statements according to the user input. In some embodiments, the components of the above-described solution framework can also be configured on local servers, depending on the actual situation, and are not limited here.

[0068] The following describes the specific implementation process of the above technical concept.

[0069] For example, Figure 7 shows a flowchart of an SQL statement rewriting method provided in an embodiment of this application. It is understood that this method can be executed by any device, equipment, platform, or device cluster with computing and processing capabilities. For example, this method can be executed by an SQL statement rewriting device, which can be implemented in software and/or hardware and can be configured in, but is not limited to, an electronic device or a server; typically, it is configured on a server. As shown in Figure 7, the SQL statement rewriting method may include the following steps:

[0070] S701, Obtain the first SQL statement to be rewritten.

[0071] In this embodiment, the first SQL statement can be entered by the user or sent by a device, apparatus, or service, etc., and there is no limitation here.

[0072] S702. Based on the similarity between the first SQL statement and multiple rewrite cases, at least one first rewrite case is selected from the multiple rewrite cases. A rewrite case includes an example SQL statement and a rewritten SQL statement after rewriting the example SQL statement.

[0073] In this embodiment, similarity scores between the first SQL statement and each rewrite case in the rewrite case library can be calculated using a similarity measure such as cosine similarity. Then, the k (k≥1) rewrite cases with the highest similarity scores can be selected as the at least one first rewrite case. For example, a rewrite case may include an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement; that is, it includes the SQL statement both before and after the query rewrite. For example, the detailed process of this step can be found in the aforementioned case filtering section and will not be repeated here.
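A minimal sketch of the top-k selection in this step is given below, assuming that the first SQL statement and every rewrite case have already been converted into embedding vectors by some encoder (the encoder itself is outside the scope of this sketch, and the two-dimensional vectors are purely illustrative).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_cases(query_vec, case_vecs, k):
    """Return the k (case_id, similarity) pairs with the highest similarity."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in case_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Illustrative embeddings for the first SQL statement and three rewrite cases.
query_vec = [1.0, 0.0]
case_vecs = {"A1": [1.0, 0.0], "A2": [1.0, 1.0], "A3": [0.0, 1.0]}
first_cases = top_k_cases(query_vec, case_vecs, k=2)
```

The similarity scores returned alongside the case identifiers are the values that the subsequent rule evaluation step consumes.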

[0074] S703. Based on the similarity between the first SQL statement and at least one first rewrite case, and the association between the at least one first rewrite case and the multiple rewrite rules, evaluate the multiple rewrite rules to obtain the evaluation results of each of the multiple rewrite rules, wherein a rewrite rule is a rule used to perform query rewriting on the SQL statement.

[0075] In this embodiment, after selecting the first rewrite cases, the rewrite rules in the rule base can be evaluated using the selected first rewrite cases to obtain the evaluation result of each rewrite rule. The evaluation result of a rewrite rule can be calculated based on the association between the rewrite rule and each of the first rewrite cases, and the similarity between each of the first rewrite cases and the first SQL statement. For example, the association between a rewrite rule R and a first rewrite case C can be evaluated using the LLM as described in the aforementioned rewrite rule evaluation section. Then, when there is an association, the similarity score between the first rewrite case C and the first SQL statement can be used as the evaluation score obtained by evaluating the rewrite rule R using the first rewrite case C. When there is no association, the initial score of the rewrite rule R can be used as the evaluation score obtained by evaluating the rewrite rule R using the first rewrite case C. After the association between the rewrite rule R and each first rewrite case has been determined, the resulting evaluation scores can be aggregated to obtain the evaluation result of the rewrite rule R. For example, the sum or average of all evaluation scores related to the rewrite rule R can be used as the evaluation result of the rewrite rule R.
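The difference between using the sum and using the average as the evaluation result can be illustrated for a single rule as follows. The function signature and the sample similarity values are assumptions made for this sketch.

```python
def evaluate_rule(case_similarity, associated_cases, initial_score=0.0, mode="sum"):
    """Aggregate per-case evaluation scores for one rewrite rule.

    A case contributes its similarity score when it is associated with the
    rule, and the initial score otherwise.
    """
    per_case = [
        similarity if case in associated_cases else initial_score
        for case, similarity in case_similarity.items()
    ]
    if not per_case:
        return initial_score
    if mode == "sum":
        return sum(per_case)
    if mode == "mean":
        return sum(per_case) / len(per_case)
    raise ValueError(f"unknown mode: {mode}")


# Three first rewrite cases; the rule is associated with two of them.
case_similarity = {"C1": 0.8, "C2": 0.6, "C3": 0.4}
associated = {"C1", "C3"}
total = evaluate_rule(case_similarity, associated, mode="sum")
average = evaluate_rule(case_similarity, associated, mode="mean")
```

With an initial score of 0, both aggregations produce the same ranking across rules, since the average is the sum divided by the same number of cases for every rule.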

[0076] S704. Based on at least one first rewrite rule among multiple rewrite rules, rewrite the first SQL statement to obtain the second SQL statement, wherein the at least one first rewrite rule is the rule whose evaluation result meets the requirements among the multiple rewrite rules.

[0077] In this embodiment, after obtaining the evaluation results of each rewriting rule, the rewriting rules whose evaluation results meet the requirements (or "first rewriting rules") can be selected according to pre-set criteria. For example, when the evaluation result is represented by an evaluation score, the rewriting rules whose evaluation scores are greater than or equal to a preset value can be used as the rewriting rules that meet the requirements. Then, the first SQL statement can be rewritten using the rewriting rules whose evaluation results meet the requirements to obtain the second SQL statement. For example, when the evaluation result is represented by an evaluation score, the rewriting rules that meet the requirements can be sorted according to the evaluation scores, and then the corresponding rewriting rules can be used sequentially to rewrite the first SQL statement according to the sorting results to obtain the second SQL statement. In some embodiments, the rules whose evaluation results meet the requirements among multiple rewriting rules are the top N rewriting rules after sorting the evaluation results of multiple rewriting rules, where N≥1 and N is a positive integer.
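The two selection criteria mentioned in this step, a preset threshold and the top N rules after sorting, might look as follows in a minimal form. The rule identifiers and score values are illustrative assumptions.

```python
def select_by_threshold(scores, threshold):
    """Rules whose score is greater than or equal to the preset value, best first."""
    kept = [rule for rule, score in scores.items() if score >= threshold]
    return sorted(kept, key=scores.get, reverse=True)


def select_top_n(scores, n):
    """The N highest-scoring rules, best first (N must be a positive integer)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sorted(scores, key=scores.get, reverse=True)[:n]


scores = {"R1": 1.6, "R2": 0.9, "R3": 0.0, "R4": 0.3}
by_threshold = select_by_threshold(scores, threshold=0.9)
top_three = select_top_n(scores, n=3)
```

In either case the returned order can be used directly as the order in which the rules are applied to the first SQL statement.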

[0078] In this way, by fully utilizing rewrite cases to guide the selection of rewrite rules, the compatibility between the selected rewrite rules and the SQL statements to be rewritten is improved, thereby enhancing rewrite performance. Furthermore, since rewrite cases provide abundant existing query rewrite knowledge, this knowledge can be fully utilized to further improve rewrite performance. Moreover, rewrite rule selection can be performed in different scenarios without retraining the model, demonstrating high robustness.

[0079] In some embodiments, in Figure 7 above, the rewrite analysis acquisition section described in Figure 2 above can also be referenced to obtain a query rewrite strategy for the first SQL statement based on the first SQL statement and at least one first rewrite case. This query rewrite strategy can instruct the first SQL statement to be rewritten with reference to the first rewrite cases. For example, at least the first SQL statement and each first rewrite case can be added to a first prompt template (e.g., the template shown in (A) of Figure 3 above) to obtain a first prompt. This first prompt is used to instruct the neural network model to generate a query rewrite strategy for rewriting the first SQL statement with reference to the first rewrite cases. Then, the first prompt is input into the neural network model to obtain the query rewrite strategy. Exemplarily, this query rewrite strategy can be, but is not limited to, included in the aforementioned rewrite analysis.

[0080] Furthermore, in S704, based on the query rewriting strategy, rules related to the query rewriting strategy can be selected from the rewriting rules whose evaluation results meet the requirements (or "first rewriting rules") to obtain second rewriting rules. For example, to improve filtering efficiency, the first rewriting rules can be filtered in multiple batches based on the query rewriting strategy. During multi-batch filtering, to improve filtering accuracy, in the filtering process of any batch, the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch can be input into the neural network model to obtain the filtering results for that batch. For example, at least the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch can be added to a second prompt template (such as the template shown in Figure 4) to obtain a second prompt. This second prompt is used to instruct the neural network model to filter the rules to be filtered in that batch with reference to the query rewriting strategy. Then, the second prompt is input into the neural network model to obtain the filtering results for that batch. For example, for the process of secondary filtering, reference can be made to the relevant description of the secondary filtering section for rewriting rules in Figure 2 above, which will not be repeated here.

[0081] After the second filtering is completed, the first SQL statement can be rewritten based on the second rewriting rules obtained from the filtering to obtain the second SQL statement. Alternatively, the second rewriting rules obtained from the second filtering can be sorted first, for example, according to the evaluation result of each second rewriting rule, and then applied sequentially according to the sorting result to rewrite the first SQL statement and obtain the second SQL statement. As a possible implementation, when sorting the second rewriting rules, considering the interdependencies between the rewriting rules, the rules can be grouped according to the operators (e.g., JOIN) in the first SQL statement to obtain at least one group of rules. Then, the LLM is asked to sort the rules contained in each group separately based on the query rewriting strategy. For example, at least the query rewriting strategy and each group of rules can be added to a pre-defined prompt template (such as the template shown in Figure 13(A)) to obtain a third prompt; then, this third prompt is input into a neural network model to obtain the ranking result of the rules contained in each group. Finally, based on the order of the rewriting rules within each group, the LLM is further asked to optimize the overall rewriting rule order (i.e., with the rules in each group already ranked, the groups themselves are ranked at the rule group level) so that the final ranking of the second rewriting rules is consistent with the rewriting analysis.
For example, at least the query rewriting strategy, each group of rules, and the sub-rules contained in each group can be added to a pre-defined prompt template (such as the template shown in Figure 13(B)) to obtain a fourth prompt; then, this fourth prompt is input into a neural network model to obtain the overall ranking result of the groups of rules. In this way, by ranking the rewriting rules step by step, LLM hallucination can be reduced and the accuracy of the rewriting rule ranking can be improved.
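The step-by-step ranking can be sketched as two stages: ranking rules inside each operator group, then ranking the groups. Both model calls are abstracted as callables here; the alphabetical and keyword-position stand-ins, the rule names, and the strategy text are assumptions that only serve to make the control flow executable.

```python
def rank_rules(rules_by_operator, strategy, rank_within_group, rank_groups):
    """Two-stage ranking: inside each operator group, then across groups."""
    # Stage 1: order the rules of every group separately.
    ranked = {
        op: rank_within_group(strategy, rules)
        for op, rules in rules_by_operator.items()
    }
    # Stage 2: order the groups, given their already-ranked contents.
    group_order = rank_groups(strategy, ranked)
    return [rule for op in group_order for rule in ranked[op]]


def within_stub(strategy, rules):
    """Stand-in for the third prompt: alphabetical order."""
    return sorted(rules)


def groups_stub(strategy, ranked):
    """Stand-in for the fourth prompt: order groups by where the operator
    name first appears in the strategy (names absent from the text sort first)."""
    text = strategy.lower()
    return sorted(ranked, key=lambda op: text.find(op.lower()))


strategy = "First push filter predicates down, then simplify the join."
rules_by_operator = {
    "JOIN": ["join_reorder", "join_condition_push"],
    "FILTER": ["filter_merge", "filter_into_join"],
}
final_order = rank_rules(rules_by_operator, strategy, within_stub, groups_stub)
```

Splitting the ranking this way keeps each model call small: one call per group sees only that group's rules, and the final call sees only the group-level summary.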

[0082] In some embodiments, when a user requests query rewriting of an SQL statement, after rewriting the first SQL statement as shown in Figure 7 to obtain the second SQL statement, the second SQL statement and its query rewriting suggestions can be output. The query rewriting suggestions include at least one rewriting rule used to rewrite the first SQL statement. Alternatively, the rewriting analysis can be returned to the user to provide an easily understandable query rewriting solution.

[0083] In a database query scenario, after rewriting the first SQL statement as shown in Figure 7 to obtain the second SQL statement, the costs of executing the first and second SQL statements can be calculated separately. If the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement, a query can be performed on the database based on the second SQL statement, and the query result can be returned to the user. Otherwise, a query can be performed on the database based on the first SQL statement, and the query result can be returned to the user. This achieves efficient database querying.
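The cost comparison described here reduces to a simple guard. In the sketch below the cost estimator is passed in as a callable, and a lookup table of made-up costs stands in for a real optimizer estimate; both are assumptions for illustration.

```python
def choose_statement(first_sql, second_sql, estimate_cost):
    """Use the rewritten statement only if it is not more expensive."""
    if estimate_cost(second_sql) <= estimate_cost(first_sql):
        return second_sql
    return first_sql


first_sql = "SELECT * FROM t WHERE 1=1 AND a = 5"
second_sql = "SELECT * FROM t WHERE a = 5"

# Hypothetical cost figures; a real system would query the optimizer.
toy_costs = {first_sql: 120.0, second_sql: 100.0}
chosen = choose_statement(first_sql, second_sql, toy_costs.__getitem__)

# If the rewrite were more expensive, the original statement would be kept.
worse_costs = {first_sql: 100.0, second_sql: 150.0}
fallback = choose_statement(first_sql, second_sql, worse_costs.__getitem__)
```

This guard ensures that the rewriting step never makes the executed query more expensive than the original according to the chosen cost estimate.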

[0084] The above is a brief introduction to the SQL statement rewriting method provided in this application. It is understood that this method can be applied to, but is not limited to, cloud data warehouses such as GaussDB (DWS) and DataArts Insight, as well as BI products. This method can also be applied independently in third-party rewriting tools (such as middleware) to rewrite query statements for different databases, including SQL Server and Oracle. Furthermore, this method can be integrated into AI assistants to provide users with prompts, including rewriting suggestions and the rewritten SQL statement, while they are writing SQL statements. For example, when this method is integrated into an AI assistant, as shown in Figure 8(A), the user can input the SQL statement to be rewritten on the display interface 81 of the client associated with the AI assistant. The client can then transmit the SQL statement to be rewritten to the server. After receiving the SQL statement to be rewritten, the server can rewrite it according to the aforementioned SQL statement rewriting method and transmit the rewritten SQL statement back to the client. After receiving the rewritten SQL statement from the server, the client can display it on the display interface 81 for the user. Additionally, the client can be configured with a control 82 for explaining how the rewriting was done; the user can select the control 82 to view the explanation or analysis of the rewritten SQL statement.

[0085] When this method is applied to products such as databases, as shown in Figure 8(B), users can input SQL statements on the database-related client. The client can then send a query request containing the SQL statement to the server. Upon receiving the query request, the server can rewrite the SQL statement in the query request using the aforementioned SQL statement rewriting method, perform a database query based on the rewritten SQL statement, and transmit the query results to the client. Furthermore, besides users actively inputting SQL statements, other applications, devices, apparatuses, modules, or services can also send query requests containing SQL statements to the server; this is not limited here.

[0086] When this method is applied to a third-party query rewriting tool, as shown in Figure 8(C), the user can input an SQL statement on a database-related client. The client can then send a query request containing that SQL statement to the middleware. Upon receiving the query request, the middleware can rewrite the SQL statement in the query request according to the aforementioned SQL statement rewriting method and send a query request containing the rewritten SQL statement to the server. After receiving the query request from the middleware, the server can perform a database query based on the rewritten SQL statement and transmit the query results to the client. Furthermore, besides users actively inputting SQL statements, other applications, devices, mechanisms, modules, or services can also send query requests containing SQL statements to the middleware; this is not limited here.

[0087] It is understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application. In addition, the various embodiments and technical features described above can be combined according to actual conditions, and the combined solutions are still within the protection scope of this application.

[0088] Next, based on the methods in the above embodiments, an SQL statement rewriting device provided in this application will be introduced.

[0089] For example, Figure 9 shows a schematic diagram of the structure of an SQL statement rewriting device provided in an embodiment of this application. As shown in Figure 9, the SQL statement rewriting device 900 includes an acquisition module 901 and a processing module 902. The acquisition module 901 is used to acquire a first SQL statement to be rewritten. The processing module 902 is used to select at least one first rewriting case from multiple rewriting cases based on the similarity between the first SQL statement and the multiple rewriting cases, wherein a rewriting case includes an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement. The processing module 902 is further used to evaluate multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewriting case, and the association between the at least one first rewriting case and the multiple rewriting rules, to obtain the evaluation result of each of the multiple rewriting rules, wherein a rewriting rule is a rule used to perform query rewriting on an SQL statement. The processing module 902 is further configured to rewrite the first SQL statement based on at least one first rewriting rule among the multiple rewriting rules to obtain a second SQL statement, wherein the at least one first rewriting rule is a rule, among the multiple rewriting rules, whose evaluation result meets the requirements.

[0090] In some embodiments, the processing module 902 is further configured to: obtain a query rewriting strategy based on the first SQL statement and at least one first rewrite case, wherein the query rewriting strategy instructs the first SQL statement to be rewritten with reference to the first rewrite case.

[0091] In this case, when the processing module 902 rewrites the first SQL statement based on the at least one first rewriting rule among the multiple rewriting rules to obtain the second SQL statement, it is specifically used to: select a second rewriting rule from the at least one first rewriting rule based on the query rewriting strategy; and rewrite the first SQL statement based on the second rewriting rule to obtain the second SQL statement.

[0092] In some embodiments, when the processing module 902 obtains a query rewriting strategy for the first SQL statement based on the first SQL statement and at least one first rewriting case, it is specifically used to: add the first SQL statement and at least one first rewriting case to a first prompt template to obtain a first prompt, the first prompt being used to instruct the neural network model to generate a query rewriting strategy for rewriting the first SQL statement by referring to the first rewriting case; and input the first prompt into the neural network model to obtain the query rewriting strategy.
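For illustration only, the prompt construction described above might be sketched as follows. The template wording, the function names, and the `model` callable are assumptions introduced for this sketch; the application does not disclose the text of the first prompt template or the interface of the neural network model.

```python
# Hypothetical template text; the actual first prompt template is not disclosed.
FIRST_PROMPT_TEMPLATE = (
    "Reference rewriting cases:\n{cases}\n\n"
    "SQL statement to rewrite:\n{sql}\n\n"
    "Describe a strategy for rewriting the statement with reference to the cases."
)


def build_first_prompt(first_sql, rewrite_cases):
    """Add the first SQL statement and the selected cases to the template.

    rewrite_cases: list of (example_sql, rewritten_sql) pairs.
    """
    case_lines = []
    for i, (example_sql, rewritten_sql) in enumerate(rewrite_cases, start=1):
        case_lines.append(
            f"Case {i}\n  before: {example_sql}\n  after: {rewritten_sql}"
        )
    return FIRST_PROMPT_TEMPLATE.format(cases="\n".join(case_lines), sql=first_sql)


def get_rewrite_strategy(first_sql, rewrite_cases, model):
    """Input the first prompt into the model and return its answer.

    model: any callable str -> str standing in for the neural network model.
    """
    return model(build_first_prompt(first_sql, rewrite_cases))
```

Passing the model as a plain callable keeps the sketch independent of any particular model-serving interface.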

[0093] In some embodiments, when the processing module 902 selects a second rewriting rule from the at least one first rewriting rule based on the query rewriting strategy, it is specifically used to: filter the at least one first rewriting rule in multiple batches based on the query rewriting strategy, wherein, when filtering any given batch, the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch are input into the neural network model to obtain the filtering results of that batch.

[0094] In some embodiments, when the processing module 902 inputs the query rewriting strategy, the filtering results obtained before a given batch, and the rules to be filtered in that batch into the neural network model, it is specifically used to: add the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch to a second prompt template to obtain a second prompt, the second prompt being used to instruct the neural network model to filter the rules to be filtered in that batch with reference to the query rewriting strategy; and input the second prompt into the neural network model to obtain the filtering results of that batch.
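As a minimal sketch of the batch-wise filtering described above, assuming a fixed batch size and assuming that the results of earlier batches are accumulated and passed forward unchanged (neither detail is specified here), the loop might look as follows. The `model` callable is a hypothetical stand-in for building the second prompt and querying the neural network model.

```python
def filter_rules_in_batches(strategy, candidate_rules, model, batch_size=2):
    """Filter candidate rules batch by batch.

    model: callable (strategy, kept_so_far, batch) -> list of rules kept from
        `batch`. It stands in for prompting the neural network model.
    batch_size: assumed parameter; the batching scheme is not specified.
    """
    kept = []
    for start in range(0, len(candidate_rules), batch_size):
        batch = candidate_rules[start:start + batch_size]
        # Results of earlier batches are passed in alongside the current batch.
        batch_result = model(strategy, list(kept), batch)
        # Defensive check (an assumption): ignore anything outside the batch.
        kept.extend(rule for rule in batch_result if rule in batch)
    return kept
```

Filtering in batches keeps each prompt short even when many candidate rules remain, at the cost of one model call per batch.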

[0095] In some embodiments, when the processing module 902 performs query rewriting on the first SQL statement based on the second rewriting rules to obtain the second SQL statement, it is specifically used to: group the second rewriting rules based on the operators in the first SQL statement to obtain at least one set of rules; sort the rules contained in each set of rules based on the query rewriting strategy; sort the sets of rules relative to one another, at rule-set granularity, based on the sorted rules in each set of rules; and perform query rewriting on the first SQL statement based on the sorted rules in the at least one set of rules.
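The grouping and two-level sorting described above can be sketched as follows. This is a hedged illustration: the mapping from rules to operators, the numeric rank derived from the query rewriting strategy, the choice to order sets by the rank of their first rule, and the choice to drop rules whose operator does not appear in the statement are all assumptions made for the sketch.

```python
from collections import defaultdict


def order_rules(rules, rule_operator, strategy_rank, sql_operators):
    """Return (operator, sorted_rules) pairs in application order.

    rules: names of the second rewriting rules.
    rule_operator: rule name -> operator it acts on (hypothetical mapping).
    strategy_rank: rule name -> rank derived from the strategy; lower runs
        earlier (assumed representation).
    sql_operators: operators present in the first SQL statement.
    """
    groups = defaultdict(list)
    for rule in rules:
        operator = rule_operator[rule]
        if operator in sql_operators:
            groups[operator].append(rule)
    # First level: sort the rules inside each set.
    for operator in groups:
        groups[operator].sort(key=lambda r: strategy_rank[r])
    # Second level: sort the sets as wholes, here by their best-ranked rule.
    ordered = sorted(groups, key=lambda op: strategy_rank[groups[op][0]])
    return [(operator, groups[operator]) for operator in ordered]
```

The returned list can then be walked in order, applying each set of rules to the statement in turn.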

[0096] In some embodiments, when the processing module 902 evaluates multiple rewriting rules based on the similarity between the first SQL statement and at least one first rewriting case, and the association between the at least one first rewriting case and multiple rewriting rules, to obtain the evaluation results of each of the multiple rewriting rules, it is specifically configured to: when there is an association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the similarity score between the second rewriting case and the first SQL statement, where the third rewriting rule is any one of the multiple rewriting rules, and the second rewriting case is any one of the at least one first rewriting case; when there is no association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the initial score of the third rewriting rule; wherein, the evaluation result of the third rewriting rule is calculated based on the evaluation scores of the third rewriting rule evaluated on all the first rewriting cases respectively.
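The scoring just described can be illustrated with a short sketch. Summation is used here to combine the per-case scores; that choice is an assumption, since the paragraph only says the evaluation result is calculated from the per-case scores. The data layout (dictionaries keyed by hypothetical case and rule names) is likewise illustrative.

```python
def evaluate_rules(rules, case_similarity, case_rules, initial_scores):
    """Score every rule against every selected rewriting case.

    rules: iterable of rule names.
    case_similarity: case id -> similarity between that case and the first
        SQL statement.
    case_rules: case id -> set of rule names associated with that case.
    initial_scores: rule name -> initial score used when no association exists.
    Returns rule name -> evaluation result (sum of per-case scores; the
    aggregation function is an assumption).
    """
    results = {}
    for rule in rules:
        total = 0.0
        for case_id, similarity in case_similarity.items():
            if rule in case_rules.get(case_id, set()):
                # Associated: the score is the case's similarity score.
                total += similarity
            else:
                # Not associated: fall back to the rule's initial score.
                total += initial_scores[rule]
        results[rule] = total
    return results
```

Under this scheme, a rule associated with several highly similar cases accumulates a higher evaluation result than a rule associated with few or dissimilar cases.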

[0097] In some embodiments, after obtaining the second SQL statement, the processing module 902 is further configured to: output the second SQL statement and query rewriting suggestions for the second SQL statement, wherein the query rewriting suggestions include at least one first rewriting rule.

[0098] In some embodiments, after obtaining the second SQL statement, the processing module 902 is further configured to: perform a query on the database based on the second SQL statement and return the query result to the user if the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement; and perform a query on the database based on the first SQL statement and return the query result to the user if the execution cost of the second SQL statement is greater than the execution cost of the first SQL statement.
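A minimal sketch of this cost-based fallback is given below. The `estimate_cost` and `execute` callables are hypothetical placeholders for whatever cost estimator and database interface an implementation uses; they are not named in this application.

```python
def run_with_cost_check(first_sql, second_sql, estimate_cost, execute):
    """Execute the rewritten statement only if it is not estimated to cost more.

    estimate_cost: callable sql -> number (hypothetical cost estimator).
    execute: callable sql -> query result (hypothetical database call).
    Returns (statement_executed, query_result).
    """
    first_cost = estimate_cost(first_sql)
    second_cost = estimate_cost(second_sql)
    # Less than or equal: use the rewritten statement. Greater: fall back.
    chosen = second_sql if second_cost <= first_cost else first_sql
    return chosen, execute(chosen)
```

With this check, a rewrite that turns out to be more expensive never reaches the database, so the user is not penalized for a poor rewrite.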

[0099] In some embodiments, the at least one first rewriting rule is the top N rewriting rules among the plurality of rewriting rules after the rules are sorted by evaluation result, where N is a positive integer (N≥1).
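As a brief illustration of the top-N selection, assuming that a higher evaluation result is better and that each evaluation result is a single number (both are assumptions for this sketch):

```python
def top_n_rules(evaluation, n):
    """Return the n rule names with the highest evaluation results.

    evaluation: rule name -> numeric evaluation result.
    n: positive integer, as required above.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    ranked = sorted(evaluation.items(), key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in ranked[:n]]
```

If fewer than n rules are available, the sketch simply returns all of them in ranked order.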

[0100] In some embodiments, both the acquisition module 901 and the processing module 902 shown in FIG. 9 can be implemented in software or in hardware. For example, the implementation of the acquisition module 901 will be described below. Similarly, the implementation of the processing module 902 can refer to the implementation of the acquisition module 901.

[0101] As an example of a software functional unit, module 901 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container. Further, the aforementioned computing instance may be one or more. For example, module 901 may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code may be distributed within the same region or in different regions. Further, the multiple hosts / virtual machines / containers used to run the code may be distributed within the same availability zone (AZ) or in different AZs, each AZ including one or more geographically proximate data centers. Typically, a region may include multiple AZs.

[0102] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0103] As an example of a hardware functional unit, the acquisition module 901 may include at least one computing device, such as a server. Alternatively, the acquisition module 901 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented using a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0104] The multiple computing devices included in the acquisition module 901 can be distributed in the same region or in different regions. Similarly, the multiple computing devices included in the acquisition module 901 can be distributed in the same Availability Zone (AZ) or in different AZs. Likewise, the multiple computing devices included in the acquisition module 901 can be distributed in the same Virtual Private Cloud (VPC) or in multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0105] It should be noted that in other embodiments, the acquisition module 901 can be used to execute any step in the SQL statement rewriting method described in the above embodiments, and the processing module 902 can also be used to execute any step in the SQL statement rewriting method described in the above embodiments. Furthermore, the steps implemented by the acquisition module 901 and the processing module 902 can be specified as needed. By implementing different steps in the SQL statement rewriting method described in the above embodiments through the acquisition module 901 and the processing module 902, all the functions of the SQL statement rewriting device 900 shown in FIG. 9 can be achieved.

[0106] This application also provides a computing device 1000. As shown in FIG. 10, the computing device 1000 includes: a bus 1002, a processor 1004, a memory 1006, and a communication interface 1008. The processor 1004, the memory 1006, and the communication interface 1008 communicate with each other via the bus 1002. The computing device 1000 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1000.

[0107] Bus 1002 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, only one line is used in Figure 10, but this does not imply that there is only one bus or one type of bus. Bus 1002 can include pathways for transmitting information between various components of the computing device 1000 (e.g., memory 1006, processor 1004, communication interface 1008).

[0108] The processor 1004 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

[0109] The memory 1006 may include volatile memory, such as random access memory (RAM). The memory 1006 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

[0110] The memory 1006 stores executable program code, and the processor 1004 executes the executable program code to implement the functions of the acquisition module 901 and the processing module 902 shown in FIG. 9, thereby implementing the SQL statement rewriting method described in the above embodiments. That is, the memory 1006 stores instructions for executing the SQL statement rewriting method described in the above embodiments.

[0111] Alternatively, the memory 1006 stores executable code, which the processor 1004 executes to implement the functions of the SQL statement rewriting apparatus 900 shown in FIG. 9, thereby implementing the SQL statement rewriting method described in the above embodiments. That is, the memory 1006 stores instructions for executing the SQL statement rewriting method described in the above embodiments.

[0112] The communication interface 1008 uses transceiver modules such as, but not limited to, network interface cards and transceivers to enable communication between the computing device 1000 and other devices or communication networks.

[0113] This application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

[0114] As shown in Figure 11, the computing device cluster includes at least one computing device 1000. The memory 1006 in one or more computing devices 1000 in the computing device cluster may store the same instructions for executing the SQL statement rewriting method described in the above embodiments.

[0115] In some possible implementations, the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store partial instructions for executing the SQL statement rewriting method described in the above embodiments. In other words, a combination of one or more computing devices 1000 can jointly execute instructions for executing the SQL statement rewriting method described in the above embodiments.

[0116] It should be noted that the memory 1006 in different computing devices 1000 within the computing device cluster can store different instructions, which are used to execute some of the functions of the SQL statement rewriting device 900 shown in Figure 9. That is, the instructions stored in the memory 1006 of different computing devices 1000 can implement the functions of one or more of the acquisition module 901 and the processing module 902.

[0117] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN), etc. Figure 12 illustrates one possible implementation. As shown in Figure 12, two computing devices 1000A and 1000B are connected via a network. Specifically, they are connected to the network through communication interfaces in each computing device. In this type of possible implementation, the memory 1006 in computing device 1000A stores instructions for executing the functions of the acquisition module 901. Simultaneously, the memory 1006 in computing device 1000B stores instructions for executing the functions of the processing module 902.

[0118] It should be understood that the functions of computing device 1000A shown in Figure 12 can also be performed by multiple computing devices 1000. Similarly, the functions of computing device 1000B can also be performed by multiple computing devices 1000.

[0119] This application also provides another computing device cluster. The connections between the computing devices in this computing device cluster are similar to those of the computing device cluster described with reference to Figures 11 and 12. The difference is that the memory 1006 of one or more computing devices 1000 in this computing device cluster can store the same instructions for executing the methods in the above embodiments.

[0120] In some possible implementations, the memory 1006 of one or more computing devices 1000 in the computing device cluster may also store partial instructions for executing the aforementioned SQL statement rewriting method. In other words, a combination of one or more computing devices 1000 can jointly execute the instructions for executing the aforementioned SQL statement rewriting method.

[0121] It should be understood that each step of the above method embodiments can be accomplished by hardware logic circuits or software instructions in a processor.

[0122] Based on the methods in the above embodiments, this application provides a computer-readable storage medium including computer program instructions. When executed by a cluster of computing devices including at least one computing device, the computer program instructions cause the cluster of computing devices to perform the methods in the above embodiments. Exemplarily, the computer-readable storage medium can be any available medium on which the computing device can store data, or a data storage device, such as a data center, containing one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives).

[0123] Based on the methods in the above embodiments, this application provides a computer program product containing instructions that, when executed by a cluster of computing devices containing at least one computing device, cause the cluster of computing devices to perform the methods in the above embodiments.

[0124] It is understood that the processor in the embodiments of this application may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.

[0125] The method steps in the embodiments of this application can be implemented in hardware or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, portable hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, enabling the processor to read information from and write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and the storage medium can reside in an ASIC.

[0126] The above embodiments can be implemented entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, they can be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive (SSD)).

[0127] It is understood that the various numerical designations used in the embodiments of this application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of this application.

[0128] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the protection scope of the technical solutions of the embodiments of this application.

Claims

1. A method for rewriting Structured Query Language (SQL) statements, characterized in that the method comprises: acquiring a first SQL statement to be rewritten; selecting at least one first rewrite case from multiple rewrite cases based on the similarity between the first SQL statement and the multiple rewrite cases, wherein a rewrite case includes an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement; evaluating multiple rewrite rules based on the similarity between the first SQL statement and the at least one first rewrite case, and the association between the at least one first rewrite case and the multiple rewrite rules, to obtain an evaluation result for each of the multiple rewrite rules, wherein a rewrite rule is a rule used to perform query rewriting on an SQL statement; and rewriting the first SQL statement based on at least one first rewrite rule among the multiple rewrite rules to obtain a second SQL statement, wherein the at least one first rewrite rule is a rule, among the multiple rewrite rules, whose evaluation result meets the requirements.

2. The method according to claim 1, characterized in that, The method further includes: Based on the first SQL statement and the at least one first rewrite case, a query rewrite strategy for the first SQL statement is obtained, wherein the query rewrite strategy indicates that the first SQL statement is rewritten with reference to the at least one first rewrite case; The step of rewriting the first SQL statement based on at least one first rewrite rule among the plurality of rewrite rules to obtain the second SQL statement includes: Based on the query rewriting strategy, a second rewriting rule is selected from the at least one first rewriting rule; Based on the second rewriting rule, the first SQL statement is rewritten to obtain the second SQL statement.

3. The method according to claim 2, characterized in that, The query rewriting strategy for the first SQL statement, based on the first SQL statement and the at least one first rewrite case, includes: The at least one first rewrite case and the first SQL statement are added to the first prompt template to obtain the first prompt, which is used to instruct the neural network model to generate the query rewrite strategy that rewrites the first SQL statement with reference to the first rewrite case. The first prompt is input into the neural network model to obtain the query rewriting strategy.

4. The method according to claim 2 or 3, characterized in that the step of selecting a second rewrite rule from the at least one first rewrite rule based on the query rewriting strategy comprises: filtering the at least one first rewrite rule in multiple batches based on the query rewriting strategy, wherein, in filtering any one batch, the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch are input into the neural network model to obtain the filtering results of that batch.

5. The method according to any one of claims 2-4, characterized in that the step of rewriting the first SQL statement based on the second rewriting rule to obtain the second SQL statement comprises: grouping the second rewriting rules based on the operators in the first SQL statement to obtain at least one set of rules; sorting the rules contained in each of the at least one set of rules based on the query rewriting strategy; sorting the sets of rules relative to one another, at rule-set granularity, based on the sorted rules in each set of rules; and rewriting the first SQL statement based on the sorted rules in the at least one set of rules.

6. The method according to any one of claims 1-5, characterized in that the step of evaluating the multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewriting case, and the association between the at least one first rewriting case and the multiple rewriting rules, to obtain the evaluation results of each of the multiple rewriting rules, comprises: when there is an association between a third rewriting rule and a second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the similarity score between the second rewriting case and the first SQL statement, wherein the third rewriting rule is any one of the multiple rewriting rules, and the second rewriting case is any one of the at least one first rewriting case; when there is no association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the initial score of the third rewriting rule; wherein the evaluation result of the third rewriting rule is calculated based on the evaluation scores of the third rewriting rule evaluated on each of the at least one first rewriting case.

7. The method according to any one of claims 1-6, characterized in that, After obtaining the second SQL statement, the following is also included: Output the second SQL statement and a query rewrite suggestion for the second SQL statement, wherein the query rewrite suggestion includes the at least one first rewrite rule.

8. The method according to any one of claims 1-7, characterized in that, After obtaining the second SQL statement, the following is also included: If the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement, a query is performed on the database based on the second SQL statement, and the query result is returned to the user. If the execution cost of the second SQL statement is greater than the execution cost of the first SQL statement, a query is performed on the database based on the first SQL statement, and the query result is returned to the user.

9. The method according to any one of claims 1-8, characterized in that the at least one first rewriting rule is the top N rewriting rules among the plurality of rewriting rules after the rules are sorted by evaluation result, where N is a positive integer (N≥1).

10. A device for rewriting Structured Query Language (SQL) statements, characterized in that the device comprises: an acquisition module, used to acquire a first SQL statement to be rewritten; and a processing module, used to select at least one first rewrite case from multiple rewrite cases based on the similarity between the first SQL statement and the multiple rewrite cases, wherein a rewrite case includes an example SQL statement and a rewritten SQL statement obtained by rewriting the example SQL statement; the processing module is further configured to evaluate multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewrite case, and the association between the at least one first rewrite case and the multiple rewriting rules, so as to obtain an evaluation result for each of the multiple rewriting rules, wherein a rewriting rule is a rule used to perform query rewriting on an SQL statement; the processing module is further configured to rewrite the first SQL statement based on at least one first rewrite rule among the multiple rewriting rules to obtain a second SQL statement, wherein the at least one first rewrite rule is a rule, among the multiple rewriting rules, whose evaluation result meets the requirements.

11. The apparatus according to claim 10, characterized in that, The processing module is further configured to: Based on the first SQL statement and the at least one first rewrite case, a query rewrite strategy for the first SQL statement is obtained, wherein the query rewrite strategy indicates that the first SQL statement is rewritten with reference to the at least one first rewrite case; When the processing module rewrites the first SQL statement based on at least one first rewrite rule among the plurality of rewrite rules to obtain the second SQL statement, it is specifically used for: Based on the query rewriting strategy, a second rewriting rule is selected from the at least one first rewriting rule; Based on the second rewriting rule, the first SQL statement is rewritten to obtain the second SQL statement.

12. The apparatus according to claim 11, characterized in that, When the processing module obtains the query rewriting strategy for the first SQL statement based on the first SQL statement and the at least one first rewriting case, it is specifically used for: The at least one first rewrite case and the first SQL statement are added to the first prompt template to obtain the first prompt, which is used to instruct the neural network model to generate the query rewrite strategy that rewrites the first SQL statement with reference to the first rewrite case. The first prompt is input into the neural network model to obtain the query rewriting strategy.

13. The apparatus according to claim 11 or 12, characterized in that, when the processing module selects a second rewrite rule from the at least one first rewrite rule based on the query rewrite strategy, it is specifically used for: filtering the at least one first rewrite rule in multiple batches based on the query rewriting strategy, wherein, in filtering any one batch, the query rewriting strategy, the filtering results obtained before that batch, and the rules to be filtered in that batch are input into the neural network model to obtain the filtering results of that batch.

14. The apparatus according to any one of claims 11-13, characterized in that, when the processing module rewrites the first SQL statement based on the second rewriting rule to obtain the second SQL statement, it is specifically used for: grouping the second rewriting rules based on the operators in the first SQL statement to obtain at least one set of rules; sorting the rules contained in each of the at least one set of rules based on the query rewriting strategy; sorting the sets of rules relative to one another, at rule-set granularity, based on the sorted rules in each set of rules; and rewriting the first SQL statement based on the sorted rules in the at least one set of rules.

15. The apparatus according to any one of claims 10-14, characterized in that, when the processing module evaluates the multiple rewriting rules based on the similarity between the first SQL statement and the at least one first rewrite case, and the association between the at least one first rewrite case and the multiple rewriting rules, to obtain the evaluation results of each of the multiple rewriting rules, it is specifically used for: when there is an association between a third rewriting rule and a second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the similarity score between the second rewriting case and the first SQL statement, wherein the third rewriting rule is any one of the multiple rewriting rules, and the second rewriting case is any one of the at least one first rewrite case; when there is no association between the third rewriting rule and the second rewriting case, the evaluation score of the third rewriting rule based on the second rewriting case is the initial score of the third rewriting rule; wherein the evaluation result of the third rewriting rule is calculated based on the evaluation scores of the third rewriting rule evaluated on each of the at least one first rewrite case.

16. The apparatus according to any one of claims 10-15, characterized in that, After obtaining the second SQL statement, the processing module is further configured to: Output the second SQL statement and a query rewrite suggestion for the second SQL statement, wherein the query rewrite suggestion includes the at least one first rewrite rule.

17. The apparatus according to any one of claims 10-16, characterized in that, After obtaining the second SQL statement, the processing module is further configured to: If the execution cost of the second SQL statement is less than or equal to the execution cost of the first SQL statement, a query is performed on the database based on the second SQL statement, and the query result is returned to the user. If the execution cost of the second SQL statement is greater than the execution cost of the first SQL statement, a query is performed on the database based on the first SQL statement, and the query result is returned to the user.

18. The apparatus according to any one of claims 10-17, characterized in that the at least one first rewriting rule is the top N rewriting rules among the plurality of rewriting rules after the rules are sorted by evaluation result, where N is a positive integer (N≥1).

19. A computing device cluster, characterized in that, It includes at least one computing device, each computing device including a processor and memory; The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method as described in any one of claims 1-9.

20. A computer-readable storage medium, characterized in that it includes computer program instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method as described in any one of claims 1-9, wherein the cluster of computing devices includes at least one computing device.

21. A computer program product containing instructions, characterized in that, when the instructions are executed by a computing device cluster, the instructions cause the computing device cluster to perform the method as described in any one of claims 1-9, wherein the computing device cluster includes at least one computing device.