Data query method and device, computer device, storage medium and computer program product

By converting the join type of the downstream inner join operator to a semi-join in the distributed computing engine, the logical execution plan of a data query is optimized, the problem of excessive computing resource consumption in join operations is addressed, and more efficient data query is achieved.

CN122196009A · Pending · Publication Date: 2026-06-12 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2024-12-10
Publication Date
2026-06-12

Abstract

The application relates to a data query method and device, a computer device, a storage medium and a computer program product. The method comprises the following steps: obtaining a target logical execution plan corresponding to a query statement, the target logical execution plan comprising an initial sub-plan, the initial sub-plan comprising an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator; if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry an aggregation function, converting the join type of the downstream inner join operator into a semi-join to obtain a target sub-plan comprising the converted downstream semi-join operator; and obtaining an optimized logical execution plan based on the target sub-plan, and performing query execution based on the optimized logical execution plan to obtain a query result corresponding to the query statement. The method can improve query efficiency.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a data query method, apparatus, computer equipment, storage medium, and computer program product.

Background Technology

[0002] With the development of computer technology, distributed computing technology has emerged. Distributed computing is a technical solution that distributes computing tasks across multiple computing nodes for parallel computation to obtain the final result. Currently, distributed computing is commonly applied to scenarios such as processing large-scale datasets, real-time data processing, streaming data, and iterative computation through distributed computing engines.

[0003] Currently, distributed computing engines typically involve aggregation operations and join operations during data query processing. However, join operations often require processing large amounts of unnecessary data, increasing computational resource consumption and reducing query efficiency.

Summary of the Invention

[0004] Therefore, it is necessary to provide a data query method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can save computing resources and improve data query efficiency in response to the above-mentioned technical problems.

[0005] In a first aspect, this application provides a data query method. The method includes:

[0006] Obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial subplan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0007] If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the connection type of the downstream inner join operator is converted to a semi-join, resulting in the target sub-plan that includes the converted downstream semi-join operator.

[0008] The optimized logical execution plan is obtained based on the target subplan, and the query is executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0009] In a second aspect, this application also provides a data query device. The device includes:

[0010] The execution plan acquisition module is used to acquire the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0011] The type conversion module is used to convert the join type of the downstream inner join operator to a semi-join if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator and the upstream aggregation operator is of a type that does not carry aggregation functions, so as to obtain the target sub-plan that includes the converted downstream semi-join operator.

[0012] The query execution module is used to obtain an optimized logical execution plan based on the target sub-plan, and to execute queries based on the optimized logical execution plan to obtain the query results corresponding to the query statements.

[0013] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:

[0014] Obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial subplan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0015] If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the connection type of the downstream inner join operator is converted to a semi-join, resulting in the target sub-plan that includes the converted downstream semi-join operator.

[0016] The optimized logical execution plan is obtained based on the target subplan, and the query is executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0017] In a fourth aspect, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the following steps:

[0018] Obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial subplan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0019] If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the connection type of the downstream inner join operator is converted to a semi-join, resulting in the target sub-plan that includes the converted downstream semi-join operator.

[0020] The optimized logical execution plan is obtained based on the target subplan, and the query is executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0021] In a fifth aspect, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:

[0022] Obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial subplan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0023] If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the connection type of the downstream inner join operator is converted to a semi-join, resulting in the target sub-plan that includes the converted downstream semi-join operator.

[0024] The optimized logical execution plan is obtained based on the target subplan, and the query is executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0025] The aforementioned data query method, apparatus, computer equipment, storage medium, and computer program product obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which comprises an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator. If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the join type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan including the converted downstream semi-join operator. Based on the target sub-plan, an optimized logical execution plan is obtained, and the query is executed based on the optimized logical execution plan to obtain the query result corresponding to the query statement. In other words, when it is detected that, in the target logical execution plan including the initial sub-plan, the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the join type of the downstream inner join operator can be converted from inner join to semi-join for optimized execution. Because a semi-join operator involves less computation and produces a smaller result than an inner join operator, the conversion saves computing resources, shortens the execution time of the data query, and improves data query efficiency.

Attached Figure Description

[0026] Figure 1 is a diagram illustrating the application environment of a data query method in one embodiment;

[0027] Figure 2 is a flowchart illustrating a data query method in one embodiment;

[0028] Figure 3 is a schematic diagram of the structure of the initial sub-plan in a specific embodiment;

[0029] Figure 4 is a schematic diagram of the structure of a target sub-plan in a specific embodiment;

[0030] Figure 5 is a flowchart illustrating the process of obtaining the target sub-plan in one embodiment;

[0031] Figure 6 is a schematic diagram of the structure of the left sub-plan including the initial sub-plan in a specific embodiment;

[0032] Figure 7 is a schematic diagram of the structure of the right sub-plan including the initial sub-plan in a specific embodiment;

[0033] Figure 8 is a flowchart illustrating a data query method in a specific embodiment;

[0034] Figure 9 is a schematic diagram of the data query architecture in a specific embodiment;

[0035] Figure 10 is a schematic diagram illustrating the technical principle of data query in a specific embodiment;

[0036] Figure 11 is a schematic diagram of the analysis before optimization in a specific embodiment;

[0037] Figure 12 is a schematic diagram of the analysis after optimization in a specific embodiment;

[0038] Figure 13 is a structural block diagram of a data query device in one embodiment;

[0039] Figure 14 is an internal structural diagram of a computer device in one embodiment;

[0040] Figure 15 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0041] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0042] The data query method provided in the embodiments of this application can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with distributed server cluster 104 via a network. The distributed server cluster 104 can be built on a distributed engine architecture and includes a management node 104a. The management node 104a can obtain a query request from terminal 102; the query request carries a join query statement. The management node 104a obtains the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator. If the management node 104a detects that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry an aggregation function, it converts the join type of the downstream inner join operator to a semi-join, obtaining a target sub-plan including the converted downstream semi-join operator. The management node 104a then obtains an optimized logical execution plan based on the target sub-plan, and performs query execution based on the optimized logical execution plan to obtain the query results corresponding to the query statement. The management node 104a can return the query results to the terminal 102. The terminal 102 can be, but is not limited to, a desktop computer, laptop, smartphone, tablet, IoT device, or portable wearable device. IoT devices can include smart speakers, smart TVs, smart air conditioners, smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, head-mounted devices, etc. The management node 104a can be a computer device, which can be a terminal or a server. The terminal and server can be directly or indirectly connected via wired or wireless communication, which is not limited herein.

[0043] In one embodiment, as shown in Figure 2, a data query method is provided. The method is described here as applied to the management node 104a in Figure 1 by way of example. It can be understood that the method can also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

[0044] S202, obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0045] A query statement here refers to a database query statement: a statement, written according to a set of syntax rules and commands for interacting with a database, that retrieves or manipulates data in the database. For example, a query statement can retrieve data from a distributed database. This distributed database is deployed in a distributed server cluster and can be built using a distributed computing engine architecture. Distributed computing engines mainly include Hadoop (a distributed system infrastructure developed by the Apache Software Foundation), Spark (a distributed computing framework based on in-memory computing), Flink (an open-source stream processing framework), Presto (a distributed SQL query engine that can independently provide computational and analytical operations), Hive (a data warehouse tool built on top of Hadoop), HBase (a distributed, column-oriented open-source database), Cassandra (an open-source distributed key-value storage system), Kafka (an open-source stream processing platform), Elasticsearch (a distributed search and analysis engine at the core of the Elastic Stack), and Druid (a distributed data processing system that supports real-time multidimensional OLAP analysis and columnar storage), among others.

[0046] A logical execution plan is a step-by-step guide generated by the database query optimizer based on the textual representation of the data query, describing how to execute the query. This logical execution plan can contain multiple operation operators, such as scans, joins, sorts, and aggregations. It typically does not contain actual data pages and rows, but rather describes how to manipulate them. Logical execution plans are primarily used for query optimization, guiding the generation of physical execution plans. A target logical execution plan refers to the logical execution plan that includes an initial subplan. An initial subplan is a portion of the target logical execution plan, comprising upstream aggregation operators, projection operators connected to upstream aggregation operators, and downstream inner join operators connected to projection operators. In other words, the initial subplan contains both join and aggregation operators, with aggregation operators preceding join operators. For example, when the initial subplan is represented using a tree structure, the parent node is the aggregation operator, the intermediate nodes are projection operators, and the child nodes are join operators.

[0047] An upstream aggregation operator refers to an aggregation operator located upstream in the initial sub-plan. This aggregation operator is the Aggregate operator, which summarizes and computes over a set of data to produce a single result value. Aggregation operations are commonly used in statistical analysis, data summarization, and similar scenarios. In other words, the Aggregate operator performs aggregation operations such as COUNT, SUM, AVG, MIN, and MAX, which combine multiple rows of data into a single result.
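As an illustration of the aggregation operations named above, the following minimal Python sketch groups rows and applies SUM and COUNT to each group. The row layout and the helper names (group_rows, sum_by, count_by) are assumptions made for this example and do not come from the application; the last line shows a grouping with no aggregate function at all, which simply yields the distinct group keys.

```python
from collections import defaultdict

def group_rows(rows, key):
    """Bucket rows by the value of `key` (the GROUP BY step)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups

def sum_by(rows, key, value):
    """GROUP BY key with SUM(value): many rows collapse to one value per group."""
    return {k: sum(r[value] for r in g) for k, g in group_rows(rows, key).items()}

def count_by(rows, key):
    """GROUP BY key with COUNT(*)."""
    return {k: len(g) for k, g in group_rows(rows, key).items()}

# Hypothetical sample data.
sales = [
    {"region": "east", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 3},
]

totals = sum_by(sales, "region", "amount")
counts = count_by(sales, "region")
# Grouping without any aggregate function only deduplicates the keys.
distinct_regions = sorted(group_rows(sales, "region"))
```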

[0048] Projection operators are computational operators used to define the structure of data output by specifying the columns to be displayed (or computed). In Spark SQL (Spark Structured Query Language), project operations in the DataFrame or DataSet API manifest as the selection or computation of columns, typically implemented through operations such as select (which selects one or more columns and returns a new DataFrame) or selectExpr (which selects columns and performs expression evaluations or renames them).

[0049] A downstream inner join operator refers to an inner join operator located downstream of the upstream aggregation operator. An inner join operator is a join operator whose join type is inner join. Inner join is the most common join type in databases; it returns the rows from two tables that satisfy the join condition. When records in the two tables match, an inner join merges them into a single row and returns it. A row with no match in the other table does not appear in the result.

[0050] Specifically, the management node can directly obtain the target logical execution plan, including the initial sub-plan, corresponding to the query statement. For example, the management node can obtain the target logical execution plan corresponding to the query statement uploaded by the terminal. The management node can also obtain the target logical execution plan corresponding to the query statement from a third party, such as a service provider offering data query services. Furthermore, the management node can directly obtain the target logical execution plan corresponding to the query statement stored in the database, and so on.

[0051] In some embodiments, the management node can obtain a query statement that includes aggregation and join operations, with the aggregation operation preceding the join operation, and then parse the query statement to obtain the target logical execution plan corresponding to the query statement.

[0052] In some embodiments, the management node can obtain a query statement, parse the query statement to obtain an initial logical execution plan, and then determine whether the initial logical execution plan includes an initial sub-plan. If it includes an initial sub-plan, it means that the initial logical execution plan is the target logical execution plan. If it does not include an initial sub-plan, it means that the initial logical execution plan corresponding to the query statement is not the target logical execution plan. In this case, the management node directly continues to execute the query according to the initial logical execution plan corresponding to the query statement.
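The check described above, whether a parsed plan contains an initial sub-plan, can be sketched as a tree search. This is a minimal illustration only: the PlanNode class and its fields are hypothetical stand-ins for whatever plan representation a real engine uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlanNode:
    # Hypothetical plan node: operator name, optional join type, child operators.
    op: str
    join_type: Optional[str] = None
    children: List["PlanNode"] = field(default_factory=list)

def is_initial_subplan(node: PlanNode) -> bool:
    """True if `node` roots an Aggregate -> Project -> inner Join chain."""
    if node.op != "Aggregate" or len(node.children) != 1:
        return False
    project = node.children[0]
    if project.op != "Project" or len(project.children) != 1:
        return False
    join = project.children[0]
    return join.op == "Join" and join.join_type == "Inner"

def contains_initial_subplan(root: PlanNode) -> bool:
    """Depth-first search for at least one initial sub-plan anywhere in the plan."""
    if is_initial_subplan(root):
        return True
    return any(contains_initial_subplan(child) for child in root.children)

join = PlanNode("Join", "Inner", [PlanNode("Scan"), PlanNode("Scan")])
target_plan = PlanNode(
    "Sort",
    children=[PlanNode("Aggregate", children=[PlanNode("Project", children=[join])])],
)
plain_plan = PlanNode("Project", children=[PlanNode("Scan")])
```

Under these assumptions, target_plan would be treated as a target logical execution plan, while plain_plan would be executed as is.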

[0053] In a specific embodiment, Figure 3 shows the structure of the initial sub-plan, represented as a tree. It includes an Aggregate node, a Project node, and a Join node. The Join operator is of the inner join type (jointype=Inner) and has a LeftChild node and a RightChild node. The Aggregate node connects to the Project node, the Project node connects to the Join node, and the Join node connects to its left and right child nodes.
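The tree described for Figure 3 can be written out as a small data structure. The sketch below uses a hypothetical Node class and a render helper, both invented for this example, to build the tree and print it with one level of indentation per edge.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    name: str
    join_type: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, parent before children."""
    label = node.name if node.join_type is None else f"{node.name}(jointype={node.join_type})"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

initial_subplan = Node("Aggregate", children=[
    Node("Project", children=[
        Node("Join", join_type="Inner", children=[Node("LeftChild"), Node("RightChild")]),
    ]),
])

print("\n".join(render(initial_subplan)))
```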

[0054] S204, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the connection type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan that includes the converted downstream semi-join operator.

[0055] Here, the left child operator corresponding to the downstream inner join operator refers to the operator executed on the left-hand data table during the join operation. The left-hand data table can be the first data table selected for the join in the query statement. The output parameter of the left child operator refers to the value of the outputSet (output result set) of the left child node of the Join. This output parameter of the left child operator typically includes the output parameter of the projection operator, which refers to the value of the outputSet of the Project. The output parameter of the projection operator typically includes information about the columns that the query result set should contain, such as column names, column aliases, the source table of the column, the source column of the column, the source operator of the splitting column, etc. The join type characterizes how a join query is executed. The converted downstream semi-join operator refers to the join operator obtained after converting the join type of the downstream inner join operator; its join type is semi-join. For example, it can be a Semi Join, a special join type that returns the rows of one table that have a match in another table, without returning any column data from the right table. A Semi Join only cares whether a matching record exists and does not return data from the right table, so its result set includes only the records from the left table that meet the condition. The target sub-plan refers to the sub-plan obtained after the join type of the join operator in the initial sub-plan is changed from inner join to semi-join.
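To make the difference between the two join types concrete, the following sketch implements both over lists of dictionaries. The table contents and function names are assumptions for illustration. The semi-join returns each matching left row once and carries no right-table columns, whereas the inner join returns one merged row per matching pair.

```python
def inner_join(left, right, key):
    """One merged row for every (left, right) pair whose `key` values match."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def left_semi_join(left, right, key):
    """Each left row at most once, kept only if some right row shares its `key`."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]

# Hypothetical sample tables.
users = [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "item": "pen"}, {"user_id": 1, "item": "ink"}]

inner = inner_join(users, orders, "user_id")
semi = left_semi_join(users, orders, "user_id")
```

Here the inner join yields two rows for Ann (one per order), while the semi-join yields a single row for Ann with only the columns of the left table.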

[0056] Specifically, the management node determines whether the initial sub-plan meets the conditions for conversion of the join type. This can begin by checking if the output parameters of the left child operator corresponding to the downstream inner join operator include the output parameters of the projection operator. For example, the management node can obtain the output parameters of the left child operator corresponding to the downstream inner join operator and the output parameters of the projection operator, and then compare them. For instance, it can determine if the outputSet of the Project is a subset of the outputSet of the left child node of the Join. If the outputSet of the Project is a subset of the outputSet of the left child node of the Join, then the condition is met. At this point, the management node can further determine whether the upstream aggregation operator is a type that does not carry aggregation functions. This means that the aggregation operator will not perform aggregation function operations during execution. For example, it can determine whether the Aggregate operator is a grouponly type, which is a type that does not carry aggregation functions, i.e., it will not perform aggregation function operations such as COUNT, SUM, AVG, MIN, MAX, etc. The Aggregate operator can be a deduplication operator. In this case, the Aggregate operator is a type that does not carry aggregate functions. The management node can determine whether it is a type that does not carry aggregate functions based on the specific function information to be executed corresponding to the upstream aggregate operator. When the function information does not include an aggregate function, the upstream aggregate operator is a type that does not carry aggregate functions.
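The two checks described above can be sketched as simple predicates. This is a minimal illustration in which column sets are plain strings and the aggregate functions are given as a list of names; both representations, and the function names, are assumptions rather than the data structures of any particular engine.

```python
from typing import Iterable, Sequence

def project_within_left(project_output: Iterable[str], left_output: Iterable[str]) -> bool:
    """Condition 1: the Project outputSet is a subset of the Join left child's outputSet."""
    return set(project_output) <= set(left_output)

def is_group_only(aggregate_functions: Sequence[str]) -> bool:
    """Condition 2: the Aggregate carries no aggregate function (e.g. no COUNT, SUM)."""
    return len(aggregate_functions) == 0

def can_convert(project_output, left_output, aggregate_functions) -> bool:
    """Both conditions must hold before the inner join may become a semi-join."""
    return project_within_left(project_output, left_output) and is_group_only(aggregate_functions)

ok = can_convert(["a"], ["a", "b"], [])
needs_right_column = can_convert(["a", "c"], ["a", "b"], [])
has_aggregate = can_convert(["a"], ["a", "b"], ["COUNT"])
```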

[0057] Finally, if all of the conditions are satisfied, the management node performs the join type conversion: it converts the join type of the downstream inner join operator to a semi-join, which can be done by changing the join type of the join operator from inner join to semi-join. The result is the target sub-plan, which includes the converted downstream semi-join operator.

[0058] In a specific embodiment, Figure 4 shows the structure of the target sub-plan, represented as a tree. It includes an Aggregate node, a Project node, and a Join node. The Join operator is of the semi-join type (jointype=Semi) and has a LeftChild node and a RightChild node. The Aggregate node connects to the Project node, the Project node connects to the Join node, and the Join node connects to its left and right child nodes.
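The conversion from the initial sub-plan to the target sub-plan changes only the join type; the shape of the tree stays the same. A minimal sketch with immutable, hypothetical plan classes (Join, Project, Aggregate and their fields are invented for this example) might look as follows.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Join:
    join_type: str
    left: str
    right: str

@dataclass(frozen=True)
class Project:
    child: Join
    columns: Tuple[str, ...]

@dataclass(frozen=True)
class Aggregate:
    child: Project
    grouping: Tuple[str, ...]

def to_semi_join(subplan: Aggregate) -> Aggregate:
    """Return a copy of the sub-plan whose Join has jointype=Semi; nothing else changes."""
    new_join = replace(subplan.child.child, join_type="Semi")
    return replace(subplan, child=replace(subplan.child, child=new_join))

initial = Aggregate(Project(Join("Inner", "LeftChild", "RightChild"), ("k",)), ("k",))
target = to_semi_join(initial)
```

Returning a new object rather than mutating in place is a design choice of this sketch; it keeps the initial sub-plan available for comparison.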

[0059] S206. Based on the target subplan, an optimized logical execution plan is obtained, and the query is executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0060] The optimized logical execution plan refers to the logical execution plan obtained by converting the initial sub-plan in the target logical execution plan into the target sub-plan. The query result refers to the query result corresponding to the query statement, processed through distributed computing in a distributed database.

[0061] In this embodiment, the management node obtains the optimized logical execution plan based on the target sub-plan, and continues to perform query execution based on the optimized logical execution plan. This can be achieved by using the optimized logical execution plan to generate the corresponding physical execution plan, and then performing query execution on the physical execution plan to obtain the query results corresponding to the query statement.

[0062] In some embodiments, the target logical execution plan may include at least two initial sub-plans, and each of them can be traversed. That is, for each initial sub-plan, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the join type of the downstream inner join operator is converted to a semi-join, yielding the target sub-plan for that initial sub-plan. The optimized logical execution plan is then obtained based on all the target sub-plans, and query execution continues according to the optimized logical execution plan. In other words, every initial sub-plan that meets the conditions is converted, and the query is executed according to all the resulting target sub-plans, thereby further improving query efficiency.

[0063] In some embodiments, if the at least two initial sub-plans include a target initial sub-plan, that is, one in which the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator and the upstream aggregation operator is of a type that does not carry aggregation functions, then the join type of the downstream inner join operator in the target initial sub-plan is converted to a semi-join, yielding the target sub-plan converted from the target initial sub-plan. If the at least two initial sub-plans include a non-target initial sub-plan that does not meet the join type conversion conditions, for example because the output parameters of its left sub-operator do not include the output parameters of the projection operator, or because its upstream aggregation operator carries aggregation functions, then no join type conversion is performed on it. Finally, query execution continues according to the logical execution plan in which only some of the initial sub-plans have been converted, which improves query execution efficiency while ensuring the accuracy of query execution.
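The traversal over several initial sub-plans, converting only those that meet the conditions, can be sketched as a recursive walk. The Op class, its fields, and the sample plan below are hypothetical; in particular, columns are identified by bare names here, whereas a real engine would normally use unique attribute identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Op:
    # Hypothetical operator node for this illustration.
    kind: str
    output: Set[str] = field(default_factory=set)
    agg_funcs: List[str] = field(default_factory=list)
    join_type: Optional[str] = None
    children: List["Op"] = field(default_factory=list)

def rewrite(node: Op) -> int:
    """Convert every eligible inner join in place; return the number converted."""
    converted = 0
    if node.kind == "Aggregate" and not node.agg_funcs and len(node.children) == 1:
        project = node.children[0]
        if project.kind == "Project" and len(project.children) == 1:
            join = project.children[0]
            if (join.kind == "Join" and join.join_type == "Inner"
                    and len(join.children) == 2
                    and project.output <= join.children[0].output):
                join.join_type = "Semi"
                converted += 1
    for child in node.children:
        converted += rewrite(child)
    return converted

def subplan(agg_funcs, project_cols):
    """Build one Aggregate -> Project -> inner Join sub-plan over two scans."""
    left = Op("Scan", output={"id", "name"})
    right = Op("Scan", output={"id", "price"})
    join = Op("Join", join_type="Inner", children=[left, right])
    project = Op("Project", output=set(project_cols), children=[join])
    return Op("Aggregate", agg_funcs=list(agg_funcs), children=[project])

eligible = subplan([], ["id"])           # group-only, left columns only
has_count = subplan(["COUNT"], ["id"])   # carries an aggregate function
uses_right = subplan([], ["price"])      # projects a right-side column
root = Op("Union", children=[eligible, has_count, uses_right])

converted = rewrite(root)
```

Only the first sub-plan is converted; the other two keep their inner joins, matching the partial conversion described above.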

[0064] The aforementioned data query method, apparatus, computer equipment, storage medium, and computer program product obtain the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which comprises an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator. If the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then the join type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan including the converted downstream semi-join operator. Based on the target sub-plan, an optimized logical execution plan is obtained, and the query is executed based on the optimized logical execution plan to obtain the query result corresponding to the query statement. In other words, when it is detected that, in the target logical execution plan including the initial sub-plan, the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the join type of the downstream inner join operator can be converted from inner join to semi-join for optimized execution. Because a semi-join operator involves less computation and produces a smaller result than an inner join operator, the conversion saves computing resources, shortens the execution time of the data query, and improves data query efficiency.
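One way to see why the conversion leaves the query result unchanged under the two conditions is to run both plans on a small example. In the sketch below, which uses made-up data and helper names, the group-only aggregation is modeled as deduplication of the projected rows. The inner join produces more intermediate rows, but after projecting onto left-side columns and deduplicating, both plans give the same result.

```python
def inner_join(left, right, key):
    """One merged row per matching (left, right) pair."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def left_semi_join(left, right, key):
    """Each left row at most once, if it has any match on the right."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]

def distinct_project(rows, columns):
    """Project onto `columns`, then deduplicate (a group-only aggregation)."""
    return {tuple(row[c] for c in columns) for row in rows}

# Hypothetical tables: key 1 matches three right rows, key 3 matches none.
left = [{"k": 1, "city": "A"}, {"k": 2, "city": "B"}, {"k": 3, "city": "C"}]
right = [{"k": 1, "v": 10}, {"k": 1, "v": 11}, {"k": 1, "v": 12}, {"k": 2, "v": 20}]

inner_rows = inner_join(left, right, "k")
semi_rows = left_semi_join(left, right, "k")

result_before = distinct_project(inner_rows, ["k", "city"])
result_after = distinct_project(semi_rows, ["k", "city"])
```

In this example the inner join feeds four rows into the projection and the semi-join feeds two, yet the final results are identical.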

[0065] In some embodiments, as shown in Figure 5, S204, in which, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the join type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan that includes the converted downstream semi-join operator, includes:

[0066] S502, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then obtain the join execution strategy information corresponding to the downstream inner join operator;

[0067] S504a, if the join execution strategy information is a non-broadcast join execution strategy, then the join type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan that includes the converted downstream semi-join operator.

[0068] The join execution strategy information refers to the join execution strategy corresponding to the join operator. The join execution strategy characterizes the distributed execution method adopted by the join operator during execution. This strategy can include broadcast join execution strategy and non-broadcast join execution strategy. A broadcast join execution strategy refers to a strategy that performs join execution via broadcast. This strategy broadcasts the data to be joined to each execution node for join operations. A non-broadcast join execution strategy refers to a strategy that performs join execution in a non-broadcast manner, such as a sort-merge join strategy. In Spark, join strategies include Broadcast Join and SortMergeJoin strategies.

[0069] In this embodiment, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the management node further needs to determine whether the join execution strategy information corresponding to the downstream inner join operator meets the conditions for join type conversion. Specifically, the management node obtains the join execution strategy information corresponding to the downstream inner join operator and determines whether it is a non-broadcast join execution strategy. A non-broadcast join execution strategy indicates that the join operator would be executed in a way that consumes more resources, so join type conversion can be performed to optimize subsequent execution and improve execution performance. In that case, the management node converts the join type of the downstream inner join operator to a semi-join, obtaining the target sub-plan including the converted downstream semi-join operator.

[0070] In the above embodiments, join type conversion is only performed when the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, the upstream aggregation operator is of a type that does not carry aggregation functions, and the join execution strategy information is a non-broadcast join execution strategy. That is, performing join type conversion when a non-broadcast join execution strategy is used can reduce the amount of computation during join execution, thereby reducing the execution time of data queries and improving the efficiency of data queries.
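
The three conditions above can be sketched as a single rewrite rule over a small plan tree. This is a minimal sketch under stated assumptions, not the implementation of the application or of any particular engine: the `Plan` class, its field names, and the strategy labels `"broadcast"` and `"sort_merge"` are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """A logical plan node; every field name here is illustrative."""
    op: str                                  # "Aggregate", "Project", "Join", "Scan"
    output: Tuple[str, ...] = ()             # output parameters (column names)
    children: Tuple["Plan", ...] = ()        # child nodes, left child first for joins
    agg_functions: Tuple[str, ...] = ()      # empty means no aggregation function
    join_type: Optional[str] = None          # "Inner" or "Semi" for Join nodes
    strategy: Optional[str] = None           # "broadcast" or "sort_merge" for Join nodes


def can_convert(agg: Plan) -> bool:
    """Check the pattern Aggregate -> Project -> inner Join and all three conditions."""
    if agg.op != "Aggregate" or agg.agg_functions or len(agg.children) != 1:
        return False
    project = agg.children[0]
    if project.op != "Project" or len(project.children) != 1:
        return False
    join = project.children[0]
    if join.op != "Join" or join.join_type != "Inner":
        return False
    left = join.children[0]
    # The left sub-operator's outputs must include the projection's outputs.
    if not set(project.output) <= set(left.output):
        return False
    # Only a non-broadcast join is converted.
    return join.strategy != "broadcast"


def convert(agg: Plan) -> Plan:
    """Return a plan with the join turned into a semi-join, or the input unchanged."""
    if not can_convert(agg):
        return agg
    project = agg.children[0]
    join = project.children[0]
    # A semi-join only emits the columns of its left input.
    semi = replace(join, join_type="Semi", output=join.children[0].output)
    return replace(agg, children=(replace(project, children=(semi,)),))


left = Plan("Scan", output=("id", "name"))
right = Plan("Scan", output=("ref_id",))
join = Plan("Join", output=("id", "name", "ref_id"), children=(left, right),
            join_type="Inner", strategy="sort_merge")
project = Plan("Project", output=("id", "name"), children=(join,))
example = Plan("Aggregate", output=("id", "name"), children=(project,))

converted = convert(example)
print(converted.children[0].children[0].join_type)
```

Returning the input object unchanged when any condition fails keeps the rule safe to apply to every aggregation operator in a plan.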

[0071] In some embodiments, as shown in Figure 5, obtaining, in S502, the join execution strategy information corresponding to the downstream inner join operator includes:

[0072] S502a, obtain the data statistics information of the left sub-operator corresponding to the downstream inner join operator;

[0073] S502b, if the data statistics information does not meet the statistical information conditions of the preset broadcast join execution strategy, the join execution strategy information obtained is a non-broadcast join execution strategy;

[0074] S502c, if the data statistics information meets the statistical information conditions of the preset broadcast join execution strategy, the join execution strategy information obtained is a broadcast join execution strategy.

[0075] Here, the data statistics information refers to statistical information about the data that the left sub-operator needs in order to execute. This statistical information can be the data volume, the number of rows, the number of columns, the data distribution, the cardinality of the columns, and so on. The statistical information conditions of the preset broadcast join execution strategy refer to conditions, set in advance, that the statistics must satisfy for the join to be executed according to the broadcast join execution strategy. For example, the condition can be a preset data volume threshold.

[0076] In this embodiment, the management node can perform statistical queries on the data of the left sub-operator corresponding to the downstream inner join operator to obtain the data statistics information of the left sub-operator. For example, the data statistics information can be CBO (Cost-Based Optimization) statistics. Then, the management node checks whether the data statistics information meets the statistical information conditions of the preset broadcast join execution strategy. For example, it can compare the data volume corresponding to the left sub-operator with a preset data volume threshold. When the data volume corresponding to the left sub-operator is less than the preset data volume threshold, the data statistics information meets the statistical information conditions of the preset broadcast join execution strategy, and the management node can determine that the join execution strategy is a broadcast join execution strategy. When the data volume corresponding to the left sub-operator is greater than the preset data volume threshold, the data statistics information does not meet those conditions, and the management node can determine that the join execution strategy is a non-broadcast join execution strategy.

[0077] In some embodiments, the management node can determine the join execution strategy information of the downstream inner join operator by detecting whether the statistical information of the build-side data corresponding to the downstream inner join operator meets the preset broadcast join execution conditions. Here, build-side data refers to the data on the build side during join execution. The side with the smaller data volume is usually called the build side, and the side with the larger data volume is called the probe side. That is, the data volume on the build side is less than the data volume on the probe side. The statistical information of the build-side data refers to the information obtained from statistics on the data corresponding to the build side. This statistical information may include, but is not limited to, the data volume, the number of rows, the number of columns, the data distribution, and the cardinality of the columns. The preset broadcast join execution conditions are pre-set trigger conditions for executing the join via broadcast. For example, the condition may be that the data volume on the build side is less than a preset broadcast threshold, such as 10 MB (megabytes), and the threshold can be set according to requirements. For example, in a production environment, the preset broadcast threshold can be set according to the memory size of the execution node, such as 2 GB (gigabytes). The preset broadcast join execution conditions may also include other conditions, such as the proportion of empty files in the build-side data being less than a configured value. When the statistical information of the build-side data meets the preset broadcast join execution conditions, the join execution strategy information of the downstream inner join operator is determined to be a broadcast join execution strategy. When the statistical information of the build-side data does not meet the preset broadcast join execution conditions, the join execution strategy information of the downstream inner join operator is determined to be a non-broadcast join execution strategy.
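
The threshold comparison described above can be sketched as follows. The names `SideStats`, `choose_strategy`, and `BROADCAST_THRESHOLD_BYTES` are hypothetical, and interpreting 10 MB as 10 × 1024 × 1024 bytes is an assumption. The text does not say how a data volume exactly equal to the threshold is treated, so this sketch assumes it falls on the non-broadcast side.

```python
from dataclasses import dataclass

# Assumed byte value for the 10 MB example threshold mentioned in the text.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class SideStats:
    """Statistics for one join input; field names are illustrative."""
    size_in_bytes: int
    row_count: int


def choose_strategy(build_side: SideStats,
                    threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Pick a broadcast join only when the build side is below the threshold."""
    if build_side.size_in_bytes < threshold:
        return "broadcast"
    # Assumption: a size equal to the threshold is treated as non-broadcast.
    return "non_broadcast"


small = SideStats(size_in_bytes=4 * 1024 * 1024, row_count=50_000)
large = SideStats(size_in_bytes=5 * 1024 ** 3, row_count=80_000_000)

print(choose_strategy(small))                            # below 10 MB
print(choose_strategy(large))                            # above 10 MB
print(choose_strategy(large, threshold=8 * 1024 ** 3))   # larger, memory-based threshold
```

Passing the threshold as a parameter mirrors the remark that a production environment may size it according to the memory of the execution node.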

[0078] In the above embodiments, the data statistics information of the left sub-operator corresponding to the downstream inner join operator is obtained, and the join execution strategy information is then determined according to whether the data statistics information meets the statistical information conditions of the preset broadcast join execution strategy. That is, basing the judgment on the data statistics information of the left sub-operator can reduce the possibility of judgment errors, thereby improving the accuracy of the obtained join execution strategy information.

[0079] It can be understood that when the join execution strategy information corresponding to the downstream inner join operator is a broadcast join execution strategy, there is no need to perform join type conversion, since the broadcast join execution strategy itself has high execution efficiency. This avoids the problem that performing join type conversion would lead to lower execution efficiency than broadcast join execution. In some embodiments, as shown in Figure 5, the data query method further includes:

[0080] S504b, if the connection execution strategy information corresponding to the downstream inner join operator is a broadcast connection execution strategy, then the query execution is performed based on the target logical execution plan including the initial sub-plan, and the query result corresponding to the query statement is obtained.

[0081] In this embodiment, when the management node detects that the join execution strategy information corresponding to the downstream inner join operator is a broadcast join execution strategy, it indicates that the downstream inner join operator performs join operations by broadcasting to each execution node during query execution. At this time, the management node does not need to perform join type conversion and directly continues query execution according to the target logical execution plan, which includes the unconverted initial sub-plan. For example, the target logical execution plan can be converted into a physical execution plan, which is then executed to obtain the query result corresponding to the query statement.

[0082] In the above embodiments, when the connection execution strategy information corresponding to the downstream inner join operator is detected to be a broadcast connection execution strategy, the query can be executed according to the broadcast connection execution strategy without the need for connection type conversion. This prevents the problem that the execution efficiency will be reduced compared to broadcast connection execution after connection type conversion, and ensures the normal and efficient execution of data query.

[0083] In some embodiments, S202, in which the target logical execution plan corresponding to the query statement is obtained, the target logical execution plan including an initial sub-plan that includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator, includes:

[0084] Obtain the initial logical execution plan corresponding to the query statement, and search the initial logical execution plan for the initial sub-plan. If an initial sub-plan is matched within the initial logical execution plan, the initial logical execution plan is used as the target logical execution plan for the query statement. If no initial sub-plan is matched within the initial logical execution plan, the query is executed based on the initial logical execution plan to obtain the query results corresponding to the query statement.

[0085] The initial logical execution plan refers to the logical execution plan that needs to be checked to see if it includes the initial subplan. It can be the logical execution plan obtained by parsing the query statement.

[0086] In this embodiment, the management node can obtain the query statement, parse it to obtain an initial logical execution plan, and then check whether a matching initial sub-plan exists within the initial logical execution plan. A matching sub-plan is one that has the same operators and connection relationships as the initial sub-plan. If one is found, the management node uses the initial logical execution plan as the target logical execution plan for the query statement and then performs the subsequent join type conversion checks. If the management node does not find a matching sub-plan in the initial logical execution plan, for example, because the sub-plans present have different operators or connection relationships than the initial sub-plan, then no join type conversion check is needed. The management node can directly use the initial logical execution plan for subsequent query execution. For example, in Spark, the initial logical execution plan can be logically optimized to obtain an optimized logical execution plan. Then, the optimized logical execution plan is used to generate a physical execution plan, and finally, the physical execution plan is executed to obtain the query result corresponding to the query statement. In a specific embodiment, the management node can match the initial logical execution plan against the initial sub-plan shown in Figure 3, and when that sub-plan is matched, the initial logical execution plan is used as the target logical execution plan.

[0087] In the above embodiments, the join type conversion check is performed only if the initial sub-plan is matched within the initial logical execution plan. If the initial sub-plan is not matched within the initial logical execution plan, there is no need to check whether the join type needs to be converted, thereby avoiding join type conversion on unmatched sub-plans and ensuring the accuracy of the join type conversion.

[0088] In some embodiments, matching according to the initial subplan in the initial logical execution plan includes:

[0089] If an upstream aggregation operator is found in the initial logical execution plan, the first operator information pointed to by the pointer corresponding to the upstream aggregation operator is obtained; if the first operator information includes a projection operator, the second operator information pointed to by the pointer corresponding to the projection operator is obtained; if the second operator information includes a downstream inner join operator, the initial sub-plan is matched in the initial logical execution plan.

[0090] In the initial logical execution plan, operators can be stored as arrays and associated with pointers. The first operator information refers to the operators planned for execution after the upstream aggregation operator is executed. The second operator information refers to the operators planned for execution after the projection operator is executed.

[0091] In this embodiment, the management node searches for aggregation operators in the initial logical execution plan. When an aggregation operator is found, it is used as the upstream aggregation operator. Then, the first operator information pointed to by the pointer corresponding to the upstream aggregation operator is obtained, and the first operator information is checked for a projection operator. If a projection operator exists, it indicates that the upstream aggregation operator and the projection operator are connected. At this point, the management node obtains the second operator information pointed to by the pointer corresponding to the projection operator, and the second operator information is checked for an inner join operator. If an inner join operator exists, it is the downstream inner join operator. That is, an initial sub-plan is matched in the initial logical execution plan. For example, the initial logical execution plan can be checked for an Aggregate operator whose pointer points to a Project operator, where the pointer of the Project operator in turn points to a Join operator. When this sub-plan is matched, it indicates that the join type conversion check can be performed on the initial logical execution plan, that is, the initial logical execution plan is used as the target logical execution plan for subsequent judgment.
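
The pointer-following match described above can be sketched with operators stored in a list and pointers represented as list indices. The `Operator` class, the `pointers` field, and the function names are hypothetical; a real engine would typically match on its own plan-node classes instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Operator:
    """One operator in the plan array; `pointers` holds indices of pointed-to operators."""
    kind: str
    pointers: Tuple[int, ...] = ()
    join_type: Optional[str] = None


def pointed_operator(ops: List[Operator], index: int, kind: str) -> Optional[int]:
    """Return the index of the first operator of `kind` that ops[index] points to."""
    for target in ops[index].pointers:
        if ops[target].kind == kind:
            return target
    return None


def match_initial_subplan(ops: List[Operator]) -> Optional[Tuple[int, int, int]]:
    """Find Aggregate -> Project -> inner Join; return their indices or None."""
    for agg_idx, op in enumerate(ops):
        if op.kind != "Aggregate":
            continue
        project_idx = pointed_operator(ops, agg_idx, "Project")
        if project_idx is None:
            continue
        join_idx = pointed_operator(ops, project_idx, "Join")
        if join_idx is not None and ops[join_idx].join_type == "Inner":
            return agg_idx, project_idx, join_idx
    return None


matching_plan = [
    Operator("Aggregate", (1,)),
    Operator("Project", (2,)),
    Operator("Join", (3, 4), join_type="Inner"),
    Operator("Scan"),
    Operator("Scan"),
]
# Here a Filter sits between Aggregate and Join, so the pattern is not matched.
other_plan = [
    Operator("Aggregate", (1,)),
    Operator("Filter", (2,)),
    Operator("Join", (3, 4), join_type="Inner"),
    Operator("Scan"),
    Operator("Scan"),
]

print(match_initial_subplan(matching_plan))
print(match_initial_subplan(other_plan))
```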

[0092] In the above embodiments, operators are matched by following pointers, and when the matching operators are found, it indicates that an initial sub-plan has been matched in the initial logical execution plan, thereby improving the accuracy of the initial sub-plan matching and ensuring the accuracy of the determined target logical execution plan.

[0093] In some embodiments, the left sub-plan corresponding to the converted downstream semi-join operator includes the initial sub-plan;

[0094] Based on the target subplan, an optimized logical execution plan is obtained. Then, the query is executed based on this optimized logical execution plan to obtain the query results corresponding to the query statement, including:

[0095] If the initial subplan included in the left subplan meets the preset join type conversion conditions, the join type of the downstream inner join operator in the left subplan is converted to a semi-join, resulting in a left target subplan including the converted downstream semi-join operator. The preset join type conversion conditions include that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is a type that does not carry aggregation functions. Based on the target subplan and the left target subplan, a left recursive optimized logical execution plan is obtained, and the query is executed based on the left recursive optimized logical execution plan to obtain the query results corresponding to the query statement.

[0096] In this context, the left subplan corresponding to the transformed downstream semi-join operator refers to the subplan of the left branch corresponding to the transformed downstream semi-join operator. The left subplan can be represented by a tree structure and may include multiple operators connected sequentially. The left subplan may include the initial subplan, meaning that the initial subplan can be matched within the left subplan. In some embodiments, the left subplan may be directly the initial subplan, meaning that the left subplan includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator. The preset join type conversion condition refers to the pre-set condition for converting the join type of the downstream inner join operator to a semi-join. The left target subplan refers to the subplan obtained by converting the join type of the downstream inner join operator in the initial subplan included in the left subplan to a semi-join. The left recursively optimized logical execution plan refers to the subplan obtained after performing join type conversion on the initial subplan included in the left subplan of each join operator in the target logical execution plan.

[0097] In this embodiment, the preset join type conversion conditions may include that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and that the upstream aggregation operator is of a type that does not carry aggregation functions. When the management node matches the initial sub-plan in the left sub-plan corresponding to the converted downstream semi-join operator, it can continue the join type conversion check on the initial sub-plan matched in the left sub-plan. Specifically, if the initial sub-plan matched in the left sub-plan fails any one of the preset join type conversion conditions, no join type conversion is performed. If it meets all of the preset join type conversion conditions, that is, the output parameters of the left sub-operator corresponding to the downstream inner join operator in the initial sub-plan included in the left sub-plan include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the management node converts the join type of the downstream inner join operator in the initial sub-plan matched in the left sub-plan to a semi-join, obtaining the left target sub-plan including the converted downstream semi-join operator. Initial sub-plan matching then continues within the logical execution plan. If no further initial sub-plan is matched, a left-recursively optimized logical execution plan is obtained based on the target sub-plan and the left target sub-plan, and the query is executed based on this optimized plan to obtain the query results. In some embodiments, the management node continues matching initial sub-plans within the logical execution plan. For example, it can continue matching initial sub-plans within the sub-plan of the left branch of the downstream semi-join operator in the left target sub-plan until the logical execution plan has been fully traversed, resulting in the final left-recursively optimized logical execution plan. The query is then executed based on this final optimized plan to obtain the query results.

[0098] It can be understood that the preset join type conversion conditions can include that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, that the upstream aggregation operator is of a type that does not carry aggregation functions, and that the downstream inner join operator uses a non-broadcast join execution strategy. That is, if the management node determines that the initial sub-plan matched in the left sub-plan fails any one of the preset join type conversion conditions, it does not perform a join type conversion. If the initial sub-plan matched in the left sub-plan meets all of the preset join type conversion conditions, that is, the output parameters of the left sub-operator corresponding to the downstream inner join operator in the initial sub-plan included in the left sub-plan include the output parameters of the projection operator, the upstream aggregation operator is of a type that does not carry aggregation functions, and the downstream inner join operator uses a non-broadcast join execution strategy, then the management node converts the join type of the downstream inner join operator in the initial sub-plan matched in the left sub-plan to a semi-join, resulting in a left target sub-plan including the converted downstream semi-join operator.

[0099] In a specific embodiment, Figure 6 illustrates the structure in which the left subplan corresponding to the converted downstream semi-join operator includes the initial subplan. Specifically, the left branch of the Join node of the semi-join type (jointype=Semi) includes an Aggregate node, a Project node, and a Join node. The join type of this Join node is inner join (jointype=Inner). The Join node also has a LeftChild node and a RightChild node. The Aggregate node connects to the Project node, the Project node connects to the Join node, and the Join node connects to both its left and right child nodes.

[0100] In the above embodiments, when the initial subplan included in the left subplan meets the connection type conversion condition, the connection type of the downstream inner join operator in the left subplan is converted into a semi-join, resulting in a left target subplan including the converted downstream semi-join operator. Then, the query is executed according to the target subplan and the left target subplan. Thus, the computational amount of the join operation included in the left subplan can be further reduced through recursion, which can further reduce the execution time of data query and thus improve the efficiency of data query.

[0101] In some embodiments, the right sub-plan corresponding to the converted downstream semi-join operator includes the initial sub-plan;

[0102] Based on the target subplan, an optimized logical execution plan is obtained. Then, the query is executed based on this optimized logical execution plan to obtain the query results corresponding to the query statement, including:

[0103] If the initial subplan included in the right subplan meets the preset join type conversion conditions, the join type of the downstream inner join operator in the right subplan is converted to a semi-join, resulting in a right target subplan including the converted downstream semi-join operator. The preset join type conversion conditions include that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is a type that does not carry aggregation functions. Based on the target subplan and the right target subplan, a right recursive optimized logical execution plan is obtained, and the query is executed based on the right recursive optimized logical execution plan to obtain the query results corresponding to the query statement.

[0104] In this context, the right subplan corresponding to the transformed downstream semi-join operator refers to the subplan of the right branch corresponding to the transformed downstream semi-join operator. The right subplan can be represented by a tree structure and may include multiple operators connected sequentially. The right subplan may include the initial subplan, meaning that the initial subplan can be matched within the right subplan. In some embodiments, the right subplan may be directly the initial subplan, meaning that the right subplan includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator. The preset connection type conversion condition refers to the pre-set condition for converting the connection type of the downstream inner join operator to a semi-join. The right target subplan refers to the subplan obtained by converting the connection type of the downstream inner join operator in the initial subplan included in the right subplan to a semi-join. The right recursively optimized logical execution plan refers to the subplan obtained after converting the connection type of the initial subplan included in the right subplan corresponding to each join operator in the target logical execution plan.

[0105] In this embodiment, the preset join type conversion conditions may include that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and that the upstream aggregation operator is of a type that does not carry aggregation functions. When the management node matches the initial sub-plan in the right sub-plan corresponding to the converted downstream semi-join operator, it can continue the join type conversion check on the initial sub-plan matched in the right sub-plan. Specifically, if the initial sub-plan matched in the right sub-plan fails any one of the preset join type conversion conditions, no join type conversion is performed. If it meets all of the preset join type conversion conditions, that is, the output parameters of the left sub-operator corresponding to the downstream inner join operator in the initial sub-plan included in the right sub-plan include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, the management node converts the join type of the downstream inner join operator in the initial sub-plan matched in the right sub-plan to a semi-join, obtaining the right target sub-plan including the converted downstream semi-join operator. Initial sub-plan matching then continues within the logical execution plan. If no further initial sub-plan is matched, a right-recursively optimized logical execution plan is obtained based on the target sub-plan and the right target sub-plan, and the query is executed based on this optimized plan to obtain the query results. In some embodiments, the management node continues matching initial sub-plans within the logical execution plan. For example, it can continue matching initial sub-plans within the sub-plan of the right branch of the downstream semi-join operator in the right target sub-plan until the logical execution plan has been fully traversed, resulting in the final right-recursively optimized logical execution plan. The query is then executed based on this final optimized plan to obtain the query results.

[0106] Understandably, the preset connection type conversion conditions can include the output parameters of the left sub-operator corresponding to the downstream inner join operator including the output parameters of the projection operator, the upstream aggregation operator being a type that does not carry aggregation functions, and the downstream inner join operator being a non-broadcast join execution strategy. That is, if the management node determines that the initial sub-plan matched in the right sub-plan does not meet any of the preset connection type conversion conditions, it will not perform a connection type conversion. Then, when the initial sub-plan matched in the right sub-plan meets the preset connection type conversion conditions—that is, the output parameters of the left sub-operator corresponding to the downstream inner join operator in the initial sub-plan included in the right sub-plan include the output parameters of the projection operator, the upstream aggregation operator is a type that does not carry aggregation functions, and the downstream inner join operator is a non-broadcast join execution strategy—then the management node will convert the connection type of the downstream inner join operator in the initial sub-plan matched in the right sub-plan to a half-join, resulting in the right target sub-plan including the converted downstream half-join operator.

[0107] In a specific embodiment, Figure 7 illustrates a structure in which the right subplan includes the initial subplan. The right subplan corresponding to the transformed downstream semi-join operator includes the initial subplan. Specifically, the right branch of the Join node of the semi-join type (jointype=Semi) includes an Aggregate node, a Project node, and a Join node. The join type of this Join node is inner join (jointype=Inner), and the Join node has a LeftChild node and a RightChild node. The Aggregate node is connected to the Project node, the Project node is connected to the Join node, and the Join node is connected to both its left and right child nodes.

[0108] In the above embodiments, when the initial subplan included in the right subplan meets the connection type conversion condition, the connection type of the downstream inner join operator in the right subplan is converted into a semi-join, resulting in a right target subplan including the converted downstream semi-join operator. Then, the query is executed according to the target subplan and the right target subplan. Thus, the computational amount of the join operation included in the right subplan can be further reduced through recursion, which can further reduce the execution time of data query and thus improve the efficiency of data query.

[0109] In some embodiments, the left subplan corresponding to the transformed downstream semi-join operator includes the initial subplan, and the right subplan also includes the initial subplan. In this case, the management node can detect and determine the join type conversion for both the left and right subplans that include the initial subplan. If the left subplan including the initial subplan meets the preset join type conversion condition, and the right subplan also meets the preset join type conversion condition, the join type of the included downstream inner join operators is converted to a semi-join, resulting in the right target subplan and the left target subplan. Based on the right target subplan and the left target subplan, a recursively optimized logical execution plan is obtained, and the query is executed according to the recursively optimized logical execution plan to obtain the query results corresponding to the query statement. In other words, when both the left and right subplans of the semi-join operator include the initial subplan, the join type conversion can be detected and judged separately. If the join type conversion condition is met, the join type conversion is performed, and the recursion can continue until the logical execution plan is traversed, resulting in the final recursively optimized logical execution plan. The final recursively optimized logical execution plan is then executed, which can significantly reduce the computational load of join operations, significantly reduce the execution time of data queries, and greatly improve the efficiency of data queries.
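The recursion over the left and right subplans described above can be illustrated with a minimal Python sketch. The `Node` class, its field names, and the helper `pattern` are hypothetical and are not taken from this application; only the two basic conversion conditions are checked, and the broadcast strategy condition is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    kind: str                                   # "Aggregate", "Project", "Join" or "Scan"
    output: Set[str] = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)
    join_type: Optional[str] = None             # "Inner" or "Semi" for Join nodes
    group_only: bool = False                    # True when an Aggregate carries no aggregation functions


def rewrite(node: Node) -> Node:
    """Convert every qualifying Aggregate -> Project -> inner Join pattern in place."""
    if node.kind == "Aggregate" and node.group_only and node.children:
        project = node.children[0]
        if project.kind == "Project" and project.children:
            join = project.children[0]
            if (join.kind == "Join" and join.join_type == "Inner"
                    and project.output <= join.children[0].output):
                join.join_type = "Semi"
                # A semi-join only emits columns of its left child.
                join.output = set(join.children[0].output)
    # Recurse into every child, so both left and right subplans are visited.
    for child in node.children:
        rewrite(child)
    return node


def pattern(left: Node, right: Node, cols: Set[str]) -> Node:
    """Build a group-only Aggregate -> Project -> inner Join subplan (hypothetical helper)."""
    join = Node("Join", left.output | right.output, [left, right], join_type="Inner")
    project = Node("Project", set(cols), [join])
    return Node("Aggregate", set(cols), [project], group_only=True)


# The right branch of the outer join itself contains an initial sub-plan.
nested = pattern(Node("Scan", {"c"}), Node("Scan", {"d"}), {"c"})
outer = pattern(Node("Scan", {"a"}), nested, {"a"})
rewrite(outer)
```

After `rewrite(outer)` both the outer join and the join nested in its right branch carry `join_type == "Semi"`; an Aggregate whose `group_only` flag is false would leave its join untouched.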

[0110] In some embodiments, the data query method further includes the step of:

[0111] If the output parameters of the left sub-operator corresponding to the downstream inner join operator do not include the output parameters of the projection operator, the query is executed based on the target logical execution plan including the initial sub-plan, and the query result corresponding to the query statement is obtained.

[0112] In this embodiment, the management node detects that the output parameters of the left sub-operator corresponding to the downstream inner join operator do not include the output parameters of the projection operator. This occurs when the output parameters of the projection operator contain columns that are not present in the output parameters of the left sub-operator corresponding to the downstream inner join operator. This can be determined by column names; for example, the output parameters of the projection operator include column A, but the output parameters of the left sub-operator corresponding to the downstream inner join operator do not include column A. In other words, the outputSet of the Project operator is not a subset of the outputSet of the left child node of the Join operator. This indicates that the initial sub-plan does not meet the join type conversion condition. The management node then executes the query according to normal distributed database query logic. For example, it can use the target logical execution plan to generate the corresponding physical execution plan and execute the query according to the physical execution plan to obtain the query results corresponding to the query statement.
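The column-name comparison above amounts to a set subset test. A minimal Python sketch follows; the function names are hypothetical, and column names are assumed to be plain strings.

```python
from typing import Iterable, Set


def missing_from_left(project_output: Iterable[str], left_output: Iterable[str]) -> Set[str]:
    """Columns output by the Project operator that the Join's left child does not output."""
    return set(project_output) - set(left_output)


def projection_within_left(project_output: Iterable[str], left_output: Iterable[str]) -> bool:
    """True when the Project outputSet is a subset of the left child's outputSet."""
    return not missing_from_left(project_output, left_output)


# Column A is projected but the left child only outputs B and C: no conversion.
blocked = projection_within_left(["A"], ["B", "C"])
# Column A is projected and the left child outputs A and B: this condition is met.
allowed = projection_within_left(["A"], ["A", "B"])
```

Here `blocked` evaluates to `False` and `allowed` to `True`; `missing_from_left` additionally reports which columns caused the check to fail.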

[0113] In the above embodiments, when the output parameters of the left sub-operator corresponding to the downstream inner connection operator do not include the output parameters of the projection operator, there is no need to perform connection type conversion. The query is executed directly according to the normal distributed database query logic, thereby avoiding inaccurate query results and ensuring the accuracy of data query.

[0114] In some embodiments, the data query method further includes the step of:

[0115] If the upstream aggregation operator is of a type that carries aggregation functions, the query is executed based on the target logical execution plan including the initial sub-plan, to obtain the query results corresponding to the query statement.

[0116] In this embodiment, if the management node detects that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, it then checks whether the upstream aggregation operator is of a type carrying aggregation functions. Specifically, it checks whether the Aggregate operator is of the grouponly type. If it is not grouponly, the upstream aggregation operator carries aggregation functions and the join type must not be converted, because converting the join type from inner join to semi-join would compromise the accuracy of the query results. Therefore, the management node executes the query according to normal distributed database query logic. For example, it can use the target logical execution plan to generate the corresponding physical execution plan and execute the query according to the physical execution plan to obtain the query results corresponding to the query statement.

[0117] In some embodiments, if the management node detects that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator and the upstream aggregation operator is of a type that does not carry aggregation functions, then it detects the connection execution strategy information corresponding to the downstream inner join operator. If the connection execution strategy information is a broadcast connection execution strategy, the management node performs the query execution according to the normal distributed database query logic and obtains the query result.

[0118] In the above embodiments, when the upstream aggregation operator is of a type carrying aggregation functions, there is no need to perform a connection type conversion. The query is executed directly according to the normal distributed database query logic, thereby avoiding inaccurate query results and ensuring the accuracy of data query.
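Why an aggregation function blocks the conversion can be seen from a small worked example. The Python sketch below uses hypothetical join keys and models the joins with list comprehensions; COUNT is chosen only as one illustrative aggregation function.

```python
from collections import Counter

# Hypothetical join keys: key 1 matches two rows on the right side.
left_keys = [1, 2]
right_keys = [1, 1, 3]

# Inner join emits one row per matching pair; semi-join emits each left row at most once.
inner = [l for l in left_keys for r in right_keys if l == r]
semi = [l for l in left_keys if l in right_keys]

# A group-only aggregate (no aggregation functions) only keeps the distinct groups.
groups_after_inner = set(inner)
groups_after_semi = set(semi)

# An aggregate carrying COUNT(*) depends on how many rows each group contains.
count_after_inner = Counter(inner)
count_after_semi = Counter(semi)
```

The group-only results coincide (both are `{1}`), but the count for key 1 is 2 after the inner join and 1 after the semi-join, so the converted plan would return a different result.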

[0119] In some embodiments, before S202, i.e. before obtaining the target logical execution plan corresponding to the query statement, the following steps are also included:

[0120] The optimization configuration request is obtained, and the optimization configuration request carries the optimization enabling configuration parameters;

[0121] Respond to the optimization configuration request and save the optimization enabling configuration parameters;

[0122] Obtain the target logical execution plan corresponding to the query statement, including:

[0123] Based on the optimization enabling configuration parameters, the target logical execution plan corresponding to the query statement is obtained.

[0124] The optimization configuration request is used to enable optimization of the initial sub-plan. The optimization enabling configuration parameters are the configuration parameters that enable execution of the data query method in this application. Specifically, when the optimization enabling configuration parameters are detected, the initial sub-plan is matched within the initial logical execution plan, and, upon finding a matched initial sub-plan, it is determined whether it meets the preset connection type conversion conditions. If it does, the connection type is converted.

[0125] Specifically, the management node can receive an optimization configuration request sent by the management terminal, which carries optimization enabling configuration parameters. At this point, the management node can respond to the optimization configuration request and save the optimization enabling configuration parameters. Then, when parsing the query statement to obtain the initial logical execution plan, the management node matches the initial sub-plan within the initial logical execution plan based on these optimization enabling configuration parameters. Upon finding a matched initial sub-plan, it determines whether the initial sub-plan meets the preset connection type conversion conditions. If it does, it performs connection type conversion to obtain the optimized logical execution plan, and then executes the query according to the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0126] In some embodiments, the management node can also receive an optimization configuration disable request. In response, the management node deletes the saved optimization enabling configuration parameters. From then on, when the management node obtains the initial logical execution plan, it does not detect any optimization enabling configuration parameters, and execution proceeds according to the normal distributed database query logic.

[0127] In one specific embodiment, when using this application in the Spark distributed computing engine, a pluggable configuration rule for converting join type in the logical plan layer can be provided. When a user needs to use the optimizations of this application, they can insert the corresponding configuration rule into the Spark Conf (Spark configuration class) to indicate that the optimization configuration of this application is enabled. When the optimizations of this application are no longer needed, the corresponding configuration rule can be deleted.

[0128] In the above embodiments, an optimization configuration request carrying optimization enabling configuration parameters is obtained, and the optimization enabling configuration parameters are saved in response to the request. Only when the optimization enabling configuration parameters are detected is the target logical execution plan including the initial sub-plan obtained and the subsequent connection type conversion conditions checked. When the optimization enabling configuration parameters are not detected, no connection type conversion check is needed. That is, enabling and disabling logical execution plan optimization through configuration parameters improves the flexibility of logical execution plan optimization.
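The enable and disable behaviour can be sketched in Python as follows. The class `ManagementNode`, its method names, and the parameter key `optimizer.innerToSemiJoin.enabled` are all hypothetical; this application does not name a concrete configuration key, and plans are represented as plain strings purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical parameter name; not taken from this application or from Spark.
OPTIMIZATION_KEY = "optimizer.innerToSemiJoin.enabled"


class ManagementNode:
    def __init__(self) -> None:
        self._saved_params: Dict[str, bool] = {}

    def handle_enable_request(self, params: Dict[str, bool]) -> None:
        """Save the optimization enabling configuration parameters carried by the request."""
        self._saved_params.update(params)

    def handle_disable_request(self) -> None:
        """Delete the saved optimization enabling configuration parameters."""
        self._saved_params.pop(OPTIMIZATION_KEY, None)

    def plan(self, initial_plan: str, optimize: Callable[[str], str]) -> str:
        """Apply the join type conversion rule only while the optimization is enabled."""
        if self._saved_params.get(OPTIMIZATION_KEY):
            return optimize(initial_plan)
        return initial_plan


def to_semi(plan: str) -> str:
    # Stand-in for the real rule: rewrites the join type in a textual plan.
    return plan.replace("Inner", "Semi")


node = ManagementNode()
before = node.plan("Join(Inner)", to_semi)
node.handle_enable_request({OPTIMIZATION_KEY: True})
enabled = node.plan("Join(Inner)", to_semi)
node.handle_disable_request()
disabled = node.plan("Join(Inner)", to_semi)
```

Only `enabled` contains the converted join type; `before` and `disabled` keep the initial plan unchanged.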

[0129] In a specific embodiment, Figure 8 illustrates the data query method, which is executed by a management node in a distributed server cluster. This management node can be a computer device, which can be a server or a terminal, preferably a server. The method includes the following steps:

[0130] S802, obtain the initial logical execution plan corresponding to the query statement, and match the initial sub-plan within the initial logical execution plan. The initial sub-plan includes the upstream aggregation operator, the projection operator connected to the upstream aggregation operator, and the downstream inner join operator connected to the projection operator.

[0131] S804, if an upstream aggregation operator is found in the initial logical execution plan, the first operator information pointed to by the pointer corresponding to the upstream aggregation operator is obtained. If the first operator information includes a projection operator, the second operator information pointed to by the pointer corresponding to the projection operator is obtained. If the second operator information includes a downstream inner join operator, the initial sub-plan is matched in the initial logical execution plan.

[0132] S806, if an initial sub-plan is matched in the initial logical execution plan, the initial logical execution plan is used as the target logical execution plan for the query statement. If no initial sub-plan is matched in the initial logical execution plan, the query is executed based on the initial logical execution plan to obtain the query result corresponding to the query statement.

[0133] S808, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, then obtain the data statistics information of the left sub-operator corresponding to the downstream inner join operator.

[0134] S810, if the data statistics do not meet the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is a non-broadcast connection execution strategy; if the data statistics meet the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is a broadcast connection execution strategy.

[0135] S812, if the connection execution strategy information is a non-broadcast connection execution strategy, the connection type of the downstream inner join operator is converted to a semi-join, resulting in a target sub-plan that includes the converted downstream semi-join operator. The left sub-plan corresponding to the converted downstream semi-join operator includes the initial sub-plan, and the right sub-plan corresponding to the converted downstream semi-join operator also includes the initial sub-plan.

[0136] S814, if the output parameters of the left sub-operator corresponding to the downstream inner join operator in the left sub-plan include the output parameters of the projection operator in the left sub-plan, the upstream aggregation operator in the left sub-plan is of a type that does not carry aggregation functions, and the connection execution strategy information of the downstream inner join operator in the left sub-plan is a non-broadcast connection execution strategy, then the connection type of the downstream inner join operator in the left sub-plan is converted to a semi-join, and a left target sub-plan including the converted downstream semi-join operator is obtained.

[0137] S816, if the output parameters of the left sub-operator corresponding to the downstream inner join operator in the right sub-plan include the output parameters of the projection operator in the right sub-plan, the upstream aggregation operator in the right sub-plan is of a type that does not carry aggregation functions, and the connection execution strategy information of the downstream inner join operator in the right sub-plan is a non-broadcast connection execution strategy, then the connection type of the downstream inner join operator in the right sub-plan is converted to a semi-join, resulting in a right target sub-plan that includes the converted downstream semi-join operator.

[0138] S818, obtain a recursively optimized logical execution plan based on the target subplan, the left target subplan, and the right target subplan, and perform query execution based on the recursively optimized logical execution plan to obtain the query results corresponding to the query statement.

[0139] In the above embodiments, by matching the initial sub-plan in the initial logical execution plan, when an initial sub-plan exists in the initial logical execution plan, the join type conversion condition is checked. When the join type conversion condition is met, the join type of the inner join operator is converted to a semi-join, resulting in the target sub-plan. Then, the inner join operators in all initial sub-plans that meet the join type conversion condition in the initial logical execution plan are converted to obtain the optimized logical execution plan. The query is then executed based on the optimized logical execution plan to obtain the query results corresponding to the query statement. Specifically, by converting the join type of the join operator from inner join to semi-join, i.e., optimizing the join operator to an efficient join type, the amount of computation and results involved in the join operation can be significantly reduced, thereby saving computational resources, reducing data query execution time, and improving data query efficiency.
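The pointer-following match of step S804 can be sketched in Python as follows. Operators are modelled as plain dictionaries with hypothetical keys `kind`, `child` and `join_type`; a real engine would use its own plan node classes.

```python
from typing import Optional


def match_initial_subplan(op: dict) -> Optional[dict]:
    """Return the downstream inner Join if `op` heads an Aggregate -> Project -> Join chain."""
    if op.get("kind") != "Aggregate":
        return None
    first = op.get("child")              # first operator information
    if not first or first.get("kind") != "Project":
        return None
    second = first.get("child")          # second operator information
    if not second or second.get("kind") != "Join" or second.get("join_type") != "Inner":
        return None
    return second


join = {"kind": "Join", "join_type": "Inner"}
matching_plan = {"kind": "Aggregate", "child": {"kind": "Project", "child": join}}
other_plan = {"kind": "Aggregate", "child": {"kind": "Filter", "child": join}}

matched = match_initial_subplan(matching_plan)
unmatched = match_initial_subplan(other_plan)
```

`matched` is the inner Join node, so the plan would be used as the target logical execution plan (S806); `unmatched` is `None`, so the query would run on the initial logical execution plan unchanged.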

[0140] In a specific embodiment, the technical implementation principle of this application is explained. The data query method of this application is applied at the logical plan optimization layer. It can be used in a distributed computing data warehouse, such as an MPP (Massively Parallel Processing) architecture data warehouse, or in a traditional data warehouse, such as a relational database. Figure 9 illustrates the architecture of a data query, outlining the entire process from the start to the end of the query. Specifically, a rule optimization that performs join type conversion is inserted at the logical plan optimization layer. Figure 10 illustrates the technical principle of the data query. This application can be applied in Spark. When a query statement contains both Join and Aggregate operators, and the Aggregate operator is located upstream of the Join operator in the logical plan, the type of the Join operator can be optimized into a more efficient join type by also taking the Project operator into account. Specifically, when the logical execution plan of the query statement matches a sub-plan in which the pointer of the Aggregate operator points to a Project operator and the pointer of the Project operator points to a Join operator, a join type conversion check is performed. That is, a join type conversion is performed when the following three conditions are met:

[0141] 1. The output of the projection operator is a subset of the output of the left child operator of the join operator. That is, the outputSet of the Project operator is a subset of the outputSet of the left child node of the Join operator.

[0142] 2. The aggregation operator above the join operator is of a type that does not carry aggregation functions. That is, the Aggregate operator upstream of the Join operator is of type grouponly, meaning it has no aggregation functions. If it is not of type grouponly, the join cannot be converted, because the conversion would affect the accuracy of the result.

[0143] 3. The left child operator of the join operator cannot be the build side of the broadcast join strategy. That is, the left child node of the Join cannot be the build side used to generate the broadcast join. If the left child node of the join operator has to be the build side, converting the inner join operator to a semi-join operator means the broadcast join can no longer be generated, and the resulting performance loss outweighs the benefit of the conversion.

[0144] When the above three conditions are met, the join type of the join operator in the initial sub-plan can be changed from inner join to semi join and then executed. That is, the type of the downstream join operator is optimized based on the upstream aggregation operation and projection, thereby achieving performance optimization. Moreover, only the type of the join operator is modified, without adding any operators. The conversion process is very lightweight, avoiding the need for excessive resources in the optimization process.
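The effect of the conversion when the conditions hold can be illustrated on a small data set. The Python sketch below uses hypothetical tables `left(item, price)` and `right(item, store)` and models the joins with list comprehensions; it is not the engine's implementation.

```python
# Hypothetical tables: left(item, price) and right(item, store).
left = [("apple", 3), ("pear", 5), ("plum", 7)]
right = [("apple", "s1"), ("apple", "s2"), ("apple", "s3"), ("plum", "s1")]

# Inner join on item: one output row per matching (left, right) pair.
inner_rows = [l + r for l in left for r in right if l[0] == r[0]]

# Semi-join on item: each left row is emitted at most once, with left columns only.
right_items = {r[0] for r in right}
semi_rows = [l for l in left if l[0] in right_items]


def project_and_group(rows):
    """Project (item, price), then apply a group-only aggregate (distinct groups)."""
    return {(row[0], row[1]) for row in rows}


result_inner = project_and_group(inner_rows)
result_semi = project_and_group(semi_rows)
```

Both plans yield the same final groups, while the join emits 4 rows as an inner join and only 2 rows as a semi-join, which is the source of the reduced computation.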

[0145] This application was then tested. In the 1 TB TPC-DS benchmark (a benchmark for measuring data warehouse performance), query Q82 achieved a 17% performance improvement. The test results are shown in Table 1 below:

[0146] Table 1 Comparison of Test Results

[0147]

[0148] It is evident that native Spark takes 27 seconds to execute the query, whereas the optimized Spark described in this application takes only 23 seconds, a 1.17x improvement. A comparative analysis before and after optimization was then conducted. Figure 11 is a schematic of the analysis before optimization, where the number of output rows of the join operator is 8,902,621. Figure 12 is a schematic of the analysis after optimization, where the number of output rows of the join operator is 885. The amount of data output by the join operator in this application is thus reduced by 4 orders of magnitude, and the execution time of the operators after the join operator is significantly reduced, greatly improving the execution performance of the entire data query job.
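The figures quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in this paragraph; the variable names are illustrative.

```python
import math

native_seconds = 27
optimized_seconds = 23

# Speedup as a ratio of execution times: 27 / 23 is about 1.17.
speedup = native_seconds / optimized_seconds

# The same result expressed as the share of execution time saved: 4 / 27 is about 14.8%.
time_saved_fraction = (native_seconds - optimized_seconds) / native_seconds

rows_before = 8_902_621
rows_after = 885

# Reduction factor of the join output and its order of magnitude.
row_reduction = rows_before / rows_after
orders_of_magnitude = math.log10(row_reduction)
```

The speedup rounds to 1.17, which corresponds to the stated 17% improvement, and the join output shrinks by a factor of roughly ten thousand, i.e. about 4 orders of magnitude.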

[0149] This application significantly reduces the amount of data output by the join operator and therefore the amount of data that subsequent operators must process. This lowers the computational load as well as the network bandwidth and disk throughput requirements, which improves the computational performance of the distributed computing engine, shortens the execution time of data query jobs, and ultimately reduces cost and improves efficiency.

[0150] In a specific embodiment, this data query method is applied in the process of big data querying. Specifically, in a distributed server cluster, data needed for business operations can be queried from distributed stored big data. That is, the management node of the distributed server cluster can obtain a business data query request from the business query terminal, which carries a query statement. The management node then parses the query statement to obtain an initial logical execution plan, and then matches an initial sub-plan within that plan. During the matching of the initial sub-plan, a join type conversion condition is determined. If the join type conversion condition is met, for example, if all three conditions for join type conversion are met, the management node converts the join operator type in the initial sub-plan from an inner join to a semi-join, obtaining the target sub-plan. The query execution continues according to the logical execution plan including the target sub-plan, finally obtaining the business data to be queried. The management node then returns the queried business data to the business query terminal for display. In other words, this application can improve the efficiency of users in retrieving business data.

[0151] In a specific embodiment, this data query method can be applied in real-time data processing scenarios, such as in a search platform, where the server can be a management node in a distributed cluster. Specifically, when a user searches for desired content through the search platform, the user submits a search query. Upon receiving the search query, the search platform server generates a query statement. The server parses the query statement to obtain an initial logical execution plan and then matches an initial sub-plan within that plan. When an initial sub-plan is matched, the join type conversion conditions are checked. If the conditions are met (e.g., all three conditions are met), the management node converts the join operator type in the initial sub-plan from an inner join to a semi-join, obtaining the target sub-plan. The query execution continues according to the logical execution plan including the target sub-plan, ultimately yielding the desired content. The management node then returns the retrieved content to the user's terminal for display. This application improves the efficiency of content retrieval for users.

[0152] In a specific embodiment, this data query method can be applied to financial scenarios, such as the query process of financial transaction data on a financial trading platform. Specifically, the financial trading platform receives a user's request to query financial transaction data, parses the request to obtain the query statement, and then parses the query statement to obtain an initial logical execution plan. An initial sub-plan is then matched within the initial logical execution plan. When an initial sub-plan is matched, the join type conversion conditions are checked. If the conditions are met, for example, if all three conditions for join type conversion are met, the management node converts the join operator type in the initial sub-plan from inner join to semi-join, obtaining the target sub-plan. The query execution continues according to the logical execution plan including the target sub-plan, finally obtaining the business data to be queried. The management node then returns the queried business data to the business query terminal for display. In other words, this application improves the efficiency with which users retrieve financial transaction data while ensuring the accuracy of the queried data.

[0153] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0154] Based on the same inventive concept, this application also provides a data query apparatus for implementing the data query method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations in one or more data query apparatus embodiments provided below can be found in the limitations of the data query method described above, and will not be repeated here.

[0155] In one embodiment, as shown in Figure 13, a data query device 1300 is provided, including: an execution plan acquisition module 1302, a type conversion module 1304, and a query execution module 1306, wherein:

[0156] The execution plan acquisition module 1302 is used to acquire the target logical execution plan corresponding to the query statement. The target logical execution plan includes an initial sub-plan, which includes an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator.

[0157] The type conversion module 1304 is used to convert the connection type of the downstream inner connection operator to a semi-connection if the output parameters of the left sub-operator corresponding to the downstream inner connection operator include the output parameters of the projection operator and the upstream aggregation operator is a type that does not carry an aggregation function, so as to obtain the target sub-plan that includes the converted downstream semi-connection operator.

[0158] The query execution module 1306 is used to obtain an optimized logical execution plan based on the target sub-plan, and to perform query execution based on the optimized logical execution plan to obtain the query results corresponding to the query statement.

[0159] In one embodiment, the type conversion module 1304 is further configured to: if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is a type that does not carry an aggregation function, then obtain the connection execution strategy information corresponding to the downstream inner join operator; if the connection execution strategy information is a non-broadcast connection execution strategy, then convert the connection type of the downstream inner join operator to a semi-join, and obtain the target sub-plan including the converted downstream semi-join operator.

[0160] In one embodiment, the type conversion module 1304 is further configured to obtain data statistics information of the left sub-operator corresponding to the downstream inner connection operator; if the data statistics information does not meet the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is a non-broadcast connection execution strategy; if the data statistics information meets the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is a broadcast connection execution strategy.
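How the connection execution strategy might be derived from data statistics can be sketched in Python. This application does not specify the statistical condition; the sketch assumes it is a size threshold on the left sub-operator, and the 10 MiB value is chosen only for illustration (it happens to equal the default of Spark's `spark.sql.autoBroadcastJoinThreshold`). The function names are hypothetical.

```python
# Assumption: the broadcast condition is "estimated size at or below a threshold".
ASSUMED_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def join_execution_strategy(left_size_in_bytes: int,
                            threshold: int = ASSUMED_BROADCAST_THRESHOLD_BYTES) -> str:
    """Classify the join as broadcast or non-broadcast from the left child's statistics."""
    if left_size_in_bytes <= threshold:
        return "broadcast"
    return "non-broadcast"


def may_convert_to_semi_join(left_size_in_bytes: int) -> bool:
    """The conversion is only considered for non-broadcast strategies."""
    return join_execution_strategy(left_size_in_bytes) == "non-broadcast"


small_left = join_execution_strategy(1024)            # 1 KiB left side
large_left = join_execution_strategy(50 * 1024 ** 3)  # 50 GiB left side
```

With these assumed values a 1 KiB left side is classified as broadcast and is left unconverted, while a 50 GiB left side is classified as non-broadcast and remains a candidate for conversion.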

[0161] In one embodiment, the data query device 1300 further includes:

[0162] The first target execution module is used to perform query execution based on the target logical execution plan including the initial sub-plan if the connection execution strategy information corresponding to the downstream inner join operator is a broadcast connection execution strategy, and obtain the query result corresponding to the query statement.

[0163] In one embodiment, the execution plan acquisition module 1302 is further configured to acquire the initial logical execution plan corresponding to the query statement, and match the initial sub-plan within the initial logical execution plan; if the initial sub-plan is matched in the initial logical execution plan, the initial logical execution plan is used as the target logical execution plan corresponding to the query statement; if the initial sub-plan is not matched in the initial logical execution plan, the query is executed based on the initial logical execution plan to obtain the query result corresponding to the query statement.

[0164] In one embodiment, the execution plan acquisition module 1302 is further configured to: if an upstream aggregation operator is found in the initial logical execution plan, acquire the first operator information pointed to by the pointer corresponding to the upstream aggregation operator; if the first operator information includes a projection operator, acquire the second operator information pointed to by the pointer corresponding to the projection operator; if the second operator information includes a downstream inner join operator, match the initial sub-plan in the initial logical execution plan.

[0165] In one embodiment, the left subplan corresponding to the transformed downstream semi-join operator includes the initial subplan; the query execution module 1306 is further configured to convert the connection type of the downstream inner join operator in the left subplan to a semi-join if the output parameters of the left sub-operator corresponding to the downstream inner join operator in the left subplan include the output parameters of the projection operator in the left subplan, and the upstream aggregation operator in the left subplan is a type that does not carry aggregation functions, thereby obtaining a left target subplan that includes the transformed downstream semi-join operator; based on the target subplan and the left target subplan, a left recursive optimized logical execution plan is obtained, and the query is executed based on the left recursive optimized logical execution plan to obtain the query result corresponding to the query statement.

[0166] In one embodiment, the right subplan corresponding to the transformed downstream semi-join operator includes the initial subplan; the query execution module 1306 is further configured to convert the connection type of the downstream inner join operator in the right subplan to a semi-join if the output parameters of the left sub-operator corresponding to the downstream inner join operator in the right subplan include the output parameters of the projection operator in the right subplan, and the upstream aggregation operator in the right subplan is a type that does not carry aggregation functions, thereby obtaining a right target subplan that includes the transformed downstream semi-join operator; based on the target subplan and the right target subplan, a right recursive optimized logical execution plan is obtained, and the query is executed based on the right recursive optimized logical execution plan to obtain the query result corresponding to the query statement.
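As a minimal sketch of the left and right recursive optimization described in the two embodiments above, the conversion check can be applied to every operator of the plan, so that initial sub-plans nested in either child of an already converted join are also rewritten. The `Op` class and all field and function names are hypothetical; the sketch further assumes that the projection outputs are plain column references and it does not recompute the output columns of a converted join.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical logical-plan node used only for this illustration."""
    kind: str                                        # "aggregate", "project", "join", "scan"
    output: List[str] = field(default_factory=list)  # output parameters (column names)
    children: List["Op"] = field(default_factory=list)
    join_type: Optional[str] = None
    agg_functions: List[str] = field(default_factory=list)


def try_convert(op: Op) -> bool:
    """Convert the inner join below `op` to a left semi-join when both
    conversion conditions hold; return True if a conversion happened."""
    if op.kind != "aggregate" or op.agg_functions or len(op.children) != 1:
        return False                                 # aggregate must carry no aggregation function
    project = op.children[0]
    if project.kind != "project" or len(project.children) != 1:
        return False
    join = project.children[0]
    if join.kind != "join" or join.join_type != "inner" or len(join.children) != 2:
        return False
    left = join.children[0]
    if not set(project.output) <= set(left.output):
        return False                                 # projection needs columns outside the left child
    join.join_type = "left_semi"
    return True


def optimize(op: Op) -> int:
    """Walk the whole plan, left and right children included, and
    return the number of joins that were converted."""
    converted = 1 if try_convert(op) else 0
    for child in op.children:
        converted += optimize(child)
    return converted
```

Because `optimize` descends into both children of every operator, the same function covers the left recursive and the right recursive case; a real optimizer would more likely express this as a rule applied by its own plan traversal.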

[0167] In one embodiment, the data query device 1300 further includes:

[0168] The second target execution module is used to execute the query based on the target logical execution plan including the initial sub-plan if the output parameters of the left sub-operator corresponding to the downstream inner join operator do not include the output parameters of the projection operator, and to obtain the query result corresponding to the query statement.

[0169] In one embodiment, the data query device 1300 further includes:

[0170] The third target execution module is used to execute the query based on the target logical execution plan including the initial sub-plan if the upstream aggregation operator is of the type carrying aggregation functions, and to obtain the query result corresponding to the query statement.
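The reason the two fallback cases above keep the inner join can be seen in a small worked example over in-memory rows. The table contents and helper names below are hypothetical. With a de-duplicating aggregation and a projection that uses only left-side columns, the inner join and the semi-join yield the same result; once an aggregation function such as a count is involved, the duplicate rows produced by the inner join change the answer, so the conversion would no longer be safe.

```python
from collections import Counter

# Hypothetical data: the right side has a duplicate key (k = 1).
left = [{"k": 1, "name": "a"}, {"k": 2, "name": "b"}, {"k": 3, "name": "c"}]
right = [{"k": 1}, {"k": 1}, {"k": 2}]


def inner_join(left_rows, right_rows):
    """One output pair per matching (left, right) combination."""
    return [(l, r) for l in left_rows for r in right_rows if l["k"] == r["k"]]


def semi_join(left_rows, right_rows):
    """Each left row at most once, if it has any match on the right."""
    right_keys = {r["k"] for r in right_rows}
    return [l for l in left_rows if l["k"] in right_keys]


# Projection that uses only a left-side column.
inner_names = [l["name"] for l, _ in inner_join(left, right)]
semi_names = [l["name"] for l in semi_join(left, right)]

# Aggregation without an aggregation function (de-duplication): same result.
distinct_inner = sorted(set(inner_names))
distinct_semi = sorted(set(semi_names))

# Aggregation with an aggregation function (count per group): results differ.
count_inner = Counter(inner_names)
count_semi = Counter(semi_names)
```

A projection that needed a right-side column could not be evaluated at all after the conversion, since a semi-join outputs only the columns of its left input, which corresponds to the first fallback case.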

[0171] In one embodiment, the data query device 1300 further includes:

[0172] The configuration enabling module is used to obtain an optimization configuration request that carries optimization enabling configuration parameters, and to save the optimization enabling configuration parameters in response to the optimization configuration request.

[0173] The execution plan acquisition module 1302 is also used to acquire the target logical execution plan corresponding to the query statement based on the optimization enabling configuration parameters.
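One possible way to gate the optimization on a saved configuration parameter is sketched below. The class name, the parameter key `semi_join_rewrite_enabled`, and the request format are assumptions made for illustration and are not specified in this application.

```python
from typing import Any, Callable, Dict, Optional


class OptimizationConfigStore:
    """Hypothetical store for the optimization enabling configuration parameters."""

    KEY = "semi_join_rewrite_enabled"   # assumed parameter name

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def handle_request(self, request: Dict[str, Any]) -> None:
        """Save the enabling parameter carried by a configuration request."""
        self._params[self.KEY] = bool(request.get(self.KEY, False))

    def is_enabled(self) -> bool:
        return bool(self._params.get(self.KEY, False))


def acquire_target_plan(
    initial_plan: Any,
    store: OptimizationConfigStore,
    matches: Callable[[Any], bool],
) -> Optional[Any]:
    """Return the plan as the target plan only when the optimization is
    enabled and the initial sub-plan is matched; otherwise return None."""
    if not store.is_enabled():
        return None
    return initial_plan if matches(initial_plan) else None
```

In this sketch, a `None` result means the caller executes the initial logical execution plan unchanged.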

[0174] Each module in the aforementioned data query device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0175] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 14. The computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores data such as connection type conversion conditions and query results corresponding to query statements. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When the computer program is executed by the processor, it implements a data query method.

[0176] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 15. The computer device includes a processor, memory, input/output interfaces, a communication interface, a display unit, and an input device. The processor, memory, and input/output interfaces are connected via a system bus, and the communication interface, display unit, and input device are also connected to the system bus via the input/output interfaces. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage media. The input/output interfaces are used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a data query method. The display unit of the computer device is used to form a visible image. It can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen; a button, trackball, or touchpad provided on the casing of the computer device; or an external keyboard, touchpad, or mouse.

[0177] Those skilled in the art will understand that the structure shown in Figure 14 or Figure 15 is merely a block diagram of the part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0178] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0179] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0180] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0181] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0182] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc.

[0183] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0184] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A data query method, characterized in that the method comprises: obtaining a target logical execution plan corresponding to a query statement, the target logical execution plan including an initial sub-plan, and the initial sub-plan including an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator; if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, converting the connection type of the downstream inner join operator to a semi-join to obtain a target sub-plan that includes the converted downstream semi-join operator; and obtaining an optimized logical execution plan based on the target sub-plan, and executing the query based on the optimized logical execution plan to obtain the query result corresponding to the query statement.

2. The method according to claim 1, characterized in that the step of converting the connection type of the downstream inner join operator to a semi-join, if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator and the upstream aggregation operator is of a type that does not carry aggregation functions, to obtain a target sub-plan that includes the converted downstream semi-join operator, includes: if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and the upstream aggregation operator is of a type that does not carry aggregation functions, obtaining the connection execution strategy information corresponding to the downstream inner join operator; and if the connection execution strategy information is a non-broadcast connection execution strategy, converting the connection type of the downstream inner join operator to a semi-join to obtain a target sub-plan that includes the converted downstream semi-join operator.

3. The method according to claim 2, characterized in that the step of obtaining the connection execution strategy information corresponding to the downstream inner join operator includes: obtaining the data statistics of the left sub-operator corresponding to the downstream inner join operator; if the data statistics do not meet the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is a non-broadcast connection execution strategy; and if the data statistics meet the statistical information conditions of the preset broadcast connection execution strategy, the obtained connection execution strategy information is the broadcast connection execution strategy.

4. The method according to claim 2, characterized in that the method further includes: if the connection execution strategy information corresponding to the downstream inner join operator is a broadcast connection execution strategy, executing the query based on the target logical execution plan including the initial sub-plan to obtain the query result corresponding to the query statement.

5. The method according to claim 1, characterized in that the step of obtaining the target logical execution plan corresponding to the query statement, wherein the target logical execution plan includes an initial sub-plan, the initial sub-plan including an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator, includes: obtaining the initial logical execution plan corresponding to the query statement, and searching the initial logical execution plan for the initial sub-plan; if the initial sub-plan is matched in the initial logical execution plan, using the initial logical execution plan as the target logical execution plan corresponding to the query statement; and if the initial sub-plan is not matched in the initial logical execution plan, executing the query based on the initial logical execution plan to obtain the query result corresponding to the query statement.

6. The method according to claim 5, characterized in that the step of searching the initial logical execution plan for the initial sub-plan includes: if the upstream aggregation operator is found in the initial logical execution plan, obtaining the first operator information pointed to by the pointer corresponding to the upstream aggregation operator; if the first operator information includes the projection operator, obtaining the second operator information pointed to by the pointer corresponding to the projection operator; and if the second operator information includes the downstream inner join operator, determining that the initial sub-plan is matched in the initial logical execution plan.

7. The method according to claim 1, characterized in that the left sub-plan corresponding to the converted downstream semi-join operator includes the initial sub-plan; and the step of obtaining an optimized logical execution plan based on the target sub-plan, and executing the query based on the optimized logical execution plan to obtain the query result corresponding to the query statement, includes: if the initial sub-plan included in the left sub-plan meets the preset connection type conversion condition, converting the connection type of the downstream inner join operator in the left sub-plan to a semi-join to obtain a left target sub-plan including the converted downstream semi-join operator, wherein the preset connection type conversion condition includes that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and that the upstream aggregation operator is of a type that does not carry aggregation functions; and obtaining a left recursive optimized logical execution plan based on the target sub-plan and the left target sub-plan, and executing the query based on the left recursive optimized logical execution plan to obtain the query result corresponding to the query statement.

8. The method according to claim 1, characterized in that the right sub-plan corresponding to the converted downstream semi-join operator includes the initial sub-plan; and the step of obtaining an optimized logical execution plan based on the target sub-plan, and executing the query based on the optimized logical execution plan to obtain the query result corresponding to the query statement, includes: if the initial sub-plan included in the right sub-plan meets the preset connection type conversion condition, converting the connection type of the downstream inner join operator in the right sub-plan to a semi-join to obtain a right target sub-plan including the converted downstream semi-join operator, wherein the preset connection type conversion condition includes that the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator, and that the upstream aggregation operator is of a type that does not carry aggregation functions; and obtaining a right recursive optimized logical execution plan based on the target sub-plan and the right target sub-plan, and executing the query based on the right recursive optimized logical execution plan to obtain the query result corresponding to the query statement.

9. The method according to claim 1, characterized in that the method further includes: if the output parameters of the left sub-operator corresponding to the downstream inner join operator do not include the output parameters of the projection operator, executing the query based on the target logical execution plan including the initial sub-plan to obtain the query result corresponding to the query statement.

10. The method according to claim 1, characterized in that the method further includes: if the upstream aggregation operator is of the type carrying aggregation functions, executing the query based on the target logical execution plan including the initial sub-plan to obtain the query result corresponding to the query statement.

11. The method according to claim 1, characterized in that, before obtaining the target logical execution plan corresponding to the query statement, the method further includes: obtaining an optimization configuration request that carries optimization enabling configuration parameters; and saving the optimization enabling configuration parameters in response to the optimization configuration request; and the step of obtaining the target logical execution plan corresponding to the query statement includes: obtaining the target logical execution plan corresponding to the query statement based on the optimization enabling configuration parameters.

12. A data query device, characterized in that the device includes: an execution plan acquisition module, used to acquire the target logical execution plan corresponding to the query statement, the target logical execution plan including an initial sub-plan, and the initial sub-plan including an upstream aggregation operator, a projection operator connected to the upstream aggregation operator, and a downstream inner join operator connected to the projection operator; a type conversion module, used to convert the connection type of the downstream inner join operator to a semi-join if the output parameters of the left sub-operator corresponding to the downstream inner join operator include the output parameters of the projection operator and the upstream aggregation operator is of a type that does not carry aggregation functions, so as to obtain a target sub-plan that includes the converted downstream semi-join operator; and a query execution module, used to obtain an optimized logical execution plan based on the target sub-plan, and to execute the query based on the optimized logical execution plan to obtain the query result corresponding to the query statement.

13. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 11.

14. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 11.

15. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 11.