Optimization methods, storage media, and devices for database selectivity calculation
By rationally calculating the selectivity in database operation statements, the problem of inaccurate selectivity calculation in existing technologies is solved, the accuracy of selectivity calculation by the optimizer is improved, and thus the execution efficiency of the database is increased.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CETC JINCANG (BEIJING) TECH CO LTD
- Filing Date
- 2023-05-30
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, databases cannot accurately calculate the selectivity before executing operation statements, resulting in inaccurate cardinality estimation, which affects the optimizer's generation of the optimal execution plan and thus impacts query performance.
By extracting the first constraint clause of the AND predicate, merging columns with functional dependencies into a target column set, calculating the selectivity of each target column, and combining it with a preset formula to calculate the total selectivity, the accuracy of selectivity calculation is improved by taking into account data correlation.
It improves the accuracy of optimizer selectivity calculation, thereby enhancing database execution efficiency and performance.
Smart Images

Figure CN116595044B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to database technology, and in particular to an optimization method, storage medium, and device for database selectivity calculation. Background Technology
[0002] Before executing database operation statements, the optimizer is typically invoked to generate an optimal execution plan based on these statements. The optimizer performs cardinality estimation on the database operation statements, and the efficiency and accuracy of this estimation directly affect its ability to quickly and accurately generate the optimal execution plan. Cardinality estimation is crucial for ensuring the accuracy of the cost model and finding a good join order. Without proper cardinality estimation, query performance may be slow. A common approach to cardinality estimation is to assume complete independence, ignoring the correlation between columns. However, since data in a query inevitably exhibits some degree of correlation, a method for reasonably calculating the selectivity is needed. Summary of the Invention
[0003] One object of this invention is to calculate the selection rate reasonably.
[0004] A further objective of this invention is to improve the accuracy of optimizer selectivity calculation.
[0005] Specifically, the present invention provides an optimization method for database selectivity calculation, comprising:
[0006] Extract the first constraint clause connected by the predicate AND;
[0007] The columns with functional dependencies in the first constraint clause are merged into the first target column set according to the functional dependencies, and the first selectivity of each merged target column in the first target column set is calculated one by one;
[0008] Calculate the second selectivity of target columns outside the first target column set in the first constraint clause;
[0009] Sort the first selection rate and the second selection rate, select the selection rate of the preset number of target columns according to preset rules, and substitute them into the preset formula to calculate the total selection rate.
[0010] Optionally, the step of extracting the constraint clauses connected by the predicate AND includes the following:
[0011] Retrieve database operation statements;
[0012] Determine if a preset statement exists in the database operation statement. Preset statements are used to enable optimization methods.
[0013] If no preset statement exists, the optimization method ends.
[0014] Optionally, after determining whether a preset statement exists in the database operation statement, the following steps are included:
[0015] If a preset statement exists, determine whether the database operation statement contains a second constraint clause connected by a predicate OR;
[0016] If so, the second constraint clause is equivalently replaced with the third constraint clause connected by the predicate AND by a preset rule.
[0017] Optionally, the step of merging columns with functional dependencies in the first constraint clause into a first target column set based on functional dependencies includes:
[0018] Determine if the first constraint clause contains columns with functional dependencies;
[0019] If they exist, the columns with functional dependencies will be merged into the first target column set based on the functional dependencies.
[0020] Optionally, the steps of sorting the first selection rate and the second selection rate, selecting a preset number of target columns according to preset rules, and substituting the selection rates into a preset formula to calculate the total selection rate include:
[0021] Sort by the first and second selection rates;
[0022] Extract the minimum selection rate of the preset number and substitute it into the preset formula to calculate the total selection rate.
[0023] Optionally, the preset formulas include:
[0024] p represents the first choice rate or
[0025] Second choice rate.
[0026] Optionally, after the step of selecting a preset number of target columns according to preset rules and substituting the selection rate into a preset formula to calculate the total selection rate, the following steps are included:
[0027] The total selectivity is returned to the optimizer.
[0028] Optionally, the step of returning the total selectivity to the optimizer includes:
[0029] The optimizer is invoked to estimate the cardinality based on the total selectivity and generate the optimal execution plan.
[0030] According to another aspect of the present invention, a machine-readable storage medium is also provided, on which a machine-executable program is stored, which, when executed by a processor, implements the optimization method for database selectivity calculation as described above.
[0031] According to another aspect of the present invention, a computer device is also provided, including a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor executes the machine-executable program to implement the optimization method for database selectivity calculation as described above.
[0032] The optimization method for database selectivity calculation of the present invention, after obtaining the database operation statement, determines whether a preset statement exists at the end of the database operation statement. If so, it extracts the first constraint clause connected by the predicate AND in the database operation statement, merges the columns with functional dependencies in the first constraint clause according to the functional dependencies to form a first target column set, and calculates the first selectivity of each merged target column in the first target column set; calculates the second selectivity of the target columns outside the first target column set in the first constraint clause; sorts the first selectivity and the second selectivity, selects a preset number of selectivity according to preset rules, and substitutes them into a preset formula to calculate the total selectivity, wherein the preset formula includes: p represents the first or second selection rate. This method allows for a reasonable calculation of the selection rate and incorporates data relevance into the selection rate formula, thereby improving the accuracy of the optimizer's selection rate calculation.
[0033] Furthermore, the optimization method for database selectivity calculation of the present invention, when determining that a preset statement exists in the database operation statement, determines whether a second constraint clause connected by predicate OR exists in the database operation statement; if it exists, the second constraint clause is equivalently replaced by a third constraint clause connected by predicate AND according to preset rules, thereby extracting constraint clauses in the database operation statement more comprehensively and improving the accuracy of the optimizer selectivity calculation.
[0034] The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings. Attached Figure Description
[0035] The following sections will describe some specific embodiments of the invention in detail by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar parts or portions. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:
[0036] Figure 1 This is a flowchart illustrating an optimization method for calculating database selectivity according to an embodiment of the present invention.
[0037] Figure 2 This is a flowchart illustrating an optimization method for calculating database selectivity according to another embodiment of the present invention;
[0038] Figure 3 This is a schematic diagram of a machine-readable storage medium in an optimization method for database selectivity calculation according to an embodiment of the present invention; and
[0039] Figure 4 This is a schematic diagram of a computer device in an optimization method for calculating database selectivity according to an embodiment of the present invention. Detailed Implementation
[0040] Those skilled in the art should understand that the embodiments described below are merely a part of the embodiments of the present invention, and not all of the embodiments of the present invention. These partial embodiments are intended to explain the technical principles of the present invention and are not intended to limit the scope of protection of the present invention. Based on the embodiments provided by the present invention, all other embodiments obtained by those skilled in the art without creative effort should still fall within the scope of protection of the present invention.
[0041] Figure 1 This is a flowchart illustrating an optimization method for calculating database selectivity according to an embodiment of the present invention. The process may include:
[0042] Step S101: Obtain the database operation statement.
[0043] Step S102: Determine whether a preset statement exists in the database operation statement. The preset statement is used to enable this optimization function. In some embodiments of this application, a preset statement can be added to the end of the database operation statement, and the database will then enable the optimization function of this application when executing the database operation statement.
[0044] Step S103: If the determination in step S102 is yes, determine whether the database operation statement is a second constraint clause connected by a predicate OR. This step includes: after determining that the database has enabled the optimization function of this application, determining whether there is a constraint clause connected by a predicate OR in the constraint clause of the database operation statement.
[0045] Step S104: If the condition in step S103 is met, the second constraint clause is equivalently replaced with a third constraint clause connected by predicate AND according to a preset rule. In some embodiments of this application, De Morgan's law can be used to equivalently replace the second constraint clause with a third constraint clause connected by predicate AND, for example:
[0046] not(A or B)<=>(not A)and(not B)
[0047] A or B<=>not((not A)and(not B))
[0048] A not equal to (1-A)
[0049] A or B = 1 – [(1-A) and (1-B)]
[0050] When the situation is complex,
[0051] A or B or C or…=1-[(1-A)and(1-B)and(1-C)and…]
[0052] Step S105: If step S104 is completed or step S103 is determined to be negative, extract the first constraint clause connected by the predicate AND.
[0053] Step S106: Columns with functional dependencies in the first constraint clause are merged into a first target column set based on these functional dependencies, and the first selectivity of each merged target column in the first target column set is calculated. This step includes: determining whether there are columns with functional dependencies in the first constraint clause; if so, merging the columns with functional dependencies into a first target column set based on these functional dependencies. A functional dependency refers to a relationship of dependence between the results of two target columns, for example:
[0054] Given a table text(a int, b int) with a total of 10,000 records,
[0055] There are 100 records that satisfy condition a=1, with a selection rate of 1%. There are also 100 records that satisfy condition b=1, with a selection rate of 1%. However, there are actually 100 records that satisfy both a=1 and b=1. If we assume complete independence of the attributes, the selection rate for records satisfying a=1 and b=1 would be 1% * 1% = 0.01%. But if we use the method of this application to statistically analyze the information, the selection rate for a=1 and b=1 would be 1%. This improves the accuracy of the calculated selection rate.
[0056] Step S107: Calculate the second selectivity of target columns outside the first target column set in the first constraint clause. This step includes: calculating the selectivity of each target column in the first constraint clause that has no functional dependency as the second selectivity.
[0057] Step S108: Sort the first selection rate and the second selection rate, and select a preset number of target columns according to preset rules, then substitute the selection rates into a preset formula to calculate the total selection rate. In some embodiments of this application, the selection rates calculated in steps S106 and S107 are sorted according to their size, and then the four smallest selection rates are selected and substituted into the preset formula for calculation. One example of the preset formula is as follows: p represents the selectivity of the target column. Those skilled in the art can determine the calculation formula and the number of selectivity values to be substituted based on the actual situation.
[0058] This method allows for the reasonable calculation of the selection rate and incorporates data relevance into the selection rate formula, thereby improving the accuracy of the optimizer's selection rate calculation and ultimately enhancing the database's execution efficiency.
[0059] Figure 2 This is a flowchart illustrating an optimization method for calculating database selectivity according to another embodiment of the present invention. The process may include:
[0060] Step S201: Extract the first constraint clause connected by the predicate AND. This step includes: extracting the constraint clauses connected by the predicate AND in the database operation statements.
[0061] Step S202: Columns with functional dependencies in the first constraint clause are merged into a first target column set based on the functional dependencies, and the first selectivity of each merged target column in the first target column set is calculated one by one.
[0062] Step S203: Calculate the second selectivity of target columns outside the first target column set in the first constraint clause. This step includes: calculating the selectivity of each target column in the first constraint clause that has no functional dependency as the second selectivity.
[0063] Step S204: Sort the first selection rate and the second selection rate, and select a preset number of target columns of selection rates according to preset rules and substitute them into a preset formula to calculate the total selection rate. This step includes: sorting all the selection rates calculated in steps S202 and S203, and taking out the corresponding number of selection rates according to preset rules and substituting them into a preset formula for calculation to obtain the final selection rate.
[0064] This step is followed by: returning the final selectivity to the optimizer, calling the optimizer to perform cardinality estimation, and then calculating the optimal execution plan.
[0065] This method allows for a more balanced calculation of selectivity for different database operation statements when data correlation cannot be determined. It avoids the tendency to favor complete independence while also minimizing large discrepancies in selectivity calculations when data is completely unrelated, thereby improving database performance.
[0066] This embodiment also provides a machine-readable storage medium and a computer device. Figure 3 This is a schematic diagram of a machine-readable storage medium 301 according to an embodiment of the present invention. Figure 4 This is a schematic diagram of a computer device 403 according to an embodiment of the present invention.
[0067] The machine-readable storage medium 301 stores a machine-executable program 302 thereon, which, when executed by a processor, implements the optimization method for database selectivity calculation of any of the above embodiments.
[0068] Computer device 403 may include memory 401, processor 402 and machine-executable program 302 stored on memory 401 and running on processor 402, and processor 402 executes machine-executable program 302 to implement the optimization method for database selectivity calculation of any of the above embodiments.
[0069] It should be noted that the logic and / or steps represented in the flowchart or otherwise described herein, such as calculating the selection rate, may be specifically implemented in any machine-readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a processor-included system or other system that can fetch and execute instructions from, an instruction execution system, apparatus or device).
[0070] For the purposes of this embodiment, the machine-readable storage medium 301 can be any means capable of containing, storing, communicating, propagating, or transmitting a program for use by or in conjunction with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the machine-readable storage medium 301 include: an electrical connection (electronic device) having one or more wires, a portable computer disk drive (magnetic device), random access memory (RAM), read-only memory (ROM), erasable and editable read-only memory (EPROM or flash memory), fiber optic devices, and portable optical disc read-only memory (CDROM). Furthermore, the machine-readable storage medium 301 can even be paper or other suitable media on which the program can be printed, since the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in a computer memory.
[0071] It should be understood that various parts of the present invention can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system.
[0072] Computer device 403 can be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone. In some examples, computer device 403 can be a cloud computing node. Computer device 403 can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Typically, program modules can include routines, programs, object programs, components, logic, data structures, etc., that perform specific tasks or implement specific abstract data types. Computer device 403 can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules can reside on local or remote computing system storage media, including storage devices.
[0073] Computer device 403 may include a processor 402 adapted to execute stored instructions and a memory 401 that provides temporary storage space for the operation of said instructions during operation. The processor 402 may be a single-core processor, a multi-core processor, a computing cluster, or any other configuration. The memory 401 may include random access memory (RAM), read-only memory, flash memory, or any other suitable storage system.
[0074] The processor 402 can be connected via a system interconnect (e.g., PCI, PCI-Express, etc.) to an I / O interface (input / output interface) suitable for connecting the computer device 403 to one or more I / O devices (input / output devices). I / O devices may include, for example, a keyboard and indicating devices, where indicating devices may include a touchpad or touchscreen, etc. I / O devices may be built into the computer device 403 or may be external devices connected to the computing device.
[0075] The processor 402 may also be linked via a system interconnect to a display interface suitable for connecting the computer device 403 to a display device. The display device may include a display screen as a built-in component of the computer device 403. The display device may also include a computer monitor, television, or projector, etc., externally connected to the computer device 403. Furthermore, a network interface controller (NIC) may be adapted to connect the computer device 403 to a network via a system interconnect. In some embodiments, the NIC may use any suitable interface or protocol (such as an Internet Minicomputer System Interface) to transmit data. The network may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, etc. Remote devices may connect to the computing device via the network.
[0076] The flowchart provided in this embodiment is not intended to indicate that the operations of the method will be performed in any particular order, or that all operations of the method are included in every case. Furthermore, the method may include additional operations. Within the scope of the technical concept provided by the method in this embodiment, additional variations can be made to the above method.
[0077] Therefore, those skilled in the art should recognize that although numerous exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications conforming to the principles of the present invention can be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Thus, the scope of the present invention should be understood and construed as covering all such other variations or modifications.
Claims
1. An optimization method for database selectivity calculation, comprising: Extract the first constraint clause connected by the predicate AND; The columns with functional dependencies in the first constraint clause are merged into a first target column set according to the functional dependencies, and the first selectivity of each merged target column in the first target column set is calculated one by one; Calculate the second selectivity of the target columns outside the first target column set in the first constraint clause; Sort the first selection rate and the second selection rate, select a preset number of target columns according to preset rules, and substitute them into a preset formula to calculate the total selection rate; The step of sorting the first selection rate and the second selection rate, and selecting a preset number of selection rates of the target column according to preset rules and substituting them into a preset formula to calculate the total selection rate includes: sorting the first selection rate and the second selection rate; extracting a preset number of minimum selection rates and substituting them into the preset formula to calculate the total selection rate; The preset formula includes: Total selection rate = p represents either the first selection rate or the second selection rate.
2. The optimization method for database selectivity calculation according to claim 1, wherein, Prior to the step of extracting the constraint clauses connected by predicate AND, the following is included: Retrieve database operation statements; Determine whether a preset statement exists in the database operation statement, the preset statement being used to enable the optimization method; If the preset statement does not exist, the optimization method ends.
3. The optimization method for database selectivity calculation according to claim 2, wherein, Following the step of determining whether a preset statement exists in the database operation statement, the following is included: If the preset statement exists, determine whether the database operation statement contains a second constraint clause connected by a predicate OR; If so, the second constraint clause is equivalently replaced with a third constraint clause connected by the predicate AND by a preset rule.
4. The optimization method for database selectivity calculation according to claim 1, wherein, The step of merging columns with functional dependencies in the first constraint clause into a first target column set based on the functional dependencies includes: Determine if the first constraint clause contains columns with functional dependencies; If they exist, the columns with functional dependencies are merged according to the functional dependencies to form the first target column set.
5. The optimization method for database selectivity calculation according to claim 1, wherein, After the step of selecting a preset number of target columns according to preset rules and substituting the selection rate into a preset formula to calculate the total selection rate, the following steps are included: The total selectivity is returned to the optimizer.
6. The optimization method for database selectivity calculation according to claim 5, wherein, The step of returning the total selectivity to the optimizer is followed by: The optimizer is invoked to perform cardinality estimation based on the total selectivity and generate the optimal execution plan.
7. A machine-readable storage medium having a machine-executable program stored thereon, wherein the machine-executable program, when executed by a processor, implements the optimization method for database selectivity calculation according to any one of claims 1 to 6.
8. A computer device comprising a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements an optimization method for database selectivity calculation according to any one of claims 1 to 6.