Data query method and device, electronic equipment and storage medium
By dynamically adjusting the predicate order during the data query process and combining rearrangement strategies with and without statistical information, the problems of slow query speed and difficulty in maintaining the accuracy of compilation results in existing technologies are solved, thus achieving more efficient data query.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING OCEANBASE TECHNOLOGY CO LTD
- Filing Date
- 2023-08-07
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This specification relates to the field of database technology, and more particularly to a data query method and apparatus, an electronic device, and a storage medium.
Background Technology
[0002] With the rapid development of internet and information technology, the volume of data being generated is growing explosively, placing increasingly high demands on databases and their management. When data in a database is queried using SQL or another data query language, the query language statement can first be compiled to generate a compilation result representing the query plan, and the data query is then completed based on that compilation result. A data query statement includes multiple predicates, and the compilation result includes the execution order of each predicate. However, in the related art, the effectiveness of data queries based on the compilation result still needs improvement; for example, query speed needs to be increased.
Summary of the Invention
[0003] In view of the above, this specification provides a data query method and apparatus, electronic device and storage medium through one or more embodiments.
[0004] To achieve the above objectives, one or more embodiments of this specification provide the following technical solutions:
[0005] According to a first aspect of one or more embodiments of this specification, a data query method is proposed, the method comprising:
[0006] During the process of querying the data based on the compilation result of the data query language, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped, wherein the data being queried includes multiple data units;
[0007] In response to the completion of the data query for the next data unit, obtain the second query time for the next data unit;
[0008] In response to the first query time being greater than or equal to the second query time, the order of the first predicate and the second predicate is preserved in the compilation result of the data query language;
[0009] In response to the first query time being less than the second query time, the order of the first predicate and the second predicate is restored in the compilation result of the data query language.
[0010] In one embodiment of this specification, the method further includes:
[0011] In response to the first query time being greater than or equal to the second query time, the order of the third and fourth predicates in the compilation result of the data query language is swapped.
[0012] In one embodiment of this specification, the method further includes:
[0013] In response to the first query time being less than the second query time, the probability value of swapping the predicate combination formed by the first predicate and the second predicate is reduced;
[0014] The step of swapping the order of the first and second predicates in the compilation result of the data query language includes:
[0015] In the compilation result of the data query language, the order of the two predicates in any predicate combination with the highest probability value is swapped.
[0016] In one embodiment of this specification, the method further includes:
[0017] During the process of querying the data based on the compilation results of the data query language, in response to the end of the data query for at least one data unit, the average execution cost of each predicate in the compilation results of the data query language is obtained.
[0018] The step of, in response to the end of the data query for the current data unit, obtaining the first query time of the current data unit and swapping the order of the first and second predicates in the compilation result of the data query language includes:
[0019] If the average execution cost is less than the cost threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped.
[0020] In one embodiment of this specification, the method further includes:
[0021] If the average execution cost of each predicate in the compilation result of the data query language is greater than or equal to the cost threshold, statistical information of the corresponding predicate is obtained during the execution of each predicate in the compilation result of the data query language, wherein the statistical information includes selectivity and execution cost;
[0022] Based on the statistical information of each predicate, the order of predicates in the compilation results of the data query language is rearranged.
[0023] In one embodiment of this specification, rearranging the predicate order of the compilation result of the data query language based on the statistics of each predicate includes:
[0024] The weight of each predicate is determined based on the statistical information of each predicate;
[0025] Arrange each predicate in descending order of weight.
[0026] In one embodiment of this specification, the statistical information of the predicate also includes the amount of data targeted during the execution of the predicate;
[0027] The step of rearranging the predicate order of the compiled result of the data query language based on the statistical information of each predicate includes:
[0028] All predicates are filtered based on the amount of data targeted during the execution of each predicate, as recorded in the statistical information of each predicate;
[0029] Based on the statistical information of each predicate in the filtering results, the order of predicates in the compilation results of the data query language is rearranged.
[0030] In one embodiment of this specification, rearranging the predicate order of the compiled result of the data query language based on the statistical information of each predicate in the filtering result includes:
[0031] If the number of predicates in the filtering results is greater than or equal to the number threshold, the order of predicates in the compilation results of the data query language is rearranged according to the statistical information of each predicate in the filtering results.
[0032] In one embodiment of this specification, the method further includes:
[0033] If the number of predicates in the filtering results is less than the number threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first and second predicates in the compilation result of the data query language is swapped.
[0034] In one embodiment of this specification, the method further includes:
[0035] Generate compilation results based on the data query language.
[0036] According to a second aspect of one or more embodiments of this specification, a data query apparatus is provided, the apparatus comprising:
[0037] The first swapping module is used to, in the process of querying the data based on the compilation result of the data query language, in response to the end of the data query for the current data unit, obtain the first query time of the current data unit, and swap the order of the first predicate and the second predicate in the compilation result of the data query language, wherein the data being queried includes multiple data units.
[0038] The acquisition module is used to acquire the second query time of the next data unit in response to the completion of the data query for the next data unit.
[0039] A retention module is configured to retain the order of the first predicate and the second predicate in the compilation result of the data query language in response to the first query time being greater than or equal to the second query time;
[0040] A recovery module is used to restore the order of the first predicate and the second predicate in the compilation result of the data query language in response to the first query time being less than the second query time.
[0041] In one embodiment of this specification, the device further includes a second swapping module for:
[0042] In response to the first query time being greater than or equal to the second query time, the order of the third and fourth predicates in the compilation result of the data query language is swapped.
[0043] In one embodiment of this specification, the device further includes a probability module for:
[0044] In response to the first query time being less than the second query time, the probability value of swapping the predicate combination formed by the first predicate and the second predicate is reduced;
[0045] The first swapping module is specifically used for:
[0046] In the compilation result of the data query language, the order of the two predicates in any predicate combination with the highest probability value is swapped.
[0047] In one embodiment of this specification, the device further includes a detection module for:
[0048] During the process of querying the data based on the compilation results of the data query language, in response to the end of the data query for at least one data unit, the average execution cost of each predicate in the compilation results of the data query language is obtained.
[0049] The first swapping module is specifically used for:
[0050] If the average execution cost is less than the cost threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped.
[0051] In one embodiment of this specification, the apparatus further includes a rearrangement module for:
[0052] If the average execution cost of each predicate in the compilation result of the data query language is greater than or equal to the cost threshold, statistical information of the corresponding predicate is obtained during the execution of each predicate in the compilation result of the data query language, wherein the statistical information includes selectivity and execution cost;
[0053] Based on the statistical information of each predicate, the order of predicates in the compilation results of the data query language is rearranged.
[0054] In one embodiment of this specification, when the rearrangement module rearranges the predicate order of the compilation result of the data query language based on the statistical information of each predicate, it is specifically used for:
[0055] The weight of each predicate is determined based on the statistical information of each predicate;
[0056] Arrange each predicate in descending order of weight.
[0057] In one embodiment of this specification, the statistical information of the predicate also includes the amount of data targeted during the execution of the predicate;
[0058] When the rearrangement module rearranges the predicate order of the compilation result of the data query language based on the statistical information of each predicate, it is specifically used for:
[0059] All predicates are filtered based on the amount of data targeted during the execution of each predicate, as recorded in the statistical information of each predicate;
[0060] Based on the statistical information of each predicate in the filtering results, the order of predicates in the compilation results of the data query language is rearranged.
[0061] In one embodiment of this specification, when the rearrangement module rearranges the predicate order of the compiled result of the data query language based on the statistical information of each predicate in the filtering result, it is specifically used for:
[0062] If the number of predicates in the filtering results is greater than or equal to the number threshold, the order of predicates in the compilation results of the data query language is rearranged according to the statistical information of each predicate in the filtering results.
[0063] In one embodiment of this specification, the device further includes a third swapping module for:
[0064] If the number of predicates in the filtering results is less than the number threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first and second predicates in the compilation result of the data query language is swapped.
[0065] In one embodiment of this specification, the apparatus further includes a compilation module for:
[0066] Generate compilation results based on the data query language.
[0067] According to a third aspect of one or more embodiments of this specification, an electronic device is provided, comprising:
[0068] A processor;
[0069] A memory for storing processor-executable instructions;
[0070] Wherein the processor implements the method described in the first aspect by running the executable instructions.
[0071] According to a fourth aspect of one or more embodiments of this specification, a computer-readable storage medium is provided that stores computer instructions thereon, which, when executed by a processor, implement the steps of the method as described in the first aspect.
[0072] The technical solutions provided in the embodiments of this specification may include the following beneficial effects:
[0073] The data query method provided in this specification operates during the data query process based on the compilation result of the data query language. In response to the completion of the data query for the current data unit, it obtains the first query time of the current data unit and swaps the order of the first and second predicates in the compilation result of the data query language. In response to the completion of the data query for the next data unit, it obtains the second query time of the next data unit. Then, in response to the first query time being greater than or equal to the second query time, it retains the order of the first and second predicates in the compilation result of the data query language; in response to the first query time being less than the second query time, it restores the order of the first and second predicates in the compilation result of the data query language. In other words, during the data query process, the order of the predicates can be continuously adjusted, and the merits of the predicate order before and after the adjustment are measured by the query time of the data unit. This allows for continuous dynamic optimization of the predicate order during the data query process to improve query performance. Furthermore, the adjustment method is simple and convenient, does not rely on statistical information of predicate execution, and does not impose an additional burden on the data query.
Attached Figure Description
[0074] Figure 1 is a flowchart of a data query method provided in an exemplary embodiment;
[0075] Figure 2 is a schematic diagram of a predicate reordering strategy without statistical information provided in an exemplary embodiment;
[0076] Figure 3 is a schematic diagram illustrating the selection method of a predicate reordering strategy provided in an exemplary embodiment;
[0077] Figure 4 is a schematic diagram of the structure of a device provided in an exemplary embodiment;
[0078] Figure 5 is a block diagram of a data query device provided in an exemplary embodiment.
Detailed Implementation
[0079] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.
[0080] It should be noted that the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification in other embodiments. In some other embodiments, the methods may include more or fewer steps than described in this specification. Furthermore, a single step described in this specification may be broken down into multiple steps in other embodiments; and multiple steps described in this specification may be combined into a single step in other embodiments.
[0081] With the rapid development of internet and information technology, the volume of data being generated is growing explosively, placing increasingly high demands on databases and their management. When data in a database is queried using SQL or another data query language, the query language statement can first be compiled to form a compilation result representing the query plan, and the data query is then completed based on that compilation result. A data query statement includes multiple predicates, which are functions in the data query language that return logical values, such as greater than, equal to, less than, and, or. The compilation result includes the execution order of each predicate. However, in the related art, the effectiveness of data queries based on the compilation result still needs improvement; for example, query speed needs to be increased.
[0082] For example, when generating compilation results, it's impossible to accurately assess the statistics of predicate execution, leading to inaccurate compilation results. Another example is that data distribution is dynamic; even if the compilation results are relatively accurate, the statistics may change when querying data based on those results, causing the originally accurate compilation results to lose their accuracy.
[0083] Based on this, in a first aspect, at least one embodiment of this specification provides a data query method. The method can be applied to the process of querying data based on the compilation result of a data query language, that is, to the execution phase of a data query. (A data query includes a compilation phase, which generates the compilation result from the data query language, and an execution phase, which queries the data based on that compilation result.) During the execution phase, the order of predicates in the compilation result is continuously and dynamically adjusted so that the compilation result stays as accurate as possible in real time, thereby improving the effect of the data query, such as query speed.
[0084] Referring to Figure 1, which illustrates the flow of the data query method, the method includes steps S101 to S104.
[0085] In step S101, during the process of querying the data based on the compilation result of the data query language, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped, wherein the data being queried includes multiple data units.
[0086] The execution phase of a data query is the process of querying the data based on the compilation results of the data query language. It can be understood that the compilation phase, preceding the execution phase, generates compilation results based on the data query language. For example, based on information such as tables, columns, and indexes in the queried data targeted by the data query language, a compilation result representing the execution plan is generated.
[0087] The queried data is divided into multiple data units. For example, all rows of the queried data can be divided into multiple row combinations according to a certain number of rows, and each row combination is a data unit. Taking a data table with 10,000 rows as an example, it can be divided into data units of 1,000 rows from the first row to the last row. That is, rows 1 to 1,000 are the first data unit, rows 1,001 to 2,000 are the second data unit, rows 2,001 to 3,000 are the third data unit, rows 3,001 to 4,000 are the fourth data unit, rows 4,001 to 5,000 are the fifth data unit, rows 5,001 to 6,000 are the sixth data unit, rows 6,001 to 7,000 are the seventh data unit, rows 7,001 to 8,000 are the eighth data unit, rows 8,001 to 9,000 are the ninth data unit, and rows 9,001 to 10,000 are the tenth data unit.
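The row-based division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_units` and the 1-based inclusive row ranges are hypothetical choices, not part of the specification.

```python
from typing import Iterator, Tuple


def split_into_units(total_rows: int, unit_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_row, last_row) ranges, 1-based and inclusive."""
    for start in range(1, total_rows + 1, unit_size):
        # The last unit may be shorter when total_rows is not a multiple of unit_size.
        yield start, min(start + unit_size - 1, total_rows)


# The 10,000-row table from the example, divided into units of 1,000 rows.
units = list(split_into_units(10_000, 1_000))
```

With these arguments the list holds ten ranges, from rows 1 to 1,000 through rows 9,001 to 10,000.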
[0088] This step can be run at the end of the data query for any data unit; alternatively, it can be run at the end of the data query for a data unit after the swapped order of the first and second predicates has been retained in the compilation result of the data query language.
[0089] In this step, two predicates in the compilation result of the data query language can be selected at random as the first and second predicates and their order swapped; alternatively, the order of the two predicates in any predicate combination with the highest swapping probability value in the compilation result can be swapped. The swapping probability value is described in detail later and is not elaborated here.
[0090] In step S102, in response to the end of the data query for the next data unit, the second query time for the next data unit is obtained.
[0091] The data query for each data unit can be timed. The first query time is the time taken to perform a data query for the current data unit and obtain the query result; the second query time is the time taken to perform a data query for the next data unit and obtain the query result.
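One way to obtain such a query time is sketched below. It assumes, purely for illustration, that a data unit is a list of dictionary rows and that each predicate is a Python callable; `query_unit` is a hypothetical helper, not a function named in the specification.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def query_unit(rows: Sequence[Row], predicates: Sequence[Predicate]) -> Tuple[List[Row], float]:
    """Query one data unit and return (matching rows, elapsed seconds)."""
    start = time.perf_counter()
    # all() short-circuits: predicates run in list order and stop at the
    # first one that fails, which is why the predicate order affects speed.
    matched = [row for row in rows if all(p(row) for p in predicates)]
    return matched, time.perf_counter() - start
```

The elapsed value returned for the current data unit plays the role of the first query time, and the value returned for the next data unit plays the role of the second query time.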
[0092] In step S103, in response to the first query time being greater than or equal to the second query time, the order of the first predicate and the second predicate is preserved in the compilation result of the data query language.
[0093] If the first query time is greater than or equal to the second query time, it means that after the first and second predicates are swapped, the query time of the data unit is reduced and the query speed is increased. Therefore, the swapped order can be retained.
[0094] Furthermore, when the swapped order of the first and second predicates is retained, the second query time characterizes the time taken to query a data unit with the current compilation result. It can therefore serve as the query time for the predicate order before the swap in the next attempt to swap predicates. In other words, in response to the first query time being greater than or equal to the second query time, the order of the third and fourth predicates in the compilation result of the data query language can be swapped (i.e., the same operation as in step S101 is performed), and steps S102 and S103, or steps S102 and S104, can then be performed on the compilation result in which the third and fourth predicates have been swapped. That is, after step S103 is executed, the method can be repeated.
[0095] In step S104, in response to the first query time being less than the second query time, the order of the first predicate and the second predicate is restored in the compilation result of the data query language.
[0096] The fact that the first query time is shorter than the second query time indicates that after the first and second predicates are swapped, the query time for data units increases and the query speed decreases. Therefore, the original order can be restored.
[0097] Furthermore, when the order of the first and second predicates is restored to what it was before the swap, later attempts to swap this pair can be made less likely. Specifically, in response to the first query time being less than the second query time, the swapping probability of the predicate combination formed by the first and second predicates is reduced. Initially, the swapping probabilities of all predicate combinations in the compilation result are equal. Each execution of step S104 reduces the swapping probability of the restored predicate combination, so the swapping probability values come to differ. Therefore, when step S101 is executed, the predicate combination to be swapped can be selected based on the swapping probability value; that is, the order of the two predicates in any predicate combination with the highest swapping probability value in the compilation result of the data query language is swapped.
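The bookkeeping for swapping probability values might look like the sketch below. The specification does not say by how much a probability is reduced, so the halving `factor` is an assumption, as are the function names and the use of integer predicate ids; at least two predicates are assumed.

```python
from itertools import combinations
from typing import Dict, Tuple

Pair = Tuple[int, int]  # a predicate combination, identified by two predicate ids


def init_swap_probabilities(num_predicates: int) -> Dict[Pair, float]:
    """Give every predicate combination the same initial swapping probability."""
    pairs = list(combinations(range(num_predicates), 2))
    return {pair: 1.0 / len(pairs) for pair in pairs}


def pick_pair(probabilities: Dict[Pair, float]) -> Pair:
    """Return a combination with the highest probability (first one wins ties)."""
    return max(probabilities, key=probabilities.get)


def penalize(probabilities: Dict[Pair, float], pair: Pair, factor: float = 0.5) -> None:
    """Reduce the probability of a combination whose swap was restored (step S104).

    The reduction factor is an assumed value, not taken from the specification.
    """
    probabilities[pair] *= factor
```

After a restored swap is penalized, `pick_pair` moves on to a different combination, so the same unhelpful swap is not retried immediately.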
[0098] The data query method provided in this specification operates during the data query process based on the compilation result of the data query language. In response to the completion of the data query for the current data unit, it obtains the first query time of the current data unit and swaps the order of the first and second predicates in the compilation result of the data query language. In response to the completion of the data query for the next data unit, it obtains the second query time of the next data unit. Then, in response to the first query time being greater than or equal to the second query time, it retains the order of the first and second predicates in the compilation result of the data query language; in response to the first query time being less than the second query time, it restores the order of the first and second predicates in the compilation result of the data query language. In other words, during the data query process, the order of the predicates can be continuously adjusted, and the merits of the predicate order before and after the adjustment are measured by the query time of the data unit. This allows for continuous dynamic optimization of the predicate order during the data query process to improve query performance. Furthermore, the adjustment method is simple and convenient, does not rely on statistical information of predicate execution, and does not impose an additional burden on the data query.
[0099] In summary, the predicate rearrangement method described in the above embodiments can be referred to as a predicate reordering strategy without statistical information. Referring to Figure 2, after the strategy starts, two predicates are randomly selected and their execution order is swapped. It is then determined whether execution efficiency improves after the swap. If it improves, the swapped order is retained. If it does not, the order before the swap is restored, and the probability of selecting the same pair of predicates for swapping next time is reduced.
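The strategy without statistical information can be sketched end to end as below. Several details are assumptions made for illustration: predicates are represented by unique integer ids, `run_unit` is a hypothetical callback that queries one data unit with a given predicate order and returns its query time, the selection weight is halved after a restored swap, and after a restore the earlier baseline time is reused rather than measured again.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence


def reorder_without_statistics(
    order: List[int],
    units: Sequence[object],
    run_unit: Callable[[object, List[int]], float],
    rng: random.Random,
    penalty: float = 0.5,
) -> List[int]:
    """Try one pairwise swap per data unit and keep it only if the unit was not slower."""
    order = list(order)
    if len(order) < 2 or not units:
        return order
    pairs = list(combinations(sorted(order), 2))
    weights = {pair: 1.0 for pair in pairs}        # equal chances at the start
    baseline = run_unit(units[0], order)           # first query time
    for unit in units[1:]:
        a, b = rng.choices(pairs, weights=[weights[p] for p in pairs])[0]
        i, j = order.index(a), order.index(b)
        order[i], order[j] = order[j], order[i]    # step S101: swap the pair
        elapsed = run_unit(unit, order)            # step S102: second query time
        if baseline >= elapsed:
            baseline = elapsed                     # step S103: keep the swapped order
        else:
            order[i], order[j] = order[j], order[i]                  # step S104: restore
            weights[(a, b)] = max(weights[(a, b)] * penalty, 1e-9)   # make a retry less likely
    return order
```

Because the loop only needs the per-unit query time, it collects no per-predicate statistics, which matches the low-overhead intent described above.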
[0100] In addition to the predicate reordering strategy without statistical information described in the above embodiments, this specification also provides a predicate reordering strategy with statistical information. To decide which strategy to adopt, some embodiments of this specification obtain, before reordering the predicates and during the process of querying the data based on the compilation result of the data query language, the average execution cost of each predicate in the compilation result in response to the end of the data query for at least one data unit. Preferably, the average execution cost is obtained in response to the end of the data query for multiple data units.
[0101] In this process, data queries can be executed sequentially for multiple data units (e.g., 10) based on the compilation results of the data query language. That is, each predicate in the compilation results is executed, and the total time spent executing data queries for multiple data units is counted. Then, the average time spent for each predicate is calculated based on the total time spent and the number of predicates. The average time spent for each predicate is the average execution cost of the predicate.
[0102] After the average execution cost of each predicate is obtained, if the average execution cost is less than a cost threshold, the predicate reordering strategy without statistical information described above is executed; if the average execution cost is greater than or equal to the cost threshold, the predicate reordering strategy with statistical information is executed, for example periodically. That is:
[0103] If the average execution cost is less than the cost threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped (i.e., step S101 is executed).
[0104] If the average execution cost of each predicate in the compilation result of the data query language is greater than or equal to the cost threshold, the statistical information of the corresponding predicate is obtained during the execution of each predicate in the compilation result of the data query language. The statistical information includes selectivity and execution cost. Selectivity can be the probability of the predicate matching in the data row, and execution cost can be the query time of the predicate in the data row. Then, the order of the predicates in the compilation result of the data query language is rearranged according to the statistical information of each predicate.
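The choice between the two strategies can be sketched as below. The sketch reads "calculated based on the total time spent and the number of predicates" as a simple division, which is an assumption; the names `Strategy`, `average_execution_cost`, and `choose_strategy` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Strategy(Enum):
    WITHOUT_STATISTICS = "swap a pair and compare unit query times"
    WITH_STATISTICS = "collect selectivity and execution cost, then sort"


def average_execution_cost(unit_times: Sequence[float], num_predicates: int) -> float:
    """Total time over the sampled data units divided by the number of predicates."""
    if num_predicates <= 0:
        raise ValueError("at least one predicate is required")
    return sum(unit_times) / num_predicates


def choose_strategy(unit_times: Sequence[float], num_predicates: int,
                    cost_threshold: float) -> Strategy:
    """Cheap predicates use the statistics-free strategy; costly ones justify statistics."""
    if average_execution_cost(unit_times, num_predicates) < cost_threshold:
        return Strategy.WITHOUT_STATISTICS
    return Strategy.WITH_STATISTICS
```

An average cost exactly equal to the threshold selects the strategy with statistical information, following the "greater than or equal to" condition in the text.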
[0105] For example, rearranging the predicate order of the compilation result of the data query language based on the statistical information of each predicate may include: first, determining the weight of each predicate based on the statistical information of each predicate (e.g., selectivity and execution cost); and then arranging each predicate in descending order of weight.
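The specification does not give a weight formula, so the sketch below substitutes a commonly used ranking, (1 - selectivity) / cost, purely as an assumption: a predicate that rejects more rows per unit of cost gets a higher weight and runs earlier. The class and function names are hypothetical, and the execution cost is assumed to be positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PredicateStats:
    name: str
    selectivity: float  # probability that a row satisfies the predicate
    cost: float         # query time of the predicate per row, assumed > 0


def weight(stats: PredicateStats) -> float:
    """Assumed weight: share of rows rejected per unit of execution cost."""
    return (1.0 - stats.selectivity) / stats.cost


def reorder_by_weight(stats: List[PredicateStats]) -> List[PredicateStats]:
    """Arrange the predicates in descending order of weight."""
    return sorted(stats, key=weight, reverse=True)
```

For instance, a cheap predicate that passes half the rows can outrank a more selective but ten times costlier one under this assumed formula.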
[0106] The statistical information of the predicate may also include the amount of data targeted during the execution of the predicate. For example, two predicates are executed sequentially. The first predicate queries a complete data unit, while the second predicate queries the result of the first predicate. To illustrate, a data unit may include 100 data rows. The first predicate queries all 100 rows, obtaining 10 rows, and the second predicate then queries those 10 rows. It can be understood that the larger the amount of data targeted during predicate execution, the more effective the statistical information obtained from that predicate; conversely, the smaller the amount of data targeted, the less effective the statistical information obtained. This is because only when a predicate executes a query on a sufficient amount of data can it obtain effective statistical information such as execution cost and selectivity.
[0107] For another example, rearranging the predicate order of the compiled results of the data query language based on the statistical information of each predicate may include: first, filtering all predicates based on the amount of data targeted during the execution of each predicate in the statistical information of each predicate; next, rearranging the predicate order of the compiled results of the data query language based on the statistical information of each predicate in the filtering results. For example, determining the weight of each predicate based on the statistical information of each predicate in the filtering results, arranging each predicate in the filtering results in descending order of weight, and then randomly arranging the remaining predicates after the filtering results.
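A sketch of filtering followed by rearrangement is given below. The minimum row count `min_rows`, the field name `rows_seen`, and the weight formula (1 - selectivity) / cost are all assumptions introduced for illustration; the specification only says that predicates are filtered by the amount of data they were executed on.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stats:
    name: str
    selectivity: float  # probability that a row satisfies the predicate
    cost: float         # query time per row, assumed > 0
    rows_seen: int      # amount of data the predicate was executed on


def filter_and_reorder(stats: List[Stats], min_rows: int,
                       rng: random.Random) -> Tuple[List[Stats], int]:
    """Sort predicates with enough data by weight; place the rest after them at random."""
    kept = [s for s in stats if s.rows_seen >= min_rows]
    rest = [s for s in stats if s.rows_seen < min_rows]
    kept.sort(key=lambda s: (1.0 - s.selectivity) / s.cost, reverse=True)
    rng.shuffle(rest)  # statistics for these predicates are not reliable enough to rank them
    return kept + rest, len(kept)
```

The second return value is the number of predicates in the filtering result, which a caller can compare against a number threshold.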
[0108] This example effectively sorts only the predicates in the filtering results; the predicates that were filtered out are not effectively sorted. Therefore, after filtering all predicates, the number of predicates in the filtering results (or the proportion of that number to the total number of predicates) can be counted. The number of predicates in the filtering results represents the number of predicates for which sufficient effective statistical information can be collected. If the number of predicates in the filtering results is large enough (or its proportion of the total number of predicates is large enough), predicate reordering continues according to the predicate reordering strategy with statistical information. If the number of predicates in the filtering results is insufficient (or its proportion of the total number of predicates is not large enough), predicate reordering follows the predicate reordering strategy without statistical information. That is:
[0109] If the number of predicates in the filtering results is greater than or equal to the number threshold (or the proportion of the number of predicates in the filtering results to the total number of predicates is greater than or equal to the proportion threshold), the order of predicates in the compilation results of the data query language is rearranged according to the statistical information of each predicate in the filtering results.
[0110] If the number of predicates in the filtering result is less than the number threshold (or the proportion of the number of predicates in the filtering result to the total number of predicates is less than the proportion threshold), in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped.
[0111] By filtering all predicates in the compilation results in the above example, and by using different rearrangement strategies to rearrange the predicates when the number of predicates in the filtered results is different (or the proportion of the number of predicates in the filtered results to the total number of predicates is different), the accuracy of predicate rearrangement can be guaranteed.
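The threshold check in the preceding paragraphs can be sketched as follows. The function names and the way the two thresholds are passed are assumptions made for illustration; the specification only states that either a number threshold or a proportion threshold is used.

```python
from typing import Optional


def enough_statistics(num_filtered: int, num_total: int,
                      count_threshold: Optional[int] = None,
                      ratio_threshold: Optional[float] = None) -> bool:
    """Return True if enough predicates survived filtering to trust the statistics."""
    if count_threshold is not None:
        return num_filtered >= count_threshold
    if ratio_threshold is not None:
        if num_total == 0:
            return False
        return num_filtered / num_total >= ratio_threshold
    raise ValueError("provide count_threshold or ratio_threshold")


def pick_strategy(num_filtered: int, num_total: int, **thresholds) -> str:
    # "with_statistics": rearrange by the statistics of the filtering results.
    # "without_statistics": fall back to swapping two predicates and comparing query times.
    if enough_statistics(num_filtered, num_total, **thresholds):
        return "with_statistics"
    return "without_statistics"
```

For example, `pick_strategy(1, 4, ratio_threshold=0.5)` falls back to the strategy without statistical information, because only a quarter of the predicates have usable statistics.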
[0112] Please refer to Figure 3, which illustrates how a predicate reordering strategy is selected when the above-described embodiments are combined. After the execution phase of the data query begins, predicate execution cost information can be collected, and it can be determined whether the average execution cost of the predicates is low (e.g., by comparing it against a cost threshold). If so, a predicate reordering strategy without statistical information is selected; otherwise, statistical information is collected during runtime, and it is determined whether sufficient statistical information has been collected. If not, a predicate reordering strategy without statistical information is selected; if so, a predicate reordering strategy with statistical information is selected.
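The decision flow of Figure 3 can be sketched as follows. The enum and function names are hypothetical, and the sufficiency check is passed in as a boolean rather than computed, since the flow itself does not depend on how sufficiency is measured.

```python
from enum import Enum


class Strategy(Enum):
    WITHOUT_STATISTICS = "swap two predicates and compare query times"
    WITH_STATISTICS = "rearrange by weights derived from statistics"


def select_strategy(avg_cost: float, cost_threshold: float,
                    stats_sufficient: bool) -> Strategy:
    # Cheap predicates: collecting statistics is not worth the overhead.
    if avg_cost < cost_threshold:
        return Strategy.WITHOUT_STATISTICS
    # Expensive predicates: statistics are collected at runtime, but they are
    # used only if enough of them could be gathered.
    if not stats_sufficient:
        return Strategy.WITHOUT_STATISTICS
    return Strategy.WITH_STATISTICS
```

Only the combination of a high average execution cost and sufficient statistics selects the strategy with statistical information; every other branch falls back to the strategy without it.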
[0113] This method selects a predicate reordering strategy at runtime and adaptively reorders predicates according to the selected strategy. This generates a better execution plan while minimizing the additional overhead introduced by the strategies themselves, making the method better suited to complex database query environments.
[0114] Figure 4 is a schematic structural diagram of a device provided in an exemplary embodiment. Referring to Figure 4, at the hardware level, the device includes a processor 402, an internal bus 404, a network interface 406, memory 408, and non-volatile memory 410, and may also include other hardware required for tasks. One or more embodiments of this specification can be implemented in software, such as the processor 402 reading the corresponding computer program from the non-volatile memory 410 into memory 408 and then running it. Of course, in addition to software implementation, one or more embodiments of this specification do not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. That is to say, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.
[0115] Referring to Figure 5, the data query device can be applied to, for example, the device shown in Figure 4 to implement the technical solution of this specification. The device includes:
[0116] The first swapping module 501 is used to, in the process of querying the data based on the compilation result of the data query language, in response to the end of the data query for the current data unit, obtain the first query time of the current data unit, and swap the order of the first predicate and the second predicate in the compilation result of the data query language, wherein the data being queried includes multiple data units.
[0117] The acquisition module 502 is used to acquire the second query time of the next data unit in response to the end of the data query for the next data unit;
[0118] The retention module 503 is used to retain the order of the first predicate and the second predicate in the compilation result of the data query language in response to the first query time being greater than or equal to the second query time;
[0119] The recovery module 504 is used to restore the order of the first predicate and the second predicate in the compilation result of the data query language in response to the first query time being less than the second query time.
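The cooperation of the four modules above can be sketched as a single swap-and-compare step. This is a minimal sketch: the function names are hypothetical, and a simulated cost model replaces real wall-clock timing so that the example is deterministic.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[int], bool]


def simulated_time(order: Sequence[Predicate], unit: Sequence[int]) -> int:
    # Hypothetical cost model: each predicate costs 1 per row it is evaluated on,
    # and rows failing a predicate are dropped before the next predicate runs.
    total, rows = 0, list(unit)
    for pred in order:
        total += len(rows)
        rows = [r for r in rows if pred(r)]
    return total


def swap_and_compare(order: List[Predicate], current_unit: Sequence[int],
                     next_unit: Sequence[int], i: int = 0, j: int = 1) -> List[Predicate]:
    order = list(order)
    first_time = simulated_time(order, current_unit)   # first query time
    order[i], order[j] = order[j], order[i]            # swap first and second predicate
    second_time = simulated_time(order, next_unit)     # second query time
    if first_time < second_time:
        order[i], order[j] = order[j], order[i]        # restore: the swap made things slower
    return order                                       # otherwise the swapped order is retained


def is_positive(r: int) -> bool:
    return r > 0


def is_multiple_of_10(r: int) -> bool:
    return r % 10 == 0


unit = list(range(1, 101))  # a data unit of 100 rows
kept = swap_and_compare([is_positive, is_multiple_of_10], unit, unit)
restored = swap_and_compare([is_multiple_of_10, is_positive], unit, unit)
```

In both runs the more selective predicate ends up first: the swap is retained when it lowers the simulated query time and undone when it raises it.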
[0120] In one embodiment of this specification, the device further includes a second swapping module for:
[0121] In response to the first query time being greater than or equal to the second query time, the order of the third and fourth predicates in the compilation result of the data query language is swapped.
[0122] In one embodiment of this specification, the device further includes a probability module for:
[0123] In response to the first query time being less than the second query time, the probability value of swapping the predicate combination formed by the first predicate and the second predicate is reduced;
[0124] The first swapping module is specifically used for:
[0125] In the compilation result of the data query language, the order of the two predicates in any predicate combination with the highest probability value is swapped.
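The probability-guided choice of which predicate combination to swap can be sketched as follows. The class name, the multiplicative `decay` factor, and the treatment of a combination as an unordered pair of predicate indices are assumptions; the text only requires that the probability value of an unsuccessful combination be reduced and that a combination with the highest probability value be chosen.

```python
import itertools
from typing import Dict, Tuple

Pair = Tuple[int, int]


class SwapChooser:
    def __init__(self, n_predicates: int, decay: float = 0.5) -> None:
        pairs = list(itertools.combinations(range(n_predicates), 2))
        if not pairs:
            raise ValueError("at least two predicates are required")
        # All predicate combinations start with equal probability values.
        self.prob: Dict[Pair, float] = {pair: 1.0 / len(pairs) for pair in pairs}
        self.decay = decay  # assumed reduction factor

    def pick(self) -> Pair:
        # Any combination with the highest probability value may be chosen;
        # the first one is taken here to keep the sketch deterministic.
        best = max(self.prob.values())
        return next(pair for pair, value in self.prob.items() if value == best)

    def penalize(self, pair: Pair) -> None:
        # Called when the swap was undone because the query became slower.
        self.prob[pair] *= self.decay


chooser = SwapChooser(3)
first_choice = chooser.pick()
chooser.penalize(first_choice)   # the swap did not help
second_choice = chooser.pick()
```

After a combination is penalized, it is no longer among the highest-valued combinations, so the next swap attempt tries a different pair instead of repeating one that already failed.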
[0126] In one embodiment of this specification, the device further includes a detection module for:
[0127] During the process of querying the data based on the compilation results of the data query language, in response to the end of the data query for at least one data unit, the average execution cost of each predicate in the compilation results of the data query language is obtained.
[0128] The first swapping module is specifically used for:
[0129] If the average execution cost is less than the cost threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first predicate and the second predicate in the compilation result of the data query language is swapped.
[0130] In one embodiment of this specification, the apparatus further includes a rearrangement module for:
[0131] If the average execution cost of each predicate in the compilation result of the data query language is greater than or equal to the cost threshold, statistical information of the corresponding predicate is obtained during the execution of each predicate in the compilation result of the data query language, wherein the statistical information includes selectivity and execution cost;
[0132] Based on the statistical information of each predicate, the order of predicates in the compilation results of the data query language is rearranged.
[0133] In one embodiment of this specification, when the rearrangement module rearranges the predicate order of the compilation result of the data query language based on the statistical information of each predicate, it is specifically used for:
[0134] The weight of each predicate is determined based on the statistical information of each predicate;
[0135] Arrange each predicate in descending order of weight.
[0136] In one embodiment of this specification, the statistical information of the predicate also includes the amount of data targeted during the execution of the predicate;
[0137] When the rearrangement module rearranges the predicate order of the compilation result of the data query language based on the statistical information of each predicate, it is specifically used for:
[0138] All predicates are filtered based on the amount of data targeted during the execution of each predicate, as recorded in the statistical information of each predicate;
[0139] Based on the statistical information of each predicate in the filtering results, the order of predicates in the compilation results of the data query language is rearranged.
[0140] In one embodiment of this specification, when the rearrangement module rearranges the predicate order of the compiled result of the data query language based on the statistical information of each predicate in the filtering result, it is specifically used for:
[0141] If the number of predicates in the filtering results is greater than or equal to the number threshold, the order of predicates in the compilation results of the data query language is rearranged according to the statistical information of each predicate in the filtering results.
[0142] In one embodiment of this specification, the device further includes a third swapping module for:
[0143] If the number of predicates in the filtering results is less than the number threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first and second predicates in the compilation result of the data query language is swapped.
[0144] In one embodiment of this specification, the apparatus further includes a compilation module for:
[0145] Generate compilation results based on the data query language.
[0146] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or any combination of these devices.
[0147] In a typical configuration, a computer includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0148] Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and / or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of computer-readable media.
[0149] Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0150] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0151] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0152] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of one or more embodiments of this specification. The singular forms “a,” “said,” and “the” used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and includes any or all possible combinations of one or more associated listed items.
[0153] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entry points are provided for users to choose to authorize or refuse.
[0155] It should be understood that although the terms first, second, third, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information without departing from the scope of one or more embodiments of this specification, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of," "when," or "in response to a determination."
[0156] The above description is merely a preferred embodiment of one or more embodiments of this specification and is not intended to limit the scope of one or more embodiments of this specification. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of one or more embodiments of this specification should be included within the protection scope of one or more embodiments of this specification.
Claims
1. A data query method, wherein the queried data is divided into multiple data units, and each data unit is queried using the same data query language; the method includes: during the data query process of the queried data based on the compilation result of the data query language, in response to the end of the data query for the current data unit, obtaining the first query time of the current data unit, and, according to the swapping probability value of each predicate combination in the compilation result of the data query language, swapping the order of the first predicate and the second predicate in any predicate combination with the highest swapping probability value in the compilation result of the data query language, wherein the initial values of the swapping probability values of the predicate combinations in the compilation result of the data query language are equal; in response to the end of the data query for the next data unit, obtaining the second query time of the next data unit; in response to the first query time being greater than or equal to the second query time, retaining the order of the first predicate and the second predicate in the compilation result of the data query language; and in response to the first query time being less than the second query time, restoring the order of the first predicate and the second predicate in the compilation result of the data query language, and reducing the swapping probability value of the predicate combination formed by the first predicate and the second predicate.
2. The data query method according to claim 1, further comprising: In response to the first query time being greater than or equal to the second query time, the order of the third and fourth predicates in the compilation result of the data query language is swapped.
3. The data query method according to claim 1, further comprising: during the process of querying the data based on the compilation result of the data query language, in response to the end of the data query for at least one data unit, obtaining the average execution cost of each predicate in the compilation result of the data query language; wherein the step of, in response to the end of the data query for the current data unit, obtaining the first query time of the current data unit and swapping the order of the first predicate and the second predicate in the compilation result of the data query language includes: if the average execution cost is less than the cost threshold, in response to the end of the data query for the current data unit, obtaining the first query time of the current data unit, and swapping the order of the first predicate and the second predicate in the compilation result of the data query language.
4. The data query method according to claim 3, further comprising: If the average execution cost of each predicate in the compilation result of the data query language is greater than or equal to the cost threshold, statistical information of the corresponding predicate is obtained during the execution of each predicate in the compilation result of the data query language, wherein the statistical information includes selectivity and execution cost; Based on the statistical information of each predicate, the order of predicates in the compilation results of the data query language is rearranged.
5. The data query method according to claim 4, wherein rearranging the predicate order of the compilation result of the data query language based on the statistics of each predicate includes: The weight of each predicate is determined based on the statistical information of each predicate; Arrange each predicate in descending order of weight.
6. The data query method according to claim 4, wherein the statistical information of the predicate further includes the amount of data targeted during the execution of the predicate; the step of rearranging the predicate order of the compilation result of the data query language based on the statistical information of each predicate includes: filtering all predicates based on the amount of data targeted during the execution of each predicate, as recorded in the statistical information of each predicate; and rearranging the order of predicates in the compilation result of the data query language based on the statistical information of each predicate in the filtering results.
7. The data query method according to claim 6, wherein rearranging the predicate order of the compiled result of the data query language based on the statistical information of each predicate in the filtering result includes: If the number of predicates in the filtering results is greater than or equal to the number threshold, the order of predicates in the compilation results of the data query language is rearranged according to the statistical information of each predicate in the filtering results.
8. The data query method according to claim 7, further comprising: If the number of predicates in the filtering results is less than the number threshold, in response to the end of the data query for the current data unit, the first query time of the current data unit is obtained, and the order of the first and second predicates in the compilation result of the data query language is swapped.
9. The data query method according to claim 7, further comprising: generating the compilation result based on the data query language.
10. A data query device, wherein the queried data is divided into multiple data units, and each data unit is queried using the same data query language; the device includes: a first swapping module, used to, during the process of querying the queried data according to the compilation result of the data query language, in response to the end of the data query for the current data unit, obtain the first query time of the current data unit, and, according to the swapping probability value of each predicate combination in the compilation result of the data query language, swap the order of the first predicate and the second predicate in any predicate combination with the highest swapping probability value in the compilation result of the data query language, wherein the initial values of the swapping probability values of the predicate combinations in the compilation result of the data query language are equal; an acquisition module, used to acquire the second query time of the next data unit in response to the end of the data query for the next data unit; a retention module, used to retain the order of the first predicate and the second predicate in the compilation result of the data query language in response to the first query time being greater than or equal to the second query time; and a recovery module, used to, in response to the first query time being less than the second query time, restore the order of the first predicate and the second predicate in the compilation result of the data query language, and reduce the swapping probability value of the predicate combination formed by the first predicate and the second predicate.
11. An electronic device, comprising: a processor; and a memory used to store processor-executable instructions; wherein the processor implements the method as described in any one of claims 1-9 by executing the executable instructions.
12. A computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the steps of the method as claimed in any one of claims 1-9.