Database hash connection processing method, apparatus, device, and storage medium
By using a combination of Bloom filters and hash tables in the database, and adaptively selecting the query channel, the problem of low data query efficiency when using hash joins in hardware-accelerated databases is solved, and more efficient data query is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG INST OF ARTIFICIAL INTELLIGENCE & ADVANCED COMPUTING
- Filing Date
- 2022-11-30
- Publication Date
- 2026-06-12
AI Technical Summary
In the hash join process of hardware-accelerated databases, frequent data readings from external storage lead to poor hardware acceleration and affect data query efficiency.
By combining Bloom filters and hash tables, a Bloom filter is constructed to filter out irrelevant outer rows by adaptively selecting query channels, thereby reducing the amount of data for hash matching and improving query efficiency.
Without increasing computing resource consumption, filtering irrelevant external rows significantly improves the hardware acceleration effect and data query efficiency of the database.
Smart Images

Figure CN116226242B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of database technology, and in particular to a database hash join processing method, apparatus, device, and storage medium. Background Technology
[0002] In hardware-accelerated database query processes, before performing a hash join, the key values of the internal database tables are hashed and distributed in a hash table with limited space, which is then stored on the external storage of the hardware device. During the hash join, the system iterates through all rows in the external database table, performing hash comparisons and key-value comparisons to match the desired data. Therefore, during data retrieval, the hardware device frequently needs to read data from external storage. Reading large, irregularly sized, and non-contiguous memory blocks consumes significant clock cycles, severely impacting the hardware acceleration effect and resulting in low data query efficiency. Summary of the Invention
[0003] This invention provides a database hash join processing method, apparatus, device, and storage medium to solve the defect in the prior art where hardware devices need to frequently read data from external memory when performing hash joins, which seriously affects the hardware acceleration effect and leads to low data query efficiency.
[0004] This invention provides a database hash join processing method, wherein the database has a first query channel and a second query channel, and the first query channel has a Bloom filter table; the method includes:
[0005] Obtain the query statement from the database, as well as the query plan and database table corresponding to the query statement;
[0006] The target query channel corresponding to the query statement is determined based on the query plan;
[0007] If the target query channel is the first query channel, then a Bloom filter is constructed based on the database external table and the Bloom filter table;
[0008] The hash join corresponding to the query plan is executed based on the Bloom filter.
[0009] According to a database hash join processing method provided by the present invention, the step of constructing a Bloom filter based on the database external table and the Bloom filter table includes:
[0010] Obtain the target join key of any row in the database table, and perform a hash operation on the target join key to obtain a hash value of a preset number of bits;
[0011] The hash value is segmented to obtain multiple hash values corresponding to the hash value;
[0012] A Bloom filter is constructed based on the database appearance and the Bloom filter table.
[0013] According to a database hash connection processing method provided by the present invention, the plurality of hash values includes at least a first hash value, a second hash value, and a third hash value. The step of segmenting the hash value to obtain the plurality of hash values corresponding to the hash value includes:
[0014] The hash value is segmented according to the number of bits in the hash value, dividing the hash value into high-order bits and low-order bits;
[0015] The first hash value is determined based on the high-order bit value, and the second hash value is determined based on the low-order bit value;
[0016] The high-order and low-order values are combined according to a preset rule to obtain a third hash value; the third hash value, the first hash value, and the second hash value all have the same number of bits.
[0017] According to a database hash join processing method provided by the present invention, the step of executing the hash join corresponding to the query plan based on the Bloom filter includes:
[0018] Map any target hash value of the Bloom filter to the target array of the Bloom filter table, and obtain the mapping feature value of the mapping bit corresponding to the target hash value in the target array;
[0019] Wherein, the target hash value is any hash value among the multiple hash values of the Bloom filter; the target array is the array corresponding to the target hash value; the mapping feature value is used to determine whether the mapping bit corresponding to the target hash value in the target array is set;
[0020] The database internal tables of the database are filtered according to the mapping feature values, and the hash join corresponding to the query plan is executed based on the filtered database internal tables.
[0021] According to a database hash join processing method provided by the present invention, the step of determining the target query channel corresponding to the query statement based on the query plan includes:
[0022] Obtain the number of outer tuples of the database outer table and the number of inner table tuples of the database inner table;
[0023] Based on the number of inner table tuples and the number of outer table tuples, calculate the estimated cost required to construct the Bloom filter, and the estimated revenue based on the Bloom filter;
[0024] Based on the estimated revenue and the estimated cost, the target query channel corresponding to the query statement is determined.
[0025] According to a database hash join processing method provided by the present invention, the step of determining the target query channel corresponding to the query statement based on the estimated revenue and the estimated cost includes:
[0026] Calculate the ratio of the estimated cost to the estimated revenue;
[0027] The target query channel corresponding to the query statement is determined based on the ratio.
[0028] According to a database hash connection processing method provided by the present invention, the database includes hash nodes and hash connection nodes, and the method further includes:
[0029] Obtain the database internal table of the database, call the preset hash function in the hash node, perform hash operation on the join key of any target row of the database internal table, and obtain the target hash value corresponding to the join key of the target row;
[0030] The target hash value is segmented to obtain multiple hash values corresponding to the target hash value;
[0031] Hash mapping is performed on the multiple hash values to construct the hash table corresponding to the table in the database;
[0032] The multiple hash values are mapped to different arrays of preset Bloom filters to construct the Bloom filter table corresponding to the database table.
[0033] The hash table and the Bloom filter table are saved to the hash join point. The second query channel is generated based on the hash table, and the first query channel is generated based on the Bloom filter table. The hash join point is used to execute the hash join process corresponding to the query plan.
[0034] The present invention also provides a database hash connection processing apparatus, comprising:
[0035] The data acquisition module is used to acquire the query statement of the database, as well as the query plan and database table corresponding to the query statement;
[0036] The channel selection module is used to determine the target query channel corresponding to the query statement based on the query plan;
[0037] A filter building module is used to build a Bloom filter based on the database external table and a preset Bloom filter table if the target query channel is the first query channel.
[0038] The hash join processing module is used to execute the hash join corresponding to the query plan based on the Bloom filter.
[0039] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the database hash connection processing method as described above.
[0040] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the database hash connection processing method as described above.
[0041] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the database hash connection processing method as described above.
[0042] The database hash join processing method, apparatus, device, and storage medium provided by this invention obtain a database query statement, a corresponding query plan, and a database table; determine the target query channel for the query statement based on the query plan; if the target query channel is a first query channel with a Bloom filter table, construct a Bloom filter based on the database table and the Bloom filter table; and execute the hash join corresponding to the query plan based on the Bloom filter. By adaptively adding Bloom filters to execute hash joins for data querying, irrelevant table rows can be filtered out during hash join execution without affecting the use of computing resources, reducing the amount of data for hash matching, improving the hardware acceleration effect of the database, and thus improving data query efficiency. Attached Figure Description
[0043] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0044] Figure 1 This is a flowchart illustrating the database hash connection processing method provided by the present invention;
[0045] Figure 2 This is one of the data query process diagrams provided by the present invention;
[0046] Figure 3 This is the second schematic diagram of the data query process provided by the present invention;
[0047] Figure 4This is a schematic diagram of the structure of the database hash connection processing device provided by the present invention;
[0048] Figure 5 This is a schematic diagram of the structure of the electronic device provided by the present invention. Detailed Implementation
[0049] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0050] The following is combined with Figures 1-5 The present invention describes a database hash connection processing method, apparatus, device, and storage medium.
[0051] The database hash join processing method provided by this invention is applied to databases, particularly hardware-accelerated relational databases. This database has a first query channel and a second query channel, wherein the first query channel contains a Bloom filter table. By adaptively adding Bloom filters to different query channels, the query method can be adjusted according to the amount of data to be queried, reducing query time and improving query efficiency. Specifically, refer to... Figure 1 , Figure 1 This is a flowchart illustrating the database hash join processing method provided in an embodiment of the present invention, based on... Figure 1 The database hash connection processing method provided in this embodiment of the invention includes:
[0052] Step 100: Obtain the query statement of the database, as well as the query plan and database table corresponding to the query statement;
[0053] When querying data in a database, the first step is to obtain the database query statement, its corresponding query plan, and the corresponding database table. Relational databases typically consist of internal tables and external tables, and both can contain multiple tables. Hash joins are commonly used for data queries in relational databases. Joins are a common operation between database tables, extracting tuples from multiple tables that have identical columns to form a new table. If two tables have a large number of tuples, traversing them one by one consumes significant resources. Hash joins are a method to improve join efficiency. Before performing a hash join, the keys of the internal tables are hashed and stored in a hash table. During the hash join, the hash table is read by traversing all rows of the external table using their keys, and the hash values are compared with the keys to obtain the query results. Based on the obtained database query statement, the data to be queried is determined, and the corresponding database table and query plan are obtained based on this data.
[0054] Step 200: Determine the target query channel corresponding to the query statement based on the query plan;
[0055] The target query channel for a query statement is determined based on the query plan. Specifically, this can involve determining the amount of data in the table to be queried based on the query plan, and / or the computational resources required to execute the query plan. Different query channels correspond to different query methods, i.e., different hash join methods, resulting in different amounts of data that can be queried within the same time frame, and / or different computational resources required to query the same amount of data.
[0056] Step 300: If the target query channel is the first query channel, then construct a Bloom filter based on the database table and the Bloom filter table;
[0057] Step 400: Execute the hash join corresponding to the query plan based on the Bloom filter.
[0058] If the target query channel corresponding to the query statement is the first query channel with a Bloom filter table, then a Bloom filter is constructed based on the database table and the Bloom filter table in the first query channel. The hash join corresponding to the query plan is executed based on the constructed Bloom filter to obtain the target data corresponding to the query statement and obtain the query result.
[0059] Bloom filters can be used to check if an element exists in a set, offering relatively high space and query efficiency. The data query process based on a Bloom filter mainly includes a construction phase and a probing phase. The construction phase involves building the Bloom filter table to facilitate probing during subsequent queries. In this phase, a hash function maps an element to a bit in a bit array. The probing phase checks whether the corresponding mapped bit in the bit array is set to determine if the element exists.
[0060] Based on the characteristics of Bloom filters, their high search and filtering efficiency and space efficiency can be leveraged to filter out most irrelevant outer rows in hash joins during data queries. This reduces the number of hash table traversals and reads, thus speeding up the process and improving query efficiency. However, building Bloom filters also consumes computational resources, impacting query efficiency. When the amount of data to be queried is large, i.e., the corresponding internal and external table sizes are large, the number of irrelevant outer rows that can be filtered will decrease accordingly. Consequently, the effect of building Bloom filters on improving data query efficiency will be less significant. By adaptively adding Bloom filters, the optimal query channel can be selected adaptively based on the query statement, thereby improving query efficiency.
[0061] In this embodiment, the query statement from the database, along with the corresponding query plan and database table, are obtained. The target query channel for the query statement is determined based on the query plan. If the target query channel is a first query channel with a Bloom filter table, a Bloom filter is constructed based on the database table and the Bloom filter table. A hash join corresponding to the query plan is executed based on the Bloom filter to obtain the query results corresponding to the query statement. By adaptively adding a Bloom filter and executing a hash join for data querying, irrelevant table rows can be filtered out during hash join execution without affecting the use of computing resources, reducing the amount of data for hash matching, improving the hardware acceleration effect of the database, and thus improving data query efficiency.
[0062] In a preferred embodiment, before data querying, it is first necessary to construct a hash table and a Bloom filter table for the database. In this embodiment, the database includes hash nodes and hash join nodes. The hash nodes are used to construct the hash table, and the hash join nodes are used to execute the hash join process corresponding to the external query plan based on the hash table returned by the hash nodes. Specifically, the construction phase mainly includes the following steps:
[0063] Step 001: Obtain the database internal table of the database, call the preset hash function in the hash node, perform hash operation on the join key of any target row of the database internal table, and obtain the target hash value corresponding to the join key of the target row;
[0064] Step 002: The target hash value is segmented to obtain multiple hash values corresponding to the target hash value;
[0065] Step 003: Perform hash mapping on the multiple hash values to construct the hash table corresponding to the internal table of the database;
[0066] Step 004: Map the multiple hash values to different arrays of preset Bloom filters to construct the Bloom filter table corresponding to the database table.
[0067] Step 005: Save the hash table and the Bloom filter table to the hash join point; generate the second query channel based on the hash table; generate the first query channel based on the Bloom filter table; the hash join point is used to execute the hash join process corresponding to the query plan.
[0068] During the construction phase, it is necessary to build a hash table and a Bloom filter table for the database, and then generate query channels based on the hash table and the Bloom filter table respectively, forming the first query channel and the second query channel. For example... Figure 2 The diagram illustrates the database hash join processing flow. In the database, the first query channel has a Bloom filter table and a hash table. When querying based on this channel, a Bloom filter needs to be added when performing a hash join. This allows the Bloom filter table to filter out irrelevant rows in the hash table, reducing the amount of data that needs to be hash-matched during the hash join. The second query channel has a hash table and can be used to directly perform hash joins. Based on the query plan corresponding to the query statement, the corresponding query channel can be determined, and the hash join can be processed using the query method corresponding to the query channel to complete the data query.
[0069] Preferably, when constructing the hash table and Bloom filter table, the following steps are taken: First, the internal tables of the database are retrieved. A preset hash function in the hash node is called to perform a hash operation on the join key of any target row in the internal tables, obtaining the target hash value corresponding to the join key. This target hash value is then segmented to obtain multiple hash values corresponding to the target hash value. All or part of the multiple hash values corresponding to the same join key are hash-mapped to construct the hash table. The multiple hash values corresponding to the same join key are then hash-mapped to different arrays of a preset Bloom filter to construct the Bloom filter table. The same hash operation and hash mapping operation are performed on each row of all internal tables to construct the hash table and Bloom filter table corresponding to the internal tables. The constructed hash table and Bloom filter table are then saved to the hash join point. A second query channel is generated based on the hash table, and a first query channel is generated based on the Bloom filter table.
[0070] Furthermore, considering that the database also needs to perform hash operations when constructing the hash table, in this embodiment, the database's hash operation method can be used directly when constructing the Bloom filter, without adding additional hash operations, thereby reducing the time and computing resource consumption caused by additional operations.
[0071] Preferably, in step 300, when constructing a Bloom filter based on the database appearance, the following steps are included:
[0072] Step 301: Obtain the target join key of any row in the database table, and perform a hash operation on the target join key to obtain a hash value of a preset number of bits;
[0073] Step 302: The hash value is segmented to obtain multiple hash values corresponding to the hash value;
[0074] Step 303: Construct a Bloom filter based on the database table and the Bloom filter table.
[0075] Specifically, the target join key of any row in the database table is obtained, and a hash operation is performed on the target join key to obtain a hash value of a preset number of bits. The obtained hash value is then segmented to obtain multiple hash values corresponding to the hash value. A Bloom filter is constructed based on these multiple hash values. It should be noted that the hash function used when performing the hash operation during the purchase of the Bloom filter is the same as the hash function used in the construction phase. The hash value returned by this hash function can be 64 bits or 32 bits; no specific limitation is made here. For ease of description, the following explanation uses a hash function that returns a 64-bit hash value as an example.
[0076] The hash value obtained by segmenting the hash value includes at least a first hash value, a second hash value, and a third hash value, wherein the first hash value, the second hash value, and the third hash value have the same number of bits. Step 302, segmenting the hash value to obtain multiple hash values corresponding to that hash value, specifically includes:
[0077] Step 3021: Divide the hash value into segments based on the number of bits in the hash value, and divide the hash value into high-order bits and low-order bits;
[0078] Step 3022: Determine the first hash value based on the high-order bit value, and determine the second hash value based on the low-order bit value;
[0079] Step 3023: Perform a combination operation on the high-order value and the low-order value according to a preset rule to obtain a third hash value; the third hash value, the first hash value, and the second hash value all have the same number of bits.
[0080] Specifically, based on the 64-bit hash value returned by the hash function, it is segmented into high-order bits and low-order bits. The first hash value and the second hash value are determined based on the high-order bits and low-order bits obtained from the segmentation, respectively. The segmentation method includes equal partitioning, that is, dividing the 64-bit hash value into high-order 32 bits and low-order 32 bits. The high-order 32 bits are used as the first hash value and the low-order 32 bits are used as the second hash value. Then, the high-order 32 bits and the low-order 32 bits are combined according to a preset rule to obtain a new 32-bit value as the third hash value.
[0081] In this embodiment, when constructing the Bloom filter and Bloom filter table, the hash value calculated using the database's original hash function is used, eliminating the need for additional hash operations, thus avoiding extra time and computational resource consumption and ensuring query efficiency.
[0082] In a preferred embodiment, when the query channel corresponding to the query statement is a first query channel equipped with a Bloom filter table, a hash join is performed based on the constructed Bloom filter and the Bloom filter table to obtain the target data queried by the query statement; when the query channel corresponding to the query statement is a second query channel equipped with a hash table, a hash join is performed based on the constructed hash table to obtain the target data queried by the query statement. Step 400, executing the query plan corresponding to the query statement based on the constructed Bloom filter, specifically includes:
[0083] Step 401: Map any target hash value of the Bloom filter to the target array of the Bloom filter table, and obtain the mapping feature value of the mapping bit corresponding to the target hash value in the target array;
[0084] Wherein, the target hash value is any hash value among the multiple hash values of the Bloom filter; the target array is the array corresponding to the target hash value; the mapping feature value is used to determine whether the mapping bit corresponding to the target hash value in the target array is set;
[0085] Step 402: Filter the database internal tables of the database according to the mapping feature value, and execute the hash join corresponding to the query plan based on the filtered database internal tables.
[0086] Specifically, refer to Figure 3The diagram illustrates another database hash join processing flow. When the query channel corresponding to the query statement is the first query channel with a Bloom filter, based on the constructed Bloom filter, any target hash value in the Bloom filter is hashed and mapped to the target array in the Bloom filter table. The mapping feature value of the mapping bit corresponding to the target hash value in the target array is then obtained. Here, the target hash value is any one of the multiple hash values in the Bloom filter, and the target array is the array corresponding to the target hash value in the Bloom filter table. The mapping feature value is used to determine whether the mapping bit corresponding to the target hash value in the target array is set. Taking a mapping feature value of 0 or 1 as an example, after hash value mapping, if the mapping feature value of the mapping bit corresponding to the array is 1, then the mapping bit corresponding to the hash value in that array is set; otherwise, the mapping bit corresponding to the hash value in that array is not set.
[0087] The above method is used to hash and map multiple hash values in the Bloom filter to different arrays in the Bloom filter table. Specifically, the first hash value is hashed and mapped to array 1, the second hash value is hashed and mapped to array 2, and the third hash value is hashed and mapped to array 3. Taking the Bloom filter constructed with the first, second, and third hash values as an example, the internal tables of the database are filtered according to the mapping feature values corresponding to the three hash values of the Bloom filter. That is, the hash table is filtered to filter out the data in the hash table that does not need to be hashed and matched. Then, the hash join corresponding to the query plan is executed based on the filtered internal table to obtain the query results corresponding to the query statement. Specifically, during the probing phase of data querying, a hash value is generated based on the join key of any row in the outer table of the database. The hash value is then segmented to obtain three hash values. These three hash values are then hash-mapped to obtain the mapping feature values of the corresponding arrays in the Bloom filter table. This determines whether the mapping bits corresponding to the three hash values in the three arrays are set, thereby determining whether the join key of the outer table row exists in the inner table. In this way, irrelevant outer table rows can be filtered out, reducing the amount of data that needs to be hash-matched between the outer and inner tables during data querying, thus improving data query efficiency.
[0088] Furthermore, in this embodiment, the probing phase specifically involves probing whether the hash value of the join key for each row in the external database table is set in the Bloom filter table. Only when all hash values corresponding to the join key of any row in the external database table are set is the join key of the external row considered to exist in the internal table. Otherwise, if at least one hash value is not set, the join key of the corresponding external database row is considered not to exist in the internal table. During data querying, hash matching between the external row and the internal table can be avoided, reducing the amount of data involved in hash matching.
[0089] As we know, a Bloom filter can be used to check if an element exists in a set. Typically, it maps an element to a bit in a bit array (i.e., a hash function), and the probing phase determines the element's existence by checking if that bit is set. Suppose that all elements x (x∈S) in set S are mapped to the points mapped to elements y, but... During the detection phase, if the mapping point corresponding to element y is set, it is assumed that the element in set S also exists, thus causing a "false positive" event. The higher the probability of a false positive event, the worse the filtering effect. In this embodiment, multiple hash values are used to solve this problem. Specifically, multiple different hash values corresponding to the same element are mapped to mapping points of multiple bit arrays, or mapped to different mapping points of the same bit array. In this way, as long as there is a mapping point that is not set, it proves that the element is not in the set, thereby maintaining a low false positive probability and improving the table filtering effect.
[0090] Furthermore, the Bloom filter table employs a multi-array structure. During the probing phase, the hardware kernel can access different array structures in parallel through different ports, improving throughput. Moreover, the probing phase executes at the hash join node, requiring all rows of the outer table, the hash table returned during the construction phase, and the Bloom filter table as parameters. Similar to the construction phase, during hash matching, only one hash operation is performed with the join key of each row of the outer table as the parameter. Then, the join keys of each outer table row existing in the inner table are probed. Irrelevant outer table rows can be filtered during hash matching. This entire process not only saves computational resources but is also suitable for pipelined computation.
[0091] Preferably, when executing a query in the database, based on the metadata of the database's internal and external interfaces, the query plan, etc., a certain rule is used to determine whether a Bloom filter needs to be added to the hash join, thereby determining the corresponding query channel. Step 200, determining the target query channel corresponding to the query statement based on the query plan, includes:
[0092] Step 201: Obtain the number of external tuples in the database external table and the number of internal tuples in the database internal table;
[0093] Step 202: Calculate the estimated cost required to construct the Bloom filter and the estimated revenue based on the number of inner table tuples and the number of outer table tuples.
[0094] Step 203: Determine the target query channel corresponding to the query statement based on the estimated revenue and the estimated cost.
[0095] Specifically, first, the number of internal and external tuples in the database is obtained, i.e., the number of internal table tuples *n* and the number of external table tuples *m*. Based on the number of internal table tuples *n* and the number of external table tuples *m*, the estimated cost required to construct the Bloom filter is calculated: p = cost(n), and the estimated benefit of the hash joins based on the constructed Bloom filter is calculated: g = cost'(k*m), where K is the estimated filtering ratio, i.e., the proportion of irrelevant external table rows filtered out in the database external table. The estimated cost includes the required time and computing resources, and the estimated benefit includes the saved time and computing resources.
[0096] It's understandable that building a Bloom filter requires time and computational resources, i.e., the estimated cost. However, by filtering outline data, it reduces the amount of data that needs to be matched during hash joins, thus reducing data query time and saving computational resources, resulting in a corresponding benefit, i.e., the estimated benefit. When the estimated benefit is less than the estimated cost, building a Bloom filter has no significant positive effect on improving data query efficiency. Therefore, it's necessary to determine whether to add a Bloom filter to the query statement based on the estimated benefit and estimated cost. Based on whether a Bloom filter is needed, the target query channel for the query statement should be determined, and a query channel with higher query efficiency can be selected.
[0097] Furthermore, step 203, determining the target query channel corresponding to the query statement based on the estimated revenue and estimated cost, also includes:
[0098] Step 2031: Calculate the ratio of the estimated cost to the estimated revenue;
[0099] Step 2032: Determine the target query channel corresponding to the query statement based on the ratio.
[0100] When determining the target query channel for a query statement based on estimated revenue and estimated cost, the ratio between estimated cost and estimated revenue is calculated, and this ratio determines the target query channel. The ratio of estimated cost to estimated revenue characterizes the rate of return of building a Bloom filter. When this rate of return is greater than a set baseline, a Bloom filter process is added to execute the hash join, thus making the first query channel with the Bloom filter table the target query channel for the query statement. When both the internal and external tables of the database are very large, it can affect the filtering ratio of the Bloom filter, resulting in a lower estimated revenue. In this case, the second query channel with the hash table is chosen as the target query channel for the query statement, and a hash join is selected to execute the query plan corresponding to the query statement.
[0101] By calculating the estimated cost and estimated benefit required to construct a Bloom filter, the query channel with the highest query efficiency can be adaptively selected for data querying based on the calculated estimated cost and estimated benefit, which helps to ensure data query efficiency.
[0102] The database hash connection processing apparatus provided by the present invention is described below. The database hash connection processing apparatus described below can be referred to in correspondence with the database hash connection processing method described above.
[0103] Reference Figure 4 The database hash connection processing apparatus provided in this embodiment of the invention includes:
[0104] The data acquisition module 10 is used to acquire the query statement of the database, as well as the query plan and database table corresponding to the query statement;
[0105] Channel selection module 20 is used to determine the target query channel corresponding to the query statement according to the query plan;
[0106] The filter construction module 30 is used to construct a Bloom filter based on the database external table and a preset Bloom filter table if the target query channel is the first query channel.
[0107] The hash join processing module 40 is used to perform hash joins corresponding to the query plan based on the Bloom filter.
[0108] In one embodiment, the filter construction module 30 is further configured to:
[0109] Obtain the target join key of any row in the database table, and perform a hash operation on the target join key to obtain a hash value of a preset number of bits;
[0110] The hash value is segmented to obtain multiple hash values corresponding to the hash value;
[0111] A Bloom filter is constructed based on the database appearance and the Bloom filter table.
[0112] In one embodiment, the plurality of hash values includes at least a first hash value, a second hash value, and a third hash value; the filter construction module 30 is further configured to:
[0113] The hash value is segmented according to the number of bits in the hash value, dividing the hash value into high-order bits and low-order bits;
[0114] The first hash value is determined based on the high-order bit value, and the second hash value is determined based on the low-order bit value;
[0115] The high-order and low-order values are combined according to a preset rule to obtain a third hash value; the third hash value, the first hash value, and the second hash value all have the same number of bits.
[0116] In one embodiment, the hash connection processing module 40 is further configured to:
[0117] Map any target hash value of the Bloom filter to the target array of the Bloom filter table, and obtain the mapping feature value of the mapping bit corresponding to the target hash value in the target array;
[0118] Wherein, the target hash value is any hash value among the multiple hash values of the Bloom filter; the target array is the array corresponding to the target hash value; the mapping feature value is used to determine whether the mapping bit corresponding to the target hash value in the target array is set;
[0119] The database internal tables of the database are filtered according to the mapping feature values, and the hash join corresponding to the query plan is executed based on the filtered database internal tables.
[0120] In one embodiment, the channel selection module 20 is further configured to:
[0121] Obtain the number of outer tuples of the database outer table and the number of inner table tuples of the database inner table;
[0122] Based on the number of inner table tuples and the number of outer table tuples, calculate the estimated cost required to construct the Bloom filter, and the estimated revenue based on the Bloom filter;
[0123] Based on the estimated revenue and the estimated cost, the target query channel corresponding to the query statement is determined.
[0124] In one embodiment, the channel selection module 20 is further configured to:
[0125] Calculate the ratio of the estimated cost to the estimated revenue;
[0126] The target query channel corresponding to the query statement is determined based on the ratio.
[0127] In one embodiment, the database hash connection processing apparatus further includes a channel building module for:
[0128] Obtain the database internal table of the database, call the preset hash function in the hash node, perform hash operation on the join key of any target row of the database internal table, and obtain the target hash value corresponding to the join key of the target row;
[0129] The target hash value is segmented to obtain multiple hash values corresponding to the target hash value;
[0130] Hash mapping is performed on the multiple hash values to construct the hash table corresponding to the table in the database;
[0131] The multiple hash values are mapped to different arrays of preset Bloom filters to construct the Bloom filter table corresponding to the database table.
[0132] The hash table and the Bloom filter table are saved to the hash join point. The second query channel is generated based on the hash table, and the first query channel is generated based on the Bloom filter table. The hash join point is used to execute the hash join process corresponding to the query plan.
[0133] Figure 5 An example is a schematic diagram of the physical structure of an electronic device, such as... Figure 5 As shown, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 can call logical instructions in the memory 530 to execute a database hash join processing method, which includes:
[0134] Obtain the query statement from the database, as well as the query plan and database table corresponding to the query statement;
[0135] The target query channel corresponding to the query statement is determined based on the query plan;
[0136] If the target query channel is the first query channel, then a Bloom filter is constructed based on the database external table and the Bloom filter table;
[0137] The hash join corresponding to the query plan is executed based on the Bloom filter.
[0138] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0139] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the database hash connection processing method provided by the above methods, the method comprising:
[0140] Obtain the query statement from the database, as well as the query plan and database table corresponding to the query statement;
[0141] The target query channel corresponding to the query statement is determined based on the query plan;
[0142] If the target query channel is the first query channel, then a Bloom filter is constructed based on the database external table and the Bloom filter table;
[0143] The hash join corresponding to the query plan is executed based on the Bloom filter.
[0144] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, is implemented to perform the database hash connection processing method provided by the methods described above, the method comprising:
[0145] Obtain the query statement from the database, as well as the query plan and database table corresponding to the query statement;
[0146] The target query channel corresponding to the query statement is determined based on the query plan;
[0147] If the target query channel is the first query channel, then a Bloom filter is constructed based on the database external table and the Bloom filter table;
[0148] The hash join corresponding to the query plan is executed based on the Bloom filter.
[0149] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0150] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0151] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A database hash join processing method, characterized in that, The database has a first query channel and a second query channel. The first query channel has a Bloom filter table and a hash table, and the second query channel has the hash table. The method includes: Obtain the query statement from the database, as well as the query plan and database table corresponding to the query statement; The target query channel corresponding to the query statement is determined based on the query plan; If the target query channel is the first query channel, then a Bloom filter is constructed based on the database table and the Bloom filter table, including: obtaining the target join key of any row in the database table, and performing a hash operation on the target join key to obtain a hash value of a preset number of bits; segmenting the hash value to obtain multiple hash values corresponding to the hash value; and constructing a Bloom filter based on the multiple hash values and the Bloom filter table. The hash join corresponding to the query plan is executed based on the Bloom filter.
2. The database hash join processing method according to claim 1, characterized in that, The plurality of hash values includes at least a first hash value, a second hash value, and a third hash value; The step of segmenting the hash value to obtain multiple hash values corresponding to the hash value includes: The hash value is segmented according to the number of bits in the hash value, dividing the hash value into high-order bits and low-order bits; The first hash value is determined based on the high-order bit value, and the second hash value is determined based on the low-order bit value; The high-order and low-order values are combined according to a preset rule to obtain a third hash value; the third hash value, the first hash value, and the second hash value all have the same number of bits.
3. The database hash join processing method according to claim 1, characterized in that, The step of performing the hash join corresponding to the query plan based on the Bloom filter includes: Map any target hash value of the Bloom filter to the target array of the Bloom filter table, and obtain the mapping feature value of the mapping bit corresponding to the target hash value in the target array; Wherein, the target hash value is any hash value among the multiple hash values of the Bloom filter; the target array is the array corresponding to the target hash value; the mapping feature value is used to determine whether the mapping bit corresponding to the target hash value in the target array is set; The database internal tables of the database are filtered according to the mapping feature values, and the hash join corresponding to the query plan is executed based on the filtered database internal tables.
4. The database hash join processing method according to claim 1, characterized in that, The step of determining the target query channel corresponding to the query statement based on the query plan includes: Obtain the number of outer tuples of the database outer table and the number of inner table tuples of the database inner table; Based on the number of inner table tuples and the number of outer table tuples, calculate the estimated cost required to construct the Bloom filter, and the estimated revenue based on the Bloom filter; Based on the estimated revenue and the estimated cost, the target query channel corresponding to the query statement is determined.
5. The database hash join processing method according to claim 4, characterized in that, The step of determining the target query channel corresponding to the query statement based on the estimated revenue and the estimated cost includes: Calculate the ratio of the estimated cost to the estimated revenue; The target query channel corresponding to the query statement is determined based on the ratio.
6. The database hash join processing method according to claim 1, characterized in that, The database includes hash nodes and hash connection nodes, and the method further includes: Obtain the database internal table of the database, call the preset hash function in the hash node, perform hash operation on the join key of any target row of the database internal table, and obtain the target hash value corresponding to the join key of the target row; The target hash value is segmented to obtain multiple hash values corresponding to the target hash value; Hash mapping is performed on the multiple hash values to construct the hash table, and the hash table corresponds to the table in the database; The multiple hash values are mapped to different arrays of preset Bloom filters to construct the Bloom filter table, which corresponds to the internal table of the database. The hash table and the Bloom filter table are saved to the hash join point. The second query channel is generated based on the hash table, and the first query channel is generated based on the Bloom filter table. The hash join point is used to execute the hash join process corresponding to the query plan.
7. A database hash join processing device, characterized in that, The database has a first query channel and a second query channel. The first query channel has a Bloom filter table and a hash table, and the second query channel has the hash table. The device includes: The data acquisition module is used to acquire the query statement of the database, as well as the query plan and database table corresponding to the query statement; The channel selection module is used to determine the target query channel corresponding to the query statement based on the query plan; A filter construction module is used to construct a Bloom filter based on the database table and a preset Bloom filter table if the target query channel is the first query channel. The module includes: obtaining the target join key of any row in the database table, performing a hash operation on the target join key to obtain a hash value of a preset number of bits; segmenting the hash value to obtain multiple hash values corresponding to the hash value; and constructing a Bloom filter based on the multiple hash values and the Bloom filter table. The hash join processing module is used to execute the hash join corresponding to the query plan based on the Bloom filter.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the database hash connection processing method as described in any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the database hash connection processing method as described in any one of claims 1 to 6.