Hardware acceleration method, apparatus, device, and medium for hash joins of large data tables
By receiving a join operation type instruction, allocating memory space on the acceleration board, and using parallel processing to perform hash join calculations, the problem of insufficient cache is solved and more efficient hash join acceleration is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG INSPUR SCI RES INST CO LTD
- Filing Date
- 2023-03-27
- Publication Date
- 2026-06-23
Figure CN116361289B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of database heterogeneous acceleration, and particularly to hardware acceleration methods, apparatuses, devices, and media for hash joins of large data tables.
Background Technology
[0002] With the growing belief that Moore's Law will eventually fail, hardware acceleration has been introduced into the field of database acceleration. However, when using FPGAs (Field Programmable Gate Arrays) or GPUs (Graphics Processing Units) for hardware acceleration of database query join operators, various caching issues arise, such as the inability of the caching modules within the FPGA or GPU to load the entire contents of a large data table simultaneously. Existing hardware acceleration for hash joins focuses on the hash calculation portion, and its acceleration effect still needs improvement.
[0003] In summary, improving the hardware acceleration performance of hash joins is a problem that needs to be solved in this field.
Summary of the Invention
[0004] In view of this, the purpose of this invention is to provide a hardware acceleration method, apparatus, device, and medium for hash joins of large data tables, which can effectively improve the hardware acceleration effect of hash joins. The specific solution is as follows:
[0005] In a first aspect, this application discloses a hardware acceleration method for hash joins of large data tables, applied to an acceleration board, comprising:
[0006] Receiving a join operation type instruction;
[0007] Requesting, by using a join table size instruction, a first memory space for the first data table to be joined and a second memory space for the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space;
[0008] Based on the join operation type instruction and a preset number of parallel processing rows, performing hash join calculations on each of the first fragments in the first memory space and each of the second fragments in the second memory space to obtain the corresponding join table row data.
[0009] Optionally, the step of performing hash join calculations on each first fragment in the first memory space and each second fragment in the second memory space according to the join operation type instruction and the preset number of parallel processing rows to obtain the corresponding join table row data includes:
[0010] If the join operation type instruction is an inner join calculation instruction, the current first fragment of the first data table to be joined is read from the first memory space, and the current hash table of the hash calculation build stage is generated for the current first fragment;
[0011] Based on the preset number of parallel processing rows, the row data of the current first fragment is obtained from the current hash table, and the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition are used to generate the inner join table row data corresponding to the current first fragment;
[0012] The next first fragment of the first data table to be joined is read from the first memory space and taken as the current first fragment, and the process returns to the step of generating the current hash table of the hash calculation build stage for the current first fragment, until the inner join table row data corresponding to each first fragment is obtained.
[0013] Optionally, generating the inner join table row data corresponding to the current first fragment using the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition includes:
[0014] Reading the current second fragment of the second data table to be joined from the second memory space, and generating the current hash value of the hash calculation probe stage for the current second fragment;
[0015] Based on the preset number of parallel processing rows, obtaining the row data of the current second fragment from the current hash value, and determining whether the row data of the current second fragment meets the first preset condition; if it does, using the row data of the current first fragment and the row data of the current second fragment to generate the inner join table row data corresponding to the current second fragment;
[0016] Reading the next second fragment of the second data table to be joined from the second memory space, taking it as the current second fragment, and returning to the step of generating the current hash value of the hash calculation probe stage for the current second fragment, until the inner join table row data corresponding to each second fragment is obtained, thereby obtaining the inner join table row data corresponding to the current first fragment.
[0017] Optionally, the step of performing hash join calculations on each first fragment in the first memory space and each second fragment in the second memory space according to the join operation type instruction and the preset number of parallel processing rows to obtain the corresponding join table row data includes:
[0018] If the join operation type instruction is a right join calculation instruction, then the second spare memory corresponding to the second memory space is requested so that the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the second spare memory can be right joined based on the preset number of parallel processing rows to obtain the corresponding right join table row data.
[0019] Accordingly, the step of performing a right join calculation on the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the second spare memory based on a preset number of parallel processing rows to obtain the corresponding right join table row data includes:
[0020] Reading the current first fragment of the first data table to be joined from the first memory space, and generating the current hash table of the hash calculation build stage for the current first fragment;
[0021] Based on the preset number of parallel processing rows, obtaining the row data of the current first fragment from the current hash table, and using the row data of the current first fragment, the row data of each second fragment in the second memory space that meets the second preset condition, and the row data of each second fragment in the second spare memory that does not meet the second preset condition to generate the right join table row data corresponding to the current first fragment;
[0022] Reading the next first fragment of the first data table to be joined from the first memory space, taking it as the current first fragment, and returning to the step of generating the current hash table of the hash calculation build stage for the current first fragment, until the right join table row data corresponding to each first fragment is obtained.
[0023] Optionally, the step of performing hash join calculations on each first fragment in the first memory space and each second fragment in the second memory space according to the join operation type instruction and the preset number of parallel processing rows to obtain the corresponding join table row data includes:
[0024] If the join operation type instruction is a left join calculation instruction, then the first spare memory corresponding to the first memory space is requested so as to perform left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space and the data stored in the first spare memory based on the preset number of parallel processing rows, so as to obtain the left join table row data.
[0025] Accordingly, the step of performing a left join calculation on the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the first spare memory based on a preset number of parallel processing rows to obtain the left join table row data includes:
[0026] Based on the format of the left join data rows, the row data of the first fragments in the first memory space and the row data of the second fragments in the second memory space are generated respectively;
[0027] Based on the preset number of parallel processing rows, hash join calculations are performed using the row data of the first fragments in the first memory space that meets the third preset condition, the row data of the second fragments in the second memory space, and the row data of the first fragments in the first spare memory that does not meet the third preset condition, to obtain the left join table row data.
[0028] Optionally, the step of performing hash join calculations on each first fragment in the first memory space and each second fragment in the second memory space according to the join operation type instruction and the preset number of parallel processing rows to obtain the corresponding join table row data includes:
[0029] If the join operation type instruction is a full join calculation instruction, the first spare memory corresponding to the first memory space and the second spare memory corresponding to the second memory space are requested;
[0030] Based on a preset number of parallel processing rows, the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the second spare memory are right-joined to obtain the corresponding right-join table row data.
[0031] Based on the preset number of parallel processing rows, left join calculations are performed on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory to obtain the corresponding left join table row data. Rows in the obtained left join table row data that duplicate the right join table row data are then discarded to obtain the corresponding full join table row data.
[0032] Optionally, storing each first fragment of the first data table to be joined into the first memory space, and storing each second fragment of the second data table to be joined into the second memory space, includes:
[0033] The first data table to be joined and the second data table to be joined are sliced respectively to obtain each first fragment of the first data table to be joined and each second fragment of the second data table to be joined;
[0034] Each of the first fragments of the first data table to be joined is stored in the first memory space, and each of the second fragments of the second data table to be joined is stored in the second memory space.
[0035] In a second aspect, this application discloses a hardware acceleration apparatus for hash joins of large data tables, applied to an acceleration board, comprising:
[0036] An instruction receiving module, used to receive a join operation type instruction;
[0037] A fragment storage module, used to request, by using a join table size instruction, a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space;
[0038] A hash join module, used to perform hash join calculations on each of the first fragments in the first memory space and each of the second fragments in the second memory space according to the join operation type instruction and the preset number of parallel processing rows, so as to obtain the corresponding join table row data.
[0039] In a third aspect, this application discloses an electronic device, including:
[0040] A memory, used to store a computer program;
[0041] A processor, used to execute the computer program to implement the steps of the hardware acceleration method for hash joins of large data tables disclosed above.
[0042] In a fourth aspect, this application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the hardware acceleration method for hash joins of large data tables disclosed above.
[0043] The beneficial effects of this application are as follows. First, a join operation type instruction is received. A join table size instruction is then used to request a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so that each first fragment of the first data table to be joined is stored in the first memory space and each second fragment of the second data table to be joined is stored in the second memory space. Finally, according to the join operation type instruction and a preset number of parallel processing rows, hash join calculations are performed on each first fragment in the first memory space and each second fragment in the second memory space to obtain the corresponding join table row data. This application is thus applied to an acceleration board: once the fragments have been stored, the first fragments in the first memory space and the second fragments in the second memory space can be hash joined according to the join operation type instruction. Because the acceleration board can process hash joins of multiple rows of data in parallel, multiple rows are processed simultaneously when the first fragments and the second fragments are hash joined, which speeds up the hash join calculation and effectively improves the hardware acceleration effect of hash joins.
Attached Figure Description
[0044] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0045] Figure 1 is a flowchart of a hardware acceleration method for hash joins of large data tables disclosed in this application;
[0046] Figure 2 is a schematic diagram of a specific hardware acceleration structure disclosed in this application;
[0047] Figure 3 is a flowchart of a specific inner join calculation disclosed in this application;
[0048] Figure 4 is a schematic diagram of a specific inner join calculation disclosed in this application;
[0049] Figure 5 is a flowchart of a specific right join calculation disclosed in this application;
[0050] Figure 6 is a schematic diagram of a specific right join calculation disclosed in this application;
[0051] Figure 7 is a flowchart of a specific left join calculation disclosed in this application;
[0052] Figure 8 is a schematic diagram of a specific left join calculation disclosed in this application;
[0053] Figure 9 is a flowchart of a specific full join calculation disclosed in this application;
[0054] Figure 10 is a schematic diagram of a hardware acceleration apparatus for hash joins of large data tables disclosed in this application;
[0055] Figure 11 is a structural diagram of an electronic device disclosed in this application.
Detailed Implementation
[0056] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0057] With the growing belief that Moore's Law will eventually fail, hardware acceleration has been introduced into the field of database acceleration. However, when using FPGAs or GPUs for hardware acceleration of database query join operators, various caching issues arise, such as the inability of the FPGA or GPU's caching module to load the entire contents of a large data table simultaneously. Existing hardware acceleration for hash joins focuses on the hash calculation portion, and its acceleration effect still needs improvement.
[0058] Therefore, this application provides a hardware acceleration scheme for hash joins of large data tables, which can effectively improve the hardware acceleration effect of hash joins.
[0059] As shown in Figure 1, this application discloses a hardware acceleration method for hash joins of large data tables, applied to an acceleration board, including:
[0060] Step S11: Receive a join operation type instruction.
[0061] In this embodiment, Figure 2 shows a specific hardware acceleration structure. The FPGA hardware acceleration unit receives the join operation type instruction through its PCIe (Peripheral Component Interconnect Express) interface. The instruction specifies the type of hash join to be performed between the first data table to be joined and the second data table to be joined. Hash join types include, for example, inner join, full join, left join, and right join. The acceleration board is, for example, an FPGA (Field Programmable Gate Array).
[0062] Step S12: Use the join table size instruction to request the first memory space of the first data table to be joined and the second memory space of the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space.
[0063] In this embodiment, storing each first fragment of the first data table to be joined into the first memory space and storing each second fragment of the second data table to be joined into the second memory space includes: slicing the first data table to be joined and the second data table to be joined respectively to obtain each first fragment of the first data table to be joined and each second fragment of the second data table to be joined; and storing each first fragment in the first memory space and each second fragment in the second memory space. It can be understood that, before the tables are stored, it is necessary to determine whether the first data table to be joined and the second data table to be joined are larger than the cache of the acceleration board. If they are, the tables must be sliced so that no resulting first fragment or second fragment exceeds the cache size. Each first fragment is then stored in the first memory space and each second fragment in the second memory space.
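The slicing step described above can be illustrated with a minimal software sketch. It is not the patented hardware logic: the function name `slice_table`, the tuple row layout, and the `cache_rows` parameter (cache capacity expressed as a row count) are assumptions made only for illustration, since the text does not state how the cache size is measured.

```python
from typing import List, Sequence, Tuple

Row = Tuple  # a table row; the first field is assumed to be the join key


def slice_table(table: Sequence[Row], cache_rows: int) -> List[List[Row]]:
    """Split a table into fragments of at most `cache_rows` rows each.

    `cache_rows` is a hypothetical stand-in for the accelerator cache size.
    A table that already fits in the cache is returned as a single fragment.
    """
    if cache_rows <= 0:
        raise ValueError("cache_rows must be positive")
    if len(table) <= cache_rows:
        return [list(table)]
    return [list(table[i:i + cache_rows])
            for i in range(0, len(table), cache_rows)]


# Example: a 7-row table with a cache that holds 3 rows.
table_a = [(k, f"a{k}") for k in range(7)]
fragments_a = slice_table(table_a, cache_rows=3)
print([len(f) for f in fragments_a])  # fragment sizes
```

Concatenating the fragments in order reproduces the original table, so no row is lost or duplicated by slicing.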
[0064] Step S13: According to the join operation type instruction and the preset number of parallel processing rows, perform hash join calculations on each of the first fragments in the first memory space and each of the second fragments in the second memory space to obtain the corresponding join table row data.
[0065] It should be noted that the spare memory to be requested depends on the type of join operation. For an inner join instruction, no spare memory is needed. For a right join instruction, a second spare memory of the same size as the second memory space is needed. For a left join instruction, a first spare memory of the same size as the first memory space is needed. For a full join instruction, both a first spare memory of the same size as the first memory space and a second spare memory of the same size as the second memory space are needed. The subsequent hash join calculations are then performed to obtain the corresponding join table row data.
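The dependence of the memory request on the join type can be summarized in a short sketch. The names `JoinType` and `plan_memory` and the dictionary keys are hypothetical; only the sizing rule (each spare memory equals the size of the memory space it backs) comes from the description.

```python
from enum import Enum
from typing import Dict


class JoinType(Enum):
    INNER = "inner"
    LEFT = "left"
    RIGHT = "right"
    FULL = "full"


def plan_memory(join_type: JoinType, size_first: int, size_second: int) -> Dict[str, int]:
    """Return the buffer sizes to request for a given join type.

    Inner join: no spare memory. Left join: first spare memory.
    Right join: second spare memory. Full join: both.
    """
    plan = {"first": size_first, "second": size_second,
            "first_spare": 0, "second_spare": 0}
    if join_type in (JoinType.LEFT, JoinType.FULL):
        plan["first_spare"] = size_first
    if join_type in (JoinType.RIGHT, JoinType.FULL):
        plan["second_spare"] = size_second
    return plan


for jt in JoinType:
    print(jt.value, plan_memory(jt, size_first=1024, size_second=4096))
```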
[0067] Figure 3 is a flowchart of a specific inner join calculation. As shown in Figure 3, this embodiment of the application discloses a hardware acceleration method for hash joins of large data tables, applied to an acceleration board, including:
[0068] Step S21: Receive a join operation type instruction.
[0069] Step S22: Use the join table size instruction to request the first memory space of the first data table to be joined and the second memory space of the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space.
[0070] For example, the two data tables to be joined are data table A and data table B. A first memory space is allocated for data table A and a second memory space for data table B. Data table A and data table B are then sliced to obtain each first fragment of data table A and each second fragment of data table B, and each first fragment of data table A is written to the first memory space and each second fragment of data table B to the second memory space.
[0071] Step S23: If the join operation type instruction is an inner join calculation instruction, read the current first fragment of the first data table to be joined from the first memory space, and generate the current hash table of the hash calculation build stage for the current first fragment.
[0072] In this embodiment, DDR (Double Data Rate) controller 1 reads the data of the current first fragment 1-1 cached in the first memory space and sends it to the hash calculation module, which generates and caches the hash table of the hash calculation build stage.
[0073] Step S24: Based on the preset number of parallel processing rows, obtain the row data of the current first fragment from the current hash table, and use the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition to generate the inner join table row data corresponding to the current first fragment.
[0074] In this embodiment, generating the inner join table row data corresponding to the current first fragment using the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition includes: reading the current second fragment of the second data table to be joined from the second memory space and generating the current hash value of the hash calculation probe stage for the current second fragment; obtaining the row data of the current second fragment from the current hash value based on the preset number of parallel processing rows, and determining whether the row data of the current second fragment meets the first preset condition; if it does, generating the inner join table row data corresponding to the current second fragment using the row data of the current first fragment and the row data of the current second fragment; and reading the next second fragment of the second data table to be joined from the second memory space, taking it as the current second fragment, and returning to the step of generating the current hash value of the hash calculation probe stage for the current second fragment, until the inner join table row data corresponding to each second fragment is obtained, thereby obtaining the inner join table row data corresponding to the current first fragment. The first preset condition is met when the current second fragment and the current first fragment can generate inner join table row data. If they cannot, the first preset condition is not met, and the data of the current second fragment 2-1 that cannot generate inner join table row data is discarded. If the first preset condition is met, the generated inner join table row data is cached in the target join cache module, and the data in the target join cache module is sent to the read data DDR module through the bus module, completing the hash join calculation for the data of the current second fragment 2-1.
[0075] It should be noted that the current first fragment and the current second fragment can be joined in parallel to increase the join speed. For example, if the preset number of parallel processing rows is set to 5, then 5 rows of data can be joined simultaneously.
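The effect of the preset number of parallel processing rows can be emulated in software by probing in batches. The sketch below is only an emulation under stated assumptions: `row_batches` and `probe_in_batches` are hypothetical names, the join condition is assumed to be equality on the first field of each row, and the rows of a batch are handled by an ordinary loop rather than simultaneously as on the acceleration board.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Row = Tuple  # the first field is assumed to be the join key


def row_batches(rows: Sequence[Row], parallel_rows: int) -> Iterator[List[Row]]:
    """Yield consecutive groups of at most `parallel_rows` rows."""
    if parallel_rows <= 0:
        raise ValueError("parallel_rows must be positive")
    for start in range(0, len(rows), parallel_rows):
        yield list(rows[start:start + parallel_rows])


def probe_in_batches(hash_table: Dict[object, List[Row]],
                     fragment: Sequence[Row],
                     parallel_rows: int = 5) -> Tuple[List[Row], int]:
    """Probe `fragment` against `hash_table`, one batch per emulated step.

    Returns the joined rows and the number of steps taken. On hardware the
    rows of one batch would be processed at the same time.
    """
    joined: List[Row] = []
    steps = 0
    for batch in row_batches(fragment, parallel_rows):
        steps += 1
        for row in batch:
            for match in hash_table.get(row[0], []):
                joined.append(match + row[1:])
    return joined, steps


hash_table = {1: [(1, "a1")], 2: [(2, "a2")]}
fragment_b = [(k, f"b{k}") for k in range(12)]
joined_rows, steps = probe_in_batches(hash_table, fragment_b, parallel_rows=5)
print(steps, joined_rows)
```

With 12 probe rows and 5 rows per step, the probe finishes in 3 emulated steps instead of 12 row-at-a-time steps.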
[0076] Step S25: Read the next first fragment of the first data table to be joined from the first memory space, take it as the current first fragment, and return to the step of generating the current hash table of the hash calculation build stage for the current first fragment, until the inner join table row data corresponding to each first fragment is obtained.
[0077] It can be understood that Figure 4 is a schematic diagram of a specific inner join calculation. An inner join calculation must be performed between each first fragment and each second fragment to generate the corresponding inner join table row data, which is saved to the target join cache module. The upper-layer module is then notified, by an instruction sent through the data read/write interface, that the calculation is complete and of the storage address of the hash join result in the read data DDR module, and the inner join table row data waits to be read out.
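The fragment-by-fragment inner join of steps S23 to S25 can be sketched as a pair of nested loops. This is a software illustration only, assuming that the first preset condition is equality on the join key (taken here to be the first field of each row); the function names are hypothetical, and the result list stands in for the target join cache module.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # the first field is assumed to be the join key


def build_hash_table(fragment: List[Row]) -> Dict[object, List[Row]]:
    """Build stage: index one first-table fragment by join key."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in fragment:
        table[row[0]].append(row)
    return table


def inner_join(first_fragments: List[List[Row]],
               second_fragments: List[List[Row]]) -> List[Row]:
    """Join every first fragment with every second fragment."""
    result: List[Row] = []
    for first in first_fragments:            # outer loop: steps S23 and S25
        hash_table = build_hash_table(first)
        for second in second_fragments:      # inner loop: probe stage of S24
            for row in second:
                # Rows without a matching key are dropped (condition not met).
                for match in hash_table.get(row[0], []):
                    result.append(match + row[1:])
    return result


first_fragments = [[(1, "a1"), (2, "a2")], [(3, "a3")]]
second_fragments = [[(2, "b2"), (3, "b3")], [(4, "b4"), (1, "b1")]]
print(inner_join(first_fragments, second_fragments))
```

Only one hash table is alive at a time, which mirrors the point of fragmenting: the build side never has to fit in the cache as a whole.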
[0078] As can be seen, by controlling how the output data of the inner join is cached and how the fragmented data is calculated, this application can accelerate joins of data tables of any size, with a simple and efficient structure and high versatility.
[0079] Figure 5 is a flowchart of a specific right join calculation. As shown in Figure 5, this embodiment of the application discloses a hardware acceleration method for hash joins of large data tables, applied to an acceleration board, including:
[0080] Step S31: Receive a join operation type instruction.
[0081] Step S32: Use the join table size instruction to request the first memory space of the first data table to be joined and the second memory space of the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space.
[0082] Step S33: If the join operation type instruction is a right join calculation instruction, then allocate the second spare memory corresponding to the second memory space, so as to perform right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows, so as to obtain the corresponding right join table row data.
[0083] In this embodiment, performing the right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows to obtain the corresponding right join table row data includes: reading the current first fragment of the first data table to be joined from the first memory space and generating the current hash table of the hash calculation build stage for the current first fragment; obtaining the row data of the current first fragment from the current hash table based on the preset number of parallel processing rows, and generating the right join table row data corresponding to the current first fragment using the row data of the current first fragment, the row data of each second fragment in the second memory space that meets the second preset condition, and the row data of each second fragment in the second spare memory that does not meet the second preset condition; and reading the next first fragment of the first data table to be joined from the first memory space, taking it as the current first fragment, and returning to the step of generating the current hash table of the hash calculation build stage for the current first fragment, until the right join table row data corresponding to each first fragment is obtained. Here, meeting the second preset condition means that the current second fragment can be right joined. Figure 6 is a schematic diagram of a specific right join calculation: after the right join calculation has been performed between all the first fragments and all the second fragments, all the right join table row data is output.
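One way to read the role of the second spare memory is as a record of which second-table rows have not yet found a match. The sketch below follows that reading, which is an interpretation and not a statement of the patented design. It assumes the second preset condition is equality on the join key (first field), emulates the spare memory with per-row flags, and pads unmatched rows with `None`; `right_join` and `first_width` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # the first field is assumed to be the join key


def right_join(first_fragments: List[List[Row]],
               second_fragments: List[List[Row]],
               first_width: int) -> List[Row]:
    """Keep every second-table row; `first_width` is the column count of
    first-table rows and is used to pad rows that never matched."""
    result: List[Row] = []
    # Emulated second spare memory: one "matched" flag per second-table row.
    matched = [[False] * len(frag) for frag in second_fragments]
    for first in first_fragments:
        hash_table: Dict[object, List[Row]] = defaultdict(list)
        for row in first:
            hash_table[row[0]].append(row)
        for f_idx, second in enumerate(second_fragments):
            for r_idx, row in enumerate(second):
                for match in hash_table.get(row[0], []):
                    result.append(match + row[1:])
                    matched[f_idx][r_idx] = True
    # After all first fragments: emit second-table rows that never matched.
    for f_idx, second in enumerate(second_fragments):
        for r_idx, row in enumerate(second):
            if not matched[f_idx][r_idx]:
                result.append((row[0],) + (None,) * (first_width - 1) + row[1:])
    return result


rows = right_join([[(1, "a1")], [(2, "a2")]], [[(2, "b2"), (5, "b5")]], first_width=2)
print(rows)
```

Unmatched rows can only be emitted after the last first fragment has been probed, because a row that misses in one fragment may still match in a later one.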
[0084] As can be seen, because the acceleration board can process hash joins of multiple rows of data in parallel, multiple rows of data are processed simultaneously when hash join calculations are performed on the first fragments and the second fragments, which speeds up the hash join calculation and effectively improves the hardware acceleration effect of hash joins.
[0085] Figure 7 is a flowchart of a specific left join calculation. As shown in Figure 7, this embodiment of the application discloses a specific hardware acceleration method for hash joins of large data tables, applied to an acceleration board, including:
[0086] Step S41: Receive a join operation type instruction.
[0087] Step S42: Use the join table size instruction to request the first memory space of the first data table to be joined and the second memory space of the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space.
[0088] Step S43: If the join operation type instruction is a left join calculation instruction, then allocate the first spare memory corresponding to the first memory space, so as to perform left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory based on the preset number of parallel processing rows, so as to obtain the left join table row data.
[0089] In this embodiment, the step of performing a left join calculation on the data stored in the first memory space, the second memory space, and the first spare memory based on a preset number of parallel processing rows to obtain left join table row data includes: generating row data for the first memory space and row data for the second memory space based on the format of the left join data rows; and performing a hash join calculation, based on the preset number of parallel processing rows, on the row data of the first memory space that meets a third preset condition, the row data of the second memory space, and the row data of the first memory space that does not meet the third preset condition, to obtain left join table row data. Here, meeting the third preset condition means that a left join calculation can be performed. For example, Figure 8 illustrates a specific left join calculation: after the left join calculation has been performed between all the first fragments and all the second fragments, all left join table row data are output.
[0090] Therefore, it can be seen that when performing hash joins on the first and second data tables to be joined, this application utilizes the parallel processing capability of the acceleration board to process multiple rows of data simultaneously, thereby accelerating the join of data tables of any size and significantly improving the speed.
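A software sketch of the left join flow of steps S41 to S43 is given below. It rests on several assumptions: the "third preset condition" is taken to be equality of the join keys, the first spare memory is modelled as a plain list of first-table rows that found no partner, and the helper names (`fragments`, `left_join`, `frag_rows`, `parallel_rows`) are hypothetical rather than taken from the application.

```python
from collections import defaultdict


def fragments(rows, size):
    """Split a table (a list of row tuples) into fragments of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def left_join(table_a, table_b, key_a, key_b, frag_rows, parallel_rows):
    """Fragment-wise left join: every row of table_a appears in the output."""
    first_space = fragments(table_a, frag_rows)
    second_space = fragments(table_b, frag_rows)
    first_spare = []  # assumed role of the first spare memory: unmatched first-table rows
    out = []
    for frag_a in first_space:
        # Construction stage: map each join key to row positions in this fragment.
        hash_table = defaultdict(list)
        for idx, row_a in enumerate(frag_a):
            hash_table[row_a[key_a]].append(idx)
        hit = [False] * len(frag_a)
        # Probe stage: scan all second fragments in batches of `parallel_rows`.
        for frag_b in second_space:
            for start in range(0, len(frag_b), parallel_rows):
                for row_b in frag_b[start:start + parallel_rows]:
                    for idx in hash_table.get(row_b[key_b], ()):
                        out.append((frag_a[idx], row_b))
                        hit[idx] = True
        # Every second fragment has been probed, so unmatched rows are now final.
        first_spare.extend(row_a for row_a, h in zip(frag_a, hit) if not h)
    out.extend((row_a, None) for row_a in first_spare)
    return out
```

Unlike the right join case, the unmatched rows of a first fragment are known as soon as that fragment has been probed against all second fragments, so in this sketch they can be moved to the spare list fragment by fragment.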
[0091] Figure 9 shows a flowchart of a specific full join calculation. This application discloses a hardware acceleration method for hash joins of large data tables, applied to an acceleration board, comprising:
[0092] Step S51: Receive a join operation type instruction.
[0093] Step S52: Use the join table size instruction to request the first memory space of the first data table to be joined and the second memory space of the second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space.
[0094] Step S53: If the join operation type instruction is a full join calculation instruction, then request the first spare memory corresponding to the first memory space and the second spare memory corresponding to the second memory space.
[0095] Step S54: Based on the preset number of parallel processing rows, perform a right join calculation on the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the second spare memory to obtain the corresponding right join table row data.
[0096] In this embodiment, if the join operation type instruction is a full join calculation instruction, then a right join calculation needs to be performed first. The right join calculation is performed on data table A and data table B. When calculating the last first fragment of data table A, the non-target join row cache module data of the probe table is sent to the second spare memory of the read data DDR module and the write data DDR module via the bus module.
[0097] Step S55: Based on the preset number of parallel processing rows, perform left join calculations on the data stored in each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the first spare memory to obtain the corresponding left join table row data, and discard the obtained left join table row data to obtain the corresponding full join table row data.
[0098] Understandably, after performing a right join calculation, a left join calculation is required. For the left join calculation of data table A and data table B, data from the second shard 2-N in the second memory space, the second shard 3-N in the second spare memory, and the first shard data in the first spare memory are simultaneously read and sent to the hash calculation module for hash calculation during the construction phase. It's important to note that when discarding data from the left join calculation, all data from the target join cache module must be placed to complete the full join calculation.
[0099] Therefore, this application does not need to process only one row of data at a time when completing the full join calculation. In other words, when performing hash join calculations on the first and second fragments, multiple rows of data can be processed simultaneously, which speeds up the hash join calculation and effectively improves the hardware acceleration effect of hash joins.
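One way to read steps S54 and S55 is that the full join result is the right join output plus only those left join rows that found no partner, since the matched pairs have already been produced by the right join pass. The sketch below illustrates that reading. It is an interpretation, not a statement of the claimed method: fragmentation and parallel batching are omitted for brevity, and all function names and the key-equality condition are assumptions.

```python
from collections import defaultdict


def _index(rows, key):
    """Build a hash table from join key to the rows carrying that key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def right_join(table_a, table_b, key_a=0, key_b=0):
    """Keep every row of table_b; unmatched rows get None on the left."""
    built = _index(table_a, key_a)
    out = []
    for row_b in table_b:
        partners = built.get(row_b[key_b])
        if partners:
            out.extend((row_a, row_b) for row_a in partners)
        else:
            out.append((None, row_b))
    return out


def left_join(table_a, table_b, key_a=0, key_b=0):
    """Keep every row of table_a; unmatched rows get None on the right."""
    probe = _index(table_b, key_b)
    out = []
    for row_a in table_a:
        partners = probe.get(row_a[key_a])
        if partners:
            out.extend((row_a, row_b) for row_b in partners)
        else:
            out.append((row_a, None))
    return out


def full_join(table_a, table_b, key_a=0, key_b=0):
    """Right join first, then add only the unmatched rows of the left join."""
    right_rows = right_join(table_a, table_b, key_a, key_b)
    left_rows = left_join(table_a, table_b, key_a, key_b)
    # Matched pairs from the left join duplicate the right join output, so drop them.
    left_only = [pair for pair in left_rows if pair[1] is None]
    return right_rows + left_only
```

Dropping the matched pairs from the second pass is what keeps each joined pair from appearing twice in the combined result.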
[0100] As shown in Figure 10, this application discloses a hardware acceleration device for hash joins of large data tables, applied to an acceleration board, including:
[0101] The instruction receiving module 11 is used to receive join operation type instructions;
[0102] The fragment storage module 12 is used to request, using the join table size instruction, a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and store each second fragment of the second data table to be joined into the second memory space;
[0103] The hash join module 13 is used to perform hash join calculations on each of the first fragments in the first memory space and each of the second fragments in the second memory space according to the join operation type instruction and the preset number of parallel processing rows, so as to obtain the corresponding join table row data.
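The division of work among the three modules can be sketched as three small functions. This is only an illustrative model under stated assumptions: the cache capacity is expressed as a row count (`cache_rows`), only the inner join path is implemented, the join condition is key equality, and every name (`JoinType`, `receive_instruction`, `store_fragments`, `hash_join`) is hypothetical.

```python
from collections import defaultdict
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    LEFT = "left"
    RIGHT = "right"
    FULL = "full"


def receive_instruction(text):
    """Instruction receiving module: decode the join operation type."""
    return JoinType(text.strip().lower())


def store_fragments(table, cache_rows):
    """Fragment storage module: slice the table only if it exceeds the cache."""
    if len(table) <= cache_rows:
        return [list(table)]
    return [list(table[i:i + cache_rows]) for i in range(0, len(table), cache_rows)]


def hash_join(join_type, first_space, second_space, parallel_rows, key=0):
    """Hash join module: build per first fragment, probe every second fragment."""
    if join_type is not JoinType.INNER:
        raise NotImplementedError("only the inner join path is sketched here")
    out = []
    for frag_a in first_space:
        # Construction stage for the current first fragment.
        hash_table = defaultdict(list)
        for row_a in frag_a:
            hash_table[row_a[key]].append(row_a)
        # Probe stage, `parallel_rows` rows per batch.
        for frag_b in second_space:
            for start in range(0, len(frag_b), parallel_rows):
                for row_b in frag_b[start:start + parallel_rows]:
                    for row_a in hash_table.get(row_b[key], ()):
                        out.append((row_a, row_b))
    return out
```

A caller would chain the three functions: decode the instruction, fragment both tables against the assumed cache size, then pass the two fragment lists to `hash_join`.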
[0104] The beneficial effects of this application are as follows. First, a join operation type instruction is received. A join table size instruction is then used to allocate a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so that each first fragment of the first data table to be joined is stored in the first memory space and each second fragment of the second data table to be joined is stored in the second memory space. According to the join operation type instruction and a preset number of parallel processing rows, a hash join calculation is performed on each first fragment in the first memory space and each second fragment in the second memory space to obtain the corresponding join table row data. This application is thus applied to an acceleration board: once each first fragment has been stored in the first memory space and each second fragment has been stored in the second memory space, the first fragments and the second fragments can be hash-joined according to the join operation type instruction to obtain the corresponding join table row data. Since the acceleration board can process hash joins of multiple rows of data in parallel, multiple rows can be processed simultaneously when hash-joining the first and second fragments, which speeds up the hash join calculation and effectively improves the hardware acceleration effect of hash joins.
[0105] Furthermore, embodiments of this application also provide an electronic device. Figure 11 is a structural diagram of an electronic device 20 according to an exemplary embodiment. The content of the diagram should not be construed as limiting the scope of this application.
[0106] Figure 11 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. Specifically, the device may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the hardware acceleration method for hash joins of large data tables executed by the electronic device as disclosed in any of the foregoing embodiments.
[0107] In this embodiment, the power supply 23 is used to provide operating voltage for various hardware devices on the electronic device; the communication interface 24 can create a data transmission channel between the electronic device and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.
[0108] The processor 21 may include one or more processing cores, such as a quad-core processor or an octa-core processor. The processor 21 may be implemented in at least one hardware form selected from a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.
[0109] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored on it include operating system 221, computer program 222 and data 223, etc., and the storage method can be temporary storage or permanent storage.
[0110] The operating system 221 manages and controls the various hardware devices and computer programs 222 on the electronic device to enable the processor 21 to perform calculations and processing on the massive data 223 in the memory 22. The operating system can be Windows, Unix, Linux, etc. The computer program 222, in addition to including a computer program capable of performing the hardware-accelerated hash join method for large data tables executed by the electronic device as disclosed in any of the foregoing embodiments, may further include computer programs capable of performing other specific tasks. The data 223 may include data received by the electronic device from external devices, as well as data collected by its own input / output interface 25.
[0111] Furthermore, embodiments of this application also disclose a computer-readable storage medium storing a computer program. When the computer program is loaded and executed by a processor, it implements the method steps performed during the hardware acceleration process of hash join of a large data table disclosed in any of the foregoing embodiments.
[0112] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0113] The above provides a detailed description of a hardware acceleration method, apparatus, device, and medium for hash joins of large data tables provided by the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A hardware acceleration method for hash joins of large data tables, characterized in that it is applied to an acceleration board and comprises:
receiving a join operation type instruction;
requesting, using a join table size instruction, a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space;
performing, according to the join operation type instruction and a preset number of parallel processing rows, hash join calculation on each of the first fragments in the first memory space and each of the second fragments in the second memory space to obtain the corresponding join table row data;
wherein the step of performing hash join calculation on each first fragment in the first memory space and each second fragment in the second memory space according to the join operation type instruction and the preset number of parallel processing rows to obtain the corresponding join table row data includes:
if the join operation type instruction is an inner join calculation instruction, reading the current first fragment of the first data table to be joined in the first memory space, and generating the current hash table of the hash calculation construction stage for the current first fragment; obtaining the row data of the current first fragment in the current hash table based on the preset number of parallel processing rows, and generating the inner join table row data corresponding to the current first fragment using the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition; reading the next first fragment of the first data table to be joined in the first memory space, updating the next first fragment to be the current first fragment, and jumping back to the step of generating the current hash table of the hash calculation construction stage for the current first fragment, until the inner join table row data corresponding to each first fragment is obtained;
if the join operation type instruction is a right join calculation instruction, requesting the second spare memory corresponding to the second memory space, so as to perform right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows, to obtain the corresponding right join table row data;
if the join operation type instruction is a left join calculation instruction, requesting the first spare memory corresponding to the first memory space, so as to perform left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory based on the preset number of parallel processing rows, to obtain the left join table row data;
if the join operation type instruction is a full join calculation instruction, requesting the first spare memory corresponding to the first memory space and the second spare memory corresponding to the second memory space; performing right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows to obtain the corresponding right join table row data; performing left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory based on the preset number of parallel processing rows to obtain the corresponding left join table row data, and discarding the obtained left join table row data to obtain the corresponding full join table row data;
wherein the step of storing each first fragment of the first data table to be joined into the first memory space and storing each second fragment of the second data table to be joined into the second memory space includes:
determining whether the first data table to be joined and the second data table to be joined are larger than the cache of the acceleration board; if they are larger, slicing the first data table to be joined and the second data table to be joined to obtain each first fragment of the first data table to be joined and each second fragment of the second data table to be joined; and storing each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space.
2. The hardware acceleration method for hash joins of large data tables according to claim 1, characterized in that the step of generating the inner join table row data corresponding to the current first fragment using the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition includes:
reading the current second fragment of the second data table to be joined in the second memory space, and generating the current hash value of the hash calculation probe stage for the current second fragment;
obtaining, based on the preset number of parallel processing rows, the row data of the current second fragment in the current hash value, and determining whether the row data of the current second fragment meets the first preset condition; if it does, generating the inner join table row data corresponding to the current second fragment using the row data of the current first fragment and the row data of the current second fragment;
reading the next second fragment of the second data table to be joined in the second memory space, updating the next second fragment to be the current second fragment, and jumping back to the step of generating the current hash value of the hash calculation probe stage for the current second fragment, until the inner join table row data corresponding to each second fragment is obtained, so as to obtain the inner join table row data corresponding to the current first fragment.
3. The hardware acceleration method for hash joins of large data tables according to claim 1, characterized in that the step of performing right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows to obtain the corresponding right join table row data includes:
reading the current first fragment of the first data table to be joined in the first memory space, and generating the current hash table of the hash calculation construction stage for the current first fragment;
obtaining, based on the preset number of parallel processing rows, the row data of the current first fragment in the current hash table, and generating the right join table row data corresponding to the current first fragment using the row data of the current first fragment, the row data of each second fragment in the second memory space that meets the second preset condition, and the row data of each second fragment in the second spare memory that does not meet the second preset condition;
reading the next first fragment of the first data table to be joined in the first memory space, updating the next first fragment to be the current first fragment, and jumping back to the step of generating the current hash table of the hash calculation construction stage for the current first fragment, until the right join table row data corresponding to each first fragment is obtained.
4. The hardware acceleration method for hash joins of large data tables according to claim 3, characterized in that the step of performing left join calculation on the data stored in the first memory space, the second memory space, and the first spare memory based on the preset number of parallel processing rows to obtain the left join table row data includes:
generating, based on the format of the left join data rows, the row data of the first fragments in the first memory space and the row data of the second fragments in the second memory space respectively;
performing, based on the preset number of parallel processing rows, hash join calculation using the row data of the first fragments in the first memory space that meets the third preset condition, the row data of the second fragments in the second memory space, and the row data of the first fragments in the first spare memory that does not meet the third preset condition, to obtain the left join table row data.
5. A hardware acceleration device for hash joins of large data tables, characterized in that it is applied to an acceleration board and comprises:
an instruction receiving module, used to receive a join operation type instruction;
a fragment storage module, used to request, using a join table size instruction, a first memory space for a first data table to be joined and a second memory space for a second data table to be joined, so as to store each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space;
a hash join module, used to perform hash join calculation on each of the first fragments in the first memory space and each of the second fragments in the second memory space according to the join operation type instruction and a preset number of parallel processing rows, so as to obtain the corresponding join table row data;
wherein the hash join module is specifically used for:
if the join operation type instruction is an inner join calculation instruction, reading the current first fragment of the first data table to be joined in the first memory space, and generating the current hash table of the hash calculation construction stage for the current first fragment; obtaining the row data of the current first fragment in the current hash table based on the preset number of parallel processing rows, and generating the inner join table row data corresponding to the current first fragment using the row data of the current first fragment and the row data of each second fragment in the second memory space that meets the first preset condition; reading the next first fragment of the first data table to be joined in the first memory space, updating the next first fragment to be the current first fragment, and jumping back to the step of generating the current hash table of the hash calculation construction stage for the current first fragment, until the inner join table row data corresponding to each first fragment is obtained;
if the join operation type instruction is a right join calculation instruction, requesting the second spare memory corresponding to the second memory space, so as to perform right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows, to obtain the corresponding right join table row data;
if the join operation type instruction is a left join calculation instruction, requesting the first spare memory corresponding to the first memory space, so as to perform left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory based on the preset number of parallel processing rows, to obtain the left join table row data;
if the join operation type instruction is a full join calculation instruction, requesting the first spare memory corresponding to the first memory space and the second spare memory corresponding to the second memory space; performing right join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the second spare memory based on the preset number of parallel processing rows to obtain the corresponding right join table row data; performing left join calculation on each of the first fragments in the first memory space, each of the second fragments in the second memory space, and the data stored in the first spare memory based on the preset number of parallel processing rows to obtain the corresponding left join table row data, and discarding the obtained left join table row data to obtain the corresponding full join table row data;
wherein the fragment storage module is specifically used for:
determining whether the first data table to be joined and the second data table to be joined are larger than the cache of the acceleration board; if they are larger, slicing the first data table to be joined and the second data table to be joined to obtain each first fragment of the first data table to be joined and each second fragment of the second data table to be joined; and storing each first fragment of the first data table to be joined into the first memory space and each second fragment of the second data table to be joined into the second memory space.
6. An electronic device, characterized in that it comprises: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the steps of the hardware acceleration method for hash joins of large data tables according to any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that it is used to store a computer program, wherein the computer program, when executed by a processor, implements the steps of the hardware acceleration method for hash joins of large data tables according to any one of claims 1 to 4.