A data acceleration processing method based on a field-programmable gate array
By parsing database query requests to generate query feature description data and using an imitation learning strategy model to construct a fusion operator pipeline structure, the problem of the inflexibility of hardware acceleration solutions in existing technologies is solved, and efficient data acceleration processing and resource utilization are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XINYI ELECTRONIC TECH (SHANGHAI) CO LTD
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing hardware acceleration solutions struggle to flexibly adjust data bypass or operator fusion order based on specific database query requirements, resulting in idle or misaligned computing power and an inability to effectively handle the challenges of massive data volumes and extremely low response latency.
By parsing database query requests, query feature description data is generated, and a fusion execution strategy is generated using an imitation learning strategy model. A fusion operator pipeline structure is constructed, data input channels and on-chip caches are configured, and filtering, joining, aggregation, projection and deduplication processing are performed. Execution status information is collected to update the strategy model, thereby achieving adaptive hardware execution.
It improves the efficiency and resource utilization of accelerated data processing, reduces the probability of mismatch between query features and hardware execution mode, and enhances the processing performance and stability of database query requests.
Figure CN122240663A_ABST
Description
Technical Field
[0001] This invention relates to the field of data processing and hardware acceleration technology, and in particular to a data acceleration processing method based on field-programmable gate arrays.
Background Technology
[0002] With the rapid development of business scenarios such as big data analytics, real-time data fusion, and high-throughput online retrieval, database systems face the dual challenges of massive data processing and extremely low response latency. Field-programmable gate arrays (FPGAs), with their physical advantages of high parallelism, low latency, and reconfigurability, are gradually becoming an important technological path to overcome the traditional CPU computing power bottleneck and achieve hardware acceleration of database queries.
[0003] Existing hardware acceleration solutions mostly employ a static, fixed operator organization architecture. However, real database query requests are highly diverse, with varying relational topologies (such as tree, star, or snowflake schemas), selection rates, cardinality of join fields, and data scales. Traditional solutions execute according to a fixed operator join order and static data transmission paths, making it difficult to perceive specific query structure characteristics and basic data statistical distribution. They also struggle to flexibly adjust data bypassing or operator fusion order based on specific query needs, easily leading to idle computing power in local processing units or misalignment between computing power allocation and actual data load.
[0004] Therefore, this invention proposes a data acceleration processing method based on field-programmable gate arrays (FPGAs). The information disclosed in the background section is only for enhancing understanding of the background of this disclosure and may therefore contain prior art information that is not common knowledge to those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by providing a data acceleration processing method based on field-programmable gate arrays (FPGAs), thereby solving the technical problems mentioned in the background section.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a data acceleration processing method based on field-programmable gate arrays, including the following steps:
S1. Obtain the database query request, parse the query plan, extract query structure features and basic data statistics, and generate query feature description data;
S2. Input the query feature description data into the imitation learning strategy model to generate a fusion execution strategy, where the fusion execution strategy includes the operator fusion order, data bypass path, on-chip cache allocation scheme, and intermediate result suppression rules;
S3. Based on the fusion execution strategy, construct and deploy the fusion operator pipeline, configure the data input channel, inter-stage transmission channel, on-chip buffer, and result output channel, and generate execution configuration data;
S4. Based on the execution configuration data, import the data blocks to be processed into the fusion operator pipeline, perform filtering, joining, aggregation, projection, and deduplication processing, and generate target result data according to the intermediate result suppression rules;
S5. Collect the execution status information corresponding to the target result data, generate execution feedback data, and update the historical high-performance query execution samples based on the execution feedback data, for use by the imitation learning strategy model when generating subsequent fusion execution strategies.
[0007] S1 specifically includes: obtaining the database query request, extracting the query plan corresponding to the database query request, and converting the query plan into a query plan association structure composed of query nodes and dependency edges; parsing the query plan association structure, identifying filtering nodes, join nodes, aggregation nodes, projection nodes, and deduplication nodes, extracting the execution feature items corresponding to each query node, and generating a query structure feature set; reading the basic data statistics corresponding to the database query request, and mapping the basic data statistics to the feature unit of each node in the query structure feature set to generate query feature description data.
[0008] S2 specifically includes: inputting query feature description data into the imitation learning strategy model to generate multiple candidate strategy codes; parsing the candidate strategy codes to generate a candidate execution strategy set, where each candidate execution strategy includes a candidate operator fusion order, a candidate data bypass path, a candidate on-chip cache allocation scheme, and a candidate intermediate result suppression rule; performing dependency consistency verification, data path consistency verification, result semantic preservation verification, and deployability verification on the candidate execution strategy set, and determining the fusion execution strategy from the candidate execution strategies that pass the verification.
[0009] S3 specifically includes: determining the fusion operator pipeline structure based on the fusion execution strategy, and determining which of the filtering processing segment, join processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment are enabled and how the enabled segments are connected; configuring the data input channel, inter-stage transmission channel, data bypass path, on-chip buffer, and result output channel corresponding to the fusion operator pipeline structure according to the fusion execution strategy, and generating channel and buffer configuration results; and, according to the fusion operator pipeline structure and the channel and buffer configuration results, mapping each hardware operator module to logic resources, mapping each on-chip buffer to on-chip storage resources, and mapping the data input channel, inter-stage transmission channel, data bypass path, and result output channel to interface resources, thereby generating execution configuration data.
[0010] S4 specifically includes: dividing the original data into data blocks to be processed based on the execution configuration data, and importing the data blocks to be processed into at least one of the corresponding filtering processing segment, joining processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment; controlling each processing segment in the fusion operator pipeline to perform filtering processing, joining processing, aggregation processing, projection processing, and deduplication processing on the data blocks to be processed, generating intermediate processing results; performing truncation processing, compression processing, bypass processing, or merging processing on the intermediate processing results according to the intermediate result suppression rules, and outputting the data records that satisfy the semantics of the database query request after processing as the target result data.
[0011] S5 specifically includes: collecting execution status information corresponding to the target result data, and binding the execution status information with the database query request identifier, the fusion execution strategy identifier, the execution configuration data identifier, and the target result data identifier to generate the original feedback record; retrieving the corresponding query feature description data, fusion execution strategy, execution configuration data, and result scale information of the target result data based on the original feedback record to generate execution feedback data; performing high-performance sample determination based on the execution feedback data, and writing the execution feedback data determined to be new high-performance query execution samples into the historical high-performance query execution sample set for subsequent generation of fusion execution strategies by the imitation learning strategy model.
[0012] The beneficial effects of this invention are as follows: This invention parses database query requests, integrates query structure features with basic data statistics to generate query feature description data, and generates a fusion execution strategy based on this data. This allows the hardware execution method to adaptively adjust to different database query requests, reducing the probability of mismatch between query features and hardware execution methods. By generating the fusion execution strategy through a learning-based model and performing consistency and deployability checks on candidate execution strategies, the invention can determine a more suitable execution scheme for the current database query request while meeting result semantics and hardware deployment constraints, thereby improving the rationality and feasibility of the fusion execution strategy.
[0013] This invention constructs a fusion operator pipeline structure based on a fusion execution strategy and coordinates the configuration of data bypass paths, on-chip cache allocation schemes, and execution configuration data. This improves the coordination between filtering, joining, aggregation, projection, and deduplication processes, thereby reducing inter-stage waiting and increasing data throughput. During the fusion operator pipeline execution, intermediate processing results are truncated, compressed, bypassed, or merged according to intermediate result suppression rules. This suppresses intermediate result expansion, reduces unnecessary data transmission and on-chip cache usage, and thus improves data acceleration processing efficiency.
[0014] This invention generates execution feedback data by collecting execution status information corresponding to target result data, and updates historical high-performance query execution samples based on the execution feedback data. This continuously feeds historical execution experience back to the imitation learning strategy model, thereby improving the targeting and stability of the fusion execution strategies generated for subsequent database query requests. A closed-loop processing chain is constructed, consisting of "query feature description data generation - fusion execution strategy determination - fusion operator pipeline deployment - target result data output - execution feedback data generation - historical high-performance query execution sample update," which balances query processing efficiency, resource utilization, and strategy iteration capabilities, thereby improving the data acceleration processing performance of field-programmable gate arrays (FPGAs).
Attached Figure Description
[0015] Figure 1 is a schematic diagram of a data acceleration processing method based on a field-programmable gate array according to the present invention.
Detailed Implementation
[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] Example 1: As shown in Figure 1, this embodiment provides a data acceleration processing method based on field-programmable gate arrays, including the following steps:
S1. Obtain the database query request, parse the query plan, extract query structure features and basic data statistics, and generate query feature description data;
S2. Input the query feature description data into the imitation learning strategy model to generate a fusion execution strategy, where the fusion execution strategy includes the operator fusion order, data bypass path, on-chip cache allocation scheme, and intermediate result suppression rules;
S3. Based on the fusion execution strategy, construct and deploy the fusion operator pipeline, configure the data input channel, inter-stage transmission channel, on-chip buffer, and result output channel, and generate execution configuration data;
S4. Based on the execution configuration data, import the data blocks to be processed into the fusion operator pipeline, perform filtering, joining, aggregation, projection, and deduplication processing, and generate target result data according to the intermediate result suppression rules;
S5. Collect the execution status information corresponding to the target result data, generate execution feedback data, and update the historical high-performance query execution samples based on the execution feedback data, for use by the imitation learning strategy model when generating subsequent fusion execution strategies.
[0018] S1 specifically includes the following sub-steps: S110. Obtain the database query request to be executed, call the execution description information output by the database management system for the database query request, and extract the query plan corresponding to the database query request from the execution description information; wherein, the query plan shall at least include the data source table, field reading relationship, filtering conditions, inter-table join relationship, aggregation requirements, field projection relationship, and result deduplication requirements.
[0019] The extracted query plan is structurally transformed into a query plan association structure consisting of multiple query nodes and multiple dependency edges. Each query node corresponds to a query processing operation to be executed, and each dependency edge corresponds to the data transfer relationship between the output data of the preceding query node and the input data of the following query node.
[0020] Write a unified node description field for each query node. Each query node includes at least: node identifier, node type, set of input fields, set of output fields, identifier of the data source table to which it belongs, and dependency identifier. Among them, the node type is used to represent the processing operation category corresponding to the query node, and the dependency identifier is used to represent the predecessor or successor relationship between the query node and other query nodes.
[0021] The query plan association structure after the structured transformation is used as the direct input to step S120, so that step S120 performs node identification, feature extraction and dependency organization based on the query plan association structure.
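The query plan association structure described in step S110 can be pictured as a small directed graph. The following is a minimal sketch only: the class names `QueryNode` and `QueryPlanGraph`, the field names, and the example tables are illustrative assumptions, not identifiers from the invention.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QueryNode:
    """One query processing operation (all field names are illustrative)."""
    node_id: str
    node_type: str              # e.g. "filter", "join", "aggregate"
    input_fields: List[str]
    output_fields: List[str]
    source_table: str
    predecessors: List[str] = field(default_factory=list)  # dependency identifiers
    successors: List[str] = field(default_factory=list)


@dataclass
class QueryPlanGraph:
    """Query nodes plus dependency edges (producer -> consumer)."""
    nodes: Dict[str, QueryNode] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node: QueryNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str) -> None:
        # An edge records that the output of `src` feeds the input of `dst`.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst))
        self.nodes[src].successors.append(dst)
        self.nodes[dst].predecessors.append(src)


def build_example_plan() -> QueryPlanGraph:
    """Hypothetical two-node plan: filter orders, then aggregate them."""
    plan = QueryPlanGraph()
    plan.add_node(QueryNode("n1", "filter", ["amount"], ["order_id", "amount"], "orders"))
    plan.add_node(QueryNode("n2", "aggregate", ["order_id", "amount"], ["total"], "orders"))
    plan.add_edge("n1", "n2")
    return plan
```

Storing predecessor and successor identifiers on each node, in addition to the edge list, mirrors the dependency identifier that the node description field is said to carry.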
[0022] S120. Read the query plan association structure output in step S110, traverse each query node according to the node identifier, and classify each query node according to its node type and node description field; specifically, query nodes containing condition judgment relationships are identified as filtering nodes, query nodes containing inter-table field matching relationships are identified as join nodes, query nodes containing grouping or summarizing operations are identified as aggregation nodes, query nodes containing field trimming relationships are identified as projection nodes, and query nodes containing duplicate record elimination relationships are identified as deduplication nodes.
[0023] For each identified query node, corresponding execution features are extracted. Specifically, when the query node is a filter node, the number of filter conditions, the type of filter conditions, and the set of filter fields are extracted; when the query node is a join node, the set of join fields, the join direction, and the identifier of the data source table involved in the join are extracted; when the query node is an aggregation node, the set of aggregation fields, the aggregation type, and the set of grouping fields are extracted; when the query node is a projection node, the set of target fields is extracted; and when the query node is a deduplication node, the set of deduplication fields is extracted.
[0024] Each query node is encapsulated into a node feature unit, which includes the node type, input field set, output field set, data source table identifier, dependency identifier, and execution feature item. All node feature units are then arranged according to the order of dependencies represented by the dependent edges to generate a query structure feature set. Each node feature unit in the query structure feature set retains its predecessor node identifier and successor node identifier so that data statistics can be accurately mapped to the corresponding node feature unit in the future.
[0025] The query structural feature set is output to step S130 as the structural feature input for generating query feature description data.
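Step S120 combines two operations: classifying each node and arranging the node feature units in dependency order. A minimal sketch of both follows, assuming each node description is a dictionary with hypothetical keys such as `predicates` or `join_condition`. The precedence among the classification tests and the use of Kahn's algorithm for ordering are choices made for this sketch, not details given in the text.

```python
from collections import deque


def classify_node(desc):
    """Map a node description to one of the five node categories.

    The dictionary keys are hypothetical; the order of the tests is an
    assumption made for this sketch.
    """
    if desc.get("duplicate_elimination"):
        return "dedup"
    if desc.get("join_condition"):
        return "join"
    if desc.get("group_by") or desc.get("aggregates"):
        return "aggregate"
    if desc.get("predicates"):
        return "filter"
    if desc.get("kept_fields"):
        return "project"
    raise ValueError("node description matches no known category")


def order_by_dependency(node_ids, edges):
    """Return node ids so that every producer precedes its consumers."""
    indegree = {n: 0 for n in node_ids}
    successors = {n: [] for n in node_ids}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in node_ids if indegree[n] == 0)
    ordered = []
    while ready:
        current = ready.popleft()
        ordered.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(ordered) != len(node_ids):
        raise ValueError("dependency edges contain a cycle")
    return ordered
```

For example, `order_by_dependency(["c", "a", "b"], [("a", "b"), ("b", "c")])` places `a` first because it has no predecessor.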
[0026] S130. Read the basic data statistics information corresponding to the database query request. The basic data statistics information includes at least the table data size, field value distribution, cardinality of join fields, selection rate of filter fields, and expansion rate of historical result sets. Among them, the table data size is used to represent the total number of data records in the corresponding data source table, the field value distribution is used to represent the degree of dispersion or concentration of the corresponding field values, and the cardinality of join fields is used to represent the number of different values of the fields involved in the join in the corresponding data source table.
[0027] To determine the selection rate of a filter field, first count the total number of data records involved in the evaluation of the current filter field, denoted $N_{\text{total}}$. Next, count the number of data records that satisfy the current filter condition, denoted $N_{\text{match}}$. The selection rate of the filter field, $\sigma$, is defined as the ratio of the number of data records that satisfy the current filter condition to the total number of data records involved in the evaluation of the current filter field, i.e.: $$\sigma = \frac{N_{\text{match}}}{N_{\text{total}}}$$ where $\sigma$ characterizes the proportion of original data records retained by the current filter condition; a small $\sigma$ indicates that the current filter condition has a strong filtering effect on the data records, and a larger $\sigma$ indicates a weaker filtering effect.
[0028] To determine the historical result set expansion rate, first count the number of input data records of a given processing stage during historical execution, denoted $N_{\text{in}}$. Next, count the number of output data records of the same processing stage, denoted $N_{\text{out}}$. The historical result set expansion rate, $E$, is defined as the ratio of the number of output data records to the number of input data records of the corresponding processing stage, i.e.: $$E = \frac{N_{\text{out}}}{N_{\text{in}}}$$ where $E$ characterizes the degree to which the data size grows in the corresponding processing stage; $E > 1$ indicates that result expansion occurs in this processing stage, and the larger $E$ is, the higher the risk of intermediate result expansion in this processing stage.
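The two ratios defined above, the selection rate of a filter field and the historical result set expansion rate, can be computed directly from record counts. The sketch below uses hypothetical function names and rejects a zero denominator, which the text does not discuss.

```python
def selection_rate(matching_records, total_records):
    """Fraction of records kept by a filter condition."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return matching_records / total_records


def expansion_rate(output_records, input_records):
    """Ratio of output to input record counts for one processing stage."""
    if input_records <= 0:
        raise ValueError("input_records must be positive")
    return output_records / input_records


def has_result_expansion(output_records, input_records):
    """A stage expands its input when it emits more records than it reads."""
    return expansion_rate(output_records, input_records) > 1.0
```

For instance, a filter that keeps 25 of 1,000 records has a small selection rate and therefore filters strongly, while a join that turns 1,000 input records into 3,000 output records shows result expansion.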
[0029] The basic data statistics are mapped to the node feature units output in step S120 according to the data source table identifier, field identifier, or join field identifier, so that each node feature unit contains both query structure features and data statistics features corresponding to the processing object of the query node. Specifically, when a node feature unit corresponds to a filtering node, the selection rate of the filtering field is mapped to the node feature unit; when a node feature unit corresponds to a join node, the cardinality of the join field and the expansion rate of the historical result set are mapped to the node feature unit; when a node feature unit corresponds to an aggregation node, projection node, or deduplication node, the table data size and field value distribution corresponding to its input field set or output field set are mapped to the node feature unit.
[0030] After mapping, feature normalization is performed on each node feature unit. Specifically, discrete features such as node type, aggregation type, join direction, and filter condition type are encoded, while numerical features such as table data size, cardinality of join fields, selection rate of filter fields, and expansion rate of historical result sets are normalized. Then, the encoded discrete features and normalized numerical features are concatenated according to the preset field order to obtain the node feature record corresponding to each query node.
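The encoding and normalization step can be sketched as follows. The text does not name a specific encoding or normalization method, so one-hot encoding and min-max scaling are assumptions here, as are the field names, the vocabulary, and the value bounds.

```python
# Assumed vocabulary and preset field order (illustrative only).
NODE_TYPES = ["filter", "join", "aggregate", "project", "dedup"]
NUMERIC_FIELDS = ["table_rows", "join_cardinality", "selection_rate", "expansion_rate"]


def one_hot(value, vocabulary):
    """Encode a discrete feature as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def min_max(value, low, high):
    """Scale a numeric feature into [0, 1], clipping out-of-range values."""
    if high <= low:
        return 0.0
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


def node_feature_record(unit, bounds):
    """Concatenate encoded discrete and normalized numeric features."""
    record = one_hot(unit["node_type"], NODE_TYPES)
    for name in NUMERIC_FIELDS:
        low, high = bounds[name]
        # Statistics not mapped to this node type default to 0.0.
        record.append(min_max(unit.get(name, 0.0), low, high))
    return record
```

Keeping the field order fixed in `NUMERIC_FIELDS` reflects the requirement that features be concatenated in a preset order, so that every node feature record has the same layout.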
[0031] All node feature records are arranged in the order of dependencies represented by the edges to generate query feature description data corresponding to the database query request. The query feature description data is a structured feature sequence organized in node order. Each node feature record in the structured feature sequence includes node type features, field relationship features, dependency relationship features, and data statistical features.
[0032] The query feature description data is output to step S210 as input data for the imitation learning strategy model to generate the fusion execution strategy.
[0033] S2 specifically includes the following sub-steps: S210. Call the pre-trained imitation learning strategy model, read the query feature description data output in step S130, and input the node feature records in the query feature description data into the imitation learning strategy model in sequence according to the order of dependencies represented by the dependency edges; wherein, the input order of each node feature record is consistent with the node arrangement order used when generating the query feature description data in step S130, so that the imitation learning strategy model performs strategy inference processing while maintaining the dependency relationship between query nodes.
[0034] The imitation learning strategy model is trained from historical high-performance query execution samples. The historical high-performance query execution samples at least record the correspondence between query feature description data and high-performance execution schemes. Each high-performance execution scheme corresponds to a set of strategy parameters that can be deployed on a field-programmable gate array and obtain preset performance indicators. The set of strategy parameters includes at least operator fusion order parameters, data bypass parameters, on-chip cache allocation parameters, and intermediate result suppression parameters.
[0035] As a specific implementation, the imitation learning strategy model is built on a neural network architecture that includes a self-attention mechanism, which includes at least an encoder and a decoder. The encoder is used to perform feature embedding mapping on the feature records of each node that are input sequentially, and extracts the global dependencies and local context features between each query node through a multi-head self-attention layer to generate a hidden state sequence. The decoder includes a multi-channel multilayer perceptron (MLP) prediction head, which is used to receive the hidden state sequence and output the prediction probability distribution corresponding to the operator fusion order, data bypass path, on-chip cache allocation scheme and intermediate result suppression rules.
[0036] Furthermore, the imitation learning strategy model is controlled to output multiple candidate strategy codes based on the predicted probability distributions, using beam search or a sampling strategy; each candidate strategy code is a structured representation of a set of candidate execution strategy parameters, and each candidate strategy code includes at least an operator fusion order label field, a data bypass control field, an on-chip cache allocation field, and an intermediate result suppression field; the multiple candidate strategy codes are output to step S220 as input data for expanding and generating candidate execution strategies.
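How several candidate strategy codes can be drawn from per-field probability distributions is illustrated below with a small beam search. This is a sketch under stated assumptions: each prediction head is represented as a plain dictionary of label probabilities, the heads are treated as independent, and the labels are invented for illustration.

```python
import math


def beam_search(field_distributions, beam_width):
    """Keep the `beam_width` most probable label combinations.

    `field_distributions` is a list with one {label: probability} dict per
    strategy field. Returns (labels_tuple, log_probability) pairs, best first.
    """
    beams = [((), 0.0)]
    for distribution in field_distributions:
        extended = []
        for labels, log_prob in beams:
            for label, prob in distribution.items():
                if prob > 0.0:
                    extended.append((labels + (label,), log_prob + math.log(prob)))
        # Retain only the highest-scoring partial codes for the next field.
        extended.sort(key=lambda item: item[1], reverse=True)
        beams = extended[:beam_width]
    return beams
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow when a code has many fields.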
[0037] S220. Read the multiple candidate strategy codes output in step S210, and parse each candidate strategy code according to the preset strategy field mapping rules. The strategy field mapping rules establish a one-to-one correspondence between each field inside the candidate strategy code and the actual hardware execution parameters, so that the content of each field in the candidate strategy code can be converted into a specific strategy item that can be deployed on a field-programmable gate array.
[0038] Each candidate strategy code is parsed into its corresponding candidate operator fusion order, candidate data bypass path, candidate on-chip cache allocation scheme, and candidate intermediate result suppression rule. Among them, the candidate operator fusion order is used to characterize the arrangement order, serial relationship, or parallel relationship of filtering processing, joining processing, aggregation processing, projection processing, and deduplication processing in the fusion operator pipeline; the candidate data bypass path is used to characterize the transmission path by which the data block to be processed or the intermediate processing result bypasses some of the processing segments; the candidate on-chip cache allocation scheme is used to characterize the allocation position, allocation range, and allocation priority of on-chip cache resources among the processing segments; and the candidate intermediate result suppression rule is used to characterize the processing rules for truncating, compressing, bypassing, or merging intermediate processing results when preset conditions are met.
[0039] The candidate operator fusion order, candidate data bypass path, candidate on-chip cache allocation scheme, and candidate intermediate result suppression rule obtained by parsing the same candidate strategy code are bound into a complete set of candidate execution strategies, and cross-combination between different strategy items obtained by parsing different candidate strategy codes is prohibited; the above parsing and binding process is repeated for all candidate strategy codes to generate a set of candidate execution strategies corresponding to the database query request; the set of candidate execution strategies is output to step S230 as input data for the fusion execution strategy selection and determination.
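The parsing and binding in step S220 can be sketched as a table lookup that turns one code into one immutable strategy object. The mapping tables, the integer code layout, and the name `CandidateStrategy` are all hypothetical; the text states only that a one-to-one field mapping exists.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical field mapping rules: code value -> hardware-level strategy item.
FIELD_MAPPING = {
    "fusion_order": {0: ("filter", "join", "aggregate"), 1: ("join", "filter", "aggregate")},
    "bypass_path": {0: (), 1: ("project",)},
    "cache_allocation": {0: "join_heavy", 1: "balanced"},
    "suppression_rule": {0: "none", 1: "truncate", 2: "compress"},
}
FIELD_ORDER = ("fusion_order", "bypass_path", "cache_allocation", "suppression_rule")


@dataclass(frozen=True)
class CandidateStrategy:
    """All four items come from the same code, so they cannot be mixed later."""
    fusion_order: Tuple[str, ...]
    bypass_path: Tuple[str, ...]
    cache_allocation: str
    suppression_rule: str


def parse_code(code):
    """Convert one candidate strategy code (a tuple of ints) into a strategy."""
    if len(code) != len(FIELD_ORDER):
        raise ValueError("code must contain exactly one value per field")
    items = {name: FIELD_MAPPING[name][value] for name, value in zip(FIELD_ORDER, code)}
    return CandidateStrategy(**items)


def build_candidate_set(codes):
    """Parse every code independently; items are never combined across codes."""
    return [parse_code(code) for code in codes]
```

Making the strategy object frozen is one way to enforce the rule that strategy items parsed from different codes must not be cross-combined.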
[0040] S230. Read the candidate execution strategy set output in step S220, and perform dependency consistency check, data path consistency check, and result semantic preservation check on each candidate execution strategy in the candidate execution strategy set in sequence; wherein, the dependency consistency check is used to determine whether the fusion order of candidate operators in the candidate execution strategy violates the dependency relationship between query nodes; the data path consistency check is used to determine whether the candidate data bypass path in the candidate execution strategy destroys the data transmission correspondence between the input and output of each processing segment; the result semantic preservation check is used to determine whether the candidate intermediate result suppression rule in the candidate execution strategy changes the result generation semantics corresponding to the database query request; the candidate execution strategy that fails any consistency check is directly eliminated.
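Of the three consistency checks, the dependency consistency check is the most mechanical and is sketched below. It assumes the candidate operator fusion order is a simple sequence of segment names and that dependencies are given as (producer, consumer) pairs. The data path and result semantic checks are not covered by this sketch.

```python
def respects_dependencies(fusion_order, dependency_edges):
    """Return True if every producer appears before its consumer.

    `fusion_order` is a list of segment names; `dependency_edges` is a list
    of (producer, consumer) pairs derived from the query nodes.
    """
    position = {segment: index for index, segment in enumerate(fusion_order)}
    for producer, consumer in dependency_edges:
        if producer not in position or consumer not in position:
            return False  # a required segment is missing from the order
        if position[producer] >= position[consumer]:
            return False  # consumer would run before its input exists
    return True


def eliminate_inconsistent(candidate_orders, dependency_edges):
    """Drop every candidate order that violates a dependency."""
    return [order for order in candidate_orders
            if respects_dependencies(order, dependency_edges)]
```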
[0041] For candidate execution strategies that pass the above consistency checks, further resource occupancy checks, interface bandwidth checks, and timing compliance checks are performed. Specifically, the resource occupancy check determines whether the fusion operator pipeline corresponding to the candidate execution strategy exceeds the logic resources of the field-programmable gate array and the on-chip storage resources. The interface bandwidth check determines whether the data input channel, inter-stage transmission channel, and result output channel corresponding to the candidate execution strategy exceed the preset bandwidth range. The timing compliance check determines whether the processing segment combination corresponding to the candidate execution strategy meets the preset clock cycle requirements. Candidate execution strategies that fail any of the deployability checks are directly eliminated.
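The deployability checks amount to comparing estimated requirements against device limits. A minimal sketch follows; the resource categories, units, and example numbers are assumptions chosen for illustration and do not describe any particular FPGA device.

```python
from dataclasses import dataclass


@dataclass
class PipelineEstimate:
    """Estimated needs of one candidate pipeline (units are illustrative)."""
    logic_cells: int
    on_chip_memory_kb: int
    channel_bandwidth_gbps: float
    critical_path_ns: float


@dataclass
class DeviceLimits:
    """Assumed limits of the target device and its interfaces."""
    logic_cells: int
    on_chip_memory_kb: int
    channel_bandwidth_gbps: float
    clock_period_ns: float


def is_deployable(estimate, limits):
    """Apply the resource, bandwidth, and timing checks in turn."""
    resources_ok = (estimate.logic_cells <= limits.logic_cells
                    and estimate.on_chip_memory_kb <= limits.on_chip_memory_kb)
    bandwidth_ok = estimate.channel_bandwidth_gbps <= limits.channel_bandwidth_gbps
    # The longest combinational path must fit within one clock period.
    timing_ok = estimate.critical_path_ns <= limits.clock_period_ns
    return resources_ok and bandwidth_ok and timing_ok
```

In practice the estimates would come from a resource model or from synthesis reports; the sketch takes them as given.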
[0042] For each candidate execution strategy that passes the consistency and deployability checks, a comprehensive score is calculated. Let the $i$-th candidate execution strategy that passes the checks be denoted $P_i$, where $i$ is a positive integer with $1 \le i \le N$, and $N$ is the total number of candidate execution strategies that pass the checks. The predicted execution latency of $P_i$ is denoted $T_i$, its predicted data throughput is denoted $Q_i$, its predicted on-chip cache occupancy is denoted $C_i$, and its comprehensive score is denoted $S_i$. The comprehensive score $S_i$ is defined as a weighted combination of $T_i$, $Q_i$, and $C_i$, where $\alpha$ is the weighting coefficient for the predicted execution latency, $\beta$ is the weighting coefficient for the predicted data throughput, and $\gamma$ is the weighting coefficient for the predicted on-chip cache occupancy. The predicted execution latency $T_i$ characterizes the processing time required for the candidate execution strategy to complete the current database query request under estimated execution conditions; the predicted data throughput $Q_i$ characterizes the amount of data that the candidate execution strategy can process per unit of time; and the predicted on-chip cache occupancy $C_i$ characterizes the on-chip cache resource consumption of the candidate execution strategy. The candidate execution strategy with the largest comprehensive score $S_i$ is selected as the fusion execution strategy that matches the current database query request.
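The selection step can be sketched as computing a weighted score per candidate and taking the maximum. The exact scoring expression is not legible in this text, so the linear form below, which rewards throughput and penalizes latency and cache occupancy, is purely an assumption, as are the default weights and the requirement that inputs be pre-normalized to comparable ranges.

```python
def comprehensive_score(latency, throughput, cache_occupancy,
                        alpha=0.5, beta=0.3, gamma=0.2):
    """Assumed linear score: higher is better.

    Inputs are expected to be normalized to [0, 1]. The signs and weights
    are illustrative assumptions, not the formula from the source.
    """
    return -alpha * latency + beta * throughput - gamma * cache_occupancy


def select_fusion_strategy(candidates):
    """Pick the verified candidate with the largest comprehensive score."""
    if not candidates:
        raise ValueError("no candidate passed verification")
    return max(
        candidates,
        key=lambda c: comprehensive_score(c["latency"], c["throughput"], c["cache_occupancy"]),
    )
```

With the default weights, a candidate with low latency can outrank one with slightly higher throughput, which is the kind of trade-off the weighting coefficients are meant to express.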
[0043] The fusion execution strategy is output to step S310 as a direct input for constructing the fusion operator pipeline structure, configuring the data bypass path, and configuring the on-chip cache allocation scheme.
[0044] S3 specifically includes the following sub-steps: S310. Read the fusion execution strategy output in step S230, and extract the operator fusion order, data bypass path, on-chip cache allocation scheme, and intermediate result suppression rule from the fusion execution strategy; wherein, the operator fusion order, data bypass path, on-chip cache allocation scheme, and intermediate result suppression rule are used to determine the enabling status of each processing segment, the connection relationship between each processing segment, the data transmission relationship corresponding to each processing segment, and the cache resource usage relationship corresponding to each processing segment, respectively.
[0045] Based on the operator fusion order in the fusion execution strategy and the actual query requirements in the database query request, determine whether to enable the filtering processing segment, join processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment. Specifically, the corresponding processing segment is enabled when the database query request contains the corresponding processing requirement, and disabled when the database query request does not contain the corresponding processing requirement. The filtering processing segment is used to perform filtering condition judgment processing, the join processing segment is used to perform inter-table field matching processing, the aggregation processing segment is used to perform grouping statistics or summary calculation processing, the projection processing segment is used to perform target field retention processing, and the deduplication processing segment is used to perform duplicate record elimination processing.
[0046] Based on the operator fusion order in the fusion execution strategy, a sequential arrangement relationship is established for each enabled processing segment. Specifically, when there is a sequential dependency relationship between the query nodes corresponding to two processing segments, the two processing segments are configured as a contiguous relationship according to the sequential dependency relationship. When there is no direct dependency relationship between the query nodes corresponding to two processing segments, and the fusion execution strategy allows parallel execution, the two processing segments are configured as a parallel relationship. When the output of a certain processing segment needs to be supplied to multiple subsequent processing segments simultaneously, the processing segment is configured as a branch output relationship.
[0047] Each enabled processing segment is mapped to at least one hardware operator module, enabling each hardware operator module to perform the same query processing function as the corresponding processing segment. The hardware operator module is a configurable logic processing unit deployed in a field-programmable gate array, used to perform field judgment, field matching, field summarization, field selection, or duplicate record discrimination operations on the input data according to a preset control relationship.
[0048] All processing segments whose enable status, serial connection relationships, parallel relationships, and branch output relationships have been determined, together with their corresponding hardware operator modules, are organized into a fusion operator pipeline structure. This fusion operator pipeline structure is then output to step S320 as the structural basis for configuring the data input channel, inter-stage transmission channel, on-chip cache, and result output channel.
[0049] S320. Read the fusion operator pipeline structure output in step S310, and read the data bypass path and on-chip cache allocation scheme corresponding to the fusion operator pipeline structure in the fusion execution strategy; wherein, the data bypass path is used to indicate the transmission path of the data block to be processed or the intermediate processing result to bypass part of the processing segment when the preset conditions are met, and the on-chip cache allocation scheme is used to indicate the allocation location, allocation range, access method and priority of on-chip cache resources among each processing segment.
[0050] A data input channel is configured for the first-level processing segment in the fusion operator pipeline structure, enabling external input data to enter the first-level processing segment; an inter-stage transmission channel is configured between adjacent processing segments, enabling the output data of the preceding processing segment to be transmitted to the following processing segment; and a result output channel is configured for the final-level processing segment, enabling the target result data after processing to be output to the host side or external storage side. Among these, the data input channel, inter-stage transmission channel, and result output channel are all established according to the predetermined connection relationship between processing segments, without changing the serial connection relationship, parallel relationship, and branch output relationship determined in step S310.
[0051] For each data bypass path, a bypass start processing segment and a bypass target processing segment are determined, and a direct data transmission path is established between the output of the bypass start processing segment and the input of the bypass target processing segment. Specifically, when a data block to be processed or an intermediate processing result meets the bypass conditions specified in the fusion execution strategy, the data block to be processed or the intermediate processing result is controlled to no longer enter the bypassed processing segment, but is directly transmitted to the bypass target processing segment via the corresponding data bypass path, so as to reduce transmission waiting and duplicate processing in unnecessary processing.
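As a non-limiting illustration, the routing decision for a data bypass path can be sketched as follows. The function name, the dictionary layout of the bypass paths, and the example condition are hypothetical; the sketch assumes a purely serial arrangement of enabled processing segments.

```python
def next_segment(current, order, bypass_paths, block):
    """Return the segment that should receive `block` after `current`.

    order        -- enabled segments in serial order (assumed layout)
    bypass_paths -- maps a bypass start segment to (bypass target, condition)
    block        -- data block or intermediate result being routed
    """
    if current in bypass_paths:
        target, condition = bypass_paths[current]
        if condition(block):
            # Bypass condition met: skip the bypassed segments entirely.
            return target
    idx = order.index(current)
    # Otherwise continue to the adjacent segment, or finish at the last one.
    return order[idx + 1] if idx + 1 < len(order) else None


order = ["filter", "join", "aggregate", "project"]
bypass_paths = {"filter": ("project", lambda b: not b["needs_join"])}
```

With this configuration, a block that does not need joining goes from the filtering processing segment straight to the projection processing segment, while all other blocks follow the normal serial path.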
[0052] The on-chip storage resources in the field-programmable gate array (FPGA) are divided according to the processing segment granularity or the data type granularity to form multiple on-chip cache areas. When divided by processing segment granularity, each on-chip cache area corresponds to one processing segment or a group of collaborative processing segments. When divided by data type granularity, different on-chip cache areas correspond to one or more of the following: data to be reused, intermediate index data, local aggregation results, and deduplication intermediate results. Then, according to the on-chip cache allocation scheme, each on-chip cache area is allocated to the corresponding processing segment, so that each processing segment has a local cache space that matches its processing object.
[0053] For each on-chip cache, its write source processing segment, read target processing segment, cache data type, and access priority are further configured. The write source processing segment indicates which processing segment will write data to the on-chip cache, the read target processing segment indicates which processing segment will read data from the on-chip cache, the cache data type limits whether the on-chip cache stores data to be reused, intermediate index data, local aggregation results, or deduplication intermediate results, and the access priority limits the access order when multiple processing segments compete to access the same on-chip cache. This enables each processing segment to complete on-chip cache writing and on-chip cache reading according to the data flow specified by the fusion execution strategy.
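As a non-limiting illustration, the per-cache configuration described above can be sketched as a small record with an arbitration helper. The class and field names are hypothetical, and the convention that a lower number means higher access priority is an assumption of this example.

```python
from dataclasses import dataclass
from enum import Enum


class CacheDataType(Enum):
    REUSE = "data to be reused"
    INDEX = "intermediate index data"
    LOCAL_AGG = "local aggregation results"
    DEDUP = "deduplication intermediate results"


@dataclass
class CacheRegion:
    name: str
    size_bytes: int
    write_source: str          # segment that writes to this region
    read_targets: tuple        # segments that read from this region
    data_type: CacheDataType   # what kind of data the region may hold
    priority: dict             # segment -> priority, lower wins (assumption)

    def arbitrate(self, requesters):
        """Order competing segments by access priority, dropping any
        segment that is not configured to access this region."""
        allowed = [s for s in requesters
                   if s == self.write_source or s in self.read_targets]
        return sorted(allowed, key=lambda s: self.priority.get(s, float("inf")))


region = CacheRegion(
    name="agg_cache",
    size_bytes=4096,
    write_source="aggregate",
    read_targets=("project", "dedup"),
    data_type=CacheDataType.LOCAL_AGG,
    priority={"aggregate": 0, "project": 1, "dedup": 2},
)
```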
[0054] After the data input channel, inter-stage transmission channel, data bypass path, on-chip cache, and result output channel have been configured, the channel and cache configuration results corresponding to the fusion operator pipeline structure are generated and output to step S330 as the input basis for performing hardware resource mapping and generating execution configuration data.
[0055] S330: Read the fusion operator pipeline structure output in step S310 and the channel and cache configuration results output in step S320. Map each hardware operator module in the fusion operator pipeline structure to the logic resources in the field-programmable gate array (FPGA). Map each on-chip cache area in the channel and cache configuration results to the on-chip storage resources in the FPGA. Map the data input channel, inter-stage transmission channel, data bypass path, and result output channel to the interface resources in the FPGA. The logic resources are used to carry the query processing operation logic, the on-chip storage resources are used to carry the local data caching logic, and the interface resources are used to carry the external input / output transmission logic and the data exchange logic between processing segments.
[0056] After completing the above resource mapping, the connection relationships between each hardware operator module, each on-chip buffer, and each transmission channel are deployed and corrected to ensure that the input end of each hardware operator module can receive data from a predetermined source, and the output end of each hardware operator module can send the processing result to the subsequent processing segment, data bypass path, or result output channel according to the established connection relationship. Specifically, when a hardware operator module corresponds to a parallel processing relationship, it is configured with parallel input-output connections; when a hardware operator module corresponds to a branch output relationship, it is configured with multiple output connections; and when a hardware operator module corresponds to a bypassed target processing segment, it is configured with input connections from the corresponding data bypass path.
[0057] Based on the enable status of each processing segment, the connection relationships between the processing segments, the start and target positions of each data bypass path, the allocation results of each on-chip cache, and the configuration results of the result output channel, execution configuration data corresponding to this deployment is generated. The execution configuration data is organized by processing segment, and each processing segment corresponds to one set of configuration records. Each set of configuration records includes at least: a processing segment enable flag, an input source flag, an output destination flag, a data bypass control flag, an on-chip cache access flag, and a result output control flag.
[0058] Among them, the processing segment enable flag indicates whether the corresponding processing segment participates in the processing during the execution of the current database query request; the input source flag indicates which one or more of the following: the input data of the corresponding processing segment comes from the data input channel, the output of the preceding processing segment, or the input of the data bypass path; the output destination flag indicates which one or more of the following: the output data of the corresponding processing segment is sent to the following processing segment, the data bypass path, or the result output channel; the data bypass control flag indicates whether the corresponding processing segment allows data that meets the bypass conditions to be imported into the data bypass path; the on-chip cache access flag indicates which on-chip cache the corresponding processing segment accesses and whether it uses the read mode, write mode, or read-write parallel mode; and the result output control flag indicates whether the output of the corresponding processing segment can be directly used as the target result data for output.
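As a non-limiting illustration, one set of configuration records for a processing segment can be sketched as follows. The type names, the access mode strings, and the consistency rule in `validate` are hypothetical and are not prescribed by this description.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Source(Flag):
    INPUT_CHANNEL = auto()
    PRECEDING_SEGMENT = auto()
    BYPASS_PATH = auto()


class Destination(Flag):
    FOLLOWING_SEGMENT = auto()
    BYPASS_PATH = auto()
    RESULT_CHANNEL = auto()


@dataclass
class SegmentConfig:
    segment: str
    enabled: bool                    # processing segment enable flag
    input_source: Source             # one or more input sources
    output_destination: Destination  # one or more output destinations
    bypass_allowed: bool             # data bypass control flag
    cache_access: dict               # cache name -> "read" | "write" | "read_write"
    result_output: bool              # result output control flag


def validate(cfg):
    """Assumed consistency rule: a segment flagged for direct result output
    must also list the result output channel as a destination."""
    if cfg.result_output and Destination.RESULT_CHANNEL not in cfg.output_destination:
        raise ValueError("result output flagged without a result channel")
    for mode in cfg.cache_access.values():
        if mode not in ("read", "write", "read_write"):
            raise ValueError(f"unknown cache access mode: {mode}")
    return cfg


project_cfg = validate(SegmentConfig(
    segment="project",
    enabled=True,
    input_source=Source.PRECEDING_SEGMENT | Source.BYPASS_PATH,
    output_destination=Destination.RESULT_CHANNEL,
    bypass_allowed=False,
    cache_access={"agg_cache": "read"},
    result_output=True,
))
```

Using flag enumerations lets a single field express "one or more" sources or destinations, which matches the wording of the input source flag and output destination flag.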
[0059] The execution configuration data is written to the control register unit or configuration storage unit so that the subsequent step S410 can control the start and stop, data flow direction, bypass selection and cache access behavior of each processing segment according to the execution configuration data when importing the data block to be processed; and the execution configuration data is output to step S410 as the direct control input for the accelerated data processing based on the pipelined execution of the fusion operator.
[0060] S4 specifically includes the following sub-steps: S410. Read the execution configuration data output in step S330 and call the data interface connected to the field programmable gate array to read the original data from the data source table corresponding to the database query request; wherein, the original data includes at least the data records and corresponding field values required for filtering, joining, aggregation, projection and deduplication processes.
[0061] The read raw data is divided into multiple data blocks to be processed according to a preset number of records, field width, or interface transmission width, so that each data block meets the capacity requirements of a single transmission and processing of the field-programmable gate array. Specifically, when the preset number of records is used for division, each data block contains a fixed number of data records; when the field width is used for division, each data block contains a set of data records whose total field width does not exceed the preset upper limit of bit width; when the interface transmission width is used for division, each data block contains a set of data records that can be completely transmitted in one or a limited number of interface transmission cycles.
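As a non-limiting illustration, two of the division modes above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the final block may be shorter than the preset size when the data does not divide evenly.

```python
def split_by_count(records, n):
    """Divide records into blocks of n records (the last block may be shorter)."""
    if n <= 0:
        raise ValueError("block size must be positive")
    return [records[i:i + n] for i in range(0, len(records), n)]


def split_by_width(records, width_of, max_bits):
    """Divide records into blocks whose total field width stays within max_bits.

    width_of -- function returning the bit width of one record (assumed helper)
    """
    blocks, current, used = [], [], 0
    for rec in records:
        w = width_of(rec)
        if w > max_bits:
            raise ValueError("single record exceeds the bit-width limit")
        if current and used + w > max_bits:
            # Current block is full: close it and start a new one.
            blocks.append(current)
            current, used = [], 0
        current.append(rec)
        used += w
    if current:
        blocks.append(current)
    return blocks
```

For example, `split_by_count(list(range(5)), 2)` yields three blocks, the last of which holds a single record.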
[0062] Based on the processing segment enable flag and input source flag in the execution configuration data, the primary processing segment to which each data block to be processed enters is determined. When there is only one primary processing segment, the corresponding data block to be processed is directly imported into that primary processing segment. When there are multiple primary processing segments and the execution configuration data indicates a parallel processing relationship, the same data block to be processed is copied and distributed to multiple primary processing segments, or the same data block to be processed is split and distributed to multiple primary processing segments according to field category or record category. When the execution configuration data indicates a branch input relationship, the corresponding data block to be processed is sent to the primary processing segments of each branch respectively.
[0063] Based on the data bypass control flag and on-chip cache access flag in the execution configuration data, corresponding data input paths and on-chip cache access paths are established for each imported data block to be processed. The data input path is used to determine whether the data block to be processed enters the first-level processing segment via the data input channel or enters the non-first-level processing segment via the data bypass path. The on-chip cache access path is used to determine whether the data block to be processed needs to read the reusable data, intermediate index data, local aggregation results, or deduplication intermediate results from the corresponding on-chip cache before entering the corresponding processing segment.
[0064] After the paths are determined, each data block to be processed is imported, according to its processing requirements, into at least one of the filtering processing segment, join processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment. The imported data blocks to be processed are then output to step S420 as input objects for the fusion operator pipelined processing.
[0065] S420: Read each data block to be processed imported in step S410, and control each processing segment in the fusion operator pipeline to perform corresponding processing on the data block to be processed according to the execution configuration data output in step S330; wherein, each processing segment performs data processing according to the serial connection relationship, parallel relationship or branch output relationship determined in step S310, and the processing method of each processing segment on the data block to be processed is consistent with its corresponding query processing function.
[0066] When a data block to be processed enters the filtering processing segment, the hardware operator module in the filtering processing segment is called to evaluate the filtering condition on each data record in the data block, remove the data records that do not meet the filtering condition, and retain the data records that meet it, thereby generating the filtering processing result. The filtering processing result serves as one of the inputs to the subsequent join processing segment, aggregation processing segment, projection processing segment, or deduplication processing segment.
[0067] When a block of data to be processed or a result of filtering enters the join processing segment, the hardware operator module in the join processing segment is called to perform field matching processing on the data records from different data source tables according to the join field set and join direction, generating associated records that satisfy the join relationship, forming the result of the join processing; when the join processing segment needs to call intermediate index data, it reads the required intermediate index data from the corresponding on-chip cache, and uses the read intermediate index data together with the current input data for field matching processing.
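As a non-limiting illustration, field matching that reuses intermediate index data can be sketched in software as a build-and-probe join. The function names are hypothetical, the dictionary stands in for the on-chip cache holding the intermediate index data, and inner-join semantics on a single join field are assumed.

```python
from collections import defaultdict


def build_index(rows, key):
    """Build intermediate index data: join key -> matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def probe_join(block, index, key):
    """Match each record of the incoming block against the cached index
    and emit one associated record per match (inner join assumed)."""
    joined = []
    for rec in block:
        for match in index.get(rec[key], []):
            joined.append({**match, **rec})
    return joined


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 1, "amount": 10}, {"id": 3, "amount": 5}]
index = build_index(customers, "id")
result = probe_join(orders, index, "id")
```

Only the order with `id` 1 has a matching customer, so the join processing result contains a single associated record.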
[0068] When a data block to be processed, a filtering processing result, or a join processing result enters the aggregation processing segment, the hardware operator module in the aggregation processing segment is called to perform grouping statistics or summary calculations on the input data according to the grouping field set, generating a local aggregation result that forms the aggregation processing result. When multiple data blocks to be processed share the same grouping key in the aggregation processing segment, the corresponding local aggregation result is written to the allocated on-chip cache; when subsequent data with the same grouping key arrives, the historical local aggregation result is read from the on-chip cache and the aggregation update continues from it.
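As a non-limiting illustration, continuing an aggregation across data blocks by way of cached local aggregation results can be sketched as follows. The function name is hypothetical, a plain dictionary stands in for the allocated on-chip cache, and a sum is assumed as the summary calculation.

```python
def aggregate_block(block, group_key, value_key, cache):
    """Update cached local aggregation results with one data block.

    cache -- dict standing in for the on-chip cache area (grouping key -> sum)
    """
    for rec in block:
        key = rec[group_key]
        # Read the historical local result (if any) and continue the update.
        cache[key] = cache.get(key, 0) + rec[value_key]
    return cache


cache = {}
aggregate_block([{"dept": "x", "pay": 10}, {"dept": "y", "pay": 5}], "dept", "pay", cache)
aggregate_block([{"dept": "x", "pay": 7}], "dept", "pay", cache)
```

After the second block, the entry for grouping key "x" reflects both blocks, while the entry for "y" is unchanged.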
[0069] When a data block to be processed, a filtering processing result, a join processing result, or an aggregation processing result enters the projection processing segment, the hardware operator module in the projection processing segment is called to retain the field values corresponding to the target field set and delete the field values corresponding to non-target fields, forming the projection processing result. When the input data enters the deduplication processing segment, the hardware operator module in the deduplication processing segment is called to identify duplicate records according to the deduplication field set, delete the duplicate records, and retain the unique records, forming the deduplication processing result.
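As a non-limiting illustration, projection and deduplication can be sketched as follows. The function names are hypothetical, and the `seen` set stands in for deduplication intermediate results kept between data blocks.

```python
def project(records, target_fields):
    """Keep only the field values of the target field set."""
    return [{f: rec[f] for f in target_fields} for rec in records]


def deduplicate(records, dedup_fields, seen=None):
    """Keep the first record for each combination of deduplication fields.

    seen -- optional set carried across blocks (assumed stand-in for the
            deduplication intermediate results held in an on-chip cache)
    """
    seen = set() if seen is None else seen
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in dedup_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


rows = [
    {"id": 1, "city": "A", "note": "x"},
    {"id": 2, "city": "A", "note": "y"},
    {"id": 3, "city": "B", "note": "z"},
]
projected = project(rows, ["city"])
unique_cities = deduplicate(projected, ["city"])
```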
[0070] When processing segments are connected in series, all processing results from the preceding processing segment are used as input for the following processing segment. When processing segments are connected in parallel, the same data block to be processed is copied and input into multiple parallel processing segments, or it is split according to field category or record category and input into multiple parallel processing segments. When processing segments are connected in a branching output relationship, the same processing result is sent to multiple subsequent processing segments simultaneously. Each processing segment processes each data block to be processed and forms a corresponding local processing output, which is recorded as an intermediate processing result.
[0071] All intermediate processing results output by each processing segment are recorded according to the processing segment order, processing branch and data source, and the intermediate processing results are output to step S430 as input objects for performing intermediate result suppression processing and generating target result data.
[0072] S430. Read the intermediate processing results output in step S420 and read the intermediate result suppression rules in the fusion execution strategy. Perform suppression condition judgment on each intermediate processing result. The intermediate result suppression rules include at least one or more of the following: the data record number threshold of the intermediate processing result, the intermediate result expansion threshold corresponding to the current processing segment, the on-chip cache occupancy rate threshold, and the repetition mode judgment condition.
[0073] Let the number of data records input to the current processing segment be denoted as $N_{in}$, the number of data records in the intermediate processing result output by the current processing segment as $N_{out}$, and the intermediate result inflation rate corresponding to the current processing segment as $R$. The intermediate result inflation rate $R$ is defined as:

$$R = \frac{N_{out}}{N_{in}}$$

When $R$ is greater than a preset inflation threshold, the current intermediate processing result is deemed to have an inflation risk; when $R$ is not greater than the preset inflation threshold, the current intermediate processing result is deemed to have no significant inflation risk. The inflation rate determination for the intermediate result is used as one of the bases for triggering intermediate result suppression processing.
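As a non-limiting illustration, the inflation check can be sketched as follows. The sketch assumes that the inflation rate is the ratio of the output record count to the input record count of the current processing segment; the function names and the example threshold are hypothetical.

```python
def inflation_rate(n_in, n_out):
    """Assumed definition: output record count divided by input record count."""
    if n_in <= 0:
        raise ValueError("input record count must be positive")
    return n_out / n_in


def has_inflation_risk(n_in, n_out, threshold):
    """Risk exists only when the rate is strictly greater than the threshold."""
    return inflation_rate(n_in, n_out) > threshold


# A join that turns 1000 input records into 5000 associated records
# exceeds an example threshold of 2.0; a filter that shrinks the data does not.
join_risky = has_inflation_risk(1000, 5000, threshold=2.0)
filter_risky = has_inflation_risk(1000, 200, threshold=2.0)
```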
[0074] When an intermediate processing result meets the truncation condition in the intermediate result suppression rule, truncation processing is performed on the intermediate processing result. The truncation processing is used to retain only the data records that meet the priority condition, only the first preset number of data records, or only the data records that meet the target result generation condition, and delete the remaining data records that will no longer participate in subsequent processing, so as to reduce the amount of data received by the subsequent processing segment.
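As a non-limiting illustration, truncation processing can be sketched as follows. The function name and parameters are hypothetical; the sketch assumes that a priority condition, when present, is applied before the record-count limit.

```python
def truncate(records, limit=None, keep=None):
    """Retain only records meeting the priority condition `keep` (if given),
    then only the first `limit` records (if given)."""
    kept = [r for r in records if keep(r)] if keep is not None else list(records)
    return kept if limit is None else kept[:limit]


first_two = truncate([1, 2, 3, 4, 5], limit=2)
first_two_odd = truncate([1, 2, 3, 4, 5], limit=2, keep=lambda r: r % 2 == 1)
```

Records dropped here no longer reach the subsequent processing segment, which is the intended reduction in downstream data volume.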
[0075] When an intermediate processing result meets the compression condition in the intermediate result suppression rule, compression processing is performed on the intermediate processing result. The compression processing applies field compression, record compression, or key-value encoding compression to redundant fields, duplicate fields, or coded fields in the intermediate processing result, and the compressed intermediate processing result is written to the corresponding on-chip cache or transmitted directly to the subsequent processing segment, so as to reduce the amount of data transmitted and the on-chip cache usage.
[0076] When an intermediate processing result meets the bypass condition in the intermediate result suppression rule, bypass processing is performed on the intermediate processing result. The bypass processing determines whether a subsequent processing segment has any necessary processing effect on the current intermediate processing result. When it is determined that the subsequent processing segment has no necessary processing effect, the intermediate processing result is controlled not to enter that subsequent processing segment and is instead transmitted directly to the corresponding bypass target processing segment through the data bypass path configured in step S320, so as to reduce unnecessary duplicate processing.
[0077] When an intermediate processing result meets the merging condition in the intermediate result suppression rule, a merging process is performed on the intermediate processing result. The merging process is used to merge multiple intermediate processing results with the same join key, the same grouping key, the same deduplication key, or the same combination of target fields into one or more merged results, and use the merged result as the input of the subsequent processing segment or as the candidate output result, so as to reduce the number of duplicate records and the degree of expansion of intermediate results.
[0078] Data records that still satisfy the semantics of the database query request after truncation, compression, bypass, or merging are output as the target result data. The following are not output directly as target result data: data records deleted by truncation, data records that serve only as compression intermediates and have not entered the target output path, data records that did not enter the target output path because bypassing failed, and original intermediate processing results that have become invalid after merging.
[0079] The target result data is transmitted to step S510 as the result input for collecting execution status information and generating execution feedback data.
[0080] S5 specifically includes the following sub-steps: S510. Read the target result data output in step S430, and collect the execution status information corresponding to this database query request during the output of the target result data and during the execution of each processing segment; wherein, the execution status information includes at least the actual execution latency, data throughput, on-chip cache utilization, inter-stage waiting status, intermediate result compression effect, and result output volume.
[0081] For each processing segment, corresponding stage-level execution status information is collected. This stage-level execution status information includes at least the data input time, data output time, processing segment wait start time, processing segment wait end time, on-chip cache access count, on-chip cache usage, number of data records before compression, number of data records after compression, and the amount of data output by the processing segment. After the target result data is output, all stage-level execution status information is summarized to form the overall execution status information corresponding to this database query request.
[0082] Among them, actual execution latency is used to characterize the time length from the first data block to be processed entering the first-level processing segment to the completion of the output of all target result data; data throughput is used to characterize the amount of data processed per unit time; on-chip cache utilization rate is used to characterize the ratio of the actual utilization of the allocated on-chip cache area to the total available on-chip cache; inter-stage waiting status includes at least inter-stage waiting time and inter-stage waiting number; intermediate result compression effect includes at least one or more of the following: number of data records before compression, number of data records after compression, and change in total field width before and after compression; result output quantity is used to characterize the number of output data records or the total number of output fields in the target result data.
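As a non-limiting illustration, summarizing stage-level information into overall execution status information can be sketched as follows. The class and field names are hypothetical, and the sketch approximates the actual execution latency as the span from the earliest data input time to the latest data output time across all processing segments.

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    segment: str
    input_time: float       # seconds
    output_time: float
    wait_start: float
    wait_end: float
    cache_bytes_used: int


def summarize(stages, total_cache_bytes, bytes_processed):
    """Combine stage-level figures into overall execution status information."""
    latency = max(s.output_time for s in stages) - min(s.input_time for s in stages)
    return {
        "latency": latency,
        "throughput": bytes_processed / latency,  # data per unit time
        "cache_utilization": sum(s.cache_bytes_used for s in stages) / total_cache_bytes,
        "wait_time": sum(s.wait_end - s.wait_start for s in stages),
        "wait_count": sum(1 for s in stages if s.wait_end > s.wait_start),
    }


stages = [
    StageStats("filter", 0.0, 2.0, 0.0, 0.0, 100),
    StageStats("aggregate", 2.0, 5.0, 2.0, 2.5, 300),
]
overall = summarize(stages, total_cache_bytes=1000, bytes_processed=1000)
```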
[0083] The overall execution status information is bound to the identifier of this database query request, the identifier of the fusion execution strategy adopted, the identifier of the execution configuration data adopted, and the identifier of the target result data to generate an original feedback record that corresponds one-to-one with this database query request; and the original feedback record is output to step S520 as the input object for generating execution feedback data.
[0084] S520. Read the original feedback record output in step S510, and based on the database query request identifier, fusion execution strategy identifier, execution configuration data identifier, and target result data identifier bound in the original feedback record, retrieve the query feature description data, fusion execution strategy, execution configuration data, and result scale information of the target result data corresponding to the original feedback record from the corresponding storage area; wherein, the result scale information includes at least one or more of the following: the number of output data records, the number of output fields, and the total width of the output fields in the target result data.
[0085] Based on the database query request identifier, the query feature description data, fusion execution strategy, execution configuration data, execution status information, and result scale information of the target result data in the original feedback record are subjected to field alignment and sequential concatenation processing. Among them, field alignment processing is used to make various fields corresponding to the same database query request correspond to each other according to a unified field name and a unified field position, and sequential concatenation processing is used to combine the feature field, strategy field, execution configuration field, performance field, and result field into a single structured feedback record in a preset field order.
[0086] A single structured feedback record after field alignment and sequential concatenation is defined as execution feedback data. Execution feedback data consists of a structured feedback record that corresponds one-to-one with a single database query request. Each structured feedback record includes at least: a feature field group representing query node characteristics and data statistical characteristics; a strategy field group representing the content of the fusion execution strategy; an execution configuration field group representing the content of the execution configuration data; a performance field group representing the content of the execution status information; and a result field group representing the content of the target result data scale.
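As a non-limiting illustration, field alignment and sequential concatenation can be sketched as follows. The group names, the dotted field naming scheme, and the alphabetical ordering of fields within a group are hypothetical choices made only for this example.

```python
# Preset field group order (assumed names for the five field groups).
FIELD_ORDER = ("feature", "strategy", "config", "performance", "result")


def build_feedback_record(query_id, groups):
    """Concatenate the five field groups into one structured feedback record.

    groups -- dict mapping each group name to a dict of its fields
    """
    missing = [g for g in FIELD_ORDER if g not in groups]
    if missing:
        raise KeyError(f"missing field groups: {missing}")
    record = {"query_id": query_id}
    for group in FIELD_ORDER:
        # Unified field names: "<group>.<field>", in a fixed order.
        for name in sorted(groups[group]):
            record[f"{group}.{name}"] = groups[group][name]
    return record


record = build_feedback_record("q-001", {
    "performance": {"latency": 1.5},
    "feature": {"nodes": 3},
    "result": {"rows": 10},
    "config": {"id": 7},
    "strategy": {"order": "filter>join"},
})
```

Because Python dictionaries preserve insertion order, the resulting record always lists the feature, strategy, execution configuration, performance, and result fields in the preset order, regardless of the order in which the groups were supplied.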
[0087] The execution feedback data thus fully represents the correspondence "query feature description data - fusion execution strategy - execution configuration data - execution status information - target result data". The execution feedback data is written to the feedback data storage area and output to step S530 as the input object for high-performance sample screening and for updating the historical high-performance query execution samples.
[0088] S530. Read the execution feedback data output in step S520. First, write the execution feedback data into the candidate sample area. Then, perform high-performance sample judgment on the execution feedback data in the candidate sample area according to the preset sample screening conditions. The preset sample screening conditions include at least one or more of the following: actual execution latency is lower than a set threshold, data throughput is higher than a set threshold, on-chip cache occupancy is within a set range, intermediate result compression effect meets set requirements, and result output meets the semantic requirements of database query request results.
[0089] Let the actual execution latency of this database query request be denoted as $T_{act}$, the data throughput as $Q_{act}$, the on-chip cache utilization as $C_{act}$, the intermediate result compression effect value as $E$, and the overall performance score corresponding to this execution feedback data as $G$. The overall performance score $G$ is defined as a weighted combination of $T_{act}$, $Q_{act}$, $C_{act}$, and $E$, in which $\lambda_1$ is the weighting coefficient for the execution latency in the feedback evaluation, $\lambda_2$ is the weighting coefficient for the data throughput in the feedback evaluation, $\lambda_3$ is the weighting coefficient for the on-chip cache utilization in the feedback evaluation, and $\lambda_4$ is the weighting coefficient for the intermediate result compression effect. The overall performance score $G$ characterizes the overall performance level of the current execution feedback data in terms of processing efficiency, processing throughput, cache resource usage, and intermediate result suppression effect.
[0090] The comprehensive performance score is compared with a preset sample retention threshold. When the comprehensive performance score is greater than or equal to the preset sample retention threshold, the corresponding execution feedback data is judged to be a newly added high-performance query execution sample; when the comprehensive performance score is less than the preset sample retention threshold, the corresponding execution feedback data is not transferred to the historical high-performance query execution sample set. Specifically, execution feedback data judged to be a newly added high-performance query execution sample is transferred from the candidate sample area to the historical high-performance query execution sample set, and a database query request identifier, a sample generation time identifier, a fusion execution strategy identifier, and a comprehensive performance score identifier are written for the newly added sample.
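The retention decision can be sketched in Python as follows. This is a hedged illustration, not the scoring expression of this application: it assumes a plain weighted sum over four metrics that have already been normalized to the range [0, 1] so that larger is better, and the weights, the threshold, and all names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weighting coefficients and retention threshold (assumptions).
WEIGHTS = {"latency": 0.4, "throughput": 0.3, "cache": 0.2, "compression": 0.1}
RETENTION_THRESHOLD = 0.7


def comprehensive_score(metrics: Dict[str, float]) -> float:
    """Assumed form: weighted sum of metrics normalized to [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


@dataclass
class RetainedSample:
    """Identifiers written for a newly added high-performance sample."""
    query_request_id: str
    generated_at: float
    fusion_strategy_id: str
    score: float


def screen_candidates(candidates: List[dict],
                      historical: List[RetainedSample]) -> List[dict]:
    """Move qualifying candidates into the historical set; return the rest."""
    remaining = []
    for cand in candidates:
        score = comprehensive_score(cand["metrics"])
        if score >= RETENTION_THRESHOLD:
            historical.append(RetainedSample(
                cand["query_request_id"], time.time(),
                cand["fusion_strategy_id"], score))
        else:
            remaining.append(cand)
    return remaining


candidate_area = [
    {"query_request_id": "q1", "fusion_strategy_id": "s1",
     "metrics": {"latency": 0.9, "throughput": 0.8, "cache": 0.7, "compression": 0.5}},
    {"query_request_id": "q2", "fusion_strategy_id": "s2",
     "metrics": {"latency": 0.3, "throughput": 0.4, "cache": 0.5, "compression": 0.2}},
]
historical_set: List[RetainedSample] = []
candidate_area = screen_candidates(candidate_area, historical_set)
```

With these hypothetical values, request `q1` scores about 0.79 and is transferred to the historical set, while `q2` scores about 0.36 and stays in the candidate sample area.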
[0091] The sample index of the updated historical high-performance query execution sample set is then updated, so that the newly added high-performance query execution samples can be retrieved by query node characteristics, data statistical characteristics, fusion execution strategy type, or comprehensive performance score range. Next, the updated historical high-performance query execution sample set is written into the training sample pool or sample retrieval library corresponding to the imitation learning strategy model, so that it can be called when step S210 subsequently infers an execution strategy or when the model is retrained. Subsequent database query requests can thus obtain a more suitable fusion execution strategy based on the updated historical high-performance query execution samples.
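A sample index of this kind might look like the following Python sketch. It is only an assumption about one possible layout: it covers retrieval by fusion execution strategy type and by comprehensive performance score range, and the class name, the sample fields, and the in-memory dictionary are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class SampleIndex:
    """Hypothetical index over historical high-performance samples."""

    def __init__(self) -> None:
        self._samples: List[dict] = []
        self._by_strategy_type: Dict[str, List[int]] = defaultdict(list)

    def add(self, sample: dict) -> None:
        """Store a sample and register it under its strategy type."""
        self._samples.append(sample)
        position = len(self._samples) - 1
        self._by_strategy_type[sample["strategy_type"]].append(position)

    def query(self, strategy_type: Optional[str] = None,
              min_score: float = float("-inf"),
              max_score: float = float("inf")) -> List[dict]:
        """Return samples matching the strategy type and score range."""
        if strategy_type is None:
            positions = range(len(self._samples))
        else:
            positions = self._by_strategy_type.get(strategy_type, [])
        return [self._samples[i] for i in positions
                if min_score <= self._samples[i]["score"] <= max_score]


index = SampleIndex()
index.add({"id": "q1", "strategy_type": "filter-join", "score": 0.79})
index.add({"id": "q3", "strategy_type": "filter-aggregate", "score": 0.91})
index.add({"id": "q4", "strategy_type": "filter-join", "score": 0.72})

join_hits = index.query(strategy_type="filter-join", min_score=0.75)
top_hits = index.query(min_score=0.9)
```

In this sketch `join_hits` contains only sample `q1` and `top_hits` contains only sample `q3`. Retrieval by query node characteristics or data statistical characteristics could be added with further dictionaries of the same shape.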
[0092] The updated historical high-performance query execution sample set is output as the basic sample input source for step S210, completing the closed-loop update process of "target result data output - execution status information collection - execution feedback data generation - high-performance sample screening - historical high-performance query execution sample update - imitation learning strategy model invocation".
[0093] All of the above formulas are evaluated on dimensionless numerical values. The formulas are empirical models that approximate real conditions, obtained by fitting to extensive collected data and software simulation. The preset parameters and thresholds involved in the formulas can be set and adjusted conventionally by those skilled in the art according to the physical constraints of the actual application scenario.
[0094] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0095] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and modules described above are not repeated here; reference may be made to the corresponding processes in the foregoing method embodiments.
[0096] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a logical functional division, and other divisions are possible in actual implementation; multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
[0097] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0098] In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A data acceleration processing method based on field-programmable gate arrays, characterized in that it comprises the following steps: S1. Obtain the database query request, parse the query plan, extract query structure features and basic data statistics, and generate query feature description data; S2. Input the query feature description data into the imitation learning strategy model to generate a fusion execution strategy, wherein the fusion execution strategy includes an operator fusion order, a data bypass path, an on-chip cache allocation scheme, and intermediate result suppression rules; S3. Based on the fusion execution strategy, construct and deploy the fusion operator pipeline, configure the data input channel, inter-stage transmission channel, on-chip cache, and result output channel, and generate execution configuration data; S4. Based on the execution configuration data, import the data blocks to be processed into the fusion operator pipeline, perform filtering, joining, aggregation, projection, and deduplication processing, and generate the target result data according to the intermediate result suppression rules.
2. The data acceleration processing method based on field-programmable gate arrays according to claim 1, characterized in that it further comprises: S5. Collect the execution status information corresponding to the target result data, generate execution feedback data, and update the historical high-performance query execution samples based on the execution feedback data, for use by the imitation learning strategy model in subsequently generating fusion execution strategies.
3. The data acceleration processing method based on field-programmable gate arrays according to claim 1, characterized in that S1 specifically includes: obtaining the database query request, extracting the query plan corresponding to the database query request, and converting the query plan into a query plan association structure composed of query nodes and dependency edges; analyzing the query plan association structure, identifying the filtering nodes, joining nodes, aggregation nodes, projection nodes, and deduplication nodes, and extracting the execution feature items corresponding to each query node to generate a query structure feature set; and reading the basic data statistics corresponding to the database query request, and mapping the basic data statistics to the feature units of each node in the query structure feature set to generate the query feature description data.
4. The data acceleration processing method based on field-programmable gate arrays according to claim 1, characterized in that S2 specifically includes: inputting the query feature description data into the imitation learning strategy model to generate multiple candidate strategy codes; parsing the candidate strategy codes to generate a candidate execution strategy set, wherein each candidate execution strategy in the candidate execution strategy set includes a candidate operator fusion order, a candidate data bypass path, a candidate on-chip cache allocation scheme, and candidate intermediate result suppression rules; and subjecting the candidate execution strategy set to dependency consistency verification, data path consistency verification, result semantic preservation verification, and deployability verification, and determining the fusion execution strategy from the candidate execution strategies that pass the verification.
5. The data acceleration processing method based on field-programmable gate arrays according to claim 1, characterized in that S3 specifically includes: based on the fusion execution strategy, determining the fusion operator pipeline structure, including which of the filtering processing segment, joining processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment are enabled and how the enabled segments are connected; based on the fusion execution strategy, configuring the data input channel, inter-stage transmission channel, data bypass path, on-chip cache, and result output channel corresponding to the fusion operator pipeline structure, and generating channel and cache configuration results; and based on the fusion operator pipeline structure and the channel and cache configuration results, mapping each hardware operator module to logic resources, mapping each on-chip cache to on-chip storage resources, mapping the data input channels, inter-stage transmission channels, data bypass paths, and result output channels to interface resources, and generating the execution configuration data.
6. The data acceleration processing method based on field-programmable gate arrays according to claim 1, characterized in that S4 specifically includes: based on the execution configuration data, dividing the original data into data blocks to be processed, and importing the data blocks to be processed into at least one of the corresponding filtering processing segment, joining processing segment, aggregation processing segment, projection processing segment, and deduplication processing segment; and controlling the fusion operator pipeline to perform filtering, joining, aggregation, projection, and deduplication on the data blocks to be processed, generating intermediate processing results.
7. The data acceleration processing method based on field-programmable gate arrays according to claim 6, characterized in that it further comprises: performing truncation, compression, bypassing, or merging on the intermediate processing results according to the intermediate result suppression rules, and outputting the processed data records that satisfy the semantics of the database query request as the target result data.
8. The data acceleration processing method based on field-programmable gate arrays according to claim 2, characterized in that S5 specifically includes: collecting the execution status information corresponding to the target result data, and binding the execution status information with the database query request identifier, the fusion execution strategy identifier, the execution configuration data identifier, and the target result data identifier to generate an original feedback record; and based on the original feedback record, retrieving the corresponding query feature description data, fusion execution strategy, execution configuration data, and target result data to generate the execution feedback data.
9. The data acceleration processing method based on field-programmable gate arrays according to claim 8, characterized in that it further comprises: performing high-performance sample judgment based on the execution feedback data, and writing the execution feedback data judged to be newly added high-performance query execution samples into the historical high-performance query execution sample set, for use by the imitation learning strategy model in subsequently generating fusion execution strategies.