Data compression method, electronic device, and program product
By dynamically adjusting the upper limit on the number of aggregated data blocks and dividing data blocks into groups accordingly, the method addresses the rigid resource allocation that fixed parameters cause as system load changes, improving data compression efficiency and the adaptability of the storage system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DAWNING INFORMATION IND (BEIJING) CO LTD
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data compression technologies suffer from low efficiency due to resource contention and response delays under high system loads and resource waste under low loads.
By obtaining the current and predicted load status of the target system, the upper limit of the number of aggregated data blocks is dynamically adjusted, data block groups are divided, and merging and compression processing is performed to optimize resource utilization and compression efficiency.
It achieves an adaptive optimal balance between processing performance and storage efficiency during system load variation cycles, improving data compression efficiency and the elasticity of the storage system.
Abstract figure: CN122309470A_ABST
Description
Technical Field
[0001] This application relates to the field of data processing technology, and more particularly to a data compression method, electronic device, and program product.
Background Technology
[0002] With the acceleration of digitalization, the amount of data generated in data centers, cloud computing platforms, and big data applications is growing exponentially. In enterprise-level storage systems, for example, hundreds of millions of data blocks must be processed every day, and these data blocks usually need to be compressed to reduce the storage space they occupy.
[0003] Related compression techniques have significant bottlenecks. When the system is under high load, background merging and compression operations may preempt the resources of foreground services, causing response delays or degraded system performance. When the system load is low, fixed merging and compression parameters cannot make full use of idle resources, which wastes resources and lowers the compression ratio.
[0004] In short, resource contention occurs when the system load is high and resources are wasted when the load is low, resulting in low data compression efficiency.
Summary of the Invention
[0005] This application provides a data compression method, an electronic device, and a program product to solve the technical problem of low data compression efficiency in the related art.
[0006] In a first aspect, this application provides a data compression method applied to a target system, the method comprising:
[0007] Obtain the current and predicted load status of the target system;
[0008] Determine the current upper limit of the aggregation number based on the current load status and the predicted load status;
[0009] Based on the current upper limit of the aggregation number, divide the multiple data blocks to be processed into at least one data block group, where the number of data blocks contained in each data block group is no greater than the current upper limit of the aggregation number;
[0010] Merge and compress each data block group to obtain compressed data, and persistently store the compressed data.
[0011] In this way, a dynamic adjustment mechanism based on real-time perception of and response to system load achieves coordinated optimization of resource utilization and compression efficiency. This solves the problem of rigid resource allocation caused by a fixed aggregation number, improves the elasticity and adaptability of the storage system, and increases data compression efficiency.
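The four steps of the first aspect can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: loads are assumed to be fractions in [0, 1], `determine_upper_limit` is a hypothetical threshold rule standing in for the mapping relationship described in [0043], zlib stands in for the unspecified compression algorithm, and an in-memory dict stands in for persistent storage.

```python
import zlib

def determine_upper_limit(current_load: float, predicted_load: float) -> int:
    # Hypothetical rule: plan for the busier of "now" and "soon" (loads in [0, 1]).
    load = max(current_load, predicted_load)
    if load >= 0.8:
        return 8
    if load >= 0.5:
        return 16
    return 32

def divide_into_groups(blocks: list[bytes], limit: int) -> list[list[bytes]]:
    # Consecutive slices, so no group holds more than `limit` blocks.
    return [blocks[i:i + limit] for i in range(0, len(blocks), limit)]

def compress_and_store(blocks: list[bytes], current_load: float,
                       predicted_load: float, store: dict) -> int:
    limit = determine_upper_limit(current_load, predicted_load)
    for group in divide_into_groups(blocks, limit):
        # Merge the group into one buffer and compress it as a single unit;
        # block lengths are kept so the group can be split apart again later.
        lengths = [len(b) for b in group]
        store[len(store)] = (lengths, zlib.compress(b"".join(group)))
    return limit

store: dict = {}
used = compress_and_store([b"x" * 64] * 40, current_load=0.2, predicted_load=0.3, store=store)
print(used, len(store))  # → 32 2
```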
[0012] Optionally, in the above method, dividing the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number includes:
[0013] Obtain the historical upper limit of the aggregation number, that is, the upper limit that was in use before the current upper limit took effect;
[0014] Determine whether the current upper limit of the aggregation number is greater than the historical upper limit of the aggregation number;
[0015] If so, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number, using at least one of the following algorithms: a new block aggregation algorithm, an append aggregation algorithm, and a merge-rewrite algorithm;
[0016] If not, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number, using the new block aggregation algorithm.
[0017] In this way, the current upper limit of the aggregation number is compared with the historical upper limit. When the upper limit has risen, meaning the system load has decreased, optimization strategies such as append aggregation and merge-rewrite can be enabled to improve the compression ratio; when the load rises or stays the same, only basic new block aggregation is used to preserve processing efficiency. An adaptive balance between processing performance and storage efficiency is thus maintained throughout the cycle of system load changes, improving data compression efficiency.
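The branch in [0013] to [0016] amounts to a small selection rule. The sketch below uses hypothetical string names for the three algorithms and enables all three when the limit has risen, which is only one of the combinations the text allows ("at least one of").

```python
NEW_BLOCK, APPEND, MERGE_REWRITE = "new_block", "append", "merge_rewrite"

def select_algorithms(current_limit: int, historical_limit: int) -> list[str]:
    if current_limit > historical_limit:
        # Limit raised (load has dropped): the optimizing algorithms become available.
        # Enabling all three is an illustrative choice; the text requires at least one.
        return [NEW_BLOCK, APPEND, MERGE_REWRITE]
    # Limit unchanged or lowered (load steady or rising): basic aggregation only.
    return [NEW_BLOCK]

print(select_algorithms(32, 16))  # → ['new_block', 'append', 'merge_rewrite']
print(select_algorithms(16, 16))  # → ['new_block']
```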
[0018] Optionally, in the above method, dividing the multiple data blocks to be processed into at least one data block group using the new block aggregation algorithm based on the current upper limit of the aggregation number includes:
[0019] From the multiple data blocks to be processed, extract the data blocks whose data features match to form an initial data set;
[0020] Using the current upper limit of the aggregation number as the grouping threshold, divide the data blocks in the initial data set into at least one data block group.
[0021] In this way, data with similar content is clustered first and then grouped within the limits of system resources. This increases the data redundancy within each group, so each group compresses better, which increases data compression efficiency.
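A minimal sketch of new block aggregation, assuming the caller supplies a hypothetical `feature` function (for example, a type tag or content fingerprint) that decides when two blocks "match"; the text does not specify how data features are computed. Each feature value yields one initial data set, which is then cut into groups no larger than the current upper limit.

```python
from collections import defaultdict
from typing import Callable, Hashable

def new_block_aggregation(blocks: list[bytes], limit: int,
                          feature: Callable[[bytes], Hashable]) -> list[list[bytes]]:
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Step 1: cluster blocks whose data features match (one initial data set each).
    clusters: dict = defaultdict(list)
    for block in blocks:
        clusters[feature(block)].append(block)
    # Step 2: use the current upper limit as the grouping threshold.
    groups: list[list[bytes]] = []
    for members in clusters.values():
        for start in range(0, len(members), limit):
            groups.append(members[start:start + limit])
    return groups

blocks = [b"LOG:a"] * 5 + [b"IMG:b"] * 3
groups = new_block_aggregation(blocks, limit=2, feature=lambda b: b[:4])
print([len(g) for g in groups])  # → [2, 2, 1, 2, 1]
```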
[0022] Optionally, in the above method, dividing the multiple data blocks to be processed into at least one data block group using the append aggregation algorithm based on the current upper limit of the aggregation number includes:
[0023] Identify at least one first historical data block group, that is, a data block group that has been compressed and persistently stored under the historical upper limit of the aggregation number;
[0024] For each first historical data block group, decompress it to obtain multiple original data blocks;
[0025] Determine at least one first data block from the multiple data blocks to be processed, where a first data block is a data block whose features match those of the original data blocks;
[0026] Merge the original data blocks and the at least one first data block into a target data block set, where the number of data blocks in the target data block set is no greater than the current upper limit of the aggregation number;
[0027] Determine the target data block set corresponding to each of the at least one first historical data block group as the at least one data block group.
[0028] In this way, when the system load decreases, compressed small aggregate blocks are dynamically recombined with newly added similar data blocks into large aggregate blocks that fit the new upper limit. This significantly improves the overall compression ratio and storage space utilization without significantly increasing processing overhead, thereby improving data compression efficiency.
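The append aggregation steps can be sketched as below. Assumptions not taken from the text: a persisted group is modeled as a list of block lengths plus one zlib payload, all blocks in a historical group share one feature value, and `feature` is a hypothetical caller-supplied function. Pending blocks that are not absorbed are returned so they can go through new block aggregation.

```python
import zlib
from typing import Callable, Hashable

def unpack(lengths: list[int], payload: bytes) -> list[bytes]:
    # Decompress a stored group and split it back into its original blocks.
    data = zlib.decompress(payload)
    blocks, offset = [], 0
    for n in lengths:
        blocks.append(data[offset:offset + n])
        offset += n
    return blocks

def append_aggregation(historical: list[tuple[list[int], bytes]],
                       pending: list[bytes], limit: int,
                       feature: Callable[[bytes], Hashable]):
    groups, remaining = [], list(pending)
    for lengths, payload in historical:
        originals = unpack(lengths, payload)
        key = feature(originals[0])  # assumes a group's blocks share one feature
        room = limit - len(originals)
        picked, kept = [], []
        for block in remaining:
            # Take matching pending blocks until the group reaches the new limit.
            if len(picked) < room and feature(block) == key:
                picked.append(block)
            else:
                kept.append(block)
        remaining = kept
        groups.append(originals + picked)  # target data block set, <= limit
    return groups, remaining

hist_blocks = [b"LOG:%02d" % i for i in range(16)]
historical = [([len(b) for b in hist_blocks], zlib.compress(b"".join(hist_blocks)))]
pending = [b"LOG:n%02d" % i for i in range(20)] + [b"IMG:%02d" % i for i in range(5)]
groups, leftover = append_aggregation(historical, pending, limit=32, feature=lambda b: b[:4])
print(len(groups[0]), len(leftover))  # → 32 9
```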
[0029] Optionally, in the above method, dividing the multiple data blocks to be processed into at least one data block group using the merge-rewrite algorithm based on the current upper limit of the aggregation number includes:
[0030] Identify at least two second historical data block groups, that is, data block groups that have been compressed and persistently stored under the historical upper limit of the aggregation number and whose features match one another;
[0031] Decompress the at least two second historical data block groups to obtain multiple historical data blocks;
[0032] Merge the historical data blocks to form a merged rewrite data set;
[0033] Using the current upper limit of the aggregation number as the grouping threshold, regroup the historical data blocks in the merged rewrite data set to obtain at least one data block group.
[0034] In this way, multiple small, feature-matched historical compressed blocks are merged and reorganized into larger data block groups that fit the current, higher upper limit, so the storage structure is proactively optimized while the system load is low. This significantly reduces the metadata and compression redundancy overhead left by earlier small-scale compression, further improving overall storage space utilization and data compression efficiency.
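A sketch of merge-rewrite under an assumed storage model (block lengths plus one zlib payload per group; `pack` and `unpack` are hypothetical helpers). The caller is assumed to have already selected groups whose features match; the function only decompresses, merges, and regroups them by the current upper limit.

```python
import zlib

def pack(blocks: list[bytes]) -> tuple[list[int], bytes]:
    # Stand-in for a persisted group: block lengths plus one compressed payload.
    return [len(b) for b in blocks], zlib.compress(b"".join(blocks))

def unpack(lengths: list[int], payload: bytes) -> list[bytes]:
    data = zlib.decompress(payload)
    out, offset = [], 0
    for n in lengths:
        out.append(data[offset:offset + n])
        offset += n
    return out

def merge_rewrite(matched_groups: list[tuple[list[int], bytes]],
                  limit: int) -> list[list[bytes]]:
    if len(matched_groups) < 2:
        raise ValueError("merge-rewrite needs at least two feature-matched groups")
    merged: list[bytes] = []
    for lengths, payload in matched_groups:
        merged.extend(unpack(lengths, payload))  # the merged rewrite data set
    # Regroup with the current upper limit as the grouping threshold.
    return [merged[i:i + limit] for i in range(0, len(merged), limit)]

old = [pack([b"LOG:%d:%d" % (g, i) for i in range(16)]) for g in range(3)]
new_groups = merge_rewrite(old, limit=32)
rewritten = [pack(g) for g in new_groups]  # would replace the three old groups
print([len(g) for g in new_groups])  # → [32, 16]
```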
[0035] Optionally, in the above method, obtaining the current load status and the predicted load status of the target system includes:
[0036] Monitor at least one resource utilization metric of the target system, the metrics including one or more of processor utilization, memory utilization, throughput, or network bandwidth utilization;
[0037] Determine the current load status of the target system based on the at least one resource utilization metric;
[0038] Obtain the historical load status of the target system;
[0039] Based on the historical load status and the current load status of the target system, predict the load status for a preset future period using a time series forecasting model to obtain the predicted load status.
[0040] In this way, combining real-time resource monitoring with load prediction based on historical data lets the system anticipate future pressure, providing a more accurate and timely basis for dynamically adjusting the upper limit of the aggregation number. This avoids the policy lag or frequent oscillation that relying on the current load alone may cause, and enhances the adaptability and stability of the whole compression system.
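The text leaves the time series forecasting model open ([0126] mentions machine learning models). As a deliberately simple stand-in, the sketch below uses simple exponential smoothing; the smoothing factor `alpha` and the [0, 1] load scale are assumptions. Simple exponential smoothing yields a flat forecast, so one level value serves as the predicted load for the whole preset future period.

```python
def forecast_load(history: list[float], current: float, alpha: float = 0.5) -> float:
    """Simple exponential smoothing over past load values plus the current one."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    series = history + [current]
    level = series[0]
    for x in series[1:]:
        # Blend each new observation into the running level.
        level = alpha * x + (1 - alpha) * level
    return level

predicted = forecast_load([0.2, 0.4], current=0.6)
print(round(predicted, 3))  # → 0.45
```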
[0041] Optionally, in the above method, determining the current upper limit of the aggregation number based on the current load status and the predicted load status includes:
[0042] Determine the load status value of the target system based on the current load status and the predicted load status;
[0043] Obtain a first mapping relationship, which includes multiple preset load ranges and a preset upper-limit value of the aggregation number corresponding to each preset load range;
[0044] Based on the load status value and the first mapping relationship, determine the current upper limit of the aggregation number.
[0045] In this way, the overall load status is quantified as a single value, and the upper limit of the aggregation scale is determined quickly and deterministically from a preset mapping, which keeps the system's resource allocation and storage optimization behavior consistent under different load conditions.
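A sketch of the first mapping relationship as a sorted list of range lower bounds. The ranges and limits are hypothetical; only the values 16 and 32 echo the example given later in the description. Ranges are half-open, so a load value equal to a boundary falls into the higher-load range.

```python
from bisect import bisect_right

# Hypothetical first mapping relationship: (lower bound of a preset load range,
# preset upper-limit value of the aggregation number), sorted by lower bound.
FIRST_MAPPING = [(0.0, 32), (0.3, 24), (0.6, 16), (0.8, 8)]

def lookup_upper_limit(load_value: float, mapping=FIRST_MAPPING) -> int:
    bounds = [lower for lower, _ in mapping]
    # Index of the last range whose lower bound is <= load_value.
    idx = bisect_right(bounds, load_value) - 1
    if idx < 0:
        raise ValueError("load value is below the lowest preset load range")
    return mapping[idx][1]

for value in (0.1, 0.3, 0.65, 0.95):
    print(value, lookup_upper_limit(value))
```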
[0046] In a second aspect, this application provides a data compression apparatus applied to a target system, the apparatus comprising:
[0047] an acquisition module, configured to obtain the current load status and the predicted load status of the target system;
[0048] a determination module, configured to determine the current upper limit of the aggregation number based on the current load status and the predicted load status;
[0049] a partitioning module, configured to divide multiple data blocks to be processed into at least one data block group according to the current upper limit of the aggregation number, where the number of data blocks contained in each data block group is no greater than the current upper limit of the aggregation number;
[0050] a compression module, configured to merge and compress each data block group to obtain compressed data, and persistently store the compressed data.
[0051] Optionally, in the above apparatus, the partitioning module is specifically configured to:
[0052] Obtain the historical upper limit of the aggregation number, that is, the upper limit that was in use before the current upper limit took effect;
[0053] Determine whether the current upper limit of the aggregation number is greater than the historical upper limit of the aggregation number;
[0054] If so, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number, using at least one of the following algorithms: a new block aggregation algorithm, an append aggregation algorithm, and a merge-rewrite algorithm;
[0055] If not, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number, using the new block aggregation algorithm.
[0056] Optionally, in the above apparatus, the partitioning module is specifically configured to:
[0057] From the multiple data blocks to be processed, extract the data blocks whose data features match to form an initial data set;
[0058] Using the current upper limit of the aggregation number as the grouping threshold, divide the data blocks in the initial data set into at least one data block group.
[0059] Optionally, in the above apparatus, the partitioning module is specifically configured to:
[0060] Identify at least one first historical data block group, that is, a data block group that has been compressed and persistently stored under the historical upper limit of the aggregation number;
[0061] For each first historical data block group, decompress it to obtain multiple original data blocks;
[0062] Determine at least one first data block from the multiple data blocks to be processed, where a first data block is a data block whose features match those of the original data blocks;
[0063] Merge the original data blocks and the at least one first data block into a target data block set, where the number of data blocks in the target data block set is no greater than the current upper limit of the aggregation number;
[0064] Determine the target data block set corresponding to each of the at least one first historical data block group as the at least one data block group.
[0065] Optionally, in the above apparatus, the partitioning module is specifically configured to:
[0066] Identify at least two second historical data block groups, that is, data block groups that have been compressed and persistently stored under the historical upper limit of the aggregation number and whose features match one another;
[0067] Decompress the at least two second historical data block groups to obtain multiple historical data blocks;
[0068] Merge the historical data blocks to form a merged rewrite data set;
[0069] Using the current upper limit of the aggregation number as the grouping threshold, regroup the historical data blocks in the merged rewrite data set to obtain at least one data block group.
[0070] Optionally, in the above apparatus, the acquisition module is specifically configured to:
[0071] Monitor at least one resource utilization metric of the target system, the metrics including one or more of processor utilization, memory utilization, throughput, or network bandwidth utilization;
[0072] Determine the current load status of the target system based on the at least one resource utilization metric;
[0073] Obtain the historical load status of the target system;
[0074] Based on the historical load status and the current load status of the target system, predict the load status for a preset future period using a time series forecasting model to obtain the predicted load status.
[0075] Optionally, in the above apparatus, the determination module is specifically configured to:
[0076] Determine the load status value of the target system based on the current load status and the predicted load status;
[0077] Obtain a first mapping relationship, which includes multiple preset load ranges and a preset upper-limit value of the aggregation number corresponding to each preset load range;
[0078] Based on the load status value and the first mapping relationship, determine the current upper limit of the aggregation number.
[0079] In a third aspect, this application provides an electronic device, including a processor and a memory communicatively connected to the processor;
[0080] the memory stores computer-executable instructions;
[0081] the processor executes the computer-executable instructions stored in the memory to implement the method according to any implementation of the first aspect.
[0082] In a fourth aspect, this application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any implementation of the first aspect.
[0083] In a fifth aspect, this application provides a computer program product, including a computer program which, when executed by a computer, implements the method according to any implementation of the first aspect.
[0084] With the data compression method, electronic device, and program product provided in this application, the current load status and predicted load status of the target system are obtained; the current upper limit of the aggregation number is determined based on them; the multiple data blocks to be processed are divided into at least one data block group based on that upper limit, where the number of data blocks in each group does not exceed the upper limit; and each data block group is merged and compressed to obtain compressed data, which is persistently stored. A dynamic adjustment mechanism based on real-time perception of and response to system load thus achieves coordinated optimization of resource utilization and compression efficiency, solves the problem of rigid resource allocation caused by a fixed aggregation number, and improves the elasticity and adaptability of the storage system, thereby increasing data compression efficiency.
Attached Figure Description
[0085] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0086] Figure 1 is a structural diagram of an application scenario according to an embodiment of this application;
[0087] Figure 2 is a schematic flowchart of a data compression method according to an embodiment of this application;
[0088] Figure 3 is a flowchart of another data compression method according to an embodiment of this application;
[0089] Figure 4 is a flowchart of another data compression method according to an embodiment of this application;
[0090] Figure 5 is a flowchart of another data compression method according to an embodiment of this application;
[0091] Figure 6 is a flowchart of another data compression method according to an embodiment of this application;
[0092] Figure 7 is a schematic structural diagram of a data compression apparatus according to an embodiment of this application;
[0093] Figure 8 is a schematic structural diagram of an electronic device according to an embodiment of this application.
[0094] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.
Detailed Implementation
[0095] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0096] It should be noted that although the terms "first," "second," etc., are used to describe various types of information in the embodiments of this application, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
[0097] It should be understood that the terms "comprising" and "including" indicate the presence of the stated features, steps, or operations, but do not preclude the presence, occurrence, or addition of one or more other features, steps, or operations. The term "and/or" used in this application is to be interpreted as inclusive, meaning any one or any combination of the listed items. For example, "A and/or B" means any one of the following: A; B; A and B. In addition, the character "/" generally indicates an "or" relationship between the objects before and after it.
[0098] With the acceleration of digitalization, the amount of data generated in data centers, cloud computing platforms, and big data applications is growing exponentially. In enterprise-level storage systems, for example, hundreds of millions of data blocks must be processed every day, and these data blocks usually need to be compressed to reduce the storage space they occupy.
[0099] Related compression techniques have significant bottlenecks. When the system is under high load, background merging and compression operations may preempt the resources of foreground services, causing response delays or degraded system performance. When the system load is low, fixed merging and compression parameters cannot make full use of idle resources, which wastes resources and lowers the compression ratio.
[0100] In short, resource contention occurs when the system load is high and resources are wasted when the load is low, resulting in low data compression efficiency.
[0101] To address the above technical problems, this application provides a data compression method that dynamically determines the upper limit of the aggregation number based on the current and predicted load of the target system, divides data blocks into groups accordingly, and merges and compresses each group. When the system load permits, the method increases the aggregation scale to improve compression efficiency and significantly reduce storage space; when the load fluctuates, it adjusts the aggregation strategy so that compression does not consume so many system resources that the normal operation of the target system is affected. Compressed data is persistently stored to ensure data reliability. Storage optimization and system performance stability are thus improved together, increasing data compression efficiency.
[0102] Application scenarios of the data compression method are illustrated below with reference to Figure 1.
[0103] Figure 1 is a structural diagram of an application scenario according to an embodiment of this application. Referring to Figure 1, the core component of this scenario is the target system.
[0104] The target system may include at least a data storage unit, a load monitoring unit, a policy scheduling unit, and a data compression unit.
[0105] The various units work together to support efficient compression and storage management of massive amounts of data. The target system is applicable to business scenarios that require a balance between storage costs and system stability, such as distributed storage, log archiving, and big data backup.
[0106] In such scenarios, the system needs to continuously receive scattered data blocks pushed by upstream business modules. It must ensure the secure persistence of data, minimize storage usage, and avoid consuming too much computing power and bandwidth resources during the data compression process, so as not to affect the normal operation of upstream businesses.
[0107] To illustrate the practical application of the data compression method of this application more clearly, the following explanation uses a typical scenario in which the load on the target system decreases:
[0108] When the load monitoring unit detects that the system load has dropped from a high level to a reasonable range, the policy scheduling unit can adjust the data aggregation rule, raising the upper limit on the number of data blocks aggregated in a single operation from the original specification, for example 16, to a higher specification, for example 32. The system can then use append aggregation to jointly compress historically stored data and newly added data, as follows:
[0109] Before the policy adjustment, the data storage unit already holds a large number of compressed historical data packets. These packets were compressed and persistently stored under the original aggregation specification, so each corresponds to 16 original data blocks, and compression has greatly reduced the storage space they occupy.
[0110] After the system load decreases and the aggregation specification is adjusted to 32, the data compression unit can actively retrieve historical compressed data packets from the data storage unit and decompress each one to restore its original 16 data blocks losslessly and with data integrity preserved, in preparation for merging with newly added data.
[0111] Next, from the newly added scattered data blocks pushed from upstream, the data compression unit selects 16 blocks whose attributes match those of the restored historical data and merges them with the 16 restored historical data blocks, forming a data set of 32 data blocks that exactly matches the adjusted aggregation specification.
[0112] After merging, the data compression unit merges and compresses the 32-block set as a whole to generate a highly compressed data file, which is then pushed to the data storage unit for persistent storage.
[0113] This completes one append aggregation and compression cycle. The process repeats over the remaining data to handle further historical and new data.
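The 16-to-32 walkthrough in [0108] to [0113] can be reproduced with zlib as a stand-in compressor. Block contents are synthetic and deliberately similar (`PATTERN` and `make_block` are hypothetical); with real data, the gain from compressing 32 blocks together instead of keeping two groups of 16 depends on how much the blocks actually share.

```python
import zlib

PATTERN = bytes(range(256))  # synthetic content shared by "matching" blocks

def make_block(i: int) -> bytes:
    # Similar blocks: shared pattern plus a 2-byte sequence number.
    return PATTERN + i.to_bytes(2, "big")

# Historical packet: 16 blocks compressed under the original specification (16).
hist_blocks = [make_block(i) for i in range(16)]
hist_packet = zlib.compress(b"".join(hist_blocks))

# Specification raised to 32: decompress to restore the 16 original blocks.
restored = zlib.decompress(hist_packet)
assert restored == b"".join(hist_blocks)  # lossless restoration

# 16 matching new blocks from upstream fill the group up to 32.
new_blocks = [make_block(i) for i in range(16, 32)]
packet_32 = zlib.compress(restored + b"".join(new_blocks))

# Compare with keeping two separate 16-block packets.
two_packets = len(hist_packet) + len(zlib.compress(b"".join(new_blocks)))
print(f"one 32-block packet: {len(packet_32)} bytes; two 16-block packets: {two_packets} bytes")
```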
[0114] This application process demonstrates the scenario adaptability of the compression method of this application. When the system is lightly loaded, append aggregation reuses historical data and integrates new data, increasing the amount of data in a single compression. This further reduces the overall storage footprint and puts idle system resources to use instead of leaving computing power unused. The process also does not interrupt upstream data pushes, and compression does not consume so many resources that system stability suffers. Data storage optimization and stable business operation are both safeguarded, meeting the practical needs of various massive-data processing scenarios.
[0115] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0116] Figure 2 is a flowchart of a data compression method according to an embodiment of this application. The execution entity in this embodiment may be the target system or a processor within the target system, and the processor may be implemented in software or in a combination of software and hardware. Referring to Figure 2, the method includes:
[0117] S201. Obtain the current load status and predicted load status of the target system.
[0118] The current load status may be a quantitative description of the actual utilization and busyness of hardware resources of the target system, such as computing, memory, and input/output, within a preset time period.
[0119] The predicted load status may refer to a prediction of the system load trend over a future period, based on historical operating data and an analytical model.
[0120] Multiple performance metrics can be periodically collected through system monitoring tools and then aggregated and standardized to determine the current load status. A predictive analysis module can be invoked to read the time series of historical load data, perform calculations with a predictive algorithm, and output the predicted load status.
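One way to "aggregate and standardize" periodic metric samples into a current load status is a weighted average of per-metric window means, each clamped to [0, 1]. This is a sketch under assumptions: the metric names and weights are hypothetical, and throughput is assumed to be pre-normalized against a reference capacity so that all metrics share one scale.

```python
def current_load_status(samples: list[dict[str, float]],
                        weights: dict[str, float]) -> float:
    """Aggregate periodic metric samples into one load score in [0, 1]."""
    if not samples:
        raise ValueError("need at least one sample")
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        avg = sum(s[name] for s in samples) / len(samples)  # aggregate over the window
        score += weight * min(max(avg, 0.0), 1.0)          # standardize to [0, 1]
    return score / total_weight

samples = [{"cpu": 0.5, "mem": 0.4}, {"cpu": 0.7, "mem": 0.6}]
print(round(current_load_status(samples, {"cpu": 0.5, "mem": 0.5}), 2))  # → 0.55
```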
[0121] Alternatively, system load metrics can be obtained directly by calling the monitoring application interface provided by an external cloud platform or container orchestration engine, without the need to deploy a monitoring agent inside the system.
[0122] Optionally, the current load status and predicted load status of the target system can be obtained by: monitoring at least one resource utilization indicator of the target system; determining the current load status of the target system based on at least one resource utilization indicator; obtaining the historical load status of the target system; and predicting the load status for a preset future period using a time series prediction model based on the historical load status and the current load status of the target system, thereby obtaining the predicted load status.
[0123] The resource utilization indicators include one or more of the following: processor utilization, memory utilization, throughput, and network bandwidth utilization.
[0124] Resource utilization rate indicators can refer to the measurement parameters used to quantify the busyness and efficiency of various core resources in a system.
[0125] Historical load status can refer to a sequence of system load status data recorded and saved at a series of consecutive points in the past.
[0126] Time series forecasting models can refer to machine learning models used to analyze and predict future trends of data sequences arranged in chronological order.
[0127] S202. Determine the current upper limit of the aggregation number based on the current load status and the predicted load status.
[0128] The current upper limit of the aggregation number may refer to the maximum number of data blocks allowed to be combined into the same compression unit during execution of this data compression task.
[0129] Optionally, the current load status and the predicted load status can be used together as input to an aggregation-number upper limit prediction model to obtain the current upper limit of the aggregation number.
[0130] The aggregation-number upper limit prediction model may be a pre-trained model.
[0131] Optionally, the current scenario can be determined, and the weights corresponding to the current load status and the predicted load status can be determined based on the current scenario. The load status value can then be determined from the current load status, the predicted load status, and their corresponding weights, and the current upper limit of the aggregation number can be determined from the load status value.
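A sketch of the scenario-dependent weighting in [0131]. The scenario names echo the use cases listed in [0105], but the weight pairs are invented for illustration; the resulting load status value could then be matched against a first mapping relationship as described in [0132].

```python
# Hypothetical scenario -> (weight of current load, weight of predicted load).
SCENARIO_WEIGHTS = {
    "log_archiving": (0.7, 0.3),
    "big_data_backup": (0.4, 0.6),
}

def load_status_value(current: float, predicted: float, scenario: str) -> float:
    # Weighted sum of the current and predicted load statuses for this scenario.
    w_current, w_predicted = SCENARIO_WEIGHTS[scenario]
    return w_current * current + w_predicted * predicted

print(round(load_status_value(0.8, 0.2, "log_archiving"), 2))    # → 0.62
print(round(load_status_value(0.8, 0.2, "big_data_backup"), 2))  # → 0.44
```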
[0132] Optionally, the current upper limit of the aggregation number can be determined based on the current load status and the predicted load status in the following manner: determine the load status value of the target system based on the current load status and the predicted load status; obtain a first mapping relationship, which includes multiple load preset ranges and a preset upper limit value of the aggregation number corresponding to each load preset range; and determine the current upper limit value of the aggregation number based on the load status value and the first mapping relationship.
[0133] Here, the load status value can refer to a quantitative value that comprehensively characterizes the overall pressure level of the system.
[0134] The first mapping relationship can refer to a predefined, systematic table of correspondence rules that clarifies the correspondence between different system load levels and the upper limit of the aggregation number to be adopted.
[0135] The load preset range can refer to multiple continuous or discontinuous numerical intervals that are pre-divided for the load state value in the first mapping relationship.
[0136] The preset upper limit value of the aggregation number can refer to the upper limit value that uniquely corresponds to each load preset range.
[0137] The current load status and the predicted load status can be used as inputs to calculate the load status value with a preset formula. The formula can be a weighted sum or a rule-based logical combination; this application does not limit it.
[0138] The calculated load status value can be compared with each load preset range in the first mapping table to determine which interval it falls into. Then, the upper limit preset value of the aggregation number corresponding to that interval can be directly selected as the current upper limit value of the aggregation number for this compression task.
[0139] The process of determining the current upper limit of the aggregation number based on the load status value and the first mapping relationship is illustrated below by an example with reference to Figure 3.
[0140] Figure 3 is a schematic diagram of a process for dynamically adjusting the upper limit of the aggregation number, provided in an embodiment of this application. Please refer to Figure 3.
[0141] The current load and the predicted load can be weighted and merged to obtain the final load status value.
[0142] For example, if the current CPU utilization is 75%, the query concurrency is 2000 QPS, and the load is predicted to rise to 80% within the next 10 minutes, the calculated load status value falls within the high load preset range. If the current CPU utilization is 30%, the query concurrency is 500 QPS, and the load is predicted to remain stable, the value falls within the low load preset range. If the value lies in the middle range, it corresponds to the normal load preset range.
[0143] The pre-stored first mapping relationship can be invoked. Referring to Figure 3, it can be configured as follows: a high load preset range (load status value ≥ 70) corresponds to a preset upper limit of 8 for the aggregation number; a normal load preset range (30 < load status value < 70) corresponds to a preset upper limit of 16; and a low load preset range (load status value ≤ 30) corresponds to a preset upper limit of 32. The quantified load status value is compared against these ranges one by one to determine the load preset range it belongs to.
[0144] Based on the matching results, the upper limit of the aggregation count for the corresponding load preset range is determined as the current upper limit of the aggregation count. For example, when the load status value is 75 (high load range), the current upper limit of the aggregation count is 8; when the load status value is 50 (normal load range), the current upper limit of the aggregation count is 16; when the load status value is 25 (low load range), the current upper limit of the aggregation count is 32.
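The first mapping relationship of the Figure 3 example can be expressed as a small lookup table. The sketch below encodes the three ranges and preset limits exactly as given in [0143]; the table layout and the function name are hypothetical.

```python
from typing import List, Tuple

# Each entry: (lower bound, lower bound inclusive?, preset upper limit),
# checked from the highest range down; values follow the Figure 3 example.
FIRST_MAPPING: List[Tuple[float, bool, int]] = [
    (70.0, True, 8),    # high load range:   value >= 70
    (30.0, False, 16),  # normal load range: 30 < value < 70
]
LOW_LOAD_LIMIT = 32     # low load range:    value <= 30


def current_upper_limit(load_value: float,
                        mapping: List[Tuple[float, bool, int]] = FIRST_MAPPING,
                        fallback: int = LOW_LOAD_LIMIT) -> int:
    """Return the preset aggregation upper limit of the range load_value falls in."""
    for lower, inclusive, limit in mapping:
        if load_value > lower or (inclusive and load_value == lower):
            return limit
    return fallback
```

Re-running the lookup whenever a new load status value is computed mirrors the dynamic update described next: the values 75, 50, and 25 yield limits 8, 16, and 32.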
[0145] The system continuously collects load data in real time and repeats the above process to achieve dynamic updates of the upper limit of the aggregated number.
[0146] When the system load drops from high to normal, the upper limit of the aggregation number is automatically raised from 8 to 16; when it drops further from normal to low, the upper limit is correspondingly raised to 32. The reverse adjustments apply when the load rises, so the upper limit always matches the system load status and balances system performance against storage efficiency.
[0147] When the upper limit of the aggregation number is reduced, already compressed data blocks do not need to be actively modified. Newly written data forms aggregation blocks according to the new, smaller upper limit, while previously compressed blocks are gradually phased out as data is updated over time, achieving a smooth transition.
[0148] S203. Based on the current upper limit of the aggregation number, divide the multiple data blocks to be processed into at least one data block group.
[0149] The number of data blocks contained in a data block group is no greater than the current maximum number of aggregates.
[0150] A data block can refer to a basic data unit with a fixed or variable size.
[0151] A data block can be the smallest object that is compressed.
[0152] A data block group can refer to a set of data blocks that are divided together according to certain rules and will be compressed together.
[0153] Feature extraction can be performed on each data block to be compressed to obtain the data features corresponding to each data block. Based on the data features corresponding to each data block, all data blocks are classified, and data blocks with the same data features or a data feature matching degree greater than or equal to a threshold are grouped into the same temporary set. For any temporary set, the upper limit of the current aggregation number is used as the upper limit of the number of data blocks in each group, and the data blocks in the set are divided to obtain at least one data block group.
[0154] The partitioning can use simple sequential slicing, or a more refined algorithm can place the most similar data blocks in the same group without exceeding the current upper limit of the aggregation number, to improve subsequent compression efficiency.
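A minimal sketch of [0153] and [0154] follows, assuming exact-equality feature keys (threshold-based matching is not modeled here) and simple sequential slicing. The `feature_of` callback is hypothetical and stands in for whatever feature extraction is used.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def group_blocks(blocks: List[bytes], upper_limit: int,
                 feature_of: Callable[[bytes], Hashable]) -> List[List[bytes]]:
    """Classify blocks into temporary sets by feature, then slice each set
    into data block groups of at most `upper_limit` blocks."""
    if upper_limit < 1:
        raise ValueError("upper_limit must be at least 1")
    temp_sets: Dict[Hashable, List[bytes]] = defaultdict(list)
    for block in blocks:
        temp_sets[feature_of(block)].append(block)  # same feature -> same set
    groups: List[List[bytes]] = []
    for members in temp_sets.values():
        for start in range(0, len(members), upper_limit):  # sequential slicing
            groups.append(members[start:start + upper_limit])
    return groups
```

For example, with `feature_of=lambda b: b[:4]` and a limit of 2, five `logA` blocks and one `imgB` block become groups of sizes 2, 2, 1, and 1.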
[0155] Optionally, the multiple data blocks to be processed can be divided into at least one data block group based on the current upper limit of the aggregation number as follows: obtain the historical upper limit of the aggregation number used before the current upper limit of the aggregation number takes effect; determine whether the current upper limit of the aggregation number is greater than the historical upper limit of the aggregation number; if so, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number using at least one of the following algorithms: new block aggregation algorithm, append aggregation algorithm, and merge rewrite algorithm; if not, divide the multiple data blocks to be processed into at least one data block group based on the current upper limit of the aggregation number using the new block aggregation algorithm.
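The decision in [0155] reduces to comparing the two limits. The sketch below returns the set of algorithms the partitioning step may choose from; the enum and function names are hypothetical, and how one of the candidates is picked is left open, as in the text.

```python
from enum import Enum
from typing import List


class Algorithm(Enum):
    NEW_BLOCK_AGGREGATION = "new block aggregation"
    APPEND_AGGREGATION = "append aggregation"
    MERGE_REWRITE = "merge rewrite"


def candidate_algorithms(current_limit: int, historical_limit: int) -> List[Algorithm]:
    """Return the algorithms the partitioning step may use.

    A raised limit (load has dropped) also allows compressed historical groups
    to be reopened; an unchanged or lowered limit only aggregates new blocks,
    leaving previously compressed groups untouched.
    """
    if current_limit > historical_limit:
        return list(Algorithm)
    return [Algorithm.NEW_BLOCK_AGGREGATION]
```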
[0156] Specifically, when the current upper limit of the aggregation number is greater than the historical upper limit of the aggregation number, the newly arrived data blocks and the historical data blocks can be divided to generate at least one data block group with a larger aggregation number. When the current upper limit of the aggregation number is less than or equal to the historical upper limit of the aggregation number, the newly arrived data blocks can be divided to generate at least one data block group with a smaller aggregation number.
[0157] The partitioning operation performed by the new block aggregation algorithm is only for newly generated data blocks that have not yet been compressed in this round.
[0158] The append aggregation algorithm can select a data block group that was previously compressed and stored according to the historical upper limit of the aggregation number, decompress it to restore the original data blocks, and mix those blocks with newly arrived data blocks that have the same data features or whose data feature matching degree is greater than or equal to a threshold. The combined blocks are then re-divided into groups according to the current upper limit of the aggregation number.
[0159] The merge rewrite algorithm can select two or more historical data block groups that have been compressed and stored and have the same data characteristics or a data characteristic matching degree greater than or equal to a threshold, decompress the above historical data block groups respectively, merge all their original data blocks, and redivide them according to the current upper limit of the aggregation number.
[0160] S204. Perform merging and compression processing on each data block group to obtain compressed data, and persist the compressed data for storage.
[0161] Merging can refer to the process of logically or physically combining multiple independent data blocks within the same data block group into a larger, continuous data whole.
[0162] Compression can refer to applying a specific data compression algorithm to the merged data as a whole to remove redundant information, thereby reducing the physical storage space it ultimately occupies.
[0163] Persistent storage refers to writing compressed data to non-volatile storage devices or distributed storage systems to achieve long-term data preservation.
[0164] For any group of data blocks, the compression engine can sequentially concatenate all the data blocks in the group into a continuous data buffer, call the pre-selected compression algorithm to compress the buffer, generate compressed data, write the compressed data to the backend persistent storage device, and establish a correspondence between the compressed data and the original data blocks in the system's metadata index.
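The per-group flow in [0164] (concatenate, compress, persist, index) can be sketched as below. Here zlib stands in for the pre-selected compression algorithm and an in-memory dictionary stands in for the persistent device and metadata index; the class and method names are hypothetical. The index records, for each original block, which compressed object holds it and where it sits in the merged buffer, which is one plausible form of the correspondence the text describes.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Store:
    """In-memory stand-in for the persistent device and the metadata index."""
    objects: Dict[int, bytes] = field(default_factory=dict)
    # block id -> (compressed object id, offset in merged buffer, length)
    index: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)

    def compress_group(self, group: List[Tuple[str, bytes]]) -> int:
        """Concatenate a group, compress it with zlib, persist it, index each block."""
        object_id = len(self.objects)
        buffer = bytearray()
        for block_id, data in group:
            self.index[block_id] = (object_id, len(buffer), len(data))
            buffer += data
        self.objects[object_id] = zlib.compress(bytes(buffer), 9)
        return object_id

    def read_block(self, block_id: str) -> bytes:
        """Locate a block via the index and slice it out of the decompressed buffer."""
        object_id, offset, length = self.index[block_id]
        merged = zlib.decompress(self.objects[object_id])
        return merged[offset:offset + length]
```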
[0165] The compression process can employ two-stage compression: first, a fast, lightweight compression is performed on each independent data block within the group, and then a deep compression is performed on the merged whole to balance compression speed and compression ratio.
[0166] When performing persistent storage, data blocks can be stored on media with different performance levels, depending on their characteristics or compression strategies.
[0167] Before compression, the data blocks within a data block group can be reordered so that the most similar data blocks are physically adjacent, which helps the compression algorithm detect redundancy more effectively.
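The reordering in [0167] could be done with a greedy nearest-neighbor pass. This is one possible technique, assumed for illustration: similarity is measured as the Jaccard index of byte 4-gram sets, and each step appends the remaining block most similar to the last one placed. Neither choice comes from the application.

```python
from typing import List, Set


def shingles(block: bytes, n: int = 4) -> Set[bytes]:
    """Set of byte n-grams (a block shorter than n yields itself)."""
    return {block[i:i + n] for i in range(max(len(block) - n + 1, 1))}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reorder_by_similarity(group: List[bytes]) -> List[bytes]:
    """Greedy ordering: start from the first block, then repeatedly append the
    remaining block most similar to the last block placed."""
    if not group:
        return []
    sets = [shingles(b) for b in group]
    remaining = list(range(1, len(group)))
    order = [0]
    while remaining:
        last = sets[order[-1]]
        nxt = max(remaining, key=lambda i: jaccard(last, sets[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return [group[i] for i in order]
```

Greedy ordering is quadratic in the group size, which stays small because the group never exceeds the aggregation upper limit.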
[0168] The data compression method provided in this embodiment obtains the current and predicted load status of the target system; determines the upper limit of the current aggregation number based on the current and predicted load status; divides the multiple data blocks to be processed into at least one data block group based on the upper limit of the current aggregation number, wherein the number of data blocks contained in each data block group does not exceed the upper limit of the current aggregation number; performs merging and compression processing on each data block group to obtain compressed data, and persistently stores the compressed data. In this way, by introducing a dynamic adjustment mechanism based on real-time perception and response of system load, it achieves coordinated optimization of resource utilization and compression efficiency, solves the problem of rigid resource allocation caused by a fixed aggregation number, and improves the elastic adaptability of the storage system, thereby increasing data compression efficiency.
[0169] The process of dividing the multiple data blocks to be processed into at least one data block group through the new block aggregation algorithm, based on the current upper limit of the aggregation number, is explained below with reference to Figure 4.
[0170] Figure 4 is a schematic flowchart of another data compression method provided in an embodiment of this application. Building on the above embodiments, and referring to Figure 4, the method includes:
[0171] S401. Extract multiple data blocks with matching data features from multiple data blocks to be processed to form an initial data set.
[0172] Data features can refer to the identifying information extracted from the content of a data block to characterize its content attributes.
[0173] Data features can be generated using hash functions or feature extraction algorithms.
[0174] Data feature matching can refer to the process by which two or more data blocks are determined to have the same features or a data feature matching degree greater than or equal to a threshold by comparing their data features.
[0175] The initial dataset can refer to a temporary logical set composed of multiple data blocks that are highly similar in content, after feature matching and filtering.
[0176] For each data block to be processed, its data features can be calculated. The feature values of all data blocks can be compared. Data blocks with the same feature value or falling within the preset similarity threshold range are judged as data feature matches. All mutually matching data blocks are classified into the same logical container to obtain the initial data set.
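The matching step in [0176] can be sketched with a hash-based feature and a similarity threshold. Under the assumptions used here, a block's feature is the set of SHA-256 digests of its fixed-size chunks, the matching degree is the Jaccard index of two such sets, and mutually matching blocks are collected transitively with a union-find structure. The chunk size and threshold are hypothetical parameters.

```python
import hashlib
from typing import Dict, List, Set


def chunk_features(block: bytes, chunk: int = 8) -> Set[str]:
    """Assumed feature: SHA-256 digests of the block's fixed-size chunks."""
    return {hashlib.sha256(block[i:i + chunk]).hexdigest()
            for i in range(0, len(block), chunk)}


def matching_degree(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def initial_data_sets(blocks: List[bytes], threshold: float = 0.5) -> List[List[bytes]]:
    """Cluster blocks whose matching degree >= threshold (transitively, via union-find)."""
    feats = [chunk_features(b) for b in blocks]
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(blocks)):  # pairwise comparison: O(n^2), fine for a sketch
        for j in range(i + 1, len(blocks)):
            if matching_degree(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)

    sets: Dict[int, List[bytes]] = {}
    for i, block in enumerate(blocks):
        sets.setdefault(find(i), []).append(block)
    return list(sets.values())
```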
[0177] S402. Using the current maximum aggregation value as the grouping threshold, divide multiple data blocks in the initial data set into at least one data block group.
[0178] The grouping threshold can be a hard constraint on the number of data blocks when dividing them into groups.
[0179] A data block group can refer to the final determined data block units that will be merged and compressed together.
[0180] The current maximum number of aggregates can be used as the grouping threshold, and multiple data blocks in the initial dataset can be divided into at least one data block group through a partitioning strategy.
[0181] The partitioning strategy can be a simple sequential partitioning or a preset partitioning algorithm. Under the premise of not exceeding a threshold, the data blocks with the highest content similarity are grouped together to maximize the redundancy of data within the group, thereby improving the efficiency of the subsequent compression stage.
[0182] After partitioning, the initial data set is transformed into one or more data block groups ready to enter the compression process.
[0183] For implementation details of each step in this embodiment, refer to the description of the corresponding steps or operations in the above method embodiments; they are not repeated here.
[0184] The data compression method provided in this embodiment extracts multiple data blocks with matching data features from multiple data blocks to be processed, forming an initial data set; using the current upper limit of the aggregation number as a grouping threshold, the multiple data blocks in the initial data set are divided into at least one data block group. In this way, by preferentially clustering data with similar content and reasonably grouping them within the range allowed by system resources, the data redundancy within each group is significantly improved, thereby improving data compression efficiency.
[0185] The process of dividing the multiple data blocks to be processed into at least one data block group through the append aggregation algorithm, based on the current upper limit of the aggregation number, is explained below with reference to Figure 5.
[0186] Figure 5 is a flowchart of another data compression method provided in an embodiment of this application. Building on the above embodiments, and referring to Figure 5, the method includes:
[0187] S501. Determine at least one first historical data block group.
[0188] The first historical data block group is the data block group that has been compressed and persisted according to the upper limit of the historical aggregation number.
[0189] The first historical data block group can refer to a complete data unit that has been compressed and stored in a past compression cycle according to the then-effective upper limit of the aggregation number.
[0190] The historical aggregate limit can refer to the old aggregate limit that was in effect before the current aggregate limit was determined.
[0191] Identify at least one first historical data block group in the storage space.
[0192] S502. For any first historical data block group, decompress the first historical data block group to obtain multiple original data blocks.
[0193] Decompression refers to the process of applying a decompression algorithm corresponding to the compression process to restore the compressed data block group to the original data block sequence before compression.
[0194] For each first historical data block group, the decompression engine can be invoked based on the metadata corresponding to the first historical data block group to completely decompress the group of data. After decompression, the multiple original data blocks contained in the data block group before the initial compression can be restored.
[0195] S503. Determine at least one first data block among the multiple data blocks to be processed.
[0196] The first data block is a data block that matches the features of multiple original data blocks.
[0197] The first data block can refer to a specific data block that is selected from the current new data blocks to be processed and used to merge with the historical original data blocks.
[0198] Feature matching can mean that a data block has the same features as the original historical data blocks, or that its data feature matching degree with them is greater than or equal to a threshold.
[0199] The characteristics of a group of historical data blocks can be determined, and based on the characteristics of the group of historical data blocks, at least one first data block with the same characteristics or a data feature matching degree greater than or equal to a threshold can be determined from multiple data blocks to be processed.
[0200] S504. Merge multiple original data blocks and at least one first data block to form a target data block set.
[0201] The number of data blocks corresponding to the target data block set is no greater than the current upper limit of the aggregation number.
[0202] Based on the current upper limit of the aggregation number, multiple original data blocks and at least one first data block can be merged to form a target data block set.
[0203] S505. Determine at least one set of target data blocks corresponding to at least one first historical data block group as at least one data block group.
[0204] The target data block set corresponding to each first historical data block group is determined as the corresponding data block group.
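Steps S501 to S505 can be sketched as follows. The persisted layout of a historical group (a zlib-compressed merged payload plus the original block lengths), the helper names, and the `feature_of` callback are all assumptions made for illustration; the sketch also assumes every block in a historical group shares one feature, as the text implies.

```python
import zlib
from typing import Callable, Hashable, List, Tuple

# Assumed persisted layout: (zlib-compressed merged payload, original block lengths).
HistoricalGroup = Tuple[bytes, List[int]]


def compress_group(blocks: List[bytes]) -> HistoricalGroup:
    """Helper that produces a historical group in the assumed layout."""
    return zlib.compress(b"".join(blocks)), [len(b) for b in blocks]


def decompress_group(group: HistoricalGroup) -> List[bytes]:
    """S502: restore the original data blocks of one historical group."""
    payload, lengths = group
    merged = zlib.decompress(payload)
    blocks, offset = [], 0
    for n in lengths:
        blocks.append(merged[offset:offset + n])
        offset += n
    return blocks


def append_aggregate(history: List[HistoricalGroup], pending: List[bytes],
                     upper_limit: int,
                     feature_of: Callable[[bytes], Hashable]) -> List[List[bytes]]:
    """S501-S505: top up each first historical group with matching new blocks.

    Selected blocks are removed from `pending`, so each is used only once.
    """
    groups: List[List[bytes]] = []
    for hist in history:                        # S501: the first historical groups
        originals = decompress_group(hist)      # S502
        if not originals:
            continue
        feature = feature_of(originals[0])      # assumes the group shares one feature
        room = max(upper_limit - len(originals), 0)
        first_blocks = [b for b in pending if feature_of(b) == feature][:room]  # S503
        for b in first_blocks:
            pending.remove(b)
        groups.append(originals + first_blocks)  # S504 (target set), S505
    return groups
```

Capping the selection at `upper_limit - len(originals)` enforces the constraint of [0201] that a target data block set never exceeds the current upper limit.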
[0205] For implementation details of each step in this embodiment, refer to the description of the corresponding steps or operations in the above method embodiments; they are not repeated here.
[0206] The data compression method provided in this embodiment determines at least one first historical data block group, which is a group of data blocks that have been compressed and persisted according to the upper limit of the historical aggregation number. For any one first historical data block group, the first historical data block group is decompressed to obtain multiple original data blocks. Among the multiple data blocks to be processed, at least one first data block is determined, which is a data block that matches the characteristics of the multiple original data blocks. The multiple original data blocks and at least one first data block are merged to form a target data block set, and the number of data blocks corresponding to the target data block set is not greater than the current upper limit of the aggregation number. The target data block sets corresponding to at least one first historical data block group are determined as at least one data block group. In this way, when the system load decreases, by dynamically recombining the compressed small aggregation blocks with newly added similar data blocks into large aggregation blocks that meet the new upper limit, the overall data compression rate and storage space utilization are significantly improved without significantly increasing processing overhead, thus improving data compression efficiency.
[0207] The process of dividing the multiple data blocks to be processed into at least one data block group through the merge rewrite algorithm, based on the current upper limit of the aggregation number, is explained below with reference to Figure 6.
[0208] Figure 6 is a schematic flowchart of another data compression method provided in an embodiment of this application. Building on the above embodiments, and referring to Figure 6, the method includes:
[0209] S601. Identify at least two second historical data block groups.
[0210] At least two second historical data block groups are data block groups that have been compressed and persisted according to the upper limit of the historical aggregation number and whose features match.
[0211] The second historical data block group can refer to the feature-matched compressed data units generated and stored during the historical compression cycle.
[0212] Identify at least two groups of second historical data blocks in the storage space.
[0213] S602. Decompress at least two second historical data block groups to obtain multiple historical data blocks.
[0214] Historical data blocks can refer to the original, independent data blocks that are recovered from historical data block groups before compression.
[0215] Each second historical data block group can be read sequentially, and the corresponding decompression routine can be called according to its stored compression algorithm metadata to completely decompress it. After all the selected second historical data block groups have been decompressed, the original data blocks that are released from different groups and match the characteristics are collected to obtain multiple historical data blocks.
[0216] S603. Merge multiple historical data blocks to form a merged rewritten data set.
[0217] All historical data blocks decompressed from each of the second historical data block groups can be placed into the same buffer to obtain a merged rewritten data set.
[0218] S604. Using the current maximum aggregation value as the grouping threshold, regroup multiple historical data blocks in the rewritten data set to obtain at least one data block group.
[0219] The current maximum aggregation value can be used as the grouping threshold, and multiple historical data blocks in the rewritten dataset can be regrouped using a grouping algorithm to obtain at least one data block group.
[0220] The grouping algorithm treats the grouping threshold as a hard constraint: the number of data blocks in each new group never exceeds it. Grouping can be a simple sequential split, or an algorithm can assign the data blocks with the most similar content and the greatest compression potential to the same group, provided that the threshold is not exceeded.
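Steps S601 to S604 can be sketched as below, using sequential splitting (one of the two grouping options in [0220]). The persisted layout and helper names are assumptions for illustration, and the input groups are assumed to have already been identified as feature-matched in S601.

```python
import zlib
from typing import List, Tuple

# Assumed persisted layout: (zlib-compressed merged payload, original block lengths).
HistoricalGroup = Tuple[bytes, List[int]]


def compress_group(blocks: List[bytes]) -> HistoricalGroup:
    """Helper that produces a historical group in the assumed layout."""
    return zlib.compress(b"".join(blocks)), [len(b) for b in blocks]


def decompress_group(group: HistoricalGroup) -> List[bytes]:
    """Restore the original blocks of one historical group."""
    payload, lengths = group
    merged = zlib.decompress(payload)
    blocks, offset = [], 0
    for n in lengths:
        blocks.append(merged[offset:offset + n])
        offset += n
    return blocks


def merge_rewrite(groups: List[HistoricalGroup], upper_limit: int) -> List[List[bytes]]:
    """S601-S604: decompress feature-matched historical groups, merge, regroup."""
    if len(groups) < 2:
        raise ValueError("merge rewrite needs at least two historical groups")
    if upper_limit < 1:
        raise ValueError("upper_limit must be at least 1")
    rewritten: List[bytes] = []            # S603: merged rewritten data set
    for group in groups:                   # S602: decompress each group
        rewritten.extend(decompress_group(group))
    # S604: sequential regrouping; never more than upper_limit blocks per group
    return [rewritten[i:i + upper_limit]
            for i in range(0, len(rewritten), upper_limit)]
```

For instance, three historical groups of 8 blocks each, rewritten under a new limit of 16, become two groups of 16 and 8 blocks.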
[0221] The data compression method provided in this embodiment determines at least two second historical data block groups, which are data block groups that have been compressed and persisted according to the upper limit of historical aggregation and have matching features; decompresses the at least two second historical data block groups to obtain multiple historical data blocks; merges the multiple historical data blocks to form a merged rewritten data set; and regroups the multiple historical data blocks in the rewritten data set using the current upper limit of aggregation as a grouping threshold to obtain at least one data block group. In this way, by merging and reorganizing multiple small-scale historical compressed blocks with matching features into a larger data block group that meets the current higher aggregation limit, the storage structure is proactively optimized during periods of low system load. This significantly reduces metadata and compression redundancy overhead caused by small-scale historical compression, thereby further improving overall storage space utilization and data compression efficiency.
[0222] Figure 7 is a schematic diagram of a data compression apparatus provided in an embodiment of this application. Referring to Figure 7, the data compression apparatus 700 includes an acquisition module 701, a determination module 702, a partition module 703, and a compression module 704, wherein:
[0223] The acquisition module 701 is used to acquire the current load status and predicted load status of the target system;
[0224] The determination module 702 is used to determine the upper limit of the current aggregation number based on the current load status and the predicted load status;
[0225] The partitioning module 703 is used to divide the multiple data blocks to be processed into at least one data block group according to the current upper limit of the aggregation number, wherein the number of data blocks contained in the data block group is not greater than the current upper limit of the aggregation number;
[0226] Compression module 704 is used to perform merging and compression processing on each data block group to obtain compressed data, and to persistently store the compressed data.
[0227] Optionally, in the above-described apparatus, the partition module 703 is specifically used for:
[0228] Obtain the historical upper limit of the aggregation number that was in use before the current upper limit took effect;
[0229] Determine whether the current upper limit of the aggregation number is greater than the historical upper limit;
[0230] If so, based on the current upper limit of the number of aggregates, the multiple data blocks to be processed are divided into at least one data block group by using at least one of the following algorithms: new block aggregation algorithm, append aggregation algorithm, and merge rewrite algorithm;
[0231] If not, based on the current upper limit of the number of aggregates, the multiple data blocks to be processed are divided into at least one data block group using a new block aggregation algorithm.
[0232] Optionally, in the above-described apparatus, the partition module 703 is specifically used for:
[0233] From multiple data blocks to be processed, extract multiple data blocks with matching data features to form an initial data set;
[0234] Using the current maximum aggregation value as the grouping threshold, multiple data blocks in the initial dataset are divided into at least one data block group.
[0235] Optionally, in the above-described apparatus, the partition module 703 is specifically used for:
[0236] Identify at least one first historical data block group, which is a data block group that has been compressed and persisted according to the upper limit of the historical aggregation number;
[0237] For any given first historical data block group, the first historical data block group is decompressed to obtain multiple original data blocks;
[0238] At least one first data block is determined from the multiple data blocks to be processed, wherein the first data block is a data block that matches the features of the multiple original data blocks;
[0239] Multiple original data blocks and at least one first data block are merged to form a target data block set, and the number of data blocks corresponding to the target data block set is no greater than the current aggregation limit.
[0240] The target data block set corresponding to at least one first historical data block group is determined as at least one data block group.
[0241] Optionally, in the above-described apparatus, the partition module 703 is specifically used for:
[0242] Identify at least two second historical data block groups, which are data block groups that have been compressed and persisted according to the upper limit of the historical aggregation number and whose features match;
[0243] Decompress at least two groups of second historical data blocks to obtain multiple historical data blocks;
[0244] Multiple historical data blocks are merged to form a merged rewritten data set;
[0245] Using the current maximum aggregation value as the grouping threshold, multiple historical data blocks in the rewritten dataset are regrouped to obtain at least one data block group.
[0246] Optionally, in the above-described apparatus, the acquisition module 701 is specifically used for:
[0247] Monitor at least one resource utilization metric of the target system, including one or more of processing utilization, memory utilization, throughput or network bandwidth utilization;
[0248] Determine the current load status of the target system based on at least one resource utilization indicator;
[0249] Obtain the historical load status of the target system;
[0250] Based on the historical and current load status of the target system, the load status for a preset future period is predicted using a time series forecasting model, thus obtaining the predicted load status.
[0251] Optionally, in the above-described apparatus, the determining module 702 is specifically used for:
[0252] Determine the load status value of the target system based on the current load status and the predicted load status;
[0253] Obtain the first mapping relationship, which includes multiple load preset ranges and the upper limit preset value of the aggregation number corresponding to each load preset range;
[0254] Based on the load status value and the first mapping relationship, determine the current upper limit of the aggregation number.
[0255] Figure 8 is a schematic structural diagram of an electronic device provided in an embodiment of this application. Referring to Figure 8, the electronic device 800 may include a memory 801, a processor 802, and a transceiver 803.
[0256] Memory 801 is used to store program instructions;
[0257] The processor 802 is used to execute the program instructions stored in the memory so that the electronic device 800 performs the above-described method.
[0258] The transceiver 803 may include a transmitter and/or a receiver. The transmitter may also be referred to as a sender, transmitting port, or transmitting interface, and the receiver may also be referred to as a receiving port, receiving interface, or a similar term. For example, the memory 801, the processor 802, and the transceiver 803 are interconnected via a bus 804.
[0259] This application also provides a computer program product that can be executed by a processor, and when the computer program product is executed, the above-described method can be implemented.
[0260] The data compression apparatus, electronic device, computer-readable storage medium, and computer program product of the embodiments of this application can execute the technical solutions shown in the above-described data compression method embodiments. Their implementation principles and beneficial effects are similar and will not be repeated here.
[0261] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily essential to this application.
[0262] It should be further noted that although the steps in the flowchart are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowchart may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0263] It should be understood that the above-described device embodiments are merely illustrative, and the device of this application can also be implemented in other ways. For example, the division of units / modules in the above embodiments is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple units, modules, or components may be combined, or integrated into another system, or some features may be ignored or not executed.
[0264] Furthermore, unless otherwise specified, the functional units / modules in the various embodiments of this application can be integrated into one unit / module, or each unit / module can exist physically separately, or two or more units / modules can be integrated together. The integrated units / modules described above can be implemented in hardware or as software program modules.
[0265] When the integrated units / modules are implemented in hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structure include, but are not limited to, transistors and memristors. Unless otherwise specified, the processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable storage medium, such as a magnetic or magneto-optical storage medium, Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), or a Hybrid Memory Cube (HMC).
[0266] If the integrated unit / module is implemented as a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a portable hard drive, a magnetic disk, or an optical disk.
[0267] In the above embodiments, the descriptions of each embodiment have their own emphasis. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification.
[0268] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.
[0269] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims
1. A data compression method applied to a target system, characterized in that the method comprises: obtaining a current load status and a predicted load status of the target system; determining a current upper limit of the aggregation count based on the current load status and the predicted load status; dividing a plurality of data blocks to be processed into at least one data block group based on the current upper limit of the aggregation count, wherein the number of data blocks contained in each data block group is not greater than the current upper limit of the aggregation count; and merging and compressing the at least one data block group to obtain compressed data, and persistently storing the compressed data.
2. The method according to claim 1, characterized in that dividing the plurality of data blocks to be processed into at least one data block group based on the current upper limit of the aggregation count comprises: obtaining a historical upper limit of the aggregation count that was in effect before the current upper limit of the aggregation count took effect; determining whether the current upper limit of the aggregation count is greater than the historical upper limit of the aggregation count; if so, dividing the plurality of data blocks to be processed into at least one data block group based on the current upper limit of the aggregation count using at least one of the following algorithms: a new block aggregation algorithm, an append aggregation algorithm, and a merge-rewrite algorithm; and if not, dividing the plurality of data blocks to be processed into at least one data block group based on the current upper limit of the aggregation count using the new block aggregation algorithm.
3. The method according to claim 2, characterized in that dividing the plurality of data blocks to be processed into at least one data block group using the new block aggregation algorithm based on the current upper limit of the aggregation count comprises: extracting, from the plurality of data blocks to be processed, a plurality of data blocks with matching data features to form an initial data set; and dividing the data blocks in the initial data set into at least one data block group using the current upper limit of the aggregation count as the grouping threshold.
4. The method according to claim 2, characterized in that dividing the plurality of data blocks to be processed into at least one data block group using the append aggregation algorithm based on the current upper limit of the aggregation count comprises: identifying at least one first historical data block group, wherein a first historical data block group is a data block group that has been compressed and persisted according to the historical upper limit of the aggregation count; for each first historical data block group, decompressing the first historical data block group to obtain a plurality of original data blocks; determining at least one first data block from the plurality of data blocks to be processed, wherein a first data block is a data block whose features match those of the plurality of original data blocks; merging the plurality of original data blocks and the at least one first data block to form a target data block set, wherein the number of data blocks in the target data block set is not greater than the current upper limit of the aggregation count; and determining the target data block sets corresponding to the at least one first historical data block group as the at least one data block group.
5. The method according to claim 2, characterized in that dividing the plurality of data blocks to be processed into at least one data block group using the merge-rewrite algorithm based on the current upper limit of the aggregation count comprises: identifying at least two second historical data block groups, wherein the at least two second historical data block groups are data block groups that have been compressed and persisted according to the historical upper limit of the aggregation count and whose features match; decompressing the at least two second historical data block groups to obtain a plurality of historical data blocks; merging the plurality of historical data blocks to form a merged rewrite data set; and regrouping the historical data blocks in the merged rewrite data set using the current upper limit of the aggregation count as the grouping threshold to obtain at least one data block group.
6. The method according to any one of claims 1-5, characterized in that obtaining the current load status and the predicted load status of the target system comprises: monitoring at least one resource utilization metric of the target system, wherein the resource utilization metric includes one or more of processor utilization, memory utilization, throughput, and network bandwidth utilization; determining the current load status of the target system based on the at least one resource utilization metric; obtaining a historical load status of the target system; and predicting, based on the historical load status and the current load status of the target system, the load status for a preset future period using a time series prediction model to obtain the predicted load status.
7. The method according to any one of claims 1-5, characterized in that determining the current upper limit of the aggregation count based on the current load status and the predicted load status comprises: determining a load status value of the target system based on the current load status and the predicted load status; obtaining a first mapping relationship, which comprises a plurality of preset load ranges and a preset upper limit of the aggregation count corresponding to each preset load range; and determining the current upper limit of the aggregation count based on the load status value and the first mapping relationship.
8. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
10. A computer program product, characterized by comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-7.
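Claim 1 above, with the new block aggregation of claim 3 as the grouping step, can be illustrated by the following minimal sketch. It assumes that each block carries a single hashable feature key and that "matching data features" means equal keys; it uses `zlib` as an example codec and a Python dict as a stand-in for persistent storage. `Block`, `new_block_aggregation`, `merge_and_compress`, and `compress_pipeline` are hypothetical names, not an interface defined by this application.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    feature: str   # hypothetical stand-in for the "data features" used for matching
    payload: bytes


def new_block_aggregation(blocks, limit):
    """Claim 3 sketch: collect feature-matched blocks into initial data sets,
    then split each set into groups of at most `limit` blocks."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    initial_sets = defaultdict(list)
    for block in blocks:
        initial_sets[block.feature].append(block)
    groups = []
    for members in initial_sets.values():
        for start in range(0, len(members), limit):
            groups.append(members[start:start + limit])
    return groups


def merge_and_compress(group):
    """Concatenate a group's payloads and compress them as a single unit.
    (block_id, offset, length) entries let a reader locate each block later."""
    index, merged = [], bytearray()
    for block in group:
        index.append((block.block_id, len(merged), len(block.payload)))
        merged += block.payload
    return index, zlib.compress(bytes(merged))


def compress_pipeline(blocks, limit, store):
    """Claim 1 sketch: divide, merge-compress, then persist. `store` is a dict
    standing in for durable storage."""
    for group_id, group in enumerate(new_block_aggregation(blocks, limit)):
        store[group_id] = merge_and_compress(group)
    return store
```

Compressing several small blocks as one unit generally lets the codec exploit redundancy across blocks, which is why a larger limit can improve the compression ratio when resources allow.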
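The selection rule in claim 2 above reduces to a comparison between the current and historical limits. The sketch below returns the set of permitted algorithms; how an implementation picks among them when more than one is allowed is not specified in the claims and is left open here. The `Algorithm` enum and `candidate_algorithms` function are illustrative names.

```python
from enum import Enum


class Algorithm(Enum):
    NEW_BLOCK = "new block aggregation"
    APPEND = "append aggregation"
    MERGE_REWRITE = "merge-rewrite"


def candidate_algorithms(current_limit: int, historical_limit: int) -> list:
    """Claim 2 sketch: a raised limit permits revisiting persisted groups
    (append, merge-rewrite); otherwise only new blocks are aggregated."""
    if current_limit > historical_limit:
        return [Algorithm.NEW_BLOCK, Algorithm.APPEND, Algorithm.MERGE_REWRITE]
    return [Algorithm.NEW_BLOCK]
```

One reading of the rule: groups persisted under a smaller historical limit hold fewer blocks than the new limit permits, so only a raised limit leaves room to append to them or to merge and rewrite them.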
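The append aggregation of claim 4 above decompresses a previously persisted group and tops it up with matching pending blocks. The sketch assumes that features are exact-match keys stored alongside each group and that payloads are strings serialized with `json` and compressed with `zlib`; `pack`, `unpack`, and `append_aggregation` are hypothetical helpers.

```python
import json
import zlib


def pack(feature, blocks):
    """Compress a list of (block_id, payload) pairs into one persisted group."""
    return {"feature": feature, "data": zlib.compress(json.dumps(blocks).encode())}


def unpack(group):
    """Decompress a persisted group back into (block_id, payload) pairs."""
    return [tuple(item) for item in json.loads(zlib.decompress(group["data"]))]


def append_aggregation(historical_groups, pending, limit):
    """Claim 4 sketch: top up groups persisted under the old limit with
    feature-matched pending blocks, never exceeding the current limit.

    `pending` maps a feature key to a list of (block_id, payload) pairs.
    Returns the rebuilt groups and the pending blocks that were not used.
    """
    remaining = {feature: list(items) for feature, items in pending.items()}
    rebuilt = []
    for group in historical_groups:
        originals = unpack(group)                      # decompress the old group
        room = limit - len(originals)
        matches = remaining.get(group["feature"], [])  # the "first data blocks"
        if room <= 0 or not matches:
            continue                                   # nothing to append here
        taken = matches[:room]
        remaining[group["feature"]] = matches[room:]
        rebuilt.append(pack(group["feature"], originals + taken))
    return rebuilt, remaining
```

In a real system the rebuilt group would presumably replace the old one in storage, and blocks left over in `remaining` would go through new block aggregation; both steps are outside this sketch.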
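The merge-rewrite algorithm of claim 5 above differs from append aggregation in that it rebuilds several existing groups rather than extending one. The sketch below assumes a `json` plus `zlib` group format and exact-match feature keys; `pack`, `unpack`, and `merge_rewrite` are hypothetical names.

```python
import json
import zlib
from collections import defaultdict


def pack(feature, blocks):
    """Compress a list of (block_id, payload) pairs into one persisted group."""
    return {"feature": feature, "data": zlib.compress(json.dumps(blocks).encode())}


def unpack(group):
    """Decompress a persisted group back into (block_id, payload) pairs."""
    return [tuple(item) for item in json.loads(zlib.decompress(group["data"]))]


def merge_rewrite(historical_groups, limit):
    """Claim 5 sketch: pool the blocks of feature-matched historical groups
    and regroup them using the current limit as the grouping threshold."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    by_feature = defaultdict(list)
    for group in historical_groups:
        by_feature[group["feature"]].append(group)
    rewritten = []
    for feature, groups in by_feature.items():
        if len(groups) < 2:
            continue  # the claim requires at least two matching groups
        merged = [block for group in groups for block in unpack(group)]
        for start in range(0, len(merged), limit):
            rewritten.append(pack(feature, merged[start:start + limit]))
    return rewritten
```

For example, three groups of two, two, and one blocks written under an old limit of 2 would, under a new limit of 4, be rewritten as two groups of four and one blocks, reducing the number of separately compressed units.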
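Claim 6 above leaves the time series prediction model open. The sketch below substitutes Holt's linear exponential smoothing, a simple and well-known forecasting method, purely as an example; the application does not say which model it uses. It also assumes that every metric is normalized to 0..1 and that the current load is the utilization of the most stressed resource. `ResourceSample`, `current_load`, and `holt_forecast` are hypothetical names, and the smoothing factors are arbitrary defaults.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    cpu: float         # processor utilization, 0..1
    memory: float      # memory utilization, 0..1
    network: float     # network bandwidth utilization, 0..1
    throughput: float  # throughput normalized to 0..1 against an assumed capacity


def current_load(sample: ResourceSample) -> float:
    """Treat the most stressed resource as the current load (an assumption)."""
    return max(sample.cpu, sample.memory, sample.network, sample.throughput)


def holt_forecast(history, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing: forecast `horizon` steps ahead.

    `history` holds past load values, oldest first, ending with the current load.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    level, trend = history[0], history[1] - history[0]
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Keep the prediction inside the utilization range.
    return min(max(level + horizon * trend, 0.0), 1.0)
```

In use, the predicted load for the preset future period would be `holt_forecast(history + [current_load(sample)], horizon=h)`. Taking the maximum makes the limit react to whichever resource is the bottleneck; a weighted mean would smooth this out at the cost of hiding a single saturated resource.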