Data processing method and apparatus, computer device and storage system

By distributing EC computing tasks across the storage system, utilizing multiple intermediate task nodes to process shards in parallel, and having aggregation task nodes aggregate the results, the problem of high computational overhead per node is solved, thereby improving the read/write performance and data reconstruction efficiency of the storage system.

WO2026138694A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-12-19
Publication Date: 2026-07-02

Figure CN2025144112_02072026_PF_FP_ABST

Abstract

A data processing method and apparatus, a computer device, and a storage system. The method comprises: when a target node in a storage system receives an instruction of performing EC computation on the basis of m fragments, selecting a plurality of intermediate task nodes and at least one aggregation task node from the storage system; then sending corresponding intermediate task requests to the plurality of intermediate task nodes, and an aggregation task request to the at least one aggregation task node, so that the plurality of intermediate task nodes are scheduled to jointly undertake an EC computing task for computing an intermediate result of each target fragment, and the aggregation task node is scheduled to aggregate the intermediate results obtained by the intermediate task nodes, thereby solving the problem of high node load that occurs when the EC computing task is executed by a single node.

Description

Data processing methods, apparatus, computer equipment and storage systems

[0001] This application claims priority to Chinese Patent Application No. 202411990441.0, filed with the State Intellectual Property Office of China on December 28, 2024, entitled “Data Processing Method, Apparatus, Computer Equipment and Storage System”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of data storage, and more particularly to a data processing method, apparatus, computer equipment, and storage system.

Background Technology

[0003] Erasure coding (EC) technology mainly uses the EC algorithm to encode the original data to obtain redundant data, and then stores the original data and the redundant data together to achieve fault tolerance.

[0004] In related technologies, when data is written to a storage system, a target node in the storage system can perform EC encoding on the multiple data fragments that make up the original data carried in the write request, obtaining at least one parity fragment. The data fragments are then distributed across multiple data nodes, and the at least one parity fragment is stored as redundant data on at least one parity node in the storage system. Subsequently, if some of the data fragments and parity fragments are damaged, the target node can read the undamaged fragments from the storage system and perform EC decoding on them to recover the damaged fragments. As data traffic in storage systems continues to increase, the computational overhead of the target node becomes increasingly significant, seriously affecting the read and write performance of the storage system.

Summary of the Invention

[0005] This application provides a data processing method, apparatus, computer equipment, and storage system, which can solve the problem of high computational overhead when performing EC calculations by a single node and improve the read and write performance of the storage system.

[0006] To achieve the above objectives, this application adopts the following technical solution:

[0007] In a first aspect, a data processing method is provided, applied to a target node in a storage system. The method includes: responding to a first instruction, determining a plurality of intermediate task nodes and at least one aggregation task node from the storage system, wherein the first instruction instructs EC calculation based on m shards to obtain n target shards, where m is not less than 2 and n is not less than 1; sending corresponding intermediate task requests to the plurality of intermediate task nodes and corresponding aggregation task requests to the at least one aggregation task node; wherein the intermediate task request requests the corresponding intermediate task node to obtain task data and, based on the task data, determine n intermediate results, wherein the task data includes a portion of the m shards, and different intermediate task nodes obtain different task data; the n intermediate results correspond one-to-one with the n target shards; and the aggregation task request requests the corresponding aggregation task node to calculate the specified target shard based on the multiple intermediate results provided by the plurality of intermediate task nodes corresponding to the specified target shard.

[0008] It should be noted that the target node mentioned above can also be called a control node or a scheduling node. The target node can be a host in the server, or it can be a storage device in the server that is configured with a certain computing power, such as a computing storage drive (CSD) device.

[0009] In this application, when a target node in the storage system receives an instruction to perform EC calculation on m shards, it can select multiple intermediate task nodes and at least one aggregation task node from the storage system. The target node then sends corresponding intermediate task requests to the multiple intermediate task nodes and an aggregation task request to the at least one aggregation task node. Each intermediate task node, based on the received intermediate task request, obtains a subset of the m shards to be EC calculated and uses that subset to determine n intermediate results corresponding one-to-one with the n target shards. Different intermediate task nodes obtain different shards from the m shards, so different intermediate task nodes determine different intermediate results for the same target shard. Any aggregation task node, based on the aggregation task request and the different intermediate results that the multiple intermediate task nodes determined for the same target shard, can then obtain the corresponding target shard. Therefore, in this application, the target node can schedule multiple intermediate task nodes to jointly undertake the EC computation task of calculating the intermediate results for each target shard, and schedule aggregation task nodes to aggregate the intermediate results obtained by the intermediate task nodes. This solves the problem of high node input/output (IO) load and computational load when a single node performs the EC computation task. Furthermore, multiple intermediate task nodes can execute their respective computation tasks in parallel, which can shorten the EC computation time and thus improve the read/write performance of the storage system.
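The split into intermediate results and aggregation described above relies on EC encoding being linear over a finite field. The following is a minimal Python sketch of that idea, not the implementation of this application. It assumes GF(2^8) arithmetic with reduction polynomial 0x11d and an illustrative 2 x 4 coefficient matrix (COEF_ROWS); the function names intermediate_task and aggregation_task are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reduction polynomial 0x11d (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def scale(coef: int, shard: bytes) -> bytes:
    """Multiply every byte of a shard by one coefficient."""
    return bytes(gf_mul(coef, x) for x in shard)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Addition in GF(2^8) is a byte-wise XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


def intermediate_task(task_data: dict, coef_rows: list) -> list:
    """Compute n intermediate results from a subset of the m shards.

    task_data maps shard index -> shard bytes; coef_rows has one row per
    target shard, so the result list is one-to-one with the target shards.
    """
    size = len(next(iter(task_data.values())))
    results = []
    for row in coef_rows:
        acc = bytes(size)
        for idx, shard in task_data.items():
            acc = xor_bytes(acc, scale(row[idx], shard))
        results.append(acc)
    return results


def aggregation_task(partials: list) -> bytes:
    """Combine the intermediate results for one target shard."""
    acc = bytes(len(partials[0]))
    for part in partials:
        acc = xor_bytes(acc, part)
    return acc


# Illustrative coefficients for m = 4, n = 2 (an assumption, not from the source).
COEF_ROWS = [[1, 1, 1, 1], [1, 2, 4, 8]]
shards = {0: b"\x01\x02", 1: b"\x10\x20", 2: b"\xaa\xbb", 3: b"\xff\x00"}

# Two intermediate task nodes, each holding a different part of the m shards.
results_a = intermediate_task({i: shards[i] for i in (0, 1)}, COEF_ROWS)
results_b = intermediate_task({i: shards[i] for i in (2, 3)}, COEF_ROWS)

# One aggregation per target shard.
target_shards = [aggregation_task([results_a[j], results_b[j]]) for j in range(2)]
```

In this sketch, the aggregated target shards equal what a single node would compute from all four shards, while each intermediate node only touches half of the input.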

[0010] The data processing method provided in this application can be applied to a data writing scenario. In this case, the first instruction is a write instruction, the m fragments are m data fragments to be EC encoded, and the n target fragments are n parity fragments.

[0011] In write data scenarios, using the data processing method provided in this application, each of the multiple intermediate task nodes only needs to acquire a portion of the data fragments from the m data fragments, and the multiple intermediate task nodes can execute the EC encoding task in parallel. This reduces the time required for data fragment transmission and EC encoding, thereby shortening write latency and improving the write performance of the storage system.

[0012] Optionally, in a data writing scenario, the intermediate task request includes the task data, or the intermediate task request includes the identifiers of each fragment in the task data. For example, when the target node is a host, the host can directly send the task data to the intermediate task node in the intermediate task request to reduce communication costs. As another example, when the target node is a storage device such as a CSD device, the host can directly write m data fragments to multiple CSD devices acting as data nodes. Based on this, the write command received by the target node from the host may not include the m data fragments. Therefore, the target node can send the identifiers of each fragment included in the task data in the intermediate task request to the intermediate task node, so that the intermediate task node can directly read the required task data from the data nodes storing the data fragments. Of course, when the target node is a CSD device, the write command sent by the host to the target node can also include the m data fragments. In this scenario, the target node can write m data shards to multiple data nodes and send intermediate task requests to multiple intermediate task nodes. At this time, the intermediate task requests may include task data.
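The two request forms described above (inline task data versus fragment identifiers) can be sketched as follows. This is an illustration under stated assumptions: the IntermediateTaskRequest type, its field names, and the read_fragment callback are hypothetical and are not defined by this application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class IntermediateTaskRequest:
    """Hypothetical request layout: identifiers always, payload optionally."""
    fragment_ids: List[str]
    task_data: Optional[Dict[str, bytes]] = None  # set when data is sent inline


def acquire_task_data(
    request: IntermediateTaskRequest,
    read_fragment: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Return the task data for an intermediate task node.

    If the request carries the fragments inline, use them directly.
    Otherwise read each fragment by identifier from the node that stores it
    (read_fragment stands in for that remote read).
    """
    if request.task_data is not None:
        return dict(request.task_data)
    return {fid: read_fragment(fid) for fid in request.fragment_ids}


# Usage: a dictionary stands in for the data nodes in this sketch.
data_nodes = {"C1": b"aa", "C2": b"bb"}
by_id = acquire_task_data(IntermediateTaskRequest(["C1", "C2"]), data_nodes.__getitem__)
inline = acquire_task_data(
    IntermediateTaskRequest(["C1"], task_data={"C1": b"aa"}),
    data_nodes.__getitem__,
)
```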

[0013] Optionally, in the data writing scenario, the method further includes: sending corresponding write requests to multiple data nodes in the storage system, wherein the write request includes at least one data fragment to be written to the corresponding data node from the m data fragments, and the data node is a different node from the intermediate task node; in response to a first write response from each data node and a second write response from each intermediate task node, determining that the writing of the m data fragments is complete, wherein the first write response is used to indicate that the corresponding data node has saved the received data fragment, and the second write response is used to indicate that the corresponding intermediate task node has saved the task data.

[0014] In this application, after receiving the first write response from each data node and the second write response from each intermediate task node, the target node can determine that the m data fragments not only exist in the multiple data nodes but also in the multiple intermediate task nodes. Since the multiple intermediate task nodes and the multiple data nodes are different nodes, the m data fragments held by the multiple intermediate task nodes and the m data fragments held by the multiple data nodes form a backup, thus improving the reliability of data during EC encoding.

[0015] Furthermore, in this application, only a portion of the data fragments is transmitted to each intermediate task node to form a backup of the data fragments stored in the data nodes, so the transmission takes less time than transmitting all data fragments to a single node to form a backup, and each intermediate task node returns its second write response sooner. As a result, less time elapses between the target node starting to process a write instruction for m data fragments and receiving the write responses, and the resources occupied by processing the write instruction can be released more quickly. Therefore, this application improves data reliability during EC encoding while also shortening foreground I/O time and reducing resource consumption on the target node.
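The write-completion condition described in the preceding paragraphs can be sketched as a small bookkeeping class on the target node. The class and method names below (WriteTracker and so on) are hypothetical; the sketch only shows that completion is declared once every data node has returned a first write response and every intermediate task node has returned a second write response.

```python
class WriteTracker:
    """Track outstanding write responses for one write instruction."""

    def __init__(self, data_nodes, intermediate_nodes):
        self.pending_first = set(data_nodes)
        self.pending_second = set(intermediate_nodes)
        # The two copies only form a backup if they live on different nodes.
        if self.pending_first & self.pending_second:
            raise ValueError("data nodes and intermediate task nodes must differ")

    def on_first_write_response(self, data_node):
        """A data node reports that it has saved its data fragment(s)."""
        self.pending_first.discard(data_node)

    def on_second_write_response(self, intermediate_node):
        """An intermediate task node reports that it has saved its task data."""
        self.pending_second.discard(intermediate_node)

    def is_complete(self):
        """True once no response of either kind is outstanding."""
        return not self.pending_first and not self.pending_second


tracker = WriteTracker(["D1", "D2"], ["T1", "T2"])
tracker.on_first_write_response("D1")
tracker.on_first_write_response("D2")
tracker.on_second_write_response("T1")
still_waiting = not tracker.is_complete()
tracker.on_second_write_response("T2")
```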

[0016] Optionally, the data processing method provided in this application can also be applied to EC decoding scenarios, for example, to degraded read scenarios or data reconstruction scenarios. Accordingly, the first instruction is a degraded read instruction or a data reconstruction instruction, the m fragments are fragments to be EC decoded, and the n target fragments are data fragments to be recovered.

[0017] In a degraded read scenario, using the data processing method provided in this application, each of the multiple intermediate task nodes only needs to acquire a portion of the m fragments. Furthermore, the multiple intermediate task nodes execute the EC decoding task in parallel. This reduces the time required for fragment transmission and EC decoding, thereby shortening read latency and improving the read performance of the storage system.

[0018] In the data reconstruction scenario, using the data processing method provided in this application, each of the multiple intermediate task nodes only needs to acquire a portion of the m fragments, and the multiple intermediate task nodes execute the EC decoding task in parallel. This can shorten the time required for fragment transmission and EC decoding, thereby reducing the data reconstruction latency.

[0019] Optionally, in the EC decoding scenario, since the m fragments to be EC decoded are intact fragments stored in the storage system, the intermediate task request includes the identifiers of each fragment in the task data. Each intermediate task node can read the corresponding fragment from the node storing the corresponding fragment in the storage system based on the identifier of each fragment.
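The same division of labor applies to decoding. The sketch below deliberately uses the simplest possible code, a single XOR parity fragment, rather than a general RS decoding matrix; that substitution is an assumption made to keep the example short. Each intermediate task node XORs the intact fragments it read, and the aggregation task node XORs the intermediate results to recover the lost fragment.

```python
from functools import reduce


def xor_all(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Stripe with four data fragments and one XOR parity fragment (assumed layout).
data = [b"ab", b"cd", b"ef", b"gh"]
parity = xor_all(data)

# Fragment index 2 is lost; the other fragments are intact.
lost_index = 2

# Each intermediate task node reads a different part of the intact fragments.
intermediate_1 = xor_all([data[0], data[1]])
intermediate_2 = xor_all([data[3], parity])

# The aggregation task node combines the intermediate results.
recovered = xor_all([intermediate_1, intermediate_2])
```

With a general RS code, each intermediate node would first scale its fragments by decoding coefficients before summing, but the aggregation step would stay a plain XOR.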

[0020] Optionally, if the first instruction is a degraded read instruction, the aggregation task request includes the degraded read instruction, and the method further includes: receiving a read response sent by the at least one aggregation task node in response to the degraded read instruction, the read response including the target shard calculated by the corresponding aggregation task node.

[0021] Optionally, each of the at least one aggregation task node is also used to persistently store the target fragments it has computed.

[0022] In write data scenarios, each aggregation task node is used to compute at least one parity shard and also to persistently store the at least one parity shard it computed. That is, the parity nodes that store parity shards can be selected as aggregation task nodes. This way, after an aggregation task node computes a parity shard, it does not need to forward the parity shard to other nodes for storage, reducing data transfer overhead.

[0023] In degraded read scenarios or data reconstruction scenarios, each aggregation task node is used to compute at least one target shard to be recovered. Furthermore, each aggregation task node is also used to persistently store the target shard it has computed. That is, nodes originally allocated for storing the corresponding target shards can be used as aggregation task nodes. This way, after computing the target shard, the aggregation task node does not need to forward the target shard to other nodes for storage, reducing data transfer overhead.

[0024] Optionally, the at least one aggregation task node includes some or all of the plurality of intermediate task nodes.

[0025] In this application, some or all of the intermediate task nodes can also act as aggregation task nodes to aggregate intermediate results. In this way, these aggregation task nodes themselves will hold some intermediate results. Based on this, the amount of intermediate results that need to be obtained from other intermediate task nodes is reduced, thereby reducing the IO overhead caused by the transmission of intermediate results.
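The saving described above can be made concrete by counting how many intermediate results cross the network. The following sketch assumes one aggregation task node per target shard and one intermediate result per intermediate task node per target shard; the function name and node labels are hypothetical.

```python
def count_result_transfers(intermediate_nodes, aggregators):
    """Count intermediate results that must be sent between nodes.

    aggregators lists the aggregation task node chosen for each target shard.
    A result produced on the aggregation node itself needs no transfer.
    """
    transfers = 0
    for aggregator in aggregators:
        for node in intermediate_nodes:
            if node != aggregator:
                transfers += 1
    return transfers


nodes = ["T1", "T2", "T3"]
separate = count_result_transfers(nodes, ["A1", "A2"])   # aggregators are other nodes
reused = count_result_transfers(nodes, ["T1", "T2"])     # aggregators are T1 and T2
```

Under these assumptions, reusing intermediate task nodes as aggregation task nodes removes one transfer per target shard.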

[0026] In a second aspect, a data processing method is provided, applied to an intermediate task node in a storage system, wherein the storage system includes multiple intermediate task nodes, a target node, and at least one aggregation task node. The method comprises: in response to an intermediate task request from the target node, acquiring task data, the task data including a portion of m shards on which EC computation is to be performed, the task data obtained by different intermediate task nodes being different, wherein m is not less than 2; determining n intermediate results based on the task data, wherein n is not less than 1; and providing at least one of the n intermediate results to each aggregation task node, wherein the at least one intermediate result corresponds to at least one target shard, and each intermediate result is used to compute the corresponding target shard.

[0027] Optionally, this application can be applied to EC encoding scenarios, for example, to write data scenarios. Accordingly, the m fragments are m data fragments to be EC encoded, and the n target fragments are n parity fragments.

[0028] Optionally, the storage system further includes multiple data nodes. Each data node is used to respond to a write request from the target node, persistently store at least one of the m data shards, and return a first write response to the target node, the first write response indicating that the at least one data shard has been saved. The data nodes and the intermediate task nodes are different nodes. Based on this, after obtaining the task data, the method further includes: sending a second write response to the target node, the second write response indicating that the task data has been saved.

[0029] Optionally, this application can be applied to EC decoding scenarios, for example, to degraded read or data reconstruction scenarios. Accordingly, the m fragments are fragments to be EC decoded, and the n target fragments are data fragments to be recovered.

[0030] Optionally, the intermediate task request includes the identifiers of each shard in the task data; the process of obtaining the task data may include: obtaining each shard from the nodes storing each shard in the storage system based on the identifiers of each shard in the intermediate task request, thereby obtaining the task data.

[0031] Optionally, the intermediate task request includes the identifier of each aggregation task node and the identifier of the corresponding target shard. The process of providing at least one intermediate result among the n intermediate results to each aggregation task node may include: sending a first intermediate result among the n intermediate results to the first aggregation task node based on the identifier of the first aggregation task node and the identifier of the corresponding first target shard. The first intermediate result is used to calculate the first target shard.

[0032] In this application, the intermediate task request can carry the identifiers of each aggregation task node and the corresponding target shard identifier. Thus, after calculating n intermediate results corresponding to n target shards, the intermediate task node can directly send the required intermediate results to each aggregation task node, enabling the respective aggregation task node to aggregate and obtain the corresponding target shard. That is, there is no need for the aggregation task node to make a request, reducing communication costs.

[0033] Optionally, the process of providing at least one intermediate result among the n intermediate results to each aggregation task node may include: receiving a data acquisition request sent by the first aggregation task node, the data acquisition request including an identifier of the first target shard; and sending a first intermediate result among the n intermediate results to the first aggregation task node based on the identifier of the first target shard, the first intermediate result being used to calculate the first target shard.
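The two delivery options described above, where the intermediate task node either sends results on its own initiative or answers a data acquisition request, can be sketched as follows. The class, method, and parameter names are hypothetical, and the send callback stands in for whatever transport the storage system uses.

```python
class IntermediateNode:
    """Holds the n intermediate results, keyed by target shard identifier."""

    def __init__(self, results):
        self.results = dict(results)

    def push_results(self, routing, send):
        """Push mode: the intermediate task request carried the routing.

        routing maps target shard identifier -> aggregation task node identifier.
        """
        for shard_id, aggregator_id in routing.items():
            send(aggregator_id, shard_id, self.results[shard_id])

    def handle_data_acquisition_request(self, shard_id):
        """Pull mode: an aggregation task node asks for one target shard's result."""
        return self.results[shard_id]


node = IntermediateNode({"P1": b"\x01", "P2": b"\x02"})

# Push: record what would be sent instead of using a real network.
sent = []
node.push_results({"P1": "A1", "P2": "A2"}, lambda agg, sid, data: sent.append((agg, sid, data)))

# Pull: the first aggregation task node requests the result for P1.
pulled = node.handle_data_acquisition_request("P1")
```

Push mode avoids the extra request message, at the cost of the target node having to include the routing in every intermediate task request.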

[0034] Optionally, the aggregation task node is also used to persistently store the target fragments it has calculated.

[0035] Optionally, the at least one aggregation task node includes some or all of the nodes among a plurality of intermediate task nodes.

[0036] In a third aspect, a data processing method is provided, applied to a first aggregation task node among at least one aggregation task node in a storage system, the storage system further including multiple intermediate task nodes and a target node, the method comprising: in response to an aggregation task request from the target node, obtaining multiple intermediate results corresponding to a first target shard provided by the multiple intermediate task nodes, each intermediate result being obtained by the corresponding intermediate task node based on task data it has obtained, the task data including a portion of m shards on which EC computation is to be performed, the task data obtained by different intermediate task nodes including different shards, m being not less than 2; and determining the first target shard based on the multiple intermediate results corresponding to the first target shard.

[0037] The first aggregation task node can be one of multiple intermediate task nodes, or it can be any other node besides the multiple intermediate task nodes and the target node.

[0038] Optionally, the process of obtaining multiple intermediate results corresponding to the first target shard provided by the multiple intermediate task nodes may include: receiving the intermediate results corresponding to the first target shard sent by each intermediate task node.

[0039] Optionally, the aggregation task request includes the identifiers of the plurality of intermediate task nodes and the identifier of the first target shard. Before receiving the intermediate results corresponding to the first target shard sent by each intermediate task node, the method further includes: sending a data acquisition request to the plurality of intermediate task nodes based on the identifiers of the plurality of intermediate task nodes, wherein the data acquisition request includes the identifier of the first target shard.

[0040] Optionally, the m fragments are m data fragments to be EC encoded, and the first target fragment is a parity fragment; or, the m fragments are fragments to be EC decoded, and the first target fragment is a data fragment to be recovered; after determining the first target fragment, the method further includes: persistently storing the first target fragment.

[0041] Optionally, the m fragments are fragments to be EC decoded, and the aggregation task request further includes a degraded read instruction; after determining the first target fragment, the method further includes: sending a read response to the degraded read instruction to the target node, the read response including the first target fragment.

[0042] In a fourth aspect, a data processing apparatus is provided, applied to a target node in a storage system. The data processing apparatus includes a node selection module and a request sending module.

[0043] The node selection module, in response to a first instruction, determines multiple intermediate task nodes and at least one aggregation task node from the storage system. The first instruction instructs EC calculation based on m shards to obtain n target shards, where m is not less than 2 and n is not less than 1. The request sending module sends corresponding intermediate task requests to the multiple intermediate task nodes and corresponding aggregation task requests to the at least one aggregation task node. The intermediate task request requests the corresponding intermediate task node to obtain task data and, based on the task data, determine n intermediate results. The task data includes a portion of the m shards, and different intermediate task nodes obtain different task data. The n intermediate results correspond one-to-one with the n target shards. The aggregation task request requests the corresponding aggregation task node to calculate the specified target shard based on the multiple intermediate results provided by the multiple intermediate task nodes.

[0044] In a fifth aspect, a data processing apparatus is provided, applied to an intermediate task node in a storage system, wherein the storage system includes multiple intermediate task nodes, a target node, and at least one aggregation task node, and the data processing apparatus includes: a task data acquisition module, an EC calculation module, and an intermediate result feedback module.

[0045] The task data acquisition module is used to acquire task data in response to an intermediate task request from the target node. The task data includes a portion of the m shards to be computed using EC. Different intermediate task nodes obtain different task data, and m is not less than 2. The EC computation module is used to determine n intermediate results based on the task data, and n is not less than 1. The intermediate result feedback module is used to provide at least one of the n intermediate results to each aggregation task node. The at least one intermediate result corresponds to at least one target shard, and each intermediate result is used to compute the corresponding target shard.

[0046] In a sixth aspect, a data processing apparatus is provided, applied to a first aggregation task node among at least one aggregation task node in a storage system, the storage system further including multiple intermediate task nodes and a target node, the data processing apparatus including: an intermediate result acquisition module and an intermediate result aggregation module.

[0047] The intermediate result acquisition module is used to respond to the aggregation task request from the target node and acquire multiple intermediate results corresponding to the first target shard provided by the multiple intermediate task nodes. Each intermediate result is obtained by the corresponding intermediate task node based on its own task data. The task data includes a portion of the m shards to be computed for EC. Different intermediate task nodes have different shards included in their task data, and m is not less than 2. The intermediate result aggregation module is used to determine the first target shard based on the multiple intermediate results corresponding to the first target shard.

[0048] A seventh aspect provides a computer device comprising a processor, the processor being configured to execute at least one program instruction or code stored in a memory to implement the data processing method described in the first, second, or third aspect above.

[0049] In an eighth aspect, a computer-readable storage medium is provided, storing instructions which, when executed on a computer device, cause the computer device to perform the data processing method described in the first, second, or third aspect above.

[0050] In a ninth aspect, a computer program product containing instructions is provided, which, when run on a computer device, causes the computer device to perform the data processing method described in the first, second, or third aspect above.

[0051] The technical effects achieved by the second to ninth aspects mentioned above are similar to those achieved by the corresponding technical means in the first aspect, and will not be repeated here.

Attached Figure Description

[0052] Figure 1 is a schematic diagram of RS encoding in related technologies;

[0053] Figure 2 is an architecture diagram of a distributed storage system used in the data processing method provided in the embodiments of this application;

[0054] Figure 3 is a flowchart of a data processing method provided in an embodiment of this application;

[0055] Figure 4 is a schematic diagram of a target node issuing intermediate task requests and aggregation task requests according to an embodiment of this application;

[0056] Figure 5 is a schematic diagram of an intermediate task node acquiring task data according to an embodiment of this application;

[0057] Figure 6 is a schematic diagram of another intermediate task node acquiring task data provided in an embodiment of this application;

[0058] Figure 7 is a schematic diagram of intermediate task nodes and aggregation task nodes jointly implementing EC encoding in an EC encoding scenario provided by an embodiment of this application;

[0059] Figure 8 is a schematic diagram of intermediate task nodes and aggregation task nodes jointly implementing EC decoding in an EC decoding scenario provided by an embodiment of this application;

[0060] Figure 9 is a schematic diagram of the intermediate task node and the aggregation task node jointly implementing EC encoding in the data writing scenario provided in the embodiments of this application;

[0061] Figure 10 is a schematic diagram of the intermediate task node and the aggregation task node jointly implementing EC decoding in the degraded read scenario provided in the embodiments of this application;

[0062] Figure 11 is a schematic diagram of intermediate task nodes and aggregation task nodes jointly implementing EC decoding in the data reconstruction scenario provided in the embodiments of this application;

[0063] Figure 12 is a schematic diagram of the structure of a data processing device provided in an embodiment of this application;

[0064] Figure 13 is a schematic diagram of another data processing device provided in an embodiment of this application;

[0065] Figure 14 is a schematic diagram of another data processing device provided in an embodiment of this application.

Detailed Implementation

[0066] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the implementation methods of this application will be further described in detail below with reference to the accompanying drawings.

[0067] Before the embodiments of this application are explained in detail, the application scenarios involved in the embodiments are introduced first.

[0068] Currently, to ensure data reliability, storage systems can employ erasure coding (EC) technology. Common EC technologies include array EC technologies such as Redundant Array of Independent Disks Level 5 (RAID5) and RAID6, as well as Reed-Solomon (RS) codes. EC technology primarily uses the EC algorithm to encode original data fragments into parity fragments, storing both data fragments and parity fragments together to achieve fault tolerance. Subsequently, when some fragments become abnormal (for example, due to damage or loss), the abnormal fragments can be recovered from the other intact data fragments and parity fragments stored in the storage system. The process of generating parity fragments is called EC encoding, and the process of recovering abnormal fragments is called EC decoding.

[0069] Taking RS codes as an example, the type of an RS code is usually denoted RS m+k, where m represents the number of original data fragments and k represents the number of parity fragments obtained through encoding. The ratio of m to k can be called the EC ratio. In related technologies, referring to Figure 1, assume that m equals 4 and k equals 2, that is, the EC ratio is 4:2. When writing data, the host can send the four data fragments C1, C2, C3, and C4 to be written to a computing node, which performs EC encoding based on C1, C2, C3, and C4 to obtain two parity fragments P1 and P2. The computing node can then store the four data fragments and two parity fragments on six storage nodes D1 to D6. At this point, the four data fragments and two parity fragments located on storage nodes D1 to D6 form a logical stripe S1. Subsequently, when fragments in logical stripe S1 (whether original data fragments or redundant parity fragments) become abnormal, and the number of abnormal fragments is no more than 2, the computing node can recover them by reading the remaining intact fragments in logical stripe S1 and performing EC decoding.
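The arithmetic behind the 4:2 example can be written out as a small helper. The function name stripe_properties and its returned keys are hypothetical; the quantities follow from the definitions of m and k above, plus the storage overhead (m + k) / m, which is a standard derived figure and not stated in this application.

```python
def stripe_properties(m: int, k: int) -> dict:
    """Basic properties of an RS m+k logical stripe."""
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive")
    return {
        "total_fragments": m + k,          # one fragment per storage node
        "max_recoverable": k,              # abnormal fragments that can be recovered
        "storage_overhead": (m + k) / m,   # stored bytes per byte of original data
    }


stripe_s1 = stripe_properties(4, 2)
```

For the stripe S1 in the example, this gives six fragments on D1 to D6, recovery from up to two abnormal fragments, and 1.5 bytes stored per byte of original data.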

[0070] Therefore, in related technologies, whether performing EC encoding or EC decoding, the shards to be computed are aggregated to a single computing node for computation. However, in scenarios such as distributed storage, big data, and databases, the data traffic of storage systems is often very large. In this case, because all the shards to be computed must be transmitted to the computing node and all EC computations must be performed by that computing node, the IO load and computation load of the computing node are high, and the transmission latency and computation latency are large, which seriously affects the read and write performance and data reconstruction performance of the storage system.

[0071] To address the problems existing in related technologies, this application provides a data processing method. In this method, when a target node in a storage system receives an instruction to perform EC calculation on m shards, it can select multiple intermediate task nodes and at least one aggregation task node from the storage system. It then sends corresponding intermediate task requests to the multiple intermediate task nodes and an aggregation task request to the at least one aggregation task node. Each intermediate task node, based on the received intermediate task request, obtains a portion of the m shards to be EC calculated, and uses its obtained portion of the shards to determine n intermediate results corresponding one-to-one with the n target shards. Different intermediate task nodes obtain different shards from the m shards, so different intermediate task nodes determine different intermediate results for the same target shard. Based on this, any aggregation task node can obtain a target shard according to the aggregation task request and the different intermediate results that the multiple intermediate task nodes determined for that target shard. Therefore, in this embodiment, the target node in the storage system can schedule multiple intermediate task nodes to jointly undertake the EC computation task of calculating the intermediate results of each target shard, and schedule aggregation task nodes to aggregate the intermediate results obtained by the intermediate task nodes. This solves the problem of high node I/O and computational load when a single node performs the EC computation task. Furthermore, since each intermediate task node only needs to receive a portion of the shards, and multiple intermediate task nodes can execute their respective computation tasks in parallel, the time required for shard transmission and EC computation can be shortened. Thus, in write scenarios, the write latency of the storage system can be reduced; in read scenarios, the read latency of the storage system can be reduced; and in data reconstruction scenarios, the data reconstruction latency can be reduced.
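
The split into intermediate results and aggregation can be sketched as follows, assuming a linear code such as RS in which each target shard is a linear combination of the m shards over GF(2^8). This assumption, the field polynomial 0x11D, the coefficient row `coeffs`, and all function names are illustrative and are not specified in this section.

```python
from functools import reduce


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with the polynomial x^8+x^4+x^3+x^2+1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def partial_result(coeffs, shards):
    """Intermediate result: XOR-sum of coeff * shard over the shards one node holds."""
    out = bytearray(len(shards[0]))
    for coeff, shard in zip(coeffs, shards):
        for i, byte in enumerate(shard):
            out[i] ^= gf_mul(coeff, byte)
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate(partials):
    """Aggregation: addition in GF(2^8) is XOR, so partial sums simply XOR together."""
    return reduce(xor_bytes, partials)


shards = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\xff\x00"]  # m = 4
coeffs = [1, 2, 3, 4]  # hypothetical coding-matrix row for one target shard

single_node = partial_result(coeffs, shards)       # everything on one node
node_a = partial_result(coeffs[:2], shards[:2])    # intermediate task node A
node_b = partial_result(coeffs[2:], shards[2:])    # intermediate task node B
assert aggregate([node_a, node_b]) == single_node
```

Under this assumption, each intermediate task node only needs its own shards and coefficients, and the aggregation task node only needs to XOR equally sized intermediate results.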

[0072] The data processing method provided in this application can be applied to various types of storage systems. For example, Figure 2 is an architecture diagram of a distributed storage system provided in this application. As shown in Figure 2, the distributed storage system includes multiple servers 200 (three servers 200 are shown in Figure 2, but it is not limited to three servers 200), and the servers 200 can communicate with each other. Each server 200 is a device that has both computing and storage capabilities.

[0073] For example, as shown in Figure 2, server 200 may include host 210 and multiple CSD devices 220. Host 210 and multiple CSD devices 220 may be connected via a bus.

[0074] At the hardware level, as shown in Figure 2, the host 210 includes at least a processor 211 and memory 212. The processor 211 and memory 212 are connected via a bus. At the software level, the host 210 includes an operating system (OS) 213 and applications 214 (hereinafter referred to as applications) running on the OS 213. Application 214 is a collective term for various applications presented to the user. In some embodiments, in response to a write command triggered by application 214, the processor 211 interacts with the CSD device 220 to write the data to be written carried in the write command to the CSD device 220. The processor 211 is also used to receive data from the CSD device 220 and feed the data back to the application 214.

[0075] In one example, processor 211 includes a central processing unit (CPU) 2110. Figure 2 shows only one CPU 2110. In practical applications, there are often multiple CPUs 2110, and each CPU 2110 has one or more CPU cores. This embodiment does not limit the number of CPUs or the number of CPU cores.

[0076] It should be noted that the CPU 2110 can be used to process data access requests from outside the server 200 (application server or other server 200), and can also be used to process requests generated internally by the server 200. These data access requests include write instructions and read instructions. In addition, the CPU 2110 can also be used for data computation or processing, such as metadata management, deduplication, data verification, data compression, virtualization of storage space, and address translation. For example, in this embodiment, the host 210 can act as a target node, and the CPU 2110 can be used to execute the operations performed by the target node in the data processing methods described in the following embodiments.

[0077] In another example, processor 211 may also include programmable electronic components, such as a data processing unit (DPU). A DPU possesses the versatility and programmability of a CPU, but is more specialized, capable of efficiently operating on network packets, storage requests, or analysis requests. A DPU differs from a CPU by its high degree of parallelism (requiring the processing of a large number of requests). Based on this, a DPU can be used instead of a CPU to process received data access requests. For example, in this embodiment, when host 210 is the target node, the DPU can replace the CPU to execute the steps performed by the target node in the data processing method provided in the following embodiments. Optionally, the DPU here can also be replaced by a graphics processing unit (GPU), an embedded neural network processing unit (NPU), or other processing chips.

[0078] Memory 212 refers to the internal memory that directly exchanges data with processor 211. It can read and write data at any time and at a very fast speed, serving as temporary data storage for the operating system or other running programs. For example, in this embodiment, memory 212 may store at least one program instruction or code, which processor 211 can execute to achieve the operations performed by the target node in the following embodiments.

[0079] For example, memory 212 may include at least two types of memory, such as random access memory (RAM) and read-only memory (ROM). The RAM may be, for example, dynamic random access memory (DRAM) or storage class memory (SCM), and memory 212 may also include other types of RAM, such as static random access memory (SRAM). The ROM may be, for example, programmable read-only memory (PROM) or erasable programmable read-only memory (EPROM). Additionally, memory 212 may be a dual in-line memory module (DIMM), i.e., a module composed of DRAM. In practical applications, server 200 may be configured with multiple memories 212, and with memories 212 of different types. This embodiment does not limit the number or type of memories 212. Furthermore, memory 212 can be configured with a power-retention function, meaning that the data stored in memory 212 is not lost when the system loses power and is powered on again. Memory with a power-retention function is called non-volatile memory.

[0080] CSD device 220 can not only provide storage resources for storing data, but also provide computing resources for data computation and processing. Because CSD device 220 can provide computing resources, some computing tasks of the processor 211 in the host 210 can be offloaded to CSD device 220, thereby saving the computing power and bandwidth overhead of the processor 211. For example, CSD device 220 can be a solid-state disk (SSD) with an integrated computing unit. This computing unit can include a processor, which is implemented through a CPU, a DPU, or other type of computing core.

[0081] For example, in this embodiment, the CSD device 220 can serve as a data node for persistent data storage. Alternatively, the CSD device 220 can serve as an intermediate task node, in which case the processor in the CSD device 220 can execute at least one program instruction or code to implement the operations performed by the intermediate task node in the following embodiments. Furthermore, the CSD device 220 can also serve as an aggregation task node, in which case the processor in the CSD device 220 can execute at least one program instruction or code to implement the operations performed by the aggregation task node in the following embodiments. Optionally, the CSD device 220 can also replace the host 210 as the target node, in which case the processor in the CSD device 220 can execute at least one program instruction or code to implement the operations performed by the target node in the data processing method in the following embodiments.

[0082] Additionally, it is worth noting that in some embodiments, the aforementioned servers 200 can communicate via a network.

[0083] In other embodiments, the OS 213 of each of the servers 200 described above may run a non-volatile memory express (NVMe) driver or an interconnect bus driver. Based on the NVMe driver or the interconnect bus driver, the hosts 210 and CSD devices 220 in the multiple servers 200 can be interconnected using an interconnect bus. For example, the interconnect bus can be a unified bus (UB), compute express link (CXL), etc. In this case, the CSD devices 220 in the multiple servers 200 form a storage pool. The processor 211 on any server 200 can access not only the CSD devices 220 in its own server 200, but also the CSD devices 220 in other servers 200 through the interconnect bus. Furthermore, the CSD devices 220 in different servers 200 can communicate through this interconnect bus.

[0084] The distributed storage system shown in Figure 2 is an example provided in an embodiment of this application. In some possible cases, the data processing method provided in this application embodiment can also be applied to other types of distributed storage systems. For example, in a compute-storage separated distributed storage system including a computing device cluster and a storage device cluster, each computing device in the computing device cluster can communicate with each storage device in the storage device cluster via a network. In this case, any computing device can act as a target node, executing the operations performed by the target node in the following embodiments. Correspondingly, multiple storage devices in the storage device cluster can act as intermediate task nodes or aggregation task nodes to execute the operations performed by the corresponding nodes in the following embodiments.

[0085] The following describes the detailed implementation process of the data processing method provided in this application embodiment, applied to the distributed storage system shown in Figure 2. For example, referring to Figure 3, the process includes the following steps:

[0086] S301: In response to a first instruction, the target node determines multiple intermediate task nodes and at least one aggregation task node from the storage system. The first instruction is used to instruct EC calculation based on m shards to obtain n target shards.

[0087] In this embodiment, the target node can be host 210 in a server 200 of the distributed storage system shown in Figure 2. Alternatively, it can be a CSD device 220 in a server. Furthermore, the data processing method provided in this embodiment can be applied to both EC encoding and EC decoding scenarios. EC encoding scenarios can include data writing scenarios, and EC decoding scenarios can include degraded read and data reconstruction scenarios. Depending on the application scenario, the first instruction differs, and the implementation method of the target node determining intermediate task nodes and aggregation task nodes based on the first instruction also differs. The implementation methods for various scenarios will be described in detail below.

[0088] Scenario 1: EC Encoding Scenario

[0089] EC encoding scenarios include data writing scenarios. Taking the data writing scenario as an example, the first instruction can be a write instruction to instruct the writing of data into the storage system. Since this embodiment uses EC technology to write data into the storage system, the write instruction can indirectly instruct EC calculation based on the m data fragments to be written, to obtain n parity fragments. Here, m is the number of original data fragments for each EC encoding as indicated by the EC ratio configured in the storage system, and m is not less than 2. n is equal to the number k of parity fragments indicated by the EC ratio, and k is not less than 1.

[0090] For example, when the target node is a host, the first instruction can be a write instruction received by the host from an external source or triggered by an internal application. In this case, the first instruction can carry the m data fragments to be written. When the target node is a CSD device, the first instruction can be a write instruction received by the CSD device from the host. In this case, the first instruction can carry the identifiers of the m data fragments to be written, and optionally, it can also carry the storage addresses of the m data fragments.

[0091] When the first instruction is a write instruction, the target node can, in response to the first instruction, select multiple intermediate task nodes and at least one aggregation task node for EC encoding from the multiple CSD devices included in the storage system.

[0092] For example, the target node can obtain a first task parameter, which includes the number of intermediate task nodes h1 and the number of aggregation task nodes t1, where h1 is no greater than m and t1 is no greater than n. Then, the target node can select h1 CSD devices from the storage system as intermediate task nodes and t1 CSD devices as aggregation task nodes.

[0093] It should be noted that the first task parameter can be pre-configured based on the EC ratio and the expected write latency. For example, the number of intermediate task nodes h1 can be configured based on the expected write latency and the number of data shards indicated by the EC ratio, and the number of aggregation task nodes t1 can be configured based on the expected write latency and the number of parity shards indicated by the EC ratio. Alternatively, the first instruction can carry the user's expected write latency, and the first task parameter can be determined by the target node based on the user's expected write latency and the EC ratio. The shorter the expected write latency, the larger the number of intermediate task nodes h1 and/or the number of aggregation task nodes t1 can be. This allows the EC encoding task to be distributed across more nodes for parallel execution, thereby improving EC encoding speed and reducing write latency. Of course, considering the overhead of subsequently aggregating the intermediate results, a larger number of intermediate task nodes is not always better.

[0094] For example, the EC ratio is 24:4, meaning there are 24 original data shards and 4 parity shards. Based on this, when the expected write latency is the first latency, the number of intermediate task nodes h1 can be equal to 4, and the number of aggregation task nodes t1 can be equal to 2. That is, 4 intermediate task nodes can handle the computation of the 24 data shards, and 2 aggregation task nodes can aggregate the intermediate results from the 4 intermediate task nodes to obtain 4 parity shards. For example, each intermediate task node can be used to compute on 6 data shards; each aggregation task node can be used to compute two parity shards.

[0095] As another example, with an EC ratio of 24:4, when the expected write latency is the second latency, and the second latency is less than the first latency, the number of intermediate task nodes h1 can be equal to 6, and the number of aggregation task nodes t1 can be equal to 4. That is, 6 intermediate task nodes can be used to handle the computation of the 24 data shards, and 4 aggregation task nodes can be used to aggregate the intermediate results of the 6 intermediate task nodes to obtain 4 parity shards. For example, each of the 6 intermediate task nodes can be used to compute on 4 data shards, and each of the 4 aggregation task nodes can be used to compute one parity shard. In this way, compared to EC encoding by 4 intermediate task nodes and 2 aggregation task nodes, the EC encoding task is distributed to more nodes for parallel computation, which can improve the EC encoding speed and thus reduce write latency.
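
The two 24:4 configurations above can be reproduced with a short sketch. It assumes an even split of shards across nodes, which is only one possible policy; the helper names `split_evenly` and `plan_encoding` are hypothetical.

```python
from typing import Dict, List


def split_evenly(total: int, parts: int) -> List[int]:
    """Split `total` items into `parts` groups whose sizes differ by at most one."""
    if not 1 <= parts <= total:
        raise ValueError("parts must be between 1 and total")
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def plan_encoding(m: int, n: int, h1: int, t1: int) -> Dict[str, List[int]]:
    """Per-node workload for m data shards, n parity shards, h1 intermediate
    task nodes and t1 aggregation task nodes (requires h1 <= m and t1 <= n)."""
    return {
        "data_shards_per_intermediate_node": split_evenly(m, h1),
        "parity_shards_per_aggregation_node": split_evenly(n, t1),
    }


plan_a = plan_encoding(m=24, n=4, h1=4, t1=2)
plan_b = plan_encoding(m=24, n=4, h1=6, t1=4)
assert plan_a["data_shards_per_intermediate_node"] == [6, 6, 6, 6]
assert plan_b["parity_shards_per_aggregation_node"] == [1, 1, 1, 1]
```

The `ValueError` mirrors the constraints that h1 is no greater than m and t1 is no greater than n.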

[0096] After obtaining the number h1 of intermediate task nodes, the target node can select h1 devices as intermediate task nodes from the CSD devices in the storage system, excluding the CSD devices used for persistent storage of the m data fragments. A CSD device used for persistent storage of the data fragments can be referred to as a data node. That is, in this embodiment, during data writing, the intermediate task nodes used for EC encoding and the data nodes used for persistent storage of the m data fragments to be written can be different nodes.

[0097] For example, suppose h1 equals 4 and there are 24 data shards to be written. These 24 data shards can be persistently stored using 24 data nodes. Based on this, the target node can select 4 other CSD devices as intermediate task nodes. In this way, the 24 data shards are persistently stored on the 24 data nodes and, at the same time, are gathered on the 4 intermediate task nodes for EC calculation. Furthermore, the copies of the 24 data shards held by the 4 intermediate task nodes serve as a backup of the 24 data shards held by the data nodes, which helps improve data reliability.

[0098] Additionally, it should be noted that, as described above, m data shards and k parity shards generated from the m data shards can form a logical stripe. For any logical stripe, the k parity shards in the logical stripe can be stored on k CSD devices in the storage system, and these k CSD devices can be called parity nodes. Based on this, in the embodiments of this application, the selected h1 intermediate task nodes may include some or all of the k parity nodes subsequently used to store the k target shards. Alternatively, the selected h1 intermediate task nodes may not include the parity nodes; that is, the h1 intermediate task nodes can be nodes other than the parity nodes and the data nodes used to store the m data shards to be written.
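
One way to express this selection is sketched below. The function `pick_intermediate_nodes`, its `prefer_parity` switch, and the CSD names are hypothetical; the source only states that data nodes are excluded and that parity nodes may or may not be included.

```python
from typing import Iterable, List, Sequence


def pick_intermediate_nodes(
    all_csds: Sequence[str],
    data_nodes: Iterable[str],
    h1: int,
    parity_nodes: Iterable[str] = (),
    prefer_parity: bool = False,
) -> List[str]:
    """Choose h1 intermediate task nodes, never reusing a data node."""
    excluded = set(data_nodes)
    candidates = [csd for csd in all_csds if csd not in excluded]
    if prefer_parity:
        parity = set(parity_nodes)
        # Stable sort: parity nodes (key False) move to the front, order otherwise kept.
        candidates.sort(key=lambda csd: csd not in parity)
    if len(candidates) < h1:
        raise ValueError("not enough non-data CSD devices")
    return candidates[:h1]


csds = ["CSD%d" % i for i in range(1, 9)]
data = ["CSD1", "CSD2", "CSD3", "CSD4"]
parity = ["CSD7", "CSD8"]
without_parity_preference = pick_intermediate_nodes(csds, data, 2)
with_parity_preference = pick_intermediate_nodes(csds, data, 2, parity, prefer_parity=True)
```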

[0099] After obtaining the number t1 of aggregation task nodes, the target node can select t1 CSD devices from the storage system as aggregation task nodes.

[0100] As described above, for any logical stripe, the k parity shards within that stripe can be stored in k parity nodes within the storage system. Based on this, the target node can select t1 parity nodes from the k parity nodes subsequently used to store the k target shards as aggregation task nodes. Optionally, the target node can also randomly select t1 CSD devices from the storage system as aggregation task nodes. Alternatively, the target node can select CSD devices with more idle computing resources as aggregation task nodes to achieve load balancing. Alternatively, if h1 is not less than t1, t1 nodes can be selected from the intermediate task nodes as aggregation task nodes; if h1 is less than t1, h1 intermediate task nodes and (t1-h1) other CSD devices can be used as aggregation task nodes, thus saving some of the transmission overhead when intermediate task nodes transmit intermediate results to the aggregation task nodes.

[0101] In summary, in the embodiments of this application, a CSD device may be both an aggregation task node and an intermediate task node. In other words, an aggregation task node may include some or all of the intermediate task nodes. Alternatively, an aggregation task node may not include any of the intermediate task nodes.
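
The option of reusing intermediate task nodes as aggregation task nodes can be sketched as follows. The function name `pick_aggregation_nodes` and the node names are hypothetical, and `other_csds` is assumed not to overlap with the intermediate task nodes.

```python
from typing import List, Sequence


def pick_aggregation_nodes(
    intermediate_nodes: Sequence[str], other_csds: Sequence[str], t: int
) -> List[str]:
    """Reuse intermediate task nodes first; add other CSD devices only if t > h."""
    h = len(intermediate_nodes)
    if h >= t:
        return list(intermediate_nodes[:t])
    needed = t - h
    if needed > len(other_csds):
        raise ValueError("not enough additional CSD devices")
    return list(intermediate_nodes) + list(other_csds[:needed])


intermediates = ["CSD1", "CSD2", "CSD3", "CSD4"]
others = ["CSD5", "CSD6"]
reuse_only = pick_aggregation_nodes(intermediates, others, 2)
reuse_plus_extra = pick_aggregation_nodes(intermediates[:1], others, 3)
```

An intermediate task node that is also an aggregation task node already holds its own intermediate result locally, which is where the saved transmission overhead comes from.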

[0102] Scenario 2: EC Decoding Scenario

[0103] EC decoding scenarios can include degraded read scenarios and data reconstruction scenarios, which are introduced separately below.

[0104] 1. Degraded Read Scenario

[0105] In a degraded read scenario, the first instruction can be a degraded read instruction. This instruction directs the recovery of n abnormal data fragments from the target logical stripe based on the m non-abnormal fragments stored in the storage system, and then reads the recovered n data fragments. The m non-abnormal fragments include data fragments and parity fragments, where n is no greater than the number of parity fragments k indicated by the EC ratio. The abnormal data fragments to be read are the n target fragments to be recovered.

[0106] For example, when the target node is a host, the host can monitor the operating status of each CSD device in the server. Based on this, after receiving a target read instruction triggered by an external or internal application, the host can determine whether the data fragment to be read is abnormal based on the identifier of the data fragment to be read carried in the target read instruction and the operating status of each CSD device. If the data fragment to be read is determined to be abnormal, the host generates a degraded read instruction based on the target read instruction, thereby instructing that the abnormal data fragment be recovered from the m non-abnormal fragments belonging to the same logical stripe and that the recovered data fragment then be read. As another example, when the target node is a CSD device, the first instruction can be a degraded read instruction received by the CSD device from the host, or a degraded read instruction generated by the CSD device based on the target read instruction issued by the host. For example, a CSD device can detect in advance which data fragments are abnormal. Based on this, when the CSD device receives a target read instruction, if it determines that the data fragments to be read include abnormal data fragments, it can generate a degraded read instruction. Alternatively, the CSD device can detect whether the data fragments to be read are abnormal upon receiving a target read instruction, and generate a degraded read instruction if an abnormality is detected.
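
The decision of whether to turn an ordinary read into a degraded read can be sketched as below. The classes `ReadInstruction` and `DegradedReadInstruction`, the function `to_first_instruction`, and the fragment identifiers are hypothetical; the source does not define the instruction format.

```python
from dataclasses import dataclass
from typing import Sequence, Set, Tuple, Union


@dataclass(frozen=True)
class ReadInstruction:
    fragment_id: str


@dataclass(frozen=True)
class DegradedReadInstruction:
    target_ids: Tuple[str, ...]  # abnormal fragments to recover and read
    source_ids: Tuple[str, ...]  # m non-abnormal fragments of the same stripe


def to_first_instruction(
    read: ReadInstruction, stripe: Sequence[str], abnormal: Set[str], m: int
) -> Union[ReadInstruction, DegradedReadInstruction]:
    """Return the read unchanged if its fragment is intact, else a degraded read."""
    if read.fragment_id not in abnormal:
        return read
    intact = [f for f in stripe if f not in abnormal]
    if len(intact) < m:
        raise ValueError("fewer than m intact fragments: stripe cannot be decoded")
    return DegradedReadInstruction((read.fragment_id,), tuple(intact[:m]))


stripe = ["C1", "C2", "C3", "C4", "P1", "P2"]  # EC ratio 4:2
normal = to_first_instruction(ReadInstruction("C1"), stripe, {"C2"}, m=4)
degraded = to_first_instruction(ReadInstruction("C2"), stripe, {"C2"}, m=4)
```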

[0107] When the first instruction is a degraded read instruction, the target node can, in response to the first instruction, select multiple intermediate task nodes and at least one aggregation task node for EC decoding from the CSD devices included in the storage system.

[0108] For example, the target node can obtain a second task parameter, which includes the number of intermediate task nodes h2 and the number of aggregation task nodes t2, where h2 is no greater than m and t2 is no greater than n. Then, the target node can select h2 CSD devices from the storage system as intermediate task nodes and t2 CSD devices as aggregation task nodes.

[0109] It should be noted that the second task parameter can be pre-configured based on the EC ratio and the expected read latency. For example, the number of intermediate task nodes h2 can be configured based on the expected read latency and the number of data shards indicated by the EC ratio, while the number of aggregation task nodes t2 can be configured based on the expected read latency and the number of target shards n to be recovered. Alternatively, the first instruction can carry the user's expected read latency, and the second task parameter can be determined by the target node based on the user's expected read latency and the EC ratio. The shorter the expected read latency, the larger the number of intermediate task nodes h2 and/or the number of aggregation task nodes t2 can be. This allows the EC decoding task to be distributed across more nodes for parallel execution, thereby improving the EC decoding speed and reducing read latency. However, considering the overhead of subsequently aggregating the intermediate results, a larger number of intermediate task nodes is not always better.

[0110] After obtaining the number h2 of intermediate task nodes, the target node can select h2 non-faulty CSD devices from the storage system as intermediate task nodes.

[0111] For example, the h2 intermediate task nodes may include nodes that store other fragments belonging to the same logical stripe as the target fragment to be recovered. For instance, the h2 intermediate task nodes may include some or all of the CSD devices containing the m fragments used to calculate the target fragment to be recovered. Alternatively, if the CSD device containing the target fragment to be recovered is faulty and unusable, the h2 intermediate task nodes may include a target CSD device for replacing the faulty CSD device.

[0112] After obtaining the number t2 of aggregation task nodes, the target node can select t2 CSD devices from the storage system as aggregation task nodes.

[0113] In this process, the target node can randomly select t2 CSD devices from the storage system as aggregation task nodes. Alternatively, it can select t2 CSD devices with more idle computing resources as aggregation task nodes to achieve load balancing. Alternatively, if h2 is not less than t2, t2 nodes can be selected from the intermediate task nodes as aggregation task nodes; if h2 is less than t2, h2 intermediate task nodes and (t2-h2) other CSD devices can be used as aggregation task nodes, thus saving some of the transmission overhead when intermediate task nodes transmit intermediate results to the aggregation task node. Alternatively, if the CSD devices containing the n target shards to be recovered fail and are unusable, the target CSD devices used to replace the failed CSD devices can be used as aggregation task nodes.

[0114] 2. Data Reconstruction Scenarios

[0115] In a data reconstruction scenario, the first instruction can be a data reconstruction instruction, which is used to instruct the recovery of data in a faulty CSD device in the storage system.

[0116] For example, when the target node is a host, the first instruction can be a data reconstruction instruction generated by the host after detecting a failure of a CSD device. When the target node is a CSD device, the first instruction can be a data reconstruction instruction issued by the host.

[0117] For example, the first instruction carries the identifiers of the shards to be recovered that were stored in the faulty CSD device. It should be noted that in a storage system, the CSD devices used to store parity shards from different logical stripes may be different. Therefore, the faulty CSD device may store both data shards belonging to some logical stripes and parity shards belonging to other logical stripes. Thus, the shards to be recovered in the faulty CSD device may include data shards and/or parity shards. Furthermore, some shards to be recovered may belong to the same logical stripe, while others may belong to different logical stripes. Therefore, for the n target shards to be recovered that belong to any one logical stripe, if the target shards are data shards, the first instruction may also include the identifiers of m non-abnormal shards that belong to the same logical stripe as the n data shards to be recovered. In this case, the m shards include both data shards and parity shards. If the target shards to be recovered are parity shards, the first instruction may further include the identifiers of the m data shards used to calculate the n parity shards to be recovered. Here, n is not greater than the number of parity shards k indicated by the EC ratio.
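
The per-stripe bookkeeping described above can be sketched as follows. The function names, the dictionary layout, and the convention that a stripe lists its data shards before its parity shards are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_targets_by_stripe(shards_on_failed_csd: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each stripe id to the shards of that stripe lost with the faulty CSD."""
    groups = defaultdict(list)
    for shard_id, stripe_id in shards_on_failed_csd.items():
        groups[stripe_id].append(shard_id)
    return dict(groups)


def source_shards(stripe_members: Sequence[str], targets: Sequence[str], m: int) -> List[str]:
    """Pick m intact shards of the stripe. Because data shards are listed first,
    a lost parity shard is recomputed from the m data shards, while a lost data
    shard is recovered from remaining data shards plus parity shards."""
    lost = set(targets)
    intact = [s for s in stripe_members if s not in lost]
    if len(intact) < m:
        raise ValueError("stripe cannot be reconstructed")
    return intact[:m]


# The faulty CSD held a data shard of stripe S1 and a parity shard of stripe S2.
lost_by_stripe = group_targets_by_stripe({"S1.C2": "S1", "S2.P1": "S2"})
s1 = ["S1.C1", "S1.C2", "S1.C3", "S1.C4", "S1.P1", "S1.P2"]
s2 = ["S2.C1", "S2.C2", "S2.C3", "S2.C4", "S2.P1", "S2.P2"]
sources_s1 = source_shards(s1, lost_by_stripe["S1"], m=4)
sources_s2 = source_shards(s2, lost_by_stripe["S2"], m=4)
```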

[0118] In the case that the first instruction is a data reconstruction instruction, in response to the first instruction, the target node can select from the CSD devices included in the storage system a plurality of intermediate task nodes and at least one aggregation task node for performing EC decoding.

[0119] For example, the target node can obtain a third task parameter, which includes the number of intermediate task nodes h3 and the number of aggregation task nodes t3, where h3 is no greater than m and t3 is no greater than n. Then, the target node can select h3 CSD devices from the storage system as intermediate task nodes and t3 CSD devices as aggregation task nodes.

[0120] It should be noted that this third task parameter can be pre-configured based on the EC ratio and the expected reconstruction latency. For example, the number of intermediate task nodes h3 can be configured based on the expected reconstruction latency and the number of data shards indicated by the EC ratio, while the number of aggregation task nodes t3 can be configured based on the expected reconstruction latency and the number of target shards n to be recovered. The shorter the expected reconstruction latency, the larger the number of intermediate task nodes h3 and/or the number of aggregation task nodes t3 can be. This allows the EC decoding task to be distributed across more nodes for parallel execution, thereby improving EC decoding speed and reducing reconstruction latency. However, considering the overhead of subsequently aggregating the intermediate results, a larger number of intermediate task nodes is not always better.

[0121] After obtaining the number of intermediate task nodes h3 and the number of aggregation task nodes t3, the target node can select h3 intermediate task nodes and t3 aggregation task nodes in the same way as in the degraded read scenario described above, which will not be elaborated here.

[0122] It should be noted that in the data writing scenario, degraded read scenario, and data reconstruction scenario described above, the first task parameter, second task parameter, and third task parameter obtained by the target node may be different; that is, the target node can be configured with three different task parameters for the latency requirements of the three scenarios. Optionally, the first task parameter, second task parameter, and third task parameter can also be the same. In this case, the shortest of the expected latencies of the three scenarios can be selected, and the task parameters can be configured based on that shortest latency.

[0123] S302: The target node sends a corresponding intermediate task request to each of the multiple intermediate task nodes, and sends a corresponding aggregation task request to each of the at least one aggregation task node.

[0124] The target node can send a corresponding intermediate task request to each intermediate task node, requesting the intermediate task node to obtain the necessary task data from the m fragments used to compute the n target fragments, and then calculate n intermediate results corresponding one-to-one with the n target fragments based on the obtained task data. Additionally, the target node can also send a corresponding aggregation task request to each aggregation task node, requesting the aggregation task node to compute the specified target fragment based on the multiple intermediate results provided by the multiple intermediate task nodes.

[0125] Specifically, in the embodiments of this application, the way in which the target node sends intermediate task requests and aggregation task requests differs depending on the application scenario.

[0126] Scenario 1: EC Encoding Scenario

[0127] In EC encoding scenarios, such as in a data writing scenario, when the target node is a host on a server, the first instruction can carry m data fragments to be written. In this case, the target node can divide the m data fragments to be written into h1 task data based on the determined number h1 intermediate task nodes. The number of data fragments contained in each task data can be equal or unequal. Then, an intermediate task request is sent to each intermediate task node. This intermediate task request can be an EC encoding request carrying one task data, used to request the intermediate task node to perform EC encoding based on this task data. It should be noted that the task data carried in the intermediate task requests sent to each intermediate task node is different.

[0128] Optionally, the intermediate task request may also carry a mapping relationship between the identifiers of each aggregation task node and the identifiers of the target shards (i.e., verification shards) to be calculated by the corresponding aggregation task node.

[0129] Additionally, the target node can determine the number of target fragments to be computed by each aggregation task node based on the number of aggregation task nodes t1 and the number of target fragments n to be computed. The number of target fragments assigned to each aggregation task node can be equal or unequal, but is not less than 1. Then, the target node sends an aggregation task request to each aggregation task node. This aggregation task request may include a fragment write instruction and the identifier of the target fragment to be computed by the aggregation task node. Optionally, the aggregation task request may also include the identifiers of the multiple intermediate task nodes. The fragment write instruction is used to instruct the aggregation task node to persistently store the computed target fragments. Optionally, the fragment write instruction may carry the identifier of the node used to persistently store the target fragments computed by the aggregation task node.

[0130] For example, referring to Figure 4, there are 24 data shards to be written, numbered C1 to C24. There are 4 intermediate task nodes, numbered CSD1 to CSD4. There are 4 target shards to be calculated, and 4 aggregation task nodes, numbered CSD5 to CSD8. The 24 data shards are divided into 4 task data sets, each containing 6 data shards. As shown in Figure 4, C1 to C6 of the 24 data shards are task data 1, which is sent to CSD1 in intermediate task request R1; C7 to C12 are task data 2, which is sent to CSD2 in intermediate task request R2; C13 to C18 are task data 3, which is sent to CSD3 in intermediate task request R3; and C19 to C24 are task data 4, which is sent to CSD4 in intermediate task request R4. The four target fragments to be calculated are identified as P1, P2, P3 and P4. P1 is sent to CSD5 in the aggregation task request A1, P2 is sent to CSD6 in the aggregation task request A2, P3 is sent to CSD7 in the aggregation task request A3 and P4 is sent to CSD8 in the aggregation task request A4.

[0131] It should be noted that in the above example, the number of shards processed by each intermediate task node is equal, and the number of target shards to be calculated by each aggregation task node is also equal. In some possible cases, the number of shards processed by each intermediate task node may be unequal, and the number of target shards to be calculated by each aggregation task node may also be unequal. For example, if there are 24 data shards to be written, and there are 4 intermediate task nodes, one intermediate task node's task data includes 8 data shards, another intermediate task node's task data includes 4 data shards, and the remaining two intermediate task nodes each include 6 data shards. As another example, if the number of target shards to be calculated is 4, and there are 3 aggregation task nodes, then one aggregation task node will calculate 2 target shards, and the other two aggregation task nodes will each calculate 1 target shard.
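The division described in the two examples above can be sketched as follows. This is a minimal illustration, not the claimed method: the helper names `split_contiguous` and `plan_requests` are hypothetical, and the near-even contiguous split is only one of the equal or unequal divisions the text permits.

```python
from typing import Dict, List, Sequence, Tuple


def split_contiguous(items: Sequence[str], parts: int) -> List[List[str]]:
    """Split items into `parts` contiguous groups whose sizes differ by at most one."""
    if parts < 1 or parts > len(items):
        raise ValueError("need 1 <= parts <= len(items) so that no group is empty")
    base, extra = divmod(len(items), parts)
    groups, start = [], 0
    for idx in range(parts):
        size = base + (1 if idx < extra else 0)
        groups.append(list(items[start:start + size]))
        start += size
    return groups


def plan_requests(
    shards: Sequence[str],
    intermediate_nodes: Sequence[str],
    targets: Sequence[str],
    aggregation_nodes: Sequence[str],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Map each intermediate node to its task data and each aggregation node to its targets."""
    task_data = dict(zip(intermediate_nodes, split_contiguous(shards, len(intermediate_nodes))))
    target_map = dict(zip(aggregation_nodes, split_contiguous(targets, len(aggregation_nodes))))
    return task_data, target_map


# The Figure 4 example: 24 data shards, 4 intermediate nodes, 4 targets, 4 aggregation nodes.
shards = [f"C{i}" for i in range(1, 25)]
targets = [f"P{i}" for i in range(1, 5)]
task_data, target_map = plan_requests(
    shards, ["CSD1", "CSD2", "CSD3", "CSD4"], targets, ["CSD5", "CSD6", "CSD7", "CSD8"]
)
```

With 4 targets and 3 aggregation nodes, `split_contiguous(targets, 3)` yields groups of 2, 1 and 1 targets, matching the unequal case mentioned above.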

[0132] When the target node is a CSD device in a server, the first instruction can carry the identifiers of the m data shards to be written. In this case, the target node can determine h1 task indication messages based on the number h1 intermediate task nodes and the identifiers of the m data shards to be written. Then, the target node sends an intermediate task request to each intermediate task node. This intermediate task request is an EC encoding request, carrying one of the h1 task indication messages, used to instruct EC encoding of the task data indicated by the task indication message. In addition, the target node can also send aggregation task requests to each aggregation task node in the manner described above, which will not be elaborated further here.

[0133] It should be noted that each of the h1 task indication messages is used to indicate a set of task data. Based on this, the task indication messages carried in the intermediate task requests sent by the target node to each intermediate task node are different. That is, the task data indicated by the target node to each intermediate task node is different.

[0134] For example, the task indication information may include identifiers of some data fragments among m data fragments, wherein different task indication information contains different fragment identifiers, and the number of fragment identifiers included may be equal or unequal. Optionally, the task indication information may also include a storage address corresponding to the identifier of each data fragment, which is used to indicate the storage location of the corresponding data fragment.
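One possible in-memory shape for the task indication information is sketched below. The class name `TaskIndication`, the helper `build_indications`, and the address string format are all hypothetical; the application does not prescribe a message layout. The sketch only enforces the stated property that different task indication messages carry different fragment identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class TaskIndication:
    """Identifiers of the fragments in one set of task data, plus optional storage addresses."""
    shard_ids: List[str]
    addresses: Optional[Dict[str, str]] = None  # fragment identifier -> storage address


def build_indications(
    groups: Sequence[Sequence[str]],
    addresses: Optional[Dict[str, str]] = None,
) -> List[TaskIndication]:
    """Build one indication per group, rejecting a fragment that appears in two groups."""
    seen = set()
    indications = []
    for group in groups:
        overlap = seen.intersection(group)
        if overlap:
            raise ValueError(f"fragments assigned twice: {sorted(overlap)}")
        seen.update(group)
        picked = None
        if addresses is not None:
            picked = {sid: addresses[sid] for sid in group}
        indications.append(TaskIndication(list(group), picked))
    return indications


# Hypothetical address format "node:/path"; groups of unequal size are allowed.
all_addresses = {f"C{i}": f"D{i}:/shards/C{i}" for i in range(1, 6)}
indications = build_indications([["C1", "C2", "C3"], ["C4", "C5"]], all_addresses)
```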

[0135] It should be noted that, in this embodiment of the application, in the data writing scenario, for the m data fragments to be written, not only is the verification fragment calculated using the data processing method provided in this embodiment, but the m data fragments to be written are also sent to multiple data nodes for persistent storage. Based on this, the host can also issue corresponding write requests to the selected multiple data nodes, where these multiple data nodes are different from the multiple intermediate task nodes. The write request carries the data fragments from the m data fragments to be written to the corresponding data node. After receiving the write request, each data node can save the data fragments carried in the write request. After saving the received data fragments, each data node can return a first write response to the host, thereby notifying the host that the corresponding data node has saved the data to be written.

[0136] Scenario 2: EC Decoding Scenario

[0137] In the degraded read scenario, the first instruction is used to instruct the recovery of n data shards in the target logical stripe that exhibit anomalies. In this case, the target node can obtain the identifiers of m shards in the target logical stripe that have not experienced anomalies, where these m shards include data shards and parity shards. Then, referring to the implementation method described in the write data scenario above, the target node can determine h2 task indication messages based on the number h2 of intermediate task nodes and the identifiers of the m shards that have not experienced anomalies. An intermediate task request is issued to each intermediate task node; this intermediate task request is an EC decoding request, carrying one of the h2 task indication messages, used to instruct the recovery of the n target shards based on the task data indicated by the task indication message. For example, the task indication message may include the identifiers of each shard in the task data. Optionally, it may also include the identifier of the target shard to be recovered.

[0138] Additionally, the target node can determine the number of target fragments to be computed by each aggregation task node based on the number of aggregation task nodes t2 and the number of target fragments to be recovered n. The number of target fragments to be computed by each aggregation task node can be equal or unequal. Then, the target node sends an aggregation task request to each aggregation task node. This request may include a degraded read instruction and the identifier of the target fragment to be computed by the corresponding aggregation task node. Optionally, the aggregation task request may also include the identifiers of each intermediate task node. The degraded read instruction instructs the aggregation task node to return the computed target fragments to the target node.

[0139] Optionally, in some possible implementations, the intermediate task request sent by the target node to each intermediate task node may also carry the mapping relationship between the identifier of each aggregation task node and the identifier of the target fragment to be calculated by the corresponding aggregation task node.

[0140] In data reconstruction scenarios, the first instruction can carry the identifier of the shard to be recovered stored in the faulty CSD device. Based on this, for n target shards belonging to a certain logical stripe among the shards to be recovered, the target node can refer to the method of issuing intermediate task requests described in the degraded read scenario above, and issue intermediate task requests to multiple intermediate task nodes.

[0141] In addition, referring to the method of sending aggregation task requests in the degraded read scenario above, the target node can also send an aggregation task request to at least one aggregation task node, so that the intermediate task nodes and the aggregation task nodes perform EC decoding to recover the n target fragments.

[0142] It is worth noting that, unlike the degraded read scenario described above, in the data reconstruction scenario, if all n target shards to be recovered are parity shards, the intermediate task request can be an EC decoding request. However, in this case, since the EC calculation is based on the data shards used to calculate the corrupted parity shards, the EC calculation process is actually a re-EC encoding process; therefore, the intermediate task request can also be an EC encoding request. Furthermore, the aggregation task request does not carry a degraded read instruction, but rather a shard write instruction. This shard write instruction instructs the aggregation task node to persistently store the calculated target shards. Optionally, this shard write instruction may carry the identifier of the node used to persistently store the target shards calculated by the aggregation task node.
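The request types used in the three scenarios can be summarized in a small decision function. This is a sketch under assumptions: the enum `Scenario`, the function `request_kinds` and the string labels are hypothetical names, and for reconstruction of parity-only targets the sketch picks the EC encoding request, which is only one of the two options the text allows.

```python
from enum import Enum
from typing import Tuple


class Scenario(Enum):
    WRITE = "write"
    DEGRADED_READ = "degraded_read"
    RECONSTRUCTION = "reconstruction"


def request_kinds(scenario: Scenario, all_targets_are_parity: bool = False) -> Tuple[str, str]:
    """Return (intermediate request type, instruction carried by the aggregation request)."""
    if scenario is Scenario.WRITE:
        # Parity fragments are computed and then persistently stored.
        return ("ec_encode", "fragment_write")
    if scenario is Scenario.DEGRADED_READ:
        # Recovered data fragments are returned to the target node.
        return ("ec_decode", "degraded_read")
    # Reconstruction: recovered fragments are persistently stored.
    # Assumption: re-encoding is chosen when only parity fragments were lost.
    intermediate = "ec_encode" if all_targets_are_parity else "ec_decode"
    return (intermediate, "fragment_write")
```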

[0143] S303: Each intermediate task node responds to the intermediate task request from the target node and obtains its own task data, which includes a portion of the m shards.

[0144] After receiving an intermediate task request from the target node, each intermediate task node can obtain its own task data based on the received request. The task data obtained by different intermediate task nodes will vary.

[0145] In one possible implementation, the intermediate task request carries task data. In this case, each intermediate task node can directly obtain the task data carried in the intermediate task request it receives.

[0146] In another possible implementation, the intermediate task request carries task indication information to indicate task data. In this case, taking any intermediate task node as an example, such as the first intermediate task node, the first intermediate task node can obtain the identifier of the fragment contained in its first task data based on the first task indication information carried in the first intermediate task request it receives. Then, based on the identifier of the fragment contained in the first task data, it can obtain the corresponding fragment from the node storing each fragment in the first task data, thereby obtaining the first task data.

[0147] In one example, the first task indication information includes the identifiers of each fragment of the first task data, and the first intermediate task node stores the mapping relationship between fragment identifiers and node identifiers. Based on this, the first intermediate task node can determine the identifier of the node storing each fragment based on the identifiers of each fragment in the first task data, and then retrieve the stored fragment from the corresponding node based on the determined node identifier.

[0148] In another example, the first task indication information includes the identifiers of each fragment of the first task data and their corresponding storage addresses. Based on this, the first intermediate task node can retrieve each fragment from the corresponding storage location based on the storage address corresponding to the identifier of each fragment.

[0149] For example, referring to Figure 5, the task indication information received by intermediate task node 1 contains fragment identifiers C1 to C6. The identifiers of the data nodes corresponding to these 6 fragment identifiers are D1 to D6 in sequence. Then, intermediate task node 1 can obtain fragment C1 from D1, fragment C2 from D2, and so on.

[0150] It should be noted that in some possible cases, one or more shards of the task data that the intermediate task node needs to obtain may be stored on the intermediate task node itself. That is, the intermediate task node is a node that stores one or more shards of the task data. In this case, the intermediate task node can directly obtain the shards it stores.

[0151] For example, referring to Figure 6, the shard identifiers in the task indication information received by intermediate task node 1 are C1 to C6. The identifiers of the data nodes corresponding to these 6 shard identifiers are D1 to D6 in sequence, and D1 is intermediate task node 1 itself. Then, intermediate task node 1 can obtain shard C1 from its own storage, obtain shard C2 from D2, obtain shard C3 from D3, and so on.
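The retrieval logic of Figures 5 and 6 can be sketched as below. The function `gather_task_data` and the `fetch_remote` callback are hypothetical; a real node would issue a network or device read instead of the in-memory lookup used here for illustration.

```python
from typing import Callable, Dict, List, Tuple


def gather_task_data(
    self_id: str,
    shard_ids: List[str],
    shard_to_node: Dict[str, str],
    local_store: Dict[str, bytes],
    fetch_remote: Callable[[str, str], bytes],
) -> Dict[str, bytes]:
    """Collect every shard of the task data, reading locally when this node stores the shard."""
    task_data = {}
    for shard_id in shard_ids:
        owner = shard_to_node[shard_id]
        if owner == self_id:
            task_data[shard_id] = local_store[shard_id]
        else:
            task_data[shard_id] = fetch_remote(owner, shard_id)
    return task_data


# Toy cluster: D1 is the intermediate task node itself and already stores C1.
cluster = {"D1": {"C1": b"a"}, "D2": {"C2": b"b"}, "D3": {"C3": b"c"}}
remote_calls: List[Tuple[str, str]] = []


def fetch_remote(node_id: str, shard_id: str) -> bytes:
    remote_calls.append((node_id, shard_id))
    return cluster[node_id][shard_id]


task = gather_task_data(
    "D1", ["C1", "C2", "C3"], {"C1": "D1", "C2": "D2", "C3": "D3"}, cluster["D1"], fetch_remote
)
```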

[0152] Optionally, in the embodiments of this application, in the data writing scenario, after each intermediate task node obtains its own task data, it can save the task data and return a second write response to the host, thereby notifying the host that the obtained task data has been saved.

[0153] As described in S302 above, in a data write scenario, after multiple data nodes have saved m data fragments, they will return a first write response to the host. Once the host receives the first write responses from each data node and the second write responses from each intermediate task node, it can determine that the writing of the m data fragments is complete. That is, the m data fragments have been saved to the data nodes and intermediate task nodes respectively. In this case, the host can release the resources used to process the write instructions for the m data fragments.

[0154] It should be noted that after receiving the first write response from each data node and the second write response from each intermediate task node, the host can determine that the m data fragments exist not only in multiple data nodes but also in multiple intermediate task nodes. Since the multiple intermediate task nodes and the multiple data nodes are different nodes, the m data fragments held by the intermediate task nodes form a backup of the m data fragments held by the data nodes. Thus, if some data fragments are damaged during the encoding process, they can be recovered from the backup copies, which improves data reliability during encoding. Furthermore, in this embodiment, transmitting a portion of the data fragments to each intermediate task node takes less time than transmitting all data fragments to a single node to form the backup, so each intermediate task node returns its second write response sooner. As a result, the time from when the host starts processing the write instruction for the m data fragments to when it receives the write responses is shorter, and the resources used to process the write instruction are occupied for a correspondingly shorter time. It can be seen that the embodiments of this application improve data reliability during EC encoding while also shortening the foreground IO time and reducing the occupation of host resources.
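The host-side completion condition described above can be sketched as a small tracker. The class `WriteTracker` and the labels "first" and "second" are hypothetical; the sketch only shows that the write is treated as complete once every data node and every intermediate task node has responded.

```python
from typing import Iterable


class WriteTracker:
    """Tracks outstanding first (data node) and second (intermediate node) write responses."""

    def __init__(self, data_nodes: Iterable[str], intermediate_nodes: Iterable[str]):
        self._pending = {("first", node) for node in data_nodes}
        self._pending |= {("second", node) for node in intermediate_nodes}

    def on_response(self, kind: str, node: str) -> bool:
        """Record one response; return True once the host may release its write resources."""
        self._pending.discard((kind, node))
        return self.done

    @property
    def done(self) -> bool:
        return not self._pending


tracker = WriteTracker(data_nodes=["D1", "D2"], intermediate_nodes=["CSD1"])
states = [
    tracker.on_response("first", "D1"),
    tracker.on_response("second", "CSD1"),
    tracker.on_response("first", "D2"),
]
```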

[0155] S304: Each intermediate task node determines n intermediate results that correspond one-to-one with the n target fragments based on its own task data.

[0156] After obtaining its own task data, each intermediate task node can determine n intermediate results that correspond one-to-one with the n target fragments based on the task data it has obtained.

[0157] In the EC encoding scenario, each intermediate task node receives an EC encoding request. After obtaining its own task data, the intermediate task node can use this EC encoding request to obtain the EC encoding matrix, and then generate n intermediate results based on the EC encoding matrix and its own task data. In the write data scenario, n equals k, and the n target fragments to be calculated are the k check fragments obtained through EC encoding.

[0158] It should be noted that, based on the EC ratio m:k configured in the storage system, each intermediate task node can be configured with an EC encoding matrix. This EC encoding matrix consists of (m+k) row vectors, each containing m encoded elements. Specifically, the first m row vectors of this EC encoding matrix correspond one-to-one with the m data shards, and the last k row vectors correspond one-to-one with the k parity shards, each containing m encoded elements. Based on this, the intermediate task node obtains the EC encoding matrix, and then, based on the row vectors in the EC encoding matrix corresponding to each parity shard and each data shard contained in its own task data, calculates k intermediate results.

[0159] For example, assuming the EC ratio is 24:4, there are a total of 24 data fragments to be written and 4 parity fragments to be calculated. Each parity fragment can be calculated using the following formula 1.

$$P_i = \sum_{j=1}^{24} S_{ij}\, C_j \qquad (1)$$

[0160] Where i takes values from 1 to 4, $P_i$ is the i-th parity fragment, $S_{ij}$ is the j-th encoded element in the row vector corresponding to the i-th parity fragment in the EC encoding matrix, and $C_j$ is the j-th data fragment.

[0161] As can be seen from Formula 1 above, each check segment can be obtained by multiplying the row vector corresponding to that check segment in the EC coding matrix by the first column vector. This first column vector consists of m data segments arranged sequentially, with the j-th data segment located in the j-th row of the first column vector. Therefore, the j-th encoded element in the row vector corresponding to the check segment will be multiplied by the j-th data segment; that is, the j-th encoded element in the row vector is the encoded element corresponding to the j-th data segment.

[0162] Based on this, when there are h intermediate task nodes, the m data fragments are divided into h task data and sent to the h intermediate task nodes. For each verification fragment, each intermediate task node calculates the product of each data fragment it obtains and the corresponding encoded element in the row vector of the verification fragment, and then sums them to obtain the intermediate result corresponding to the verification fragment.

[0163] For example, referring to Figure 7, with 4 intermediate task nodes and 24 data shards, intermediate task node 1 obtains task data including data shards C1 to C6. Then, intermediate task node 1 can calculate the intermediate results corresponding to each verification shard, namely $P_{11}$, $P_{12}$, $P_{13}$ and $P_{14}$, using the following formula 2. Intermediate task node 2 obtains task data including data shards C7 to C12 and calculates the intermediate results $P_{21}$, $P_{22}$, $P_{23}$ and $P_{24}$ using the following formula 3. Intermediate task node 3 obtains task data including data shards C13 to C18 and calculates the intermediate results $P_{31}$, $P_{32}$, $P_{33}$ and $P_{34}$ using the following formula 4. Intermediate task node 4 obtains task data including data shards C19 to C24 and calculates the intermediate results $P_{41}$, $P_{42}$, $P_{43}$ and $P_{44}$ using the following formula 5.

$$P_{1i} = \sum_{j=1}^{6} S_{ij}\, C_j \qquad (2)$$

$$P_{2i} = \sum_{j=7}^{12} S_{ij}\, C_j \qquad (3)$$

$$P_{3i} = \sum_{j=13}^{18} S_{ij}\, C_j \qquad (4)$$

$$P_{4i} = \sum_{j=19}^{24} S_{ij}\, C_j \qquad (5)$$

[0164] Where i takes values from 1 to 4, and $P_{1i}$, $P_{2i}$, $P_{3i}$ and $P_{4i}$ are the intermediate results corresponding to the i-th verification fragment calculated by intermediate task nodes 1, 2, 3 and 4, respectively.

[0165] It is worth noting that in the embodiments of this application, the various addition and multiplication operations involved in the EC encoding and EC decoding processes, such as the multiplication and addition operations involved in the various formulas introduced in the embodiments of this application, are all operations performed on the Galois field.
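The split of the encoding computation across intermediate task nodes can be checked numerically. The sketch below makes several assumptions that the application does not state: the field is GF(2^8) with reduction polynomial 0x11D, so addition is bytewise XOR; the encoding coefficients are random nonzero bytes rather than a real EC encoding matrix (only linearity is being illustrated, not recoverability); and all function and variable names are hypothetical.

```python
import random
from functools import reduce
from typing import List, Sequence


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Galois-field addition of two equally sized fragments."""
    return bytes(p ^ q for p, q in zip(x, y))


def weighted_sum(coeffs: Sequence[int], shards: Sequence[bytes], size: int) -> bytes:
    """Sum of coeff * shard over the field, applied byte by byte."""
    terms = (bytes(gf_mul(c, byte) for byte in s) for c, s in zip(coeffs, shards))
    return reduce(xor_bytes, terms, bytes(size))


M, K, H, SIZE = 24, 4, 4, 16  # data fragments, parity fragments, intermediate nodes, bytes
rng = random.Random(0)
data = [bytes(rng.randrange(256) for _ in range(SIZE)) for _ in range(M)]
S = [[rng.randrange(1, 256) for _ in range(M)] for _ in range(K)]  # stand-in parity rows

# Single-node computation: each parity fragment from all 24 data fragments.
direct = [weighted_sum(S[i], data, SIZE) for i in range(K)]

# Distributed computation: node t handles 6 consecutive fragments and yields K partial results.
per_node = M // H
partials: List[List[bytes]] = []
for node in range(H):
    lo, hi = node * per_node, (node + 1) * per_node
    partials.append([weighted_sum(S[i][lo:hi], data[lo:hi], SIZE) for i in range(K)])

# Aggregation: the i-th parity fragment is the field sum of the i-th partial of every node.
aggregated = [reduce(xor_bytes, (partials[node][i] for node in range(H))) for i in range(K)]
assert aggregated == direct
```

Because field addition is associative and commutative, the order in which partial results arrive at an aggregation task node does not affect the final fragment.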

[0166] In the EC decoding scenario, the intermediate task requests received by each intermediate task node are EC decoding requests. In this case, after obtaining its own task data, the intermediate task node can obtain the EC decoding matrix based on the EC decoding request and the target fragment to be recovered, and then generate n intermediate results based on the EC decoding matrix and its own task data. Among them, the n target fragments to be recovered are n data fragments.

[0167] As described above, each intermediate task node can be configured with an EC encoding matrix. Based on this, each intermediate task node can generate an EC decoding matrix according to the n target fragments to be recovered and the EC encoding matrix. The EC decoding matrix includes n row vectors. Each row vector corresponds to a target fragment to be recovered, and each row vector includes m encoded elements. Then, for any one of the n target fragments to be recovered, the intermediate task node can calculate the intermediate result corresponding to that target fragment based on the row vector corresponding to that target fragment in the EC decoding matrix and the fragments in its own obtained task data.

[0168] It should be noted that in the degraded read scenario, the n target fragments to be recovered are data fragments; therefore, each row vector in the EC decoding matrix corresponds to a data fragment. However, in the data reconstruction scenario, the n target fragments to be recovered may include data fragments and/or parity fragments. Based on this, if all n target fragments to be recovered are data fragments, then each row vector in the EC decoding matrix corresponds to a data fragment to be recovered. If all n target fragments to be recovered are parity fragments, then each row vector in the EC decoding matrix corresponds to a parity fragment; in this case, the EC decoding matrix consists of row vectors taken from the last k row vectors of the EC encoding matrix. If the n target fragments to be recovered include both data fragments and parity fragments, then some row vectors in the EC decoding matrix correspond to data fragments, and some row vectors correspond to parity fragments.

[0169] For example, assuming an EC ratio of 24:4, the target data shards to be recovered are the 1st to 4th data shards out of the 24 data shards, i.e., C1 to C4. Based on this, C1 to C4 can be calculated using the following formula 6.

$$C_\alpha = \sum_{j=1}^{4} D_{\alpha j}\, P_j + \sum_{j=5}^{24} D_{\alpha j}\, C_j \qquad (6)$$

[0170] Where α takes values from 1 to 4, $C_\alpha$ is the α-th of the 24 data fragments and is also a target fragment to be recovered, $D_{\alpha j}$ is the j-th encoded element in the row vector corresponding to the α-th data fragment in the EC decoding matrix, $P_j$ is the j-th verification fragment, and $C_j$ is the j-th of the 24 data shards.

[0171] As can be seen from Formula 6 above, each target fragment to be recovered can be obtained by multiplying the row vector corresponding to the target fragment in the EC decoding matrix by the second column vector. This second column vector consists of m fragments (including data fragments that have not experienced anomalies, or data fragments that have not experienced anomalies and check fragments) arranged in sequence, and the order of these m fragments is consistent with the order of the encoded elements corresponding to each fragment in the row vector.

[0172] Based on this, when there are h intermediate task nodes, the m fragments are divided into h parts of task data and sent to the h intermediate task nodes. For any target fragment to be recovered, each intermediate task node can calculate the sum of the products of the corresponding encoded elements in the row vectors of its own fragments and the target fragment to be recovered, thereby obtaining the intermediate result corresponding to the target fragment to be recovered.

[0173] For example, referring to Figure 8, assuming there are two intermediate task nodes, the 24 fragments, including 20 data fragments and 4 parity fragments, can be divided into two task data sets and sent to the two intermediate task nodes. For example, if intermediate task node 1 obtains task data including parity fragments P1 to P4 and data fragments C5 to C12, then intermediate task node 1 can calculate the intermediate results corresponding to each target fragment to be recovered, namely $C_{11}$, $C_{12}$, $C_{13}$ and $C_{14}$, using the following formula 7. Intermediate task node 2 obtains task data including data fragments C13 to C24 and calculates the intermediate results corresponding to each target fragment, namely $C_{21}$, $C_{22}$, $C_{23}$ and $C_{24}$, using the following formula 8.

$$C_{1\alpha} = \sum_{j=1}^{4} D_{\alpha j}\, P_j + \sum_{j=5}^{12} D_{\alpha j}\, C_j \qquad (7)$$

$$C_{2\alpha} = \sum_{j=13}^{24} D_{\alpha j}\, C_j \qquad (8)$$

[0174] Where $C_{1\alpha}$ is the intermediate result corresponding to the α-th data fragment calculated by intermediate task node 1, and $C_{2\alpha}$ is the intermediate result corresponding to the α-th data fragment calculated by intermediate task node 2.

[0175] It is worth noting that, as can be seen from Formulas 7 and 8 above, each intermediate task node only needs to use a portion of the encoded elements in each row vector of the EC decoding matrix when performing calculations. For example, in the example above, intermediate task node 1 uses the first 12 encoded elements in each row vector of the EC decoding matrix for calculations, while intermediate task node 2 uses the last 12 encoded elements in each row vector for calculations. Based on this, in this embodiment, after obtaining its own task data, each intermediate task node can also obtain its required decoding sub-matrix based on the EC decoding request, its own task data, and the target fragment to be recovered, and then generate n intermediate results based on the decoding sub-matrix and its own task data. The decoding sub-matrix of each intermediate task node includes a portion of the columns in the EC decoding matrix corresponding to the task data of that intermediate task node.

[0176] For example, continuing with the example in Figure 8, the decoding sub-matrix determined by intermediate task node 1 based on its acquired task data includes the first 12 columns of the EC decoding matrix. The decoding sub-matrix determined by intermediate task node 2 based on its acquired task data includes the last 12 columns of the EC decoding matrix.
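The use of decoding sub-matrices can be illustrated on a scaled-down stripe. The sketch below is not the configuration of Figure 8: it assumes a 4:2 EC ratio, GF(2^8) with reduction polynomial 0x11D, Cauchy-style parity rows, and a decoding matrix obtained by inverting the rows of the encoding matrix that correspond to the surviving fragments. These are common choices but are assumptions here, and all names are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search (adequate for a sketch)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def weighted_sum(coeffs: Sequence[int], shards: Sequence[bytes]) -> bytes:
    terms = (bytes(gf_mul(c, byte) for byte in s) for c, s in zip(coeffs, shards))
    return reduce(xor_bytes, terms, bytes(len(shards[0])))


def gf_invert(matrix: List[List[int]]) -> List[List[int]]:
    """Gauss-Jordan inversion over GF(2^8); subtraction is XOR."""
    n = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


M, K = 4, 2
data = [bytes([10 * j + b for b in range(4)]) for j in range(1, M + 1)]  # C1..C4
S = [[gf_inv(i ^ (K + j)) for j in range(M)] for i in range(K)]          # Cauchy-style parity rows
parity = [weighted_sum(S[i], data) for i in range(K)]                    # P1, P2

# C1 and C2 are lost; the surviving fragments, in column order, are P1, P2, C3, C4.
survivors = parity + data[2:]
survivor_rows = S + [[int(c == r) for c in range(M)] for r in (2, 3)]
D = gf_invert(survivor_rows)[:2]  # decoding matrix: one row per target fragment C1, C2

# Node 1 holds P1, P2 and uses columns 0-1; node 2 holds C3, C4 and uses columns 2-3.
node1 = [weighted_sum(D[a][:2], survivors[:2]) for a in range(2)]
node2 = [weighted_sum(D[a][2:], survivors[2:]) for a in range(2)]
recovered = [xor_bytes(node1[a], node2[a]) for a in range(2)]
assert recovered == data[:2]
```

Each node touches only the columns of `D` that match the fragments it holds, which is the point of the decoding sub-matrix.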

[0177] S305: Each aggregation task node responds to the aggregation task request from the target node and obtains multiple intermediate results corresponding to the specified target shard provided by multiple intermediate task nodes.

[0178] As described in S302 above, the aggregation task request issued by the target node to each aggregation task node can be used to indicate the target shard to be computed by the corresponding aggregation task node. For example, the aggregation task request may include the identifier of the target shard to be computed by the aggregation task node. Based on this, each aggregation task node can obtain the intermediate results corresponding to the target shard computed by multiple intermediate task nodes based on the identifier of the target shard carried in the aggregation task request.

[0179] The following section uses any aggregation task node as an example to introduce the process of obtaining the intermediate results corresponding to the target fragment to be calculated. For ease of explanation, this aggregation task node will be referred to as the first aggregation task node.

[0180] After receiving the first aggregation task request from the target node, the first aggregation task node can obtain the identifier of at least one target shard carried in the first aggregation task request. For each of the at least one target shard, for example, the first target shard, the first aggregation task node can obtain the intermediate results corresponding to the first target shard calculated by multiple intermediate task nodes based on the identifier of the first target shard.

[0181] In one possible implementation, the first aggregation task request carries the identifiers of each intermediate task node. Based on this, if the identifiers of each intermediate task node are different from the identifier of the first aggregation task node (i.e., the first aggregation task node is not an intermediate task node), the first aggregation task node can proactively send intermediate result retrieval requests to each intermediate task node based on their identifiers. These intermediate result retrieval requests carry the identifier of the first target fragment. Upon receiving the intermediate result retrieval request, each intermediate task node can return the intermediate result of its calculated first target fragment to the first aggregation task node based on the identifier of the first target fragment.

[0182] Optionally, if the identifier of an intermediate task node is the same as the identifier of the first aggregation task node (i.e., the first aggregation task node is an intermediate task node), the first aggregation task node can obtain the intermediate result corresponding to the first target shard it has calculated, and send an intermediate result retrieval request to other intermediate task nodes besides itself, requesting the other intermediate task nodes to return the intermediate result corresponding to the first target shard. Correspondingly, the first aggregation task node receives the intermediate result corresponding to the first target shard returned by the other intermediate task nodes.

[0183] In another possible implementation, the intermediate task requests sent from the target node to each intermediate task node may carry a mapping relationship between the identifiers of each aggregation task node and the identifiers of the corresponding target shards to be calculated. Based on this, after each intermediate task node calculates the intermediate results corresponding to each target shard through S304, it can determine whether the identifier of the first aggregation task node in the mapping relationship is its own identifier. If an intermediate task node determines that the identifier of the first aggregation task node is not its own identifier, it sends the intermediate result corresponding to the first target shard it calculated to the first aggregation task node based on the identifier of the first target shard corresponding to the identifier of the first aggregation task node. Correspondingly, the first aggregation task node can receive the intermediate result corresponding to the first target shard sent by the intermediate task node.

[0184] Optionally, if an intermediate task node determines that the identifier of the first aggregation task node is its own identifier, then the intermediate task node is the first aggregation task node. In this case, the intermediate task node can obtain the intermediate result of the first target fragment calculated by itself, and receive the intermediate result of the first target fragment sent by other intermediate task nodes as the first aggregation task node.
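The second implementation, in which intermediate task nodes push their results according to the mapping relationship, can be sketched as follows. The function `route_results` and the `send` callback are hypothetical names; `send` stands in for whatever transport the storage system uses.

```python
from typing import Callable, Dict, List, Tuple


def route_results(
    self_id: str,
    results: Dict[str, bytes],
    aggregation_map: Dict[str, List[str]],
    send: Callable[[str, str, bytes], None],
) -> Dict[str, bytes]:
    """Push each intermediate result to its aggregation node; keep those this node aggregates."""
    kept = {}
    for aggregator_id, target_ids in aggregation_map.items():
        for target_id in target_ids:
            if aggregator_id == self_id:
                kept[target_id] = results[target_id]  # this node is also the aggregation node
            else:
                send(aggregator_id, target_id, results[target_id])
    return kept


sent: List[Tuple[str, str, bytes]] = []
kept = route_results(
    self_id="CSD1",
    results={"P1": b"\x01", "P2": b"\x02"},
    aggregation_map={"CSD1": ["P1"], "CSD5": ["P2"]},
    send=lambda node, target, payload: sent.append((node, target, payload)),
)
```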

[0185] S306: Each aggregation task node determines the specified target shard based on multiple intermediate results obtained for the specified target shard.

[0186] After obtaining the multiple intermediate results corresponding to the target shard it is to calculate, each aggregation task node can sum these intermediate results to obtain the target shard.
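The summation in S306 can be illustrated as follows. The text does not state which arithmetic is used for the sum. The sketch assumes the common erasure-coding convention that shards are byte strings over GF(2^8), where addition is bytewise XOR. The function name `aggregate` and the byte values are hypothetical.

```python
from functools import reduce

def aggregate(intermediate_results):
    """Sum equal-length intermediate results bytewise in GF(2^8) (XOR)."""
    lengths = {len(r) for r in intermediate_results}
    if len(lengths) != 1:
        raise ValueError("intermediate results must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*intermediate_results))

# Four intermediate results for the same target shard (illustrative values).
p11, p21, p31, p41 = b"\x0f\x10", b"\xf0\x01", b"\x33\x02", b"\x55\x04"
p1 = aggregate([p11, p21, p31, p41])
```

Because XOR is associative and commutative, the aggregation task node can fold in intermediate results in whatever order they arrive.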

[0187] For example, in Figure 7 there are four aggregation task nodes, each used to calculate one target shard. Aggregation task node 1 calculates the first target shard, i.e., the parity shard P1; aggregation task node 2 calculates the parity shard P2; and so on. According to formulas 1 to 5, the parity shard P1 equals the sum of P_{11} calculated by intermediate task node 1, P_{21} calculated by intermediate task node 2, P_{31} calculated by intermediate task node 3, and P_{41} calculated by intermediate task node 4. Aggregation task node 1 therefore obtains the four intermediate results P_{11}, P_{21}, P_{31}, and P_{41} and sums them to obtain P1. Similarly, aggregation task node 2 obtains P_{12} calculated by intermediate task node 1, P_{22} calculated by intermediate task node 2, P_{32} calculated by intermediate task node 3, and P_{42} calculated by intermediate task node 4, and sums these four intermediate results to obtain P2, and so on.

[0188] For example, in Figure 8 there are two aggregation task nodes, each used to calculate one target shard. Aggregation task node 1 calculates the first target shard, i.e., the data shard C1 to be recovered, while aggregation task node 2 calculates the data shard C2. According to formulas 6 to 8, the data shard C1 equals the sum of C_{11} calculated by intermediate task node 1 and C_{21} calculated by intermediate task node 2. Aggregation task node 1 therefore obtains the two intermediate results C_{11} and C_{21} and sums them to obtain the data shard C1. Similarly, aggregation task node 2 obtains C_{12} calculated by intermediate task node 1 and C_{22} calculated by intermediate task node 2, and sums these two intermediate results to obtain the data shard C2.

[0189] It should be noted that after each aggregation task node calculates its own target shard, it can also persist the calculated target shard or return the target shard to the target node according to the aggregation task request.

[0190] For example, in the EC encoding scenario, as described above, the aggregation task request issued by the target node can carry a fragment write instruction. This fragment write instruction can include the identifier of the first verification node used for persistently storing the first target fragment. Based on this, taking the first aggregation task node as an example, after calculating the first target fragment, the first aggregation task node determines whether it is the first verification node based on the identifier of the first verification node in the received aggregation task request. If the first aggregation task node is indeed the first verification node used for persistently storing the first target fragment, then the first aggregation task node can persistently store the first target fragment within itself. If the first aggregation task node is not the first verification node, then the first aggregation task node can send the first target fragment to the first verification node, which will then persistently store the first target fragment.

[0191] In EC decoding scenarios, such as in a degraded read scenario, the aggregation task request issued by the target node can carry a degraded read instruction. Based on this, taking the first aggregation task node as an example, after calculating the first target fragment, the first aggregation task node returns a read response to the target node based on the degraded read instruction, and the read response carries the first target fragment.

[0192] Optionally, if the first data node originally used to store the first target fragment is damaged, the degraded read instruction issued by the target node may also carry the identifier of the second data node that has been reassigned to store the first target fragment. Based on this, the first aggregation task node may also determine whether it is the second data node based on the identifier of the second data node in the aggregation task request it receives. If the first aggregation task node is the second data node, it can persistently store the first target fragment in itself. If it is not the second data node, it can send the first target fragment to the second data node, which then persistently stores the first target fragment.

[0193] For example, in a data reconstruction scenario, the aggregation task request issued by the target node can carry a shard write instruction. This shard write instruction carries the identifier of a third data node. The third data node replaces the faulty fourth data node that originally stored the first target shard and stores the recovered data of the fourth data node. Based on this, taking the first aggregation task node as an example, after calculating the first target shard, the first aggregation task node can determine whether it is the third data node based on the identifier of the third data node in the shard write instruction. If the first aggregation task node is the third data node, it can persistently store the first target shard in itself. If the first aggregation task node is not the third data node, it can send the first target shard to the third data node for persistent storage.
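The post-aggregation handling described in [0189] to [0193] follows one pattern: optionally return the shard to the target node, then either persist it locally or forward it to the node designated for storage. A minimal sketch, assuming a hypothetical request layout (the keys `kind`, `target_node`, and `store_on`) and hypothetical `persist` and `send` callables, none of which are defined by the application:

```python
def finish_aggregation(self_id, shard, request, persist, send):
    """Decide what an aggregation task node does with its target shard.

    request keys (all hypothetical):
      kind        -- "write", "degraded_read" or "reconstruct"
      target_node -- id of the target node (used for degraded reads)
      store_on    -- id of the node that must persist the shard, or absent
    """
    actions = []
    if request["kind"] == "degraded_read":
        # Degraded read: the shard goes back to the target node.
        send(request["target_node"], shard)
        actions.append("read_response")
    store_on = request.get("store_on")
    if store_on is not None:
        if store_on == self_id:
            persist(shard)            # this node is the designated storage node
            actions.append("persisted")
        else:
            send(store_on, shard)     # forward to the designated storage node
            actions.append("forwarded")
    return actions


persisted, sent = [], []

def persist(shard):
    persisted.append(shard)

def send(node_id, shard):
    sent.append((node_id, shard))

a1 = finish_aggregation("CSD1", b"P1", {"kind": "write", "store_on": "CSD1"},
                        persist, send)
a2 = finish_aggregation("CSD3", b"C1",
                        {"kind": "degraded_read", "target_node": "host"},
                        persist, send)
a3 = finish_aggregation("CSD5", b"C2", {"kind": "reconstruct", "store_on": "CSD0"},
                        persist, send)
```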

[0194] In this embodiment, when a target node in the storage system receives an instruction to perform EC calculation on m shards, it can select multiple intermediate task nodes and at least one aggregation task node from the storage system. Based on this, the target node sends corresponding intermediate task requests to the multiple intermediate task nodes and an aggregation task request to at least one aggregation task node. Thus, each intermediate task node obtains a portion of the m shards to be EC calculated based on the received intermediate task request, and uses its obtained portion of the shards to determine n intermediate results corresponding one-to-one with the n target shards. Different intermediate task nodes obtain different shards from the m shards, resulting in different intermediate results determined by different intermediate task nodes for the same target shard. Based on this, any aggregation task node, based on the aggregation task request and the different intermediate results determined by the multiple intermediate task nodes for calculating the same target shard, can obtain the corresponding target shard. Therefore, in this embodiment, the target node can schedule multiple intermediate task nodes to jointly undertake the EC computation task of calculating the intermediate results of each target shard, and schedule aggregation task nodes to aggregate the intermediate results obtained by the intermediate task nodes. This solves the problem of high node I / O and computational load when a single node performs the EC computation task. Furthermore, multiple intermediate task nodes can execute their respective computation tasks in parallel, thus shortening the time required for EC computation.
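The requirement that different intermediate task nodes obtain different portions of the m shards can be illustrated with a simple partitioning helper. The text does not specify how the target node splits the shards. The contiguous, near-equal split used here and the name `partition_shards` are assumptions, chosen so that the 6:2 case reproduces the distribution used in the write data example later in the description.

```python
def partition_shards(shard_ids, node_ids):
    """Assign contiguous, disjoint groups of shard ids to intermediate task nodes."""
    base, extra = divmod(len(shard_ids), len(node_ids))
    plan, start = {}, 0
    for i, node in enumerate(node_ids):
        size = base + (1 if i < extra else 0)  # spread any remainder evenly
        plan[node] = shard_ids[start:start + size]
        start += size
    return plan


plan = partition_shards(["C1", "C2", "C3", "C4", "C5", "C6"], ["CSD1", "CSD2"])
# Each node handles 3 shards instead of the 6 a single node would handle.
```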

[0195] For example, the data processing method provided in this application embodiment can be applied to EC encoding scenarios. For instance, in a write data scenario, the data processing method provided in this application embodiment can enable multiple intermediate task nodes to execute EC encoding tasks in parallel, thereby improving EC encoding speed, shortening the time required for EC encoding, and thus reducing write latency and improving the write performance of the storage system.

[0196] For example, the data processing method provided in this application embodiment can also be applied to EC decoding scenarios. For instance, in a degraded read scenario, the data processing method provided in this application embodiment allows multiple intermediate task nodes to execute EC decoding tasks in parallel, thus improving EC decoding speed, shortening the time required for EC decoding, thereby reducing read latency and improving the read performance of the storage system. As another example, in a data reconstruction scenario, the data processing method provided in this application embodiment allows multiple intermediate task nodes to execute EC decoding tasks in parallel, thus improving EC decoding speed, shortening the time required for EC decoding, thereby shortening data reconstruction latency.

[0197] Based on the data processing method provided in the above embodiments, this application provides three examples of applying the data processing method in a write data scenario, a degraded read scenario, and a data reconstruction scenario, respectively. In the following three examples, the target node is the server's processor. The EC ratio configured in the storage system is 6:2.

[0198] Example 1: Write Data Scenario

[0199] Referring to Figure 9, the six data shards to be written are C1 to C6, and the target shards to be calculated are two parity shards, P1 and P2. The two intermediate task nodes are CSD1 and CSD2 in the storage system, used to store the parity shards. CSD1 and CSD2 also serve as two aggregation task nodes; that is, CSD1 and CSD2 are both intermediate task nodes and aggregation task nodes. Based on this, the target node can distribute three data shards to each intermediate task node. For example, it can distribute C1 to C3 to CSD1 and C4 to C6 to CSD2.

[0200] Based on C1 to C3, CSD1 calculates the intermediate results corresponding to P1 and P2, namely P_{11} and P_{12}. Based on C4 to C6, CSD2 calculates the intermediate results corresponding to P1 and P2, namely P_{21} and P_{22}.

[0201] CSD1 acts as an aggregation task node used to compute P1. CSD1 therefore obtains the intermediate result P_{11} that it computed for P1, receives P_{21} sent by CSD2, and calculates the sum of P_{11} and P_{21} to obtain P1.

[0202] CSD2 acts as an aggregation task node used to compute P2. CSD2 therefore obtains the intermediate result P_{22} that it computed for P2, receives P_{12} sent by CSD1, and calculates the sum of P_{12} and P_{22} to obtain P2.

[0203] Since CSD1 and CSD2 are the parity nodes used to store the parity shards in the storage system, CSD1 writes P1 to its own hard disk, and CSD2 writes P2 to its own hard disk.
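Example 1 can be reproduced numerically with a short sketch. The application's encoding coefficients are given by its own formulas and are not repeated here. This sketch assumes arithmetic in GF(2^8) with the reduction polynomial 0x11D and two illustrative coefficient rows (`ROW_P1`, `ROW_P2`). The helper names and the shard contents are likewise hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the reduction polynomial 0x11D."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def partial(coeffs, shards):
    """Intermediate result: sum over the given shards of coeff * shard."""
    out = bytearray(len(shards[0]))
    for c, shard in zip(coeffs, shards):
        for i, byte in enumerate(shard):
            out[i] ^= gf_mul(c, byte)
    return bytes(out)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Assumed encoding rows: P1 uses all ones, P2 uses powers of 2.
ROW_P1 = [1, 1, 1, 1, 1, 1]
ROW_P2 = [1, 2, 4, 8, 16, 32]
data = [bytes([i, i + 10, i + 20]) for i in range(1, 7)]  # C1..C6

# CSD1 receives C1..C3, CSD2 receives C4..C6.
p11, p12 = partial(ROW_P1[:3], data[:3]), partial(ROW_P2[:3], data[:3])
p21, p22 = partial(ROW_P1[3:], data[3:]), partial(ROW_P2[3:], data[3:])

p1 = xor(p11, p21)  # aggregated on CSD1, then written to its disk
p2 = xor(p12, p22)  # aggregated on CSD2, then written to its disk

# Same parity as a single node that processes all six shards.
assert p1 == partial(ROW_P1, data)
assert p2 == partial(ROW_P2, data)
```

The final assertions hold because encoding is linear: the parity over six shards is the sum of the partial parities over any disjoint split of those shards.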

[0204] In this embodiment, multiple parity nodes in the storage system serve as intermediate task nodes to undertake the EC encoding task, and the same nodes also serve as aggregation task nodes to aggregate the intermediate results. Thus, each aggregation task node already holds a portion of the intermediate results, reducing the amount of intermediate results that need to be obtained from other intermediate task nodes and lowering data transmission overhead. Furthermore, because parity nodes are selected as aggregation task nodes, the aggregation task nodes can write the parity shards directly to disk after calculating them, without further forwarding, which further reduces data transmission overhead.

[0205] Example 2: Degraded Read Scenario

[0206] Referring to Figure 10, the target shards to be calculated are two data shards C1 and C2 that have experienced anomalies. The six shards used to calculate these two target shards include data shards C3 to C6 and parity shards P1 and P2. The intermediate task nodes are CSD3 (for storing data shard C3), CSD5 (for storing data shard C5), and CSD7 (for storing parity shard P1) in the storage system. The aggregation task nodes are CSD3 and CSD5. That is, CSD3 and CSD5 are both intermediate task nodes and aggregation task nodes. Additionally, data shard C4 is stored in CSD4, data shard C6 is stored in CSD6, and parity shard P2 is stored in CSD8.

[0207] Based on this, CSD3 obtains C4 from CSD4 and, based on its own stored C3 and the obtained C4, calculates the intermediate results corresponding to C1 and C2, namely C_{11} and C_{12}.

[0208] CSD5 retrieves C6 from CSD6. Based on its own stored C5 and the retrieved C6, it calculates the intermediate results corresponding to C1 and C2, namely C_{21} and C_{22}.

[0209] CSD7 retrieves P2 from CSD8. Based on its own stored P1 and the retrieved P2, it calculates the intermediate results corresponding to C1 and C2, namely C_{31} and C_{32}.

[0210] CSD3 acts as an aggregation task node used to compute C1. CSD3 therefore obtains the intermediate result C_{11} that it calculated for C1, receives C_{21} sent by CSD5 and C_{31} sent by CSD7, and sums C_{11}, C_{21}, and C_{31} to obtain C1. CSD3 then returns C1 to the target node.

[0211] CSD5 acts as an aggregation task node used to compute C2. CSD5 therefore obtains the intermediate result C_{22} that it calculated for C2, receives C_{12} sent by CSD3 and C_{32} sent by CSD7, and sums C_{12}, C_{22}, and C_{32} to obtain C2. CSD5 then returns C2 to the target node.

[0212] In this embodiment, multiple nodes in the storage system that store fragments which have not experienced anomalies serve as intermediate task nodes to undertake the EC decoding task. Since each intermediate task node already stores a portion of the intact fragments used to calculate the target fragments to be recovered, the number of fragments that each intermediate task node needs to read from other nodes is reduced, thus lowering data transmission overhead. Furthermore, some of these intermediate task nodes also act as aggregation task nodes to aggregate intermediate results. In this way, each aggregation task node holds a portion of the intermediate results, which reduces the amount of intermediate results that need to be obtained from other intermediate task nodes and further lowers data transmission overhead.
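The transfer pattern of Example 2 can be tallied to show where the load goes. The list of transfers is taken from the example. The baseline in which the target node reads all six surviving shards itself, the label `target`, and the simplification that every shard and every intermediate result counts as one unit are assumptions made for the comparison.

```python
from collections import Counter

# (sender, receiver, payload) for the distributed scheme in Example 2.
distributed = [
    ("CSD4", "CSD3", "C4"), ("CSD6", "CSD5", "C6"), ("CSD8", "CSD7", "P2"),
    ("CSD5", "CSD3", "C21"), ("CSD7", "CSD3", "C31"),
    ("CSD3", "CSD5", "C12"), ("CSD7", "CSD5", "C32"),
    ("CSD3", "target", "C1"), ("CSD5", "target", "C2"),
]

# Assumed baseline: the target node fetches all six surviving shards itself.
centralized = [(f"CSD{i}", "target", s) for i, s in
               zip(range(3, 9), ["C3", "C4", "C5", "C6", "P1", "P2"])]

def inbound(transfers):
    """Count how many payloads each node receives."""
    return Counter(receiver for _, receiver, _ in transfers)

worst_distributed = max(inbound(distributed).values())
worst_centralized = max(inbound(centralized).values())
```

Under these assumptions, the busiest receiver handles three payloads in the distributed scheme and six in the baseline, although the total number of transfers is higher in the distributed scheme.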

[0213] Example 3: Data Reconstruction Scenario

[0214] Referring to Figure 11, the target shards to be calculated are two data shards, C1 and C2, belonging to the same logical stripe and stored in the two faulty data nodes CSD1 and CSD2. The six shards used to calculate these two target shards include data shards C3 to C6 and parity shards P1 and P2. The intermediate task nodes are CSD0, used to replace CSD1, and CSD9, used to replace CSD2, in the storage system. The aggregation task nodes are CSD0 and CSD9. That is, CSD0 and CSD9 are both intermediate task nodes and aggregation task nodes. Furthermore, data shards C3 to C6 are stored sequentially in CSD3 to CSD6, and parity shards P1 and P2 are stored sequentially in CSD7 and CSD8.

[0215] CSD0 obtains P1, P2, and C3 from CSD7, CSD8, and CSD3 respectively, and based on P1, P2, and C3 calculates the intermediate results corresponding to C1 and C2, namely C_{11} and C_{12}.

[0216] CSD9 obtains C4, C5, and C6 from CSD4, CSD5, and CSD6 respectively. Based on C4, C5, and C6, it calculates the intermediate results corresponding to C1 and C2, namely C_{21} and C_{22}.

[0217] CSD0 acts as an aggregation task node used to compute C1. CSD0 therefore obtains the intermediate result C_{11} that it calculated for C1, receives C_{21} sent by CSD9, and calculates the sum of C_{11} and C_{21} to obtain C1.

[0218] CSD9 acts as an aggregation task node used to compute C2. CSD9 therefore obtains the intermediate result C_{22} that it calculated for C2, receives C_{12} sent by CSD0, and calculates the sum of C_{12} and C_{22} to obtain C2.

[0219] Since CSD0 is used to replace CSD1 and CSD9 is used to replace CSD2, CSD0 will write C1 to disk and CSD9 will write C2 to disk.

[0220] In this embodiment, multiple nodes in the storage system used to replace the faulty data nodes containing the abnormal shards serve as aggregation task nodes to aggregate intermediate results. This allows the recovered shards obtained after aggregation by the aggregation task nodes to be directly written to disk without further shard forwarding, reducing data transmission overhead. Furthermore, since these aggregation task nodes are also intermediate task nodes, each node holds a portion of the intermediate results. This reduces the amount of intermediate results that need to be obtained from other intermediate task nodes, further lowering data transmission overhead.
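Example 3 can also be worked through numerically. As before, the coefficients are not those of the application's formulas. The sketch assumes GF(2^8) arithmetic with polynomial 0x11D, the illustrative parity rows `A` and `B`, and decoding weights obtained by solving the 2x2 system for C1 and C2 with Cramer's rule. All helper names and shard contents are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the reduction polynomial 0x11D."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse by exhaustive search (fine for a sketch)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def combine(coeffs, shards):
    out = bytearray(len(shards[0]))
    for c, shard in zip(coeffs, shards):
        for i, byte in enumerate(shard):
            out[i] ^= gf_mul(c, byte)
    return bytes(out)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

A = [1, 1, 1, 1, 1, 1]      # assumed row for P1
B = [1, 2, 4, 8, 16, 32]    # assumed row for P2
data = [bytes([7 * i, 3 + i]) for i in range(1, 7)]  # C1..C6 (ground truth)
P1, P2 = combine(A, data), combine(B, data)

# Inverse determinant of the 2x2 system in the unknowns C1 and C2.
d = gf_inv(gf_mul(A[0], B[1]) ^ gf_mul(A[1], B[0]))

def w_c1(a, b):  # weight of a survivor with row entries (a, b) in C1
    return gf_mul(d, gf_mul(B[1], a) ^ gf_mul(A[1], b))

def w_c2(a, b):  # weight of the same survivor in C2
    return gf_mul(d, gf_mul(B[0], a) ^ gf_mul(A[0], b))

# Survivors as ((row entry in P1, row entry in P2), shard); P1 and P2
# themselves enter with entries (1, 0) and (0, 1).
csd0 = [((1, 0), P1), ((0, 1), P2), ((A[2], B[2]), data[2])]
csd9 = [((A[i], B[i]), data[i]) for i in (3, 4, 5)]

def intermediate(local, weight):
    return combine([weight(a, b) for (a, b), _ in local],
                   [s for _, s in local])

c11, c12 = intermediate(csd0, w_c1), intermediate(csd0, w_c2)
c21, c22 = intermediate(csd9, w_c1), intermediate(csd9, w_c2)

c1 = xor(c11, c21)  # aggregated on CSD0, which replaces CSD1
c2 = xor(c12, c22)  # aggregated on CSD9, which replaces CSD2
assert (c1, c2) == (data[0], data[1])
```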

[0221] The data processing apparatus provided in the embodiments of this application will be described next.

[0222] Figure 12 is a schematic diagram of a data processing device provided in an embodiment of this application. This data processing device can be deployed in the target node described above. As shown in Figure 12, the data processing device 1200 includes: a node selection module 1201 and a request sending module 1202.

[0223] The node selection module 1201 is used to execute S301 in the aforementioned embodiment; the request sending module 1202 is used to execute S302 in the aforementioned embodiment.

[0224] Optionally, the first instruction is a write instruction, the m fragments are m data fragments to be EC encoded, and the n target fragments are n check fragments.

[0225] Optionally, the intermediate task request includes task data, or the intermediate task request includes the identifiers of each fragment in the task data.

[0226] Optionally, the data processing apparatus 1200 is further configured to: send corresponding write requests to multiple data nodes in the storage system, wherein the write request includes at least one data fragment to be written to the corresponding data node from among m data fragments, and the data node and the intermediate task node are different nodes; and determine that the writing of the m data fragments is completed in response to a first write response from each data node and a second write response from each intermediate task node, wherein the first write response is used to indicate that the corresponding data node has saved the received data fragments and the second write response is used to indicate that the corresponding intermediate task node has saved the task data.
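The completion condition in [0226] can be sketched as a small tracker on the target node. The class name `WriteTracker`, its method names, and the node identifiers are hypothetical. The only behavior taken from the text is that the write is complete once every data node has returned a first write response and every intermediate task node has returned a second write response.

```python
class WriteTracker:
    """Track outstanding write responses for one write of m data fragments."""

    def __init__(self, data_nodes, intermediate_nodes):
        self.pending_first = set(data_nodes)            # awaiting first write responses
        self.pending_second = set(intermediate_nodes)   # awaiting second write responses

    def on_first_write_response(self, node_id):
        self.pending_first.discard(node_id)

    def on_second_write_response(self, node_id):
        self.pending_second.discard(node_id)

    def write_complete(self):
        return not self.pending_first and not self.pending_second


tracker = WriteTracker(["D1", "D2"], ["CSD1", "CSD2"])
tracker.on_first_write_response("D1")
tracker.on_first_write_response("D2")
tracker.on_second_write_response("CSD1")
before = tracker.write_complete()   # CSD2 has not yet confirmed the task data
tracker.on_second_write_response("CSD2")
after = tracker.write_complete()
```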

[0227] Optionally, the first instruction is a degraded read instruction or a data reconstruction instruction, the m fragments are fragments to be EC decoded, and the n target fragments are data fragments to be recovered.

[0228] Optionally, the intermediate task request includes the identifiers of each fragment in the task data.

[0229] Optionally, if the first instruction is a degraded read instruction, and the aggregation task request includes the degraded read instruction, the data processing device 1200 is further configured to: receive a read response to the degraded read instruction sent by at least one aggregation task node, the read response including the target fragment calculated by the corresponding aggregation task node.

[0230] Optionally, each of the at least one aggregation task node is also used to persistently store the target shards it has computed.

[0231] Optionally, the at least one aggregation task node includes some or all of the plurality of intermediate task nodes.

[0232] In this embodiment, when a target node in the storage system receives an instruction to perform EC calculations on m shards, it can select multiple intermediate task nodes and at least one aggregation task node from the storage system. Based on this, the target node sends corresponding intermediate task requests to the multiple intermediate task nodes and an aggregation task request to the at least one aggregation task node. This schedules multiple intermediate task nodes to jointly undertake the EC calculation task of calculating the intermediate results for each target shard, and schedules the aggregation task node to aggregate the intermediate results obtained by the intermediate task nodes. This solves the problems of high node I / O load and high computational load when a single node performs EC calculations due to the need to obtain all m shards. Furthermore, multiple intermediate task nodes can execute their respective calculation tasks in parallel, thus shortening the time required for EC calculations.

[0233] Figure 13 is a schematic diagram of another data processing device provided in an embodiment of this application. This data processing device can be deployed in the aforementioned intermediate task node. As shown in Figure 13, the data processing device 1300 includes: a task data acquisition module 1301, an EC calculation module 1302, and an intermediate result feedback module 1303.

[0234] The task data acquisition module 1301 is used to execute S303 in the aforementioned embodiment; the EC calculation module 1302 is used to execute S304 in the aforementioned embodiment; and the intermediate result feedback module 1303 is used to provide at least one intermediate result to each aggregation task node.

[0235] Optionally, the m fragments are m data fragments to be EC encoded, and the n target fragments are n check fragments.

[0236] Optionally, the storage system further includes a target node and multiple data nodes. Each data node is used to respond to a write request from the target node, persistently store at least one of the m data fragments, and return a first write response to the target node, which indicates that at least one data fragment has been saved. The data nodes and intermediate task nodes are different nodes. The data processing device is also used to send a second write response to the target node, which indicates that the task data has been saved.

[0237] Optionally, m fragments are fragments to be EC decoded, and n target fragments are data fragments to be recovered.

[0238] Optionally, the intermediate task request includes the identifier of each shard in the task data; the task data acquisition module 1301 is specifically used to: based on the identifier of each shard in the intermediate task request, obtain each shard from the node in the storage system that stores it, thereby obtaining the task data.

[0239] Optionally, the intermediate task request includes the identifier of each aggregation task node and the identifier of the corresponding target shard. The intermediate result feedback module 1303 is specifically used to: send the first intermediate result among n intermediate results to the first aggregation task node based on the identifier of the first aggregation task node and the identifier of the corresponding first target shard. The first intermediate result is used to calculate the first target shard.

[0240] Optionally, the intermediate result feedback module 1303 is specifically used to: receive a data acquisition request sent by the first aggregation task node, the data acquisition request including the identifier of the first target fragment; based on the identifier of the first target fragment, send the first intermediate result among n intermediate results to the first aggregation task node, the first intermediate result being used to calculate the first target fragment.

[0241] Optionally, the aggregation task node is also used to persistently store the target fragments it computes.

[0242] Optionally, the at least one aggregation task node includes some or all of the plurality of intermediate task nodes.

[0243] In this embodiment, each intermediate task node obtains, based on the received intermediate task request, a portion of the m fragments on which EC calculation is to be performed, and uses that portion to determine n intermediate results corresponding one-to-one with the n target fragments. Different intermediate task nodes obtain different fragments from the m fragments, so different intermediate task nodes determine different intermediate results for the same target fragment. Since each intermediate task node only needs to obtain a portion of the fragments, the I/O load on each intermediate task node is small, and the required data transfer time is short. Furthermore, each intermediate task node only processes the portion of fragments it obtains, resulting in a small computational load. Based on this, multiple intermediate task nodes complete the EC computation task of calculating intermediate results in parallel, improving the EC computation speed and shortening the EC computation time.

[0244] Figure 14 is a schematic diagram of another data processing device provided in an embodiment of this application. This data processing device can be deployed in the aforementioned aggregation task node. As shown in Figure 14, the data processing device 1400 includes: an intermediate result acquisition module 1401 and an intermediate result aggregation module 1402.

[0245] The intermediate result acquisition module 1401 is used to execute S305 in the above embodiment; the intermediate result aggregation module 1402 is used to execute S306 in the above embodiment.

[0246] Optionally, the intermediate result acquisition module 1401 is specifically used to: receive the intermediate result corresponding to the first target fragment sent by each intermediate task node.

[0247] Optionally, the aggregation task request includes the identifiers of multiple intermediate task nodes and the identifier of the first target shard. The intermediate result acquisition module 1401 is further configured to: send a data acquisition request to multiple intermediate task nodes based on the identifiers of the multiple intermediate task nodes, wherein the data acquisition request includes the identifier of the first target shard.

[0248] Optionally, the m fragments are m data fragments to be EC encoded, and the first target fragment is a verification fragment; or, the m fragments are fragments to be EC decoded, and the first target fragment is a data fragment to be recovered; the data processing device 1400 is also used to persistently store the first target fragment.

[0249] Optionally, the m fragments are fragments to be EC decoded, and the aggregation task request also includes a degraded read instruction; the data processing device 1400 is further configured to: send a read response to the degraded read instruction to the target node, the read response including the first target fragment.

[0250] In this embodiment, any aggregation task node can obtain the corresponding target fragment based on the aggregation task request and different intermediate results determined by multiple intermediate task nodes for calculating the same target fragment. Since each of the multiple intermediate task nodes is used to calculate intermediate results based on a partial fragment, the time required for multiple intermediate task nodes to calculate intermediate results is relatively short. Based on this, the aggregation task node can simultaneously obtain multiple intermediate results from multiple intermediate task nodes for aggregation, thus shortening the EC calculation time.

[0251] It should be noted that the module division in the data processing apparatus provided in the above embodiments is illustrative and only represents one logical functional division. In actual implementation, other division methods may also be used. Furthermore, the functional modules in the various embodiments of this application can be integrated into a single processor, exist as separate physical entities, or be integrated into a single module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0252] If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a server or terminal) or processor to execute all or part of the steps of the methods in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0253] Furthermore, the data processing apparatus and data processing method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process can be found in the method embodiments, which will not be repeated here.

[0254] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital versatile discs (DVDs)), or semiconductor media (e.g., solid-state disks (SSDs)).

[0255] In the various embodiments of this application, unless otherwise specified or logically conflicting, the terminology and / or descriptions between different embodiments are consistent and can be referenced mutually. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships. In the embodiments of this application, "at least one" refers to one or more, and "more than one" refers to two or more. "And / or" describes the association relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone, where A and B can be singular or plural. In the textual description of the embodiments of this application, the character " / " generally indicates that the preceding and following related objects have an "or" relationship. In this application, "first," "second," and various numerical designations are only for ease of description and are not used to limit the scope of the embodiments of this application. For example, they are used to distinguish different messages, rather than to describe a specific order or sequence.

[0256] It is understood that the various numerical designations used in the embodiments of this application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of this application. The order of the process numbers does not imply the order of execution; the execution order of each process should be determined by its function and internal logic.

[0257] Finally, it should be noted that the above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A data processing method, characterized in that the method is applied to a target node in a storage system and comprises: in response to a first instruction, determining a plurality of intermediate task nodes and at least one aggregation task node from the storage system, the first instruction being used to instruct EC calculation based on m fragments to obtain n target fragments, wherein m is not less than 2 and n is not less than 1; and sending a corresponding intermediate task request to each of the plurality of intermediate task nodes, and sending a corresponding aggregation task request to each of the at least one aggregation task node; wherein the intermediate task request is used to request the corresponding intermediate task node to obtain task data and to determine n intermediate results based on the task data, the task data includes some of the m fragments, different intermediate task nodes obtain different task data, and the n intermediate results correspond one-to-one with the n target fragments; and the aggregation task request is used to request the corresponding aggregation task node to calculate a specified target fragment based on the multiple intermediate results provided by the plurality of intermediate task nodes.
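
As an illustrative sketch only, and not part of the claimed subject matter, the division of work in claim 1 can be modeled in Python under the assumption that the EC code is linear over GF(2^8), so that each target fragment is a coefficient-weighted sum of the m fragments. The coefficient matrix, the reducing polynomial 0x11D, the fragment values, and all function names are assumptions made for illustration; this application does not specify them.

```python
from functools import reduce


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by 0x11D (assumed polynomial)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length fragments in GF(2^8), i.e. bytewise XOR."""
    return bytes(p ^ q for p, q in zip(x, y))


def intermediate_results(task_data, coeffs):
    """Intermediate task node: one partial result per target fragment.

    task_data maps a fragment index to its bytes and holds only part of
    the m fragments; coeffs[j][i] weights fragment i in target fragment j.
    """
    size = len(next(iter(task_data.values())))
    results = []
    for row in coeffs:
        partial = bytes(size)
        for index, fragment in task_data.items():
            scaled = bytes(gf_mul(row[index], value) for value in fragment)
            partial = xor_bytes(partial, scaled)
        results.append(partial)
    return results


def aggregate(partials):
    """Aggregation task node: sum the partial results for one target fragment."""
    return reduce(xor_bytes, partials)


# m = 4 fragments, n = 2 target fragments (hypothetical values).
fragments = {0: b"\x01\x02", 1: b"\x03\x04", 2: b"\x05\x06", 3: b"\x07\x08"}
coeffs = [[1, 1, 1, 1], [1, 2, 4, 8]]

# Two intermediate task nodes, each holding a different half of the fragments.
node_a = intermediate_results({0: fragments[0], 1: fragments[1]}, coeffs)
node_b = intermediate_results({2: fragments[2], 3: fragments[3]}, coeffs)

# One aggregation per target fragment, fed by both intermediate task nodes.
targets = [aggregate([node_a[j], node_b[j]]) for j in range(len(coeffs))]

# Reference: a single node computing over all m fragments at once.
single_node = intermediate_results(fragments, coeffs)
```

Under the linearity assumption, `targets` equals `single_node`, while each intermediate task node performs the multiplications only for the fragments in its own task data.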

2. The method according to claim 1, characterized in that the first instruction is a write instruction, the m fragments are m data fragments to be EC encoded, and the n target fragments are n check fragments.

3. The method according to claim 2, characterized in that the intermediate task request includes the task data, or the intermediate task request includes an identifier of each fragment in the task data.

4. The method according to claim 2 or 3, characterized in that the method further comprises: sending corresponding write requests to multiple data nodes in the storage system, wherein each write request includes at least one of the m data fragments, to be written to the corresponding data node, and the data nodes are different from the intermediate task nodes; and in response to a first write response from each data node and a second write response from each intermediate task node, determining that writing of the m data fragments is complete, wherein the first write response indicates that the corresponding data node has saved the received data fragments, and the second write response indicates that the corresponding intermediate task node has saved the task data.
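
The completion rule in claim 4 can be sketched as a small acknowledgement tracker on the target node. The class name, method names, and string node identifiers are hypothetical, and the sketch omits timeouts and retries; it is an illustration under those assumptions rather than the implementation of this application.

```python
class WriteTracker:
    """Tracks the write responses a target node still waits for (hypothetical helper)."""

    def __init__(self, data_nodes, intermediate_nodes):
        # Claim 4 requires the data nodes and intermediate task nodes to differ.
        if set(data_nodes) & set(intermediate_nodes):
            raise ValueError("data nodes must differ from intermediate task nodes")
        self.pending_first = set(data_nodes)           # awaiting first write responses
        self.pending_second = set(intermediate_nodes)  # awaiting second write responses

    def on_first_write_response(self, data_node):
        """A data node reports that it has saved its data fragments."""
        self.pending_first.discard(data_node)

    def on_second_write_response(self, intermediate_node):
        """An intermediate task node reports that it has saved its task data."""
        self.pending_second.discard(intermediate_node)

    def write_complete(self):
        """True once every data node and every intermediate task node has responded."""
        return not self.pending_first and not self.pending_second


tracker = WriteTracker(["d1", "d2"], ["t1", "t2"])
tracker.on_first_write_response("d1")
tracker.on_first_write_response("d2")
tracker.on_second_write_response("t1")
before_last = tracker.write_complete()  # one second write response still missing
tracker.on_second_write_response("t2")
after_last = tracker.write_complete()
```

In this sketch, completion depends only on the two kinds of write response named in claim 4.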

5. The method according to claim 1, characterized in that the first instruction is a degraded read instruction or a data reconstruction instruction, the m fragments are fragments to be EC decoded, and the n target fragments are data fragments to be recovered.

6. The method according to claim 5, characterized in that the intermediate task request includes an identifier of each fragment in the task data.

7. The method according to claim 5 or 6, characterized in that, when the first instruction is a degraded read instruction and the aggregation task request includes the degraded read instruction, the method further comprises: receiving a read response to the degraded read instruction sent by the at least one aggregation task node, the read response including the target fragment calculated by the corresponding aggregation task node.

8. The method according to any one of claims 1 to 7, characterized in that each of the at least one aggregation task node is further used to persistently store the target fragment it has calculated.

9. The method according to any one of claims 1 to 8, characterized in that the at least one aggregation task node includes some or all of the plurality of intermediate task nodes.

10. A data processing method, characterized in that the method is applied to an intermediate task node in a storage system, the storage system includes multiple intermediate task nodes, a target node, and at least one aggregation task node, and the method comprises: in response to an intermediate task request from the target node, obtaining task data, wherein the task data includes some of m fragments on which EC calculation is to be performed, different intermediate task nodes obtain different task data, and m is not less than 2; determining n intermediate results based on the task data, wherein n is not less than 1; and providing each aggregation task node with at least one of the n intermediate results, wherein the at least one intermediate result corresponds to at least one target fragment, and each intermediate result is used to calculate the corresponding target fragment.

11. The method according to claim 10, characterized in that the m fragments are m data fragments to be EC encoded, and the n target fragments are n check fragments.

12. The method according to claim 11, characterized in that the storage system further includes multiple data nodes, each data node being used to respond to a write request from the target node by saving at least one of the m data fragments and returning a first write response to the target node, the first write response indicating that the at least one data fragment has been saved, and the data nodes and the intermediate task nodes being different nodes; and after the task data is obtained, the method further comprises: sending a second write response to the target node, the second write response indicating that the task data has been saved.

13. The method according to claim 10, characterized in that the m fragments are fragments to be EC decoded, and the n target fragments are data fragments to be recovered.

14. The method according to any one of claims 10 to 13, characterized in that the intermediate task request includes an identifier of each fragment in the task data, and the obtaining of task data comprises: based on the identifier of each fragment in the intermediate task request, retrieving each fragment from the node in the storage system that stores it, to obtain the task data.

15. The method according to any one of claims 10 to 14, characterized in that the intermediate task request includes an identifier of each aggregation task node and an identifier of the corresponding target fragment, and the providing of at least one of the n intermediate results to each aggregation task node comprises: based on the identifier of a first aggregation task node and the identifier of the corresponding first target fragment, sending a first intermediate result among the n intermediate results to the first aggregation task node, wherein the first intermediate result is used to calculate the first target fragment.

16. The method according to any one of claims 10 to 14, characterized in that the providing of at least one of the n intermediate results to each aggregation task node comprises: receiving a data acquisition request sent by a first aggregation task node, wherein the data acquisition request includes an identifier of a first target fragment; and based on the identifier of the first target fragment, sending a first intermediate result among the n intermediate results to the first aggregation task node, wherein the first intermediate result is used to calculate the first target fragment.
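
Claims 15 and 16 describe two ways for an intermediate task node to hand its results to an aggregation task node: sending them to the nodes named in the intermediate task request, or answering a data acquisition request. A minimal sketch of both follows. The class, the `send` callback, the identifiers such as `agg-1` and `P0`, and the assumption that each aggregation task node handles one target fragment are all hypothetical and are not an interface defined by this application.

```python
class IntermediateNode:
    """Holds n intermediate results keyed by target fragment identifier (hypothetical)."""

    def __init__(self, results_by_target):
        self.results_by_target = dict(results_by_target)

    def push_results(self, routing, send):
        """Claim 15 style: routing maps an aggregation node id to a target fragment id.

        send(node_id, target_id, result) stands in for the network transport.
        """
        for node_id, target_id in routing.items():
            send(node_id, target_id, self.results_by_target[target_id])

    def handle_data_acquisition(self, target_id):
        """Claim 16 style: return the result for the requested target fragment."""
        return self.results_by_target[target_id]


node = IntermediateNode({"P0": b"\x10", "P1": b"\x20"})

# Push mode: record what would be sent to each aggregation task node.
sent = []
node.push_results(
    {"agg-1": "P0", "agg-2": "P1"},
    lambda node_id, target_id, result: sent.append((node_id, target_id, result)),
)

# Pull mode: an aggregation task node asks for the result of target fragment P1.
pulled = node.handle_data_acquisition("P1")
```

In either mode, each aggregation task node receives only the intermediate result for the target fragment it is responsible for, not all n results.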

17. The method according to any one of claims 10 to 16, characterized in that the aggregation task node is further used to persistently store the target fragment it has calculated.

18. The method according to any one of claims 10 to 17, characterized in that the at least one aggregation task node includes some or all of the multiple intermediate task nodes.

19. A data processing method, characterized in that the method is applied to a first aggregation task node among at least one aggregation task node in a storage system, the storage system further includes multiple intermediate task nodes and a target node, and the method comprises: in response to an aggregation task request from the target node, obtaining multiple intermediate results corresponding to a first target fragment and provided by the multiple intermediate task nodes, wherein each intermediate result is obtained by the corresponding intermediate task node based on the task data it has obtained, the task data includes some of m fragments on which EC calculation is to be performed, the task data obtained by different intermediate task nodes includes different fragments, and m is not less than 2; and determining the first target fragment based on the multiple intermediate results corresponding to the first target fragment.

20. The method according to claim 19, characterized in that the obtaining of multiple intermediate results corresponding to the first target fragment and provided by the multiple intermediate task nodes comprises: receiving the intermediate result corresponding to the first target fragment sent by each intermediate task node.

21. The method according to claim 20, characterized in that the aggregation task request includes identifiers of the multiple intermediate task nodes and an identifier of the first target fragment, and before the receiving of the intermediate result corresponding to the first target fragment sent by each intermediate task node, the method further comprises: based on the identifiers of the multiple intermediate task nodes, sending a data acquisition request to the multiple intermediate task nodes, the data acquisition request including the identifier of the first target fragment.

22. The method according to any one of claims 19 to 21, characterized in that the m fragments are m data fragments to be EC encoded and the first target fragment is a check fragment; or the m fragments are fragments to be EC decoded and the first target fragment is a data fragment to be recovered; and after the first target fragment is determined, the method further comprises: persistently storing the first target fragment.

23. The method according to any one of claims 19 to 22, characterized in that the m fragments are fragments to be EC decoded, and the aggregation task request further includes a degraded read instruction; and after the first target fragment is determined, the method further comprises: sending a read response to the degraded read instruction to the target node, the read response including the first target fragment.

24. A data processing apparatus, characterized in that the data processing apparatus is applied to a target node in a storage system and comprises: a node selection module, configured to, in response to a first instruction, determine a plurality of intermediate task nodes and at least one aggregation task node from the storage system, the first instruction being used to instruct EC calculation based on m fragments to obtain n target fragments, wherein m is not less than 2 and n is not less than 1; and a request sending module, configured to send a corresponding intermediate task request to each of the plurality of intermediate task nodes, and to send a corresponding aggregation task request to each of the at least one aggregation task node; wherein the intermediate task request is used to request the corresponding intermediate task node to obtain task data and to determine n intermediate results based on the task data, the task data includes some of the m fragments, different intermediate task nodes obtain different task data, and the n intermediate results correspond one-to-one with the n target fragments; and the aggregation task request is used to request the corresponding aggregation task node to calculate a specified target fragment based on the multiple intermediate results provided by the plurality of intermediate task nodes.

25. A data processing apparatus, characterized in that the data processing apparatus is applied to an intermediate task node in a storage system, the storage system includes multiple intermediate task nodes, a target node, and at least one aggregation task node, and the data processing apparatus comprises: a task data acquisition module, configured to obtain task data in response to an intermediate task request from the target node, wherein the task data includes some of m fragments on which EC calculation is to be performed, different intermediate task nodes obtain different task data, and m is not less than 2; an EC calculation module, configured to determine n intermediate results based on the task data, wherein n is not less than 1; and an intermediate result feedback module, configured to provide each aggregation task node with at least one of the n intermediate results, wherein the at least one intermediate result corresponds to at least one target fragment, and each intermediate result is used to calculate the corresponding target fragment.

26. A data processing apparatus, characterized in that the data processing apparatus is applied to a first aggregation task node among at least one aggregation task node in a storage system, the storage system further includes multiple intermediate task nodes and a target node, and the data processing apparatus comprises: an intermediate result acquisition module, configured to, in response to an aggregation task request from the target node, obtain multiple intermediate results corresponding to a first target fragment and provided by the multiple intermediate task nodes, wherein each intermediate result is obtained by the corresponding intermediate task node based on the task data it has obtained, the task data includes some of m fragments on which EC calculation is to be performed, the task data obtained by different intermediate task nodes includes different fragments, and m is not less than 2; and an intermediate result aggregation module, configured to determine the first target fragment based on the multiple intermediate results corresponding to the first target fragment.

27. A computer device, characterized in that the computer device includes a processor and a memory, the memory being used to store computer programs and data, and the processor being used to execute the computer programs stored in the memory to implement the data processing method according to any one of claims 1 to 23.

28. A storage system, characterized in that the storage system includes a target node, a plurality of intermediate task nodes, and at least one aggregation task node, wherein the target node is used to execute the data processing method according to any one of claims 1 to 9, each of the plurality of intermediate task nodes is used to execute the data processing method according to any one of claims 10 to 18, and each of the at least one aggregation task node is used to execute the data processing method according to any one of claims 19 to 23.